Migration to Hugging Face datasets #35

Merged · 11 commits · May 3, 2024
4 changes: 2 additions & 2 deletions .github/workflows/build.yaml
```diff
@@ -16,9 +16,9 @@ jobs:
         python-version: ["3.8"]
 
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
           cache: 'pip'
```
14 changes: 7 additions & 7 deletions .github/workflows/pypi.yaml
```diff
@@ -13,11 +13,11 @@ on:
 jobs:
   release:
     runs-on: ubuntu-latest
-    if: ${{ github.event.workflow_run.conclusion == 'success' }}
+    if: ${{ github.event.workflow_run.conclusion == 'success' && github.ref == 'refs/heads/main'}}
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
       - name: Set up Python
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v5
         with:
           python-version: '3.8'
       - name: Install dependencies
@@ -26,16 +26,16 @@ jobs:
           pip install build
       - name: Build package
         run: python -m build
-      - name: publish_pypi
-        uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
+      - name: Publish to PyPI
+        id: publishpypi
+        uses: pypa/gh-action-pypi-publish@release/v1
         with:
           user: __token__
           password: ${{ secrets.SECRET_PYPI }}
       - name: Create Github release
-        if: steps.publish_pypi.outcome == 'success'
+        if: steps.publishpypi.outcome == 'success'
         env:
           GITHUB_TOKEN: ${{ secrets.CLI_GH_TOKEN }}
         run: |
           VERSION=$(python setup.py --version)
           gh release create v$VERSION --title $VERSION --generate-notes
```

9 changes: 5 additions & 4 deletions .gitignore
```diff
@@ -54,8 +54,8 @@ coverage.xml
 .hypothesis/
 .pytest_cache/
 
-# put all coverage in one place
-cov/
+# put all test results and coverage in one place
+.test_results/
 
 # Translations
 *.mo
@@ -162,9 +162,10 @@ run_scripts/checkpoint*
 run_scripts/*.index
 run_scripts/*.data-*
 run_scripts/*.csv
+run_scripts/*.pkl
 
 # testing metadata
 metadata.parquet
 
 # test assets (will be downloaded the first time tests are run and then ignored by git)
 assets/
 
 src/dlomix/reports/quarto/.local_quarto_plots
```
11 changes: 9 additions & 2 deletions .readthedocs.yaml
```diff
@@ -10,11 +10,18 @@ build:
   os: ubuntu-20.04
   tools:
     python: "3.8"
 
+  jobs:
+    # Build the rst files in case the API has changed or modules have been added
+    pre_build:
+      # -M modules first, -E no headings, -f force overwrite, -l follow links, -o output directory
+      - sphinx-apidoc -M -E -f -l -o docs/ src/dlomix/
+      - python docs/codify_package_titles.py
 
 
 # Build documentation in the docs/ directory with Sphinx
 sphinx:
   configuration: docs/conf.py
 
 python:
   install:
     - requirements: docs/requirements.txt
```
16 changes: 9 additions & 7 deletions Makefile
```diff
@@ -15,29 +15,31 @@ install-dev:
 test:
 	make uninstall
 	make install
-	mkdir -p cov/
+	mkdir -p .test_results/cov/
 
-	python -m pytest tests/ --junitxml=junit/test-results.xml --cov=dlomix --cov-report html:cov/cov_html --cov-report xml:cov/cov.xml --cov-report lcov:cov/cov.info --cov-report annotate:cov/cov_annotate
+	python -m pytest tests/ --junitxml=.test_results/junit/test-results.xml --cov=dlomix --cov-report html:.test_results/cov/cov_html --cov-report xml:.test_results/cov/cov.xml --cov-report lcov:.test_results/cov/cov.info --cov-report annotate:.test_results/cov/cov_annotate
 
 test-local:
 	make uninstall
 	make install-nodeps
-	mkdir -p cov/
+	mkdir -p .test_results/cov/
 
-	python -m pytest tests/ --junitxml=junit/test-results.xml --cov=dlomix --cov-report html:cov/cov_html --cov-report xml:cov/cov.xml --cov-report lcov:cov/cov.info --cov-report annotate:cov/cov_annotate
+	python -m pytest tests/ --junitxml=.test_results/junit/test-results.xml --cov=dlomix --cov-report html:.test_results/cov/cov_html --cov-report xml:.test_results/cov/cov.xml --cov-report lcov:.test_results/cov/cov.info --cov-report annotate:.test_results/cov/cov_annotate
 
 
 format:
-	black ./dlomix/*
+	black ./src/dlomix/*
 	isort --profile black .
-	black ./dlomix/*.py
+	black ./src/dlomix/*.py
 	black ./run_scripts/*.py
 	black ./tests/*.py
 
 lint:
-	pylint --disable=R,C ./dlomix/*
+	pylint --disable=R,C ./src/dlomix/*
 
 build-docs:
+	sphinx-apidoc -M -f -E -l -o docs/ src/dlomix/
+	python docs/codify_package_titles.py
 	cd docs && make clean html
 	cd docs/_build/html/ && open index.html
```
23 changes: 15 additions & 8 deletions README.md
````diff
@@ -29,7 +29,7 @@ $ pip install dlomix[wandb]
 ```
 
 **General Overview**
-- `data`: structures for modeling the input data, currently based on `tf.Dataset`
+- `data`: structures for modeling the input data, processing functions, and feature extractions based on Hugging Face datasets `Dataset` and `DatasetDict`
 - `eval`: classes for evaluating models and reporting results
 - `layers`: custom layers used for building models, based on `tf.keras.layers.Layer`
 - `losses`: custom losses to be used for training with `model.fit()`
````
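For context, the pattern the reworked `data` module builds on looks roughly like this — a minimal sketch using the Hugging Face `datasets` API, where column names and the added length feature are illustrative assumptions, not dlomix's actual schema:

```python
from datasets import Dataset, DatasetDict

# Toy peptide data; "sequence" holds already-encoded residues.
raw = Dataset.from_dict(
    {
        "sequence": [[3, 7, 1, 0], [2, 2, 9, 4], [5, 1, 1, 0], [8, 3, 2, 6]],
        "retention_time": [12.4, 57.9, 33.1, 41.7],
    }
)

# Splits live together in one DatasetDict instead of separate tf.data pipelines.
splits = raw.train_test_split(test_size=0.25, seed=42)
data = DatasetDict(train=splits["train"], val=splits["test"])

# Processing and feature extraction are expressed as batched map() calls,
# which datasets caches and applies across all splits.
data = data.map(
    lambda batch: {
        "length": [sum(1 for t in seq if t != 0) for seq in batch["sequence"]]
    },
    batched=True,
)

# At the boundary, a split is handed to Keras as a tf.data.Dataset for model.fit().
tf_train = data["train"].to_tf_dataset(
    columns=["sequence"],
    label_cols=["retention_time"],
    batch_size=2,
    shuffle=True,
)
```

The appeal of this design is that splitting, processing, and caching stay inside `datasets`, while training code still receives a plain `tf.data.Dataset`.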
```diff
@@ -43,28 +43,36 @@ $ pip install dlomix[wandb]
 
 **Use-cases**
 
-- Retention Time Prediction:
-    - a regression problem where the retention time of a peptide sequence is to be predicted.
+- Retention Time Prediction:
+    - a regression problem where the retention time of a peptide sequence is to be predicted.
 
 - Fragment Ion Intensity Prediction:
     - a multi-output regression problem where the intensity values for fragment ions are predicted given a peptide sequence along with some additional features.
 
 
 
 **To-Do**
 
 Functionality:
 - [X] integrate prosit
-- [ ] extend pipeline for different types of models and backbones
-- [ ] extend pipeline to allow for fine-tuning with custom datasets
+- [X] integrate hugging face datasets
+- [X] extend data representation to include modifications
+- [X] add PTM features
 - [X] add residual plots to reporting, possibly other regression analysis tools
 - [X] output reporting results as PDF
-- [ ] extend data representation to include modifications
 - [X] refactor reporting module to use W&B Report API (Retention Time)
+- [ ] extend pipeline for different types of models and backbones
+- [ ] extend pipeline to allow for fine-tuning with custom datasets
 
 
 
 Package structure:
 
 - [X] integrate `deeplc.py` into `models.py`, preferably introduce a package structure (e.g. `models.retention_time`)
 - [X] add references for implemented models in the ReadMe
 - [ ] introduce a style guide and checking (e.g. PEP)
 - [X] introduce formatting and precommit hooks
 - [X] plan documentation (sphinx and readthedocs)
 - [X] refactor following best practices for cleaner install
 
 
 ## Developing DLOmix
```
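To make the two use-cases above concrete, here is a minimal Keras sketch — not dlomix's actual model code, and the vocabulary size, sequence length, and ion-grid size are illustrative assumptions: retention time prediction ends in a single regression output, while fragment ion intensity prediction ends in a vector-valued output (e.g. a Prosit-style 174-value ion grid).

```python
import tensorflow as tf

VOCAB_SIZE = 26  # assumed residue alphabet size (illustrative)
SEQ_LEN = 30     # assumed fixed/padded peptide length (illustrative)


def make_regressor(output_dim: int) -> tf.keras.Model:
    """Minimal sequence regressor: output_dim=1 for retention time,
    output_dim=n_ions for fragment ion intensities."""
    inputs = tf.keras.Input(shape=(SEQ_LEN,), dtype=tf.int32)
    x = tf.keras.layers.Embedding(VOCAB_SIZE, 32)(inputs)
    x = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64))(x)
    outputs = tf.keras.layers.Dense(output_dim)(x)
    return tf.keras.Model(inputs, outputs)


rt_model = make_regressor(output_dim=1)           # scalar regression
intensity_model = make_regressor(output_dim=174)  # multi-output regression

rt_model.compile(optimizer="adam", loss="mse")
# rt_model.fit(tf_train, epochs=5)  # tf_train as produced in the earlier sketch
```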
```diff
@@ -87,4 +95,3 @@ Robbin Bouwmeester, Ralf Gabriels, Niels Hulstaert, Lennart Martens, Sven Degroe
 bioRxiv 2020.03.28.013003; doi: 10.1101/2020.03.28.013003
 
 [3] Bouwmeester, R., Gabriels, R., Hulstaert, N. et al. DeepLC can predict retention times for peptides that carry as-yet unseen modifications. Nat Methods 18, 1363–1369 (2021). https://doi.org/10.1038/s41592-021-01301-5
```

42 changes: 0 additions & 42 deletions dlomix/constants.py

This file was deleted.
