[Pitch] Lazy data loading #92

Closed
jspaezp opened this issue Feb 14, 2023 · 1 comment

@jspaezp
Collaborator

jspaezp commented Feb 14, 2023

Current limitation: Right now mokapot reads the entire .pin file into memory and then uses a subset to train the model, which is later used to score the data.

Suggestion: Do a first pass to determine what 'subset ratio' of the data needs to be read, store only that subset in memory, and train the model on it.
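
A rough sketch of the two-pass idea, using only the standard library. This is not mokapot's actual implementation: `count_rows`, `read_subset`, and the `max_train` parameter are hypothetical names chosen for illustration, and rows are kept as raw tab-split fields rather than parsed into a data frame.

```python
import random


def count_rows(path):
    """First pass: count data rows (excluding the header) without storing them."""
    with open(path) as handle:
        next(handle, None)  # skip the header line, if there is one
        return sum(1 for _ in handle)


def read_subset(path, max_train, seed=0):
    """Second pass: keep each row with probability `ratio`.

    `max_train` is a hypothetical cap on the number of training rows;
    the subset ratio is derived from it and from the row count.
    """
    n_rows = count_rows(path)
    ratio = min(1.0, max_train / n_rows) if n_rows else 1.0
    rng = random.Random(seed)
    with open(path) as handle:
        header = next(handle, "").rstrip("\n").split("\t")
        rows = [
            line.rstrip("\n").split("\t")
            for line in handle
            if rng.random() < ratio  # Bernoulli sampling, so the size is approximate
        ]
    return header, rows, ratio
```

Only the sampled rows are held in memory, so peak usage scales with roughly `max_train` instead of the full file. The sample size is approximate because each row is kept independently; reservoir sampling would give an exact size in a single pass, at the cost of slightly more bookkeeping.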

Expected complications: Adding the peptide- and protein-level confidences (as well as calculating the q-values) requires the whole dataset to be loaded into memory at once.

Possible solutions: ... polars ...

@wfondrie
Owner

Indeed #89 would solve this 🤔
