
Feature: automatic identification/labelling of contaminant peptides #191

Open
cvanderaa opened this issue Aug 25, 2023 · 0 comments

@cvanderaa
Collaborator

Data sets may lack information about contaminant peptides when the user did not provide a contaminant database during raw data identification. We could provide functionality to automatically label peptides that map to a contaminant protein.

Contaminant proteins could be retrieved as described here. Once the function gets the list of contaminants, there could be two options (we could implement one of the two or both):

  1. Use the protein ID (UniProt ID?) to match peptides that are mapped to these proteins. Drawback: UniProt IDs (or, I guess, IDs from any naming system) are subject to change, which may compromise the matching between the ID in the data set and the ID in the contaminant database.
  2. Retrieve the contaminant protein sequences and perform peptide alignment on these sequences. Drawback: it's a bit more complicated to implement. Also, should we consider polymorphisms during alignment?
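
To make the two options concrete, here is a minimal sketch in Python (used only for illustration). Everything in it is an assumption: the toy accessions and sequences are made up, the function names are hypothetical, protein groups are assumed to be separated by `;`, and option 2 is reduced to exact substring matching with I/L treated as equivalent. It does not handle polymorphisms.

```python
# Toy contaminant table (made-up accessions and sequences, for illustration only).
TOY_CONTAMINANTS = {
    "TOY001": "MKLVINGKTLR",
    "TOY002": "GSHMAAEEKLLR",
}


def normalise_accession(protein_id: str) -> str:
    """Reduce an identifier to a bare accession.

    Assumes FASTA-style headers such as 'sp|ACC|NAME' and isoform
    suffixes such as 'ACC-2'; other formats are returned upper-cased.
    """
    protein_id = protein_id.strip()
    parts = protein_id.split("|")
    if len(parts) >= 2 and parts[0] in ("sp", "tr"):
        protein_id = parts[1]
    return protein_id.split("-")[0].upper()


def match_by_id(protein_ids: str, contaminant_ids: set) -> bool:
    """Option 1: flag a peptide if any protein in its group is a known contaminant."""
    return any(
        normalise_accession(p) in contaminant_ids
        for p in protein_ids.split(";")
        if p.strip()
    )


def match_by_sequence(peptide: str, contaminant_sequences) -> bool:
    """Option 2 (simplified): exact substring search, treating I and L as equal."""
    pep = peptide.strip().upper().replace("I", "L")
    if not pep:
        return False
    return any(pep in seq.upper().replace("I", "L") for seq in contaminant_sequences)


ids = set(TOY_CONTAMINANTS)
seqs = list(TOY_CONTAMINANTS.values())
by_id = match_by_id("ABC123;sp|TOY002|TOY_TEST", ids)
by_seq = match_by_sequence("LVLNGK", seqs)
```

Option 1 depends entirely on the identifiers being comparable after normalisation, whereas option 2 ignores identifiers altogether, at the cost of scanning every contaminant sequence for every peptide.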

Once the contaminant peptides are identified, we could add a logical column (e.g. isContaminant) to the rowData.
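
As a rough illustration of that last step, the sketch below adds the logical column to a table of row annotations. It is written in Python with plain dictionaries standing in for the rowData; the field names `peptide` and `protein`, the toy identifiers, and the function `add_contaminant_flag` are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def add_contaminant_flag(
    rows: List[Row],
    is_contaminant: Callable[[Row], bool],
    column: str = "isContaminant",
) -> List[Row]:
    """Return copies of the rows with an extra boolean column.

    The input rows are left untouched; the matching rule is passed in,
    so ID-based or sequence-based matching can be plugged in.
    """
    return [{**row, column: bool(is_contaminant(row))} for row in rows]


# Toy data: made-up identifiers, for illustration only.
toy_contaminant_ids = {"TOY001"}
rows = [
    {"peptide": "LVINGK", "protein": "TOY001"},
    {"peptide": "AAAAK", "protein": "ABC123"},
]
flagged = add_contaminant_flag(rows, lambda r: r["protein"] in toy_contaminant_ids)
```

Keeping the flag as a separate column, rather than dropping rows, leaves the decision to filter contaminants to the user.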
