Commit

Merge pull request #23 from compomics/various-fixes
Various fixes and improvements (v0.2.0)
RalfG authored Nov 12, 2022
2 parents 986b696 + 9d7cf78 commit b231291
Showing 32 changed files with 536 additions and 166 deletions.
4 changes: 2 additions & 2 deletions .vscode/settings.json
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
"esbonio.sphinx.confDir": "docs/source",
"esbonio.sphinx.confDir": "${workspaceFolder}/docs/source",
"[python]": {
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {"source.organizeImports": true},
@@ -11,4 +11,4 @@
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true
}
}
50 changes: 50 additions & 0 deletions CHANGELOG.md
@@ -1,9 +1,59 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.0] - 2022-11-12

### Added

- `PSM`: Add `ion_mobility` field
- `PSMList`: Allow slicing with bool arrays (e.g., `psm_list[psm_list["retention_time"] < 2000]`)
- `rename_modifications`: Add support for fixed modifications
- Add example files
- Online: Add support for GZipped files
- Online: Add support for logarithmic score (e.g. e-values)
- Docs: Extend contributing with example contributions
- Docs: Add notes to `PSM.get_usi()` method
- Docs: Extend quickstart on PSMList
- Docs: Add "psm_utils tags" for file formats, as used in high-level read/write/convert functions
- Docs: Peptide Record: add notes on unsupported modification types; add example for C-terminal modification
- Docs: More clearly document conversion to DataFrame
- Docs: Add bioconda install instructions
- Docs: Add citation for preprint
- Tests: Add tests for PSMList `set_ranks` and `get_rank1_psms` methods

### Changed

- `PSMList`: Refactor `set_ranks` and `get_rank1_psms` methods
- Update `.vscode/settings.json`
- Typing: Replace Union with OR operator `|`
- Online: Use percentiles instead of randomly sampling for PP plot
- Docs: Force TOC-tree max depth
- Tests: Expand unit tests in general

### Fixed

- `PSMList`: Truncate `__repr__` to the first five entries only, to avoid crashing notebook output
- `Peptidoform`: Minor typing fix
- `add_fixed_modifications`: Allow input as dict as well as list of tuples
- `io`: Fix issue where the `NamedTemporaryFile` for `_supports_write_psm` was seen as invalid Percolator file
- `io.convert`: Pass `progressbar` argument to class, not `write_file`
- `io.mzid`: Add more supported MS-GF score names, make SpecEValue default
- `io.peptide_record`: `spec_id` is now a required column (`spectrum_id` is also required in PSM)
- `io.peptide_record`: Fix parsing of C-terminal modifications from proforma to peprec
- `io.percolator`: Fix Percolator peptide notation writing (fixes #18)
- `io.tsv`: Fix issue where `TSVReader` would not use string type for metadata
- `io.xtandem`: Fix issue where optional arguments were not accepted by `XTandemReader`
- `io.xtandem`: Do not split spectrum title on space
- Online: Fix pi-0 diagonal calculation
- Remove obsolete to do comments in code

## [0.1.0] - 2022-10-14

### Added

- Initial version
10 changes: 10 additions & 0 deletions CONTRIBUTING.rst
@@ -16,6 +16,16 @@ thread in
`GitHub Discussions <https://github.com/compomics/psm_utils/discussions>`_
or in `GitHub Issues <https://github.com/compomics/psm_utils/issues>`_.

Not sure where to start? Great contributions to
`psm_utils <https://github.com/compomics/psm_utils>`_ include:

- Adding support for more file formats.
- Adding functionality to the ``PSMList``, ``PSM``, or ``Peptidoform`` classes.
- Improving the performance of existing functions, e.g. parsing peptidoforms or
reading and writing PSM files.

Also check out the `open issues <https://github.com/compomics/psm_utils/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22+label%3A%22help+wanted%22>`_
that carry the ``good first issue`` or ``help wanted`` labels.


Development setup
51 changes: 36 additions & 15 deletions README.rst
@@ -14,6 +14,10 @@ engine results in Python.
:alt: PyPI
:target: https://pypi.org/project/psm-utils/

.. image:: https://img.shields.io/conda/vn/bioconda/psm-utils?style=flat-square
:alt: Bioconda
:target: http://bioconda.github.io/recipes/psm-utils/README.html

.. image:: https://img.shields.io/github/workflow/status/compomics/psm_utils/Test?label=tests&style=flat-square
:alt: GitHub Actions tests status
:target: https://github.com/compomics/psm_utils/actions/workflows/test.yml
@@ -82,18 +86,18 @@ Goals and non-goals
Supported file formats
**********************

===================================================================================================================== =============== ===============
File format Read support Write support
===================================================================================================================== =============== ===============
`OpenMS idXML <https://www.openms.de/>`_ ✅ ❌
`MaxQuant msms.txt <https://www.maxquant.org/>`_ ✅ ❌
`MS Amanda CSV <https://ms.imp.ac.at/?goto=msamanda>`_ ✅ ❌
`mzIdentML <https://psidev.info/mzidentml>`_ ✅ ✅
`Peptide Record <https://psm-utils.readthedocs.io/en/latest/api/psm_utils.io/#module-psm_utils.io.peptide_record>`_ ✅ ✅
`Percolator tab <https://github.com/percolator/percolator/wiki/Interface>`_ ✅ ✅
`TSV <https://psm-utils.readthedocs.io/en/latest/api/psm_utils.io/#module-psm_utils.io.tsv>`_ ✅ ✅
`X!Tandem XML <https://www.thegpm.org/tandem/>`_ ✅ ❌
===================================================================================================================== =============== ===============
===================================================================================================================== =============== =============== ===============
File format psm_utils tag Read support Write support
===================================================================================================================== =============== =============== ===============
`OpenMS idXML <https://www.openms.de/>`_ ``idxml`` ✅ ❌
`MaxQuant msms.txt <https://www.maxquant.org/>`_ ``msms`` ✅ ❌
`MS Amanda CSV <https://ms.imp.ac.at/?goto=msamanda>`_ ``msamanda`` ✅ ❌
`mzIdentML <https://psidev.info/mzidentml>`_ ``mzid`` ✅ ✅
`Peptide Record <https://psm-utils.readthedocs.io/en/stable/api/psm_utils.io/#module-psm_utils.io.peptide_record>`_ ``peprec`` ✅ ✅
`Percolator tab <https://github.com/percolator/percolator/wiki/Interface>`_ ``percolator`` ✅ ✅
`TSV <https://psm-utils.readthedocs.io/en/stable/api/psm_utils.io/#module-psm_utils.io.tsv>`_ ``tsv`` ✅ ✅
`X!Tandem XML <https://www.thegpm.org/tandem/>`_ ``xtandem`` ✅ ❌
===================================================================================================================== =============== =============== ===============

Legend: ✅ Supported, ❌ Unsupported, 🔧 Work in progress
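
The tag column above can be read as a small lookup table: a conversion is only possible from a format with read support to a format with write support. A minimal Python sketch of that rule follows. The ``FORMAT_SUPPORT`` dictionary and the ``can_convert`` helper are hypothetical names used for illustration only; they are not part of the psm_utils API.

```python
# Hypothetical lookup mirroring the table above: tag -> (read support, write support).
FORMAT_SUPPORT = {
    "idxml": (True, False),
    "msms": (True, False),
    "msamanda": (True, False),
    "mzid": (True, True),
    "peprec": (True, True),
    "percolator": (True, True),
    "tsv": (True, True),
    "xtandem": (True, False),
}


def can_convert(input_tag: str, output_tag: str) -> bool:
    """Return True if the input format is readable and the output format is writable."""
    try:
        readable, _ = FORMAT_SUPPORT[input_tag]
        _, writable = FORMAT_SUPPORT[output_tag]
    except KeyError as error:
        # Tags outside the table are rejected rather than guessed.
        raise ValueError(f"Unknown file format tag: {error.args[0]!r}") from None
    return readable and writable
```

For example, ``can_convert("xtandem", "tsv")`` holds because X!Tandem XML can be read and TSV can be written, while ``can_convert("tsv", "idxml")`` does not because idXML has no write support.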

@@ -117,15 +121,22 @@ above to get started!
Installation
############

Install with pip:
.. image:: https://img.shields.io/badge/install%20with-pip-brightgreen?style=flat-square
:alt: Install with pip
:target: https://pypi.org/project/psm-utils/

.. code-block:: sh

   pip install psm-utils

Note: In the PyPI package name, a hyphen is used instead of an underscore, as
per `PEP8 convention <https://peps.python.org/pep-0008/#package-and-module-names>`_.
.. image:: https://img.shields.io/badge/install%20with-bioconda-blue?style=flat-square
:alt: Install with Bioconda
:target: http://bioconda.github.io/recipes/psm-utils/README.html

.. code-block:: sh

   conda install -c bioconda psm-utils
@@ -134,3 +145,13 @@ Full documentation

The full documentation, including a quickstart guide and Python API reference,
is available on `psm_utils.readthedocs.io <https://psm-utils.readthedocs.io>`_.


Citation
########

If you use psm_utils for your research, please cite the following publication:

| **psm_utils: A high level Python API for parsing and handling peptide-spectrum-matches and proteomics search results.**
| Ralf Gabriels, Arthur Declercq, Robbin Bouwmeester, Sven Degroeve, Lennart Martens.
| ChemRxiv (2022). `doi:10.26434/chemrxiv-2022-9506r <https://doi.org/10.26434/chemrxiv-2022-9506r>`_
3 changes: 3 additions & 0 deletions docs/source/index.rst
@@ -5,6 +5,7 @@
:caption: About
:hidden:
:includehidden:
:maxdepth: 2

Readme <self>
quickstart
@@ -17,6 +18,7 @@
:hidden:
:includehidden:
:glob:
:maxdepth: 2

api/*

@@ -25,5 +27,6 @@
:caption: Command line interface
:hidden:
:includehidden:
:maxdepth: 2

cli/cli
64 changes: 61 additions & 3 deletions docs/source/quickstart.rst
@@ -60,10 +60,68 @@ match information:
The spectrum can be retrieved by the USI through the ProteomeXchange USI aggregator:
http://proteomecentral.proteomexchange.org/usi/?usi=mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2
Note that this is only possible because the spectrum has been fully indexed in one of
the ProteomeXchange partner repositories (in this case both MassIVE and PeptideAtlas).


PSMList and psm_utils.io
########################
PSMList
#######

:py:class:`~psm_utils.psm.PSMList` is a simple list-like object that represents a
group of PSMs, from one or more mass spectrometry runs or collections. This simple,
Pythonic data structure can be flexibly implemented in various contexts.

.. code-block:: python

   >>> psm_list = PSMList(psm_list=[
   ...     PSM(peptidoform="ACDK", spectrum_id=1, score=140.2, retention_time=600.2),
   ...     PSM(peptidoform="CDEFR", spectrum_id=2, score=132.9, retention_time=1225.4),
   ...     PSM(peptidoform="DEM[Oxidation]K", spectrum_id=3, score=55.7, retention_time=3389.1),
   ... ])

:py:class:`PSMList` directly supports iteration:

.. code-block:: python

   >>> for psm in psm_list:
   ...     print(psm.score)
   140.2
   132.9
   55.7

A :py:class:`PSM` property can be accessed across all PSMs as a single NumPy array:

.. code-block:: python

   >>> psm_list["score"]
   array([140.2, 132.9, 55.7], dtype=object)

:py:class:`PSMList` supports indexing and slicing:

.. code-block:: python

   >>> psm_list_subset = psm_list[0:2]
   >>> psm_list_subset["score"]
   array([140.2, 132.9], dtype=object)
   >>> psm_list_subset = psm_list[0, 2]
   >>> psm_list_subset["score"]
   array([140.2, 55.7], dtype=object)

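
The access patterns above (integer, slice, field name, sequence of indices and, as of this release, boolean mask) can all be served by a single ``__getitem__`` that dispatches on the argument type. The sketch below is a hypothetical, dependency-free illustration of that idea: ``MiniPSM`` and ``MiniPSMList`` are made-up names, plain lists stand in for NumPy arrays, and this is not the psm_utils implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MiniPSM:
    """Hypothetical stand-in for a PSM; not the psm_utils class."""

    peptidoform: str
    spectrum_id: int
    score: float


class MiniPSMList:
    """Hypothetical list-like container that dispatches on the index type."""

    def __init__(self, psms: Sequence[MiniPSM]) -> None:
        self.psms = list(psms)

    def __len__(self) -> int:
        return len(self.psms)

    def __iter__(self):
        return iter(self.psms)

    def __getitem__(self, item):
        if isinstance(item, bool):
            raise TypeError("A single bool is not a valid index.")
        if isinstance(item, int):
            return self.psms[item]  # one PSM
        if isinstance(item, slice):
            return MiniPSMList(self.psms[item])  # contiguous subset
        if isinstance(item, str):
            return [getattr(psm, item) for psm in self.psms]  # one field, all PSMs
        items = list(item)
        # bool is a subclass of int, so the boolean-mask case is checked first.
        if items and all(isinstance(i, bool) for i in items):
            if len(items) != len(self.psms):
                raise ValueError("Boolean mask length must match the list length.")
            return MiniPSMList([p for p, keep in zip(self.psms, items) if keep])
        if all(isinstance(i, int) for i in items):
            return MiniPSMList([self.psms[i] for i in items])  # arbitrary positions
        raise TypeError(f"Unsupported index type: {type(item).__name__}")
```

With three entries scored 140.2, 132.9 and 55.7, ``psms[0, 2]`` receives the tuple ``(0, 2)`` and takes the sequence-of-indices branch, while a mask such as ``[s > 100 for s in psms["score"]]`` takes the boolean branch.
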
For more advanced and efficient vectorized access, converting the
:py:class:`PSMList` to a Pandas DataFrame is highly recommended:

.. code-block:: python

   >>> psm_df = psm_list.to_dataframe()
   >>> psm_df[(psm_df["retention_time"] < 2000) & (psm_df["score"] > 10)]
   peptidoform spectrum_id run collection spectrum is_decoy score qvalue pep precursor_mz retention_time protein_list rank source provenance_data metadata rescoring_features
   0 ACDK 1 None None None None 140.2 None None None 600.0 None None None None None None
   1 CDEFR 2 None None None None 132.9 None None None 1225.0 None None None None None None

psm_utils.io
############

The :py:mod:`psm_utils.io` subpackage contains readers and writers for various
PSM file formats (see :ref:`Supported file formats`). Each reader parses the
@@ -173,7 +231,7 @@ Using a molecular formula or mass shift:
621.24282637892


A drawback of using the mass shift is that the composition is not be resolvable:
A drawback of using the mass shift is that the composition is not resolvable:

>>> Peptidoform("AC[+57.021464]DEK/2").composition
[...]
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added example_files/QExHF04054_tandem.idXML.gz
Binary file not shown.
Binary file added example_files/msms.txt.gz
Binary file not shown.
1 change: 1 addition & 0 deletions online/_base.py
@@ -26,6 +26,7 @@ def __init__(self) -> None:
"output_filetype",
"fdr_threshold",
"reverse",
"log_scale",
"file_state",
"psm_list",
"psm_df",
20 changes: 13 additions & 7 deletions online/_utils.py
@@ -57,15 +57,21 @@ def score_histogram(psm_df):

def pp_plot(psm_df):
"""Generate PP plot for given PSM dataframe."""
decoy_ratio = np.count_nonzero(psm_df["is_decoy"]) / len(psm_df)
if decoy_ratio == 0:
n_decoys = np.count_nonzero(psm_df["is_decoy"])
n_targets = len(psm_df) - n_decoys
pi_zero = n_decoys / n_targets
if n_decoys == 0:
raise ValueError("No decoy PSMs found in PSM file.")
target_scores = psm_df["score"][~psm_df["is_decoy"]]
decoy_scores = psm_df["score"][psm_df["is_decoy"]]
if len(psm_df) > 5000:
target_scores_sample = psm_df["score"][~psm_df["is_decoy"]].sample(5000)
target_ecdf = ECDF(target_scores)(target_scores_sample)
decoy_ecdf = ECDF(decoy_scores)(target_scores_sample)
if len(psm_df) > 1000:
target_scores_quantiles = psm_df["score"][~psm_df["is_decoy"]].quantile(
np.linspace(0, 1, 1000)
)
else:
target_scores_quantiles = target_scores
target_ecdf = ECDF(target_scores)(target_scores_quantiles)
decoy_ecdf = ECDF(decoy_scores)(target_scores_quantiles)

fig = go.Figure()
fig.add_trace(
@@ -78,7 +84,7 @@ def pp_plot(psm_df):
fig.add_trace(
go.Scatter(
x=[0, 1],
y=[0, decoy_ratio],
y=[0, pi_zero],
mode="lines",
line=go.scatter.Line(color="red"),
showlegend=True,
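
To illustrate the change in the hunks above: instead of drawing a random sample of target scores, both ECDFs are evaluated on a fixed grid of target-score quantiles, and the reference diagonal ends at the decoy-to-target ratio. The sketch below is stdlib-only and rests on stated assumptions: `ecdf`, `quantile_grid` and `pp_points` are hypothetical helper names, and the grid uses nearest-rank order statistics, whereas pandas `quantile` interpolates linearly by default.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function giving the fraction of `sample` values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def quantile_grid(sample, n_points):
    """Pick `n_points` evenly spaced order statistics (nearest rank, no interpolation)."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2.")
    ordered = sorted(sample)
    last = len(ordered) - 1
    return [ordered[round(i * last / (n_points - 1))] for i in range(n_points)]


def pp_points(target_scores, decoy_scores, n_points=1000):
    """Return decoy ECDF values, target ECDF values and pi-zero for a PP plot."""
    if not decoy_scores:
        raise ValueError("No decoy PSMs found.")
    if not target_scores:
        raise ValueError("No target PSMs found.")
    # Deterministic grid: quantiles for large inputs, all target scores otherwise.
    if len(target_scores) > n_points:
        grid = quantile_grid(target_scores, n_points)
    else:
        grid = sorted(target_scores)
    target_ecdf = ecdf(target_scores)
    decoy_ecdf = ecdf(decoy_scores)
    pi_zero = len(decoy_scores) / len(target_scores)  # decoys per target
    return [decoy_ecdf(x) for x in grid], [target_ecdf(x) for x in grid], pi_zero
```

Unlike random sampling, the quantile grid gives the same curve on every run and still covers the full range of target scores.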