Update column for species in GTDB source #331

Merged: 1 commit, Jan 22, 2025

Conversation

jplfaria
Contributor

Summary

In the current implementation, species-level GTDB terms are mapped to NCBI Taxonomy IDs. The IDs are read from the ncbi_taxid column in the source files:

#: AR stands for archaea
GTDB_AR_URL = "https://data.gtdb.ecogenomic.org/releases/latest/ar53_metadata.tsv.gz"
#: BAC stands for bacteria
GTDB_BAC_URL = "https://data.gtdb.ecogenomic.org/releases/latest/bac120_metadata.tsv.gz"

I recently realized that the source files also have a dedicated ncbi_species_taxid column, and that ncbi_taxid doesn't always represent the NCBI taxonomy at the species level. ncbi_species_taxid is the correct column for this mapping.

I want to use ncbi_taxid for a different purpose, but that will be part of a separate PR/discussion. For now, the goal of this PR is to fix the current implementation so that it maps to ncbi_species_taxid, as originally intended.
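To illustrate the difference between the two columns, here is a minimal sketch. It is not the code in src/pyobo/sources/gtdb.py. The function name `species_to_ncbitaxon` is hypothetical, and the sample row was typed in by hand for illustration. The sketch assumes the metadata files are tab-separated and contain `gtdb_taxonomy`, `ncbi_taxid`, and `ncbi_species_taxid` columns.

```python
import csv
import io

# Hand-written sample row for illustration only (not downloaded from GTDB).
# Assumption: the real metadata TSV files carry these column names.
SAMPLE_TSV = (
    "accession\tgtdb_taxonomy\tncbi_taxid\tncbi_species_taxid\n"
    "RS_GCF_000005845.2\t"
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;"
    "f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\t"
    "511145\t562\n"
)


def species_to_ncbitaxon(handle, column="ncbi_species_taxid"):
    """Map each GTDB species label to the NCBI taxon ID found in ``column``.

    Hypothetical helper: it keeps the first ID seen for each species.
    """
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # The species is the last rank in the semicolon-separated lineage.
        species = row["gtdb_taxonomy"].rsplit(";", 1)[-1]
        taxid = (row.get(column) or "").strip()
        if species.startswith("s__") and taxid and taxid.lower() != "none":
            mapping.setdefault(species, taxid)
    return mapping


# Species-level column (what this PR switches to).
fixed = species_to_ncbitaxon(io.StringIO(SAMPLE_TSV))
# Previous column, which may point below the species level.
previous = species_to_ncbitaxon(io.StringIO(SAMPLE_TSV), column="ncbi_taxid")
```

For the sample row, `fixed` maps `s__Escherichia coli` to `562`, while `previous` maps it to `511145`, which is why the species-specific column is the safer source for a species-level cross-reference.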

…current field is the wrong one should be ncbi_species_taxid instead of current ncbi_taxid

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (main@ad0226d). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pyobo/sources/gtdb.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #331   +/-   ##
=======================================
  Coverage        ?   51.64%           
=======================================
  Files           ?      182           
  Lines           ?    11763           
  Branches        ?     1769           
=======================================
  Hits            ?     6075           
  Misses          ?     5478           
  Partials        ?      210           


@cthoyt cthoyt changed the title updating column ID that maps species level gtdb taxons to ncbitaxon. … Update column for species in GTDB source Jan 22, 2025
@cthoyt
Member

cthoyt commented Jan 22, 2025

@jplfaria thanks!

@cthoyt cthoyt merged commit 4e7558a into biopragmatics:main Jan 22, 2025
12 checks passed