Update column for species in GTDB source #331
Merged
Summary
In the current implementation, species-level GTDB terms are mapped to NCBI taxonomy IDs by retrieving the NCBI taxon IDs from the `ncbi_taxid` column in the source files.

I recently realized that there is a dedicated column named `ncbi_species_taxid`, and that the current `ncbi_taxid` doesn't always represent NCBI taxonomy at the species level. The correct column, `ncbi_species_taxid`, should be the one used for this purpose.

I want to use `ncbi_taxid` for a different purpose, but that will be part of a different PR/discussion. For now, the goal of this PR is to fix the current implementation so that it correctly maps to `ncbi_species_taxid`, as originally intended.
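
To illustrate the difference between the two columns, here is a minimal sketch, not the code changed in this PR. It assumes a tab-separated metadata file with `gtdb_taxonomy`, `ncbi_taxid`, and `ncbi_species_taxid` columns. The helper names (`gtdb_species_name`, `species_to_ncbi_taxids`), the placeholder accessions, and the abbreviated lineage strings are hypothetical and exist only for this example.

```python
import csv
import io

# Illustrative rows only: the accessions are placeholders and the lineage
# strings are abbreviated; these are not real GTDB records.
SAMPLE_TSV = (
    "accession\tgtdb_taxonomy\tncbi_taxid\tncbi_species_taxid\n"
    "ACC_0001\td__Bacteria;g__Escherichia;s__Escherichia coli\t511145\t562\n"
    "ACC_0002\td__Bacteria;g__Escherichia;s__Escherichia coli\t562\t562\n"
)


def gtdb_species_name(gtdb_taxonomy):
    """Return the species name from a GTDB lineage string, or None."""
    for rank in gtdb_taxonomy.split(";"):
        if rank.startswith("s__") and len(rank) > 3:
            return rank[3:]
    return None


def species_to_ncbi_taxids(handle, taxid_column="ncbi_species_taxid"):
    """Collect the NCBI taxon IDs seen for each GTDB species.

    `taxid_column` is a parameter only so the two columns can be compared.
    """
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        species = gtdb_species_name(row["gtdb_taxonomy"])
        taxid = row.get(taxid_column) or ""
        # Skip rows without a species name or without a numeric taxon ID.
        if species and taxid.isdigit():
            mapping.setdefault(species, set()).add(int(taxid))
    return mapping


with_species_column = species_to_ncbi_taxids(io.StringIO(SAMPLE_TSV))
with_taxid_column = species_to_ncbi_taxids(
    io.StringIO(SAMPLE_TSV), taxid_column="ncbi_taxid"
)
```

In this sample, reading `ncbi_taxid` also picks up a strain-level ID (511145) for the species, whereas `ncbi_species_taxid` yields only the species-level ID (562).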