Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

functionality re: lineages that match but have addl taxonomic detail? #1027

Merged
merged 4 commits into from
Jun 19, 2020

Conversation

bluegenes
Copy link
Contributor

@bluegenes bluegenes commented Jun 18, 2020

So, I hadn't thought much about gather and needed to wrap my head around the gather_assignments functionality when lineages match, but one is more detailed:

We keep the more detailed info, so a;b;d + a;b;d;e yields a;b;d;e

This makes decent sense to me, as we always want to gather the best matches, and can summarize up later. But it's a little unintuitive to me that the count of a;b;d is 0 (abund test 4).

@ctb can you explain pls?

@codecov
Copy link

codecov bot commented Jun 18, 2020

Codecov Report

Merging #1027 into lca_summarize_weighted will decrease coverage by 0.33%.
The diff coverage is n/a.

Impacted file tree graph

@@                    Coverage Diff                     @@
##           lca_summarize_weighted    #1027      +/-   ##
==========================================================
- Coverage                   92.34%   92.01%   -0.34%     
==========================================================
  Files                          72       72              
  Lines                        5446     5446              
==========================================================
- Hits                         5029     5011      -18     
- Misses                        417      435      +18     
Impacted Files Coverage Δ
sourmash/nodegraph.py 77.67% <0.00%> (-16.08%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27e48ca...e13d34c. Read the comment docs.

@ctb
Copy link
Contributor

ctb commented Jun 19, 2020

nice tests! I love this way of asking questions 😂

this is based on my understanding of how kraken does things; see the first figure in http://ivory.idyll.org/blog/2020-sourmash-gtdb-oddities.html, where the leaf node for the highest weighted path is the one that is used.

See the make_lca code for details - it builds the LCA based on branching paths in the tree.

I added an explicit new test in e13d34c to make it clear that this is intentional behavior from make_lca.

@ctb ctb merged commit 0e39def into lca_summarize_weighted Jun 19, 2020
@ctb ctb deleted the testing_lca_abund branch June 19, 2020 13:29
ctb added a commit that referenced this pull request Jun 20, 2020
* an initial refactoring to pass abundance weighting around

* add exception to flag non-flattned signatures

* some tests for --with-abundance

* add new test module for lca functions specifically

* some tests for abund behavior. conflicted about results.

* weight hashvals properly

* fix lca summarize tests

* revise commentary

* put counting in original count_lca_for_assignments

* revert a lot of the original changes to command_summarize :)

* all tests working

* add a test for --with-abundance with real data

* fix calculation of total hashvals to get ratios working properly

* fixin' the tests

* add tests for a fake set of abundances (constructed/simulated)

* update summarize docs

* adjust language

* added and tested output messages

* functionality re: lineages that match but have addl taxonomic detail? (#1027)

* add test

* simpler test to showcase

* improve test decription

* comments & add test

Co-authored-by: C. Titus Brown <[email protected]>

Co-authored-by: Tessa Pierce <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants