NEW: Begin adding label_seqs #49

Oddant1 · 2020-07-07T21:57:29Z

No description provided.

gregcaporaso · 2020-07-08T14:03:12Z

This is looking good to me so far @Oddant1. I like the label/delabel functionality being available from the same action. Things I noticed that are missing at this point are:

- descriptions for the new action in plugin_setup.py
- a test for the case where the metadata is missing an id that is present in the sequences (an error message indicating the first sequence id that is missing from the metadata would be ideal)
- a test for the case where a column requested by the user is not present in the metadata (an error message indicating the column name would be ideal)

Let me know if I can do anything to help - happy to have another look this morning.
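As a rough illustration of the label/delabel idea and the missing-column check requested above, here is a minimal sketch. It uses plain dicts rather than QIIME 2 metadata objects, and the names `label_ids` and `delabel_ids`, the `|` default delimiter, and the error wording are all hypothetical, not taken from this PR.

```python
def label_ids(seq_ids, metadata, columns, delimiter="|"):
    """Append the requested metadata values to each sequence id.

    `metadata` maps id -> {column name -> value}. This is a hypothetical
    stand-in for the real metadata object, and it assumes every row has
    the same columns and every sequence id is present in `metadata`.
    """
    # Collect the column names that exist anywhere in the metadata.
    available = set()
    for row in metadata.values():
        available.update(row)

    # Report every requested column that is absent, by name.
    missing_columns = [c for c in columns if c not in available]
    if missing_columns:
        raise ValueError(
            "Column(s) not present in metadata: " + ", ".join(missing_columns)
        )

    labeled = []
    for seq_id in seq_ids:
        row = metadata[seq_id]
        fields = [seq_id] + [str(row[c]) for c in columns]
        labeled.append(delimiter.join(fields))
    return labeled


def delabel_ids(labeled_ids, delimiter="|"):
    """Recover the original id by keeping only the first delimited field."""
    return [label.split(delimiter, 1)[0] for label in labeled_ids]
```

Delabeling this way only round-trips if the delimiter never appears inside an original id, which is one reason the delimiter would need to be a user-visible parameter.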

Oddant1 · 2020-07-08T17:45:11Z

@gregcaporaso, regarding request 2:

> a test for the case where the metadata is missing an id that is present in the sequences (an error message indicating the first sequence id that is missing from the metadata would be ideal)

It's just as easy to display every id that is present in the sequences and not in the metadata. Should we still only display the first? I'm in favor of displaying all of them because if we only display the first and there are a very large number missing we could lull the user into thinking it's a small problem when in reality their data is completely blown up. If we show them all the user will immediately know the scale of the problem.

Additionally, do we want a separate error for ids present in the metadata but not in the sequences? Or do we want to ignore this case? It seems like if we're going to worry about the sequence and metadata ids matching at all we should be making sure they match fully, but I don't know.

ALSO NOTE: The description text has not been written yet.

gregcaporaso · 2020-07-08T18:45:33Z

> It's just as easy to display every id that is present in the sequences and not in the metadata. Should we still only display the first? I'm in favor of displaying all of them...

Yep, I agree - let's display all of them if it's just as easy.

> Additionally, do we want a separate error for ids present in the metadata but not in the sequences?

No, we usually just ignore that case since we don't want to encourage having multiple copies of metadata because they tend to get out of sync really easily. That'll be particularly important here since after the genome-sampler run the user will have a small set of context sequences, but still have the metadata for all of the context sequences.
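A minimal sketch of the check as agreed here: every sequence id must appear in the metadata, all missing ids are reported at once, and metadata ids with no matching sequence are ignored. The function name `check_ids_in_metadata` and the message wording are assumptions for illustration, not the code from this PR.

```python
def check_ids_in_metadata(seq_ids, metadata_ids):
    """Raise if any sequence id is absent from the metadata.

    Ids that are only in the metadata are deliberately not an error,
    so one full metadata file can serve a smaller set of sequences.
    """
    metadata_ids = set(metadata_ids)
    # Keep the original sequence order so the message is reproducible.
    missing = [seq_id for seq_id in seq_ids if seq_id not in metadata_ids]
    if missing:
        raise ValueError(
            "%d sequence id(s) are missing from the metadata: %s"
            % (len(missing), ", ".join(missing))
        )
```

Including the count alongside the ids lets the user see the scale of the problem at a glance, which was the motivation for reporting more than the first missing id.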

gregcaporaso

This looks good to me overall, just a few suggestions here and there on error messages and descriptions that I think will improve clarity.

Do we need a test that ensures sequence description fields are handled correctly? I'm not sure if those end up in the formats that we're working with here. For example, what happens if the sequence header line looks like:

>id1 hello world

In this case the hello world shouldn't be considered part of the id, but I don't know if it ends up in the relabeled header or not. Seems like it wouldn't hurt to keep it, but I don't really have a strong feeling either way.

genome_sampler/plugin_setup.py

genome_sampler/label_seqs.py

Oddant1 · 2020-07-08T20:24:40Z

> I don't know if it ends up in the relabeled header or not.

It should end up in the relabeled header. In this code it would basically be treated as a part of the id.

gregcaporaso · 2020-07-08T20:32:06Z

> In this code it would basically be treated as a part of the id.

That would probably be an issue since the metadata wouldn't have it, but I think that might not actually be what happens. I think the transformer to the Series differentiates the id and description fields and uses the id field only as the index. This is probably a non-issue - I think the description ends up getting stripped in this process, which is probably fine.

ebolyen · 2020-07-08T20:36:08Z

Yep, the description is dropped in all cases here, so it isn't an issue. The transformer we are using also lacks a way to preserve it. This is consistent with some of the filter operations in q2-feature-table.

ebolyen · 2020-07-08T20:37:41Z

actually on second thought, the transformer may be able to preserve it if we returned an skbio object, but I'm not especially motivated to preserve that...

gregcaporaso · 2020-07-08T20:54:06Z

> I'm not especially motivated to preserve that

I agree - let's not worry about it now.
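For readers unfamiliar with the id/description distinction discussed in this thread, here is a small sketch of the common FASTA convention: the id is the text after `>` up to the first whitespace, and the rest of the line is the description. The helper `split_fasta_header` is hypothetical and is not the transformer's actual code.

```python
def split_fasta_header(header_line):
    """Split a FASTA header line into (id, description).

    Follows the common convention that the id ends at the first
    whitespace; the description may be an empty string.
    """
    text = header_line.strip()
    if text.startswith(">"):
        text = text[1:]
    parts = text.split(maxsplit=1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


# Only the id would be matched against the metadata index.
seq_id, description = split_fasta_header(">id1 hello world")
# seq_id == "id1", description == "hello world"
```

Under this convention, dropping the description leaves the id untouched, so the metadata lookup is unaffected.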

genome_sampler/plugin_setup.py

gregcaporaso · 2020-07-08T20:57:34Z

Looks good to me, I just had one more minor comment. Good to go after that - I'm about to get on a call so I'll approve now but be sure to hit that last comment.

Oddant1 · 2020-07-08T21:20:19Z

@gregcaporaso the backticks in the description text don't get underlined. It's a bug in the CLI. I will backtick it anyway to make it stand out.

gregcaporaso · 2020-07-08T22:39:36Z

Is this one ready for merge, or is @ebolyen still reviewing?

ebolyen · 2020-07-08T22:41:20Z

ready for merge! just got sidetracked

Oddant1 added 9 commits July 7, 2020 14:00

NEW: Begin adding label_seqs

4e2afec

SQUASH: Delabel test

337a5b2

SQUASH: Whitespace

ffce673

SQUASH: Whitespace

a109697

SQUASH: Add EOF blank line

e111ccd

SQUASH: Remove dead import

34b4e3e

SQUASH: Add comments for weird behavior

623d15b

SQUASH: Add test with one column

b2deb8d

SQUASH: Make sure columns and metadata are passed together

709de13

SQUASH: Addressing @gregcaporaso's feedback

dd7cbd9

Oddant1 added 2 commits July 8, 2020 11:26

SQUASH: Modify ids missing in metadata error

be99083

SQUASH: Add description text

b316375

Oddant1 added 3 commits July 8, 2020 11:56

SQUASH: Add test for more than 10 missing ids

c1d0e23

SQUASH: Remove AlignedSequence support due to framework bug

4676f58

SQUASH: Actually we want only Aligned

77729fa

gregcaporaso self-requested a review July 8, 2020 19:42

gregcaporaso requested changes Jul 8, 2020

SQUASH: Addressing @gregcaporaso's review

f12365c

gregcaporaso reviewed Jul 8, 2020

gregcaporaso approved these changes Jul 8, 2020

SQUASH: Add backticks to delimiter in description

d9037f0

gregcaporaso merged commit 2c08df5 into caporaso-lab:master Jul 8, 2020

Oddant1 mentioned this pull request Jul 29, 2020

BUG: Fix bug with TypeMatch and Set or List Union qiime2/qiime2#547

Merged
