suggestion: aliases in @SQ section #100
Comments
I think we should have this for the contigs in VCF too.
This is IMHO a rather good idea, although we'd have to come up with a different separator as e.g. NCBI likes to use We'd also have to think about what rules are necessary around distinctness, and to be explicit whether alignment records in SAM files would be allowed to use these aliases in RNAME/RNEXT (surely not!).
It seems NCBI will be phasing out their
Are aliases known at write time? Aliases seem like a read-time decision to me.
Most programs don't currently know about them when they make BAM files, but that could be changed. I have to mention that dealing with different naming schemes is a huge annoyance, particularly on things like Galaxy where you have novice users who are often not aware of this issue and inevitably need to be walked through properly munging things.
We discussed this at some length back at our September meeting (and I'm finally writing up my notes from then). We continue to mostly like the idea in principle, but need to spell out rules around uniqueness and so on. We noted that we can use this as an opportunity to define the alias regexp to be the excellently tight regexp (disallowing especially There was concern expressed that adding this reduces pressure on GRC to agree on a “chr” vs “” prefix. The opposing view to that is to admit that the world has not yet agreed on which end of the prefix to crack open and providing tools to reduce the ensuing pain is useful, as espoused by others on this thread. In PR #103, @lindenb proposes the following text (thanks!):
I've taken this on, and will propose additions to this that spell out distinctness requirements etc.
Enables tools to allow users to make queries with e.g. "1" or "chr1" interchangeably. Also allows for the possibility of tools using an alias when displaying sequence names to the user. Hat tip @lindenb, fixes samtools#100. However aliases must not appear elsewhere within the SAM file, in particular not in RNAME/RNEXT fields. This ensures that files will still be parsed correctly by non-@SQ-AN-aware tools.
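To make the intended behaviour concrete, here is a minimal Python sketch of how a reader might use such aliases. It assumes the aliases live in an `AN` tag on each `@SQ` line as a comma-separated list, and that every name (primary or alias) must be distinct across the whole header; the thread had not settled the exact distinctness rules, so treat that check as an assumption. The function names `parse_sq_aliases` and `is_valid_reference_field` are hypothetical and not part of any existing tool.

```python
def parse_sq_aliases(header_lines):
    """Return (primary_names, name_to_primary) built from @SQ header lines.

    Assumption: aliases are given as a comma-separated list in an AN tag,
    and no name may appear twice anywhere in the header.
    """
    primary_names = set()
    name_to_primary = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        sn = fields["SN"]
        aliases = fields["AN"].split(",") if "AN" in fields else []
        for name in [sn, *aliases]:
            if name in name_to_primary:
                raise ValueError(f"reference name {name!r} is not distinct")
            name_to_primary[name] = sn
        primary_names.add(sn)
    return primary_names, name_to_primary


def is_valid_reference_field(value, primary_names):
    """Check an RNAME/RNEXT value: only primary SN names are accepted,
    plus '*' (unavailable) and '=' (RNEXT shorthand for "same as RNAME")."""
    return value in ("*", "=") or value in primary_names


header = [
    "@HD\tVN:1.6",
    "@SQ\tSN:chr1\tLN:248956422\tAN:1",
    "@SQ\tSN:chr2\tLN:242193529\tAN:2",
]
primary_names, name_to_primary = parse_sq_aliases(header)

# A user query for "1" or "chr1" resolves to the same primary name.
resolved = [name_to_primary[q] for q in ("1", "chr1")]
```

Resolving the user's query to the primary name before touching the records means the alignment records themselves only ever carry the `SN` value, which is what keeps the file readable by tools that ignore the alias tag.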
A suggestion for the SAM sequence dictionary: would it be possible to add an optional list of aliases in the @SQ header? Software could use those aliases to reconcile the various nomenclatures (UCSC, Ensembl).
Something like:
then
would return the same output as