I am using blobtoolkit as a pipeline on a server and ran it successfully on 2 genomes, but now I am getting errors related to the parsing of the blastp results.
My blobtoolkit install was created like this in a mamba environment (following https://blobtoolkit.genomehubs.org/install/):
pip3 install blobtoolkit[full]
mamba install -c tolkit blobtk
The file where the error pops up:
blobtoolkit/lib/python3.9/site-packages/blobtools/lib/hits.py
the log: data//blobtools/logs//run_blobtools_create.log
Reading all TSV files in ../window_stats
Loading parsed taxdump
Traceback (most recent call last):
  File "/home/ek/progz/conda_envs/blobtoolkit/bin/blobtools", line 8, in <module>
    sys.exit(cli())
  File "/home/ek/progz/conda_envs/blobtoolkit/lib/python3.9/site-packages/blobtools/blobtools.py", line 105, in cli
    sys.exit(subcommand())
  File "/home/ek/progz/conda_envs/blobtoolkit/lib/python3.9/site-packages/blobtools/lib/add.py", line 203, in cli
    main(args)
  File "/home/ek/progz/conda_envs/blobtoolkit/lib/python3.9/site-packages/blobtools/lib/add.py", line 149, in main
    parsed = field["module"].parse(
  File "/home/ek/progz/conda_envs/blobtoolkit/lib/python3.9/site-packages/blobtools/lib/hits.py", line 541, in parse
    blast = parse_blast(
  File "/home/ek/progz/conda_envs/blobtoolkit/lib/python3.9/site-packages/blobtools/lib/hits.py", line 59, in parse_blast
    "end": int(re.sub(r'\d', '', end)),
ValueError: invalid literal for int() with base 10: '|+'
Diamond_blastp output file (part) causing the error:
Only 2 lines are shown: the first line causes no problem; the second and following lines have a query of the form "Contig:start-end", e.g. AxFerruginea009:9075748-9076881|+=1890943at2=single.
If I remove the "|+" then other errors are thrown.
Other species/genomes have the same error due to similar-looking blastp outputs.
In another species, which eventually worked, our workaround was to change "end": int(end), to "end": int(re.sub(r'\d', '', end)), in the blastp parsing:
if ":" in query and "=" in query:
    # parse blastp
    parts = query.split("=")
    if query in bitscores and score <= bitscores[query]:
        continue
    if len(parts) == 3 and parts[2] == "fragmented":
        continue
    bitscores[query] = score
    seq_id, start, end = re.split(r"[:-]", parts[0])
    hit = {
        "subject": row[cols["sseqid"]],
        "score": score,
        "start": int(start),
        "end": int(re.sub(r'\d', '', end)),
        "file": index,
        "title": parts[1],
    }
I am stuck with the above error now. Any ideas how to fix this?
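One detail in the workaround above may explain the traceback: `re.sub(r'\d', '', end)` removes the digits and keeps everything else, so for an `end` of `9076881|+` it returns `'|+'`, which is exactly the value `int()` rejects. The uppercase class `\D` (non-digits) would strip the suffix instead. A minimal sketch of a more tolerant parse follows, assuming the query region always has the form `seq_id:start-end` with an optional `|+` or `|-` suffix. The helper name `parse_query_region` and the pattern `QUERY_RE` are hypothetical and not part of blobtools.

```python
import re

# Hypothetical pattern, not taken from blobtools: "seq_id:start-end" with an
# optional "|+" or "|-" strand suffix, as seen in the query IDs above.
QUERY_RE = re.compile(
    r"^(?P<seq_id>.+):(?P<start>\d+)-(?P<end>\d+)(?:\|(?P<strand>[+-]))?$"
)


def parse_query_region(region):
    """Split a query region into (seq_id, start, end, strand).

    strand is None when the region carries no "|+" / "|-" suffix.
    """
    match = QUERY_RE.match(region)
    if match is None:
        raise ValueError(f"unrecognised query region: {region!r}")
    return (
        match["seq_id"],
        int(match["start"]),
        int(match["end"]),
        match["strand"],
    )


# Query ID quoted in this issue; the part before the first "=" is the region.
query = "AxFerruginea009:9075748-9076881|+=1890943at2=single"
parts = query.split("=")
print(parse_query_region(parts[0]))

# The difference between the two character classes:
print(repr(re.sub(r"\d", "", "9076881|+")))  # digits removed, suffix kept
print(repr(re.sub(r"\D", "", "9076881|+")))  # suffix removed, digits kept
```

Matching the whole region with one pattern also avoids a second pitfall of `re.split(r"[:-]", ...)`: a `|-` suffix contains a hyphen, so the split would return four fields and the three-way unpacking would fail.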
estolle changed the title on Oct 16, 2024:
from: blobtoolkit pipeline error related to parsing of diamond_blatp output (hits.py, "end"=int(end) fails)
to: blobtoolkit pipeline error related to parsing of diamond_blastp output (hits.py, "end"=int(end) fails)
Sorry, I'd let this one slip by for a while. Based on the comments in #223, it looks like this is being introduced by changes in the newer version of BUSCO. I've pushed a potential fix; I just need to test it in a container build before making a new pip release.