use polars to go from SRA accessions to run identifiers #3485

Open
ctb opened this issue Jan 12, 2025 · 1 comment
Labels
code herein lies code fyi Information that is interesting or useful

Comments

@ctb
Contributor

ctb commented Jan 12, 2025

Luiz shows us a nice command:

AllTheBacteria/AllTheBacteria#54 (comment)

@ctb ctb added fyi Information that is interesting or useful code herein lies code labels Jan 12, 2025
@bluegenes
Contributor

bluegenes commented Jan 13, 2025

I modified Luiz's script to get run accs from a bioproject, worked like a charm:

# /// script
# dependencies = [
#   "polars>=1.17.1,<2",
# ]
# ///

import polars as pl

sra_prj = "PRJEB74559"

sra_metadata = pl.scan_parquet(
    "s3://sra-pub-metadata-us-east-1/sra/metadata/",
    storage_options={"skip_signature": "true"},
).select(["acc", "bioproject"])

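# Filter the SRA metadata for the specified BioProject and stream the
# matching rows to a TSV file named after the project, so the output
# name stays in sync if sra_prj is changed.
sra_metadata.filter(pl.col("bioproject") == sra_prj).sink_csv(
    f"{sra_prj}.accs.tsv", separator="\t"
)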

Saved as sra-project-to-runs.py; running uv run sra-project-to-runs.py produces a file that looks like this:

acc     bioproject
ERR13485822     PRJEB74559
ERR13480653     PRJEB74559
ERR13480654     PRJEB74559
ERR13480650     PRJEB74559
ERR13488553     PRJEB74559
ERR13480663     PRJEB74559

🤩
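
For downstream use, the TSV can be read back into a plain list of run accessions. The following is only a minimal sketch using the standard library: it assumes the two-column layout shown above (a header line with acc and bioproject, tab-separated), and the helper name read_run_accessions is hypothetical, not something from the original script.

```python
import csv
from pathlib import Path


def read_run_accessions(tsv_path):
    """Return the values of the 'acc' column from a tab-separated file.

    Hypothetical helper: assumes a header row containing an 'acc' column,
    as in the output of the script above.
    """
    with Path(tsv_path).open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row["acc"] for row in reader]


if __name__ == "__main__":
    # File name taken from the script above; skipped if it has not been run yet.
    path = Path("PRJEB74559.accs.tsv")
    if path.exists():
        runs = read_run_accessions(path)
        print(f"{len(runs)} run accessions")
```

The list can then be handed to whatever download or sketching step comes next, one accession per run.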
