Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r] Improve trackplot_combine layout logic #116

Merged
merged 4 commits into from
Aug 1, 2024
Merged

Conversation

bnprks
Copy link
Owner

@bnprks bnprks commented Jul 31, 2024

Add several improvements to the trackplot_combine layout logic.

For plots without sideplots, before and after are as follows:

  • Preserve vertical margins when stacking scalebars, and make sure they won't remove the x-axis label of the plot above them
  • Detect when tracks change the genomic region they are plotting and show a new x-axis appropriately
  • Fix the trackplot_gene() color legend to always show both + and - strand
  • Fix trackplot_scalebar() to change the font size as requested (before it just affected track height)
  • Switch color legends to vertical layout by default, and only use horizontal layout when a side plot is present
  • Shorten the y-axis label for trackplot_coverage() to reduce (though not eliminate) the likelihood of the label getting cut off

And, when side_plot is present:

  • Use horizontal color legend layout
  • Pick the first plot that accepts a side-plot rather than the last (some day we might want to add an argument to manually position the side plot)
Script used for plotting demo (relies on the `pbmc-3k` vignette data)
library(BPCells)
# devtools::load_all()

frags_raw <- open_fragments_dir("vignettes/pbmc-3k-data/pbmc_3k_frags")
mat_raw <- open_matrix_dir("vignettes/pbmc-3k-data/pbmc_3k_rna_raw")

blacklist <- read_encode_blacklist("vignettes/pbmc-3k-data/references", genome="hg38")
chrom_sizes <- read_ucsc_chrom_sizes("vignettes/pbmc-3k-data/references", genome="hg38")
transcripts <- read_gencode_transcripts("vignettes/pbmc-3k-data/references", release="42")
genes <- read_gencode_genes("vignettes/pbmc-3k-data/references", release="42")

atac_qc <- qc_scATAC(frags_raw, genes, blacklist)

pass_atac <- atac_qc |>
    dplyr::filter(nFrags > 1000, TSSEnrichment > 10) |>
    dplyr::pull(cellName)
pass_rna <- colnames(mat_raw)[colSums(mat_raw) > 1e3]
keeper_cells <- intersect(pass_atac, pass_rna)

frags <- frags_raw |> select_cells(keeper_cells)
keeper_genes <- rowSums(mat_raw) > 3
mat <- mat_raw[keeper_genes,keeper_cells]


######################################
# RNA-based clustering
######################################
# Normalize by reads-per-cell
mat <- multiply_cols(mat, 1/Matrix::colSums(mat))
# Log normalization
mat <- log1p(mat * 10000)
stats <- matrix_stats(mat, row_stats="variance")

# To keep the example small, we'll do a very naive variable gene selection
variable_genes <- order(stats$row_stats["variance",], decreasing=TRUE) |> 
  head(1000) |> 
  sort()
mat_norm <- mat[variable_genes,]
mat_norm <- mat_norm |> write_matrix_memory(compress=FALSE)
gene_means <- stats$row_stats["mean",variable_genes]
gene_vars <- stats$row_stats["variance", variable_genes]
mat_norm <- (mat_norm - gene_means) / gene_vars

svd <- BPCells::svds(mat_norm, k=50)
pca <- multiply_cols(svd$v, svd$d)

clusts <- knn_hnsw(pca, ef=500) |> # Find approximate nearest neighbors
  knn_to_snn_graph() |> # Convert to a SNN graph
  cluster_graph_louvain() # Perform graph-based clustering

cluster_annotations <- c("1" = "T", "2" = "CD8 T", "3" = "B", "4" = "T", "5" = "NK", "6" = "Mono", "7" = "Mono", "8" = "Mono", "9" = "T", "10" = "DC", "11" = "Mono", "12" = "DC")
cell_types <- cluster_annotations[clusts]


region <- gene_region(genes, "CD19", extend_bp = 1e5)
region_narrow <- list(chr="chr16", start=28931964-100, end=28931964+100)
read_counts <- atac_qc$nFrags[
  match(cellNames(frags), atac_qc$cellName)
]

coverage_plot <- trackplot_coverage(
  frags,
  region = region, 
  groups=cell_types,
  read_counts,
  bins=500
)

gene_plot <- trackplot_gene(transcripts, region)
scalebar_plot <- trackplot_scalebar(region)

fake_peaks <- c(28846253, 28863641, 28925327, 28950995, 28974593)
fake_loops <- tibble::tibble(
  chr = "chr16",
  start = fake_peaks[c(1,1,2,3)],
  end = fake_peaks[c(2,3,4,5)],
  score = c(4,1,3,2)
)
loop_plot <- trackplot_loop(fake_loops, region, color_by="score")


region_narrow <- list(chr="chr16", start=28931964-500, end=28931964+500)
cell_subset <- cellNames(frags)[cell_types %in% c("B", "DC")]
coverage_plot_narrow <- trackplot_coverage(
    frags |> select_cells(cell_subset), region=region_narrow, groups=cell_types[cell_types %in% c("B", "DC")], read_counts[cell_types %in% c("B", "DC")], bins=200, 
)
gene_plot_narrow <- trackplot_gene(transcripts, region_narrow)
scalebar_narrow <- trackplot_scalebar(region_narrow, font_pt=8)

expression <- collect_features(mat, "CD19")
names(expression) <- "gene"
expression_plot <- ggplot2::ggplot(expression, ggplot2::aes(cell_types, gene, fill=cell_types)) +
    ggplot2::geom_boxplot() + 
    ggplot2::guides(y="none", fill="none") + 
    ggplot2::labs(x=NULL, y="RNA") +
    ggplot2::scale_fill_manual(values=discrete_palette("stallion"), drop=FALSE) +
    BPCells:::trackplot_theme()

plot1 <- trackplot_combine(
  list(
    scalebar_plot, 
    coverage_plot, 
    gene_plot,
    loop_plot,
    coverage_plot_narrow,
    gene_plot_narrow,
    scalebar_narrow
  )
)
ggplot2::ggsave(plot=plot1, "pre_trackplot.png", width=8, height=8, units="in")

plot2 <- trackplot_combine(
  list(
    scalebar_plot, 
    coverage_plot, 
    gene_plot,
    loop_plot,
    coverage_plot_narrow,
    gene_plot_narrow,
    scalebar_narrow
  ),
  side_plot = expression_plot
)
ggplot2::ggsave(plot=plot2, "pre_trackplot_with_side.png", width=8, height=8, units="in")
Before (no sideplot)

pre_trackplot

After (no sideplot)

trackplot

Before (with sideplot)

pre_trackplot_with_side

After (with sideplot)

trackplot_with_side

After with a low sideplot (see code from comment below)

trackplot_with_low_side

(Edited to update plot previews with the later improvements)

@bnprks bnprks marked this pull request as ready for review July 31, 2024 20:17
@bnprks
Copy link
Owner Author

bnprks commented Jul 31, 2024

From offline suggestion, just added some smarter logic to decide whether to put the color legend above/below the side plot. Earlier plots are unchanged, but this new variant will look better:

Additional code
expression_plot_subset <- ggplot2::ggplot(expression[cell_types %in% c("B", "DC"),], ggplot2::aes(cell_types[cell_types %in% c("B", "DC")], gene, fill=cell_types[cell_types %in% c("B", "DC")])) +
    ggplot2::geom_boxplot() + 
    ggplot2::guides(y="none", fill="none") + 
    ggplot2::labs(x=NULL, y="RNA") +
    ggplot2::scale_fill_manual(values=discrete_palette("stallion"), drop=FALSE) +
    BPCells:::trackplot_theme()

coverage_no_sidebar <- coverage_plot
coverage_no_sidebar$trackplot$takes_sideplot <- FALSE
plot3 <- trackplot_combine(
  list(
    scalebar_plot, 
    coverage_no_sidebar, 
    gene_plot,
    loop_plot,
    coverage_plot_narrow,
    gene_plot_narrow,
    scalebar_narrow
  ),
  side_plot = expression_plot_subset
)
ggplot2::ggsave(plot=plot3, "pre_trackplot_with_low_side.png", width=8, height=8, units="in")
Plot with low side plot

pre_trackplot_with_low_side

@bnprks
Copy link
Owner Author

bnprks commented Jul 31, 2024

And it turns out we can fix some of the y-axis labels getting cut off. Patchwork puts plots later in the list on top of plots earlier in the list. So I added a shuffle to put the plots without y-axis labels first in the list

New plot with fix for label cutoffs

trackplot

# Reduce cut-off y-axis labels. Put plots with y axis labels later in the plot list, as they will take layer priority with patchwork
has_y_axis <- vapply(tracks, function(t) is(t, "ggplot") && !is.null(t$labels$y), logical(1))
tracks <- c(tracks[!has_y_axis], tracks[has_y_axis])
areas <- c(areas[!has_y_axis], areas[has_y_axis])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh that's elegant!

Copy link
Collaborator

@immanuelazn immanuelazn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with the trackplotting code a little bit and it looks really good now. Even fixed all the issues I mentioned on slack, with the cut off coverage labels and legends. Great job Ben

if (tracks[[i]]$trackplot$keep_vertical_margin) {
collapse_upper_margin[i] <- FALSE
if (i < length(tracks)) collapse_upper_margin[i+1] <- FALSE
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pragmatically speaking and unrelated, isn't it possible to have this done to group by different regions, and only collapse within region? Would probably remove the dependence on comparing last_region

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the function so that plots are always shown top-to-bottom in the order given, so I didn't want to do any re-ordering in that way. My aim with this is to do as much collapsing as is possible while to keeping the order the same

@bnprks bnprks merged commit 1ae6c15 into main Aug 1, 2024
@bnprks bnprks deleted the bp/trackplot-combine-tweaks branch August 1, 2024 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants