---
title: "Differential abundance analysis, semi-targeted approach"
subtitle: "Negative polarity"
author: "Christa Malfertheiner, Johannes Rainer"
output:
  BiocStyle::html_document:
    toc: true
    number_sections: false
    toc_float: true
bibliography: references.bib
csl: biomed-central.csl
references:
  - family: noname
    given: noname
---

```{r setup, echo = FALSE, results = "asis", warning = FALSE}
library(BiocStyle)
BiocStyle::markdown()
knitr::opts_chunk$set(echo = TRUE, message = FALSE, dev = c("png", "pdf"))
```

```{r parameters, echo = FALSE, warning = FALSE}
## Set general parameters
polarity <- "NEG" # specify "POS" or "NEG"
p.cut <- 0.05     # cut-off for significance.
m.cut <- 0.7      # cut-off for log2 fold change
set.seed(123)
## Setting golden ratio to save images
phi <- (1+sqrt(5))/2
FILE_NAME <- "semi_t_matrices_neg"
## Define paths:
IMAGE_PATH <- paste0("images/", FILE_NAME, "/")
if (dir.exists(IMAGE_PATH)) unlink(IMAGE_PATH, recursive = TRUE, force = TRUE)
dir.create(IMAGE_PATH, recursive = TRUE, showWarnings = FALSE)
RDATA_PATH <- paste0("data/RData/", FILE_NAME, "/")
dir.create(RDATA_PATH, recursive = TRUE, showWarnings = FALSE)
RESULT_PATH <- paste0("data/results/", FILE_NAME, "/")
dir.create(RESULT_PATH, recursive = TRUE, showWarnings = FALSE)
FIG_PATH <- paste0(IMAGE_PATH, "FIG/")
dir.create(FIG_PATH, showWarnings = FALSE)
```

# Introduction

In this document we perform the differential abundance analysis of the features
previously identified for the *MitYOU* project, with the aim of identifying
features that differ significantly between sample sources. This task is
performed by hypothesis testing, where we try to identify the metabolites whose
concentrations differ most between plasma, venous blood and capillary blood
samples. We follow a semi-targeted approach, where we look at concentrations of
features corresponding to a lab-internal set of standards.


# Data import

First, we load the required packages and the data after preprocessing and
normalization. The end result of these steps is a `SummarizedExperiment` that
contains aligned data in which features have been grouped (correspondence) and
that has undergone gap filling, normalization by the median, linear fitting and
per-feature between-batch normalization to remove unwanted variability.
The `SummarizedExperiment` stores the result of each normalization step as a
separate `assay`, all of which remain accessible for the rest of the analysis.

```{r load-data, echo = FALSE, warning = FALSE}
library(xcms)
library(limma)
library(pheatmap)
library(writexl)
library(SummarizedExperiment)
library(RColorBrewer)
library(MsFeatures)
library(CompMetaboTools)
library(pander)
setMSnbaseFastLoad(FALSE)
load("data/RData/vams_normalization_neg/res_neg.RData")
res_neg$sample_pair <- paste0(res_neg$source, ".", res_neg$sample)
```

It is important now to remove the `POOL` samples from the dataset, because the
analysis has to be performed only on study samples. The `POOL` samples are,
however, still required to evaluate the quality of the detected features, so
they are stored in a separate `SummarizedExperiment` object that can be
accessed when needed.
We also exclude the `RBC` samples from our analysis, as these samples showed
a noisy signal beforehand, and store them separately in their own
`SummarizedExperiment` object.

We also assign the colours as seen before.

```{r split-qc, echo = TRUE}
res_qc <- res_neg[, res_neg$source == "all"]
res_rbc <- res_neg[, res_neg$source == "RBC"]
res_neg <- res_neg[, res_neg$source != "all"]
res_neg <- res_neg[, res_neg$source != "RBC"]
res_neg$source <- factor(as.character(res_neg$source))
res_neg$sex <- factor(as.character(res_neg$sex))

col_source <- brewer.pal(6, name = "Set1")[c(6, 2, 1)]
names(col_source) <- c("plasma",        #' yellow
                       "venous",        #' blue
                       "capillary")     #' red

col_sex <- brewer.pal(4, name = "Set1") [c(1, 2, 3)]
names(col_sex) <- c("F",           # red
                    "M",           # blue
                    "POOL")        # green
## Setting golden ratio to save images
phi <- (1+sqrt(5))/2
```

Next, we merge the technical replicates.

```{r}
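#' Average the assays of a SummarizedExperiment across technical replicates
#' defined by the `colData` column `column`. If `mainAssay` is given, the
#' standard deviation across replicates of that assay is added as an
#' additional assay named `<mainAssay>_sd`.
averageSE <- function(x, column = character(), mainAssay = character()) {
    if (!column %in% colnames(colData(x)))
        stop("Column '", column, "' not found in 'colData' of 'x'")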
    f <- factor(colData(x)[, column], levels = unique(colData(x)[, column]))
    ## new colData: take the first element for each replicate.
    cd <- colData(x)[match(levels(f), f), ]
    rownames(cd) <- cd[, column]
    ## loop over the assays and average them.
    a <- lapply(assays(x), function(z) {
        z <- split.data.frame(t(z), f = f)
        z <- do.call(cbind, lapply(z, colMeans, na.rm = TRUE))
        z[is.na(z)] <- NA
        z
    })
    if (length(mainAssay)) {
        tmp <- split.data.frame(t(assay(x, mainAssay)), f = f)
        tmp <- do.call(cbind, lapply(tmp, function(y) {
            apply(y, MARGIN = 2, FUN = sd, na.rm = TRUE)
        }))
        tmp[is.na(tmp)] <- NA
        a[[paste0(mainAssay, "_sd")]] <- tmp
    }
    SummarizedExperiment(assays = a, rowData = rowData(x),
                         colData = cd, metadata = metadata(x))
}
## Average technical replicates:
res_neg <- averageSE(res_neg, column = "sample_pair",
                     mainAssay = "normalized_filled")
```
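
As a minimal sketch of what the averaging does, the toy example below applies
the same split-and-average logic to a small matrix. The feature names, sample
names and intensity values are hypothetical, and only base R is used.

```r
## Toy intensity matrix: 3 features x 4 injections (hypothetical values).
toy_x <- matrix(c(10, 12, 20, NA,
                   5,  7,  9, 11,
                  NA, NA,  4,  6),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("FT", 1:3), paste0("inj", 1:4)))
## Two samples, each injected twice.
toy_f <- factor(c("plasma.A", "plasma.A", "venous.A", "venous.A"),
                levels = c("plasma.A", "venous.A"))
## Split the transposed matrix by sample and average each feature.
toy_avg <- do.call(cbind, lapply(split.data.frame(t(toy_x), f = toy_f),
                                 colMeans, na.rm = TRUE))
## colMeans returns NaN if all replicates are missing; report NA instead.
toy_avg[is.na(toy_avg)] <- NA
toy_avg
```

A feature measured in only one replicate keeps that single value, while a
feature missing in all replicates of a sample is reported as `NA`.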

The samples used in this analysis are listed below.

```{r, echo = FALSE, results = "asis"}
tab <- colData(res_neg)[, c("source", "sex", "age")]
pandoc.table(as.data.frame(tab), style = "rmarkdown",
             caption = "Samples used in this analysis")
```

# Semi-targeted analysis

First, we restrict the analysis to a set of known compounds, whose
mass-to-charge ratio and retention time have been previously measured from the
pure standards. To start, a table of standards is loaded, then
a search in the m/z and retention time dimensions is carried out to match the
features found before to these compounds. Subsequently, EICs for all the
assigned features are plotted and visually inspected: this step is necessary to
accurately pair the detected peaks to the standards. When the list of assigned
standards is complete, we perform a differential abundance analysis on the
subset of features.

First, we load the information on the standards, with the aim of identifying
features potentially matching these and plotting the EICs for all of these.

```{r known-cmps, message = FALSE, warning = FALSE}
## Extract known compounds
library("MetaboCoreUtils")
library(Rdisop)
std_info <- read.table(
    "https://raw.githubusercontent.com/EuracBiomedicalResearch/lcms-standards/master/data/standards_dilution.txt",
    sep = "\t", header = TRUE, as.is = TRUE)
std_info <- std_info[!is.na(std_info[, "NEG"]), ]
rownames(std_info) <- 1:nrow(std_info)
std_info$mzneut = NA
std_info$mz_ion = NA
for (i in seq(nrow(std_info))) {
    if (grepl("C", std_info$formula[i])) {
        std_info$mzneut[i] <- getMolecule(
            as.character(std_info$formula[i]))$exactmass
    } else {
        std_info$mzneut[i] = as.numeric(std_info$formula[i])
    }
    ## Calculate also the m/z
    std_info$mz_ion[i] <- mass2mz(
        std_info$mzneut[i], adduct = std_info[i, "NEG"])[1, 1]
}
std_info <- std_info[!is.na(std_info$mz_ion), ]
std_info <- std_info[order(std_info$name), ]
## Manually adding an additional row for Creatinine and Creatine
to_add <- std_info[std_info$name %in% c("Creatinine", "Creatine"), ]
to_add$name <- paste0(to_add$name, "-full")
std_info <- rbind(std_info, to_add)
dr <- paste0(IMAGE_PATH, "/standards/")
dir.create(dr, recursive = TRUE, showWarnings = FALSE)
##load data_neg
load("data/RData/vams_normalization_neg/data_neg_filled.RData")
## Subset to the samples we're currently analyzing.
tmp <- filterFile(data_neg, match(res_neg$mzML_file, data_neg$mzML_file),
                  keepFeatures = TRUE)
```
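
For a singly charged `[M-H]-` ion, the conversion from neutral mass to m/z
amounts to subtracting the mass of one proton. The sketch below works this out
by hand for methionine sulfoxide, assuming the formula C5H11NO3S and rounded
monoisotopic element masses. It is meant as a sanity check only and does not
replace `mass2mz`, which handles arbitrary adducts.

```r
## Monoisotopic masses of the elements (rounded, in Da).
toy_element_mass <- c(C = 12, H = 1.0078250, N = 14.0030740,
                      O = 15.9949146, S = 31.9720710)
## Element counts for methionine sulfoxide, assuming the formula C5H11NO3S.
toy_counts <- c(C = 5, H = 11, N = 1, O = 3, S = 1)
toy_neutral <- sum(toy_element_mass[names(toy_counts)] * toy_counts)
## [M-H]-: remove one proton (about 1.0072765 Da) from the neutral molecule.
toy_mz_ion <- toy_neutral - 1.0072765
c(neutral = toy_neutral, mz_ion = toy_mz_ion)
```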

We next match features from our data set against m/z of the expected ions and
the retention times for the set of lab-internal standards, extract their ion
chromatogram and plot these.

```{r}
library(MetaboAnnotation)
## Match features' m/z and rt values against expected values for standards.
rowData(res_neg)$ft <- row.names(res_neg)
par <- MzRtParam(tolerance = 0, ppm = 20, toleranceRt = 30)
mo <- matchMz(res_neg, std_info, param = par, mzColname = c("mzmed", "mz_ion"), 
              rtColname = c("rtmed", "RT"))
## Subset to matching features.
mo <- mo[whichQuery(mo)]
mo <- pruneTarget(mo)
chrs <- featureChromatograms(tmp, features = mo$ft, expandRt = 7, filled = TRUE)
sample_colors <- col_source[tmp$source]
for (i in seq_along(mo$ft)) {
    chr <- chrs[i, ]
    pks <- chromPeaks(chr)
    fl <- mo$target_name[i]
    png(paste0(dr, fl, "-", mo$ft[i], ".png"),
        width = 10, height = 8, units = "cm", res = 300, pointsize = 6)
    plot(chr, col = "#00000040",
         peakCol = paste0(sample_colors[pks[, "column"]], 50),
         peakBg = paste0(sample_colors[pks[, "column"]], 10))
    abline(v = mo$target_RT[i])
    legend("topleft", legend = c(mo$ft[i], fl,
                                 paste0("rt: ", mo$target_RT[i]),
                                 paste0("mz: ", mo$target_mz_ion[i])))
    dev.off()
}
```
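
The matching criterion (a relative m/z tolerance of 20 ppm combined with an
absolute retention time tolerance of 30 seconds) can be sketched in base R as
shown below. This is a simplified illustration and not the `MetaboAnnotation`
implementation: the tables and values are hypothetical, and the ppm tolerance
is computed relative to the m/z of the standard, which is an assumption.

```r
## Hypothetical feature and standard tables (values are made up).
toy_features <- data.frame(ft = c("FT1", "FT2", "FT3"),
                           mzmed = c(164.0387, 225.0993, 140.0120),
                           rtmed = c(181, 250, 196))
toy_standards <- data.frame(name = c("std_A", "std_B"),
                            mz_ion = c(164.0390, 225.0990),
                            RT = c(180, 190))
toy_ppm <- 20     # relative m/z tolerance
toy_tol_rt <- 30  # absolute retention time tolerance in seconds

## Compare every feature against every standard.
toy_idx <- expand.grid(f = seq_len(nrow(toy_features)),
                       s = seq_len(nrow(toy_standards)))
toy_mz_diff <- abs(toy_features$mzmed[toy_idx$f] -
                   toy_standards$mz_ion[toy_idx$s])
toy_rt_diff <- abs(toy_features$rtmed[toy_idx$f] -
                   toy_standards$RT[toy_idx$s])
## A pair matches only if both the m/z and the rt criterion are fulfilled.
toy_hit <- toy_mz_diff <= toy_standards$mz_ion[toy_idx$s] * toy_ppm * 1e-6 &
    toy_rt_diff <= toy_tol_rt
toy_matches <- data.frame(ft = toy_features$ft[toy_idx$f],
                          name = toy_standards$name[toy_idx$s],
                          mz_diff = toy_mz_diff,
                          rt_diff = toy_rt_diff)[toy_hit, ]
toy_matches
```

In this toy case `FT2` agrees with `std_B` in m/z but elutes 60 seconds away
from it, so only `FT1` is reported as a match. Candidates that pass both
filters still need the visual EIC inspection described in this section.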

The EICs have been manually inspected and the best matching feature has been
manually assigned to the corresponding standard. Some features matched better
than others, such as xyz: in this case a single, high-intensity peak is
detected that lies well within the theoretical m/z range and very close to the
theoretical retention time.

```{r assign-feature-metabolite, echo = FALSE, warning = FALSE, message = FALSE}
## This is the tricky manual thing:
## - Go through all plots for all standards and if there is one feature
##   that clearly matches (i.e. retention time close to the expected retention
##   time and a single peak present in the wider rt range) assign it.
## The code below helps creating the assignment:
## tmp <- cbind(mo$ft, mo$target_name)
## rownames(tmp)
## tmp <- tmp[order(tmp[, 2]), ]
## tmp[, 2, drop = FALSE]
to_keep <- c(
    FT00485 = "1-Methylhistidine",
    FT00948 = "1,5-anhydro D-glucitol",
    FT00066 = "2-Hydroxybutyric acid",
    FT00060 = "2-Ketobutyric acid",
    FT00066 = "3-Hydroxybutyric Acid",
    FT00485 = "3-Methylhistidine",
    FT01695 = "Acetyl-Glucosamine",
    FT00723 = "Acetylmethionine",
    FT00176 = "Acetylalanine",
    FT02496 = "Acetylneuraminic Acid",
    FT02569 = "Adenosine",
    FT00377 = "Alanine",
    FT00383 = "Allantoin",
    FT00105 = "Alpha-ketoisovaleric acid",
    FT04311 = "alpha-Lactose",
    FT10805 = "AMP",
    FT00537 = "Arginine",
    FT00188 = "Asparagine",
    FT00603 = "Aspirin",
    FT01143 = "Carnosine",
    FT04651 = "CDP",
    FT07631 = "CDP-choline",
    FT05599 = "CDP-ethanolamine",
    FT00554 = "Citrulline",
    FT01310 = "Cystine",
    FT03173 = "Dihydroxyacetone phosphate",
    FT00061 = "Dimethylglycine",
    FT00604 = "Fructose",
    FT00785 = "Gluconic Acid",
    FT01139 = "Glucose",
    FT00287 = "Glutamine",
    FT00186 = "Glutaric acid",              # new
    FT09252 = "Glutathione Oxidized",
    FT09252 = "Glutathione Reduced",
    FT00071 = "Glyceric Acid",
    FT02155 = "Glycero-phosphocholine",
    FT00362 = "Histidine",
    FT00628 = "homovanillic acid",
    FT00177 = "Hydroxyproline",
    FT00236 = "Hypoxanthine",
    FT00905 = "Indolelactic acid",
    FT01712 = "Inosine",
    FT00737 = "Isocitric Acid",
    FT00181 = "Isoleucine",
    FT00172 = "Ketoleucine",
    FT00195 = "L-Aspartic Acid",
    FT00132 = "L-Cysteine",
    FT00294 = "L-Glutamic Acid",
    FT00181 = "Leucine",
    FT00289 = "Lysine",
    FT01139 = "Mannose",
    FT00313 = "Methionine",
    FT00439 = "Methioninesulfoxide",
    FT01141 = "Myo-Inositol",
    FT00176 = "N-Acetyl-beta-alanine",
    FT00535 = "N-Acetylornithine",
    FT10215 = "NAD",
    FT00191 = "Ornithine",
    FT00334 = "p-Hydroxyphenylacetic acid",
    FT00247 = "PABA",
    FT01063 = "Pantothenic Acid",
    FT00442 = "Phenylalanine",
    FT00433 = "Phenylpyruvic acid",
    FT00474 = "Phosphoenolpyruvic Acid",
    FT01944 = "Phosphorylethanolamine",
    FT00028 = "Pyruvic Acid",
    FT00464 = "Quinolinic acid",            # new
    FT00323 = "Ribose",
    FT04217 = "SAH",
    FT00253 = "Salicylic acid",
    FT08606 = "Sedoheptulose-7-phosphate",  # new
    FT00067 = "Serine",
    FT03307 = "Sphingosine",
    FT04106 = "Sphingosine-1-phosphate",
    FT00534 = "Suberic Acid",
    FT00118 = "Succinic Acid",
    FT00384 = "Succinylacetone",
    FT04312 = "Sucrose",
    FT00145 = "Taurine",
    FT00236 = "Threonic Acid",
    FT00123 = "Threonine",
    FT00891 = "Tryptophan",
    FT02088 = "Uridine",
    FT00333 = "Xanthine"
)
## Missing from previous analysis:
## FT00612 = "Tyrosine"
## FT00629 = "Galactitol" 
mo <- filterMatches(mo, queryValue = names(to_keep),
                    targetValue = to_keep,
                    queryColname = "ft",
                    targetColname = "target_name",
                    keep = TRUE)
mo <- mo[whichQuery(mo)]
## Handling duplicates.
md <- as.data.frame(matchedData(mo, c("ft", "target_name", "target_HMDB.code",
                                      "target_formula", "target_RT",
                                      "score", "score_rt")))
md <- split(md, md$ft)
md <- do.call(rbind, lapply(md, function(z) {
    tmp <- data.frame(ft = z$ft[1L])
    tmp$name <- paste0(unique(z$target_name), collapse = ";")
    tmp$HMDB <- paste0(unique(z$target_HMDB.code), collapse = ";")
    tmp$target_formula <- paste0(unique(z$target_formula), collapse = ";")
    tmp$diff_mz <- paste0(unique(z$score), collapse = ";")
    tmp$diff_rt <- paste0(unique(z$score_rt), collapse = ";")
    tmp
}))
```
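
Some features are assigned to more than one standard (isomers such as Leucine
and Isoleucine share a single feature). The sketch below shows, on a
hypothetical table, how such duplicates can be collapsed into one row per
feature with the names joined by `;`. It uses `aggregate` from base R rather
than the `split`/`lapply` approach of the analysis code.

```r
## Hypothetical assignment table: FT1 matches two isomeric standards.
toy_md <- data.frame(ft = c("FT1", "FT1", "FT2"),
                     target_name = c("Leucine", "Isoleucine", "Alanine"))
## One row per feature; all assigned names are joined by ";".
toy_collapsed <- aggregate(
    target_name ~ ft, data = toy_md,
    FUN = function(z) paste(unique(z), collapse = ";"))
toy_collapsed
```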

A total of `r sum(nrow(md))` standards have been identified. The features
identified and the corresponding metabolite are summarized in this table:

```{r result-table-ft-std, echo = FALSE, results = "asis"}
## Write result table
tab <- md[, c("name"), drop = FALSE]
pandoc.table(as.data.frame(tab), style = "rmarkdown",
             caption = "Features assigned to known compounds")
```

Next, only the features assigned to the standards are taken into consideration
and are subsetted in the `std_res` object.

```{r std-subset, echo = FALSE}
std_res <- query(mo)
rowData(std_res) <- cbind(rowData(std_res),
                          md[rownames(std_res), ])
```

The subsetting reduced the number of features to `r length(std_res)`. 

A PCA is then performed on the subset to verify whether the overall picture has
changed and whether similarities among the samples are visible.

```{r standards-pca-all, echo = FALSE}
pc <- prcomp(t(log2(assay(std_res, "normalized_filled_imputed"))),
                 center = TRUE, scale. = FALSE)
```

```{r standards-pca-plot, fig.path = IMAGE_PATH, fig.cap = "PCA of the samples based on intensities of known compounds.", fig.width = 7 * phi, fig.height = 7, echo = FALSE}
par(mfrow = c(1, 2))
plot_pca(pc, col = paste0(col_source[as.character(std_res$source)], 90),
         pc_x = 1, pc_y = 2, labels = std_res$differentiation)
plot_pca(pc, col = paste0(col_source[as.character(std_res$source)], 90),
         pc_x = 3, pc_y = 4)
legend("topleft", col = col_source, legend = names(col_source),
       title = "phenotype", pch = 16, ncol = 2)
```

On PC1 we see a clear separation between the three matrices, which is also
seen on PC2. On PC3, some of the capillary samples do not cluster with the
rest of the samples, but it needs to be considered that the variance explained
by this component is rather low.
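
How much variance each component explains can be read from the `sdev` element
returned by `prcomp`. The sketch below illustrates this on simulated
intensities (hypothetical data, 10 features in 6 samples), using the same
`log2` transformation and centering as in the analysis.

```r
set.seed(123)
## Simulated intensities: 10 features (rows) x 6 samples (columns).
toy_int <- matrix(2^rnorm(60, mean = 10, sd = 1), nrow = 10,
                  dimnames = list(paste0("FT", 1:10), paste0("s", 1:6)))
## PCA on samples: samples have to be in rows, hence the transpose.
toy_pc <- prcomp(t(log2(toy_int)), center = TRUE, scale. = FALSE)
## Proportion of the total variance explained by each component.
toy_var <- toy_pc$sdev^2 / sum(toy_pc$sdev^2)
round(100 * toy_var, 1)
```

A separation seen only on a component with a small share of the total variance
should be interpreted with corresponding caution.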

## Heatmap 

To visualize the intensity of all detected standards across the three matrices,
we created a heatmap of the known compounds we found in our samples.

```{r heatmap-source-semitargeted, fig.width = 15, fig.height = 13, fig.cap = "Heatmap of known compounds. Note that for better visibility, the color range has been restricted to -5 to 5, thus, differences larger than these values are assigned the extreme colors.", echo = FALSE, fig.path = IMAGE_PATH}
## Create heatmap
tmp <- log2(assay(
    std_res, "normalized_filled_imputed")[rownames(std_res), , drop = FALSE])
tmp <- tmp - rowMeans(tmp, na.rm = TRUE)
rownames(tmp) <- rowData(std_res)$name
colnames(tmp) <- seq_len(ncol(tmp))
ann <- data.frame(matrix = colData(std_res)[, c("source")])
rownames(ann) <- colnames(tmp)
pm <- pheatmap(tmp, annotation_col = ann, labels_col = colnames(tmp),
               breaks = seq(-5, 5, length.out = 101),
               annotation_color = list(matrix = col_source),
               show_colnames = FALSE)
```

```{r, echo = FALSE}
pheatmap(tmp, annotation_col = ann, labels_col = colnames(tmp),
         breaks = seq(-5, 5, length.out = 101),
         annotation_color = list(matrix = col_source),
         show_colnames = FALSE,
         filename = paste0(FIG_PATH, "FIG_3_NEG_heatmap_semitargeted.png"),
         width = 10, height = 7.5)
```
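
The values shown in the heatmap are `log2` intensities centered per feature, so
each cell represents a `log2` difference to the feature's mean across samples.
The sketch below reproduces the centering on hypothetical values and restricts
the result to the displayed range of -5 to 5. The explicit clamping with
`pmin` and `pmax` is only an illustration of the effect of the restricted
color range and is not part of the analysis code.

```r
## Hypothetical intensities: 2 features x 3 samples.
toy_int <- matrix(c(100, 200,  400,
                      8,   8, 8192),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("FT1", "FT2"), paste0("s", 1:3)))
toy_log <- log2(toy_int)
## Center each feature (row) on its mean log2 intensity.
toy_centered <- toy_log - rowMeans(toy_log, na.rm = TRUE)
## 101 break points define 100 color bins between -5 and 5.
toy_breaks <- seq(-5, 5, length.out = 101)
## Values outside the range are mapped to the extreme values.
toy_clamped <- pmin(pmax(toy_centered, -5), 5)
toy_clamped
```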

In the heatmap we see two main clusters, one containing all plasma samples and
the other containing all venous and capillary blood samples, which indicates
a higher similarity between these two matrices in the intensities of the
lab-internal standards. The second cluster is then split into two subclusters,
each containing all samples from one matrix.
FT01310, **Cystine**, shows a high intensity in plasma samples, but has a low
intensity in capillary and venous samples. This was also seen in positive
polarity.
Two features, FT00377 (**Alanine**) and FT04651 (**CDP**), show a
high intensity in plasma and venous samples, but seem to be absent in capillary
blood.
There are 8 metabolites that have a high intensity in venous and capillary
samples, but very low intensities in plasma samples. Another set of 7
metabolites has higher abundances in capillary samples and lower abundances in
the other matrices.


## Present and absent features

We then wanted to check whether there are compounds that are present in only
some of the three matrices. For a metabolite to be *present* we require that a
signal was detected by the MS instrument for that metabolite in a large
fraction of samples of a certain sample matrix. We thus call a metabolite
present if a chromatographic peak was detected in 2/3 of samples per matrix.
Below, we therefore use the `raw` data of our three sample sources (which only
contains intensity values for detected chromatographic peaks) and determine
which compounds are present in more than 2/3 of the samples.

```{r present-absent-features}
plas <- assay(std_res, "raw")[, std_res$source == "plasma"]
plas_present <- moreAreValidThan(plas, prop = 1/3)
ven <- assay(std_res, "raw")[, std_res$source == "venous"]
ven_present <- moreAreValidThan(ven, prop = 1/3)
cap <- assay(std_res, "raw")[, std_res$source == "capillary"]
cap_present <- moreAreValidThan(cap, prop = 1/3)
```
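
The presence call can be sketched in base R as the fraction of non-missing
values per feature compared against a threshold. The example assumes that
`moreAreValidThan` returns `TRUE` for features in which the proportion of
non-`NA` values is larger than `prop`; the matrix is hypothetical and `prop`
is set to the value passed in the code chunk.

```r
## Hypothetical raw intensities: NA means no chromatographic peak detected.
toy_raw <- matrix(c( 1, NA,  3,  4, NA,  6,
                    NA, NA, NA, NA,  5, NA,
                     1,  2,  3,  4,  5,  6),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("FT", 1:3), paste0("s", 1:6)))
toy_prop <- 1/3
## Fraction of samples with a detected peak, per feature.
toy_detected <- rowMeans(!is.na(toy_raw))
toy_present <- toy_detected > toy_prop
toy_present
```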

We then create an `UpSetR` plot to see the overlap of present features:

```{r present-feature-comparison-upset, fig.path = IMAGE_PATH, fig.cap = "Overlap of present features across the 3 sample matrices.", echo = FALSE, eval = TRUE}
library(UpSetR)
listInput <- list(capillary = which(cap_present, useNames = TRUE),
                  venous = which(ven_present, useNames = TRUE),
                  plasma = which(plas_present, useNames = TRUE))
upset(fromList(listInput), order.by = "freq")
```

```{r, echo = FALSE}
png(paste0(FIG_PATH, "FIG_4_NEG_upset_semitargeted.png"), width = 7, height = 7,
    res = 600, pointsize = 4, units = "cm")
par(mfrow = c(1, 1), mar = c(4.2, 4.5, 1.5, 1), cex.lab = 1.5)
upset(fromList(listInput), keep.order = FALSE,
      sets.bar.color = col_source[c("capillary", "venous", "plasma")],
      order.by = "freq")
dev.off()

```


The majority of known compounds, 48 features, are present in all three sample
sources, whereas 17 features are present only in capillary and venous blood
samples and 5 features are present only in capillary blood samples. Three
features are present in plasma and venous blood samples but absent in capillary
blood samples, two features are present only in plasma samples and, lastly, one
feature is shared between plasma and capillary blood samples but absent in
venous blood.
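
The numbers reported by the UpSet plot are counts of exclusive presence
patterns. As a minimal sketch on hypothetical presence calls, the same counts
can be derived in base R by pasting together, for each feature, the names of
the matrices in which it is present.

```r
## Hypothetical presence calls for four features.
toy_cap  <- c(FT1 = TRUE, FT2 = TRUE,  FT3 = TRUE,  FT4 = FALSE)
toy_ven  <- c(FT1 = TRUE, FT2 = TRUE,  FT3 = FALSE, FT4 = FALSE)
toy_plas <- c(FT1 = TRUE, FT2 = FALSE, FT3 = FALSE, FT4 = TRUE)
toy_calls <- cbind(capillary = toy_cap, venous = toy_ven, plasma = toy_plas)
## One presence pattern per feature, e.g. "capillary&venous".
toy_combo <- apply(toy_calls, 1, function(z) {
    paste(colnames(toy_calls)[z], collapse = "&")
})
## Number of features per exclusive pattern.
table(toy_combo)
```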

We want to identify the features present in only some of the matrices, starting
with features that are only present in capillary blood samples:

```{r present-features_cap, echo = FALSE, results = "asis"}
#identify features present in only one or two matrices
only_cap <- rowData(std_res)[cap_present & !ven_present & !plas_present, ]
only_plas <- rowData(std_res)[plas_present & !ven_present & !cap_present, ]
only_ven <- rowData(std_res)[!plas_present & ven_present & !cap_present, ]
only_capven <- rowData(std_res)[cap_present & ven_present & !plas_present, ]
only_venplas <- rowData(std_res)[ven_present & plas_present & !cap_present, ]
only_capplas <- rowData(std_res)[!ven_present & plas_present & cap_present, ]
## Calculate average concentration per matrix
aname <- "normalized_filled_imputed"
std_res_avg = cbind(
    capillary = rowMeans(assay(std_res, aname)[, std_res$source == "capillary"],
                         na.rm = TRUE),
    venous = rowMeans(assay(std_res, aname)[, std_res$source == "venous"],
                         na.rm = TRUE),
    plasma = rowMeans(assay(std_res, aname)[, std_res$source == "plasma"],
                         na.rm = TRUE)
    )
## Write result table
tab <- data.frame(name = only_cap[, c("name")],
                  std_res_avg[rownames(only_cap), ])
pandoc.table(as.data.frame(tab), style = "rmarkdown",
             caption = "Features only present in capillary blood samples")
```
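
The average abundance per matrix reported in the table is a row-wise mean over
the samples of each source. A compact way to express this, shown here on
hypothetical values, is to loop over the source names with `sapply`.

```r
## Hypothetical abundances: 2 features x 6 samples, two samples per matrix.
toy_x <- matrix(c(1, 3, 10, 30, 100, 300,
                  2, 4, 20, 40, 200,  NA),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("FT1", "FT2"), paste0("s", 1:6)))
toy_source <- c("capillary", "capillary", "venous", "venous",
                "plasma", "plasma")
## Mean abundance of each feature within each sample matrix.
toy_avg <- sapply(c("capillary", "venous", "plasma"), function(s) {
    rowMeans(toy_x[, toy_source == s, drop = FALSE], na.rm = TRUE)
})
toy_avg
```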

Three of those features, **Methioninesulfoxide**, **Phosphorylethanolamine**
and **Carnosine**, were also found to be present only in capillary blood samples
in the analysis of positive polarity.

So, we extract the (full) MS1 spectrum at the features' (peaks') apex position
for Methioninesulfoxide:

```{r extraction-MS1-spectrum-Methioninesulfoxide, echo = FALSE}
## Define the ID of the feature.
ft <- rownames(std_res)[rowData(std_res)$name == "Methioninesulfoxide"]
ft_mz <- rowData(std_res)[ft, "mzmed"]

## Restrict to files for the present analysis
data <- filterFile(data_neg, match(res_neg$mzML_file, data_neg$mzML_file),
                   keepFeatures = TRUE)

library(Spectra)
library(MsCoreUtils)

## Extract for each sample the MS1 spectrum with the retention time closest
## to the apex position of the chromatographic peak
ft_sps <- featureSpectra(data, msLevel = 1L, return.type = "Spectra",
                         features = ft, method = "closest_rt",
                         skipFilled = FALSE)

## Extract also the EIC
ft_chr <- featureChromatograms(data, expandRt = 5, features = ft)

## Get a spectrum for a plasma sample
sp_plas <- ft_sps[pData(data)$source[ft_sps$fromFile] == "plasma"][1L]

## Get a spectrum for a capillary sample
sp_cap <- ft_sps[pData(data)$source[ft_sps$fromFile] == "capillary"][1L]

sp_ven <- ft_sps[pData(data)$source[ft_sps$fromFile] == "venous"][1L]


## Plot labeling peaks with intensity > 2000 with their m/z
label_fun <- function(x) {
    ints <- unlist(intensity(x))
    mzs <- format(unlist(mz(x)), digits = 4)
    mzs[ints < 2000] <- ""
    mzs
}

```
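
Two ideas are used in this extraction step: for each sample, the MS1 scan
closest in retention time to the peak apex is selected, and in the spectrum
plots only peaks with an intensity above 2000 are labeled with their m/z. Both
are sketched below in base R on hypothetical values; the actual selection is
done by `featureSpectra` with `method = "closest_rt"`.

```r
## Hypothetical retention times (seconds) of the MS1 scans in one file.
toy_scan_rt <- c(178.2, 179.1, 180.0, 180.9, 181.8)
toy_apex_rt <- 180.3
## Index of the scan closest to the apex of the chromatographic peak.
toy_closest <- which.min(abs(toy_scan_rt - toy_apex_rt))
toy_closest

## Hypothetical peaks of the selected spectrum.
toy_peak_mz <- c(89.0244, 164.0387, 165.0419)
toy_peak_int <- c(1500, 25000, 1800)
## Label only peaks with an intensity of at least 2000.
toy_labels <- format(toy_peak_mz, digits = 4)
toy_labels[toy_peak_int < 2000] <- ""
toy_labels
```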

```{r MS1-EIC-Methioninesulfoxide, fig.path = IMAGE_PATH, fig.cap = "EIC of Methioninesulfoxide.", echo = FALSE, eval = TRUE}
plot_eic <- function(x) {
    col_sample <- col_source[x$source]
    col_peak <- col_sample[chromPeaks(x)[, "sample"]]
    plot(x, col = paste0(col_sample, 80),
         peakBg = paste0(col_peak, 10), peakCol = paste0(col_peak, 40))
    legend("topright", col = col_source[c("plasma", "capillary", "venous")],
           legend = c("plasma", "capillary", "venous"), pch = 15)
}
plot_eic(ft_chr)
```

```{r MS1-spectrum-Methioninesulfoxide, fig.path = IMAGE_PATH, fig.cap = "MS1 spectra of Methioninesulfoxide in a plasma sample (left), in a capillary blood sample (mid) and in a venous blood sample (right).", echo = FALSE, eval = TRUE}
plot_ms1_single_sample <- function() {
    par(mfrow = c(2, 3), mar = c(3.5, 3.5, 1.5, 0.5))
    plotSpectra(sp_plas, labels = label_fun, labelPos = 4, labelOffset = 0.2,
                labelSrt = 30)
    grid()
    abline(v = ft_mz, lty = 3, col = "#ff000060", lwd = 2)
    legend("topright", "plasma")
    plotSpectra(sp_cap, labels = label_fun, labelPos = 4, labelOffset = 0.2,
                labelSrt = 30)
    grid()
    abline(v = ft_mz, lty = 3, col = "#ff000060", lwd = 2)
    legend("topright", "capillary")
    plotSpectra(sp_ven, labels = label_fun, labelPos = 4, labelOffset = 0.2,
                labelSrt = 30)
    grid()
    abline(v = ft_mz, lty = 3, col = "#ff000060", lwd = 2)
    legend("topright", "venous")
    
    ## Focus on peak
    plotSpectra(sp_plas, labels = label_fun, labelPos = 4, labelOffset = 0.2,
                labelSrt = 30, xlim = ft_mz + c(-50, 50))
    grid()
    abline(v = ft_mz, lty = 3, col = "#ff000060", lwd = 2)
    legend("topright", "plasma")
    plotSpectra(sp_cap, labels = label_fun, labelPos = 4, labelOffset = 0.2,
                labelSrt = 30, xlim = ft_mz + c(-50, 50))
    grid()
    abline(v = ft_mz, lty = 3, col = "#ff000060", lwd = 2)
    legend("topright", "capillary")
    plotSpectra(sp_ven, labels = label_fun, labelPos = 4, labelOffset = 0.2,
                labelSrt = 30, xlim = ft_mz + c(-50, 50))
    grid()
    abline(v = ft_mz, lty = 3, col = "#ff000060", lwd = 2)
    legend("topright", "venous")
}
plot_ms1_single_sample()

```

At the retention time around 180, where we expect Methioninesulfoxide, peaks
can be observed only in the middle spectrum, which derives from a capillary
blood sample.

In addition to plotting a representative spectrum for each sample matrix we
aggregate the MS1 spectra per source matrix and compare those.

```{r MS1-spectrum-Methioninesulfoxide-aggregated, fig.path = IMAGE_PATH, fig.cap = "Aggregated MS1 spectra of Methioninesulfoxide in the 3 sample matrices.", echo = FALSE, eval = TRUE}
plot_aggregated_spectra <- function(x, target_mz, mzdiff = 3) {
    x$source <- droplevels(
        factor(pData(data)$source[x$fromFile],
               levels = c("plasma", "capillary", "venous")))
    
    ## Aggregate spectra keeping only peaks present in 1/3 of samples
    ft_sps_agg <- combineSpectra(x, f = x$source, p = x$source,
                                 peaks = "intersect", minProp = 1/3, ppm = 10,
                                 intensityFun = median, mzFun = median)
    par(mfrow = c(2, length(ft_sps_agg)), mar = c(3.4, 3.5, 1.5, 0.5))
    for (i in seq_along(ft_sps_agg)) {
        plotSpectra(ft_sps_agg[i], labels = label_fun, labelPos = 4,
                    labelOffset = 0.2, labelSrt = 30)
        grid()
        abline(v = target_mz, lty = 3, col = "#ff000060", lwd = 2)
        legend("topright", legend = ft_sps_agg$source[i])
    }
    ft_sps_agg <- filterMzRange(ft_sps_agg, mz = target_mz + c(-mzdiff, mzdiff))
    for (i in seq_along(ft_sps_agg)) {
        if (lengths(ft_sps_agg[i]) > 1)
            plotSpectra(ft_sps_agg[i], labels = label_fun, labelPos = 4,
                        labelOffset = 0.2, labelSrt = 30)
        else plotSpectra(ft_sps_agg[i])
        grid()
        abline(v = target_mz, lty = 3, col = "#ff000060", lwd = 2)
        legend("topright", legend = ft_sps_agg$source[i])
    }
}
plot_aggregated_spectra(ft_sps, target_mz = ft_mz)
```
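
The aggregation groups mass peaks with similar m/z across the spectra of one
matrix, keeps only groups found in a minimum proportion of spectra and reports
the median m/z and intensity per group. The sketch below illustrates this idea
in base R. It is a simplified stand-in and not the `combineSpectra`
implementation: the spectra are hypothetical, the grouping rule (a new group
starts when the gap to the previous m/z exceeds the ppm tolerance) is an
assumption, and a threshold of 0.5 is used instead of 1/3 to keep the toy case
unambiguous.

```r
## Three hypothetical spectra from the same sample matrix.
toy_spectra <- list(
    data.frame(mz = c(164.0386, 170.0500), intensity = c(5000, 300)),
    data.frame(mz = c(164.0388, 190.1000), intensity = c(7000, 250)),
    data.frame(mz = 164.0387, intensity = 6000))
## Stack all peaks and remember the spectrum they came from.
toy_peaks <- do.call(rbind, Map(cbind, spectrum = seq_along(toy_spectra),
                                toy_spectra))
toy_peaks <- toy_peaks[order(toy_peaks$mz), ]
## Start a new peak group whenever the m/z gap exceeds 10 ppm.
toy_ppm <- 10
toy_gap <- c(Inf, diff(toy_peaks$mz))
toy_peaks$group <- cumsum(toy_gap > toy_peaks$mz * toy_ppm * 1e-6)
## Keep groups present in at least half of the spectra.
toy_min_prop <- 0.5
toy_n <- tapply(toy_peaks$spectrum, toy_peaks$group,
                function(z) length(unique(z)))
toy_keep <- as.integer(names(toy_n)[toy_n / length(toy_spectra) >= toy_min_prop])
toy_kept <- toy_peaks[toy_peaks$group %in% toy_keep, ]
## Consensus peak: median m/z and median intensity per group.
toy_consensus <- aggregate(cbind(mz, intensity) ~ group, data = toy_kept,
                           FUN = median)
toy_consensus
```

Peaks seen in a single spectrum are dropped, which removes sporadic noise
signals from the aggregated spectrum.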

Again, a peak with a signal well above the noise level was identified only in
capillary samples.

Then, we extract the (full) MS1 spectrum at the features' (peaks') apex position
for Carnosine:

```{r extraction-MS1-spectrum-Carnosine, echo = FALSE}
## Define the ID of the feature.
ft <- rownames(std_res)[rowData(std_res)$name == "Carnosine"]
ft_mz <- rowData(std_res)[ft, "mzmed"]

## Extract for each sample the MS1 spectrum with the retention time closest
## to the apex position of the chromatographic peak
ft_sps <- featureSpectra(data, msLevel = 1L, return.type = "Spectra",
                         features = ft, method = "closest_rt",
                         skipFilled = FALSE)

## Extract also the EIC
ft_chr <- featureChromatograms(data, expandRt = 5, features = ft)

## Get a spectrum for a plasma sample
sp_plas <- ft_sps[pData(data)$source[ft_sps$fromFile] == "plasma"][1L]

## Get a spectrum for a capillary sample
sp_cap <- ft_sps[pData(data)$source[ft_sps$fromFile] == "capillary"][1L]

sp_ven <- ft_sps[pData(data)$source[ft_sps$fromFile] == "venous"][1L]
```

```{r MS1-EIC-Carnosine, fig.path = IMAGE_PATH, fig.cap = "EIC of Carnosine.", echo = FALSE, eval = TRUE}
plot_eic(ft_chr)
```

Chromatographic peaks were detected mostly in capillary samples and also in some
venous samples.

```{r MS1-spectrum-Carnosine, fig.path = IMAGE_PATH, fig.cap = "MS1 spectra of Carnosine in a plasma sample (left), in a capillary blood sample (mid) and in a venous blood sample (right).", echo = FALSE, eval = TRUE}
plot_ms1_single_sample()
```

For Carnosine, we see a low-intensity peak in the left spectrum, derived from
a plasma sample, and a high-intensity peak in the middle spectrum, which was
obtained from a capillary blood sample. In the venous blood sample, we can't
observe a peak at rt 190, but right next to it a peak with very high
intensity was detected.

In addition to plotting a representative spectrum for each sample matrix we
aggregate the MS1 spectra per source matrix and compare those.

```{r MS1-spectrum-Carnosine-aggregated, fig.path = IMAGE_PATH, fig.cap = "Aggregated MS1 spectra of Carnosine in the 3 sample matrices.", echo = FALSE, eval = TRUE}
plot_aggregated_spectra(ft_sps, target_mz = ft_mz, mzdiff = 1)
```

The Carnosine peak lies directly to the right of a very close peak with a higher
intensity.

Then, we extract the (full) MS1 spectrum at the features' (peaks') apex position
for Phosphorylethanolamine:

```{r extraction-MS1-spectrum-Phosphorylethanolamine, echo = FALSE}
## Define the ID of the feature.
ft <- rownames(std_res)[rowData(std_res)$name == "Phosphorylethanolamine"]
ft_mz <- rowData(std_res)[ft, "mzmed"]

## Extract for each sample the MS1 spectrum with the retention time closest
## to the apex position of the chromatographic peak
ft_sps <- featureSpectra(data, msLevel = 1L, return.type = "Spectra",
                         features = ft, method = "closest_rt",
                         skipFilled = FALSE)

## Extract also the EIC
ft_chr <- featureChromatograms(data, expandRt = 5, features = ft)

## Get a spectrum for a plasma sample
sp_plas <- ft_sps[pData(data)$source[ft_sps$fromFile] == "plasma"][1L]

## Get a spectrum for a capillary sample
sp_cap <- ft_sps[pData(data)$source[ft_sps$fromFile] == "capillary"][1L]

sp_ven <- ft_sps[pData(data)$source[ft_sps$fromFile] == "venous"][1L]

```

```{r MS1-EIC-Phosphorylethanolamine, fig.path = IMAGE_PATH, fig.cap = "EIC of Phosphorylethanolamine.", echo = FALSE, eval = TRUE}
plot_eic(ft_chr)
```


```{r MS1-spectrum-Phosphorylethanolamine, fig.path = IMAGE_PATH, fig.cap = "MS1 spectra of Phosphorylethanolamine in a plasma sample (left), in a capillary blood sample (mid) and in a venous blood sample (right).", echo = FALSE, eval = TRUE}
plot_ms1_single_sample()
```

The MS1 spectra for Phosphorylethanolamine in plasma (left) and venous blood
(right) samples show no detected peak at an rt around 195, but very high
intensity peaks right next to it. In the spectrum obtained from the capillary
blood sample (middle) a peak was detected for Phosphorylethanolamine, and a
high-intensity peak is also observed, as in the other two sample matrices.

In addition to plotting a representative spectrum for each sample matrix we
aggregate the MS1 spectra per source matrix and compare those.

```{r MS1-spectrum-Phosphorylethanolamine-aggregated, fig.path = IMAGE_PATH, fig.cap = "Aggregated MS1 spectra of  Phosphorylethanolamine in the 3 sample matrices.", echo = FALSE, eval = TRUE}
plot_aggregated_spectra(ft_sps, target_mz = ft_mz, mzdiff = 2)
```

A peak is present in all 3 matrices, but the signal in capillary samples is
about 5 times higher.
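
The helper `plot_aggregated_spectra` is defined elsewhere and its exact
aggregation is not shown in this section. As a rough illustration of what
aggregating spectra per sample matrix can mean, the base R sketch below bins
the m/z values and averages the intensities per bin within each matrix. The
function name `aggregate_by_bin`, the bin width and all values are assumptions
made for this example only and are not taken from the analysis.

```r
## Toy peak list: one row per peak, pooled from several spectra
## (hypothetical values, not from the data set).
peaks <- data.frame(
    mz = c(140.01, 140.02, 141.01, 140.01, 140.03, 141.02),
    intensity = c(100, 120, 10, 500, 540, 60),
    source = c("plasma", "plasma", "plasma",
               "capillary", "capillary", "capillary"),
    stringsAsFactors = FALSE)

## Hypothetical helper: average the intensity per m/z bin within each matrix.
aggregate_by_bin <- function(x, bin_width = 0.1) {
    x$mz_bin <- round(x$mz / bin_width) * bin_width
    aggregate(intensity ~ mz_bin + source, data = x, FUN = mean)
}

agg <- aggregate_by_bin(peaks)
agg
```

Comparing the rows of `agg` that share an m/z bin then gives a per-matrix
view of the signal around the target m/z.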

Then, we extract the (full) MS1 spectrum at the features' (peaks') apex 
position for Glycero-phosphocholine:

```{r extraction-MS1-spectrum-Glycero-phosphocholine, echo = FALSE}
## Define the ID of the feature.
ft <- rownames(std_res)[rowData(std_res)$name == "Glycero-phosphocholine"]
ft_mz <- rowData(std_res)[ft, "mzmed"]

## Extract for each sample the MS1 spectrum with the retention time closest
## to the apex position of the chromatographic peak
ft_sps <- featureSpectra(data, msLevel = 1L, return.type = "Spectra",
                         features = ft, method = "closest_rt",
                         skipFilled = FALSE)

## Extract also the EIC
ft_chr <- featureChromatograms(data, expandRt = 5, features = ft)

```

```{r MS1-EIC-Glycero-phosphocholine, fig.path = IMAGE_PATH, fig.cap = "EIC of Glycero-phosphocholine.", echo = FALSE, eval = TRUE}
plot_eic(ft_chr)
```

Signal was detected only in capillary samples.

```{r MS1-spectrum-Glycero-phosphocholine-aggregated, fig.path = IMAGE_PATH, fig.cap = "Aggregated MS1 spectra of  Glycero-phosphocholine in the 3 sample matrices.", echo = FALSE, eval = TRUE}
plot_aggregated_spectra(ft_sps, target_mz = ft_mz, mzdiff = 2)
```

In addition we extract also the MS1 spectrum closest to the average retention
time of this feature from all files (including plasma and venous samples).

```{r MS1-spectrum-Glycero-phosphocholine-aggregated-2, fig.path = IMAGE_PATH, fig.cap = "Aggregated MS1 spectra of  Glycero-phosphocholine in the 3 sample matrices.", echo = FALSE, eval = TRUE}
sps <- Spectra(fileNames(data))
diff_rt <- abs(rtime(sps) - median(rtime(ft_sps)))
diff_rt <- data.frame(index = seq_along(diff_rt), diff = diff_rt,
                      fromFile = match(sps$dataOrigin, fileNames(data)))
res <- vapply(split(diff_rt, diff_rt$fromFile),
              function(x) {
                  x$index[which.min(x$diff)]
              }, integer(1))
sps_all <- sps[res]
sps_all$source <- data$source
sps_all$fromFile <- seq_along(sps_all)
plot_aggregated_spectra(sps_all, target_mz = ft_mz, mzdiff = 1)

```

It looks like the peak in plasma and venous samples is slightly off.
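
The selection of one spectrum per file used in the chunk above can be
illustrated on a toy example. The sketch below uses only base R; the scan
table, its retention times and the target value are made up for illustration.

```r
## Hypothetical retention times of MS1 scans from two files.
scans <- data.frame(
    index = 1:6,
    fromFile = c(1, 1, 1, 2, 2, 2),
    rtime = c(190.2, 194.8, 199.5, 191.0, 196.1, 200.9))
target_rt <- 195

## Absolute distance of each scan to the target retention time.
scans$diff <- abs(scans$rtime - target_rt)

## For each file, keep the index of the scan closest to the target.
closest <- vapply(split(scans, scans$fromFile),
                  function(x) x$index[which.min(x$diff)], integer(1))
closest
```

Because `which.min` returns the first minimum, ties are resolved in favour of
the earlier scan.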

Lastly, we extract the (full) MS1 spectrum at the features' (peaks') apex 
position for Adenosine:

```{r extraction-MS1-spectrum-Adenosine , echo = FALSE}
## Define the ID of the feature.
ft <- rownames(std_res)[rowData(std_res)$name == "Adenosine"]
ft_mz <- rowData(std_res)[ft, "mzmed"]

## Extract for each sample the MS1 spectrum with the retention time closest
## to the apex position of the chromatographic peak
ft_sps <- featureSpectra(data, msLevel = 1L, return.type = "Spectra",
                         features = ft, method = "closest_rt",
                         skipFilled = FALSE)

## Extract also the EIC
ft_chr <- featureChromatograms(data, expandRt = 5, features = ft)

## Get a spectrum for a plasma sample
sp_plas <- ft_sps[pData(data)$source[ft_sps$fromFile] == "plasma"][1L]

## Get a spectrum for a capillary sample
sp_cap <- ft_sps[pData(data)$source[ft_sps$fromFile] == "capillary"][1L]

sp_ven <- ft_sps[pData(data)$source[ft_sps$fromFile] == "venous"][1L]

```

```{r MS1-EIC-Adenosine, fig.path = IMAGE_PATH, fig.cap = "EIC of Adenosine.", echo = FALSE, eval = TRUE}
plot_eic(ft_chr)
```

```{r MS1-spectrum-Adenosine , fig.path = IMAGE_PATH, fig.cap = "MS1 spectra of Adenosine in a plasma sample (left), in a capillary blood sample (mid) and in a venous blood sample (right).", echo = FALSE, eval = TRUE}
plot_ms1_single_sample()
```

In all three sample matrices we observe a high intensity peak right next to
the retention time Adenosine is expected at. Adenosine was only found to be 
present in capillary blood samples using the MetaboAnnotation package.

In addition to plotting a representative spectrum for each sample matrix we
aggregate the MS1 spectra per source matrix and compare those.

```{r MS1-spectrum-Adenosine-aggregated, fig.path = IMAGE_PATH, fig.cap = "Aggregated MS1 spectra of Adenosine in the 3 sample matrices.", echo = FALSE, eval = TRUE}
plot_aggregated_spectra(ft_sps, target_mz = ft_mz, mzdiff = 2)
```


Then we take a look at the features that are only present in plasma samples:

```{r present-features_plas, echo = FALSE, results = "asis"}
tab <- data.frame(name = only_plas[, c("name")],
                  std_res_avg[rownames(only_plas), ])
pandoc.table(as.data.frame(tab), style = "rmarkdown",
             caption = "Features only present in plasma samples")
```

**Cystine** was also found to be only present in plasma samples in 
positive polarity.

Then, we extract the (full) MS1 spectrum at the features' (peaks') apex position
for Cystine:

```{r extraction-MS1-spectrum-Cystine, echo = FALSE}
## Define the ID of the feature.
ft <- rownames(std_res)[rowData(std_res)$name == "Cystine"]
ft_mz <- rowData(std_res)[ft, "mzmed"]

## Extract for each sample the MS1 spectrum with the retention time closest
## to the apex position of the chromatographic peak
ft_sps <- featureSpectra(data, msLevel = 1L, return.type = "Spectra",
                         features = ft, method = "closest_rt",
                         skipFilled = FALSE)

## Extract also the EIC
ft_chr <- featureChromatograms(data, expandRt = 5, features = ft)

## Get a spectrum for a plasma sample
sp_plas <- ft_sps[pData(data)$source[ft_sps$fromFile] == "plasma"][1L]

## Get a spectrum for a capillary sample
sp_cap <- ft_sps[pData(data)$source[ft_sps$fromFile] == "capillary"][1L]

sp_ven <- ft_sps[pData(data)$source[ft_sps$fromFile] == "venous"][1L]

```

```{r MS1-EIC-Cystine, fig.path = IMAGE_PATH, fig.cap = "EIC of Cystine.", echo = FALSE, eval = TRUE}
plot_eic(ft_chr)
```

```{r MS1-spectrum-Cystine, fig.path = IMAGE_PATH, fig.cap = "MS1 spectra of Cystine in a plasma sample (left), in a capillary blood sample (mid) and in a venous blood sample (right).", echo = FALSE, eval = TRUE}
plot_ms1_single_sample()
```

A peak for Cystine was only detected in the MS1 spectrum on the left side, 
which derives from a plasma sample. In venous and capillary blood no peak
was measured at a rt around 209. Also, interestingly, the large number of peaks
in the high m/z range seen in positive polarity is not detected here.

In addition to plotting a representative spectrum for each sample matrix we
aggregate the MS1 spectra per source matrix and compare those.

```{r MS1-spectrum-Cystine-aggregated, fig.path = IMAGE_PATH, fig.cap = "Aggregated MS1 spectra of Cystine in the 3 sample matrices.", echo = FALSE, eval = TRUE}
plot_aggregated_spectra(ft_sps, target_mz = ft_mz, mzdiff = 2)
```


Then, we extract the (full) MS1 spectrum at the features' (peaks') apex position
for 2-Ketobutyric acid:

```{r extraction-MS1-spectrum-2-Ketobutyric acid, echo = FALSE}
## Define the ID of the feature.
ft <- rownames(std_res)[rowData(std_res)$name == "2-Ketobutyric acid"]
ft_mz <- rowData(std_res)[ft, "mzmed"]

## Extract for each sample the MS1 spectrum with the retention time closest
## to the apex position of the chromatographic peak
ft_sps <- featureSpectra(data, msLevel = 1L, return.type = "Spectra",
                         features = ft, method = "closest_rt",
                         skipFilled = FALSE)

## Extract also the EIC
ft_chr <- featureChromatograms(data, expandRt = 5, features = ft)

## Get a spectrum for a plasma sample
sp_plas <- ft_sps[pData(data)$source[ft_sps$fromFile] == "plasma"][1L]

## Get a spectrum for a capillary sample
sp_cap <- ft_sps[pData(data)$source[ft_sps$fromFile] == "capillary"][1L]

sp_ven <- ft_sps[pData(data)$source[ft_sps$fromFile] == "venous"][1L]

```

```{r MS1-EIC-2-Ketobutyric acid, fig.path = IMAGE_PATH, fig.cap = "EIC of 2-Ketobutyric acid.", echo = FALSE, eval = TRUE}
plot_eic(ft_chr)
```

```{r MS1-spectrum-2-Ketobutyric acid, fig.path = IMAGE_PATH, fig.cap = "MS1 spectra of 2-Ketobutyric acid in a plasma sample (left), in a capillary blood sample (mid) and in a venous blood sample (right).", echo = FALSE, eval = TRUE}
plot_ms1_single_sample()
```

2-Ketobutyric acid was detected as only "present" in plasma when using the
MetaboAnnotation package. In plasma and capillary blood samples, peaks with
high intensity are detected next to the rt of 2-Ketobutyric acid.

In addition to plotting a representative spectrum for each sample matrix we
aggregate the MS1 spectra per source matrix and compare those.

```{r MS1-spectrum-2-Ketobutyric acid-aggregated, fig.path = IMAGE_PATH, fig.cap = "Aggregated MS1 spectra of 2-Ketobutyric acid in the 3 sample matrices.", echo = FALSE, eval = TRUE}
plot_aggregated_spectra(ft_sps, target_mz = ft_mz, mzdiff = 2)
```


Next, we want to identify the features that are present in capillary and venous
blood samples, but absent in plasma samples:

```{r present-features_cap_ven, echo = FALSE, results = "asis"}
tab <- data.frame(name = only_capven[, c("name")],
                  std_res_avg[rownames(only_capven), ])
pandoc.table(as.data.frame(tab), style = "rmarkdown",
             caption = "Features only present in capillary and venous blood
             samples")
```

**SAH**, **AMP** and **L-Aspartic Acid** showed the same result in positive
polarity. 

Next, we take a look at the features found in plasma and venous blood
samples, but not in capillary blood:

```{r present-features_ven_plas, echo = FALSE, results = "asis"}
tab <- data.frame(name = only_venplas[, c("name")],
                  std_res_avg[rownames(only_venplas), ])
pandoc.table(as.data.frame(tab), style = "rmarkdown",
             caption = "Features only present in plasma and venous blood
             samples")
```

And lastly, we take a look at the feature found in plasma and capillary blood
samples, but not in venous blood:

```{r present-features_cap_plas, echo = FALSE, results = "asis"}
tab <- data.frame(name = only_capplas[, c("name")],
                  std_res_avg[rownames(only_capplas), ])
pandoc.table(as.data.frame(tab), style = "rmarkdown",
             caption = "Features only present in plasma and capillary blood
             samples")
```

Now, we start with the differential abundance analysis.

## Differential abundance analysis

We next identify metabolites with significant differences in their abundances
between the sample matrices. The differential abundance analysis is performed on
the subset of features that have previously been assigned to the standards. We
fit feature-wise linear models (with sample matrix and sex as explanatory
variables) using the `lmFit` function and evaluate the contrasts of interest
with `contrasts.fit`. Then, we calculate the p-values with `eBayes`.
Subsequently, we generate a data frame with the coefficients, the raw and
adjusted p-values (we apply a Benjamini-Hochberg correction to control the
false discovery rate), the average intensity of signals in plasma, capillary
and venous blood samples, and a flag indicating whether a feature is considered
significant. This data frame is then added to the `rowData` of the `std_res`
object.
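
To make the individual steps concrete, the following base R sketch fits the
same kind of design on simulated data and tests one contrast per feature. It is
a simplified stand-in and not the analysis itself: it uses ordinary least
squares and a plain t-test instead of the moderated statistics that `eBayes`
computes, and all data and object names (`src`, `y`, `cvec`, `cap_vs_ven`) are
hypothetical.

```r
set.seed(1)
## Simulated log2 abundances: 3 features x 12 samples (hypothetical data).
src <- factor(rep(c("capillary", "plasma", "venous"), each = 4))
sex <- factor(rep(c("F", "M"), times = 6))
y <- matrix(rnorm(36, mean = 10), nrow = 3,
            dimnames = list(paste0("FT", 1:3), NULL))
y[1, src == "capillary"] <- y[1, src == "capillary"] + 3  # true difference

## One coefficient per sample matrix, plus an additive sex effect.
dsgn <- model.matrix(~ 0 + src + sex)

## Contrast capillary - venous, expressed on the columns of the design.
cvec <- c(srccapillary = 1, srcplasma = 0, srcvenous = -1, sexM = 0)

## Ordinary least squares per feature, followed by a t-test of the contrast.
cap_vs_ven <- t(apply(y, 1, function(z) {
    fit <- lm.fit(dsgn, z)
    df <- length(z) - ncol(dsgn)
    sigma2 <- sum(fit$residuals^2) / df
    est <- sum(cvec * fit$coefficients)
    se <- sqrt(sigma2 * drop(t(cvec) %*% solve(crossprod(dsgn)) %*% cvec))
    c(coef = est, pvalue = 2 * pt(-abs(est / se), df))
}))

## Benjamini-Hochberg adjustment across features.
cap_vs_ven <- cbind(cap_vs_ven, adjp = p.adjust(cap_vs_ven[, "pvalue"], "BH"))
cap_vs_ven
```

With a small number of samples per group the moderated statistics typically
differ from these plain t-tests, so the sketch only illustrates the structure
of the computation.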

We fit the model and evaluate all three pairwise comparisons between the
sample matrices:

```{r standards-analysis cap/ven, echo = FALSE}
source <- factor(std_res$source)
sex <- factor(std_res$sex)
dsgn <- model.matrix(~ 0 + source + sex)
fit <- lmFit(log2(assay(std_res, "normalized_filled_imputed")), design = dsgn)
## Fit the actual contrasts of interest
contr_mat <- makeContrasts(
    CapvsVen = sourcecapillary - sourcevenous,
    CapvsPlas = sourcecapillary - sourceplasma,
    VenvsPlas = sourcevenous - sourceplasma,
    levels = dsgn)
fit <- contrasts.fit(fit, contrasts = contr_mat)
fit <- eBayes(fit)
adjp <- apply(fit$p.value, 2, p.adjust, method = "BH")
tmp <- data.frame(
    coef = fit$coefficients,
    pvalue = fit$p.value,
    adjp = adjp,
    avg = std_res_avg,
    ## Significant: adjusted p-value < p.cut and |log2 difference| > m.cut
    significant = adjp < p.cut & abs(fit$coefficients) > m.cut
)
rowData(std_res) <- cbind(rowData(std_res), tmp)
```

```{r standards-p-value-histogram, echo = FALSE, fig.path = IMAGE_PATH, fig.width = 7 * phi, fig.height = 7, fig.cap = "Distribution of raw (left) and adjusted p-values (right) for all 3 comparisons."}
par(mfrow = c(3, 2))
hist(rowData(std_res)$pvalue.CapvsPlas, breaks = 64, xlab = "p value",
     main = "Capillary vs plasma")
hist(rowData(std_res)$adjp.CapvsPlas, breaks = 64,
     xlab = expression(p[BH]~value),
     main = "Capillary vs plasma")
hist(rowData(std_res)$pvalue.VenvsPlas, breaks = 64, xlab = "p value",
     main = "Venous vs plasma")
hist(rowData(std_res)$adjp.VenvsPlas, breaks = 64,
     xlab = expression(p[BH]~value),
     main = "Venous vs plasma")
hist(rowData(std_res)$pvalue.CapvsVen, breaks = 64, xlab = "p value",
     main = "Capillary vs venous")
hist(rowData(std_res)$adjp.CapvsVen, breaks = 64,
     xlab = expression(p[BH]~value),
     main = "Capillary vs venous")
```

Thus, most of the metabolites were found to have significantly different
concentrations in the comparisons between the sample matrices. A table with the
number of significant metabolites per comparison is shown below.

```{r table-sig, echo = FALSE, results = "asis"}
tab <- colSums(as.matrix(rowData(std_res)[, grep("significant", colnames(rowData(std_res)))]))
pandoc.table(tab, style = "rmarkdown",
             caption = "Number of significant metabolites.")
```
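
For reference, the significance flag counted above combines both cut-offs
defined at the beginning of the document. The sketch below applies the same
rule to a few made-up values (the data frame `res` is hypothetical) and shows
which fold change the log2 cut-off corresponds to.

```r
p.cut <- 0.05   # cut-off on the adjusted p-value
m.cut <- 0.7    # cut-off on the absolute log2 difference

## Hypothetical results for four features.
res <- data.frame(
    coef = c(1.2, -0.9, 0.3, 2.0),
    adjp = c(0.001, 0.01, 0.0001, 0.2))

## A feature is flagged only if it passes both cut-offs.
res$significant <- res$adjp < p.cut & abs(res$coef) > m.cut
res

## Fold change corresponding to a log2 difference of m.cut (about 1.62).
2^m.cut
```

The third feature has a very small adjusted p-value but is not flagged because
its difference is below the cut-off; the fourth has a large difference but
fails the p-value cut-off.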


```{r, echo = FALSE}
## Heatmap of the features significant in at least one comparison.
sel <- rowData(std_res)$significant.CapvsVen |
                      rowData(std_res)$significant.CapvsPlas |
                                     rowData(std_res)$significant.VenvsPlas
tmp <- log2(assay(
    std_res, "normalized_filled_imputed")[sel, , drop = FALSE])
tmp <- tmp - rowMeans(tmp, na.rm = TRUE)
rownames(tmp) <- rowData(std_res)$name[sel]
colnames(tmp) <- seq_len(ncol(tmp))
ann <- data.frame(matrix = colData(std_res)[, c("source")])
rownames(ann) <- colnames(tmp)
pheatmap(tmp, annotation_col = ann, labels_col = colnames(tmp),
         breaks = seq(-5, 5, length.out = 101),
         annotation_colors = list(matrix = col_source),
         show_colnames = FALSE,
         filename = paste0(FIG_PATH, "FIG_7_NEG_heatmap_semitargeted_significant.png"),
         width = 9.5, height = 8.5)
```

```{r, echo = FALSE}
## Export results
write_xlsx(as.data.frame(rowData(std_res)),
           path = paste0(RESULT_PATH, "matrices_semi_targeted.xlsx"))
```


### Capillary *vs* venous

The volcano plot below shows the results for the comparison of capillary and
venous samples.

```{r standards-volcano-cap-ven, echo = FALSE, fig.path = IMAGE_PATH, fig.width = 6, fig.height = 6, fig.cap = "Volcano plot for the comparison of capillary against venous samples."}
par(mfrow = c(1, 1))
plot_volc <- function(x, main, cex = 1, pch = 16, ...) {
    plot(rowData(std_res)[, paste0("coef.", x)],
         -log10(rowData(std_res)[, paste0("adjp.", x)]),
         xlab = expression(log[2]~difference),
         ylab = expression(-log[10]~p[BH]), pch = pch, col = "#00000060",
         main = main, cex = cex, ...)
    rect(xleft = -100, ybottom = -log10(p.cut), xright = -m.cut, ytop = 100,
         border = NA, col = paste0(brewer.pal(3, "Set1")[2], 10))
    rect(xleft = m.cut, ybottom = -log10(p.cut), xright = 100, ytop = 100,
         border = NA, col = paste0(brewer.pal(3, "Set1")[2], 10))
    sigs <- rowData(std_res)[, paste0("significant.", x)]
    if (any(sigs))
        points(rowData(std_res)[sigs, paste0("coef.", x)],
               -log10(rowData(std_res)[sigs, paste0("adjp.", x)]),
               col = "#377EB880", cex = cex, pch = pch)
}
plot_volc("CapvsVen", main = "Capillary vs venous")
```
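
A note on the colours used in `plot_volc`: appending two characters to a
`#RRGGBB` colour string turns it into R's `#RRGGBBAA` notation, where the last
two hexadecimal digits define the opacity. The short sketch below checks this
with `col2rgb` for the two suffixes used above; the base colour is hard-coded
here to avoid loading `RColorBrewer`.

```r
base_col <- "#377EB8"            # the blue used for significant features
shade <- paste0(base_col, 10)    # suffix used for the shaded rectangles
col2rgb(shade, alpha = TRUE)     # alpha channel is 0x10 = 16 (out of 255)

pts <- paste0(base_col, "80")    # suffix used for the highlighted points
col2rgb(pts, alpha = TRUE)["alpha", 1] / 255   # roughly half opaque
```

The shaded regions are thus almost transparent, while the highlighted points
are drawn at about 50% opacity.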

In total `r sum(rowData(std_res)$significant.CapvsVen)` features were found to 
have a significant difference in abundances between the two groups. The table 
below lists these features, ordered by their adjusted p-value.

```{r significant-standards-result-table capillary/venous, echo = FALSE, results = "asis"}
sig_table <- function(x, caption = "") {
    sigs <- rowData(std_res)[, paste0("significant.", x)]
    tab <- rowData(std_res)[sigs, c("name", paste0("coef.", x),
                                    paste0("adjp.", x), "avg.capillary",
                                    "avg.venous", "avg.plasma")]
    tab <- tab[order(tab[, paste0("adjp.", x)]), ]
    pandoc.table(
        as.data.frame(tab), style = "rmarkdown",
        caption = caption
    )
    invisible(tab)
}
tab <- sig_table(
    "CapvsVen", "Metabolites significant between capillary and venous samples.")
```

```{r, echo = FALSE}
png(paste0(FIG_PATH, "FIG_5_NEG_volcano_semitargeted.png"), width = 13.5,
    height = 4.5, res = 600, pointsize = 4, units = "cm")
par(mfrow = c(1, 3), mar = c(4.2, 4.5, 2.5, 1), cex.lab = 2,
    cex.axis = 1.5, cex.main = 2)
xl <- c(-10, 15)
yl <- c(0, 70)
plot_volc("CapvsVen", main = "Capillary vs venous", xlim = xl,
          ylim = yl, cex = 2)
plot_volc("CapvsPlas", main = "Capillary vs plasma", xlim = xl,
          ylim = yl, cex = 2)
plot_volc("VenvsPlas", main = "Venous vs plasma", xlim = xl,
          ylim = yl, cex = 2)
dev.off()
```

We now create a beeswarm plot for each significant feature:

```{r beeswarm capillary/venous, echo = FALSE}
library(beeswarm)
bee_plots <- function(x, tab) {
    std_sig <- std_res[rowData(std_res)[, paste0("significant.", x)], ]
    tmp <- log2(assay(std_sig, "normalized_filled_imputed"))
    rownames(tmp) <- rowData(std_sig)$name
    dr <- paste0(IMAGE_PATH, x, "-beeswarm/")
    dir.create(dr, recursive = TRUE, showWarnings = FALSE)
    for (i in seq_len(nrow(tmp))) {
        vals <- split(tmp[i, ], f = std_sig$source)
        names(vals) <- c("Capillary", "Plasma", "Venous")
        col <- col_source[tolower(names(vals))]
        png(file = paste0(dr, rownames(std_sig)[i],
                          "_", rownames(tmp)[i], ".png"),
            width = 3.5, height = 4, units = "cm", res = 600,
            pointsize = 3)
        par(mar = c(4, 5, 3, 1))
        beeswarm(vals, bg = paste0(col, "cc"), col = "#00000040",
                 pch = 21, cex = 1.4,
                 spacing = 0.4,
                 cex.main = 1.5,
                 cex.lab = 1.5,
                 cex.axis = 1.3,
                 main = rownames(tmp)[i],
                 ylab = expression(log[2]~intensity))
        bxplot(vals, probs = 0.5, col = "#00000060", add = TRUE, width = 0.9)
        ## grid(nx = NA, ny = NULL)
        dev.off()
    }
}
bee_plots("CapvsVen", tab)
```

![](images/semi_t_matrices_neg/CapvsVen-beeswarm/FT00287_Glutamine.png)

**Glutamine** shows a significantly higher concentration in capillary blood
samples compared to venous blood samples, whereas the intensity in plasma and
venous blood samples looks comparable.

![](images/semi_t_matrices_neg/CapvsVen-beeswarm/FT00188_Asparagine.png)

Also, the intensity of **Asparagine** is higher in capillary samples than in the
other two matrices.

![](images/semi_t_matrices_neg/CapvsVen-beeswarm/FT00554_Citrulline.png)

**Citrulline** shows higher intensity levels in capillary blood samples 
compared to plasma and venous blood samples too.

Of the `r sum(rowData(std_res)$significant.CapvsVen)` significant features,
only eight show a higher intensity in venous blood:

L-Cysteine, Alanine, Phosphoenolpyruvic Acid, Fructose, Tyrosine, CDP, 
Galactitol/Sorbitol and Glucose/Mannose.
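
The direction of a difference can be read from the sign of the coefficient:
for the contrast capillary minus venous, a negative value means a higher
abundance in venous samples. The sketch below shows this on a made-up result
table; the data frame `res` and its values are hypothetical, only the column
naming follows the one used in this document.

```r
## Hypothetical subset of a result table for the contrast capillary - venous.
res <- data.frame(
    name = c("A", "B", "C", "D"),
    coef.CapvsVen = c(2.1, -1.3, 0.9, -0.8),
    significant.CapvsVen = c(TRUE, TRUE, FALSE, TRUE),
    stringsAsFactors = FALSE)

sig <- res[res$significant.CapvsVen, ]

## Negative coefficients: higher abundance in venous samples.
higher_ven <- sig$name[sig$coef.CapvsVen < 0]
## Positive coefficients: higher abundance in capillary samples.
higher_cap <- sig$name[sig$coef.CapvsVen > 0]
higher_ven
```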

```{r echo = FALSE}
summary_plots <- function(x, tab) {
    dr <- paste0(IMAGE_PATH, x, "/")
    dir.create(dr, recursive = TRUE, showWarnings = FALSE)
    for (i in seq_len(nrow(tab))) {
        ft <- rownames(tab)[i]
        name <- tab$name[i]
        ft_mz <- rowData(std_res)[ft, "mzmed"]
        ft_sps <- featureSpectra(data, msLevel = 1L, return.type = "Spectra",
                                 features = ft, method = "closest_rt",
                                 skipFilled = FALSE)
        ft_chr <- featureChromatograms(data, expandRt = 5, features = ft)
        png(file = paste0(dr, name, "_", ft, "_EIC.png"),
            width = 6 * phi, height = 6, units = "in", res = 300,
            pointsize = 12)
        plot_eic(ft_chr)
        dev.off()
        png(file = paste0(dr, name, "_", ft, "_MS1.png"),
            width = 6 * phi, height = 6, units = "in", res = 300,
            pointsize = 12)
        plot_aggregated_spectra(ft_sps, ft_mz)
        dev.off()
    }
}

summary_plots("CapvsVen", tab)
```


### Capillary *vs* plasma

The volcano plot below shows the results for the comparison of capillary and
plasma samples.

```{r standards-volcano-cap-plas, echo = FALSE, fig.path = IMAGE_PATH, fig.width = 6, fig.height = 6, fig.cap = "Volcano plot for the comparison of capillary against plasma samples."}
par(mfrow = c(1, 1))
plot_volc("CapvsPlas", main = "Capillary vs plasma")
```

In total `r sum(rowData(std_res)$significant.CapvsPlas)` features were found to 
have a significant difference in abundances between the two groups. The table 
below lists these features, ordered by their adjusted p-value.

```{r significant-standards-result-table capillary/plasma, echo = FALSE, results = "asis"}
tab <- sig_table(
    "CapvsPlas", "Metabolites significant between capillary and plasma samples.")
```

We now create a beeswarm plot for each significant feature:

```{r beeswarm capillary/plasma, echo = FALSE}
bee_plots("CapvsPlas", tab)
```

![](images/semi_t_matrices_neg/CapvsPlas-beeswarm/FT09252_Glutathione Oxidized;Glutathione Reduced.png)

**Glutathione** shows a much higher intensity in capillary blood samples 
compared to plasma samples. 

![](images/semi_t_matrices_neg/CapvsPlas-beeswarm/FT00188_Asparagine.png)

![](images/semi_t_matrices_neg/CapvsPlas-beeswarm/FT04217_SAH.png)

Both **Asparagine** and **SAH** also show higher intensities in capillary blood 
samples compared to plasma samples.

Of the `r sum(rowData(std_res)$significant.CapvsPlas)` significant features,
eight showed a higher intensity in plasma: 

Hydroxyproline, Alanine, Fructose, Tyrosine, Galactitol/Sorbitol, Isocitric
Acid, Cystine and CDP.

```{r, echo = FALSE}
summary_plots("CapvsPlas", tab)
```


### Venous *vs* plasma

The volcano plot below shows the results for the comparison of venous and
plasma samples.

```{r standards-volcano-ven-plas, echo = FALSE, fig.path = IMAGE_PATH, fig.width = 6, fig.height = 6, fig.cap = "Volcano plot for the comparison of venous against plasma samples."}
par(mfrow = c(1, 1))
plot_volc("VenvsPlas", main = "Venous vs plasma")
```

In total `r sum(rowData(std_res)$significant.VenvsPlas)` features were found to 
have a significant difference in abundances between the two groups. The table 
below lists these features, ordered by their adjusted p-value.

```{r significant-standards-result-table venous/plasma, echo = FALSE, results = "asis"}
tab <- sig_table(
    "VenvsPlas", "Metabolites significant between venous and plasma samples.")
```

We now create a beeswarm plot for each significant feature:

```{r beeswarm venous/plasma, echo = FALSE}
bee_plots("VenvsPlas", tab)
```

As already seen previously, **Glutathione** and **SAH** show a higher intensity
in venous blood samples compared to plasma samples.

![](images/semi_t_matrices_neg/VenvsPlas-beeswarm/FT03173_Dihydroxyacetone phosphate.png)

The intensity level of **Dihydroxyacetone phosphate** is higher in 
capillary and venous blood samples compared to plasma samples.

Of the `r sum(rowData(std_res)$significant.VenvsPlas)` features found to have a 
significant difference in intensity between plasma and venous blood samples,
only six have a higher intensity in plasma samples:

Pyruvic Acid, Glyceric Acid, Phenylalanine, Arginine, Tryptophan and Cystine.

Finally, we take a look at the overlap of features that show a significant 
difference between the intensity in capillary and plasma samples and features 
that show a significant difference between the intensity in venous and plasma
samples:

```{r significant-feature-comparison-upset, fig.path = IMAGE_PATH, fig.cap = "Overlap of significant features in matrix comparisons.", echo = FALSE, eval = TRUE}
listInput <- list(
    capillaryVSplasma = which(rowData(std_res)$significant.CapvsPlas),
    venousVSplasma = which(rowData(std_res)$significant.VenvsPlas))
upset(fromList(listInput), order.by = "freq")
```
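
The numbers shown in such an overlap plot can also be derived directly with
set operations. The following base R sketch does this for two made-up sets of
feature identifiers; the identifiers and the resulting counts are hypothetical
and unrelated to the results above.

```r
## Hypothetical feature identifiers flagged in each comparison.
sig_sets <- list(
    capillaryVSplasma = c("FT01", "FT02", "FT03", "FT05"),
    venousVSplasma = c("FT02", "FT03", "FT04"))

## Size of the intersection and of the two exclusive sets.
overlap <- c(
    both = length(intersect(sig_sets[[1]], sig_sets[[2]])),
    only_capillaryVSplasma = length(setdiff(sig_sets[[1]], sig_sets[[2]])),
    only_venousVSplasma = length(setdiff(sig_sets[[2]], sig_sets[[1]])))
overlap
```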

```{r, echo = FALSE}
png(paste0(FIG_PATH, "FIG_4_NEG_upset_semitargeted-sig.png"), width = 7,
    height = 7, res = 600, pointsize = 4, units = "cm")
par(mfrow = c(1, 1), mar = c(4.2, 4.5, 1.5, 1), cex.lab = 1.5)
upset(fromList(listInput),
      order.by = "freq")
dev.off()

```

```{r, echo = FALSE}
summary_plots("VenvsPlas", tab)
```

# Difference between male and female participants

```{r}
all_present <- plas_present & cap_present & ven_present
```

Finally, we also perform a differential abundance analysis between male and
female participants for each sample matrix.

```{r diff-abundance-capillary}
## Restrict to capillary samples
res_cap <- std_res[, std_res$source == "capillary"]

## Define the linear model and perform the analysis
sex <- factor(res_cap$sex)
age <- res_cap$age
dsgn <- model.matrix(~ 0 + sex)
fit <- lmFit(log2(assay(res_cap, "normalized_filled_imputed")), design = dsgn)

## Fit the actual contrasts of interest
contr_mat <- makeContrasts(
  MvsF = sexM - sexF,
  levels = dsgn)
fit <- contrasts.fit(fit, contrasts = contr_mat)
fit <- eBayes(fit)
adjp <- apply(fit$p.value, 2, p.adjust, method = "BH")
tmp <- data.frame(
    coef.MvsF = fit$coefficient[, "MvsF"],
    pvalue.MvsF = fit$p.value[, "MvsF"],
    adjp.MvsF = p.adjust(fit$p.value[, "MvsF"], method = "BH")
)
tmp$significant.MvsF <- tmp$adjp.MvsF < p.cut & abs(tmp$coef.MvsF) > m.cut

tmp$avg.M <- rowMeans(
    log2(assay(res_cap, "normalized_filled_imputed")[, res_cap$sex == "M"]))
tmp$avg.F <- rowMeans(
    log2(assay(res_cap, "normalized_filled_imputed")[, res_cap$sex == "F"]))
rowData(res_cap) <- cbind(rowData(res_cap), tmp)
```

The same analysis is performed for venous and plasma samples.

```{r diff-abundance-others, echo = FALSE}
## venous samples
res_ven <- std_res[, std_res$source == "venous"]

## Define the linear model and perform the analysis
sex <- factor(res_ven$sex)
age <- res_ven$age
dsgn <- model.matrix(~ 0 + sex)
fit <- lmFit(log2(assay(res_ven, "normalized_filled_imputed")), design = dsgn)

## Fit the actual contrasts of interest
contr_mat <- makeContrasts(
  MvsF = sexM - sexF,
  levels = dsgn)
fit <- contrasts.fit(fit, contrasts = contr_mat)
fit <- eBayes(fit)
adjp <- apply(fit$p.value, 2, p.adjust, method = "BH")
tmp <- data.frame(
    coef.MvsF = fit$coefficient[, "MvsF"],
    pvalue.MvsF = fit$p.value[, "MvsF"],
    adjp.MvsF = p.adjust(fit$p.value[, "MvsF"], method = "BH")
)
tmp$significant.MvsF <- tmp$adjp.MvsF < p.cut & abs(tmp$coef.MvsF) > m.cut

tmp$avg.M <- rowMeans(
    log2(assay(res_ven, "normalized_filled_imputed")[, res_ven$sex == "M"]))
tmp$avg.F <- rowMeans(
    log2(assay(res_ven, "normalized_filled_imputed")[, res_ven$sex == "F"]))
rowData(res_ven) <- cbind(rowData(res_ven), tmp)

## plasma samples
res_plas <- std_res[, std_res$source == "plasma"]

## Define the linear model and perform the analysis
sex <- factor(res_plas$sex)
age <- res_plas$age
dsgn <- model.matrix(~ 0 + sex)
fit <- lmFit(log2(assay(res_plas, "normalized_filled_imputed")), design = dsgn)

## Fit the actual contrasts of interest
contr_mat <- makeContrasts(
  MvsF = sexM - sexF,
  levels = dsgn)
fit <- contrasts.fit(fit, contrasts = contr_mat)
fit <- eBayes(fit)
adjp <- apply(fit$p.value, 2, p.adjust, method = "BH")
tmp <- data.frame(
    coef.MvsF = fit$coefficient[, "MvsF"],
    pvalue.MvsF = fit$p.value[, "MvsF"],
    adjp.MvsF = p.adjust(fit$p.value[, "MvsF"], method = "BH")
)
tmp$significant.MvsF <- tmp$adjp.MvsF < p.cut & abs(tmp$coef.MvsF) > m.cut

tmp$avg.M <- rowMeans(
    log2(assay(res_plas, "normalized_filled_imputed")[, res_plas$sex == "M"]))
tmp$avg.F <- rowMeans(
    log2(assay(res_plas, "normalized_filled_imputed")[, res_plas$sex == "F"]))
rowData(res_plas) <- cbind(rowData(res_plas), tmp)
```

The results from the analyses are shown in the volcano plots below.

```{r volcano-m-vs-f, echo = FALSE, fig.path = IMAGE_PATH, fig.width = 12, fig.height = 4, fig.cap = "Volcano plot showing the results for the comparison between male and female participants in capillary (left), venous (middle) and plasma (right) samples."}
plot_volcano <- function(x, contrast = "MvsF", main = "", xlim, ylim) {
    X <- rowData(x)[, paste0("coef.", contrast)]
    Y <- rowData(x)[, paste0("adjp.", contrast)]
    minp <- min(Y[Y > 0])
    Y[Y == 0] <- minp / 100
    Y <- -log10(Y)
    if (missing(xlim))
        xlim <- range(X)
    if (missing(ylim))
        ylim <- range(Y)
    plot(X, Y,
         xlab = expression(log[2]~difference),
         ylab = expression(-log[10]~p[BH]), pch = 16, col = "#00000060",
         xlim = xlim, ylim = ylim, main = main)
    rect(xleft = -100, ybottom = -log10(p.cut), xright = -m.cut, ytop = 100,
         border = NA, col = paste0(brewer.pal(3, "Set1")[2], 10))
    rect(xleft = m.cut, ybottom = -log10(p.cut), xright = 100, ytop = 100,
         border = NA, col = paste0(brewer.pal(3, "Set1")[2], 10))
    sigs <- rowData(x)[, paste0("significant.", contrast)]
    if (any(sigs))
        points(X[sigs], Y[sigs], col = paste0(brewer.pal(3, "Set1")[2], "ce"))
}

par(mfrow = c(1, 3))
XL <- c(-2, 2.3)
YL <- c(0, 1)
plot_volcano(res_cap, "MvsF", xlim = XL, ylim = YL,
             main = "Capillary")
grid()
plot_volcano(res_ven, "MvsF", xlim = XL, ylim = YL,
             main = "Venous")
grid()
plot_volcano(res_plas, "MvsF", xlim = XL, ylim = YL,
             main = "Plasma")
grid()
```

The top 10 metabolites for each matrix, ranked by their raw p-value, are listed
in the tables below.

```{r, echo = FALSE, results = "asis"}
top_cap <- rowData(res_cap)[, c("name", "HMDB", "coef.MvsF",
                                "pvalue.MvsF", "adjp.MvsF")] |>
as.data.frame()
colnames(top_cap) <- sub(".MvsF", "", colnames(top_cap), fixed = TRUE)
top_cap <- top_cap[order(top_cap$pvalue), ][1:10, ]
pandoc.table(top_cap, style = "rmarkdown", split.tables = Inf,
             caption = paste0("Top 10 features for difference male vs female",
                              " in capillary samples"))
```

```{r, echo = FALSE, results = "asis"}
top_ven <- rowData(res_ven)[, c("name", "HMDB", "coef.MvsF",
                                "pvalue.MvsF", "adjp.MvsF")] |>
as.data.frame()
colnames(top_ven) <- sub(".MvsF", "", colnames(top_ven), fixed = TRUE)
top_ven <- top_ven[order(top_ven$pvalue), ][1:10, ]
pandoc.table(top_ven, style = "rmarkdown", split.tables = Inf,
             caption = paste0("Top 10 features for difference male vs female",
                              " in venous samples"))
```

```{r, echo = FALSE, results = "asis"}
top_plas <- rowData(res_plas)[, c("name", "HMDB", "coef.MvsF",
                                  "pvalue.MvsF", "adjp.MvsF")] |>
as.data.frame()
colnames(top_plas) <- sub(".MvsF", "", colnames(top_plas), fixed = TRUE)
top_plas <- top_plas[order(top_plas$pvalue), ][1:10, ]
pandoc.table(top_plas, style = "rmarkdown", split.tables = Inf,
             caption = paste0("Top 10 features for difference male vs female",
                              " in plasma samples"))
```
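
The three table chunks above follow the same pattern: drop the contrast suffix
from the column names, order by the raw p-value and keep the first rows. A
possible way to express this once is sketched below. The helper
`top_n_by_pvalue` and the toy data frame are hypothetical and not part of the
analysis code.

```r
## Hypothetical helper: strip the contrast suffix, order by raw p-value and
## return the first n rows.
top_n_by_pvalue <- function(x, contrast = "MvsF", n = 10) {
    colnames(x) <- sub(paste0(".", contrast), "", colnames(x), fixed = TRUE)
    head(x[order(x$pvalue), , drop = FALSE], n)
}

## Toy result table (made-up values).
toy <- data.frame(
    name = c("A", "B", "C"),
    coef.MvsF = c(0.2, -1.1, 0.6),
    pvalue.MvsF = c(0.40, 0.01, 0.08),
    adjp.MvsF = c(0.40, 0.03, 0.12),
    stringsAsFactors = FALSE)

top <- top_n_by_pvalue(toy, n = 2)
top
```

Unlike indexing with `[1:10, ]`, `head` does not pad the result with `NA` rows
when fewer than `n` features are available.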

```{r, echo = FALSE}
## beeswarm plots for selected metabolites.
std_matrices <- c("Pyruvic Acid",
                  "Glyceric Acid",
                  "Phenylalanine",
                  "Arginine",
                  "Tryptophan",
                  "Leucine",
                  "Glucose")

std_sex <- c("Citrulline",
             "Gluconic Acid",
             "L-Aspartic Acid",
             "SAH",
             "SAMe",
             "Cystine",
             "L-Glutamic Acid",
             "Leucine",
             "Methionine")

library(beeswarm)
for (std_current in std_matrices) {
    tmp <- log2(assay(std_res, "normalized_filled_imputed"))
    ## grep() matches partially; the code below assumes a single matching feature
    idx <- grep(std_current, rowData(std_res)$name)
    if (length(idx)) {
        vals <- split(tmp[idx, ], f = std_res$source)
        ## assumes split() returns the groups as capillary, plasma, venous
        names(vals) <- c("Capillary", "Plasma", "Venous")
        col <- col_source[tolower(names(vals))]
        png(paste0(FIG_PATH, rowData(std_res)$name[idx], "_matrices_NEG.png"),
            width = 3.5, height = 4, units = "cm", res = 600,
            pointsize = 3)
        par(mar = c(4, 5, 3, 1))
        beeswarm(vals, bg = paste0(col, "cc"), col = "#00000040",
                 pch = 21, cex = 1.4,
                 spacing = 0.4,
                 cex.main = 1.5,
                 cex.lab = 1.5,
                 cex.axis = 1.3,
                 main = rowData(std_res)$name[idx],
                 ylab = expression(log[2]~intensity))
        bxplot(vals, probs = 0.5, col = "#00000060", add = TRUE, width = 0.9)
        grid(nx = NA, ny = NULL)
        dev.off()
    }
}

for (std_current in std_sex) {
    tmp <- log2(assay(std_res, "normalized_filled_imputed"))
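    ## grep() matches partially; the code below assumes a single matching feature
    idx <- grep(std_current, rowData(std_res)$name)
    ## one group per combination of sample source and sex
    f <- factor(paste0(std_res$source, std_res$sex))
    if (length(idx)) {
        vals <- split(tmp[idx, ], f = f)
        ## assumes groups sort as female, male within capillary, plasma, venous
        names(vals) <- rep(c("Female", "Male"), 3)
        col <- rep(col_source[c("capillary", "plasma", "venous")], each = 2)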
        png(paste0(FIG_PATH, rowData(std_res)$name[idx], "_sex_NEG.png"),
            width = 4.5, height = 4, units = "cm", res = 600,
            pointsize = 3)
        par(mar = c(4, 5, 3, 1))
        beeswarm(vals, bg = paste0(col, "cc"), col = "#00000040",
                 pch = 21, cex = 1.4,
                 spacing = 0.4,
                 cex.main = 1.5,
                 cex.lab = 1.5,
                 cex.axis = 1.3,
                 main = rowData(std_res)$name[idx],
                 ylab = expression(log[2]~intensity))
        bxplot(vals, probs = 0.5, col = "#00000060", add = TRUE, width = 0.9)
        grid(nx = NA, ny = NULL)
        dev.off()
    }
}


```

```{r, echo = FALSE}
## Export results
write_xlsx(as.data.frame(rowData(res_cap)),
           path = paste0(RESULT_PATH, "sex_semi_targeted_capillary.xlsx"))
write_xlsx(as.data.frame(rowData(res_plas)),
           path = paste0(RESULT_PATH, "sex_semi_targeted_plasma.xlsx"))
write_xlsx(as.data.frame(rowData(res_ven)),
           path = paste0(RESULT_PATH, "sex_semi_targeted_venous.xlsx"))
```


# Session information

The versions of R and of the packages used are listed below.

```{r}
sessionInfo()
```