-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathlab_qc_beforemapping.Rmd
127 lines (77 loc) · 3.66 KB
/
lab_qc_beforemapping.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
title: "Quality control before mapping"
subtitle: Workshop on RNA-Seq
output:
bookdown::html_document2:
highlight: textmate
toc: true
toc_float:
collapsed: true
smooth_scroll: true
print: false
toc_depth: 4
number_sections: true
df_print: default
code_folding: none
self_contained: false
keep_md: false
encoding: 'UTF-8'
css: "assets/lab.css"
include:
after_body: assets/footer-lab.html
---
```{r,child="assets/header-lab.Rmd"}
```
In this tutorial we will go through some of the key steps in performing a quality control on your samples. We will start with the read based quality control, using FastQC. This will analyse your RNA fragments and see if there are any biases
All the data you need for this lab is available in the subfolder `fastq` of the data folder that you have downloaded to this course .
# FastQC on single file
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to get a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
The main functions of FastQC are:
* Import of data from BAM, SAM or FASTQ files (any variant)
* Providing a quick overview to tell you in which areas there may be problems
* Summary graphs and tables to quickly assess your data
* Export of results to an HTML-based permanent report
* Offline operation to allow automated generation of reports without running the interactive application
You can read more about the program and have a look at example reports at [the FastQC website](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
<i class="fas fa-lightbulb"></i> This program can be used for any type of NGS data, not only RNA-seq.
## How to run FastQC
```{sh,eval=FALSE,chunk.title=TRUE}
# To see help information on the FastQC package:
fastqc --help
# Run for one FASTQ file:
fastqc -o outdir fastqfile
# Run on multiple FASTQ files:
fastqc -o outdir fastqfile1 fastqfile2 etc.
# You can use wildcards to run on all FASTQ files in a directory:
fastqc -o outdir /the/folder/where/you/have/your/data/*.fastq(.gz)
```
<i class="fas fa-clipboard-list"></i> Take a look at some other file and see if it looks similar in quality.
# MultiQC on FastQC reports
MultiQC is a program that creates summaries over all samples for several different types of QC-measures. You can read more about the program [here](http://multiqc.info/). It will automatically look for output files from the supported tools and make a summary of them. You can either go to the folder where you have the Fastqc output or run it with the path to the folder.
```{sh,eval=FALSE,chunk.title=TRUE}
multiqc /folder/with/FastQC_results/
```
This should create a folder named `multiqc_data` with some general stats, links to all files etc. And one file named `multiqc_report.html`.
<i class="fas fa-clipboard-list"></i> Have a look at the report you created.
***
# Code to run all the steps above on data on course data
```{sh ,eval=FALSE,chunk.title=TRUE}
###############################################################################
### PRE-MAPPING QC ANALYSIS #####
###############################################################################
################
### FASTQC ###
################
cd ~/RNAseq
for i in $(ls rawData/fastqFiles/*.fq.gz); do
fastqc \
-o results/01-preAlignmentQC/ \
$i &
done
wait
################
### MultiQC ###
################
cd ~/RNAseq/results/01-preAlignmentQC/
multiqc .
```