Alignment and quantification of RNA-seq data.
- Trim adapters and low quality bases (Trimmomatic).
- Align to the reference genome (STAR).
- Align to other species and common contaminants (fastq_screen).
- Generate normalized genome browser tracks.
- Determine the distribution of the bases within the transcripts and 5’/3’ biases (Picard).
- Determine if the library is stranded and the strand orientation.
- Generate genes-samples counts matrix (featureCounts).
For differential expression analysis, follow with rna-star-groups-dge.
Set up a new analysis (common across all routes). If running for the first time, check the detailed usage instructions for an explanation of every step.
cd <project dir> git clone --depth 1 https://github.com/igordot/sns sns/generate-settings <genome> sns/gather-fastqs <fastq dir>
Check for potential problems.
grep "ERROR:" logs-sbatch/*
BAM-STAR: BAM files. Can be used for visual inspection of individual reads or additional analysis.
BIGWIG: BigWig files normalized to the total number of reads. Can be used for visual inspection of relative expression levels.
quant.featurecounts.counts.txt: Matrix of raw counts for all genes and samples.
summary-combined.rna-star.csv: Summary table that includes the number of reads, unique and multi-mapping alignment rate, number of counts assigned to genes, fraction of coding/UTR/intronic/intergenic bases.
summary.fastqscreen.png: Alignment rates for common species and contaminants.
summary.qc-picard-rnaseqmetrics.png: Distribution of the bases within the transcripts to determine potential 5’/3’ biases.
Additional output (can usually be deleted or used for troubleshooting):
genes.featurecounts.txt: Table of genes based on the reference GTF.
quant-*: Raw counts for all genes for individual samples.