Route: rna-star

Alignment and quantification of RNA-seq data using STAR.

Segments:

Trim adapters and low-quality bases (Trimmomatic).
Align to the reference genome (STAR).
Screen reads against other species and common contaminants (fastq_screen).
Generate normalized genome browser tracks.
Determine the distribution of bases within transcripts and any 5’/3’ bias (Picard).
Determine whether the library is stranded and, if so, the strand orientation.
Generate a genes-by-samples counts matrix (featureCounts).
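The steps above do not say how strandedness is inferred. One common heuristic, shown here only as a hypothetical sketch and not necessarily what this route does, is to count reads assigned to genes with featureCounts in stranded (-s 1) and reversely stranded (-s 2) mode and compare the two totals. The helper name guess_strandedness and the 0.8/0.2 cutoffs are illustrative assumptions.

```shell
# Hypothetical helper: classify library strandedness from two assigned-read totals.
# $1 = reads assigned with featureCounts -s 1 (stranded)
# $2 = reads assigned with featureCounts -s 2 (reversely stranded)
# The 0.8 / 0.2 cutoffs are illustrative assumptions, not values used by this route.
guess_strandedness() {
  awk -v fwd="$1" -v rev="$2" 'BEGIN {
    total = fwd + rev
    if (total == 0) { print "unknown"; exit }
    frac = fwd / total
    if (frac >= 0.8) print "forward"
    else if (frac <= 0.2) print "reverse"
    else print "unstranded"
  }'
}
```

For example, guess_strandedness 9200000 450000 prints forward, since most assigned reads come from the stranded count.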

For differential expression analysis, follow with the rna-star-groups-dge route.

Usage

Set up a new analysis (common across all routes). If running for the first time, check the detailed usage instructions for an explanation of every step.

cd <project dir>
git clone --depth 1 https://github.com/igordot/sns
sns/generate-settings <genome>
sns/gather-fastqs <fastq dir>

Run the rna-star route.

sns/run rna-star

Check for potential problems.

grep "ERROR:" logs-sbatch/*
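The grep command above prints every matching line. To see only which log files contain errors, grep -l lists file names instead. The wrapper below is a small sketch (list_error_logs is a hypothetical name) that also reports when nothing was found.

```shell
# Hypothetical wrapper: list log files containing at least one "ERROR:" line.
# $1 = log directory (defaults to logs-sbatch, as in the command above).
list_error_logs() {
  log_dir="${1:-logs-sbatch}"
  # -l prints only the names of files with a match. grep exits non-zero when
  # nothing matches (or hits an error such as a missing directory), which
  # triggers the fallback message.
  grep -l "ERROR:" "$log_dir"/* 2>/dev/null || echo "No ERROR: lines found in $log_dir"
}
```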

Output

Primary results:

BAM-STAR: BAM files of STAR alignments. Useful for visual inspection of individual reads or for additional analysis.
BIGWIG: BigWig files normalized to the total number of reads. Useful for visually comparing relative expression levels.
quant.featurecounts.counts.txt: Matrix of raw counts for all genes and samples.
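A quick sanity check on the counts matrix is the total number of counts per sample. The sketch below assumes quant.featurecounts.counts.txt is tab-delimited with one header row, gene IDs in the first column, and one numeric column per sample; confirm the actual layout before relying on it. count_totals is a hypothetical helper name.

```shell
# Hypothetical helper: sum each sample column of a genes-by-samples counts matrix.
# Assumes a tab-delimited file with a header row and gene IDs in column 1.
count_totals() {
  awk -F'\t' '
    NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
    { for (i = 2; i <= ncol; i++) sum[i] += $i }
    END { for (i = 2; i <= ncol; i++) printf "%s\t%.0f\n", name[i], sum[i] }
  ' "${1:-quant.featurecounts.counts.txt}"
}
```

Running count_totals in the project directory prints one sample name and its total per line; a sample with a much lower total than the others may warrant a closer look at its alignment metrics.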

Run metrics:

summary-combined.rna-star.csv: Summary table with the number of reads, unique and multi-mapping alignment rates, the number of counts assigned to genes, and the fractions of coding, UTR, intronic, and intergenic bases.
summary.fastqscreen.png: Alignment rates for common species and contaminants.
summary.qc-picard-rnaseqmetrics.png: Distribution of bases along transcripts, used to identify potential 5’/3’ bias.
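If the per-sample Picard metrics files are kept (their names and location in this route are not listed here, so any path is an assumption), a single value such as MEDIAN_5PRIME_TO_3PRIME_BIAS can be pulled out with a short awk sketch. It relies on the standard Picard metrics layout: a line starting with "## METRICS CLASS", then a tab-delimited header row, then the values. picard_metric is a hypothetical helper name.

```shell
# Hypothetical helper: print one named metric from a Picard metrics file.
# $1 = metrics file, $2 = column name (e.g. MEDIAN_5PRIME_TO_3PRIME_BIAS).
# Reads the header row and the first value row after "## METRICS CLASS".
picard_metric() {
  awk -F'\t' -v name="$2" '
    /^## METRICS CLASS/ {
      getline                                  # header row
      for (i = 1; i <= NF; i++) if ($i == name) idx = i
      getline                                  # first row of values
      if (idx) print $idx
      exit
    }
  ' "$1"
}
```

Since this metric is the ratio of 5’ to 3’ coverage, values close to 1 suggest balanced coverage, while values well below 1 indicate relatively more coverage at the 3’ end.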

Additional output (can usually be deleted; useful for troubleshooting):

genes.featurecounts.txt: Table of genes based on the reference GTF.
quant-*: Per-sample raw counts for all genes.