Seq-N-Slide RNA-seq demo

About

This RNA-seq demo is based on data from tumor-associated macrophages (TAMs) isolated from wildtype (WT) or myeloid-specific Tet2 knockout (KO) mice (GSE98964). The raw FASTQs have already been downloaded to UltraViolet/BigPurple. Only a subset of reads (up to 10M) was kept to speed up the analysis.

Prepare the pipeline

Create the project directory (choose any name):

mkdir proj_dir

Navigate to the created project directory:

cd proj_dir

Load the git module (git is not available by default on UltraViolet/BigPurple):

module add git

Download the pipeline code:

git clone --depth 1 https://github.com/igordot/sns

This will create an sns sub-directory in the current directory.

Process the individual RNA-seq samples

Specify the reference genome (in this case, mm10 for mouse):

sns/generate-settings mm10

This will create settings.txt, which contains information about the reference files for the selected genome.

Specify the location of the raw FASTQ files:

sns/gather-fastqs /gpfs/data/igorlab/tutorials/FASTQ-RNA

This will create samples.fastq-raw.csv, which contains the sample names and the corresponding files. If a sample (same sample name) has multiple FASTQs, each FASTQ (or FASTQ pair) is listed on a separate line, and the FASTQs for that sample are merged automatically. Sample names are detected from the file names, but they can be edited in this file to be more readable. All downstream file names will contain the sample names specified in this file.
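Before running the pipeline, it can help to confirm how many FASTQ lines were detected for each sample. The following is a minimal sketch, assuming the file is comma-separated with the sample name in the first column; count_fastq_lines is a hypothetical helper name and is not part of sns.

```shell
# Count how many lines (FASTQs or FASTQ pairs) each sample has in the sample sheet.
# Assumption: comma-separated file with the sample name in the first column.
count_fastq_lines() {
  cut -d ',' -f 1 "$1" | sort | uniq -c
}

# Usage, from the project directory:
# count_fastq_lines samples.fastq-raw.csv
```

A sample with a count above 1 has multiple FASTQs (or FASTQ pairs) that will be merged.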

The raw sequencing data from GTC (the sequencing core) is usually deposited in /gpfs/data/sequence/results/[lab]/[date].

Execute the pipeline (rna-star route for standard RNA-seq analysis):

sns/run rna-star

Check progress

Check if the jobs are submitted and running:

squeue -u $USER
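For a quick count of the jobs that are still pending or running, the same squeue call can be piped to wc. This is only a sketch; count_my_jobs is a hypothetical helper name, not part of sns. The -h option suppresses the squeue header line so that only job lines are counted.

```shell
# Print the number of jobs currently queued or running for the current user.
# -h (no header) keeps the header line out of the count.
count_my_jobs() {
  squeue -h -u "$USER" | wc -l
}

# Usage:
# count_my_jobs
```

A count of 0 suggests that all submitted jobs have finished, at which point the logs can be checked for errors.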

Check for errors:

grep "ERROR:" logs-sbatch/*

This command searches all the log files for errors. It can be run while the pipeline is still running and should be run again after the pipeline completes. There should be no output if everything ran without problems. If any errors are detected, open the log file where they are found to see the full context.
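When many log files are present, listing only the names of the files that contain errors can make it easier to decide which ones to open. The following is a minimal sketch using the standard grep -l option (print matching file names only); logs_with_errors is a hypothetical helper name, and the default directory logs-sbatch is taken from the command above.

```shell
# List the log files that contain "ERROR:".
# Prints the matching file names and returns 1 if any are found;
# otherwise prints a short confirmation and returns 0.
logs_with_errors() {
  if grep -l "ERROR:" "${1:-logs-sbatch}"/* 2>/dev/null; then
    return 1
  fi
  echo "no errors found in ${1:-logs-sbatch}"
}

# Usage, from the project directory:
# logs_with_errors
```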

Perform differential expression analysis

After the pipeline is finished, edit samples.groups.csv and specify the group for each sample. The differential expression step will compare all groups against each other.

Run the differential expression step (rna-star-groups-dge route):

sns/run rna-star-groups-dge