Route: wes

Alignment and variant detection for whole genome/exome/targeted sequencing data.

Segments:

  • Trim adapters and low quality bases (Trimmomatic).
  • Align to the reference genome (BWA-MEM).
  • Remove duplicate reads (Sambamba).
  • Realign around indels and recalibrate base quality scores (GATK).
  • Determine fragment size distribution.
  • Determine capture efficiency and depth of coverage (GATK).
  • Call point mutations and small insertions/deletions (GATK HaplotypeCaller and LoFreq).

For somatic variant detection, follow with wes-pairs-snv.

Usage

Set up a new analysis (common across all routes). If running for the first time, check the detailed usage instructions for an explanation of every step.

cd <project dir>
git clone --depth 1 https://github.com/igordot/sns
sns/generate-settings <genome>
sns/gather-fastqs <fastq dir>

Add a BED file defining the genomic regions targeted for capture to the project directory. The targeted regions (or primary targets) are the regions your capture kit attempts to cover, usually exons of genes of interest.
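A malformed BED file is easy to miss before a long run, so a quick sanity check can help. The sketch below is not part of sns: check_bed is a hypothetical helper. It assumes a tab-delimited file with at least three columns (chromosome, start, end) and treats any interval whose start is not smaller than its end as an error.

```shell
# Hypothetical helper (not part of sns): print BED lines that do not look like
# valid target intervals and return a non-zero status if any are found.
check_bed() {
  awk -F '\t' '
    # Skip blank lines and optional comment/track/browser header lines.
    /^$/ || /^(#|track|browser)/ { next }
    # Flag lines with fewer than 3 columns, non-integer coordinates,
    # or a start that is not smaller than the end.
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
      printf "line %d: %s\n", NR, $0
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}
```

For example, check_bed targets.bed (the file name is only an illustration) prints nothing and returns 0 when every interval passes these checks.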

Run the wes route.

sns/run wes

Check for potential problems.

grep "ERROR:" logs-sbatch/*
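When there are many log files, a per-file count can be easier to scan than the raw matches. This is a minimal sketch rather than an sns command: errors_per_log is a hypothetical helper, and it assumes the logs sit directly in the given directory (logs-sbatch by default).

```shell
# Hypothetical helper (not part of sns): list log files containing "ERROR:"
# together with the number of matching lines in each file.
errors_per_log() {
  # /dev/null is added so grep always prefixes counts with the file name,
  # even when the directory holds a single log file.
  grep -c "ERROR:" "${1:-logs-sbatch}"/* /dev/null | awk -F ':' '$NF > 0'
}
```

Running errors_per_log from the project directory prints one "file:count" line per affected log, and prints nothing if no errors were logged.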

Output

Primary results:

  • BAM-GATK-RA-RC: Final BAM files (deduplicated, realigned, and recalibrated). Can be used for visual inspection of variants or additional analysis.
  • VCF-*: VCF files generated by the GATK HaplotypeCaller and LoFreq variant callers.
  • VCF-*-annot.all.txt: Table of functionally annotated variants.
  • VCF-*-annot.coding.txt: Table of coding region variants (subset of all variants).
  • VCF-*-annot.nonsyn.txt: Table of non-synonymous, frameshift, and splicing variants (subset of coding variants).

Run metrics:

  • summary-combined.wes.csv: Summary table that includes the number of reads, alignment rate, fraction of PCR duplicates, capture efficiency (enrichment in targeted regions), and depth/evenness of coverage.
  • summary.qc-fragment-sizes.png: Distribution of fragment sizes.
  • summary.VCF-*-annot.csv: Number of mutations per sample for different variant callers.
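
The summary tables are plain CSV files, so their columns can be listed from the shell before opening them elsewhere. The sketch below is an illustration only: csv_columns is a hypothetical helper, and it assumes a simple header row with no commas inside quoted fields.

```shell
# Hypothetical helper (not part of sns): print the numbered column names
# from the header row of a CSV file.
csv_columns() {
  # Naive split on commas; quoted fields containing commas are not handled.
  awk -F ',' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}
```

For example, csv_columns summary-combined.wes.csv lists the metrics available in the combined summary table.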