Alignment and variant detection for whole genome/exome/targeted sequencing data.
- Trim adapters and low-quality bases (Trimmomatic).
- Align to the reference genome (BWA-MEM).
- Remove duplicate reads (Sambamba).
- Realign and recalibrate (GATK).
- Determine fragment size distribution.
- Determine capture efficiency and depth of coverage (GATK).
- Call point mutations and small insertions/deletions (GATK HaplotypeCaller and LoFreq).
For somatic variant detection, follow with the wes-pairs-snv route.
Set up a new analysis (common across all routes). If running for the first time, check the detailed usage instructions for an explanation of every step.
cd <project dir>
git clone --depth 1 https://github.com/igordot/sns
sns/generate-settings <genome>
sns/gather-fastqs <fastq dir>
Add a BED file defining the genomic regions targeted for capture to the project directory. The targeted regions (or primary targets) are the regions your capture kit attempts to cover, usually exons of genes of interest.
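Before starting the run, it may help to sanity-check the BED file. The sketch below is not part of sns: `check_bed` and `targets.bed` are hypothetical names, and it assumes a tab-separated file whose first three columns are chromosome, start, and end, with any extra columns ignored.

```shell
# Hypothetical helper (not part of sns): report malformed lines in a BED file.
# Assumes tab-separated columns: chrom, start, end (extra columns are allowed).
# Prints one message per problem line and returns non-zero if any were found.
check_bed() {
  awk -F'\t' '
    /^(#|track|browser)/ { next }   # skip comment and header lines
    NF == 0 { next }                # skip blank lines
    NF < 3 { printf "line %d: fewer than 3 columns\n", NR; bad++; next }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { printf "line %d: non-numeric coordinates\n", NR; bad++; next }
    $2 + 0 >= $3 + 0 { printf "line %d: start is not less than end\n", NR; bad++ }
    END { exit (bad > 0) }
  ' "$1"
}
```

For example, `check_bed targets.bed && echo "BED looks OK"` prints the confirmation only when no problem lines are found.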
Check for potential problems.
grep "ERROR:" logs-sbatch/*
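With many samples, a per-file count can make it easier to see which job failed. This is a minimal sketch that assumes the logs are in `logs-sbatch/`, as in the command above; `count_errors` is a hypothetical helper and not an sns command.

```shell
# Hypothetical helper (not part of sns): list log files that contain "ERROR:"
# together with the number of matching lines, in "file:count" form.
count_errors() {
  local log_dir="${1:-logs-sbatch}"
  # -H forces the file name to be printed even when only one file is given;
  # -c prints the count of matching lines per file. Keep non-zero counts only.
  grep -H -c "ERROR:" "$log_dir"/* 2>/dev/null | awk -F: '$NF > 0'
}
```

Running `count_errors` from the project directory prints nothing if no log file contains an error line.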
BAM-GATK-RA-RC: Final BAM files (deduplicated, realigned, and recalibrated). Can be used for visual inspection of variants or additional analysis.
VCF-*: VCF files generated by GATK HaplotypeCaller and LoFreq variant callers.
VCF-*-annot.all.txt: Table of functionally annotated variants.
VCF-*-annot.coding.txt: Table of coding region variants (subset of all variants).
VCF-*-annot.nonsyn.txt: Table of non-synonymous, frameshift, and splicing variants (subset of coding variants).
summary-combined.wes.csv: Summary table that includes the number of reads, alignment rate, fraction of PCR duplicates, capture efficiency (enrichment in targeted regions), and depth/evenness of coverage.
summary.qc-fragment-sizes.png: Distribution of fragment sizes.
summary.VCF-*-annot.csv: Number of mutations per sample for different variant callers.