Input:

Output:

Goal:

Useful Background

Pipeline Outline

  1. Sex check
  2. SNP call rate: plot histogram & filter
  3. Person call rate: plot histogram & filter
  4. Calculate Hardy-Weinberg statistics & filter SNPs
  5. LD prune for relationship check & heterozygosity calculation
  6. Relationship check: plot IBD stats & filter related individuals (see table here for IBD info)
  7. Heterozygosity check: plot & filter outliers
  8. Principal component analysis (PCA) to determine genetic ancestry
    • check genome build (NCBI36/hg18 or GRCh37/hg19 or newer?)
    • automate the merge with HapMap3 genotypes (/home/wheelerlab2/Data/HAPMAP3_hg1*/)
    • run smartpca to get principal components (see documentation in /home/wheelerlab2/EIG-6.1.4/EIGENSTRAT/README)
    • plot and choose threshold for filtering people (probably can’t automate)
    • rerun smartpca with filtered set (no HapMap3)
  9. Plate effects analysis (if data is available)
  10. Prepare for imputation
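
As a rough illustration of what steps 2 to 4 compute, here is a minimal Python sketch. It is not the course's actual pipeline, which presumably relies on dedicated genetics software. The genotype encoding (0, 1, or 2 copies of an allele, `None` for a missing call), the function names, and the threshold passed to `passing_indices` are all assumptions made for this example.

```python
from typing import List, Optional, Sequence

# Assumed encoding: 0, 1, or 2 copies of one allele; None marks a missing call.
Genotype = Optional[int]


def snp_call_rates(genotypes: Sequence[Sequence[Genotype]]) -> List[float]:
    """Fraction of people with a non-missing call at each SNP (columns)."""
    n_people = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(row[j] is not None for row in genotypes) / n_people
        for j in range(n_snps)
    ]


def person_call_rates(genotypes: Sequence[Sequence[Genotype]]) -> List[float]:
    """Fraction of SNPs with a non-missing call for each person (rows)."""
    return [sum(g is not None for g in row) / len(row) for row in genotypes]


def passing_indices(rates: Sequence[float], threshold: float) -> List[int]:
    """Indices whose call rate meets the threshold (threshold is hypothetical)."""
    return [i for i, rate in enumerate(rates) if rate >= threshold]


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing observed genotype counts with the
    counts expected under Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = [n_aa, n_ab, n_bb]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    # Skip classes with zero expected count to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

In this sketch, rows are people and columns are SNPs, so the same matrix feeds both call-rate calculations. A large chi-square value for a SNP suggests departure from Hardy-Weinberg proportions; the cutoff used for filtering is a choice left to the analyst.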


