Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • GWAS tests SNP markers to identify causal variants of a trait.
  • The paper establishes a connection between the surrogate model used in GWAS and the true causal model.
  • Population structure is accounted for in GWAS by modelling the variant of interest and not the trait.
  • Environmental confounding can be partially corrected using genetic covariates.

Paper Content

Introduction

  • GWAS identifies regions in the genome responsible for variation in a trait
  • SNPs are tested for association with the trait
  • SNPs are dense and widespread across the genome
  • GWAS uses a marker-additive model (MAM) to estimate parameters
  • MAM parameters have no direct causal interpretation
  • This work considers a causal-additive model (CAM) with direct causal interpretation

Setup

  • Population membership is described by C_i and M_i, which take values 0, 1, 2
  • Marker-additive model (MAM) includes marker effect size β and noise variable δ
  • Marginal testing is used, where M_i1 is the marker being tested
  • Leave one chromosome out (LOCO) approach removes markers close to the variant being tested
  • Causal-additive model (CAM) includes causal effect size α and noise variable ε
  • Pritchard-Stephens-Donnelly (PSD) model describes population structure incorporating admixture
  • Random mating is assumed, with haplotype frequencies and linkage disequilibrium (LD) parameters
  • Linear projection of X with respect to Y is used
  • Genotypes at different haplotypes are conditionally independent
  • Goal is to characterize the estimand of the regression under the CAM
  • The estimand is a weighted average over β_1(S_i)
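
The two models in this setup can be written schematically as below. This is only a sketch: the intercepts β_0 and α_0, the index names j and k, and the primed noise term are assumptions made for illustration, and the paper's exact notation may differ. The last line is the standard population formula for the slope of a linear projection with an intercept, which is what "the estimand of the regression" refers to in the marginal case.

```latex
% Schematic forms only; intercepts, index names and the primed noise term are assumed for illustration.
\begin{align*}
\text{MAM:}\quad & Y_i = \beta_0 + \sum_{j} \beta_j M_{ij} + \delta_i \\
\text{Marginal test of marker 1:}\quad & Y_i = \beta_0 + \beta_1 M_{i1} + \delta_i' \\
\text{CAM:}\quad & Y_i = \alpha_0 + \sum_{k} \alpha_k C_{ik} + \epsilon_i \\
\text{Projection slope:}\quad & \beta_1 = \frac{\operatorname{Cov}(M_{i1}, Y_i)}{\operatorname{Var}(M_{i1})}
\end{align*}
```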

Main results

  • GWAS is interested in the contribution of linkage with a physically proximal causal variant
  • Separating path (1) from path (2) is only partially achievable under a population-based design
  • It is achievable under a within-sibship design
  • Population structure affects β_{1,nocov} in two ways: it attenuates the true signal and introduces undesirable signals into the estimand
  • Weights of the linkage term in Theorem 3.2 sum to a value smaller than 1
  • Prediction error becomes negligible when a large number of markers are used as covariates
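
The "undesirable signals" effect above can be illustrated with a toy simulation. This is a minimal sketch and not the paper's PSD setting: it assumes two discrete populations without admixture, a tested marker with no causal effect at all, and hypothetical allele frequencies and trait mean shifts. All function names are invented for this example. Regressing the trait on the marker without covariates gives a clearly nonzero slope, while centring within each population (equivalent to using known population membership as a covariate) brings the slope back near zero.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x, with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def genotype(rng, freq):
    """Genotype in {0, 1, 2}: two independent allele draws at frequency freq."""
    return int(rng.random() < freq) + int(rng.random() < freq)


def simulate(n_per_pop=5000, seed=0):
    """Two populations that differ in allele frequency and in trait mean."""
    rng = random.Random(seed)
    # Hypothetical values: (marker allele frequency, trait mean shift).
    populations = {0: (0.2, 0.0), 1: (0.8, 1.0)}
    pop, marker, trait = [], [], []
    for label, (freq, shift) in populations.items():
        for _ in range(n_per_pop):
            pop.append(label)
            marker.append(genotype(rng, freq))
            # The marker itself has no effect on the trait.
            trait.append(shift + rng.gauss(0.0, 1.0))
    return pop, marker, trait


def within_population_slope(pop, x, y):
    """Slope after centring x and y within each population."""
    groups = {}
    for label, a, b in zip(pop, x, y):
        groups.setdefault(label, []).append((a, b))
    num = 0.0
    den = 0.0
    for rows in groups.values():
        mx = sum(a for a, _ in rows) / len(rows)
        my = sum(b for _, b in rows) / len(rows)
        num += sum((a - mx) * (b - my) for a, b in rows)
        den += sum((a - mx) ** 2 for a, _ in rows)
    return num / den


pop, marker, trait = simulate()
naive = ols_slope(marker, trait)
adjusted = within_population_slope(pop, marker, trait)
print(f"slope without covariates:    {naive:.3f}")
print(f"slope adjusted for population: {adjusted:.3f}")
```

In a real GWAS the population labels are not observed, which is why the summary above speaks of using a large number of markers as covariates instead.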

Within-sibship GWAS

  • Within-sibship GWAS observes family membership of each individual
  • Regression (3.1) is equivalent to the famous sibling difference regression when there are only two siblings per family
  • Estimand of regression (3.1) is equal to β_{1,s}, the population estimand when S is known
  • Non-genetic confounding is partially resolved with genetic markers in linear regression
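
The equivalence with the sibling difference regression can be checked numerically. The sketch below assumes that regression (3.1) has the usual family fixed-effects form (the summary does not reproduce it), and it uses a toy data generator: sibling genotypes are drawn independently from a hypothetical family-specific allele frequency rather than by Mendelian transmission, and the effect size 0.5 is arbitrary. With exactly two siblings per family, the fixed-effects slope and the slope from regressing trait differences on genotype differences through the origin coincide up to floating-point error.

```python
import random


def simulate_sib_pairs(n_families=2000, seed=1):
    """Toy sibling pairs sharing a family-level confounder (hypothetical setup)."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        freq = rng.uniform(0.1, 0.9)  # family-specific allele frequency
        # Shared family effect, correlated with the allele frequency.
        family_effect = 2.0 * freq + rng.gauss(0.0, 1.0)
        sibs = []
        for _ in range(2):
            g = int(rng.random() < freq) + int(rng.random() < freq)
            y = 0.5 * g + family_effect + rng.gauss(0.0, 1.0)
            sibs.append((g, y))
        families.append(sibs)
    return families


def fixed_effects_slope(families):
    """Slope after centring genotype and trait within each family."""
    num = 0.0
    den = 0.0
    for sibs in families:
        mg = sum(g for g, _ in sibs) / len(sibs)
        my = sum(y for _, y in sibs) / len(sibs)
        for g, y in sibs:
            num += (g - mg) * (y - my)
            den += (g - mg) ** 2
    return num / den


def sibling_difference_slope(families):
    """Slope of trait differences on genotype differences, no intercept."""
    num = 0.0
    den = 0.0
    for (g1, y1), (g2, y2) in families:
        dg = g1 - g2
        dy = y1 - y2
        num += dg * dy
        den += dg * dg
    return num / den


families = simulate_sib_pairs()
fe = fixed_effects_slope(families)
sd = sibling_difference_slope(families)
print(f"family fixed-effects slope: {fe:.6f}")
print(f"sibling difference slope:   {sd:.6f}")
```

The identity holds because, with two siblings, each centred value is plus or minus half the sibling difference, so the factor of one half cancels between numerator and denominator.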

Conclusions

  • Aimed to solve the identification problem of GWAS of quantitative traits using linear regression in a structured population
  • Established the connection between the CAM and the MAM, which provides a closed-form formula for what is being estimated using linear regression
  • Population structure exhibits a two-fold effect in which it induces an additive confounding term together with an attenuation of the true effect of a causal variant
  • Within-sibship design can overcome this problem due to direct access to family membership
  • Bias is corrected by modelling the distribution of the variant and not the trait
  • Bias is never completely removed because the expectation of the variant being tested is never truly linear with respect to the covariate markers
  • Genetic covariates can further correct environmental confounding
  • Framework to be extended to incorporate other important evolutionary processes such as assortative mating and inbreeding
  • Haplotype dependence induced by such evolutionary processes is likely to have a non-trivial impact on GWAS estimands
  • A shortcoming of the work is that it deals only with identification and says little about the estimation process