Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

GWAS tests SNP markers to identify causal variants of a trait.
The work establishes a connection between the surrogate model (MAM) and the true causal model (CAM).
Population structure is accounted for in GWAS by modelling the variant of interest and not the trait.
Environmental confounding can be partially corrected using genetic covariates.

GWAS identifies regions in the genome responsible for variation in a trait
SNPs are tested for association with the trait
SNPs are dense and widespread across the genome
GWAS uses a marker-additive model (MAM) to estimate parameters
MAM parameters have no direct causal interpretation
This work considers a causal-additive model (CAM) with direct causal interpretation
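
A minimal simulation sketch of why MAM parameters lack a direct causal interpretation. The setup is an assumption for illustration and is not taken from the paper: one causal variant, one marker whose alleles copy the causal allele with a hypothetical probability `rho`, a single unstructured population, and Gaussian noise. In this toy setting the marginal marker slope approaches `alpha * rho` rather than the causal effect `alpha`.

```python
import numpy as np


def simulate_marker_slope(n=50_000, alpha=1.0, p=0.3, rho=0.6, seed=0):
    """Regress a trait on a marker that is only correlated with the causal variant."""
    rng = np.random.default_rng(seed)
    # Two haplotypes per individual; each carries a causal allele (0/1).
    c_hap = rng.binomial(1, p, size=(n, 2))
    # Hypothetical LD mechanism: the marker allele copies the causal allele
    # with probability rho, otherwise it is an independent draw.
    copies = rng.random((n, 2)) < rho
    independent = rng.binomial(1, p, size=(n, 2))
    m_hap = np.where(copies, c_hap, independent)
    C = c_hap.sum(axis=1)  # causal genotype in {0, 1, 2}
    M = m_hap.sum(axis=1)  # marker genotype in {0, 1, 2}
    # Causal-additive trait with a single causal variant.
    y = alpha * C + rng.normal(size=n)
    # Marginal least-squares slope of the trait on the marker.
    return np.cov(M, y)[0, 1] / np.var(M, ddof=1)


if __name__ == "__main__":
    print("marker slope:", simulate_marker_slope())
```

The estimated slope mixes the causal effect with the strength of the marker-causal correlation, which is the sense in which the marker coefficient is a surrogate quantity here.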

Population membership is denoted S_i; the causal-variant genotypes C_i and marker genotypes M_i take values 0, 1, 2
Marker-additive model (MAM) includes marker effect size β and noise variable δ
Marginal testing is used, where M_{i1} is the marker being tested
Leave one chromosome out (LOCO) approach removes markers close to the variant being tested
Causal-additive model (CAM) includes causal effect size α and noise variable ε
Pritchard-Stephens-Donnelly (PSD) model describes population structure incorporating admixture
Random mating is assumed, with haplotype frequencies and linkage disequilibrium (LD) parameters
Linear projection of X with respect to Y is used
Genotypes at different haplotypes are conditionally independent
Goal is to characterize the estimand of the regression under the CAM
The estimand is a weighted average of the population-specific quantities β_1(S_i)
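
The PSD model mentioned above can be sketched in its common textbook form. The parameterization is an assumption and may differ from the paper's: each individual has admixture proportions `q` over `k` ancestral populations, each population has its own allele frequencies `p`, and under random mating the two alleles at a locus are independent draws, so a genotype is Binomial(2, q · p). All sizes and frequency ranges are hypothetical.

```python
import numpy as np


def simulate_psd_genotypes(n=2000, n_loci=50, k=3, seed=0):
    """Draw genotypes from a textbook PSD admixture model (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Admixture proportions: one row per individual, rows sum to 1.
    q = rng.dirichlet(np.ones(k), size=n)            # shape (n, k)
    # Allele frequency of each locus in each ancestral population.
    p = rng.uniform(0.05, 0.95, size=(k, n_loci))    # shape (k, n_loci)
    # Individual-specific allele frequencies.
    pi = q @ p                                       # shape (n, n_loci)
    # Two independent allele draws per locus give genotypes in {0, 1, 2}.
    g = rng.binomial(2, pi)
    return q, p, g


if __name__ == "__main__":
    q, p, g = simulate_psd_genotypes()
    print(g.shape, g.min(), g.max())
```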

GWAS is interested in the contribution of linkage with a physically proximal causal variant
Separating path (1) from path (2) is only partially achievable under a population-based design
It is achievable under a within-sibship design
Population structure affects β_{1,nocov} in two ways: it attenuates the true signal and introduces undesirable signals into the estimand
Weights of linkage term in Theorem 3.2 sum up to a value smaller than 1
Prediction error becomes negligible with large number of markers as covariates
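
A toy illustration of the "undesirable signal" effect, under assumptions that are not from the paper: two discrete populations with observed membership `s`, a causal variant and an unlinked marker whose allele frequencies both differ between populations (the frequencies are hypothetical). Without covariates the unlinked marker picks up a nonzero slope; adjusting for membership removes it in this simple case.

```python
import numpy as np


def unlinked_marker_slopes(n=100_000, alpha=1.0, seed=0):
    """Slope of an unlinked marker without and with a membership covariate."""
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, size=n)               # population membership (0 or 1)
    p_causal = np.where(s == 0, 0.2, 0.7)        # hypothetical frequencies
    p_marker = np.where(s == 0, 0.3, 0.8)
    # Given membership, the marker is independent of the causal variant.
    C = rng.binomial(2, p_causal)
    M = rng.binomial(2, p_marker)
    y = alpha * C + rng.normal(size=n)
    ones = np.ones(n)
    # Regression without covariates: intercept + marker.
    X_nocov = np.column_stack([ones, M])
    b_nocov = np.linalg.lstsq(X_nocov, y, rcond=None)[0][1]
    # Regression adjusting for membership: intercept + marker + s.
    X_adj = np.column_stack([ones, M, s])
    b_adj = np.linalg.lstsq(X_adj, y, rcond=None)[0][1]
    return b_nocov, b_adj


if __name__ == "__main__":
    print(unlinked_marker_slopes())
```

In practice membership is not observed and is approximated with other markers as covariates, which is where the residual prediction error discussed above comes from.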

Within-sibship GWAS observes family membership of each individual
Regression (3.1) is equivalent to the well-known sibling-difference regression when there are only two siblings per family
Estimand of regression (3.1) is equal to β_{1,s}, the population estimand when S is known
Non-genetic confounding is partially resolved with genetic markers in linear regression
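
The two-sibling equivalence can be checked numerically. This sketch assumes that the within-sibship regression takes the form of a family fixed-effects (within-family demeaned) regression, which is an assumption about regression (3.1) and not its definition from the paper; the confounding mechanism and all parameter values are hypothetical. With exactly two siblings per family, the demeaned slope coincides with the no-intercept slope of sibling differences.

```python
import numpy as np


def sibship_slopes(n_fam=50_000, beta=0.5, seed=0):
    """Compare pooled, family fixed-effects, and sibling-difference slopes."""
    rng = np.random.default_rng(seed)
    # Family-level allele frequency, shared by both siblings.
    p_fam = rng.uniform(0.2, 0.8, size=(n_fam, 1))
    # Family-level confounder correlated with the allele frequency.
    fam = 3.0 * p_fam + rng.normal(size=(n_fam, 1))
    m = rng.binomial(2, p_fam, size=(n_fam, 2)).astype(float)
    y = beta * m + fam + rng.normal(size=(n_fam, 2))
    # Pooled regression ignoring families (confounded by fam).
    b_pooled = np.cov(m.ravel(), y.ravel())[0, 1] / np.var(m.ravel(), ddof=1)
    # Family fixed effects: demean within family, then regress without intercept.
    m_d = m - m.mean(axis=1, keepdims=True)
    y_d = y - y.mean(axis=1, keepdims=True)
    b_fe = (m_d * y_d).sum() / (m_d ** 2).sum()
    # Sibling-difference regression without intercept.
    dm = m[:, 0] - m[:, 1]
    dy = y[:, 0] - y[:, 1]
    b_diff = (dm * dy).sum() / (dm ** 2).sum()
    return b_pooled, b_fe, b_diff


if __name__ == "__main__":
    print(sibship_slopes())
```

The equality of the last two slopes is algebraic: with two siblings, each demeaned value is plus or minus half the sibling difference, so the factors cancel in the ratio.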

Aimed to solve the identification problem of GWAS of quantitative traits using linear regression in a structured population
Established the connection between the CAM and the MAM, which provides a closed-form formula for what is being estimated using linear regression
Population structure exhibits a two-fold effect in which it induces an additive confounding term together with an attenuation of the true effect of a causal variant
Within-sibship design can overcome this problem due to direct access to family membership
Bias is corrected by modelling the distribution of the variant and not the trait
Bias is never completely removed because the expectation of the variant being tested is never truly linear with respect to the covariate markers
Genetic covariates can further correct environmental confounding
Framework to be extended to incorporate other important evolutionary processes such as assortative mating and inbreeding
Haplotype dependence induced by such evolutionary processes is likely to have a non-trivial impact on GWAS estimands
Shortcoming of the work is that it only deals with identification and says little about the estimation process
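
The statement above that covariates correct bias by modelling the variant and not the trait has a standard linear-algebra counterpart, the Frisch-Waugh-Lovell theorem. Linking it to this summary is an interpretive assumption and not a claim about the paper's derivation. The sketch below, with hypothetical data-generating values, checks that the coefficient of the tested marker in a regression with covariate markers equals the slope of the trait on the marker residualized against those covariates.

```python
import numpy as np


def fwl_check(n=5000, n_cov=20, seed=0):
    """Coefficient with covariates vs. slope on the residualized marker."""
    rng = np.random.default_rng(seed)
    # Covariate markers, coded 0/1/2.
    Z = rng.binomial(2, 0.4, size=(n, n_cov)).astype(float)
    # Tested marker whose allele frequency depends on the covariates;
    # the frequency stays within [0.1, 0.9] because Z.sum is between 0 and 40.
    freq = 0.1 + 0.02 * Z.sum(axis=1)
    m1 = rng.binomial(2, freq).astype(float)
    gamma = rng.normal(scale=0.1, size=n_cov)
    y = 0.5 * m1 + Z @ gamma + rng.normal(size=n)
    ones = np.ones((n, 1))
    # Full regression: intercept, tested marker, covariate markers.
    X = np.column_stack([ones, m1, Z])
    b_full = np.linalg.lstsq(X, y, rcond=None)[0][1]
    # Model the variant: project m1 on intercept + covariates, keep the residual.
    Zc = np.column_stack([ones, Z])
    m1_res = m1 - Zc @ np.linalg.lstsq(Zc, m1, rcond=None)[0]
    # Regress the untouched trait on the residualized marker.
    b_res = (m1_res @ y) / (m1_res @ m1_res)
    return b_full, b_res


if __name__ == "__main__":
    print(fwl_check())
```

The trait is never adjusted in the second route; only the linear prediction of the tested variant from the covariates matters, which is why imperfect linearity of that prediction leaves residual bias.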