McGill Statistics Seminars, October 21, 2011

October 17, 2011 at 15:19
Jean-Francois Plante

Friday, October 21, 2011, 15:30, room 1205, Burnside Hall.

Bayesian modelling of GWAS data using linear mixed models
William Astle, McGill University

Genome-wide association studies (GWAS) are used to identify physical positions (loci) on the genome where genetic variation is causally associated with a phenotype of interest at the population level. Typical studies are based on the measurement of several hundred thousand single nucleotide polymorphism (SNP) variants spread across the genome, in a few thousand individuals. The resulting datasets are large and require computationally efficient methods of statistical analysis.

Two-variance-component linear mixed models have recently been proposed as a method of analysis for GWAS data that can control for the confounding effects of population stratification by modelling the correlation between study subjects induced by relatedness. Unfortunately, standard methods for fitting linear mixed models are computationally intensive, because computing the likelihood requires inverting a large matrix that is a function of the model parameters. I will describe a fast method for calculating the likelihood of a two-variance-component linear mixed model, which allows a large GWAS dataset to be analysed with mixed models by Bayesian inference. A Bayesian analysis of GWAS data provides a natural way of overcoming the so-called "multiple-testing" problem, which arises from the large dimension of the predictor variable space. In the Bayesian framework we should have low prior belief that any particular genetic variant explains a large proportion of the phenotypic variation. The normal-exponential-gamma prior has been proposed as a good representation of such belief, and I will describe an efficient MCMC algorithm that allows this prior to be incorporated into the modelling.
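
To illustrate why the likelihood need not require a fresh matrix inversion at every parameter value, here is a minimal sketch of one widely used approach, which is not necessarily the method presented in the talk. It assumes the model y ~ N(X beta, sigma_g2 * K + sigma_e2 * I), where K is a symmetric relatedness matrix; the names `make_loglik`, `naive_loglik`, `K`, `sigma_g2` and `sigma_e2` are hypothetical and chosen for this example only. Because K and I share eigenvectors, a single eigendecomposition of K makes the covariance diagonal for every choice of the two variance components.

```python
import numpy as np

def make_loglik(y, X, K):
    """Decompose K once, then return a cheap log-likelihood function.

    Assumed model (illustrative): y ~ N(X beta, sigma_g2 * K + sigma_e2 * I).
    """
    lam, U = np.linalg.eigh(K)   # K = U diag(lam) U^T, computed a single time
    y_rot = U.T @ y              # rotate the data into the eigenbasis of K
    X_rot = U.T @ X
    n = y.shape[0]

    def loglik(beta, sigma_g2, sigma_e2):
        # In the rotated basis the covariance is diagonal with these entries.
        d = sigma_g2 * lam + sigma_e2
        r = y_rot - X_rot @ beta
        return -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(d)) + np.sum(r**2 / d))

    return loglik

def naive_loglik(y, X, K, beta, sigma_g2, sigma_e2):
    """Reference version that factorises the full n x n covariance on every call."""
    n = y.shape[0]
    V = sigma_g2 * K + sigma_e2 * np.eye(n)
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))
```

After the one-off decomposition, each likelihood evaluation in this sketch costs time linear in the number of individuals (for a fixed number of covariates), which is what makes repeated evaluation inside an MCMC sampler practical.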