McGill Statistics Seminars, November 4, 2011

Friday, November 4, 2011, 15:30, Room 1205, Burnside Hall.

A Bayesian method of parametric inference for diffusion processes
Martin Lysy, Harvard University

Diffusion processes have been used to model a multitude of continuous-time phenomena in engineering and the natural sciences, as well as, in the application considered here, the volatility of financial assets. However, parametric inference has long been complicated by an intractable likelihood function. For many models, the most effective solution involves introducing a large amount of missing data, for which the typical Gibbs sampler can be arbitrarily slow. Joint proposals for the parameters and the missing data, on the other hand, can lead to a radical improvement, but their acceptance rate tends to decrease exponentially with the number of observations.

We consider a novel method that divides the inference process into separate data batches, each small enough to benefit from joint proposals, which are processed consecutively. A filter combines the batch contributions to produce likelihood inference based on the whole dataset. Although the result is not always unbiased, it has very low variability and often achieves considerable accuracy in a short amount of time. We present an example using Heston’s popular option-pricing model, but much of the methodology can be extended beyond diffusions to hidden Markov and other state-space models.
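The missing-data strategy mentioned above can be illustrated with a minimal sketch. It assumes the Cox–Ingersoll–Ross variance process that drives volatility in Heston's model, dV = κ(θ − V) dt + σ√V dW. The intractable transition density is replaced by its Euler (Gaussian) approximation on a fine time grid. The grid points between observations play the role of missing data. The function names, the positivity floor and the parameter values are illustrative assumptions. This is not the speaker's batch-filtering method.

```python
import math
import random


def euler_loglik(path, dt, kappa, theta, sigma):
    """Euler-approximated log-likelihood of a CIR path on a grid of step dt.

    Model: dV = kappa * (theta - V) dt + sigma * sqrt(V) dW.
    In a data-augmentation sampler, `path` would interleave observed values
    with imputed (missing) values between observation times.
    """
    ll = 0.0
    for v_prev, v_next in zip(path[:-1], path[1:]):
        mean = v_prev + kappa * (theta - v_prev) * dt  # Euler drift step
        var = sigma ** 2 * v_prev * dt                 # Euler diffusion variance
        ll -= 0.5 * (math.log(2 * math.pi * var) + (v_next - mean) ** 2 / var)
    return ll


def simulate_cir(v0, n_steps, dt, kappa, theta, sigma, seed=0):
    """Euler simulation of the CIR process, floored at a small positive value."""
    rng = random.Random(seed)
    path = [v0]
    for _ in range(n_steps):
        v = path[-1]
        step = (v + kappa * (theta - v) * dt
                + sigma * math.sqrt(v * dt) * rng.gauss(0.0, 1.0))
        path.append(max(step, 1e-8))  # hypothetical floor keeps V > 0
    return path


path = simulate_cir(0.04, 2000, 0.01, kappa=2.0, theta=0.04, sigma=0.3, seed=1)
# The true sigma should fit the fine-grid path far better than a wrong one.
print(euler_loglik(path, 0.01, 2.0, 0.04, 0.3) > euler_loglik(path, 0.01, 2.0, 0.04, 1.0))
```

A Gibbs sampler would re-evaluate `euler_loglik` each time either the parameters or the imputed points are updated. A finer grid improves the Euler approximation. However, it also strengthens the posterior dependence between the parameters and the missing data, which is one reason the typical Gibbs sampler can become arbitrarily slow.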

McGill Statistics Seminars, November 3, 2011

Thursday, November 3, 2011, 16:00, Room 1205, Burnside Hall.

Maximum likelihood estimation in network models
Alessandro Rinaldo, Carnegie Mellon University

This talk is concerned with maximum likelihood estimation (MLE) in exponential statistical models for networks (random graphs) and, in particular, with the beta model, a simple model for undirected graphs in which the degree sequence is the minimal sufficient statistic. The speaker will present necessary and sufficient conditions for the existence of the MLE of the beta model parameters, based on a geometric object known as the polytope of degree sequences. Using this result, it is possible to characterize combinatorially the sample points that lead to a non-existent MLE, as well as the non-estimability of the probability parameters when the MLE does not exist. The speaker will further indicate some conditions guaranteeing that the MLE exists with probability tending to 1 as the number of nodes increases. Much of this analysis also applies to other well-known network models, such as the Rasch model, the Bradley-Terry model and the more general p1 model of Holland and Leinhardt. These results are in fact instances of rather general geometric properties of exponential families with polyhedral support, which will be illustrated with a simple exponential random graph model.
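In the beta model, each edge {i, j} is present independently with probability exp(β_i + β_j) / (1 + exp(β_i + β_j)). The likelihood equations therefore set each node's expected degree equal to its observed degree d_i. The following minimal sketch solves those equations with a simple fixed-point iteration derived from them. It does not implement the speaker's polytope-based analysis, and the function names are hypothetical. The only existence check it performs is the trivial one: a degree of 0 or n − 1 lies on the boundary of the polytope of degree sequences, so the MLE cannot exist.

```python
import math


def beta_model_mle(degrees, tol=1e-10, max_iter=10_000):
    """Beta-model MLE by fixed-point iteration on the likelihood equations.

    Solves d_i = sum_{j != i} 1 / (1 + exp(-(b_i + b_j))) using the update
    b_i <- log d_i - log sum_{j != i} 1 / (exp(-b_j) + exp(b_i)).
    """
    n = len(degrees)
    # A degree of 0 or n - 1 lies on the boundary of the degree-sequence
    # polytope, so the MLE does not exist (a partial check only).
    if any(d <= 0 or d >= n - 1 for d in degrees):
        raise ValueError("MLE does not exist: some degree equals 0 or n - 1")
    beta = [0.0] * n
    for _ in range(max_iter):
        new = [
            math.log(degrees[i]) - math.log(
                sum(1.0 / (math.exp(-beta[j]) + math.exp(beta[i]))
                    for j in range(n) if j != i))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, beta)) < tol:
            return new
        beta = new
    raise RuntimeError("no convergence; the MLE may not exist")


def expected_degrees(beta):
    """Expected degree of each node under the fitted beta model."""
    n = len(beta)
    return [sum(1.0 / (1.0 + math.exp(-(beta[i] + beta[j])))
                for j in range(n) if j != i) for i in range(n)]


beta = beta_model_mle([2, 2, 2, 2])   # degree sequence of a 4-cycle
print(beta)
print(expected_degrees(beta))
```

For the 4-cycle, symmetry forces every β_i to equal log(2)/2. Each of the three possible edges at a node must then have probability 2/3, which gives an expected degree of 2.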

5 à 7 of the Comité pour l’avancement de la statistique à l’Université Laval, November 10, 2011

CASUL, the Comité pour l’avancement de la statistique de l’Université Laval (Committee for the Advancement of Statistics at Université Laval), invites you to its annual 5 à 7 (after-work social gathering). It will be held on Thursday, November 10, in Room 1240 of the Vachon Building at Université Laval.

This year, David Emond, a master's student in statistics at Université Laval, will talk about his master's project, which has applications in professional hockey. The presentation will be followed by a pizza supper in Room 1039b of the Vachon Building. During the supper you can chat with the presenter and mingle with your colleagues over a few snacks. The cost is $5 if you reserve your spot before Wednesday, November 9, and $7 at the door on the evening of the event. This includes supper and unlimited sides!

For more details, please visit the CASUL website.

Abstract:
Whether we are comparing the performance of our favourite teams or players, statistics often come to our aid as fans of the Canadiens or of any other NHL team. With the possible return of our Nordiques and the return of the Jets, it is now the turn of statistics, the science itself, to contribute to our national sport. In this talk, classification methods will be used to determine whether the current NHL divisions are optimal with respect to the geographic coordinates of the teams' host cities. We will also see how the divisions would need to be reorganized to accommodate the return of Winnipeg and Quebec City to the big league.

Statistics Seminar, Université Laval, October 27, 2011

Thursday, October 27, 2011, 13:30, Room 2512, Adrien-Pouliot Building.

Composite likelihood analysis of a two-level model with data from an informative survey design
François Verret, Statistics Canada

Multilevel models are often used to analyze survey data when the hierarchical structure of the survey design matches that of the model. Rao, Verret and Hidiroglou (2010) proposed a composite likelihood approach for two-level models. It yields estimators of the model parameters that are consistent under both the model and the survey design, provided the number of sampled clusters is sufficiently large. This talk will summarize part of their work. Composite likelihood estimation (Lele and Taper, 2002) will be introduced. The talk will also show ways of adapting this estimation method, as well as classical methods, to the analysis of survey data. Finally, the results of a simulation study comparing these estimators will be presented.
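Pairwise composite likelihood replaces the full likelihood with a sum of bivariate log-densities over pairs of units in the same cluster. The following minimal sketch assumes a random-intercept model y_ij = μ + u_i + e_ij, with u_i ~ N(0, σ_u²) and e_ij ~ N(0, σ_e²). Under this model, any two units in a cluster are bivariate normal with variance σ_u² + σ_e² and covariance σ_u². The optional weights (a cluster weight times the two units' conditional weights) are one illustrative way of accounting for the design. They are an assumption, not necessarily the weighting used by Rao, Verret and Hidiroglou (2010). The usage part fixes μ at the overall mean and maximizes over a coarse grid, which is also only for illustration.

```python
import math
import random
from itertools import combinations


def bvn_logpdf(a, b, mu, var, cov):
    """Log-density of a bivariate normal with common mean mu and variance var."""
    det = var * var - cov * cov
    da, db = a - mu, b - mu
    quad = (var * da * da - 2.0 * cov * da * db + var * db * db) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def pairwise_cl(clusters, mu, su2, se2, cluster_w=None, unit_w=None):
    """(Optionally weighted) pairwise composite log-likelihood.

    clusters: list of lists, clusters[i][j] = y_ij.
    cluster_w[i], unit_w[i][j]: hypothetical design weights (default 1).
    """
    total = 0.0
    for i, ys in enumerate(clusters):
        wi = 1.0 if cluster_w is None else cluster_w[i]
        wu = [1.0] * len(ys) if unit_w is None else unit_w[i]
        for j, k in combinations(range(len(ys)), 2):
            # Units in the same cluster share u_i: covariance su2.
            total += wi * wu[j] * wu[k] * bvn_logpdf(
                ys[j], ys[k], mu, su2 + se2, su2)
    return total


# Simulated data: 300 clusters of 4 units, sigma_u^2 = 1.0, sigma_e^2 = 0.5.
rng = random.Random(11)
clusters = []
for _ in range(300):
    u = rng.gauss(0.0, 1.0)
    clusters.append([2.0 + u + rng.gauss(0.0, math.sqrt(0.5)) for _ in range(4)])

mu_hat = sum(map(sum, clusters)) / sum(map(len, clusters))
grid = [0.25 * k for k in range(1, 9)]
su2_hat, se2_hat = max(((a, b) for a in grid for b in grid),
                       key=lambda p: pairwise_cl(clusters, mu_hat, *p))
print(su2_hat, se2_hat)
```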

Joint work with Jon N.K. Rao (Carleton University) and Michel A. Hidiroglou (Statistics Canada).

References
Lele, S. & Taper, M.L. (2002). A composite likelihood approach to (co)variance components estimation. Journal of Statistical Planning and Inference, 109, 117-135.
Rao, J.N.K., Verret, F. & Hidiroglou, M.A. (2010). A weighted estimating equations approach to inference for two-level models from survey data. Proceedings of the Survey Methods Section, SSC Annual Meeting 2010.

McGill Statistics Seminars, October 28, 2011

Friday, October 28, 2011, 15:00, Room 1205, Burnside Hall.

Simulated method of moments estimation for copula-based multivariate models
Andrew J. Patton, Duke University

This paper considers the estimation of the parameters of a copula via a simulated method of moments type approach. This approach is attractive when the likelihood of the copula model is not known in closed form. It is also attractive when the researcher has a set of dependence measures or other functionals of the copula, such as pricing errors, that are of particular interest. The proposed approach also naturally nests method of moments and generalized method of moments estimators. We combine existing results on simulation-based estimation with recent results from empirical copula process theory. This lets us show the consistency and asymptotic normality of the proposed estimator and obtain a simple test of over-identifying restrictions as a goodness-of-fit test. The results apply to both iid and time series data. We analyze the finite-sample behavior of these estimators in an extensive simulation study. We apply the model to the returns of seven financial stocks. We find evidence of statistically significant tail dependence, and that the dependence between these assets is stronger in crashes than in booms.
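The simulated method of moments idea can be sketched for the simplest possible case. The sketch assumes a one-parameter Gaussian copula, a single dependence measure (Spearman's rho) as the moment, an identity weight matrix and a grid search. Because Spearman's rho depends only on the copula, the margins of the data play no role. All function names and tuning values are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def ranks(x):
    """Ranks 1..n; ties are not handled (continuous data assumed)."""
    order = sorted(range(len(x)), key=x.__getitem__)
    r = [0] * len(x)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman_rho(x, y):
    """Spearman's rank correlation via the no-ties formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def smm_gaussian_copula(x, y, n_sim=2000, seed=123):
    """Just-identified SMM estimate of a Gaussian-copula correlation.

    The same standard normal draws are reused for every candidate rho
    (common random numbers), so the simulated moment changes with rho
    without fresh simulation noise at each grid point.
    """
    m_data = spearman_rho(x, y)
    rng = random.Random(seed)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n_sim)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(n_sim)]

    def loss(rho):
        s = math.sqrt(1.0 - rho * rho)
        w = [rho * a + s * b for a, b in zip(z1, z2)]
        return (m_data - spearman_rho(z1, w)) ** 2  # identity weight matrix

    grid = [k / 100 for k in range(-99, 100)]
    return min(grid, key=loss)


# Data from a Gaussian copula with rho = 0.5 and non-normal margins.
rng = random.Random(7)
a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
b = [0.5 * ai + math.sqrt(0.75) * rng.gauss(0.0, 1.0) for ai in a]
x = [math.exp(v) for v in a]   # lognormal margin
y = [v ** 3 for v in b]        # monotone transform: copula unchanged
est = smm_gaussian_copula(x, y)
print(est)
```

For the Gaussian copula, Spearman's rho has the closed form (6/π) arcsin(ρ/2), so simulation is not strictly needed here. The sketch only shows the mechanics, which carry over to copulas whose moments have no closed form. Adding further dependence measures would make the problem over-identified.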

Finance Seminar, HEC Montréal, October 27, 2011

Thursday, October 27, 2011, 15:00, Salle Sony, HEC Montréal, main building, 3000 chemin de la Côte-Sainte-Catherine, Montréal.

Modelling Dependence in High Dimensions with Factor Copulas
Andrew J. Patton, Duke University – Department of Economics

Statistics Seminar, Université Laval, October 20, 2011

Thursday, October 20, 2011, 13:30, Room 2512, Adrien-Pouliot Building.

Forest inventory in Quebec: estimating timber volumes at the forest-stand level with the k-NN approach
Bastien Ferland-Raymond, Ministère des ressources naturelles et de la faune du Québec

The ecoforest inventory is carried out periodically in the southern part of Quebec to acquire and disseminate knowledge about the various forest ecosystems. The inventory process has several steps. First, the territory is photographed and mapped in order to delineate and describe the different ecoforest stands. Next, sample plots are established in the field to measure the various attributes of the forest. Finally, the map and inventory data are compiled to produce an estimate of timber volumes by species for all ecoforest stands. Over the last decade, this compilation was produced with in-house software (SCIF) based on stratified random sampling. This compilation procedure was complex and subjective, and it required a large number of field plots to achieve adequate precision. A new compilation technique based on k-NN (k-nearest neighbours) was therefore developed to overcome these difficulties. This statistical method selects sample plots for each stand on the ecoforest map using a similarity analysis between the surveyed stands and the other stands. Similarity is quantified from explanatory variables available for every stand on the ecoforest map. The selected plots are then used to estimate the variables of interest (e.g., volumes by species) at the ecoforest-stand level. The results obtained with the k-NN approach compare favourably with those of the former SCIF compilation approach.
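The k-NN compilation step can be sketched as follows. Each target stand is matched to the k sample plots whose explanatory variables are closest, and the stand-level estimate is the mean of their measured values. Euclidean distance on standardized variables, an unweighted mean and the value of k are all assumptions. The abstract does not specify the ministry's distance metric, weighting or k, and the feature values below are hypothetical.

```python
import math


def knn_estimate(stand_features, plot_features, plot_values, k=3):
    """k-NN estimate of a stand-level variable (e.g. volume per hectare).

    stand_features: explanatory variables of the target stand (assumed
    standardized so that Euclidean distance is meaningful).
    plot_features / plot_values: variables and field measurements of the
    surveyed stands' sample plots.
    """
    nearest = sorted(
        zip(plot_features, plot_values),
        key=lambda fv: math.dist(stand_features, fv[0]),
    )[:k]
    return sum(v for _, v in nearest) / len(nearest)


# Hypothetical, standardized features (e.g. stand height, crown density).
plots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
volumes = [100.0, 110.0, 120.0, 300.0]   # hypothetical m^3/ha per plot
print(knn_estimate((0.1, 0.1), plots, volumes, k=3))  # 110.0
```

Only variables available for every mapped stand enter the distance. The estimate can therefore be produced for unsurveyed stands, which is what lets the method cover the whole ecoforest map.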

McGill Statistics Seminars, October 21, 2011

Friday, October 21, 2011, 15:30, Room 1205, Burnside Hall.

Bayesian modelling of GWAS data using linear mixed models
William Astle, McGill University

Genome-wide association studies (GWAS) are used to identify physical positions (loci) on the genome where genetic variation is causally associated with a phenotype of interest at the population level. Typical studies are based on the measurement of several hundred thousand single nucleotide polymorphism (SNP) variants spread across the genome, in a few thousand individuals. The resulting datasets are large and require computationally efficient methods of statistical analysis.

Linear mixed models with two variance components have recently been proposed for analyzing GWAS data. They can control for the confounding effects of population stratification by modelling the correlation between study subjects induced by relatedness. Unfortunately, standard methods for fitting linear mixed models are computationally intensive. Computing the likelihood requires inverting a large matrix that is a function of the model parameters. I will describe a fast method for calculating the likelihood of a linear mixed model with two variance components. It allows a large GWAS dataset to be analyzed with mixed models by Bayesian inference. A Bayesian analysis of GWAS provides a natural way of overcoming the so-called "multiple-testing" problem, which arises from the large dimension of the predictor variable space. In the Bayesian framework, we should have low prior belief that any particular genetic variant explains a large proportion of the phenotypic variation. The normal-exponential-gamma prior has been proposed as a good representation of such belief. I will describe an efficient MCMC algorithm that makes it possible to incorporate this prior into the model.
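One standard way to avoid re-inverting the covariance matrix for every parameter value is to diagonalize the relatedness (kinship) matrix once. This technique appears, for example, in EMMA- and FaST-LMM-style methods. The abstract does not say whether it matches the speaker's method. If V = σ_g² K + σ_e² I and K = U diag(s) U^T, then V = U diag(σ_g² s + σ_e²) U^T. After rotating y and X by U^T, each likelihood evaluation therefore needs only O(np) work. The sketch below assumes NumPy is available and that a kinship matrix K has been precomputed. The function name and the simulated inputs are hypothetical.

```python
import numpy as np


def make_loglik(y, X, K):
    """Fast log-likelihood for y ~ N(X beta, sg2 * K + se2 * I).

    A single eigendecomposition K = U diag(s) U^T is computed up front; each
    later evaluation costs O(n p) rather than a fresh O(n^3) factorization.
    """
    s, U = np.linalg.eigh(K)       # K assumed symmetric positive semi-definite
    y_rot = U.T @ y
    X_rot = U.T @ X
    n = y.shape[0]

    def loglik(beta, sg2, se2):
        d = sg2 * s + se2          # eigenvalues of V = sg2 * K + se2 * I
        r = y_rot - X_rot @ beta   # rotated residuals U^T (y - X beta)
        return -0.5 * (n * np.log(2.0 * np.pi) + np.sum(np.log(d)) + np.sum(r ** 2 / d))

    return loglik


# Hypothetical example: a kinship-like matrix from simulated standardized genotypes.
rng = np.random.default_rng(0)
n, m = 50, 200
G = rng.standard_normal((n, m))
K = G @ G.T / m
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = rng.standard_normal(n)
loglik = make_loglik(y, X, K)
print(loglik(np.array([0.3, -0.2]), 0.7, 0.5))
```

Inside an MCMC sampler, only the cheap inner function is called at each iteration. The O(n³) decomposition is paid once per dataset.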

Actuarial Science Seminar, October 20, 2011, Université Laval

Thursday, October 20, 2011, 15:30–16:30, Room COP-1168 (auditorium, Pavillon photonique), Université Laval.

Multivariate Integer Autoregressive Models
Dimitris Karlis, Athens University of Economics and Business

Non-negative integer-valued time series are encountered in many scientific fields, usually in the form of counts of events at consecutive time points. Representative examples can be found in epidemiology, ecology, finance and elsewhere. Because such data occur so frequently, a wide variety of models for count time series have been proposed in the literature. The vast majority of these models consider the univariate case, since the analysis of multivariate counting processes presents many more difficulties. In particular, the need to account for both serial and cross-correlation complicates model specification, estimation and inference. Many of the models built for count time series are based on the thinning operator of Steutel and van Harn (1979). The model in its simplest form, i.e., the first-order integer-valued autoregressive model (INAR(1)), was introduced by McKenzie (1985) and Al-Osh and Alzaid (1987).
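Binomial thinning α ∘ X replaces the scalar product of a real-valued AR(1) model with the sum of X independent Bernoulli(α) variables. As a result, X_t = α ∘ X_{t−1} + ε_t stays integer-valued. The following minimal univariate sketch assumes Poisson(λ) innovations, for which the stationary distribution is Poisson(λ/(1 − α)). It simulates an INAR(1) series and recovers α and λ with Yule-Walker moment estimates. It does not cover the MINAR(1) extension discussed in the talk, and the function names are hypothetical.

```python
import math
import random


def thin(alpha, x, rng):
    """Binomial thinning alpha o x: successes in x Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar1(alpha, lam, n, seed=0):
    """X_t = alpha o X_{t-1} + eps_t with Poisson(lam) innovations."""
    rng = random.Random(seed)
    x = [poisson(lam / (1.0 - alpha), rng)]  # X_0 from the stationary law
    for _ in range(n - 1):
        x.append(thin(alpha, x[-1], rng) + poisson(lam, rng))
    return x


def yule_walker_inar1(x):
    """alpha_hat = lag-1 sample autocorrelation; lam_hat = mean * (1 - alpha_hat)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    c1 = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1)) / n
    alpha_hat = c1 / c0
    return alpha_hat, m * (1.0 - alpha_hat)


series = simulate_inar1(alpha=0.5, lam=2.0, n=5000, seed=1)
print(yule_walker_inar1(series))
```

The λ estimate uses the INAR(1) mean relation E[X_t] = λ/(1 − α), which holds for any innovation distribution with mean λ.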

In this talk, extensions to the multidimensional setting will be discussed. We define a multivariate integer-valued autoregressive process of order 1 (MINAR(1)) and examine its basic statistical properties. To aid the exposition, special attention will be given to the bivariate case. The multivariate case poses certain challenges, especially as far as estimation is concerned. Such estimation problems do not arise in the bivariate case, where estimation can be achieved using either maximum likelihood or the Yule-Walker method. Extensions that incorporate covariate information are also discussed, with emphasis on models with multivariate Poisson and multivariate negative binomial innovations. Real data problems are used to illustrate the model.

CRM-ISM-GERAD Colloquium, October 14, 2011, McGill

Friday, October 14, 2011, McGill University, Trottier Building, 3630 rue University, Room TROTTIER 1080

14:00–15:00:
Modeling non-stationary extremes: The case of heat waves
Debbie Dupuis, HEC Montréal

Environmental processes are often non-stationary, since climate patterns cause systematic seasonal effects and long-term climate changes cause trends. The usual limit models are not applicable to non-stationary processes. However, models from standard extreme value theory can be used along with statistical modeling to provide useful inference. Traditional approaches include letting model parameters be functions of covariates or using time-varying thresholds. These approaches are inadequate for the study of heat waves. We show how a recent pre-processing approach by Eastoe and Tawn (2009) can be used in conjunction with an innovative change-point analysis to model daily maximum temperature. The model is then fitted to data from four U.S. cities and used to estimate the recurrence probabilities of runs over seasonally high temperatures. We show that the probability of long and intense heat waves has increased considerably over 50 years.
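The talk's model-based approach is not reproduced here. Instead, the following minimal sketch computes the empirical quantity being modelled: the frequency of runs of consecutive days above a seasonally varying threshold. It assumes that daily maxima for each season and a day-specific threshold series are available. The threshold could be, for example, a high seasonal quantile, but that choice is an assumption. The function names and data are hypothetical.

```python
def exceedance_runs(temps, thresholds):
    """Lengths of runs of consecutive days with temps[t] > thresholds[t]."""
    runs, current = [], 0
    for temp, thr in zip(temps, thresholds):
        if temp > thr:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:            # close a run that lasts until the last day
        runs.append(current)
    return runs


def heat_wave_frequency(seasons, thresholds, min_length):
    """Fraction of seasons with at least one run of >= min_length hot days."""
    hits = sum(
        any(r >= min_length for r in exceedance_runs(season, thresholds))
        for season in seasons
    )
    return hits / len(seasons)


# Hypothetical daily maxima (deg C) for two short seasons and a flat threshold.
seasons = [[30, 31, 25, 32, 33, 34, 20], [20, 29, 21, 22, 23, 24, 25]]
thresholds = [28] * 7
print(exceedance_runs(seasons[0], thresholds))       # [2, 3]
print(heat_wave_frequency(seasons, thresholds, 3))   # 0.5
```

Such raw frequencies are noisy for long, rare runs. This is where a fitted extreme-value model, such as the one described in the talk, is needed to estimate recurrence probabilities.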

15:30–16:30:
Estimating Extremal Dependence in Time Series via the Extremogram
Richard A. Davis (Columbia University)

The extremogram is a flexible quantitative tool that measures various types of extremal dependence in a stationary time series. In many respects, the extremogram can be viewed as an extreme-value analogue of the autocorrelation function (ACF) of a time series. Under mixing conditions, the asymptotic normality of the empirical extremogram was derived in Davis and Mikosch (2009). Unfortunately, the limiting variance is a difficult quantity to estimate. Instead, we apply the stationary bootstrap to the empirical extremogram. We establish that this resampling procedure provides an asymptotically correct approximation to the central limit theorem. This in turn can be used to construct credible confidence bounds for the sample extremogram. The use of the stationary bootstrap for the extremogram is illustrated on a variety of real and simulated data sets. The cross-extremogram measures cross-sectional extremal dependence in multivariate time series. A measure of this dependence, especially left-tail dependence, is of great importance in the calculation of portfolio risk. We find that extremal dependence remains after devolatilizing the marginal series. This suggests that the extremal dependence is not due solely to the heteroskedasticity in the stock returns process. However, for the univariate series, the filtering removes all extremal dependence. Following Geman and Chang (2010), we compute a return time extremogram, which measures the waiting time between rare or extreme events in univariate and bivariate stationary time series. The return time extremogram suggests the existence of extremal clustering in the return times of extreme events for financial assets. The stationary bootstrap again provides an asymptotically correct approximation to the central limit theorem. It can be used to construct credible confidence bounds for this return time extremogram. (This is joint work with Thomas Mikosch and Ivor Cribben.)
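Take A = B = (u, ∞), with u a high threshold. The sample extremogram at lag h is then the fraction of exceedances at time t that are followed by an exceedance at time t + h. It is the extreme-value analogue of a lag-h autocorrelation. The following minimal sketch computes it and resamples the series with the Politis–Romano stationary bootstrap. In that bootstrap, blocks start at random positions and have geometrically distributed lengths with mean 1/p. The sketch assumes a fixed sample-quantile threshold and simple percentile bands, which simplify the bootstrap theory in the talk. The function names, p and the number of replicates are hypothetical choices.

```python
import random


def empirical_extremogram(x, u, max_lag):
    """Sample extremogram with A = B = (u, inf) for lags 1..max_lag.

    rho_hat(h) = #{t : x_t > u and x_{t+h} > u} / #{t : x_t > u}.
    """
    exceed = [v > u for v in x]
    denom = sum(exceed)
    if denom == 0:
        raise ValueError("no exceedances of the threshold u")
    n = len(x)
    return [
        sum(1 for t in range(n - h) if exceed[t] and exceed[t + h]) / denom
        for h in range(1, max_lag + 1)
    ]


def stationary_bootstrap(x, p, rng):
    """One stationary-bootstrap resample of x (mean block length 1/p)."""
    n = len(x)
    out = []
    i = rng.randrange(n)
    while len(out) < n:
        out.append(x[i])
        # With probability p start a new block at a random index; otherwise
        # continue the current block, wrapping around the end of the series.
        i = rng.randrange(n) if rng.random() < p else (i + 1) % n
    return out


def bootstrap_bands(x, u, max_lag, p=0.05, n_boot=200, level=0.95, seed=0):
    """Percentile bands for the extremogram from stationary-bootstrap replicates."""
    rng = random.Random(seed)
    reps = [empirical_extremogram(stationary_bootstrap(x, p, rng), u, max_lag)
            for _ in range(n_boot)]
    lo_i = int((1 - level) / 2 * (n_boot - 1))
    hi_i = int((1 + level) / 2 * (n_boot - 1))
    bands = []
    for h in range(max_lag):
        vals = sorted(r[h] for r in reps)
        bands.append((vals[lo_i], vals[hi_i]))
    return bands


rng = random.Random(3)
series = [rng.gauss(0.0, 1.0) for _ in range(1000)]
u = sorted(series)[949]               # roughly the 95% sample quantile
print(empirical_extremogram(series, u, 3))
print(bootstrap_bands(series, u, 3, n_boot=100))
```

For i.i.d. data such as this simulated series, the extremogram at every lag should hover near the exceedance rate (about 0.05). Clustered extremes, as in financial returns, would push the early lags well above it.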




2136 chemin Sainte-Foy, Suite 200
Québec (Québec)
G1V 1R8

Email: assq@association-assq.qc.ca