MissData 2015

AGROCAMPUS OUEST
Rennes, France
June 18-19, 2015

Conference

The conference runs for two days, June 18-19. Posters will be on display throughout both days, with a special animated poster session on the evening of June 18 during the reception. Pre-conference tutorials take place on June 17.

Download the booklet containing the complete program and all the abstracts

Tutorials

14h00

Handling missing data in R with MICE

Gerko Vink and Stef van Buuren

Multiple imputation (Rubin 1987, 1996) is a recommended method for complex incomplete data problems. Two general approaches for imputing multivariate data have emerged: joint modeling (JM) and fully conditional specification (FCS) (van Buuren 2007). Multivariate Imputation by Chained Equations (MICE) is software that imputes incomplete multivariate data by FCS, and it is the subject of this tutorial.

Topics will include: concise theory on multiple imputation - a description of how the algorithm in MICE works - specification of the imputation model - sensitivity analysis under MNAR - interacting with other software

Prerequisites: elementary knowledge of general statistical concepts and (linear) statistical models is assumed. Basic R programming skills are also useful.

16h00-16h30

Coffee break

16h30

Model-based clustering/imputation with missing/binned/mixed data using the new software MixComp

Christophe Biernacki - slides

The "Big Data" paradigm involves large and complex data sets. Complexity includes both variety (mixed data: continuous and/or categorical and/or ordinal and/or functional...) and missing, or partially missing (binned), items. Clustering is a suitable response to volume, but it must also cope with complexity, especially since growing volume tends to bring growing complexity.

Model-based clustering has demonstrated many theoretical and practical successes (McLachlan 2000), including for multivariate mixed data with (Biernacki 2013) or without (Marbac et al. 2014) the conditional independence assumption. In addition, this fully generative design makes it straightforward to handle missing or binned data (McLachlan 2000; Biernacki 2007). Model estimation can also be performed by simple EM-like algorithms, such as SEM (Celeux and Diebolt 1985).

MixComp is new R software, written in C++, that implements model-based clustering for multivariate missing/binned/mixed data under the conditional independence assumption (Goodman 1974). The data types currently implemented are continuous (Gaussian), categorical (multinomial) and integer (Poisson). However, the architecture of MixComp is designed for the incremental addition of new kinds of data (ordinal, ranks, functional...) and related models.

MixComp is not yet freely available as an R package, but it will soon be freely available through a dedicated web interface. Beyond clustering, it can also impute missing/binned data (with associated confidence intervals) by exploiting the mixture model's ability to estimate densities.

Topics will include: mixture models - conditional independence - SEM algorithm - model selection criteria

Prerequisites: elementary knowledge of general statistical concepts, mixture models, the EM algorithm and standard model selection criteria is assumed. Basic R programming skills are also useful.