Joint Parsimonious Modeling and Model Order Selection for Multivariate Gaussian Mixtures

Open Access
- Author:
- Markley, Scott Conrad
- Graduate Program:
- Electrical Engineering
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- April 10, 2009
- Committee Members:
- David Jonathan Miller, Thesis Advisor/Co-Advisor
- Keywords:
- Expectation Maximization
Model Order Selection
Bayesian Information Criterion
Multivariate Gaussian
- Abstract:
- Multivariate Gaussian mixtures are widely used in science and engineering for density estimation, model-based data clustering, and statistical classification. A difficult problem, of special interest for clustering, is estimating the model order, i.e., the number of mixture components. Full covariance matrices, whose parameter count grows quadratically in the feature dimension, entail high model complexity and thus may lead to order underestimation, while naive Bayes mixtures may introduce model bias and lead to order overestimation. We develop a parsimonious modeling and model order selection method for multivariate Gaussian mixtures which allows for, and optimizes over, parameter-tying configurations across mixture components, applied to each individual parameter, including the covariances. We derive a generalized Expectation-Maximization (GEM) algorithm for (BIC-based) penalized-likelihood optimization. This algorithm, coupled with sequential model order reduction, forms our joint learning and model selection method. Our method searches over a rich space of models with different (data representation, model complexity) tradeoffs and, consistent with minimizing BIC, achieves fine-grained matching of model complexity to the amount of available data. We have found our method to be effective and largely robust in learning accurate model orders and parameter-tying structures for simulated ground-truth mixtures. We also compared against naive Bayes and standard full-covariance Gaussian mixtures on several criteria: i) accuracy in estimating the number of (ground-truth) components; ii) test-set log-likelihood; iii) unsupervised (and semisupervised) classification accuracy; and iv) accuracy when class-conditional mixtures are used in a plug-in Bayes classifier. The results bear out that our parsimonious mixtures and coupled learning give improved accuracy with respect to each of these performance measures.
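
The selection criterion named in the abstract, the Bayesian Information Criterion, scores a fitted mixture as BIC = -2 log L + p log N, where L is the maximized likelihood, p the number of free parameters, and N the sample size; lower is better. The following is a minimal sketch of that kind of BIC-driven order selection, using scikit-learn's GaussianMixture as a stand-in. Note the hedge: sklearn's covariance_type options ("full", "diag", "tied", "spherical") are only a coarse analogue of the per-parameter tying the thesis optimizes inside its GEM loop, and the synthetic data and search ranges below are illustrative assumptions, not the thesis's method or experiments.

```python
# Sketch: BIC-based model order selection for Gaussian mixtures.
# Sweeps both the number of components and a small menu of covariance
# structures, keeping the (order, structure) pair with the lowest BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-component ground truth in 5 dimensions (illustrative only).
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(200, 5)),
    rng.normal(loc=3.0, scale=0.5, size=(200, 5)),
])

best = None
for cov_type in ("full", "diag", "tied", "spherical"):
    for k in range(1, 7):
        gmm = GaussianMixture(n_components=k, covariance_type=cov_type,
                              random_state=0).fit(X)
        bic = gmm.bic(X)  # -2 log-likelihood + (free params) * log(N)
        if best is None or bic < best[0]:
            best = (bic, k, cov_type)

print("selected order k=%d, covariance_type=%s, BIC=%.1f"
      % (best[1], best[2], best[0]))
```

The contrast with the thesis is the granularity of the complexity knob: rather than choosing one covariance structure for the whole mixture from a fixed menu, the thesis ties individual means, variances, and covariances across components and reduces the order sequentially, all within a single penalized-likelihood descent.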