An Investigation of Wavelet Features for Automated Biomedical Image Classification

Open Access
Fry, Jonathan Richard
Graduate Program:
Electrical Engineering
Master of Science
Document Type:
Master Thesis
Date of Defense:
November 26, 2012
Committee Members:
  • Yanxi Liu, Thesis Advisor
  • Vishal Monga, Thesis Advisor
Keywords:
  • wavelet
  • feature space
  • classification
  • machine learning
  • pattern recognition
  • image analysis
  • feature selection
In this thesis, we present a systematic investigation of the wavelet feature space for automated biological and biomedical image classification. This thesis addresses the lack of generalizability in past research regarding the parameterization of the wavelet feature space. Specifically, we aim to identify trends in the four parameters of wavelet feature extraction: the wavelet basis function, the number of levels of decomposition, the specific detail spaces from which statistical features are calculated, and the types of statistical features. Identifying these trends is crucial for the design of complex, automated imagery analysis systems. This research ultimately impacts many areas, especially biomedical research and medical diagnostics with regard to high-throughput imagery analysis. We have experimented on a wide variety of publicly available 2D microscope imagery datasets, representative of different modalities of microscope imaging, including phase contrast, fluorescence, brightfield, and differential interference contrast (DIC) microscopy. These datasets represent many common classification tasks, such as identification of subcellular organelles; age, gender, and diet classification; cancer type classification; and genotype-from-phenotype classification in gene knockout cells. We have also experimented on MR imagery of the human brain for Alzheimer’s disease to contrast the 2D and 3D imaging modalities. This presents the currently challenging classification problem of differentiating normal individuals from individuals with mild cognitive impairment and Alzheimer’s disease. In total, we have selected 10 datasets comprising more than 5000 images, allowing us to use 13 independent multiclass classification experiments to explore the wavelet feature space. To achieve our research goal, we have created a feature extraction pipeline that sweeps across the four critical wavelet feature parameters.
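As an illustration of the four parameters named above, the following is a minimal sketch of wavelet feature extraction, not the thesis pipeline itself. It assumes a fixed Haar basis implemented by hand (sweeping over other basis functions would need a wavelet library such as PyWavelets), and the function names, subband labels ("H", "V", "D"), histogram bin count, and the exact definitions of kurtosis, entropy, and energy are all assumptions made for this example.

```python
import math


def haar_level(img):
    """One level of the orthonormal 2D Haar transform on a list-of-lists image.

    Returns the half-size approximation and a dict of three detail subbands.
    The labels "H", "V", "D" are a naming convention assumed for this sketch.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even at every level")
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = img[r][c], img[r][c + 1]
            bl, br = img[r + 1][c], img[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2.0)  # local average
            h_row.append((tl + tr - bl - br) / 2.0)  # top row minus bottom row
            v_row.append((tl - tr + bl - br) / 2.0)  # left column minus right column
            d_row.append((tl - tr - bl + br) / 2.0)  # diagonal difference
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, {"H": horiz, "V": vert, "D": diag}


def subband_stats(band, bins=16):
    """Variance, kurtosis, histogram entropy, and mean energy of one subband.

    These definitions are assumptions; the thesis may define them differently.
    """
    vals = [v for row in band for v in row]
    n = len(vals)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    m4 = sum((v - mean) ** 4 for v in vals) / n
    kurt = m4 / var ** 2 if var > 0 else 0.0
    energy = sum(v * v for v in vals) / n
    lo, hi = min(vals), max(vals)
    if hi == lo:
        entropy = 0.0  # a constant subband carries no histogram information
    else:
        counts = [0] * bins
        for v in vals:
            idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
            counts[idx] += 1
        entropy = -sum((k / n) * math.log2(k / n) for k in counts if k)
    return {"variance": var, "kurtosis": kurt, "entropy": entropy, "energy": energy}


def wavelet_features(img, levels=2, subbands=("H", "V", "D"),
                     stats=("variance", "kurtosis", "entropy", "energy")):
    """Build a feature dict keyed by (level, subband, statistic)."""
    features = {}
    approx = img
    for level in range(1, levels + 1):
        approx, details = haar_level(approx)
        for name in subbands:
            values = subband_stats(details[name])
            for stat in stats:
                features[(level, name, stat)] = values[stat]
    return features
```

Calling `wavelet_features(image, levels=3, subbands=("H", "D"), stats=("variance",))` would restrict the representation to one statistic on two detail subbands over three levels, which is the kind of parameter choice a sweep would enumerate.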
Each dataset is converted into a wavelet feature representation and evaluated using a feature ranking, feature subset selection, and classification pipeline. The classification performance and feature subset selection rates govern our evaluation of the wavelet feature parameter space. This work has identified several key results regarding the parameterization of wavelet feature extraction. First, many imagery classification problems can favor one or two wavelet basis functions. It follows that the selection of a wavelet basis function is both dataset-specific and task-specific within the same dataset, and there is no “magic bullet” wavelet basis as has been suggested in previous research. We have identified four univariate statistics (variance, kurtosis, entropy, and channel energy) that are relevant across many different classification problems with many combinations of other parameters. We have noted a “clustering” effect in the selection of gray level co-occurrence matrix (GLCM) statistics: a tendency for a particular classification experiment to favor all orientations of a particular GLCM statistic. We have noted that the highest and lowest levels of wavelet decomposition are significant in terms of selection frequency. We have significantly improved performance on image classification problems involving common model organisms, such as mice (95% for age classification and 98% for gender classification, as compared to 51% and 69% in the literature) and the C. elegans worm (78.5% for age classification, as compared to 60% in the literature). We have identified sequential forward feature selection as an excellent feature space compression method for wavelet features, achieving feature space compression ratios as high as 10⁻⁴. Finally, we have identified future directions for further investigation into the wavelet feature space, most notably search methods for the wavelet parameter space.
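Sequential forward feature selection, mentioned above, can be sketched as a greedy loop that repeatedly adds the single feature that most improves a scoring criterion. The abstract does not state which criterion or classifier was used, so this example assumes leave-one-out accuracy of a 1-nearest-neighbour classifier as a stand-in; the function names and the stopping rule (stop when no candidate improves the score) are likewise assumptions.

```python
def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on chosen columns.

    Used here only as an assumed stand-in for the selection criterion.
    """
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, score=loo_1nn_accuracy, max_features=None):
    """Greedily grow a feature subset until the score stops improving."""
    n_features = len(X[0])
    limit = max_features or n_features
    selected, best_score = [], 0.0
    while len(selected) < limit:
        candidates = [c for c in range(n_features) if c not in selected]
        scored = [(score(X, y, selected + [c]), c) for c in candidates]
        # Highest score wins; ties go to the lowest column index.
        top_score, top_col = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best_score:
            break  # no remaining candidate improves the subset
        selected.append(top_col)
        best_score = top_score
    return selected, best_score
```

In this sketch the compression ratio would be `len(selected) / len(X[0])`, the fraction of the original features retained by the selected subset.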