Data Augmentation Strategies For Cervical Histopathology Image Classification
Open Access
- Author:
- Zhou, Qianying
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- March 20, 2020
- Committee Members:
- Sharon Xiaolei Huang, Thesis Advisor/Co-Advisor
Peng Liu, Committee Member
Zihan Zhou, Committee Member
Mary Beth Rosson, Program Head/Chair
- Keywords:
- Data augmentation
Deep learning
Generative Adversarial Network
Cervical Cancer
Histopathology image
- Abstract:
- Microscopic examination of tissue is an important approach to cervical cancer diagnosis. Pathologists grade the precancerous stage by the occurrence of abnormal cells in epithelial tissue samples. The precancerous stage is divided into four categories, from mild to severe: normal, CIN1, CIN2, and CIN3. Examining histopathology slides is time-consuming and subject to inter- and intra-pathologist variability, so an automatic grading system is in demand. However, existing state-of-the-art deep learning models require a large amount of annotated training data, and the data must cover different input types for the models to generalize. Collecting annotated training data is expensive, which limits the application of deep learning to cervical histopathology image classification. To tackle this problem, this thesis studies different data augmentation strategies. Three strategies are investigated: (1) symbolic transformation operations; (2) synthetic augmentation based on a generative adversarial network (GAN); and (3) a novel filtering mechanism for synthetic images, based on the divergence in feature space between synthetic images and real images. Our methods are evaluated on a small cervical histopathology image dataset. Experimental analysis shows that the filtering mechanism improves the quality of synthetic images in feature space. Further, our GAN-based model with the filtering mechanism outperforms both traditional augmentation methods and synthetic augmentation without filtering, improving classification accuracy by 5% over the baseline model. We believe that our method can be generalized to other diseases and computer-aided systems.
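
The idea of filtering synthetic images by their divergence from real images in feature space can be illustrated with a minimal sketch. The abstract does not state the divergence measure, the feature extractor, or the selection rule, so everything below is an assumption made for illustration and not the thesis's implementation: features are taken to be plain numeric vectors, divergence is approximated by Euclidean distance to the centroid of the real features, and the hypothetical `keep_ratio` parameter sets the fraction of synthetic samples retained.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Mean feature vector of a non-empty collection of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_synthetic(
    real_feats: Sequence[Vector],
    synth_feats: Sequence[Vector],
    keep_ratio: float = 0.5,  # hypothetical parameter, not from the thesis
) -> List[int]:
    """Return indices of the synthetic samples closest to the real centroid.

    Assumption: distance to the real-feature centroid stands in for the
    feature-space divergence mentioned in the abstract.
    """
    center = centroid(real_feats)
    # Rank synthetic samples from closest to farthest from the real centroid.
    ranked = sorted(
        range(len(synth_feats)),
        key=lambda i: euclidean(synth_feats[i], center),
    )
    # Keep at least one sample when any synthetic samples are available.
    n_keep = max(1, int(len(synth_feats) * keep_ratio))
    return sorted(ranked[:n_keep])


# Toy 2-D features for one class (illustrative values only).
real = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
synth = [[0.5, 0.6], [5.0, 5.0], [0.4, 0.4], [-3.0, 2.0]]
kept = filter_synthetic(real, synth, keep_ratio=0.5)
print(kept)
```

In a classification setting, a filter of this kind would typically be applied per class, so that synthetic CIN1 images, for example, are compared only against real CIN1 features before being added to the training set.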