Deep Learning for Visual Data Analysis and Synthesis in Healthcare and Beyond
Restricted (Penn State Only)
- Author:
- Ni, Haomiao
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 10, 2024
- Committee Members:
- Dongwon Lee, Professor in Charge/Director of Graduate Studies
James Wang, Major Field Member
Sharon Huang, Chair & Dissertation Advisor
Fenglong Ma, Major Field Member
Ying Sun, Outside Unit & Field Member
- Keywords:
- Deep Learning
Computer Vision
Healthcare AI
Generative AI
- Abstract:
- Visual data, including images and videos, are indispensable across many applications, and researchers have developed computer vision algorithms and theories for automatic visual comprehension. With recent advancements in AI, deep learning has become ubiquitous in computer vision. Despite its notable achievements, challenges persist in medical image analysis and video understanding. Medical image analysis presents multifaceted complexities, including low data quality, data diversity, the necessity for clinical verification, and interpretability. Moreover, collecting large-scale annotated medical image data is difficult due to strict regulations and the expertise required for annotation. In video understanding, the combination of spatial content and temporal dynamics complicates the design of robust and efficient video modeling frameworks. To enhance the generalizability and efficiency of computer vision systems for processing medical images and videos, in this dissertation we propose several advanced deep-learning-based methods for visual data analysis and synthesis, with applications in healthcare and other diverse domains. First, we present our frameworks for robust medical image analysis and synthesis, including an asymmetry disentanglement network that leverages clinical prior knowledge to achieve interpretable results, and a synthetic augmentation method that mitigates the need for large-scale labeled datasets. Second, we introduce our proposed video-based AI diagnosis systems. Using specially designed model architectures alongside techniques such as semi-supervised learning and adversarial training, these systems can effectively analyze infant movement videos and talking-face videos for diagnosing cerebral palsy and stroke without the need for fully annotated videos. Third, we propose two generative methods based on diffusion models to address the conditional image-to-video generation task. These methods decouple the generation of spatial content and temporal motion and manipulate the reverse process of pretrained diffusion models, enabling computationally efficient, high-quality video modeling. Finally, we introduce our proposed generative models for cross-identity video motion retargeting, including the design of dual branches and the incorporation of 3D head information to better model appearance and maintain motion continuity. We further demonstrate their application in healthcare AI by anonymizing the identities in videos, thereby achieving privacy protection in clinical diagnosis. Through comprehensive experiments on multiple datasets, we validate the effectiveness of our proposed methods in healthcare and other general domains.