Spatial, Temporal, and Morphological Perspectives: Advancing Understanding of Visual Data Depicting Humans
Open Access
- Author:
- Wu, Chenyan
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- October 05, 2023
- Committee Members:
- James Wang, Chair & Dissertation Advisor
- Sharon Huang, Major Field Member
- Alison Gernand, Outside Unit & Field Member
- Jeffrey Bardzell, Program Head/Chair
- C. Lee Giles, Major Field Member
- Keywords:
- Computer Vision
- Deep Learning
- Bodily Expressed Emotion Understanding
- Human Pose Estimation
- Human Orientation Estimation
- Placenta
- Abstract:
- Artificial intelligence (AI) has undergone a significant transformation over the past decade, influencing a multitude of sectors and reshaping our industrial, economic, and societal frameworks. One standout application in this evolution is ChatGPT, which emerged from the field of Natural Language Processing (NLP). This technology has been successfully integrated into programming assistance, education, brainstorming, and more, notably enhancing workforce efficiency. Concurrently, several promising computer vision (CV) applications, including autonomous driving, intelligent household robots, and AI medical diagnostics, are still in their developmental stages, with aspirations to reach milestones analogous to those ChatGPT has accomplished. Each of these three CV applications requires collaborative interaction with humans. Thus, for these systems to gain widespread adoption, it is crucial that they deeply understand visual data depicting humans. This dissertation is dedicated to the analysis of such data, exploring it through three distinct perspectives: spatial and temporal, for the human body, and morphological, for organs.

The human body can be conceptualized as a geometric entity in 3-D space. This dissertation begins by examining a primary spatial attribute of the human body: its orientation relative to the camera's perspective. By building the most comprehensive human orientation dataset to date and developing a robust neural network, we achieve strong results in estimating human orientation. We then push the spatial representation of human bodies further, aiming to reconstruct every human mesh within a single image. Instead of conventional methods that depend on learned image features, we construct coherent multi-human meshes using only multi-human 2-D poses as input, processed through a single graph neural network. Surprisingly, this simple network, despite its minimal input information, performs comparably to, or even better than, previous image-based approaches across various benchmarks. These results indicate significant potential for future image-based approaches.

Additionally, we investigate the human body from a temporal perspective. As human bodies move over time, static human images evolve into videos depicting human motion. To study human motion, we construct a highly precise video dataset focusing on human motor elements and propose a Transformer network to represent these elements. Our findings further demonstrate that features derived from human motion can significantly improve Bodily Expressed Emotion Understanding (BEEU), setting a new state of the art in BEEU.

The spatial and temporal characteristics above do not account for human organs' detailed appearances and textures. This detailed visual information differentiates individuals and plays a vital role in AI medical diagnosis. Lastly, this dissertation employs segmentation methodologies to analyze the morphological characteristics of a specific human organ, the placenta. In sum, our research highlights the profound potential of AI in understanding visual data depicting humans, paving the way for innovative applications and enhanced human-machine collaboration.
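The abstract's idea of processing multi-human 2-D poses with a single graph neural network can be illustrated with a minimal sketch. This is not the dissertation's actual model: the 5-joint skeleton, feature width, and single graph-convolution layer below are all illustrative assumptions, chosen only to show how multiple people's keypoints can form one block-diagonal graph.

```python
import numpy as np

# Toy 5-joint skeleton per person (hypothetical): head-neck, neck-left hand,
# neck-right hand, neck-hip.
SKELETON_EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5

def build_adjacency(num_people):
    """Block-diagonal adjacency: one skeleton graph per person, plus self-loops."""
    n = num_people * NUM_JOINTS
    a = np.eye(n)
    for p in range(num_people):
        off = p * NUM_JOINTS
        for i, j in SKELETON_EDGES:
            a[off + i, off + j] = a[off + j, off + i] = 1.0
    # Symmetric normalization D^{-1/2} A D^{-1/2}, as in standard graph convolutions.
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(x, adj, weight):
    """One graph-convolution layer: aggregate neighbor features, project, ReLU."""
    return np.maximum(adj @ x @ weight, 0.0)

rng = np.random.default_rng(0)
num_people = 3
poses_2d = rng.normal(size=(num_people * NUM_JOINTS, 2))  # (x, y) per joint
w = rng.normal(size=(2, 16))                              # project to 16-D features
features = graph_conv(poses_2d, build_adjacency(num_people), w)
print(features.shape)  # (15, 16): one 16-D feature per joint, across all people
```

Stacking several such layers (and a head that regresses mesh vertices) would yield a pose-only mesh-reconstruction network in the spirit the abstract describes; the point of the sketch is that coordinates alone, with no image features, already give the graph layer something to propagate.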