Towards automated recognition of bodily expression of emotion in the wild
Open Access
- Author:
- Luo, Yu
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 29, 2020
- Committee Members:
- James Wang, Co-Chair & Dissertation Advisor
Reginald Adams, Outside Unit & Field Member
Jia Li, Co-Chair & Dissertation Advisor
Zihan Zhou, Major Field Member
Mary Beth Rosson, Major Field Member
Mary Beth Rosson, Program Head/Chair
- Keywords:
- Body language
emotional expression
computer vision
crowdsourcing
video analysis
perception
statistical modeling
human mesh reconstruction
- Abstract:
- Humans are arguably innately prepared to comprehend others' emotional expressions from subtle body movements. If robots or computers can be empowered with this capability, a number of robotic applications become possible. Automatically recognizing human bodily expression in unconstrained situations, however, is daunting given the incomplete understanding of the relationship between emotional expressions and body movements. The current research, a multidisciplinary effort among computer and information sciences, psychology, and statistics, proposes a scalable and reliable crowdsourcing approach for collecting in-the-wild perceived emotion data so that computers can learn to recognize the body language of humans. To accomplish this task, a large and growing annotated dataset named BoLD (Body Language Dataset) has been created, containing 9,876 video clips of body movements and 13,239 human characters. Comprehensive statistical analysis of the dataset revealed many interesting insights. A system that models emotional expression based on bodily movements, named ARBEE (Automated Recognition of Bodily Expression of Emotion), has also been developed and evaluated. Our analysis shows the effectiveness of Laban Movement Analysis (LMA) features in characterizing arousal, and our experiments using LMA features further demonstrate the computability of bodily expression. We report and compare the results of several other baseline methods, originally developed for action recognition, that operate on two different modalities: body skeleton and raw image. The dataset and findings presented in this work will likely serve as a launchpad for future discoveries in body language understanding, enabling future robots to interact and collaborate more effectively with humans.

Computationally representing human body movements from images is another step toward automated recognition of bodily expression. A fine-grained mesh of human pose and shape provides rich geometric information that enables many applications, including bodily expression recognition. Estimating an accurate 3D human mesh from an image captured by a passive sensor is a highly challenging research problem. The mainstream approach, which uses deep learning, requires large-scale human pose/shape annotations for training. Currently, those annotations are mostly created with expensive indoor motion-capture systems, so both their diversity and quantity are limited. We propose a new method to train a deep human mesh estimation model using a large quantity of unlabeled RGB-D images, which are inexpensive and convenient to collect. Depth information encoded in the data is used during training to achieve higher model accuracy. Our method is easy to implement and amenable to any other state-of-the-art parametric mesh modeling framework. We empirically demonstrate the effectiveness of the method on real-world datasets, validating the value of the proposed "learning from depth" approach.
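The abstract credits LMA features with characterizing arousal from body movement. As a rough illustration only, and not the dissertation's actual feature set, the sketch below computes simple LMA-inspired "Effort" statistics (speed, acceleration, and jerk magnitudes) from a skeleton sequence; the array layout, frame rate, and summary statistics are assumptions made for this example.

```python
# Illustrative sketch: LMA-inspired "Effort" descriptors from a skeleton
# sequence, e.g., one produced by a pose estimator. Not the dissertation's
# actual feature pipeline; shapes and statistics are assumptions.
import numpy as np

def lma_effort_features(joints: np.ndarray, fps: float = 30.0) -> np.ndarray:
    """joints: (T, J, 3) array of T frames, J joints, 3-D coordinates."""
    dt = 1.0 / fps
    vel = np.diff(joints, axis=0) / dt    # (T-1, J, 3) velocity
    acc = np.diff(vel, axis=0) / dt       # (T-2, J, 3) acceleration
    jerk = np.diff(acc, axis=0) / dt      # (T-3, J, 3) jerk
    feats = []
    for deriv in (vel, acc, jerk):
        mag = np.linalg.norm(deriv, axis=-1)   # per-joint magnitude over time
        # Summarize each derivative over time and joints.
        feats.extend([mag.mean(), mag.std(), mag.max()])
    return np.asarray(feats)              # fixed-length 9-D descriptor

# Usage: features = lma_effort_features(skeleton_sequence)
```

A fixed-length descriptor like this can feed any standard regressor for arousal, which is one simple way to probe the computability of bodily expression that the abstract describes.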
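The second study trains mesh estimation with unlabeled RGB-D data. Below is a minimal sketch of one plausible form of depth supervision, assuming a parametric mesh (e.g., SMPL-like) whose vertices are predicted in camera coordinates. The pinhole projection, tensor shapes, and loss are illustrative assumptions rather than the dissertation's exact "learning from depth" formulation, and a real implementation would also handle vertex visibility, since a sensor only measures the front surface.

```python
# Minimal sketch of a depth-supervision loss for human mesh training.
# Assumed inputs: predicted vertices in the camera frame, a sensor depth
# map, and camera intrinsics. Illustrative only; visibility/occlusion
# handling is omitted.
import torch
import torch.nn.functional as F

def depth_supervision_loss(verts_cam: torch.Tensor,  # (B, V, 3) camera-frame vertices
                           depth_map: torch.Tensor,  # (B, 1, H, W) sensor depth
                           K: torch.Tensor) -> torch.Tensor:  # (B, 3, 3) intrinsics
    """Penalize disagreement between predicted vertex depths and sensor depth."""
    B, V, _ = verts_cam.shape
    # Pinhole projection: pixel = K @ (x/z, y/z, 1).
    z = verts_cam[..., 2:3].clamp(min=1e-3)
    uv = torch.einsum('bij,bvj->bvi', K, verts_cam / z)[..., :2]
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    H, W = depth_map.shape[-2:]
    grid = torch.stack([2 * uv[..., 0] / (W - 1) - 1,
                        2 * uv[..., 1] / (H - 1) - 1], dim=-1).view(B, V, 1, 2)
    sampled = F.grid_sample(depth_map, grid, align_corners=True).view(B, V)
    valid = sampled > 0                    # ignore pixels with missing depth
    err = (z.view(B, V) - sampled).abs()   # L1 depth discrepancy per vertex
    return (err * valid).sum() / valid.sum().clamp(min=1)
```

Because the loss needs only raw depth frames rather than pose/shape labels, a term like this can be added to an existing parametric-mesh training objective, which matches the abstract's claim that the approach is easy to implement and amenable to other frameworks.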