3D Human Pose Estimation on Taiji Sequence
Open Access
- Author: Wang, Tianhe
- Graduate Program: Electrical Engineering
- Degree: Master of Science
- Document Type: Master Thesis
- Date of Defense: July 13, 2018
- Committee Members:
  - Yanxi Liu, Thesis Advisor/Co-Advisor
  - William Evan Higgins, Committee Member
  - Robert Collins, Committee Member
- Keywords: Pose estimation, Neural Networks, Motion Capture, Regression
- Abstract:
- Human pose estimation is a task that has been extensively studied in the field of computer vision. Given a video frame or an image, a 2D or 3D pose estimate can be generated directly. An alternative for 3D pose estimation is to estimate a 2D human pose first and then predict the 3D pose from the 2D joint locations. In our experiments, we found that state-of-the-art 3D pose estimators have over 60 mm MPJPE (mean per joint position error), which is unacceptable for biomedical applications where the expected error is 1% or less of the person's height (e.g., 17 mm for a person 1.7 m tall). To achieve the precision expected in biomedical applications, training on a biomedically validated dataset is a start. The goal of this thesis is to obtain quantified initial results for a 3D pose estimator trained on a biomedically validated Taiji Quan sequence dataset. This thesis contains three parts: (1) a tool designed to temporally align MoCap data with video; (2) a network trained to estimate 3D human pose from 2D skeletons on the Taiji dataset, where the 2D skeletons are generated by randomly projecting the MoCap data onto multiple 2D planes; and (3) a 3D human pose estimator built on the aligned video-MoCap data with OpenPose as the 2D human joint detector, obtained by fine-tuning the network from (2) to handle the noisy 2D joint detections from video frames. As a result, 3D skeleton reconstruction by triangulating the 2D skeletons from two different views achieves the lowest MPJPE, 0.7 cm; the 3D pose estimation from video achieves around 26 cm MPJPE, while one of the state-of-the-art 3D pose estimators ("Lifting from the Deep") achieves around 43 cm MPJPE. In conclusion, the dual-view approach significantly outperforms the single-view ones, as expected, because depth information for the 3D skeleton can be recovered from two views. Among single-view methods, the network trained on the biomedically validated Taiji dataset outperforms "Lifting from the Deep," but there is still a long way to go before single-view pose estimators meet the error tolerances expected for biomedical applications.
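To make the error figures above concrete, here is a minimal sketch (not from the thesis) of the MPJPE metric: the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames, checked against the 1%-of-height tolerance the abstract cites. The array shapes, joint count, and placeholder data are assumptions for illustration.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per joint position error (MPJPE).

    pred, gt: arrays of shape (num_frames, num_joints, 3), in mm.
    Returns the Euclidean joint error averaged over joints and frames.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Hypothetical usage with random placeholder poses, compared against the
# 1%-of-height tolerance from the abstract (17 mm for a 1.7 m person).
rng = np.random.default_rng(0)
pred = rng.random((100, 17, 3)) * 1000  # placeholder predictions (mm)
gt = rng.random((100, 17, 3)) * 1000    # placeholder ground truth (mm)
err = mpjpe(pred, gt)
print(f"MPJPE: {err:.1f} mm; within tolerance: {err <= 0.01 * 1700}")
```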
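Part (2) trains on 2D skeletons generated by randomly projecting MoCap data onto multiple 2D planes. The sketch below shows one way such a projection could be implemented, as a random orthographic view under a uniformly sampled rotation; the rotation sampling and the orthographic camera model are assumptions, not the thesis's actual procedure.

```python
import numpy as np

def random_orthographic_projection(joints_3d, rng):
    """Project 3D joints (num_joints, 3) onto a randomly oriented 2D plane.

    Samples a random rotation (QR decomposition of a Gaussian matrix),
    rotates the skeleton, then drops the depth axis (orthographic view).
    """
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))   # fix column signs for a uniform rotation
    if np.linalg.det(q) < 0:   # ensure a proper rotation, not a reflection
        q[:, 0] = -q[:, 0]
    rotated = joints_3d @ q.T
    return rotated[:, :2]      # keep x, y; discard depth

rng = np.random.default_rng(0)
skeleton_3d = rng.random((17, 3))  # placeholder MoCap frame
skeleton_2d = random_orthographic_projection(skeleton_3d, rng)
print(skeleton_2d.shape)           # (17, 2)
```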