3D Human pose estimation on Taiji sequence

Open Access
Author:
Wang, Tianhe
Graduate Program:
Electrical Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
July 13, 2018
Committee Members:
  • Yanxi Liu, Thesis Advisor
  • William Evan Higgins, Committee Member
  • Robert Collins, Committee Member
Keywords:
  • Pose estimation
  • Neural Networks
  • Motion Capture
  • Regression
Abstract:
Human pose estimation is a task that has been extensively studied in the field of computer vision. Given a video frame or an image, a 2D or 3D pose estimate can be generated directly. An alternative for 3D pose estimation is to estimate a 2D human pose first and then predict the 3D pose from the 2D joint locations. In our experiments, we found that state-of-the-art 3D pose estimators have an MPJPE (mean per-joint position error) of over 60 mm, which is unacceptable for biomedical applications, where the expected error is 1% or less of the subject's height (e.g., 17 mm for a person 1.7 m tall). Training on a biomedically validated dataset is a first step toward achieving the precision expected in biomedical applications. The goal of this thesis is to obtain quantified initial results for a 3D pose estimator trained on a biomedically validated Taiji Quan sequence dataset. This thesis contains three parts: (1) a tool designed to temporally align MoCap data with video; (2) a network trained to estimate 3D human pose from 2D skeletons on the Taiji dataset, where the 2D skeletons are generated by randomly projecting the MoCap data onto multiple 2D planes; and (3) a 3D human pose estimator implemented using the aligned video-MoCap data and OpenPose as a 2D human joint detector, with the network from (2) fine-tuned to handle the noisy 2D joint detections from video frames.

As a result, 3D skeleton reconstruction by triangulating the 2D skeletons from two different views achieves the lowest MPJPE, 0.7 cm; 3D pose estimation from video achieves around 26 cm MPJPE, while one of the state-of-the-art 3D pose estimators ("Lifting from the Deep") achieves around 43 cm MPJPE. In conclusion, the dual-view approach significantly outperforms the single-view approaches, as expected, because depth information about the 3D skeleton can be recovered from two views. Among single-view methods, the network trained on the biomedically validated Taiji dataset outperforms "Lifting from the Deep." However, single-view pose estimators still fall well short of the error expected for biomedical applications.
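The MPJPE values quoted above (0.7 cm, 26 cm, 43 cm) all use the same metric: the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints. A minimal sketch with NumPy (the function name `mpjpe` and the `(num_joints, 3)` array layout are illustrative assumptions, not the thesis code):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: the average Euclidean distance
    between predicted and ground-truth 3D joints.

    pred, gt: arrays of shape (num_joints, 3), in the same units
    (the result is in those units, e.g. mm or cm)."""
    # Per-joint Euclidean distances, then the mean over joints.
    return np.mean(np.linalg.norm(pred - gt, axis=-1))
```

For example, if every predicted joint is offset from the ground truth by a vector of length 5 mm, the MPJPE is 5 mm.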
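The dual-view reconstruction mentioned above recovers a 3D point from its 2D projections in two calibrated views. A standard way to do this is linear (DLT) triangulation; the sketch below is a generic illustration of the technique under the assumption of known 3x4 camera projection matrices, not the thesis implementation:

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices for the two views.
    x1, x2: 2D image coordinates of the same point in each view.
    Returns the 3D point in world coordinates."""
    # Each view contributes two linear constraints on the homogeneous
    # 3D point X, derived from x ~ P @ X (equality up to scale).
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the null vector of A: the right singular vector
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

With noisy 2D detections, this least-squares solution no longer passes exactly through both rays, which is why the reported dual-view MPJPE is small but nonzero.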