Video-based Motion Capture

TNT members involved in this project:
Prof. Dr.-Ing. Bodo Rosenhahn
Timo von Marcard, M.Sc.
Bastian Wandt, M.Sc.

Motion Capture is the process of analyzing movements of objects or humans from video data. Potential application fields are animation for 3D-movie production, sports science and medical applications. Instead of using artificial markers attached to the body and expensive lab equipment we are interested in tracking humans from video streams without special preparation of the subject. This is even more challenging in the context of outdoor scenes, clothed people and people interaction.

The main goal is to reconstruct the three-dimensional pose of a person from image data only. It can be split in multiple subtasks, e.g. people detection/tracking, 3d reconstruction, human model building, and animation. Our research in this field focuses on either one of this subtasks or their combination.

3D Reconstruction of Human Motion from Monocular Image Sequences

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2016)

Bastian Wandt, Hanno Ackermann, and Bodo Rosenhahn

Abstract: This article tackles the problem of estimating non-rigid human 3D shape and motion from image sequences taken by uncalibrated cameras. Similar to other state-of-the-art solutions we factorize 2D observations in camera parameters, base poses and mixing coefficients. Existing methods require sufficient camera motion during the sequence to achieve a correct 3D reconstruction. To obtain convincing 3D reconstructions from arbitrary camera motion, our method is based on a-priorly trained base poses. We show that strong periodic assumptions on the coefficients can be used to define an efficient and accurate algorithm for estimating periodic motion such as walking patterns. For the extension to non-periodic motion we propose a novel regularization term based on temporal bone length constancy. In contrast to other works, the proposed method does not use a predefined skeleton or anthropometric constraints and can handle arbitrary camera motion. We achieve convincing 3D reconstructions, even under the influence of noise and occlusions. Multiple experiments based on a 3D error metric demonstrate the stability of the proposed method. Compared to other state-of-the-art methods our algorithm shows a significant improvement.



Metric Regression Forests for Human Pose Estimation

British Machine Vision Conference (BMVC)

Gerard Pons-Moll, Jonathan Taylor, Jamie Shotton, Aaron Hertzmann, and Andrew Fitzgibbon

Abstract: We present a new method for inferring dense data to model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective based on the classification of depth pixels to body parts. In contrast, we introduce Metric Space Information Gain (MSIG), a new decision forest training objective designed to directly optimize the entropy of distributions in a metric space. When applied to a model surface, viewed as a metric space defined by geodesic distances, MSIG aims to minimize image-to-model correspondence uncertainty. A naïve implementation of MSIG would scale quadratically with the number of training examples. As this is intractable for large datasets, we propose a method to compute MSIG in linear time. Our method is a principled generalization of the proxy classification objective, and does not require an extrinsic isometric embedding of the model surface in Euclidean space. Our experiments demonstrate that this leads to correspondences that are considerably more accurate than state of the art, using far fewer training images.



Show all publications
  • Timo von Marcard, Gerard Pons-Moll, Bodo Rosenhahn
    Human Pose Estimation from Video and IMUs
    Transactions on Pattern Analysis and Machine Intelligence, IEEE, Vol. 38, No. 8, pp. 1533-1547, January 2016
  • Bastian Wandt, Hanno Ackermann, Bodo Rosenhahn
    3D Reconstruction of Human Motion from Monocular Image Sequences
    Transactions on Pattern Analysis and Machine Intelligence, IEEE, Vol. 38, No. 8, pp. 1505-1516, 2016
  • Bastian Wandt, Hanno Ackermann, Bodo Rosenhahn
    3D Human Motion Capture from Monocular Image Sequences
    IEEE Conference on Computer Vision and Pattern Recognition Workshops, IEEE, June 2015