Motion Capture with Video and IMUs

TNT members involved in this project:
Prof. Dr.-Ing. Bodo Rosenhahn
Timo von Marcard, M.Sc.

Recording human motion has many applications in movie production, computer animation, medical analysis and sports science. Compared to commercial marker-based systems, video-based marker-less motion capture systems are appealing because they are inexpensive and non-intrusive. Unfortunately, occlusions, partial observations and image ambiguities make the problem very hard. Hence, marker-less systems still lag behind marker-based solutions in accuracy and reliability.
One of our research goals is to combine information from video cameras with information from a small number of inertial measurement units (IMUs). In particular, we are interested in using only 5-6 IMUs attached to the subject's body extremities. Fusing video with sparse IMU data improves both the accuracy and the stability of tracking. The proposed tracking solutions are an inexpensive alternative to commercial marker-based motion capture systems. Although they are more intrusive than purely marker-less systems, a few miniature IMUs do not restrict the range of motions a subject can perform. This makes the approach appealing and practical for applications where high accuracy and realism are required.
Below are selected publications of ours on sensor fusion for human motion capture. We also provide several datasets that have been used in our projects.

 

Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

Computer Graphics Forum 36(2), Proceedings of the 38th Annual Conference of the European Association for Computer Graphics (Eurographics)

Timo von Marcard, Bodo Rosenhahn, Michael J. Black, and Gerard Pons-Moll

Abstract: We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker Sparse Inertial Poser (SIP) enables motion capture using only 6 sensors (attached to the wrists, lower legs, back and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall.
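The idea of fitting a model to orientation and acceleration measurements over multiple frames can be illustrated with a small cost function. The following Python sketch is not the SIP implementation: the function names, the weights `w_ori` and `w_acc`, the restriction to a single sensor, and the assumption that measured accelerations are gravity-compensated and expressed in a global frame are all illustrative assumptions.

```python
import numpy as np

def rotation_angle(R_a, R_b):
    """Geodesic angle (radians) between two 3x3 rotation matrices."""
    cos_theta = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

def finite_difference_acceleration(positions, dt):
    """Central second difference of a (T, 3) position track; returns (T-2, 3)."""
    return (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2

def sparse_imu_energy(model_R, imu_R, model_pos, imu_acc, dt,
                      w_ori=1.0, w_acc=0.01):
    """Hypothetical multi-frame cost for one sensor.

    model_R, imu_R : sequences of T rotation matrices (model vs. measured)
    model_pos      : (T, 3) sensor positions predicted by the body model
    imu_acc        : (T, 3) measured accelerations (assumed gravity-free,
                     global frame)
    """
    # Orientation term: squared geodesic angle per frame.
    e_ori = sum(rotation_angle(Rm, Ri) ** 2 for Rm, Ri in zip(model_R, imu_R))
    # Acceleration term: couples three consecutive frames via finite differences.
    acc_model = finite_difference_acceleration(model_pos, dt)
    e_acc = np.sum((acc_model - imu_acc[1:-1]) ** 2)
    return w_ori * e_ori + w_acc * e_acc
```

Because the acceleration term links neighbouring frames, a cost of this kind has to be minimized jointly over a window of poses rather than frame by frame.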


Human Pose Estimation from Video and IMUs

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2016)

Timo von Marcard, Gerard Pons-Moll, and Bodo Rosenhahn

Abstract: In this work, we present an approach to fuse video with sparse orientation data obtained from inertial sensors to improve and stabilize full-body human motion capture. Even though video data is a strong cue for motion analysis, tracking artifacts occur frequently due to ambiguities in the images, rapid motions, occlusions or noise. As a complementary data source, inertial sensors allow for accurate estimation of limb orientations even under fast motions. However, accurate position information cannot be obtained in continuous operation. Therefore, we propose a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type: on the one hand, we obtain drift-free and accurate position information from video data and, on the other hand, we obtain accurate limb orientations and good performance under fast motions from inertial sensors. In several experiments we demonstrate the increased performance and stability of our human motion tracker.
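The statement that inertial sensors alone cannot provide accurate position in continuous operation can be made concrete with a short numerical sketch. It double-integrates an accelerometer signal that contains only a small constant bias. The bias value, the sampling interval and the function name are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def integrate_position(acc, dt):
    """Double-integrate a 1-D acceleration signal (simple rectangle rule)."""
    vel = np.cumsum(acc) * dt   # acceleration -> velocity
    pos = np.cumsum(vel) * dt   # velocity -> position
    return pos

# Sensor at rest, but the accelerometer reports a small constant bias.
dt = 0.01                       # assumed 100 Hz sampling
bias = 0.05                     # assumed bias in m/s^2
acc = np.full(1000, bias)       # 10 seconds of data
drift = integrate_position(acc, dt)

# Position error grows quadratically, roughly 0.5 * bias * t^2.
print(f"drift after 10 s: {drift[-1]:.2f} m")
```

Even this small bias produces a position error of about 2.5 m after 10 seconds, which is why a drift-free position cue such as video is a natural complement to inertial orientation estimates.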


Outdoor Human Motion Capture using Inverse Kinematics and von Mises-Fisher Sampling

IEEE International Conference on Computer Vision (ICCV 2011)

Gerard Pons-Moll, Andreas Baak, Juergen Gall, Laura Leal-Taixe,
Meinard Mueller, Hans-Peter Seidel, and Bodo Rosenhahn

Abstract: Human motion capturing (HMC) from multiview image sequences constitutes an extremely difficult problem due to depth and orientation ambiguities and the high dimensionality of the state space. In this paper, we introduce a novel hybrid HMC system that combines video input with sparse inertial sensor input. Employing an annealing particle-based optimization scheme, our idea is to use orientation cues derived from the inertial input to sample particles from the manifold of valid poses. Then, visual cues derived from the video input are used to weight these particles and to iteratively derive the final pose. As our main contribution, we propose an efficient sampling procedure where hypotheses are derived analytically using state decomposition and inverse kinematics on the orientation cues. Additionally, we introduce a novel sensor noise model to account for uncertainties based on the von Mises-Fisher distribution. Doing so, orientation constraints are naturally fulfilled and the number of needed particles can be kept very small. More generally, our method can be used to sample poses that fulfill arbitrary orientation or positional kinematic constraints. In the experiments, we show that our system can track even highly dynamic motions in an outdoor setting with changing illumination, background clutter, and shadows.
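As a rough illustration of sampling from a von Mises-Fisher distribution, the following Python sketch draws unit vectors on the sphere S^2 around a mean direction `mu` with concentration `kappa`. It uses the standard inverse-CDF construction for the three-dimensional case. The function name and parameters are assumptions, and the noise model on sensor orientations used in the paper may be parameterized differently (for example on a different sphere).

```python
import numpy as np

def sample_vmf_s2(mu, kappa, n, rng):
    """Draw n unit vectors from a von Mises-Fisher distribution on S^2."""
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)

    # w = cosine of the angle to mu; its density on [-1, 1] is
    # proportional to exp(kappa * w), which has a closed-form inverse CDF.
    u = rng.random(n)
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa

    # Uniform angle around mu.
    phi = rng.uniform(0.0, 2.0 * np.pi, n)

    # Orthonormal basis (e1, e2) of the plane perpendicular to mu.
    helper = np.array([1.0, 0.0, 0.0]) if abs(mu[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(mu, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(mu, e1)

    r = np.sqrt(np.clip(1.0 - w**2, 0.0, None))
    return (r * np.cos(phi))[:, None] * e1 \
         + (r * np.sin(phi))[:, None] * e2 \
         + w[:, None] * mu

rng = np.random.default_rng(0)
samples = sample_vmf_s2([0.0, 0.0, 1.0], kappa=50.0, n=2000, rng=rng)
```

A larger `kappa` concentrates the samples more tightly around `mu`, so a particle-based optimizer that trusts its orientation cues can get by with fewer particles.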


Multisensor-Fusion for 3D Full-Body Human Motion Capture

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010)

Gerard Pons-Moll, Andreas Baak, Thomas Helten, Meinard Müller,
Hans-Peter Seidel, and Bodo Rosenhahn

Abstract: In this work, we present an approach to fuse video with orientation data obtained from extended inertial sensors to improve and stabilize full-body human motion capture. Even though video data is a strong cue for motion analysis, tracking artifacts occur frequently due to ambiguities in the images, rapid motions, occlusions or noise. As a complementary data source, inertial sensors allow for drift-free estimation of limb orientations even under fast motions. However, accurate position information cannot be obtained in continuous operation. Therefore, we propose a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type: on the one hand, we obtain drift-free and accurate position information from video data and, on the other hand, we obtain accurate limb orientations and good performance under fast motions from inertial sensors. In several experiments we demonstrate the increased performance and stability of our human motion tracker.


Publications:
  • Timo von Marcard, Bodo Rosenhahn, Michael Black, Gerard Pons-Moll
    Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs
    Computer Graphics Forum 36(2), Proceedings of the 38th Annual Conference of the European Association for Computer Graphics (Eurographics), 2017
  • Timo von Marcard, Gerard Pons-Moll, Bodo Rosenhahn
    Human Pose Estimation from Video and IMUs
    Transactions on Pattern Analysis and Machine Intelligence, IEEE, Vol. 38, No. 8, pp. 1533-1547, January 2016
  • Gerard Pons-Moll, Andreas Baak, Juergen Gall, Laura Leal-Taixe, Meinard Mueller, Hans-Peter Seidel, Bodo Rosenhahn
    Outdoor Human Motion Capture using Inverse Kinematics and von Mises-Fisher Sampling
    IEEE International Conference on Computer Vision (ICCV), November 2011