Automatic Object Segmentation of Images and Videos


A fundamental problem in computer vision is the automatic segmentation of complex image and video content. Object segmentation, as this task is called, produces a binary segmentation that separates foreground objects from their background. In video object segmentation, the objects have to be segmented automatically in an unannotated video. This is a challenging task, as local image measurements often provide only a weak cue, and object appearance may change significantly over the video frames due to changes in camera viewpoint, scene illumination, or object deformation. Most approaches extend single-image segmentation techniques to multiple frames, exploiting the redundancy along the time axis and the smoothness of the motion field. While segmentation can be attempted by analyzing individual frames independently, video provides rich additional cues beyond a single image, including object motion, temporal continuity, and long-range temporal object interactions.

Goal

In this project, we aim to obtain a spatio-temporal foreground segmentation of a video that respects object boundaries, and at the same time temporally associates object pixels whenever they appear in the video. The problem will be formulated as inference in a conditional random field (CRF). We make use of point trajectories, which have rich grouping information in their motion differences. The CRF contains binary variables representing the class labels of image pixels as well as binary variables indicating the correctness of trajectory clustering.
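To make the idea of grouping information in motion differences concrete, here is a minimal sketch of a pairwise trajectory affinity. It is an illustration only, not the formulation used in the project: the function name `motion_affinity`, the scale parameter `lam`, and the choice of the maximum squared velocity difference as the distance are all assumptions made for this example.

```python
import numpy as np

def motion_affinity(tracks, lam=1.0):
    """Pairwise affinity between point trajectories based on motion differences.

    tracks: array of shape (N, T, 2) holding N trajectories over T frames (x, y).
    lam:    hypothetical scale parameter controlling how fast affinity decays.
    """
    vel = np.diff(tracks, axis=1)                    # per-frame velocities, (N, T-1, 2)
    diff = vel[:, None, :, :] - vel[None, :, :, :]   # velocity differences, (N, N, T-1, 2)
    sq = np.sum(diff ** 2, axis=-1)                  # squared difference per frame, (N, N, T-1)
    dist = sq.max(axis=-1)                           # largest motion difference over time, (N, N)
    return np.exp(-lam * dist)                       # similar motion -> affinity close to 1

# Toy data: two points moving together and one point moving differently.
t = np.arange(5, dtype=float)
a = np.stack([t, np.zeros(5)], axis=1)       # moves right by 1 px per frame
b = a + np.array([3.0, 2.0])                 # same motion, different position
c = np.stack([np.zeros(5), 2.0 * t], axis=1) # moves down by 2 px per frame
tracks = np.stack([a, b, c])

A = motion_affinity(tracks)
print(np.round(A, 4))
```

Trajectories `a` and `b` share the same motion and therefore receive a high affinity even though they start at different positions, while `c` is weakly connected to both. A clustering step could then group trajectories on top of such an affinity matrix.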

Approach

The problem of video segmentation is formulated as inference in a CRF. The challenge is to model object labeling and trajectory clustering in a unified probabilistic framework to facilitate video segmentation. Joint object and trajectory segmentation is therefore formulated as a pixel and trajectory labeling problem that assigns each pixel and each trajectory to either foreground or background. The random field contains binary variables representing the class labels of image pixels as well as binary variables indicating the correctness of trajectory clustering. The model combines different views of the video data through specific potentials, with the trajectory labeling accounting for long-range motion cues, as illustrated in the figure above. Object and trajectory labels will be optimized jointly via the space-time CRF.
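The joint labeling idea can be illustrated with a toy energy over binary pixel and trajectory labels. This is a sketch under stated assumptions, not the project's actual model: the potentials (unary costs, a Potts smoothness term, and a pixel-trajectory consistency term), the weights `w_smooth` and `w_link`, and the use of iterated conditional modes (ICM) as a simple local minimizer are all chosen here for illustration.

```python
import numpy as np

def joint_energy(x, y, unary_pix, unary_traj, edges, links,
                 w_smooth=0.5, w_link=1.0):
    """Energy of a joint labeling (0 = background, 1 = foreground).

    x: (P,) pixel labels, y: (K,) trajectory labels.
    unary_pix: (P, 2) and unary_traj: (K, 2) per-label costs.
    edges: (i, j) pairs of neighboring pixels.
    links: (i, k) pairs meaning trajectory k passes through pixel i.
    """
    e = unary_pix[np.arange(len(x)), x].sum()
    e += unary_traj[np.arange(len(y)), y].sum()
    e += w_smooth * sum(x[i] != x[j] for i, j in edges)  # Potts smoothness
    e += w_link * sum(x[i] != y[k] for i, k in links)    # pixel-trajectory agreement
    return float(e)

def icm(unary_pix, unary_traj, edges, links):
    """Greedy local minimization: flip one variable at a time while it helps."""
    x = unary_pix.argmin(axis=1)
    y = unary_traj.argmin(axis=1)
    best = joint_energy(x, y, unary_pix, unary_traj, edges, links)
    improved = True
    while improved:
        improved = False
        for labels in (x, y):
            for n in range(len(labels)):
                labels[n] ^= 1  # tentatively flip this variable
                e = joint_energy(x, y, unary_pix, unary_traj, edges, links)
                if e < best - 1e-12:
                    best, improved = e, True
                else:
                    labels[n] ^= 1  # revert the flip
    return x, y, best

# Toy problem: four pixels in a row, one trajectory through pixels 2 and 3.
unary_pix = np.array([[0.0, 2.0], [0.0, 2.0], [0.4, 0.6], [2.0, 0.0]])
unary_traj = np.array([[3.0, 0.0]])  # trajectory motion strongly suggests foreground
edges = [(0, 1), (1, 2), (2, 3)]
links = [(2, 0), (3, 0)]

x, y, energy = icm(unary_pix, unary_traj, edges, links)
print(x, y, energy)
```

In this toy setup, pixel 2 slightly prefers background on its own, but the link to a foreground trajectory pulls it to foreground in the joint solution. ICM only finds a local minimum in general; a practical system would typically use a stronger inference method such as graph cuts.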

Publications


Conference Contributions
- Michael Ying Yang
  A Generic Probabilistic Graphical Model for Region-based Scene Interpretation
  International Conference on Computer Vision Theory and Applications, accepted for publication, March 2015
- Michael Ying Yang, Yu Qiang, Bodo Rosenhahn
  A global-to-local framework for infrared and visible image sequence registration
  IEEE Winter Conference on Applications of Computer Vision, accepted for publication, January 2015
- Michael Ying Yang, Sitong Feng, Bodo Rosenhahn
  Sparse optimization for motion segmentation
  ACCV Workshop on Video Segmentation in Computer Vision, November 2014
- Michael Ying Yang, Bodo Rosenhahn
  Video Segmentation with Joint Object and Trajectory Labeling
  IEEE Winter Conference on Applications of Computer Vision, IEEE, March 2014
- Michael Ying Yang
  Image Segmentation by Bilayer Superpixel Grouping
  Asian Conference on Pattern Recognition, accepted for publication, Okinawa, Japan, November 2013
- Michael Ying Yang, Wolfgang Förstner
  A Hierarchical Conditional Random Field Model for Labeling and Classifying Images of Man-made Scenes
  ICCV Workshop on Computer Vision for Remote Sensing of the Environment, IEEE, p. 196–203, 2011
Journals
- W. Huang, X. Gong, Michael Ying Yang
  Joint object segmentation and depth upsampling
  Signal Processing Letters, IEEE, Vol. 22, No. 2, p. 192–196, 2015