Context and Observation Driven Latent Variable Model for Human Pose Estimation


Current approaches to pose estimation and tracking can
be classified into two categories: generative and discriminative. Generative approaches can accurately determine human pose from image observations, but they are computationally intractable because they must search the high-dimensional human pose space. Discriminative approaches, on the other hand, are computationally efficient but do not generalize well. We present a hybrid model that combines the strengths of the two in an integrated learning and inference framework. We extend the Gaussian process latent variable model (GPLVM) to include an embedding from
observation space (the space of image features) to the latent space. GPLVM is a generative model, but the inclusion
of this mapping provides a discriminative component,
making the model observation driven. Observation Driven
GPLVM (OD-GPLVM) provides not only faster inference
but also more accurate estimates than GPLVM in cases
where dynamics are not sufficient to initialize the
search in the latent space.
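
The role of the observation-to-latent mapping can be illustrated with a minimal sketch. It is not the OD-GPLVM training procedure: the latent coordinates are assumed to come from PCA rather than from GPLVM optimization, the kernel hyperparameters are fixed by hand, and the poses and image features are synthetic. All function names (rbf_kernel, gp_mean, pca_latent, estimate_pose) are hypothetical. The sketch only shows how a regression from image features to the latent space can supply a starting point in the latent space without relying on dynamics, and how a pose is then read off through the latent-to-pose mapping.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def gp_mean(train_in, train_out, test_in, lengthscale=1.0, noise=1e-3):
    """Posterior mean of a zero-mean GP regression (hyperparameters fixed by assumption)."""
    K = rbf_kernel(train_in, train_in, lengthscale) + noise * np.eye(len(train_in))
    K_star = rbf_kernel(test_in, train_in, lengthscale)
    return K_star @ np.linalg.solve(K, train_out)

def pca_latent(poses, dim):
    """Stand-in for learned GPLVM latent positions (assumption: PCA projection only)."""
    centered = poses - poses.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :dim] * S[:dim]

def estimate_pose(features, train_features, train_latent, train_poses):
    """Map image features to a latent point, then map that point to a pose."""
    # Discriminative step: observation space -> latent space.
    latent_init = gp_mean(train_features, train_latent, features)
    # Generative step: latent space -> pose space (GP mean prediction).
    pose_mean = train_poses.mean(axis=0)
    pose = pose_mean + gp_mean(train_latent, train_poses - pose_mean, latent_init)
    return latent_init, pose

# Synthetic stand-ins for joint angles and image features along one gesture cycle.
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
train_poses = np.column_stack([np.sin(t), 0.8 * np.cos(t),
                               0.3 * np.sin(2 * t), 0.3 * np.cos(2 * t)])
train_features = np.column_stack([np.cos(t), np.sin(t), np.cos(t) ** 2])
train_latent = pca_latent(train_poses, dim=2)

latent_init, pose = estimate_pose(train_features[5:6], train_features,
                                  train_latent, train_poses)
print("latent initialization:", latent_init)
print("estimated pose:", pose)
```

In a full model, latent_init would serve as the starting point for a search in the latent space rather than as the final answer.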

We also extend OD-GPLVM to learn and estimate poses
from parameterized actions/gestures. Parameterized gestures
are actions which exhibit large systematic variation
in joint angle space for different instances due to differences in contextual variables. For example, the joint angles in a forehand tennis shot are a function of the height of the ball (Figure 2). We learn these systematic variations as a function of the contextual variables. We then present an approach that uses information from the scene and objects to provide
context for human pose estimation for such parameterized