This was a pump-priming project on large-scale learning of human motion capture data sets using latent variable models, in particular the GP-LVM. The aim of the project was to develop dimensionality reduction methods for modelling human motion, with a view towards constructing systems for markerless motion capture.
The achievements in the project were:
- We have developed a set of tools for loading MOCAP data into MATLAB (see software below).
- We now have a convergent sparse GP-LVM for large-scale learning; details are given in the large scale learning paper below.
- We have developed a hierarchical model for decomposition of the component parts of human motion (see the Hierarchical GP-LVM paper below).
- We have developed a shared latent space model for jointly learning joint angles and silhouettes (see the MLMI paper below).
- We presented a paper on extensions of the shared latent space model (see the second MLMI paper below).
- We presented a paper on topological extensions of the model that allow different styles to be interpolated (see the ICML paper below; this was an international collaboration with MIT, Toronto and Berkeley).
Personnel from ML@SITraN
The following software has been made available either wholly or partly as a result of work on this project:
- fgplvm: FGPLVM toolbox for large-scale learning of GP-LVMs.
- hgplvm: HGPLVM toolbox for hierarchical learning of GP-LVMs.
- mocap: simple MATLAB utility toolbox for loading motion capture data sets.
The Gaussian process latent variable model (GP-LVM) is a recently proposed probabilistic approach to obtaining a reduced dimension representation of a data set. In this tutorial we motivate and describe the GP-LVM, giving reviews of the model itself and some of the concepts behind it.
“Computer vision reading group: the Gaussian process latent variable model”. Presented at Computer Vision Reading Group, Visual Geometry Group, Department of Engineering Science, University of Oxford, U.K. on 27/1/2006. [PDF Slides][PDF Notes][Demos Software][Main Software][Google Scholar Search](2006)
The Gaussian process latent variable model (GP-LVM) is a recently proposed probabilistic approach to obtaining a reduced dimension representation of a data set. In this tutorial we motivate and describe the GP-LVM, giving a review of the model itself and some of the concepts behind it.
“Probabilistic dimensional reduction with the Gaussian process latent variable model”. Presented at Google Research, New York, N.Y., U.S.A. on 12/2/2007. [PDF][YouTube][Demos Software][Main Software][Google Scholar Search][Video](2007)
Density modelling in high dimensions is a very difficult problem. Traditional approaches, such as mixtures of Gaussians, typically fail to capture the structure of data sets in high dimensional spaces. In this talk we will argue that for many data sets of interest, the data can be represented as a lower dimensional manifold immersed in the higher dimensional space. We will then present the Gaussian Process Latent Variable Model (GP-LVM), a non-linear probabilistic variant of principal component analysis (PCA) which implicitly assumes that the data lies on a lower dimensional space. Having introduced the GP-LVM we will review extensions to the algorithm, including dynamics, learning of large data sets and back constraints. We will demonstrate the application of the model and its extensions to a range of data sets, including human motion data, a vowel data set and a robot mapping problem.
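The linear special case that the talk builds on can be sketched in a few lines. The following illustrative Python example (not part of the project's MATLAB toolboxes) recovers a two-dimensional latent representation with PCA; the GP-LVM replaces this linear latent-to-data map with a Gaussian process, which is what makes the embedding non-linear.

```python
import numpy as np

# Minimal sketch: PCA, the linear model that the GP-LVM generalises.
# Synthetic "high-dimensional" data lying near a 2-D subspace of R^10.
rng = np.random.default_rng(0)
X_latent = rng.normal(size=(100, 2))             # true 2-D latent points
W = rng.normal(size=(2, 10))                     # linear map to 10-D
Y = X_latent @ W + 0.01 * rng.normal(size=(100, 10))

# PCA via the SVD of the centred data matrix.
Yc = Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
q = 2                                            # latent dimensionality
Z = U[:, :q] * s[:q]                             # recovered latent coordinates

# Almost all variance is explained by the first two components.
explained = (s[:q] ** 2).sum() / (s ** 2).sum()
print(f"variance explained by {q} components: {explained:.4f}")
```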
The following conference publications are associated with this project.
“Learning for larger datasets with the Gaussian process latent variable model” in M. Meila and X. Shen (eds) Proceedings of the Eleventh International Workshop on Artificial Intelligence and Statistics, Omnipress, San Juan, Puerto Rico, pp 243–250. [Software][PDF][Google Scholar Search](2007)
In this paper we apply the latest techniques in sparse Gaussian process regression (GPR) to the Gaussian process latent variable model (GP-LVM). We review three techniques and discuss how they may be implemented in the context of the GP-LVM. Each approach is then implemented on a well known benchmark data set and compared with earlier attempts to sparsify the model.
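To give a flavour of the sparse approximations compared in the paper, here is a hedged Python sketch of one of them, a projected-process (subset-of-regressors) style approximation with a small set of inducing inputs; it is an illustration of the general idea, not the paper's implementation or benchmark.

```python
import numpy as np

# Sparse GP regression sketch: m inducing inputs summarise n >> m data points.
def rbf(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 200)[:, None]            # n = 200 training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

Z = np.linspace(0, 10, 15)[:, None]             # m = 15 inducing inputs
noise = 0.1 ** 2                                # observation noise variance

Kuu = rbf(Z, Z) + 1e-8 * np.eye(len(Z))         # jitter for stability
Kuf = rbf(Z, X)
Sigma = np.linalg.inv(Kuu + Kuf @ Kuf.T / noise)

# Predictive mean at test inputs, using only the m inducing points.
Xs = np.array([[2.5], [7.5]])
mean = rbf(Xs, Z) @ Sigma @ Kuf @ y / noise
print(mean)                                     # close to sin(2.5), sin(7.5)
```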
“Hierarchical Gaussian process latent variable models” in Z. Ghahramani (ed.) Proceedings of the International Conference in Machine Learning, Omnipress, pp 481–488. [Software][PDF][Google Scholar Search](2007)
The Gaussian process latent variable model (GP-LVM) is a powerful approach for probabilistic modelling of high dimensional data through dimensional reduction. In this paper we extend the GP-LVM through hierarchies. A hierarchical model (such as a tree) allows us to express conditional independencies in the data as well as the manifold structure. We first introduce Gaussian process hierarchies through a simple dynamical model, we then extend the approach to a more complex hierarchy which is applied to the visualisation of human motion data sets.
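The conditional independence the hierarchy expresses can be illustrated with a toy Python example (an assumed, simplified structure, not the HGPLVM toolbox): two "limb" signals are driven by a shared "body" latent variable, so they are dependent marginally but independent once the parent is known.

```python
import numpy as np

# Conceptual sketch: sibling parts are conditionally independent given
# the parent latent variable in a hierarchical latent model.
rng = np.random.default_rng(3)
n = 1000
body = rng.normal(size=n)                         # parent latent variable
left = np.sin(body) + 0.1 * rng.normal(size=n)    # child 1 ("left limb")
right = np.cos(body) + 0.1 * rng.normal(size=n)   # child 2 ("right limb")

# Marginally the limbs are strongly dependent (sin^2 + cos^2 = 1) ...
corr_marginal = np.corrcoef(left ** 2, right ** 2)[0, 1]
# ... but conditioning on the parent (via the residuals) removes the link.
resid_l = left - np.sin(body)
resid_r = right - np.cos(body)
corr_conditional = np.corrcoef(resid_l, resid_r)[0, 1]
print(f"marginal: {corr_marginal:.3f}  conditional: {corr_conditional:.3f}")
```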
“Gaussian process latent variable models for human pose estimation” in A. Popescu-Belis, S. Renals and H. Bourlard (eds) Machine Learning for Multimodal Interaction (MLMI 2007), Springer-Verlag, Brno, Czech Republic, pp 132–143. [Software][PDF][DOI][Google Scholar Search](2008)
We describe a method for recovering 3D human body pose from silhouettes. Our model is based on learning a latent space using the Gaussian process latent variable model (GP-LVM), encapsulating both pose and silhouette features. Our method is generative, which allows us to model the ambiguities of a silhouette representation in a principled way. We learn a dynamical model over the latent space, which allows us to disambiguate between ambiguous silhouettes through temporal consistency. The model has only two free parameters and has several advantages over both regression approaches and other generative methods. In addition to the application shown in this paper, the suggested model is easily extended to multiple observation spaces without constraints on type.
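The temporal-consistency idea can be shown with a toy Python example (an assumed, simplified stand-in for the paper's dynamical model): each frame offers several candidate poses, and a Viterbi-style pass selects the sequence with the smoothest pose trajectory, discarding the ambiguous decoys.

```python
import numpy as np

# Resolve per-frame ambiguity by preferring the smoothest pose sequence.
def smoothest_path(candidates):
    """candidates: list of (k, d) arrays of candidate poses per frame."""
    n = len(candidates)
    cost = np.zeros(len(candidates[0]))
    back = []
    for t in range(1, n):
        # transition cost: squared distance between consecutive candidates
        trans = ((candidates[t][:, None, :] -
                  candidates[t - 1][None, :, :]) ** 2).sum(-1)
        total = trans + cost[None, :]
        back.append(total.argmin(axis=1))
        cost = total.min(axis=1)
    # backtrack the cheapest sequence of candidates
    path = [int(cost.argmin())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return path[::-1]

# A smooth trajectory (candidate 0) plus an erratic decoy in each frame.
frames = [np.stack([np.array([float(t), 0.0]),
                    np.array([0.0, 10.0]) if t % 2 == 0
                    else np.array([10.0, 0.0])])
          for t in range(5)]
print(smoothest_path(frames))  # → [0, 0, 0, 0, 0]
```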
“Ambiguity modeling in latent spaces” in A. Popescu-Belis and R. Stiefelhagen (eds) Machine Learning for Multimodal Interaction (MLMI 2008), Springer-Verlag, pp 62–73. [Software][PDF][Google Scholar Search](2008)
We are interested in the situation where we have two or more representations of an underlying phenomenon. In particular we are interested in the scenario where the representations are complementary. This implies that a single representation is not sufficient to fully discriminate a specific instance of the underlying phenomenon; it also means that each representation is an ambiguous representation of the other complementary spaces. In this paper we present a latent variable model capable of consolidating multiple complementary representations. Our method extends canonical correlation analysis by introducing additional latent spaces that are specific to the different representations, thereby explaining the full variance of the observations. These additional spaces, explaining representation-specific variance, separately model the variance in one representation that is ambiguous to the others. We develop a spectral algorithm for fast computation of the embeddings and a probabilistic model (based on Gaussian processes) for validation and inference. The proposed model has several potential application areas; we demonstrate its use for multi-modal regression on a benchmark human pose estimation data set.
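Classical CCA, the starting point the paper extends with representation-specific latent spaces, can be sketched in Python via whitening and an SVD of the cross-covariance (an illustration only; the paper's private-space extension is not reproduced here). Two views share a one-dimensional signal, and CCA recovers directions in each view that are maximally correlated.

```python
import numpy as np

# Classical CCA sketch: whiten each view, then SVD the cross-covariance;
# the singular values are the sample canonical correlations.
rng = np.random.default_rng(2)
n = 500
shared = rng.normal(size=(n, 1))                  # shared latent signal
Y1 = shared @ rng.normal(size=(1, 5)) + 0.5 * rng.normal(size=(n, 5))
Y2 = shared @ rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(n, 4))

def whiten(Y):
    Yc = Y - Y.mean(0)
    C = Yc.T @ Yc / len(Y)                        # sample covariance
    vals, vecs = np.linalg.eigh(C)
    return Yc @ vecs @ np.diag(vals ** -0.5) @ vecs.T

W1, W2 = whiten(Y1), whiten(Y2)
U, s, Vt = np.linalg.svd(W1.T @ W2 / n)
print(f"top canonical correlation: {s[0]:.3f}")   # high: strong shared signal
```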
“Topologically-constrained latent variable models” in S. Roweis and A. McCallum (eds) Proceedings of the International Conference in Machine Learning, Omnipress, pp 1080–1087. [PDF][DOI][Google Scholar Search](2008)
In dimensionality reduction approaches, the data are typically embedded in a Euclidean latent space. However, for some data sets this is inappropriate. For example, in human motion data we expect latent spaces that are cylindrical or toroidal, which are poorly captured by a Euclidean space. In this paper, we present a range of approaches for embedding data in a non-Euclidean latent space. Our focus is the Gaussian process latent variable model. In the context of human motion modeling this allows us to (a) learn models with interpretable latent directions enabling, for example, style/content separation, and (b) generalise beyond the data set, enabling us to learn transitions between motion styles even though such transitions are not present in the data.
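Why a Euclidean latent space fails for periodic motion can be seen in a few lines of Python (an illustration of the topology argument, not the paper's model): embedding a gait phase angle as (cos θ, sin θ) puts points near θ ≈ 0 and θ ≈ 2π close together, which a one-dimensional Euclidean latent space cannot express.

```python
import numpy as np

# Compare a line embedding with a circle embedding of a periodic phase.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)  # gait-cycle phase

line_coords = theta[:, None]                            # Euclidean 1-D latent
circle_coords = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Distance between the first and last frame of the cycle:
d_line = np.abs(line_coords[0] - line_coords[-1])[0]
d_circle = np.linalg.norm(circle_coords[0] - circle_coords[-1])
print(f"line: {d_line:.3f}  circle: {d_circle:.3f}")    # circle keeps them close
```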