gmr: Gaussian Mixture Regression



Statement of Need
The library gmr is fully compatible with scikit-learn (Pedregosa et al., 2011). It has its own implementation of expectation maximization (EM), but it can also be initialized with a GMM from scikit-learn, which means that it can also be initialized from a Bayesian GMM of scikit-learn. The prediction step for regression is not available in scikit-learn and is, thus, provided by gmr.
Note that while scikit-learn has the function GaussianMixture.predict, it does not perform regression: it computes the index of the Gaussian to which the input most probably belongs. Furthermore, multimodal regression often requires a richer interface to extract not just the mean, but also individual Gaussians, or to sample from the predicted distribution.
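As an illustration of this interoperability, the sketch below fits a Bayesian GMM with scikit-learn and transfers its parameters to gmr. The GMM constructor keywords priors, means, and covariances reflect gmr's interface as far as I know; treat the exact names as assumptions rather than a definitive reference.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from gmr import GMM

X = np.random.RandomState(0).randn(500, 2)  # placeholder training data

# Fit a Bayesian GMM with scikit-learn ...
bgmm = BayesianGaussianMixture(n_components=3, random_state=0).fit(X)

# ... and transfer its parameters to gmr for regression
# (constructor keywords assumed from gmr's documented interface)
gmm = GMM(
    n_components=3,
    priors=bgmm.weights_,
    means=bgmm.means_,
    covariances=bgmm.covariances_,  # full covariances, shape (K, D, D)
)
```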
The library gmr provides a simple interface and several useful features to deal with multimodal regression, mixtures of Gaussians, and multivariate Gaussian distributions (a usage sketch follows below):

• EM implementation that only requires numpy and scipy
• computation of conditional distributions
• sampling from confidence regions of multivariate Gaussians
• collapsing a GMM to a single Gaussian
• extraction of individual Gaussians from a (conditional) GMM
• plotting of covariance ellipses
• unscented transform (Uhlmann, 1995) to estimate the effect of a nonlinear function on a Gaussian distribution

Multimodal regression and Gaussian mixture regression have been used mostly by the robotics community. For example, inverse problems such as inverse kinematics cannot be easily modeled with standard regression approaches. Furthermore, Gaussian mixture regression is the basis of many programming by demonstration approaches (Billard et al., 2008).
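To make this interface concrete, here is a minimal sketch of EM training and regression with gmr on a toy dataset. The method names from_samples and predict, and the convention of passing the column indices of the input features, follow the library's documentation as I recall it; consider them assumptions.

```python
import numpy as np
from gmr import GMM

# Toy 1D regression dataset: y = sin(x) plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, 300)
y = np.sin(x) + 0.1 * rng.normal(size=300)
X = np.column_stack([x, y])  # joint samples, shape (n_samples, 2)

gmm = GMM(n_components=5, random_state=0)
gmm.from_samples(X)  # EM training on the joint distribution p(x, y)

# Regression: predict E[y | x] for new inputs; the first argument
# gives the column indices of the input features
x_test = np.linspace(0.0, 2.0 * np.pi, 100)[:, np.newaxis]
y_pred = gmm.predict(np.array([0]), x_test)
```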

Background
Gaussian mixture regression via EM was first proposed by Ghahramani & Jordan (1994). Calinon et al. (2007) introduced the term Gaussian mixture regression in the context of imitation learning for robot trajectories, and many publications that use GMR in this domain followed. Stulp & Sigaud (2015) present Gaussian mixture regression in a more recent survey.

Training
During the training phase we learn a Gaussian mixture model

$$p(x, y) = \sum_{k=1}^{K} \pi_k \mathcal{N}_k(x, y | \mu_k^{xy}, \Sigma_k^{xy})$$

through EM, where $\mathcal{N}_k(x, y | \mu_k^{xy}, \Sigma_k^{xy})$ are Gaussian distributions with mean $\mu_k^{xy}$ and covariance $\Sigma_k^{xy}$, $K$ is the number of Gaussians, and $\pi_k \in [0, 1]$ are priors that sum up to one.
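To make the training phase explicit, the following self-contained numpy/scipy sketch runs EM for a full-covariance GMM with exactly these parameters $\pi_k$, $\mu_k^{xy}$, and $\Sigma_k^{xy}$. It is an illustration only, not gmr's actual implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal


def fit_gmm_em(X, K, n_iter=100, seed=0):
    """Minimal EM for a full-covariance GMM (illustrative, not gmr's code)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    priors = np.full(K, 1.0 / K)                        # pi_k, sum to one
    means = X[rng.choice(n_samples, K, replace=False)]  # mu_k
    covs = np.tile(np.cov(X, rowvar=False) + 1e-6 * np.eye(n_features),
                   (K, 1, 1))                           # Sigma_k
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] proportional to pi_k * N_k(x_n)
        r = np.column_stack(
            [priors[k] * multivariate_normal.pdf(X, means[k], covs[k])
             for k in range(K)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors, means, and covariances
        n_k = r.sum(axis=0)
        priors = n_k / n_samples
        means = (r.T @ X) / n_k[:, np.newaxis]
        for k in range(K):
            d = X - means[k]
            covs[k] = ((r[:, k, np.newaxis] * d).T @ d / n_k[k]
                       + 1e-6 * np.eye(n_features))
    return priors, means, covs
```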

Gaussian mixture regression can be used to predict distributions of variables $y$ by computing the conditional distribution $p(y | x)$. The conditional distribution of each individual Gaussian $\mathcal{N}(x, y | \mu^{xy}, \Sigma^{xy})$ with

$$\mu^{xy} = \begin{pmatrix} \mu^x \\ \mu^y \end{pmatrix}, \qquad \Sigma^{xy} = \begin{pmatrix} \Sigma^{xx} & \Sigma^{xy} \\ \Sigma^{yx} & \Sigma^{yy} \end{pmatrix}$$

is the Gaussian $\mathcal{N}(y | \mu^{y|x}, \Sigma^{y|x})$ with

$$\mu^{y|x} = \mu^y + \Sigma^{yx} (\Sigma^{xx})^{-1} (x - \mu^x), \qquad \Sigma^{y|x} = \Sigma^{yy} - \Sigma^{yx} (\Sigma^{xx})^{-1} \Sigma^{xy}.$$

In a Gaussian mixture model we compute the conditional distribution of each individual Gaussian as above and update their priors to

$$\pi_{k|x} = \frac{\pi_k \mathcal{N}_k(x | \mu_k^x, \Sigma_k^{xx})}{\sum_{l=1}^{K} \pi_l \mathcal{N}_l(x | \mu_l^x, \Sigma_l^{xx})}.$$
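The single-Gaussian conditioning formulas above translate directly to numpy. The helper below is illustrative (not the library's code); gmr applies the same operation to every component of the mixture and reweights the priors as described.

```python
import numpy as np


def condition_gaussian(mu, Sigma, x_idx, y_idx, x):
    """Condition a joint Gaussian N(x, y | mu, Sigma) on x.

    Implements mu^{y|x} = mu^y + Sigma^{yx} (Sigma^{xx})^{-1} (x - mu^x)
    and Sigma^{y|x} = Sigma^{yy} - Sigma^{yx} (Sigma^{xx})^{-1} Sigma^{xy}.
    """
    S_xx = Sigma[np.ix_(x_idx, x_idx)]
    S_xy = Sigma[np.ix_(x_idx, y_idx)]
    S_yy = Sigma[np.ix_(y_idx, y_idx)]
    # Solve instead of inverting Sigma^{xx} for numerical stability;
    # K = Sigma^{yx} (Sigma^{xx})^{-1} by symmetry of Sigma
    K = np.linalg.solve(S_xx, S_xy).T
    mu_cond = mu[y_idx] + K @ (x - mu[x_idx])
    Sigma_cond = S_yy - K @ S_xy
    return mu_cond, Sigma_cond


# Tiny check: for a standard bivariate Gaussian with correlation 0.8,
# conditioning on x = 0.5 yields mean 0.4 and variance 0.36
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
mu_c, Sigma_c = condition_gaussian(mu, Sigma, [0], [1], np.array([0.5]))
```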

Examples
Here is an example of a dataset where multiple outputs $y$ are valid predictions for one input $x$. It was introduced by Bishop (1994). On the left side of Figure 1 we see the training data and the fitted GMM indicated by ellipses corresponding to its components. On the right side we see the predicted probability density $p(y | x = 0.5)$. There are three peaks that correspond to three different valid predictions. Each peak is represented by at least one of the Gaussians of the GMM.
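A sketch of this example could look as follows, under the assumption that gmr exposes condition (returning the conditional GMM) and to_probability_density (evaluating the density on a grid); the dataset construction follows the usual form of Bishop's toy problem.

```python
import numpy as np
from gmr import GMM

# Bishop's (1994) toy dataset: for many x there are several valid y
rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, 1000)
x = y + 0.3 * np.sin(2.0 * np.pi * y) + rng.uniform(-0.1, 0.1, 1000)

gmm = GMM(n_components=10, random_state=0)
gmm.from_samples(np.column_stack([x, y]))

# p(y | x = 0.5) is itself a GMM over y; column 0 holds the input x
conditional = gmm.condition(np.array([0]), np.array([0.5]))
Y = np.linspace(0.0, 1.0, 200)[:, np.newaxis]
density = conditional.to_probability_density(Y)  # multimodal: several peaks
```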
We can use GMR to represent demonstrated motions. Here is an example in 2D, in which we have a dataset that is a sequence of positions $x$ and corresponding velocities $\dot{x}$. We train a GMM to represent $p(x, \dot{x})$. Then we can generate a new trajectory by iteratively sampling $\dot{x}_t \sim p(\dot{x} | x = x_t)$ and computing the next position as $x_{t+1} = x_t + \Delta t \, \dot{x}_t$. In Figure 2 we can see that in the middle of the eight we have multiple modes: one velocity vector would lead to the left and one to the right. Sampling from the conditional GMM is only one possible solution here. Another one would be to select the component that contributes the most to the probability density and take its mean. When we sample, we often want to ensure that we do not end up in a region of low probability. Hence, we can resample as long as we are not in an $\alpha$-confidence region, where $\alpha \in [0, 1]$. This strategy is used here and is directly provided by the library.
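A sketch of this sampling loop is given below. It assumes a GMM gmm already trained on stacked positions and velocities, and a method sample_confidence_region implementing the $\alpha$-confidence-region rejection sampling mentioned above; the exact method name is an assumption.

```python
import numpy as np

# Assume `gmm` was trained on samples [x_1, x_2, xdot_1, xdot_2], i.e.
# 2D positions in columns (0, 1) and velocities in columns (2, 3).
x_t = np.array([0.0, 0.0])  # hypothetical start position
dt = 0.01
trajectory = [x_t]
for _ in range(500):
    # p(xdot | x = x_t): condition the GMM on the current position
    conditional = gmm.condition(np.array([0, 1]), x_t)
    # Resample until the sample lies in the alpha-confidence region
    # (method name assumed from the library's feature list)
    xdot_t = conditional.sample_confidence_region(1, alpha=0.7)[0]
    x_t = x_t + dt * xdot_t
    trajectory.append(x_t)
trajectory = np.array(trajectory)
```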