Multivariate¶
MvNormal (mu[, cov, tau, chol, lower]) 
Multivariate normal loglikelihood. 
Wishart (nu, V, *args, **kwargs) 
Wishart loglikelihood. 
LKJCholeskyCov (eta, n, sd_dist, *args, **kwargs) 
Covariance matrix with LKJ distributed correlations. 
LKJCorr ([eta, n, p, transform]) 
The LKJ (Lewandowski, Kurowicka and Joe) loglikelihood. 
Multinomial (n, p, *args, **kwargs) 
Multinomial loglikelihood. 
Dirichlet (a[, transform]) 
Dirichlet loglikelihood. 

class
pymc3.distributions.multivariate.
MvNormal
(mu, cov=None, tau=None, chol=None, lower=True, *args, **kwargs)¶ Multivariate normal loglikelihood.
\[f(x \mid \pi, T) = \frac{T^{1/2}}{(2\pi)^{1/2}} \exp\left\{ \frac{1}{2} (x\mu)^{\prime} T (x\mu) \right\}\]Support \(x \in \mathbb{R}^k\) Mean \(\mu\) Variance \(T^{1}\) Parameters:  mu (array) – Vector of means.
 cov (array) – Covariance matrix. Exactly one of cov, tau, or chol is needed.
 tau (array) – Precision matrix. Exactly one of cov, tau, or chol is needed.
 chol (array) – Cholesky decomposition of covariance matrix. Exactly one of cov, tau, or chol is needed.
 lower (bool, default=True) – Whether chol is the lower tridiagonal cholesky factor.
Examples
Define a multivariate normal variable for a given covariance matrix:
cov = np.array([[1., 0.5], [0.5, 2]]) mu = np.zeros(2) vals = pm.MvNormal('vals', mu=mu, cov=cov, shape=(5, 2))
Most of the time it is preferable to specify the cholesky factor of the covariance instead. For example, we could fit a multivariate outcome like this (see the docstring of LKJCholeskyCov for more information about this):
mu = np.zeros(3) true_cov = np.array([[1.0, 0.5, 0.1], [0.5, 2.0, 0.2], [0.1, 0.2, 1.0]]) data = np.random.multivariate_normal(mu, true_cov, 10) sd_dist = pm.HalfCauchy.dist(beta=2.5, shape=3) chol_packed = pm.LKJCholeskyCov('chol_packed', n=3, eta=2, sd_dist=sd_dist) chol = pm.expand_packed_triangular(3, chol_packed) vals = pm.MvNormal('vals', mu=mu, chol=chol, observed=data)
For unobserved values it can be better to use a noncentered parametrization:
sd_dist = pm.HalfCauchy.dist(beta=2.5, shape=3) chol_packed = pm.LKJCholeskyCov('chol_packed', n=3, eta=2, sd_dist=sd_dist) chol = pm.expand_packed_triangular(3, chol_packed) vals_raw = pm.Normal('vals_raw', mu=0, sd=1, shape=(5, 3)) vals = pm.Deterministic('vals', tt.dot(chol, vals_raw.T).T)

class
pymc3.distributions.multivariate.
MvStudentT
(nu, Sigma, mu=None, *args, **kwargs)¶ Multivariate StudentT loglikelihood.
\[f(\mathbf{x} \nu,\mu,\Sigma) = \frac {\Gamma\left[(\nu+p)/2\right]} {\Gamma(\nu/2)\nu^{p/2}\pi^{p/2} \left{\Sigma}\right^{1/2} \left[ 1+\frac{1}{\nu} ({\mathbf x}{\mu})^T {\Sigma}^{1}({\mathbf x}{\mu}) \right]^{(\nu+p)/2}}\]Support \(x \in \mathbb{R}^k\) Mean \(\mu\) if \(\nu > 1\) else undefined Variance  \(\frac{\nu}{\mu2}\Sigma\)
 if \(\nu>2\) else undefined
Parameters:  nu (int) – Degrees of freedom.
 Sigma (matrix) – Covariance matrix.
 mu (array) – Vector of means.

class
pymc3.distributions.multivariate.
Dirichlet
(a, transform=<pymc3.distributions.transforms.StickBreaking object>, *args, **kwargs)¶ Dirichlet loglikelihood.
\[f(\mathbf{x}) = \frac{\Gamma(\sum_{i=1}^k \theta_i)}{\prod \Gamma(\theta_i)} \prod_{i=1}^{k1} x_i^{\theta_i  1} \left(1\sum_{i=1}^{k1}x_i\right)^\theta_k\]Support \(x_i \in (0, 1)\) for \(i \in \{1, \ldots, K\}\) such that \(\sum x_i = 1\) Mean \(\dfrac{a_i}{\sum a_i}\) Variance \(\dfrac{a_i  \sum a_0}{a_0^2 (a_0 + 1)}\) where \(a_0 = \sum a_i\) Parameters: a (array) – Concentration parameters (a > 0). Notes
Only the first k1 elements of x are expected. Can be used as a parent of Multinomial and Categorical nevertheless.

class
pymc3.distributions.multivariate.
Multinomial
(n, p, *args, **kwargs)¶ Multinomial loglikelihood.
Generalizes binomial distribution, but instead of each trial resulting in “success” or “failure”, each one results in exactly one of some fixed finite number k of possible outcomes over n independent trials. ‘x[i]’ indicates the number of times outcome number i was observed over the n trials.
\[f(x \mid n, p) = \frac{n!}{\prod_{i=1}^k x_i!} \prod_{i=1}^k p_i^{x_i}\]Support \(x \in \{0, 1, \ldots, n\}\) such that \(\sum x_i = n\) Mean \(n p_i\) Variance \(n p_i (1  p_i)\) Covariance \(n p_i p_j\) for \(i \ne j\) Parameters:  n (int or array) – Number of trials (n > 0).
 p (one or twodimensional array) – Probability of each one of the different outcomes. Elements must be nonnegative and sum to 1 along the last axis. They will be automatically rescaled otherwise.

class
pymc3.distributions.multivariate.
Wishart
(nu, V, *args, **kwargs)¶ Wishart loglikelihood.
The Wishart distribution is the probability distribution of the maximumlikelihood estimator (MLE) of the precision matrix of a multivariate normal distribution. If V=1, the distribution is identical to the chisquare distribution with nu degrees of freedom.
\[f(X \mid nu, T) = \frac{{\mid T \mid}^{nu/2}{\mid X \mid}^{(nuk1)/2}}{2^{nu k/2} \Gamma_p(nu/2)} \exp\left\{ \frac{1}{2} Tr(TX) \right\}\]where \(k\) is the rank of \(X\).
Support \(X(p x p)\) positive definite matrix Mean \(nu V\) Variance \(nu (v_{ij}^2 + v_{ii} v_{jj})\) Parameters:  nu (int) – Degrees of freedom, > 0.
 V (array) – p x p positive definite matrix.
Note
This distribution is unusable in a PyMC3 model. You should instead use LKJCholeskyCov or LKJCorr.

pymc3.distributions.multivariate.
WishartBartlett
(name, S, nu, is_cholesky=False, return_cholesky=False, testval=None)¶ Bartlett decomposition of the Wishart distribution. As the Wishart distribution requires the matrix to be symmetric positive semidefinite it is impossible for MCMC to ever propose acceptable matrices.
Instead, we can use the Barlett decomposition which samples a lower diagonal matrix. Specifically:
\[ \begin{align}\begin{aligned}\begin{split}\text{If} L \sim \begin{pmatrix} \sqrt{c_1} & 0 & 0 \\ z_{21} & \sqrt{c_2} & 0 \\ z_{31} & z_{32} & \sqrt{c_3} \end{pmatrix}\end{split}\\\begin{split}\text{with} c_i \sim \chi^2(ni+1) \text{ and } n_{ij} \sim \mathcal{N}(0, 1), \text{then} \\ L \times A \times A.T \times L.T \sim \text{Wishart}(L \times L.T, \nu)\end{split}\end{aligned}\end{align} \]See http://en.wikipedia.org/wiki/Wishart_distribution#Bartlett_decomposition for more information.
Parameters:  S (ndarray) – p x p positive definite matrix Or: p x p lowertriangular matrix that is the Cholesky factor of the covariance matrix.
 nu (int) – Degrees of freedom, > dim(S).
 is_cholesky (bool (default=False)) – Input matrix S is already Cholesky decomposed as S.T * S
 return_cholesky (bool (default=False)) – Only return the Cholesky decomposed matrix.
 testval (ndarray) – p x p positive definite matrix used to initialize
Note
This is not a standard Distribution class but follows a similar interface. Besides the Wishart distribution, it will add RVs c and z to your model which make up the matrix.
This distribution is usually a bad idea to use as a prior for multivariate normal. You should instead use LKJCholeskyCov or LKJCorr.

class
pymc3.distributions.multivariate.
LKJCorr
(eta=None, n=None, p=None, transform=’interval’, *args, **kwargs)¶ The LKJ (Lewandowski, Kurowicka and Joe) loglikelihood.
The LKJ distribution is a prior distribution for correlation matrices. If eta = 1 this corresponds to the uniform distribution over correlation matrices. For eta > oo the LKJ prior approaches the identity matrix.
Support Upper triangular matrix with values in [1, 1] Parameters:  n (int) – Dimension of the covariance matrix (n > 0).
 eta (float) – The shape parameter (eta > 0) of the LKJ distribution. eta = 1 implies a uniform distribution of the correlation matrices; larger values put more weight on matrices with few correlations.
Notes
This implementation only returns the values of the upper triangular matrix excluding the diagonal. Here is a schematic for n = 5, showing the indexes of the elements:
[[ 0 1 2 3] [  4 5 6] [   7 8] [    9] [    ]]
References
[LKJ2009] Lewandowski, D., Kurowicka, D. and Joe, H. (2009). “Generating random correlation matrices based on vines and extended onion method.” Journal of multivariate analysis, 100(9), pp.19892001.

class
pymc3.distributions.multivariate.
LKJCholeskyCov
(eta, n, sd_dist, *args, **kwargs)¶ Covariance matrix with LKJ distributed correlations.
This defines a distribution over cholesky decomposed covariance matrices, such that the underlying correlation matrices follow an LKJ distribution [1] and the standard deviations follow an arbitray distribution specified by the user.
Parameters:  n (int) – Dimension of the covariance matrix (n > 0).
 eta (float) – The shape parameter (eta > 0) of the LKJ distribution. eta = 1 implies a uniform distribution of the correlation matrices; larger values put more weight on matrices with few correlations.
 sd_dist (pm.Distribution) – A distribution for the standard deviations.
Notes
Since the cholesky factor is a lower triangular matrix, we use packed storge for the matrix: We store and return the values of the lower triangular matrix in a onedimensional array, numbered by row:
[[0   ] [1 2  ] [3 4 5 ] [6 7 8 9]]
You can use pm.expand_packed_triangular(packed_cov, lower=True) to convert this to a regular twodimensional array.
Examples
with pm.Model() as model: # Note that we access the distribution for the standard # deviations, and do not create a new random variable. sd_dist = pm.HalfCauchy.dist(beta=2.5) packed_chol = pm.LKJCholeskyCov('chol_cov', eta=2, n=10, sd_dist=sd_dist) chol = pm.expand_packed_triangular(10, packed_chol, lower=True) # Define a new MvNormal with the given covariance vals = pm.MvNormal('vals', mu=np.zeros(10), chol=chol, shape=10) # Or transform an uncorrelated normal: vals_raw = pm.Normal('vals_raw', mu=np.zeros(10), sd=1) vals = tt.dot(chol, vals_raw) # Or compute the covariance matrix cov = tt.dot(chol, chol.T) # Extract the standard deviations stds = tt.sqrt(tt.diag(cov))
In the unconstrained space all values of the cholesky factor are stored untransformed, except for the diagonal entries, where we use a logtransform to restrict them to positive values.
To correctly compute loglikelihoods for the standard deviations and the correlation matrix seperatly, we need to consider a second transformation: Given a cholesky factorization \(LL^T = \Sigma\) of a covariance matrix we can recover the standard deviations \(\sigma\) as the euclidean lengths of the rows of \(L\), and the cholesky factor of the correlation matrix as \(U = \text{diag}(\sigma)^{1}L\). Since each row of \(U\) has length 1, we do not need to store the diagonal. We define a transformation \(\phi\) such that \(\phi(L)\) is the lower triangular matrix containing the standard deviations \(\sigma\) on the diagonal and the correlation matrix \(U\) below. In this form we can easily compute the different likelihoods seperatly, as the likelihood of the correlation matrix only depends on the values below the diagonal, and the likelihood of the standard deviation depends only on the diagonal values.
We still need the determinant of the jacobian of \(\phi^{1}\). If we think of \(\phi\) as an automorphism on \(\mathbb{R}^{\tfrac{n(n+1)}{2}}\), where we order the dimensions as described in the notes above, the jacobian is a blockdiagonal matrix, where each block corresponds to one row of \(U\). Each block has arrowhead shape, and we can compute the determinant of that as described in [2]. Since the determinant of a blockdiagonal matrix is the product of the determinants of the blocks, we get
\[\text{det}(J_{\phi^{1}}(U)) = \left[ \prod_{i=2}^N u_{ii}^{i  1} L_{ii} \right]^{1}\]References
[1] Lewandowski, D., Kurowicka, D. and Joe, H. (2009). “Generating random correlation matrices based on vines and extended onion method.” Journal of multivariate analysis, 100(9), pp.19892001. [2] J. M. isn’t a mathematician (http://math.stackexchange.com/users/498/ jmisntamathematician), Different approaches to evaluate this determinant, URL (version: 20120414): http://math.stackexchange.com/q/130026