# API Reference¶

## Distributions¶

### Continuous¶

- Uniform([lower, upper, transform]): Continuous uniform log-likelihood.
- Flat(\*args, \*\*kwargs): Uninformative log-likelihood that returns 0 regardless of the passed value.
- Normal([mu, sd, tau]): Univariate normal log-likelihood.
- Beta([alpha, beta, mu, sd]): Beta log-likelihood.
- Exponential(lam, \*args, \*\*kwargs): Exponential log-likelihood.
- Laplace(mu, b, \*args, \*\*kwargs): Laplace log-likelihood.
- StudentT(nu[, mu, lam, sd]): Non-central Student's T log-likelihood.
- Cauchy(alpha, beta, \*args, \*\*kwargs): Cauchy log-likelihood.
- HalfCauchy(beta, \*args, \*\*kwargs): Half-Cauchy log-likelihood.
- Gamma([alpha, beta, mu, sd]): Gamma log-likelihood.
- Weibull(alpha, beta, \*args, \*\*kwargs): Weibull log-likelihood.
- StudentTpos(\*args, \*\*kwargs)
- Lognormal([mu, sd, tau]): Log-normal log-likelihood.
- ChiSquared(nu, \*args, \*\*kwargs): $$\chi^2$$ log-likelihood.
- HalfNormal([sd, tau]): Half-normal log-likelihood.
- Wald([mu, lam, phi, alpha]): Wald log-likelihood.
- Pareto(alpha, m, \*args, \*\*kwargs): Pareto log-likelihood.
- InverseGamma(alpha[, beta]): Inverse gamma log-likelihood, the reciprocal of the gamma distribution.
- ExGaussian(mu, sigma, nu, \*args, \*\*kwargs): Exponentially modified Gaussian log-likelihood.

pymc3.distributions

A collection of common probability distributions for stochastic nodes in PyMC.

class pymc3.distributions.continuous.Uniform(lower=0, upper=1, transform='interval', *args, **kwargs)

Continuous uniform log-likelihood.

$f(x \mid lower, upper) = \frac{1}{upper-lower}$
- Support: $$x \in [lower, upper]$$
- Mean: $$\dfrac{lower + upper}{2}$$
- Variance: $$\dfrac{(upper - lower)^2}{12}$$

Parameters:
- lower (float): Lower limit.
- upper (float): Upper limit.
class pymc3.distributions.continuous.Flat(*args, **kwargs)

Uninformative log-likelihood that returns 0 regardless of the passed value.

class pymc3.distributions.continuous.Normal(mu=0, sd=None, tau=None, **kwargs)

Univariate normal log-likelihood.

$f(x \mid \mu, \tau) = \sqrt{\frac{\tau}{2\pi}} \exp\left\{ -\frac{\tau}{2} (x-\mu)^2 \right\}$
- Support: $$x \in \mathbb{R}$$
- Mean: $$\mu$$
- Variance: $$\dfrac{1}{\tau}$$ or $$\sigma^2$$

Normal distribution can be parameterized either in terms of precision or standard deviation. The link between the two parametrizations is given by

$\tau = \dfrac{1}{\sigma^2}$
Parameters:
- mu (float): Mean.
- sd (float): Standard deviation (sd > 0).
- tau (float): Precision (tau > 0).
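The precision/standard-deviation link above can be checked with a couple of lines of plain Python (an illustration only, not part of the PyMC3 API):

```python
def sd_to_tau(sd):
    """Convert a standard deviation to the equivalent precision: tau = 1 / sd**2."""
    return 1.0 / sd ** 2

def tau_to_sd(tau):
    """Convert a precision back to a standard deviation: sd = 1 / sqrt(tau)."""
    return tau ** -0.5

# A Normal with sd=0.5 has precision tau=4.
print(sd_to_tau(0.5))  # 4.0
```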
class pymc3.distributions.continuous.Beta(alpha=None, beta=None, mu=None, sd=None, *args, **kwargs)

Beta log-likelihood.

$f(x \mid \alpha, \beta) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}$
- Support: $$x \in (0, 1)$$
- Mean: $$\dfrac{\alpha}{\alpha + \beta}$$
- Variance: $$\dfrac{\alpha \beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

Beta distribution can be parameterized either in terms of alpha and beta or mean and standard deviation. The link between the two parametrizations is given by

\begin{align}\begin{aligned}\begin{split}\alpha &= \mu \kappa \\ \beta &= (1 - \mu) \kappa\end{split}\\\text{where } \kappa = \frac{\mu(1-\mu)}{\sigma^2} - 1\end{aligned}\end{align}
Parameters:
- alpha (float): alpha > 0.
- beta (float): beta > 0.
- mu (float): Alternative mean (0 < mu < 1).
- sd (float): Alternative standard deviation (sd > 0).

Notes

The beta distribution is the conjugate prior for the parameter $$p$$ of the binomial distribution.
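The mean/standard-deviation parametrization above converts to shape parameters with a small helper (plain Python with a hypothetical function name, not part of PyMC3):

```python
def beta_mu_sd_to_shapes(mu, sd):
    """Recover (alpha, beta) from the alternative (mu, sd) parametrization.

    Uses kappa = mu*(1 - mu)/sd**2 - 1, then alpha = mu*kappa, beta = (1 - mu)*kappa.
    Requires sd**2 < mu*(1 - mu) so that kappa > 0.
    """
    kappa = mu * (1 - mu) / sd ** 2 - 1
    return mu * kappa, (1 - mu) * kappa

# mu=0.5, sd=0.25 gives kappa = 0.25/0.0625 - 1 = 3, hence alpha = beta = 1.5.
alpha, beta = beta_mu_sd_to_shapes(0.5, 0.25)
print(alpha, beta)
```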

class pymc3.distributions.continuous.Exponential(lam, *args, **kwargs)

Exponential log-likelihood.

$f(x \mid \lambda) = \lambda \exp\left\{ -\lambda x \right\}$
- Support: $$x \in [0, \infty)$$
- Mean: $$\dfrac{1}{\lambda}$$
- Variance: $$\dfrac{1}{\lambda^2}$$

Parameters:
- lam (float): Rate or inverse scale (lam > 0).
class pymc3.distributions.continuous.Laplace(mu, b, *args, **kwargs)

Laplace log-likelihood.

$f(x \mid \mu, b) = \frac{1}{2b} \exp \left\{ - \frac{|x - \mu|}{b} \right\}$

- Support: $$x \in \mathbb{R}$$
- Mean: $$\mu$$
- Variance: $$2 b^2$$

Parameters:
- mu (float): Location parameter.
- b (float): Scale parameter (b > 0).
class pymc3.distributions.continuous.StudentT(nu, mu=0, lam=None, sd=None, *args, **kwargs)

Non-central Student’s T log-likelihood.

Describes a normal variable whose precision is gamma distributed. If only nu parameter is passed, this specifies a standard (central) Student’s T.

$f(x|\mu,\lambda,\nu) = \frac{\Gamma(\frac{\nu + 1}{2})}{\Gamma(\frac{\nu}{2})} \left(\frac{\lambda}{\pi\nu}\right)^{\frac{1}{2}} \left[1+\frac{\lambda(x-\mu)^2}{\nu}\right]^{-\frac{\nu+1}{2}}$
- Support: $$x \in \mathbb{R}$$

Parameters:
- nu (int): Degrees of freedom (nu > 0).
- mu (float): Location parameter.
- lam (float): Scale parameter (lam > 0).
- sd (float): Alternative scale parameter (sd > 0).
class pymc3.distributions.continuous.Cauchy(alpha, beta, *args, **kwargs)

Cauchy log-likelihood.

Also known as the Lorentz or the Breit-Wigner distribution.

$f(x \mid \alpha, \beta) = \frac{1}{\pi \beta [1 + (\frac{x-\alpha}{\beta})^2]}$
- Support: $$x \in \mathbb{R}$$
- Mode: $$\alpha$$
- Mean: undefined
- Variance: undefined

Parameters:
- alpha (float): Location parameter.
- beta (float): Scale parameter (beta > 0).
class pymc3.distributions.continuous.HalfCauchy(beta, *args, **kwargs)

Half-Cauchy log-likelihood.

$f(x \mid \beta) = \frac{2}{\pi \beta [1 + (\frac{x}{\beta})^2]}$
- Support: $$x \in [0, \infty)$$
- Mode: 0
- Mean: undefined
- Variance: undefined

Parameters:
- beta (float): Scale parameter (beta > 0).
class pymc3.distributions.continuous.Gamma(alpha=None, beta=None, mu=None, sd=None, *args, **kwargs)

Gamma log-likelihood.

Represents the sum of alpha exponentially distributed random variables, each of which has rate beta.

$f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}x^{\alpha-1}e^{-\beta x}}{\Gamma(\alpha)}$
- Support: $$x \in (0, \infty)$$
- Mean: $$\dfrac{\alpha}{\beta}$$
- Variance: $$\dfrac{\alpha}{\beta^2}$$

Gamma distribution can be parameterized either in terms of alpha and beta or mean and standard deviation. The link between the two parametrizations is given by

$\begin{split}\alpha &= \frac{\mu^2}{\sigma^2} \\ \beta &= \frac{\mu}{\sigma^2}\end{split}$
Parameters:
- alpha (float): Shape parameter (alpha > 0).
- beta (float): Rate parameter (beta > 0).
- mu (float): Alternative mean parameter (mu > 0).
- sd (float): Alternative standard deviation (sd > 0).
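The mean/standard-deviation link above translates to a one-line conversion (plain Python with a hypothetical function name, not part of PyMC3):

```python
def gamma_mu_sd_to_shapes(mu, sd):
    """Recover (alpha, beta) from (mu, sd): alpha = mu**2 / sd**2, beta = mu / sd**2."""
    return mu ** 2 / sd ** 2, mu / sd ** 2

# mu=4, sd=2 gives alpha=4, beta=1; check: mean alpha/beta = 4, variance alpha/beta**2 = 4.
alpha, beta = gamma_mu_sd_to_shapes(4.0, 2.0)
print(alpha, beta)
```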
class pymc3.distributions.continuous.Weibull(alpha, beta, *args, **kwargs)

Weibull log-likelihood.

$f(x \mid \alpha, \beta) = \frac{\alpha x^{\alpha - 1} \exp(-(\frac{x}{\beta})^{\alpha})}{\beta^\alpha}$
- Support: $$x \in [0, \infty)$$
- Mean: $$\beta \Gamma(1 + \frac{1}{\alpha})$$
- Variance: $$\beta^2 \Gamma(1 + \frac{2}{\alpha}) - \mu^2$$, where $$\mu$$ is the mean above

Parameters:
- alpha (float): Shape parameter (alpha > 0).
- beta (float): Scale parameter (beta > 0).
class pymc3.distributions.continuous.Lognormal(mu=0, sd=None, tau=None, *args, **kwargs)

Log-normal log-likelihood.

Distribution of any random variable whose logarithm is normally distributed. A variable might be modeled as log-normal if it can be thought of as the multiplicative product of many small independent factors.

$f(x \mid \mu, \tau) = \sqrt{\frac{\tau}{2\pi}} \frac{\exp\left\{ -\frac{\tau}{2} (\ln(x)-\mu)^2 \right\}}{x}$
- Support: $$x \in (0, \infty)$$
- Mean: $$\exp\{\mu + \frac{1}{2\tau}\}$$
- Variance: $$\left(\exp\{\frac{1}{\tau}\} - 1\right) \exp\{2\mu + \frac{1}{\tau}\}$$

Parameters:
- mu (float): Location parameter.
- tau (float): Precision (tau > 0).
- sd (float): Alternative scale parameter (sd > 0).
class pymc3.distributions.continuous.ChiSquared(nu, *args, **kwargs)

$$\chi^2$$ log-likelihood.

$f(x \mid \nu) = \frac{x^{(\nu-2)/2}e^{-x/2}}{2^{\nu/2}\Gamma(\nu/2)}$
- Support: $$x \in [0, \infty)$$
- Mean: $$\nu$$
- Variance: $$2 \nu$$

Parameters:
- nu (int): Degrees of freedom (nu > 0).
class pymc3.distributions.continuous.HalfNormal(sd=None, tau=None, *args, **kwargs)

Half-normal log-likelihood.

$f(x \mid \tau) = \sqrt{\frac{2\tau}{\pi}} \exp\left\{ {\frac{-x^2 \tau}{2}}\right\}$
- Support: $$x \in [0, \infty)$$
- Mean: $$\sqrt{\dfrac{2}{\pi \tau}}$$
- Variance: $$\dfrac{1}{\tau}\left(1 - \dfrac{2}{\pi}\right)$$

Parameters:
- sd (float): Standard deviation (sd > 0).
- tau (float): Precision (tau > 0).
class pymc3.distributions.continuous.Wald(mu=None, lam=None, phi=None, alpha=0.0, *args, **kwargs)

Wald log-likelihood.

$f(x \mid \mu, \lambda) = \left(\frac{\lambda}{2\pi}\right)^{1/2} x^{-3/2} \exp\left\{ -\frac{\lambda}{2x}\left(\frac{x-\mu}{\mu}\right)^2 \right\}$

- Support: $$x \in (0, \infty)$$
- Mean: $$\mu$$
- Variance: $$\dfrac{\mu^3}{\lambda}$$

Wald distribution can be parameterized either in terms of lam or phi. The link between the two parametrizations is given by

$\phi = \dfrac{\lambda}{\mu}$
Parameters:
- mu (float, optional): Mean of the distribution (mu > 0).
- lam (float, optional): Relative precision (lam > 0).
- phi (float, optional): Alternative shape parameter (phi > 0).
- alpha (float, optional): Shift/location parameter (alpha >= 0).

Notes

To instantiate the distribution specify any of the following

• only mu (in this case lam will be 1)
• mu and lam
• mu and phi
• lam and phi
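The selection rules above, combined with the link $$\phi = \lambda / \mu$$, can be sketched as follows (a plain-Python sketch with a hypothetical helper name, not PyMC3's internal logic):

```python
def resolve_wald_params(mu=None, lam=None, phi=None):
    """Resolve (mu, lam) from any valid combination, using phi = lam / mu."""
    if mu is not None and lam is None and phi is None:
        lam = 1.0                      # only mu given: lam defaults to 1
    elif mu is not None and phi is not None and lam is None:
        lam = mu * phi                 # mu and phi given
    elif lam is not None and phi is not None and mu is None:
        mu = lam / phi                 # lam and phi given
    elif mu is None or lam is None:
        raise ValueError("specify mu, (mu, lam), (mu, phi) or (lam, phi)")
    return mu, lam

print(resolve_wald_params(mu=2.0))            # (2.0, 1.0)
print(resolve_wald_params(lam=6.0, phi=3.0))  # (2.0, 6.0)
```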

References

 [Tweedie1957] Tweedie, M. C. K. (1957). Statistical Properties of Inverse Gaussian Distributions I. The Annals of Mathematical Statistics, Vol. 28, No. 2, pp. 362-377
 [Michael1976] Michael, J. R., Schucany, W. R. and Haas, R. W. (1976). Generating Random Variates Using Transformations with Multiple Roots. The American Statistician, Vol. 30, No. 2, pp. 88-90
class pymc3.distributions.continuous.Pareto(alpha, m, *args, **kwargs)

Pareto log-likelihood.

Often used to characterize wealth distribution, or other examples of the 80/20 rule.

$f(x \mid \alpha, m) = \frac{\alpha m^{\alpha}}{x^{\alpha+1}}$
- Support: $$x \in [m, \infty)$$
- Mean: $$\dfrac{\alpha m}{\alpha - 1}$$ for $$\alpha > 1$$
- Variance: $$\dfrac{m^2 \alpha}{(\alpha - 1)^2 (\alpha - 2)}$$ for $$\alpha > 2$$

Parameters:
- alpha (float): Shape parameter (alpha > 0).
- m (float): Scale parameter (m > 0).
class pymc3.distributions.continuous.InverseGamma(alpha, beta=1, *args, **kwargs)

Inverse gamma log-likelihood, the reciprocal of the gamma distribution.

$f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{-\alpha - 1} \exp\left(\frac{-\beta}{x}\right)$
- Support: $$x \in (0, \infty)$$
- Mean: $$\dfrac{\beta}{\alpha-1}$$ for $$\alpha > 1$$
- Variance: $$\dfrac{\beta^2}{(\alpha-1)^2(\alpha-2)}$$ for $$\alpha > 2$$

Parameters:
- alpha (float): Shape parameter (alpha > 0).
- beta (float): Scale parameter (beta > 0).
class pymc3.distributions.continuous.ExGaussian(mu, sigma, nu, *args, **kwargs)

Exponentially modified Gaussian log-likelihood.

Results from the convolution of a normal distribution with an exponential distribution.

$f(x \mid \mu, \sigma, \nu) = \frac{1}{\nu}\; \exp\left\{\frac{\mu-x}{\nu}+\frac{\sigma^2}{2\nu^2}\right\} \Phi\left(\frac{x-\mu}{\sigma}-\frac{\sigma}{\nu}\right)$

where $$\Phi$$ is the cumulative distribution function of the standard normal distribution.

- Support: $$x \in \mathbb{R}$$
- Mean: $$\mu + \nu$$
- Variance: $$\sigma^2 + \nu^2$$

Parameters:
- mu (float): Mean of the normal distribution.
- sigma (float): Standard deviation of the normal distribution (sigma > 0).
- nu (float): Mean of the exponential distribution (nu > 0).

References

 [Rigby2005] Rigby R.A. and Stasinopoulos D.M. (2005). “Generalized additive models for location, scale and shape”. Applied Statistics, 54, part 3, pp 507-554.
 [Lacouture2008] Lacouture, Y. and Cousineau, D. (2008). “How to use MATLAB to fit the ex-Gaussian and other probability functions to a distribution of response times”. Tutorials in Quantitative Methods for Psychology, Vol. 4, No. 1, pp 35-45.
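The ExGaussian density above can be evaluated directly, writing $$\Phi$$ via the error function (a plain-Python sketch with a hypothetical function name, not part of PyMC3):

```python
import math

def exgaussian_pdf(x, mu, sigma, nu):
    """Density of the exponentially modified Gaussian, following the formula above.

    Phi, the standard normal CDF, is written via math.erf.
    """
    phi = 0.5 * (1.0 + math.erf(((x - mu) / sigma - sigma / nu) / math.sqrt(2.0)))
    return (1.0 / nu) * math.exp((mu - x) / nu + sigma ** 2 / (2 * nu ** 2)) * phi

# Crude Riemann sum over [-10, 30]: the density should integrate to ~1.
total = 0.01 * sum(exgaussian_pdf(-10 + 0.01 * i, 0.0, 1.0, 1.0) for i in range(4001))
print(total)
```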
class pymc3.distributions.continuous.VonMises(mu=0.0, kappa=None, transform='circular', *args, **kwargs)

Univariate VonMises log-likelihood.

$f(x \mid \mu, \kappa) = \frac{e^{\kappa\cos(x-\mu)}}{2\pi I_0(\kappa)}$

where $$I_0$$ is the modified Bessel function of the first kind of order 0.

- Support: $$x \in [-\pi, \pi]$$
- Mean: $$\mu$$
- Variance: $$1-\frac{I_1(\kappa)}{I_0(\kappa)}$$

Parameters:
- mu (float): Mean.
- kappa (float): Concentration ($$\frac{1}{\kappa}$$ is analogous to $$\sigma^2$$).
class pymc3.distributions.continuous.SkewNormal(mu=0.0, sd=None, tau=None, alpha=1, *args, **kwargs)

Univariate skew-normal log-likelihood.

$f(x \mid \mu, \tau, \alpha) = 2 \Phi((x-\mu)\sqrt{\tau}\alpha) \phi(x,\mu,\tau)$
- Support: $$x \in \mathbb{R}$$
- Mean: $$\mu + \sigma \sqrt{\frac{2}{\pi}} \frac{\alpha}{\sqrt{1+\alpha^2}}$$
- Variance: $$\sigma^2 \left( 1-\frac{2\alpha^2}{(\alpha^2+1) \pi} \right)$$

Skew-normal distribution can be parameterized either in terms of precision or standard deviation. The link between the two parametrizations is given by

$\tau = \dfrac{1}{\sigma^2}$
Parameters:
- mu (float): Location parameter.
- sd (float): Scale parameter (sd > 0).
- tau (float): Alternative scale parameter (tau > 0).
- alpha (float): Skewness parameter.

Notes

When alpha=0 we recover the normal distribution, and mu becomes the mean, tau the precision, and sd the standard deviation. In the limit of alpha approaching plus or minus infinity we obtain a half-normal distribution.

### Discrete¶

- Binomial(n, p, \*args, \*\*kwargs): Binomial log-likelihood.
- BetaBinomial(alpha, beta, n, \*args, \*\*kwargs): Beta-binomial log-likelihood.
- Bernoulli(p, \*args, \*\*kwargs): Bernoulli log-likelihood.
- Poisson(mu, \*args, \*\*kwargs): Poisson log-likelihood.
- NegativeBinomial(mu, alpha, \*args, \*\*kwargs): Negative binomial log-likelihood.
- ConstantDist(\*args, \*\*kwargs)
- ZeroInflatedPoisson(theta, psi, \*args, \*\*kwargs): Zero-inflated Poisson log-likelihood.
- DiscreteUniform(lower, upper, \*args, \*\*kwargs): Discrete uniform distribution.
- Geometric(p, \*args, \*\*kwargs): Geometric log-likelihood.
- Categorical(p, \*args, \*\*kwargs): Categorical log-likelihood.
class pymc3.distributions.discrete.Binomial(n, p, *args, **kwargs)

Binomial log-likelihood.

The discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.

$f(x \mid n, p) = \binom{n}{x} p^x (1-p)^{n-x}$
- Support: $$x \in \{0, 1, \ldots, n\}$$
- Mean: $$n p$$
- Variance: $$n p (1 - p)$$

Parameters:
- n (int): Number of Bernoulli trials (n >= 0).
- p (float): Probability of success in each trial (0 < p < 1).
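The pmf above is straightforward to evaluate in plain Python (illustrative only, not part of the PyMC3 API):

```python
import math

def binomial_pmf(x, n, p):
    """f(x | n, p) = C(n, x) * p**x * (1-p)**(n-x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Four fair coin flips: P(exactly 2 heads) = 6 / 16 = 0.375.
print(binomial_pmf(2, 4, 0.5))
```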
class pymc3.distributions.discrete.BetaBinomial(alpha, beta, n, *args, **kwargs)

Beta-binomial log-likelihood.

Equivalent to binomial random variable with success probability drawn from a beta distribution.

$f(x \mid \alpha, \beta, n) = \binom{n}{x} \frac{B(x + \alpha, n - x + \beta)}{B(\alpha, \beta)}$
- Support: $$x \in \{0, 1, \ldots, n\}$$
- Mean: $$n \dfrac{\alpha}{\alpha + \beta}$$
- Variance: $$n \dfrac{\alpha \beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}$$

Parameters:
- n (int): Number of Bernoulli trials (n >= 0).
- alpha (float): alpha > 0.
- beta (float): beta > 0.
class pymc3.distributions.discrete.Bernoulli(p, *args, **kwargs)

Bernoulli log-likelihood

The Bernoulli distribution describes the probability of successes (x=1) and failures (x=0).

$f(x \mid p) = p^{x} (1-p)^{1-x}$
- Support: $$x \in \{0, 1\}$$
- Mean: $$p$$
- Variance: $$p (1 - p)$$

Parameters:
- p (float): Probability of success (0 < p < 1).
class pymc3.distributions.discrete.DiscreteWeibull(q, beta, *args, **kwargs)

Discrete Weibull log-likelihood

The discrete Weibull distribution is a flexible model of count data that can handle both over- and under-dispersion.

$f(x \mid q, \beta) = q^{x^{\beta}} - q^{(x + 1)^{\beta}}$
- Support: $$x \in \mathbb{N}_0$$
- Mean: $$\mu = \sum_{x = 1}^{\infty} q^{x^{\beta}}$$
- Variance: $$2 \sum_{x = 1}^{\infty} x q^{x^{\beta}} - \mu - \mu^2$$
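A quick way to see that the pmf above is proper: with beta = 1 the sum telescopes to a geometric distribution on {0, 1, 2, ...} (a plain-Python illustration, not PyMC3 internals):

```python
def discrete_weibull_pmf(x, q, beta):
    """f(x | q, beta) = q**(x**beta) - q**((x + 1)**beta), for x = 0, 1, 2, ..."""
    return q ** (x ** beta) - q ** ((x + 1) ** beta)

# With beta = 1 each term is q**x - q**(x+1) = q**x * (1 - q), so the sum telescopes.
total = sum(discrete_weibull_pmf(x, 0.5, 1.0) for x in range(50))
print(total)
```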
class pymc3.distributions.discrete.Poisson(mu, *args, **kwargs)

Poisson log-likelihood.

Often used to model the number of events occurring in a fixed period of time when the times at which events occur are independent.

$f(x \mid \mu) = \frac{e^{-\mu}\mu^x}{x!}$
- Support: $$x \in \mathbb{N}_0$$
- Mean: $$\mu$$
- Variance: $$\mu$$

Parameters:
- mu (float): Expected number of occurrences during the given interval (mu >= 0).

Notes

The Poisson distribution can be derived as a limiting case of the binomial distribution.
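The pmf and its mean can be checked numerically in plain Python (illustrative only, not part of PyMC3):

```python
import math

def poisson_pmf(x, mu):
    """f(x | mu) = exp(-mu) * mu**x / x!"""
    return math.exp(-mu) * mu ** x / math.factorial(x)

# Recover the mean of Poisson(3) from the pmf; the truncated sum is ~3.0.
mean = sum(x * poisson_pmf(x, 3.0) for x in range(100))
print(mean)
```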

class pymc3.distributions.discrete.NegativeBinomial(mu, alpha, *args, **kwargs)

Negative binomial log-likelihood.

The negative binomial distribution describes a Poisson random variable whose rate parameter is gamma distributed.

$f(x \mid \mu, \alpha) = \frac{\Gamma(x+\alpha)}{x! \Gamma(\alpha)} (\alpha/(\mu+\alpha))^\alpha (\mu/(\mu+\alpha))^x$
- Support: $$x \in \mathbb{N}_0$$
- Mean: $$\mu$$
- Variance: $$\mu + \dfrac{\mu^2}{\alpha}$$

Parameters:
- mu (float): Poisson distribution parameter (mu > 0).
- alpha (float): Gamma distribution parameter (alpha > 0).
class pymc3.distributions.discrete.Constant(c, *args, **kwargs)

Constant log-likelihood.

Parameters:
- c (float or int): Constant parameter.
class pymc3.distributions.discrete.ZeroInflatedPoisson(theta, psi, *args, **kwargs)

Zero-inflated Poisson log-likelihood.

Often used to model the number of events occurring in a fixed period of time when the times at which events occur are independent.

$\begin{split}f(x \mid \theta, \psi) = \left\{ \begin{array}{l} (1-\psi) + \psi e^{-\theta}, \text{if } x = 0 \\ \psi \frac{e^{-\theta}\theta^x}{x!}, \text{if } x=1,2,3,\ldots \end{array} \right.\end{split}$
- Support: $$x \in \mathbb{N}_0$$
- Mean: $$\psi\theta$$
- Variance: $$\psi\theta \left(1 + \theta(1 - \psi)\right)$$

Parameters:
- theta (float): Expected number of occurrences during the given interval (theta >= 0).
- psi (float): Expected proportion of Poisson variates (0 < psi < 1).
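The piecewise pmf above translates directly to plain Python (a sketch, not PyMC3 internals):

```python
import math

def zip_pmf(x, theta, psi):
    """Zero-inflated Poisson: extra mass at zero mixed with a Poisson(theta)."""
    poisson = math.exp(-theta) * theta ** x / math.factorial(x)
    if x == 0:
        return (1 - psi) + psi * poisson
    return psi * poisson

# The pmf sums to 1 and the mean is psi * theta = 0.7 * 2 = 1.4.
total = sum(zip_pmf(x, 2.0, 0.7) for x in range(100))
mean = sum(x * zip_pmf(x, 2.0, 0.7) for x in range(100))
print(total, mean)
```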
class pymc3.distributions.discrete.ZeroInflatedNegativeBinomial(mu, alpha, psi, *args, **kwargs)

Zero-Inflated Negative binomial log-likelihood.

The Zero-inflated version of the Negative Binomial (NB). The NB distribution describes a Poisson random variable whose rate parameter is gamma distributed.

$\begin{split}f(x \mid \mu, \alpha, \psi) = \left\{ \begin{array}{l} (1-\psi) + \psi \left (\frac{\alpha}{\alpha+\mu} \right) ^\alpha, \text{if } x = 0 \\ \psi \frac{\Gamma(x+\alpha)}{x! \Gamma(\alpha)} \left (\frac{\alpha}{\mu+\alpha} \right)^\alpha \left( \frac{\mu}{\mu+\alpha} \right)^x, \text{if } x=1,2,3,\ldots \end{array} \right.\end{split}$
- Support: $$x \in \mathbb{N}_0$$
- Mean: $$\psi\mu$$
- Variance: $$\psi\mu\left(1 + \frac{\mu}{\alpha}\right) + \psi(1-\psi)\mu^2$$

Parameters:
- mu (float): Poisson distribution parameter (mu > 0).
- alpha (float): Gamma distribution parameter (alpha > 0).
- psi (float): Expected proportion of NegativeBinomial variates (0 < psi < 1).
class pymc3.distributions.discrete.DiscreteUniform(lower, upper, *args, **kwargs)

Discrete uniform distribution.

$f(x \mid lower, upper) = \frac{1}{upper - lower + 1}$

- Support: $$x \in \{lower, lower + 1, \ldots, upper\}$$
- Mean: $$\dfrac{lower + upper}{2}$$
- Variance: $$\dfrac{(upper - lower + 1)^2 - 1}{12}$$

Parameters:
- lower (int): Lower limit.
- upper (int): Upper limit (upper > lower).
class pymc3.distributions.discrete.Geometric(p, *args, **kwargs)

Geometric log-likelihood.

The probability that the first success in a sequence of Bernoulli trials occurs on the x’th trial.

$f(x \mid p) = p(1-p)^{x-1}$
- Support: $$x \in \mathbb{N}_{>0}$$
- Mean: $$\dfrac{1}{p}$$
- Variance: $$\dfrac{1 - p}{p^2}$$

Parameters:
- p (float): Probability of success on an individual trial (0 < p <= 1).
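A numerical check that the mean $$1/p$$ follows from the pmf above (plain Python, illustrative only):

```python
def geometric_pmf(x, p):
    """f(x | p) = p * (1 - p)**(x - 1), for x = 1, 2, 3, ..."""
    return p * (1 - p) ** (x - 1)

# With p = 0.25 the mean is 1/p = 4; the truncated sum converges to it quickly.
mean = sum(x * geometric_pmf(x, 0.25) for x in range(1, 500))
print(mean)
```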
class pymc3.distributions.discrete.Categorical(p, *args, **kwargs)

Categorical log-likelihood.

The most general discrete distribution.

$f(x \mid p) = p_x$
- Support: $$x \in \{1, 2, \ldots, |p|\}$$

Parameters:
- p (array of floats): p > 0; the elements of p must sum to 1 (they will be rescaled automatically otherwise).

### Multivariate¶

- MvNormal(mu[, cov, tau]): Multivariate normal log-likelihood.
- Wishart(n, V, \*args, \*\*kwargs): Wishart log-likelihood.
- LKJCorr(n, p, \*args, \*\*kwargs): The LKJ (Lewandowski, Kurowicka and Joe) log-likelihood.
- Multinomial(n, p, \*args, \*\*kwargs): Multinomial log-likelihood.
- Dirichlet(a[, transform]): Dirichlet log-likelihood.
class pymc3.distributions.multivariate.MvNormal(mu, cov=None, tau=None, *args, **kwargs)

Multivariate normal log-likelihood.

$f(x \mid \mu, T) = \frac{|T|^{1/2}}{(2\pi)^{k/2}} \exp\left\{ -\frac{1}{2} (x-\mu)^{\prime} T (x-\mu) \right\}$

- Support: $$x \in \mathbb{R}^k$$
- Mean: $$\mu$$
- Variance: $$T^{-1}$$

Parameters:
- mu (array): Vector of means.
- cov (array, optional): Covariance matrix.
- tau (array, optional): Precision matrix.
class pymc3.distributions.multivariate.MvStudentT(nu, Sigma, mu=None, *args, **kwargs)

Multivariate Student-T log-likelihood.

$f(\mathbf{x}| \nu,\mu,\Sigma) = \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}\left|{\Sigma}\right|^{1/2}\left[1+\frac{1}{\nu}({\mathbf x}-{\mu})^T{\Sigma}^{-1}({\mathbf x}-{\mu})\right]^{(\nu+p)/2}}$
- Support: $$x \in \mathbb{R}^k$$
- Mean: $$\mu$$ if $$\nu > 1$$, else undefined
- Variance: $$\frac{\nu}{\nu-2}\Sigma$$ if $$\nu > 2$$, else undefined

Parameters:
- nu (int): Degrees of freedom.
- Sigma (matrix): Covariance matrix.
- mu (array): Vector of means.
class pymc3.distributions.multivariate.Dirichlet(a, transform=<pymc3.distributions.transforms.StickBreaking object>, *args, **kwargs)

Dirichlet log-likelihood.

$f(\mathbf{x} \mid \mathbf{a}) = \frac{\Gamma(\sum_{i=1}^k a_i)}{\prod_{i=1}^k \Gamma(a_i)} \prod_{i=1}^{k-1} x_i^{a_i - 1} \left(1-\sum_{i=1}^{k-1}x_i\right)^{a_k - 1}$

- Support: $$x_i \in (0, 1)$$ for $$i \in \{1, \ldots, k\}$$ such that $$\sum x_i = 1$$
- Mean: $$\dfrac{a_i}{a_0}$$ where $$a_0 = \sum a_i$$
- Variance: $$\dfrac{a_i (a_0 - a_i)}{a_0^2 (a_0 + 1)}$$

Parameters:
- a (array): Concentration parameters (a > 0).

Notes

Only the first k-1 elements of x are expected. Can be used as a parent of Multinomial and Categorical nevertheless.

class pymc3.distributions.multivariate.Multinomial(n, p, *args, **kwargs)

Multinomial log-likelihood.

Generalizes binomial distribution, but instead of each trial resulting in “success” or “failure”, each one results in exactly one of some fixed finite number k of possible outcomes over n independent trials. ‘x[i]’ indicates the number of times outcome number i was observed over the n trials.

$f(x \mid n, p) = \frac{n!}{\prod_{i=1}^k x_i!} \prod_{i=1}^k p_i^{x_i}$
- Support: $$x \in \{0, 1, \ldots, n\}$$ such that $$\sum x_i = n$$
- Mean: $$n p_i$$
- Variance: $$n p_i (1 - p_i)$$
- Covariance: $$-n p_i p_j$$ for $$i \ne j$$

Parameters:
- n (int or array): Number of trials (n > 0).
- p (one- or two-dimensional array): Probability of each one of the different outcomes. Elements must be non-negative and sum to 1 along the last axis (they will be rescaled automatically otherwise).
class pymc3.distributions.multivariate.Wishart(n, V, *args, **kwargs)

Wishart log-likelihood.

The Wishart distribution is the probability distribution of the maximum-likelihood estimator (MLE) of the precision matrix of a multivariate normal distribution. If V=1, the distribution is identical to the chi-square distribution with n degrees of freedom.

$f(X \mid n, T) = \frac{{\mid T \mid}^{n/2}{\mid X \mid}^{(n-k-1)/2}}{2^{nk/2} \Gamma_p(n/2)} \exp\left\{ -\frac{1}{2} Tr(TX) \right\}$

where $$k$$ is the rank of $$X$$.

- Support: $$X$$, a $$p \times p$$ positive definite matrix
- Mean: $$n V$$
- Variance: $$n (v_{ij}^2 + v_{ii} v_{jj})$$

Parameters:
- n (int): Degrees of freedom (n > 0).
- V (array): p x p positive definite matrix.
pymc3.distributions.multivariate.WishartBartlett(name, S, nu, is_cholesky=False, return_cholesky=False, testval=None)

Bartlett decomposition of the Wishart distribution. As the Wishart distribution requires the matrix to be symmetric positive semi-definite it is impossible for MCMC to ever propose acceptable matrices.

Instead, we can use the Bartlett decomposition, which samples a lower-triangular matrix. Specifically:

\begin{align}\begin{aligned}\begin{split}\text{If } A = \begin{pmatrix} \sqrt{c_1} & 0 & 0 \\ z_{21} & \sqrt{c_2} & 0 \\ z_{31} & z_{32} & \sqrt{c_3} \end{pmatrix}\end{split}\\\begin{split}\text{with } c_i \sim \chi^2(n-i+1) \text{ and } z_{ij} \sim \mathcal{N}(0, 1), \text{ then} \\ L \times A \times A^T \times L^T \sim \text{Wishart}(L \times L^T, \nu)\end{split}\end{aligned}\end{align}

Parameters:
- S (ndarray): p x p positive definite matrix, or p x p lower-triangular matrix that is the Cholesky factor of the covariance matrix.
- nu (int): Degrees of freedom (nu > dim(S)).
- is_cholesky (bool, default=False): Input matrix S is already Cholesky decomposed as S.T * S.
- return_cholesky (bool, default=False): Only return the Cholesky decomposed matrix.
- testval (ndarray): p x p positive definite matrix used to initialize.
class pymc3.distributions.multivariate.LKJCorr(n, p, *args, **kwargs)

The LKJ (Lewandowski, Kurowicka and Joe) log-likelihood.

The LKJ distribution is a prior distribution for correlation matrices. If n = 1 this corresponds to the uniform distribution over correlation matrices. As $$n \to \infty$$ the LKJ prior concentrates on the identity matrix.

- Support: Upper triangular matrix with values in [-1, 1]

Parameters:
- n (float): Shape parameter (n > 0). Uniform distribution at n = 1.
- p (int): Dimension of correlation matrix (p > 0).

Notes

This implementation only returns the values of the upper triangular matrix excluding the diagonal. Here is a schematic for p = 5, showing the indexes of the elements:

[[- 0 1 2 3]
[- - 4 5 6]
[- - - 7 8]
[- - - - 9]
[- - - - -]]
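The index layout shown in the schematic can be generated for any p (a plain-Python sketch with a hypothetical helper name, mirroring the schematic rather than PyMC3 internals):

```python
def upper_triangle_indexes(p):
    """Lay out flat indices 0 .. p*(p-1)/2 - 1 over the strictly upper triangle
    of a p x p matrix; all other entries are '-'."""
    grid = [['-'] * p for _ in range(p)]
    k = 0
    for i in range(p):
        for j in range(i + 1, p):
            grid[i][j] = k
            k += 1
    return grid

for row in upper_triangle_indexes(5):
    print(row)
```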


References

 [LKJ2009] Lewandowski, D., Kurowicka, D. and Joe, H. (2009). “Generating random correlation matrices based on vines and extended onion method.” Journal of multivariate analysis, 100(9), pp.1989-2001.

### Mixture¶

- Mixture(w, comp_dists, \*args, \*\*kwargs): Mixture log-likelihood.
- NormalMixture(w, mu, \*args, \*\*kwargs): Normal mixture log-likelihood.
class pymc3.distributions.mixture.Mixture(w, comp_dists, *args, **kwargs)

Mixture log-likelihood

Often used to model subpopulation heterogeneity

$f(x \mid w, \theta) = \sum_{i = 1}^n w_i f_i(x \mid \theta_i)$
- Support: $$\cap_{i = 1}^n \textrm{support}(f_i)$$
- Mean: $$\sum_{i = 1}^n w_i \mu_i$$

Parameters:
- w (array of floats): The mixture weights (0 <= w_i <= 1).
- comp_dists (multidimensional PyMC3 distribution or iterable of one-dimensional PyMC3 distributions): The component distributions $$f_1, \ldots, f_n$$.
class pymc3.distributions.mixture.NormalMixture(w, mu, *args, **kwargs)

Normal mixture log-likelihood

$f(x \mid w, \mu, \sigma^2) = \sum_{i = 1}^n w_i N(x \mid \mu_i, \sigma^2_i)$
- Support: $$x \in \mathbb{R}$$
- Mean: $$\sum_{i = 1}^n w_i \mu_i$$
- Variance: $$\sum_{i = 1}^n w_i (\sigma^2_i + \mu_i^2) - \left(\sum_{i = 1}^n w_i \mu_i\right)^2$$

Parameters:
- w (array of floats): The mixture weights (0 <= w_i <= 1).
- mu (array of floats): The component means.
- sd (array of floats): The component standard deviations.
- tau (array of floats): The component precisions.
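The mixture density above can be evaluated directly (a plain-Python sketch with hypothetical helper names, not part of PyMC3):

```python
import math

def normal_pdf(x, mu, sd):
    # Univariate normal density with mean mu and standard deviation sd.
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def normal_mixture_pdf(x, w, mu, sd):
    """f(x | w, mu, sigma) = sum_i w_i * N(x | mu_i, sigma_i**2)."""
    return sum(wi * normal_pdf(x, mi, si) for wi, mi, si in zip(w, mu, sd))

# Two-component mixture: the density integrates to ~1 and its mean is
# sum_i w_i mu_i = 0.3*(-1) + 0.7*2 = 1.1 (crude Riemann sums over [-10, 10]).
w, mu, sd = [0.3, 0.7], [-1.0, 2.0], [1.0, 0.5]
grid = [-10 + 0.01 * i for i in range(2001)]
total = 0.01 * sum(normal_mixture_pdf(x, w, mu, sd) for x in grid)
approx_mean = 0.01 * sum(x * normal_mixture_pdf(x, w, mu, sd) for x in grid)
print(total, approx_mean)
```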
pymc3.distributions.mixture.all_discrete(comp_dists)

Determine if all distributions in comp_dists are discrete

## Plots¶

pymc3.plots.traceplot(trace, varnames=None, transform=<function <lambda>>, figsize=None, lines=None, combined=False, plot_transformed=False, grid=False, alpha=0.35, priors=None, prior_alpha=1, prior_style='--', ax=None)

Plot histograms and traces of sample values.

Parameters:
- trace: Result of MCMC run.
- varnames (list of variable names): Variables to be plotted; if None, all variables are plotted.
- transform (callable): Function to transform data (defaults to identity).
- figsize (figure size tuple): If None, size is (12, num of variables * 2) inches.
- lines (dict): Dictionary of variable name / value to be overplotted as vertical lines on the posteriors and horizontal lines on sample values, e.g. mean of posteriors, true values of a simulation. If an array of values, line colors are matched to posterior colors; otherwise, a default red line is used.
- combined (bool): Flag for combining multiple chains into a single chain. If False (default), chains will be plotted separately.
- plot_transformed (bool): Flag for plotting automatically transformed variables in addition to original variables (defaults to False).
- grid (bool): Flag for adding gridlines to histogram. Defaults to False.
- alpha (float): Alpha value for plot line. Defaults to 0.35.
- priors (iterable of PyMC distributions): PyMC prior distribution(s) to be plotted alongside the posterior. Defaults to None (no prior plots).
- prior_alpha (float): Alpha value for prior plot. Defaults to 1.
- prior_style (str): Line style for prior plot. Defaults to '--' (dashed line).
- ax (axes): Matplotlib axes; creates its own axes by default. Accepts an array of axes, e.g. fig, axs = plt.subplots(3, 2) for 3 RVs, then pymc3.traceplot(trace, ax=axs).

Returns: ax (matplotlib axes)
pymc3.plots.forestplot(trace_obj, varnames=None, transform=<function <lambda>>, alpha=0.05, quartiles=True, rhat=True, main=None, xtitle=None, xrange=None, ylabels=None, chain_spacing=0.05, vline=0, gs=None, plot_transformed=False)

Forest plot (model summary plot). Generates a forest plot of 100*(1-alpha)% credible intervals for either the set of variables in a given model or a specified set of nodes.

Parameters:
- trace_obj (NpTrace or MultiTrace object): Trace(s) from an MCMC sample.
- varnames (list): List of variables to plot (defaults to None, which results in all variables plotted).
- transform (callable): Function to transform data (defaults to identity).
- alpha (float, optional): Alpha value for (1-alpha)*100% credible intervals (defaults to 0.05).
- quartiles (bool, optional): Flag for plotting the interquartile range, in addition to the (1-alpha)*100% intervals (defaults to True).
- rhat (bool, optional): Flag for plotting Gelman-Rubin statistics. Requires 2 or more chains (defaults to True).
- main (string, optional): Title for main plot. Passing False results in titles being suppressed; passing None (default) results in default titles.
- xtitle (string, optional): Label for x-axis. Defaults to no label.
- xrange (list or tuple, optional): Range for x-axis. Defaults to matplotlib's best guess.
- ylabels (list or array, optional): User-defined labels for each variable. If not provided, the node __name__ attributes are used.
- chain_spacing (float, optional): Plot spacing between chains (defaults to 0.05).
- vline (numeric, optional): Location of vertical reference line (defaults to 0).
- gs (GridSpec): Matplotlib GridSpec object. Defaults to None.
- plot_transformed (bool): Flag for plotting automatically transformed variables in addition to original variables (defaults to False).

Returns: gs (matplotlib GridSpec)
pymc3.plots.autocorrplot(trace, varnames=None, max_lag=100, burn=0, plot_transformed=False, symmetric_plot=False, ax=None, figsize=None)

Bar plot of the autocorrelation function for a trace

Parameters:
- trace: Result of MCMC run.
- varnames (list of variable names): Variables to be plotted; if None, all variables are plotted. Vector-valued stochastics are handled automatically.
- max_lag (int): Maximum lag to calculate autocorrelation. Defaults to 100.
- burn (int): Number of samples to discard from the beginning of the trace. Defaults to 0.
- plot_transformed (bool): Flag for plotting automatically transformed variables in addition to original variables (defaults to False).
- symmetric_plot (boolean): Plot over [-lag, +lag] instead of [0, +lag]. Defaults to False, i.e. [0, +lag].
- ax (axes): Matplotlib axes. Defaults to None.
- figsize (figure size tuple): If None, size is (12, num of variables * 2) inches. Note this is not used if ax is supplied.

Returns: ax (matplotlib axes)
pymc3.plots.plot_posterior(trace, varnames=None, transform=<function <lambda>>, figsize=None, alpha_level=0.05, round_to=3, point_estimate='mean', rope=None, ref_val=None, kde_plot=False, plot_transformed=False, ax=None, **kwargs)

Plot posterior densities in the style of John K. Kruschke's book.

Parameters:
- trace: Result of MCMC run.
- varnames (list of variable names): Variables to be plotted; if None, all variables are plotted.
- transform (callable): Function to transform data (defaults to identity).
- figsize (figure size tuple): If None, size is (12, num of variables * 2) inches.
- alpha_level (float): Defines range for High Posterior Density.
- round_to (int): Controls formatting for floating point numbers.
- point_estimate (str): Must be one of 'mode', 'mean', 'median'.
- rope (list or numpy array): Lower and upper values of the Region Of Practical Equivalence.
- ref_val (bool): Display the percentage below and above ref_val.
- kde_plot (bool): If True, plot a KDE instead of a histogram.
- plot_transformed (bool): Flag for plotting automatically transformed variables in addition to original variables (defaults to False).
- ax (axes): Matplotlib axes. Defaults to None.
- **kwargs: Passed as-is to plt.hist() or plt.plot(), depending on the value of kde_plot. Some defaults are added if not specified; color='#87ceeb' will match the style in the book.

Returns: ax (matplotlib axes)

## Stats¶

Statistical utility functions for PyMC

pymc3.stats.autocorr(pymc3_obj, *args, **kwargs)

Sample autocorrelation at specified lag. The autocorrelation is the correlation of x_i with x_{i+lag}.

pymc3.stats.autocov(pymc3_obj, *args, **kwargs)

Sample autocovariance at specified lag. The autocovariance is a 2x2 matrix with the variances of x[:-lag] and x[lag:] in the diagonal and the autocovariance on the off-diagonal.
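As a rough illustration of the quantity described above, here is a minimal NumPy sketch (a hypothetical `autocov` helper, not PyMC3's implementation):

```python
import numpy as np

def autocov(x, lag=1):
    """2x2 sample covariance matrix of x[:-lag] against x[lag:].

    Variances sit on the diagonal; the lag-`lag` autocovariance sits
    on the off-diagonal, mirroring the description above.
    """
    x = np.asarray(x, dtype=float)
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return np.cov(x[:-lag], x[lag:])
```

For example, `autocov(x, lag=2)[0, 1]` picks the covariance between the series and its lag-2 shift out of the matrix.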

pymc3.stats.dic(trace, model=None)

Calculate the deviance information criterion (DIC) of the samples in trace from model. For the underlying theory, see Spiegelhalter et al. (2002): dx.doi.org/10.1111/1467-9868.00353

pymc3.stats.bpic(trace, model=None)

Calculate the Bayesian predictive information criterion (BPIC) of the samples in trace from model. For the underlying theory, see Spiegelhalter et al. (2002): dx.doi.org/10.1111/1467-9868.00353

pymc3.stats.waic(trace, model=None, pointwise=False)

Calculate the widely applicable information criterion (WAIC), its standard error, and the effective number of parameters of the samples in trace from model. For background on information criteria, see Spiegelhalter et al. (2002): dx.doi.org/10.1111/1467-9868.00353

Parameters: trace : result of MCMC run model : PyMC Model Optional model. Default None, taken from context. pointwise : bool If True, the pointwise predictive accuracy will be returned. Default False. Returns: namedtuple with the following elements: waic: widely applicable information criterion waic_se: standard error of waic p_waic: effective number of parameters waic_i: an array of the pointwise predictive accuracy, only if pointwise is True
pymc3.stats.loo(trace, model=None, pointwise=False)

Calculate leave-one-out (LOO) cross-validation for out-of-sample predictive model fit, following Vehtari et al. (2015). Cross-validation is computed using Pareto-smoothed importance sampling (PSIS).

Parameters: trace : result of MCMC run model : PyMC Model Optional model. Default None, taken from context. pointwise : bool If True, the pointwise predictive accuracy will be returned. Default False. Returns: namedtuple with the following elements: loo: approximated leave-one-out cross-validation loo_se: standard error of loo p_loo: effective number of parameters loo_i: an array of the pointwise predictive accuracy, only if pointwise is True
pymc3.stats.hpd(pymc3_obj, *args, **kwargs)

Calculate the highest posterior density (HPD) interval of an array for a given alpha. The HPD is the minimum-width Bayesian credible interval (BCI).

Arguments: x : Numpy array An array containing MCMC samples alpha : float Desired probability of type I error (defaults to 0.05) transform : callable Function to transform data (defaults to identity)
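The minimum-width interval search can be sketched as follows (a simplified stand-in for the PyMC3 routine, assuming a unimodal sample):

```python
import numpy as np

def hpd(x, alpha=0.05):
    """Narrowest interval containing a (1 - alpha) fraction of the samples."""
    s = np.sort(np.asarray(x).ravel())
    n = len(s)
    cred = int(np.floor((1.0 - alpha) * n))  # number of samples spanned by the interval
    widths = s[cred:] - s[: n - cred]        # width of every candidate interval
    i = int(np.argmin(widths))               # the narrowest candidate is the HPD interval
    return s[i], s[i + cred]
```

Because every candidate interval covers the same number of sorted samples, picking the narrowest one maximizes the density inside the interval.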
pymc3.stats.quantiles(pymc3_obj, *args, **kwargs)

Returns a dictionary of requested quantiles from an array.

Arguments: x : Numpy array An array containing MCMC samples qlist : tuple or list A list of desired quantiles (defaults to (2.5, 25, 50, 75, 97.5)) transform : callable Function to transform data (defaults to identity)
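In spirit this is equivalent to the following NumPy one-liner (an illustrative sketch, not the PyMC3 code):

```python
import numpy as np

def quantiles(x, qlist=(2.5, 25, 50, 75, 97.5)):
    """Map each requested percentile to its value in the samples."""
    return {q: np.percentile(x, q) for q in qlist}
```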
pymc3.stats.mc_error(pymc3_obj, *args, **kwargs)

Calculates the simulation standard error, accounting for non-independent samples. The trace is divided into batches, and the standard deviation of the batch means is calculated.

Arguments: x : Numpy array An array containing MCMC samples batches : integer Number of batches
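The batch-means calculation can be sketched like this (a simplified illustration of the idea, not PyMC3's exact code):

```python
import numpy as np

def mc_error(x, batches=5):
    """Simulation standard error via the batch-means method."""
    x = np.asarray(x, dtype=float).ravel()
    trimmed = x[: (len(x) // batches) * batches]  # drop the remainder so batches are equal
    batch_means = trimmed.reshape(batches, -1).mean(axis=1)
    return batch_means.std() / np.sqrt(batches)
```

Averaging within batches dampens the autocorrelation between the values fed to the standard-deviation step, which is why this estimate is used for non-independent samples.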
pymc3.stats.summary(trace, varnames=None, alpha=0.05, start=0, batches=None, roundto=3, include_transformed=False, to_file=None)

Generate a pretty-printed summary of the trace.

Parameters: trace : Trace object Trace containing MCMC samples varnames : list of strings List of variables to summarize. Defaults to None, which results in all variables summarized. alpha : float The alpha level for generating posterior intervals. Defaults to 0.05. start : int The starting index from which to summarize (each) chain. Defaults to zero. batches : None or int Batch size for calculating standard deviation for non-independent samples. Defaults to the smaller of 100 or the number of samples. roundto : int The number of digits to which posterior statistics are rounded. include_transformed : bool Flag for summarizing automatically transformed variables in addition to original variables (defaults to False). to_file : None or string File to write results to. If not given, print to stdout.
pymc3.stats.df_summary(trace, varnames=None, stat_funcs=None, extend=False, include_transformed=False, alpha=0.05, batches=None)

Create a data frame with summary statistics.

Parameters: trace : MultiTrace instance varnames : list Names of variables to include in summary stat_funcs : None or list A list of functions used to calculate statistics. By default, the mean, standard deviation, simulation standard error, and highest posterior density intervals are included. The functions will be given one argument, the samples for a variable as a two-dimensional array, where the first axis corresponds to sampling iterations and the second axis represents the flattened variable (e.g., x__0, x__1, ...). Each function should return either 1) a pandas.Series instance containing the result of calculating the statistic along the first axis (the name attribute will be taken as the name of the statistic), or 2) a pandas.DataFrame where each column contains the result of calculating the statistic along the first axis (the column names will be taken as the names of the statistics). extend : boolean If True, use the statistics returned by stat_funcs in addition to, rather than in place of, the default statistics. This is only meaningful when stat_funcs is not None. include_transformed : bool Flag for reporting automatically transformed variables in addition to original variables (defaults to False). alpha : float The alpha level for generating posterior intervals. Defaults to 0.05. This is only meaningful when stat_funcs is None. batches : None or int Batch size for calculating standard deviation for non-independent samples. Defaults to the smaller of 100 or the number of samples. This is only meaningful when stat_funcs is None. Returns: pandas.DataFrame with summary statistics for each variable

See also

summary : Generate a pretty-printed summary of a trace.

Examples

>>> import pymc3 as pm
>>> trace.mu.shape
(1000, 2)
>>> pm.df_summary(trace, ['mu'])
mean        sd  mc_error     hpd_5    hpd_95
mu__0  0.106897  0.066473  0.001818 -0.020612  0.231626
mu__1 -0.046597  0.067513  0.002048 -0.174753  0.081924


Other statistics can be calculated by passing a list of functions.

>>> import numpy as np
>>> import pandas as pd
>>> def trace_sd(x):
...     return pd.Series(np.std(x, 0), name='sd')
...
>>> def trace_quantiles(x):
...     return pd.DataFrame(pm.quantiles(x, [5, 50, 95]))
...
>>> pm.df_summary(trace, ['mu'], stat_funcs=[trace_sd, trace_quantiles])
sd         5        50        95
mu__0  0.066473  0.000312  0.105039  0.214242
mu__1  0.067513 -0.159097 -0.045637  0.062912


## Diagnostics¶

Convergence diagnostics and model validation

pymc3.diagnostics.geweke(pymc3_obj, *args, **kwargs)

Return z-scores for convergence diagnostics.

Compare the mean of the first fraction (first) of the series with the mean of the last fraction (last). x is divided into a number of segments for which this difference is computed. If the series has converged, this score should oscillate between -1 and 1.

Parameters: x : array-like The trace of some stochastic parameter. first : float The fraction of series at the beginning of the trace. last : float The fraction of series at the end to be compared with the section at the beginning. intervals : int The number of segments. Returns: scores : list A list of [i, score] pairs, where i is the starting index of each interval and score is the Geweke z-score on that interval.

Notes

The Geweke score on some series x is computed by:

$\frac{E[x_s] - E[x_e]}{\sqrt{V[x_s] + V[x_e]}}$

where $$E$$ stands for the mean, $$V$$ the variance, $$x_s$$ a section at the start of the series and $$x_e$$ a section at the end of the series.
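A single z-score of this form can be computed directly (a toy sketch using the first and last fractions from above; PyMC3 additionally slides the early segment across several intervals):

```python
import numpy as np

def geweke_z(x, first=0.1, last=0.5):
    """Geweke z-score comparing the first and last fractions of a trace."""
    x = np.asarray(x, dtype=float)
    xs = x[: int(first * len(x))]   # early segment x_s
    xe = x[-int(last * len(x)):]    # late segment x_e
    return (xs.mean() - xe.mean()) / np.sqrt(xs.var() + xe.var())
```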

References

Geweke (1992)

pymc3.diagnostics.gelman_rubin(mtrace)

Returns estimate of $$\hat{R}$$ for a set of traces.

The Gelman-Rubin diagnostic tests for lack of convergence by comparing the variance between multiple chains to the variance within each chain. If convergence has been achieved, the between-chain and within-chain variances should be identical. To be most effective in detecting evidence for nonconvergence, each chain should have been initialized to starting values that are dispersed relative to the target distribution.

Parameters: mtrace : MultiTrace A MultiTrace object containing parallel traces (minimum 2) of one or more stochastic parameters. Rhat : dict Returns dictionary of the potential scale reduction factors, $$\hat{R}$$

Notes

The diagnostic is computed by:

$\hat{R} = \frac{\hat{V}}{W}$

where $$W$$ is the within-chain variance and $$\hat{V}$$ is the posterior variance estimate for the pooled traces. This is the potential scale reduction factor, which converges to unity when each of the traces is a sample from the target posterior. Values greater than one indicate that one or more chains have not yet converged.
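The formula above translates to a few lines of NumPy (a hypothetical helper for an (m chains, n draws) array; note that some implementations report the square root of this ratio instead):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m_chains, n_draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    V_hat = (n - 1) / n * W + B / n            # pooled posterior variance estimate
    return V_hat / W
```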

References

Brooks and Gelman (1998) Gelman and Rubin (1992)

pymc3.diagnostics.effective_n(mtrace)

Returns estimate of the effective sample size of a set of traces.

Parameters: mtrace : MultiTrace A MultiTrace object containing parallel traces (minimum 2) of one or more stochastic parameters. n_eff : float Return the effective sample size, $$\hat{n}_{eff}$$

Notes

The diagnostic is computed by:

$\hat{n}_{eff} = \frac{mn}{1 + 2 \sum_{t=1}^T \hat{\rho}_t}$

where $$\hat{\rho}_t$$ is the estimated autocorrelation at lag t, and T is the first odd positive integer for which the sum $$\hat{\rho}_{T+1} + \hat{\rho}_{T+2}$$ is negative.
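A rough sketch of this computation (a hypothetical helper that pools the chains for a crude autocorrelation estimate; PyMC3's implementation is more careful):

```python
import numpy as np

def effective_n(chains):
    """Effective sample size for an (m_chains, n_draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    x = chains.ravel()  # pool chains for a simple autocorrelation estimate
    rho = np.array([np.corrcoef(x[:-t], x[t:])[0, 1] for t in range(1, n // 2)])
    T = len(rho)
    # T is the first odd lag at which the pair sum of autocorrelations turns negative
    for t in range(1, len(rho) - 1, 2):  # rho[t] estimates the lag-(t+1) autocorrelation
        if rho[t] + rho[t + 1] < 0:
            T = t
            break
    return m * n / (1.0 + 2.0 * rho[:T].sum())
```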

References

Gelman et al. (2014)

## Inference¶

### Sampling¶

pymc3.sampling.sample(draws, step=None, init='advi', n_init=200000, start=None, trace=None, chain=0, njobs=1, tune=None, progressbar=True, model=None, random_seed=-1)

Draw a number of samples using the given step method. Multiple step methods are supported via compound step methods.

Parameters: draws : int The number of samples to draw. step : function or iterable of functions A step function or collection of functions. If no step methods are specified, or are partially specified, they will be assigned automatically (defaults to None). init : str {‘ADVI’, ‘ADVI_MAP’, ‘MAP’, ‘NUTS’, None} Initialization method to use. * ADVI : Run ADVI to estimate starting points and diagonal covariance matrix. If njobs > 1 it will sample starting points from the estimated posterior, otherwise it will use the estimated posterior mean. * ADVI_MAP: Initialize ADVI with MAP and use MAP as starting point. * MAP : Use the MAP as starting point. * NUTS : Run NUTS to estimate starting points and covariance matrix. If njobs > 1 it will sample starting points from the estimated posterior, otherwise it will use the estimated posterior mean. * None : Do not initialize. n_init : int Number of iterations of the initializer. If ‘ADVI’, number of iterations; if ‘NUTS’, number of draws. start : dict Starting point in parameter space (or partial point). Defaults to trace.point(-1) if there is a trace provided and model.test_point if not (defaults to empty dict). trace : backend, list, or MultiTrace This should be a backend instance, a list of variables to track, or a MultiTrace object with past values. If a MultiTrace object is given, it must contain samples for the chain number chain. If None or a list of variables, the NDArray backend is used. Passing either “text” or “sqlite” is taken as a shortcut to set up the corresponding backend (with “mcmc” used as the base name). chain : int Chain number used to store sample in backend. If njobs is greater than one, chain numbers will start here. njobs : int Number of parallel jobs to start. If None, set to number of cpus in the system - 2. tune : int Number of iterations to tune, if applicable (defaults to None) progressbar : bool Whether or not to display a progress bar in the command line.
The bar shows the percentage of completion, the sampling speed in samples per second (SPS), and the estimated remaining time until completion (“expected time of arrival”; ETA). model : Model (optional if in with context) random_seed : int or list of ints A list is accepted if njobs is greater than one. Returns: MultiTrace object with access to sampling values
pymc3.sampling.iter_sample(draws, step, start=None, trace=None, chain=0, tune=None, model=None, random_seed=-1)

Generator that returns a trace on each iteration using the given step method. Multiple step methods are supported via compound step methods.

Parameters: draws : int The number of samples to draw step : function Step function start : dict Starting point in parameter space (or partial point). Defaults to trace.point(-1) if there is a trace provided and model.test_point if not (defaults to empty dict) trace : backend, list, or MultiTrace This should be a backend instance, a list of variables to track, or a MultiTrace object with past values. If a MultiTrace object is given, it must contain samples for the chain number chain. If None or a list of variables, the NDArray backend is used. chain : int Chain number used to store sample in backend. If njobs is greater than one, chain numbers will start here. tune : int Number of iterations to tune, if applicable (defaults to None) model : Model (optional if in with context) random_seed : int or list of ints A list is accepted if njobs is greater than one.
pymc3.sampling.sample_ppc(trace, samples=None, model=None, vars=None, size=None, random_seed=None, progressbar=True)

Generate posterior predictive samples from a model given a trace.

Parameters: trace : backend, list, or MultiTrace Trace generated from MCMC sampling samples : int Number of posterior predictive samples to generate. Defaults to the length of trace model : Model (optional if in with context) Model used to generate trace vars : iterable Variables for which to compute the posterior predictive samples. Defaults to model.observed_RVs. size : int The number of random draws from the distribution specified by the parameters in each sample of the trace. Returns: Dictionary keyed by vars, where the values are the corresponding posterior predictive samples.
pymc3.sampling.init_nuts(init='ADVI', njobs=1, n_init=500000, model=None, random_seed=-1, **kwargs)

Initialize and sample from posterior of a continuous model.

This is a convenience function. NUTS convergence and sampling speed are extremely dependent on the choice of mass/scaling matrix. In our experience, using ADVI to estimate a diagonal covariance matrix and using this as the scaling matrix produces robust results over a wide class of continuous models.

Parameters: init : str {‘ADVI’, ‘ADVI_MAP’, ‘MAP’, ‘NUTS’} Initialization method to use. * ADVI : Run ADVI to estimate posterior mean and diagonal covariance matrix. * ADVI_MAP: Initialize ADVI with MAP and use MAP as starting point. * MAP : Use the MAP as starting point. * NUTS : Run NUTS and estimate posterior mean and covariance matrix. njobs : int Number of parallel jobs to start. n_init : int Number of iterations of the initializer. If ‘ADVI’, number of iterations; if ‘NUTS’, number of draws. model : Model (optional if in with context) **kwargs : keyword arguments Extra keyword arguments are forwarded to pymc3.NUTS. Returns: start : pymc3.model.Point Starting point for sampler nuts_sampler : pymc3.step_methods.NUTS Instantiated and initialized NUTS sampler object

### Step-methods¶

#### Metropolis¶

class pymc3.step_methods.metropolis.Metropolis(vars=None, S=None, proposal_dist=<class 'pymc3.step_methods.metropolis.NormalProposal'>, scaling=1.0, tune=True, tune_interval=100, model=None, mode=None, **kwargs)

Metropolis-Hastings sampling step

Parameters: vars : list List of variables for sampler S : standard deviation or covariance matrix Some measure of variance to parameterize proposal distribution proposal_dist : function Function that returns zero-mean deviates when parameterized with S (and n). Defaults to normal. scaling : scalar or array Initial scale factor for proposal. Defaults to 1. tune : bool Flag for tuning. Defaults to True. tune_interval : int The frequency of tuning. Defaults to 100 iterations. model : PyMC Model Optional model for sampling step. Defaults to None (taken from context). mode : string or Mode instance. compilation mode passed to Theano functions
class pymc3.step_methods.metropolis.BinaryMetropolis(vars, scaling=1.0, tune=True, tune_interval=100, model=None)

Metropolis-Hastings optimized for binary variables

Parameters: vars : list List of variables for sampler scaling : scalar or array Initial scale factor for proposal. Defaults to 1. tune : bool Flag for tuning. Defaults to True. tune_interval : int The frequency of tuning. Defaults to 100 iterations. model : PyMC Model Optional model for sampling step. Defaults to None (taken from context).
static competence(var)

BinaryMetropolis is only suitable for binary (bool) and Categorical variables with k=1.

class pymc3.step_methods.metropolis.BinaryGibbsMetropolis(vars, order='random', model=None)

A Metropolis-within-Gibbs step method optimized for binary variables

static competence(var)

BinaryGibbsMetropolis is only suitable for Bernoulli and Categorical variables with k=2.

class pymc3.step_methods.metropolis.CategoricalGibbsMetropolis(vars, proposal='uniform', order='random', model=None)

A Metropolis-within-Gibbs step method optimized for categorical variables. This step method works for Bernoulli variables as well, but it is not optimized for them, like BinaryGibbsMetropolis is. Step method supports two types of proposals: A uniform proposal and a proportional proposal, which was introduced by Liu in his 1996 technical report “Metropolized Gibbs Sampler: An Improvement”.

static competence(var)

CategoricalGibbsMetropolis is only suitable for Bernoulli and Categorical variables.

#### Slice¶

class pymc3.step_methods.slicer.Slice(vars=None, w=1.0, tune=True, model=None, **kwargs)

Univariate slice sampler step method

Parameters: vars : list List of variables for sampler. w : float Initial width of slice (Defaults to 1). tune : bool Flag for tuning (Defaults to True). model : PyMC Model Optional model for sampling step. Defaults to None (taken from context).

### Variational¶

© 2016, John Salvatier & Taku Yoshioka
pymc3.variational.advi.advi(vars=None, start=None, model=None, n=5000, accurate_elbo=False, optimizer=None, learning_rate=0.001, epsilon=0.1, mode=None, random_seed=None)

Perform automatic differentiation variational inference (ADVI).

This function implements mean-field ADVI, where the variational posterior distribution is assumed to be a spherical Gaussian without correlation of parameters and is fit to the true posterior distribution. The means and standard deviations of the variational posterior are referred to as variational parameters.

The return value of this function is an ADVIfit object, which has variational parameters. If you want to draw samples from the variational posterior, you need to pass the ADVIfit object to pymc3.variational.sample_vp().

The variational parameters are defined on the transformed space, which is required to do ADVI on an unconstrained parameter space as described in [KTR+2016]. The parameters in the ADVIfit object are in the transformed space, while traces returned by sample_vp() are in the original space as obtained by MCMC sampling methods in PyMC3.

The variational parameters are optimized with the given optimizer, which is a function that returns a dictionary of parameter updates as provided to a Theano function. If no optimizer is provided, optimization is performed with a modified version of Adagrad, where only the last n_window gradient vectors are used to control the learning rate and older gradient vectors are ignored. n_window denotes the size of the time window and is fixed to 10.

Parameters: vars : object Random variables. start : Dict or None Initial values of parameters (variational means). model : Model Probabilistic model. n : int Number of iterations updating parameters. accurate_elbo : bool If True, 100 MC samples are used for accurate calculation of ELBO. optimizer : (loss, tensor) -> dict or OrderedDict A function that returns parameter updates given loss and parameter tensor. If None (default), a default Adagrad optimizer is used with parameters learning_rate and epsilon below. learning_rate : float Base learning rate for Adagrad. This parameter is ignored when optimizer is given. epsilon : float Offset in denominator of the scale of learning rate in Adagrad. This parameter is ignored when optimizer is given. random_seed : int or None Seed to initialize random state. None uses current seed. mode : string or Mode instance Compilation mode passed to Theano functions. Returns: ADVIFit Named tuple, which includes ‘means’, ‘stds’, and ‘elbo_vals’. ‘means’ is the mean, ‘stds’ is the standard deviation, and ‘elbo_vals’ is the trace of ELBO values during optimization.

References

 [KTR+2016] Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., and Blei, D. M. (2016). Automatic Differentiation Variational Inference. arXiv preprint arXiv:1603.00788.
pymc3.variational.advi.sample_vp(vparams, draws=1000, model=None, local_RVs=None, random_seed=None, hide_transformed=True, progressbar=True)

Draw samples from variational posterior.

Parameters: vparams : dict or pymc3.variational.ADVIFit Estimated variational parameters of the model. draws : int Number of random samples. model : pymc3.Model Probabilistic model. random_seed : int or None Seed of random number generator. None to use current seed. hide_transformed : bool If False, transformed variables are also sampled. Default is True. trace : pymc3.backends.base.MultiTrace Samples drawn from the variational posterior.

pymc3.variational.advi_minibatch.advi_minibatch(vars=None, start=None, model=None, n=5000, n_mcsamples=1, minibatch_RVs=None, minibatch_tensors=None, minibatches=None, global_RVs=None, local_RVs=None, observed_RVs=None, encoder_params=None, total_size=None, optimizer=None, learning_rate=0.001, epsilon=0.1, random_seed=None, mode=None)

This function implements a mini-batch automatic differentiation variational inference (ADVI; Kucukelbir et al., 2015) with the meanfield approximation. Autoencoding variational Bayes (AEVB; Kingma and Welling, 2014) is also supported.

For explanation, we classify random variables in probabilistic models into three types. Observed random variables $${\cal Y}=\{\mathbf{y}_{i}\}_{i=1}^{N}$$ are $$N$$ observations. Each $$\mathbf{y}_{i}$$ can be a set of observed random variables, i.e., $$\mathbf{y}_{i}=\{\mathbf{y}_{i}^{k}\}_{k=1}^{V_{o}}$$, where $$V_{o}$$ is the number of the types of observed random variables in the model.

The next ones are global random variables $$\Theta=\{\theta^{k}\}_{k=1}^{V_{g}}$$, which are used to calculate the probabilities for all observed samples.

The last ones are local random variables $${\cal Z}=\{\mathbf{z}_{i}\}_{i=1}^{N}$$, where $$\mathbf{z}_{i}=\{\mathbf{z}_{i}^{k}\}_{k=1}^{V_{l}}$$. These RVs are used only in AEVB.

The goal of ADVI is to approximate the posterior distribution $$p(\Theta,{\cal Z}|{\cal Y})$$ by variational posterior $$q(\Theta)\prod_{i=1}^{N}q(\mathbf{z}_{i})$$. All of these terms are normal distributions (mean-field approximation).

$$q(\Theta)$$ is parametrized with its means and standard deviations. These parameters are denoted as $$\gamma$$. While $$\gamma$$ is a constant, the parameters of $$q(\mathbf{z}_{i})$$ are dependent on each observation. Therefore these parameters are denoted as $$\xi(\mathbf{y}_{i}; \nu)$$, where $$\nu$$ is the parameters of $$\xi(\cdot)$$. For example, $$\xi(\cdot)$$ can be a multilayer perceptron or convolutional neural network.

In addition to $$\xi(\cdot)$$, we can also include deterministic mappings for the likelihood of observations. We denote the parameters of the deterministic mappings as $$\eta$$. An example of such mappings is the deconvolutional neural network used in the convolutional VAE example in the PyMC3 notebook directory.

This function maximizes the evidence lower bound (ELBO) $${\cal L}(\gamma, \nu, \eta)$$ defined as follows:

$\begin{split}{\cal L}(\gamma,\nu,\eta) & = \mathbf{c}_{o}\mathbb{E}_{q(\Theta)}\left[ \sum_{i=1}^{N}\mathbb{E}_{q(\mathbf{z}_{i})}\left[ \log p(\mathbf{y}_{i}|\mathbf{z}_{i},\Theta,\eta) \right]\right] \\ & - \mathbf{c}_{g}KL\left[q(\Theta)||p(\Theta)\right] - \mathbf{c}_{l}\sum_{i=1}^{N} KL\left[q(\mathbf{z}_{i})||p(\mathbf{z}_{i})\right],\end{split}$

where $$KL[q(v)||p(v)]$$ is the Kullback-Leibler divergence

$KL[q(v)||p(v)] = \int q(v)\log\frac{q(v)}{p(v)}dv,$

$$\mathbf{c}_{o/g/l}$$ are vectors for weighting each term of ELBO. More precisely, we can write each of the terms in ELBO as follows:

$\begin{split}\mathbf{c}_{o}\log p(\mathbf{y}_{i}|\mathbf{z}_{i},\Theta,\eta) & = & \sum_{k=1}^{V_{o}}c_{o}^{k} \log p(\mathbf{y}_{i}^{k}| {\rm pa}(\mathbf{y}_{i}^{k},\Theta,\eta)) \\ \mathbf{c}_{g}KL\left[q(\Theta)||p(\Theta)\right] & = & \sum_{k=1}^{V_{g}}c_{g}^{k}KL\left[ q(\theta^{k})||p(\theta^{k}|{\rm pa(\theta^{k})})\right] \\ \mathbf{c}_{l}KL\left[q(\mathbf{z}_{i})||p(\mathbf{z}_{i})\right] & = & \sum_{k=1}^{V_{l}}c_{l}^{k}KL\left[ q(\mathbf{z}_{i}^{k})|| p(\mathbf{z}_{i}^{k}|{\rm pa}(\mathbf{z}_{i}^{k}))\right],\end{split}$

where $${\rm pa}(v)$$ denotes the set of parent variables of $$v$$ in the directed acyclic graph of the model.

When using mini-batches, $$c_{o}^{k}$$ and $$c_{l}^{k}$$ should be set to $$N/M$$, where $$M$$ is the number of observations in each mini-batch. Another weighting scheme was proposed in (Blundell et al., 2015) for accelerating model fitting.
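A toy numeric illustration of this weighting (all numbers hypothetical):

```python
# With N total observations and mini-batches of M observations, scaling each
# mini-batch log-likelihood term by N/M makes its expectation match the
# full-data log-likelihood, which keeps the ELBO correctly calibrated.
N = 50000            # total number of training samples (hypothetical)
M = 500              # mini-batch size (hypothetical)
scale = N / M        # the weight c_o^k = c_l^k = N/M from the text
minibatch_loglik = -1234.5   # log-likelihood summed over one mini-batch (made up)
full_data_estimate = scale * minibatch_loglik
```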

For working with ADVI, we need to give the probabilistic model (model), the three types of RVs (observed_RVs, global_RVs and local_RVs), the tensors to which mini-batched samples are supplied (minibatches) and the parameters of the deterministic mappings $$\xi$$ and $$\eta$$ (encoder_params) as input arguments.

observed_RVs is an OrderedDict of the form {y_k: c_k}, where y_k is a random variable defined in the PyMC3 model. c_k is a scalar ($$c_{o}^{k}$$) and it can be a shared variable.

global_RVs is an OrderedDict of the form {t_k: c_k}, where t_k is a random variable defined in the PyMC3 model. c_k is a scalar ($$c_{g}^{k}$$) and it can be a shared variable.

local_RVs is an OrderedDict of the form {z_k: ((m_k, s_k), c_k)}, where z_k is a random variable defined in the PyMC3 model. c_k is a scalar ($$c_{l}^{k}$$) and it can be a shared variable. (m_k, s_k) is a pair of tensors of means and log standard deviations of the variational distribution; samples drawn from the variational distribution replace z_k. It should be noted that if z_k has a transformation that changes the dimension (e.g., StickBreakingTransform), the variational distribution must have the same dimension. For example, if z_k is distributed with a Dirichlet distribution with p choices, m_k and s_k have the shape (n_samples_in_minibatch, p - 1).

minibatch_tensors is a list of tensors (can be shared variables) to which mini-batch samples are set during the optimization. These tensors are observations (obs=) in observed_RVs.

minibatches is a generator of a list of numpy.ndarray. Each item of the list will be set to tensors in minibatch_tensors.

encoder_params is a list of shared variables of the parameters $$\nu$$ and $$\eta$$. We do not need to include the variational parameters of the global variables, $$\gamma$$, because these are automatically created and updated in this function.


Parameters: vars : object List of random variables. If None, variational posteriors (normal distribution) are fit for all RVs in the given model. start : Dict or None Initial values of parameters (variational means). model : Model Probabilistic model. n : int Number of iterations updating parameters. n_mcsamples : int Number of Monte Carlo samples to approximate ELBO. minibatch_RVs : list of ObservedRVs Random variables in the model for which mini-batch tensors are set. When this argument is given, both local_RVs and observed_RVs must be None. minibatch_tensors : list of (tensors or shared variables) Tensors used to create ObservedRVs in minibatch_RVs. minibatches : generator of list Generates a set of minibatches when calling next(). The length of the returned list must be the same as the number of tensors in minibatch_tensors. total_size : int Total size of training samples. This is used to appropriately scale the log likelihood terms corresponding to mini-batches in ELBO. observed_RVs : Ordered dict Include a scaling constant for the corresponding RV. See the above description. global_RVs : Ordered dict or None Include a scaling constant for the corresponding RV. See the above description. If None, it is set to {v: 1 for v in grvs}, where grvs is list(set(vars) - set(list(local_RVs) + list(observed_RVs))). local_RVs : Ordered dict or None Include encoded variational parameters and a scaling constant for the corresponding RV. See the above description. encoder_params : list of theano shared variables Parameters of encoder. optimizer : (loss, list of shared variables) -> dict or OrderedDict A function that returns parameter updates given loss and shared variables of parameters. If None (default), a default Adagrad optimizer is used with parameters learning_rate and epsilon below. learning_rate : float Base learning rate for Adagrad. This parameter is ignored when optimizer is set.
epsilon : float Offset in denominator of the scale of learning rate in Adagrad. This parameter is ignored when optimizer is set. random_seed : int Seed to initialize random state. Returns: ADVIFit Named tuple, which includes ‘means’, ‘stds’, and ‘elbo_vals’.

References

• Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. stat, 1050, 1.
• Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in Stan. In Advances in neural information processing systems (pp. 568-576).
• Blundell, C., Cornebise, J., Kavukcuoglu, K., & Wierstra, D. (2015). Weight Uncertainty in Neural Network. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 1613-1622).

## Backends¶

Backends for traces

### Available backends¶

1. NumPy array (pymc3.backends.NDArray)
2. Text files (pymc3.backends.Text)
3. SQLite (pymc3.backends.SQLite)

The NDArray backend holds the entire trace in memory, whereas the Text and SQLite backends store the values while sampling.

### Selecting a backend¶

By default, a NumPy array is used as the backend. To specify a different backend, pass a backend instance to sample.

For example, the following would save the sampling values to CSV files in the directory ‘test’.

>>> import pymc3 as pm
>>> db = pm.backends.Text('test')
>>> trace = pm.sample(..., trace=db)


### Selecting values from a backend¶

After a backend is finished sampling, it returns a MultiTrace object. Values can be accessed in a few ways. The easiest way is to index the backend object with a variable or variable name.

>>> trace['x']  # or trace.x or trace[x]


The call will return the sampling values of x, with the values for all chains concatenated. (For a single call to sample, the number of chains will correspond to the njobs argument.)

To discard the first N values of each chain, slicing syntax can be used.

>>> trace['x', 1000:]


The get_values method offers more control over which values are returned. The call below will discard the first 1000 iterations from each chain and keep the values for each chain as separate arrays.

>>> trace.get_values('x', burn=1000, combine=False)


The chains parameter of get_values can be used to limit the chains that are retrieved.

>>> trace.get_values('x', burn=1000, chains=[0, 2])


MultiTrace objects also support slicing. For example, the following call would return a new trace object without the first 1000 sampling iterations for all traces and variables.

>>> sliced_trace = trace[1000:]


The backend for the new trace is always NDArray, regardless of the type of the original trace. Only the NDArray backend supports a stop value in the slice.

Saved backends can be loaded using the load function in the module for the specific backend.

>>> trace = pm.backends.text.load('test')


### Writing custom backends¶

Backends consist of a class that handles sampling storage and value selection. Three sampling methods of the backend will be called:

• setup: Before sampling is started, the setup method will be called with two arguments: the number of draws and the chain number. This is useful for setting up any storage structure for the sampling values that requires this information.
• record: Record the sampling results for the current draw. This method will be called with a dictionary of values mapped to the variable names. This is the only sampling function that must do something to have a meaningful backend.
• close: This method is called following sampling and should perform any actions necessary for finalizing and cleaning up the backend.

The base storage class backends.base.BaseTrace provides common model setup that is used by all the PyMC backends.

Several selection methods must also be defined:

• get_values: This is the core method for selecting values from the backend. It can be called directly and is used by __getitem__ when the backend is indexed with a variable name or object.
• _slice: Defines how the backend returns a slice of itself. This is called if the backend is indexed with a slice range.
• point: Returns values for each variable at a single iteration. This is called if the backend is indexed with a single integer.
• __len__: This should return the number of draws.

When pymc3.sample finishes, it wraps all trace objects in a MultiTrace object that provides a consistent selection interface for all backends. If the traces are stored on disk, then a load function should also be defined that returns a MultiTrace object.

For specific examples, see pymc3.backends.{ndarray,text,sqlite}.py.
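The storage and selection interface described above can be sketched with a plain in-memory class. This is illustrative only: a real backend would subclass pymc3.backends.base.BaseTrace, which also handles model setup; the class below just shows the shape of the required methods.

```python
import numpy as np

class ListBackend:
    """Toy trace backend; illustrative, not a real pymc3 BaseTrace subclass."""

    def setup(self, draws, chain):
        # Called before sampling with the expected number of draws
        # and the chain number; allocate storage here.
        self.draws = draws
        self.chain = chain
        self.samples = {}          # varname -> list of recorded values

    def record(self, point):
        # `point` maps variable names to the values of the current draw.
        for varname, value in point.items():
            self.samples.setdefault(varname, []).append(value)

    def close(self):
        # Finalize storage; here we just freeze the lists into arrays.
        self.samples = {k: np.asarray(v) for k, v in self.samples.items()}

    def get_values(self, varname, burn=0, thin=1):
        # Core selection method, also used by __getitem__.
        return self.samples[varname][burn::thin]

    def point(self, idx):
        # Values of every variable at a single iteration.
        return {k: v[idx] for k, v in self.samples.items()}

    def __len__(self):
        return len(next(iter(self.samples.values())))

# Usage: mimic three recorded draws for one chain.
backend = ListBackend()
backend.setup(draws=3, chain=0)
for x in (0.1, 0.2, 0.3):
    backend.record({'x': x})
backend.close()
```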

### ndarray¶

NumPy array trace backend

Store sampling values in memory as a NumPy array.

class pymc3.backends.ndarray.NDArray(name=None, model=None, vars=None)

NDArray trace object

Parameters: name : str Name of backend. This has no meaning for the NDArray backend. model : Model If None, the model is taken from the with context. vars : list of variables Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.
get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters: varname : str burn : int thin : int Returns: A NumPy array
point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point, sampler_stats=None)

Record results of a sampling iteration.

Parameters: point : dict Values mapped to variable names
setup(draws, chain, sampler_vars=None)

Perform chain-specific setup.

Parameters: draws : int Expected number of draws chain : int Chain number sampler_vars : list of dicts Names and dtypes of the variables that are exported by the samplers.

### sqlite¶

SQLite trace backend

Store and retrieve sampling values in SQLite database file.

#### Database format¶

For each variable, a table is created with the following format:

recid (INT), draw (INT), chain (INT), v0 (FLOAT), v1 (FLOAT), v2 (FLOAT) ...

The variable column names are extended to reflect additional dimensions. For example, a variable with the shape (2, 2) would be stored as

recid (INT), draw (INT), chain (INT), v0_0 (FLOAT), v0_1 (FLOAT), v1_0 (FLOAT) ...

The recid column is autoincremented each time a new row is added to the table. The chain column denotes the chain index and starts at 0.
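This layout can be reproduced with the standard-library sqlite3 module. The snippet below is only a sketch of the documented schema for a hypothetical variable y of shape (2,); the table and column names follow the format above, not pymc3's internal DDL.

```python
import sqlite3

# In-memory database mimicking the layout for a variable `y` of shape (2,).
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE y (recid INTEGER PRIMARY KEY AUTOINCREMENT, '
    'draw INT, chain INT, v0 FLOAT, v1 FLOAT)')

# Record two draws for chain 0; recid autoincrements.
conn.execute('INSERT INTO y (draw, chain, v0, v1) VALUES (0, 0, 1.0, 2.0)')
conn.execute('INSERT INTO y (draw, chain, v0, v1) VALUES (1, 0, 3.0, 4.0)')

# Retrieve all values for chain 0, ordered by draw.
rows = conn.execute(
    'SELECT v0, v1 FROM y WHERE chain = 0 ORDER BY draw').fetchall()
```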

class pymc3.backends.sqlite.SQLite(name, model=None, vars=None)

SQLite trace object

Parameters: name : str Name of database file model : Model If None, the model is taken from the with context. vars : list of variables Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.
get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters: varname : str burn : int thin : int Returns: A NumPy array
point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point)

Record results of a sampling iteration.

Parameters: point : dict Values mapped to variable names
setup(draws, chain)

Perform chain-specific setup.

Parameters: draws : int Expected number of draws chain : int Chain number
pymc3.backends.sqlite.load(name, model=None)

Parameters: name : str Path to SQLite database file model : Model If None, the model is taken from the with context. Returns: A MultiTrace instance

### text¶

Text file trace backend

Store sampling values as CSV files.

#### File format¶

Sampling values for each chain are saved in a separate file (under a directory specified by the name argument). The rows correspond to sampling iterations. The column names consist of variable names and index labels. For example, the heading

x,y__0_0,y__0_1,y__1_0,y__1_1,y__2_0,y__2_1

represents two variables, x and y, where x is a scalar and y has a shape of (3, 2).
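The header convention can be inverted to recover each variable's shape. The helper below is hypothetical (pymc3 does this internally in pymc3.backends.text); it only illustrates how the `name__i_j` columns encode array indices.

```python
def shapes_from_header(header):
    """Recover variable shapes from a Text-backend CSV header line.

    Columns look like 'x' (a scalar) or 'y__1_0' (element (1, 0) of y).
    """
    shapes = {}
    for col in header.split(','):
        name, sep, idx = col.partition('__')
        if not sep:                       # scalar variable, e.g. 'x'
            shapes[name] = ()
            continue
        indices = tuple(int(i) for i in idx.split('_'))
        # The shape is one more than the largest index seen per dimension.
        current = shapes.get(name, (0,) * len(indices))
        shapes[name] = tuple(max(c, i + 1) for c, i in zip(current, indices))
    return shapes

# Usage with the example heading above:
shapes = shapes_from_header('x,y__0_0,y__0_1,y__1_0,y__1_1,y__2_0,y__2_1')
```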

class pymc3.backends.text.Text(name, model=None, vars=None)

Text trace object

Parameters: name : str Name of directory to store text files model : Model If None, the model is taken from the with context. vars : list of variables Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.
get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters: varname : str burn : int thin : int Returns: A NumPy array
point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point)

Record results of a sampling iteration.

Parameters: point : dict Values mapped to variable names
setup(draws, chain)

Perform chain-specific setup.

Parameters: draws : int Expected number of draws chain : int Chain number
pymc3.backends.text.dump(name, trace, chains=None)

Store values from NDArray trace as CSV files.

Parameters: name : str Name of directory to store CSV files in trace : MultiTrace of NDArray traces Result of MCMC run with default NDArray backend chains : list Chains to dump. If None, all chains are dumped.
pymc3.backends.text.load(name, model=None)

Parameters: name : str Name of directory with files (one per chain) model : Model If None, the model is taken from the with context. Returns: A MultiTrace instance

### tracetab¶

Functions for converting traces into a table-like format

pymc3.backends.tracetab.trace_to_dataframe(trace, chains=None, varnames=None, hide_transformed_vars=True)

Convert trace to Pandas DataFrame.

Parameters: trace : NDArray trace chains : int or list of ints Chains to include. If None, all chains are used. A single chain value can also be given. varnames : list of variable names Variables to be included in the DataFrame; if None, all variables are included. hide_transformed_vars : boolean If True, transformed variables will not be included in the resulting DataFrame.

## GP¶

 ExpQuad(input_dim, lengthscales[, active_dims]) The exponentiated quadratic kernel. RatQuad(input_dim, lengthscales, alpha[, ...]) The rational quadratic kernel. Matern32(input_dim, lengthscales[, active_dims]) The Matern kernel with nu = 3/2. Matern52(input_dim, lengthscales[, active_dims]) The Matern kernel with nu = 5/2. Exponential(input_dim, lengthscales[, ...]) The Exponential kernel. Cosine(input_dim, lengthscales[, active_dims]) The cosine kernel. Linear(input_dim, c[, active_dims]) The linear kernel. Polynomial(input_dim, c, d, offset[, ...]) The polynomial covariance function. WarpedInput(input_dim, cov_func, warp_func) Warp the inputs of any covariance function using an arbitrary function defined using Theano.
class pymc3.gp.cov.ExpQuad(input_dim, lengthscales, active_dims=None)

The exponentiated quadratic kernel. Also referred to as the squared exponential, or radial basis function, kernel.

$k(x, x') = \mathrm{exp}\left[ -\frac{(x - x')^2}{2 \ell^2} \right]$
class pymc3.gp.cov.RatQuad(input_dim, lengthscales, alpha, active_dims=None)

The rational quadratic kernel.

$k(x, x') = \left(1 + \frac{(x - x')^2}{2\alpha\ell^2} \right)^{-\alpha}$
class pymc3.gp.cov.Exponential(input_dim, lengthscales, active_dims=None)

The Exponential kernel.

$k(x, x') = \mathrm{exp}\left[ -\frac{||x - x'||}{2\ell^2} \right]$
class pymc3.gp.cov.Matern52(input_dim, lengthscales, active_dims=None)

The Matern kernel with nu = 5/2.

$k(x, x') = \left(1 + \frac{\sqrt{5(x - x')^2}}{\ell} + \frac{5(x-x')^2}{3\ell^2}\right) \mathrm{exp}\left[ - \frac{\sqrt{5(x - x')^2}}{\ell} \right]$
class pymc3.gp.cov.Matern32(input_dim, lengthscales, active_dims=None)

The Matern kernel with nu = 3/2.

$k(x, x') = \left(1 + \frac{\sqrt{3(x - x')^2}}{\ell}\right)\mathrm{exp}\left[ - \frac{\sqrt{3(x - x')^2}}{\ell} \right]$
class pymc3.gp.cov.Linear(input_dim, c, active_dims=None)

The linear kernel.

$k(x, x') = (x - c)(x' - c)$
class pymc3.gp.cov.Polynomial(input_dim, c, d, offset, active_dims=None)

The polynomial covariance function.

$k(x, x') = [(x - c)(x' - c) + \mathrm{offset}]^{d}$
class pymc3.gp.cov.Cosine(input_dim, lengthscales, active_dims=None)

The cosine kernel.

$k(x, x') = \mathrm{cos}\left( \frac{||x - x'||}{ \ell^2} \right)$
class pymc3.gp.cov.WarpedInput(input_dim, cov_func, warp_func, args=None, active_dims=None)

Warp the inputs of any covariance function using an arbitrary function defined using Theano.

$k_{\mathrm{warped}}(x, x') = k(w(x), w(x'))$
Parameters: cov_func : Covariance warp_func : function Theano function of X and additional optional arguments. args : optional, tuple or list of scalars or PyMC3 variables Additional inputs (besides X or Z) to warp_func.
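As a sanity check on the kernel formulas above, the exponentiated quadratic can be evaluated directly with NumPy. This is a re-implementation for illustration only, not pymc3's Theano-based code path, and it assumes 1-D inputs.

```python
import numpy as np

def exp_quad(X, Z, lengthscale):
    """k(x, x') = exp(-(x - x')^2 / (2 * l^2)) for 1-D inputs.

    Returns the full covariance matrix between the points in X and Z.
    """
    X = np.asarray(X, dtype=float)[:, None]   # column vector
    Z = np.asarray(Z, dtype=float)[None, :]   # row vector
    return np.exp(-((X - Z) ** 2) / (2.0 * lengthscale ** 2))

K = exp_quad([0.0, 1.0], [0.0, 1.0], lengthscale=1.0)
# The diagonal is 1 (distance zero); the off-diagonal entries are exp(-0.5).
```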

## Math¶

This submodule contains various mathematical functions. Most of them are imported directly from theano.tensor (see there for more details). Doing any kind of math with PyMC3 random variables, or defining custom likelihoods or priors, requires you to use these Theano expressions rather than NumPy or Python code.

 dot(a, b) Computes the dot product of two variables. constant(x[, name, ndim, dtype]) flatten(x[, outdim]) Reshapes the variable x by keeping the first outdim-1 dimension size(s) of x the same, and making the last dimension size of x equal to the multiplication of its remaining dimension size(s). zeros_like(model[, dtype, opt]) equivalent of numpy.zeros_like ones_like(model[, dtype, opt]) equivalent of numpy.ones_like stack(\*tensors, \*\*kwargs) Stack tensors in sequence on given axis (default is 0). concatenate(tensor_list[, axis]) Alias for join(axis, *tensor_list). sum(input[, axis, dtype, keepdims, acc_dtype]) Computes the sum along the given axis(es) of a tensor input. prod(input[, axis, dtype, keepdims, ...]) Computes the product along the given axis(es) of a tensor input. lt a < b gt a > b le a <= b ge a >= b eq a == b neq a != b switch if cond then ift else iff clip Clip x to be between min and max. where if cond then ift else iff and_ bitwise a & b or_ bitwise a | b abs_ |a| exp e^a log base e logarithm of a cos cosine of a sin sine of a tan tangent of a cosh hyperbolic cosine of a sinh hyperbolic sine of a tanh hyperbolic tangent of a sqr square of a sqrt square root of a erf error function erfinv inverse error function dot(a, b) Computes the dot product of two variables. maximum elemwise maximum. See max for the maximum in one tensor minimum elemwise minimum. See min for the minimum in one tensor sgn sign of a ceil ceiling of a floor floor of a det Matrix determinant. matrix_inverse Computes the inverse of a matrix $$A$$. extract_diag Return the diagonal of a matrix. matrix_dot(\*args) Shorthand for product between several dots. trace(X) Returns the sum of diagonal elements of matrix X. sigmoid Generalizes a scalar op to tensors. logsumexp(x[, axis]) invlogit(x[, eps]) logit(p)
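For intuition, a few of the helpers listed above can be written out in NumPy. These are sketches of the underlying formulas, not pymc3.math's versions, which operate on Theano tensors; logsumexp is shown in its numerically stable max-shifted form.

```python
import numpy as np

def logsumexp(x):
    """Stable log(sum(exp(x))): subtract the max before exponentiating,
    so large inputs do not overflow."""
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def invlogit(x):
    """Inverse logit (sigmoid): 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    """log(p / (1 - p)); the inverse of invlogit."""
    return np.log(p / (1.0 - p))
```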