API Reference

Distributions

Continuous

Uniform([lower, upper, transform]) Continuous uniform log-likelihood.
Flat(*args, **kwargs) Uninformative log-likelihood that returns 0 regardless of the passed value.
Normal([mu, sd, tau]) Univariate normal log-likelihood.
Beta([alpha, beta, mu, sd]) Beta log-likelihood.
Exponential(lam, *args, **kwargs) Exponential log-likelihood.
Laplace(mu, b, *args, **kwargs) Laplace log-likelihood.
StudentT(nu[, mu, lam, sd]) Non-central Student’s T log-likelihood.
Cauchy(alpha, beta, *args, **kwargs) Cauchy log-likelihood.
HalfCauchy(beta, *args, **kwargs) Half-Cauchy log-likelihood.
Gamma([alpha, beta, mu, sd]) Gamma log-likelihood.
Weibull(alpha, beta, *args, **kwargs) Weibull log-likelihood.
StudentTpos(*args, **kwargs)
Lognormal([mu, sd, tau]) Log-normal log-likelihood.
ChiSquared(nu, *args, **kwargs) \(\chi^2\) log-likelihood.
HalfNormal([sd, tau]) Half-normal log-likelihood.
Wald([mu, lam, phi, alpha]) Wald log-likelihood.
Pareto(alpha, m, *args, **kwargs) Pareto log-likelihood.
InverseGamma(alpha[, beta]) Inverse gamma log-likelihood, the reciprocal of the gamma distribution.
ExGaussian(mu, sigma, nu, *args, **kwargs) Exponentially modified Gaussian log-likelihood.

pymc3.distributions

A collection of common probability distributions for stochastic nodes in PyMC.

class pymc3.distributions.continuous.Uniform(lower=0, upper=1, transform='interval', *args, **kwargs)

Continuous uniform log-likelihood.

\[f(x \mid lower, upper) = \frac{1}{upper-lower}\]
Support \(x \in [lower, upper]\)
Mean \(\dfrac{lower + upper}{2}\)
Variance \(\dfrac{(upper - lower)^2}{12}\)
Parameters:

lower : float

Lower limit.

upper : float

Upper limit.

class pymc3.distributions.continuous.Flat(*args, **kwargs)

Uninformative log-likelihood that returns 0 regardless of the passed value.

class pymc3.distributions.continuous.Normal(mu=0, sd=None, tau=None, **kwargs)

Univariate normal log-likelihood.

\[f(x \mid \mu, \tau) = \sqrt{\frac{\tau}{2\pi}} \exp\left\{ -\frac{\tau}{2} (x-\mu)^2 \right\}\]
Support \(x \in \mathbb{R}\)
Mean \(\mu\)
Variance \(\dfrac{1}{\tau}\) or \(\sigma^2\)

Normal distribution can be parameterized either in terms of precision or standard deviation. The link between the two parametrizations is given by

\[\tau = \dfrac{1}{\sigma^2}\]
Parameters:

mu : float

Mean.

sd : float

Standard deviation (sd > 0).

tau : float

Precision (tau > 0).
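
As a minimal sketch (variable names here are illustrative, not part of the API), the two parametrizations can be used interchangeably inside a model context; sd=2 corresponds to tau=1/2**2=0.25:

>>> import pymc3 as pm
>>> with pm.Model():
...     x = pm.Normal('x', mu=0, sd=2)      # standard-deviation parametrization
...     y = pm.Normal('y', mu=0, tau=0.25)  # equivalent precision parametrization
...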

class pymc3.distributions.continuous.Beta(alpha=None, beta=None, mu=None, sd=None, *args, **kwargs)

Beta log-likelihood.

\[f(x \mid \alpha, \beta) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}\]
Support \(x \in (0, 1)\)
Mean \(\dfrac{\alpha}{\alpha + \beta}\)
Variance \(\dfrac{\alpha \beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\)

Beta distribution can be parameterized either in terms of alpha and beta or mean and standard deviation. The link between the two parametrizations is given by

\[ \begin{align}\begin{aligned}\begin{split}\alpha &= \mu \kappa \\ \beta &= (1 - \mu) \kappa\end{split}\\\text{where } \kappa = \frac{\mu(1-\mu)}{\sigma^2} - 1\end{aligned}\end{align} \]
Parameters:

alpha : float

alpha > 0.

beta : float

beta > 0.

mu : float

Alternative mean (0 < mu < 1).

sd : float

Alternative standard deviation (sd > 0).

Notes

Beta distribution is a conjugate prior for the parameter \(p\) of the binomial distribution.
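
A hedged sketch of declaring either parametrization (variable names and values are illustrative):

>>> import pymc3 as pm
>>> with pm.Model():
...     p1 = pm.Beta('p1', alpha=2, beta=2)  # shape parametrization
...     p2 = pm.Beta('p2', mu=0.5, sd=0.1)   # mean/standard-deviation parametrization
...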

class pymc3.distributions.continuous.Exponential(lam, *args, **kwargs)

Exponential log-likelihood.

\[f(x \mid \lambda) = \lambda \exp\left\{ -\lambda x \right\}\]
Support \(x \in [0, \infty)\)
Mean \(\dfrac{1}{\lambda}\)
Variance \(\dfrac{1}{\lambda^2}\)
Parameters:

lam : float

Rate or inverse scale (lam > 0)

class pymc3.distributions.continuous.Laplace(mu, b, *args, **kwargs)

Laplace log-likelihood.

\[f(x \mid \mu, b) = \frac{1}{2b} \exp \left\{ - \frac{|x - \mu|}{b} \right\}\]
Support \(x \in \mathbb{R}\)
Mean \(\mu\)
Variance \(2 b^2\)
Parameters:

mu : float

Location parameter.

b : float

Scale parameter (b > 0).

class pymc3.distributions.continuous.StudentT(nu, mu=0, lam=None, sd=None, *args, **kwargs)

Non-central Student’s T log-likelihood.

Describes a normal variable whose precision is gamma distributed. If only nu parameter is passed, this specifies a standard (central) Student’s T.

\[f(x|\mu,\lambda,\nu) = \frac{\Gamma(\frac{\nu + 1}{2})}{\Gamma(\frac{\nu}{2})} \left(\frac{\lambda}{\pi\nu}\right)^{\frac{1}{2}} \left[1+\frac{\lambda(x-\mu)^2}{\nu}\right]^{-\frac{\nu+1}{2}}\]
Support \(x \in \mathbb{R}\)
Parameters:

nu : int

Degrees of freedom (nu > 0).

mu : float

Location parameter.

lam : float

Scale parameter (lam > 0).

class pymc3.distributions.continuous.Cauchy(alpha, beta, *args, **kwargs)

Cauchy log-likelihood.

Also known as the Lorentz or the Breit-Wigner distribution.

\[f(x \mid \alpha, \beta) = \frac{1}{\pi \beta [1 + (\frac{x-\alpha}{\beta})^2]}\]
Support \(x \in \mathbb{R}\)
Mode \(\alpha\)
Mean undefined
Variance undefined
Parameters:

alpha : float

Location parameter

beta : float

Scale parameter > 0

class pymc3.distributions.continuous.HalfCauchy(beta, *args, **kwargs)

Half-Cauchy log-likelihood.

\[f(x \mid \beta) = \frac{2}{\pi \beta [1 + (\frac{x}{\beta})^2]}\]
Support \(x \in [0, \infty)\)
Mode 0
Mean undefined
Variance undefined
Parameters:

beta : float

Scale parameter (beta > 0).

class pymc3.distributions.continuous.Gamma(alpha=None, beta=None, mu=None, sd=None, *args, **kwargs)

Gamma log-likelihood.

Represents the sum of alpha exponentially distributed random variables, each of which has rate beta.

\[f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}x^{\alpha-1}e^{-\beta x}}{\Gamma(\alpha)}\]
Support \(x \in (0, \infty)\)
Mean \(\dfrac{\alpha}{\beta}\)
Variance \(\dfrac{\alpha}{\beta^2}\)

Gamma distribution can be parameterized either in terms of alpha and beta or mean and standard deviation. The link between the two parametrizations is given by

\[\begin{split}\alpha &= \frac{\mu^2}{\sigma^2} \\ \beta &= \frac{\mu}{\sigma^2}\end{split}\]
Parameters:

alpha : float

Shape parameter (alpha > 0).

beta : float

Rate parameter (beta > 0).

mu : float

Alternative mean (mu > 0).

sd : float

Alternative standard deviation (sd > 0).
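
A minimal sketch of the two parametrizations (values are illustrative; mu=2, sd=1 corresponds to alpha=4, beta=2 via the relations above):

>>> import pymc3 as pm
>>> with pm.Model():
...     g1 = pm.Gamma('g1', alpha=4, beta=2)  # shape/rate parametrization
...     g2 = pm.Gamma('g2', mu=2, sd=1)       # equivalent mean/sd parametrization
...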

class pymc3.distributions.continuous.Weibull(alpha, beta, *args, **kwargs)

Weibull log-likelihood.

\[f(x \mid \alpha, \beta) = \frac{\alpha x^{\alpha - 1} \exp(-(\frac{x}{\beta})^{\alpha})}{\beta^\alpha}\]
Support \(x \in [0, \infty)\)
Mean \(\beta \Gamma(1 + \frac{1}{\alpha})\)
Variance \(\beta^2 \Gamma(1 + \frac{2}{\alpha}) - \mu^2\), where \(\mu = \beta \Gamma(1 + \frac{1}{\alpha})\) is the mean
Parameters:

alpha : float

Shape parameter (alpha > 0).

beta : float

Scale parameter (beta > 0).

class pymc3.distributions.continuous.Lognormal(mu=0, sd=None, tau=None, *args, **kwargs)

Log-normal log-likelihood.

Distribution of any random variable whose logarithm is normally distributed. A variable might be modeled as log-normal if it can be thought of as the multiplicative product of many small independent factors.

\[f(x \mid \mu, \tau) = \sqrt{\frac{\tau}{2\pi}} \frac{\exp\left\{ -\frac{\tau}{2} (\ln(x)-\mu)^2 \right\}}{x}\]
Support \(x \in (0, \infty)\)
Mean \(\exp\{\mu + \frac{1}{2\tau}\}\)
Variance \(\left(\exp\{\frac{1}{\tau}\} - 1\right) \exp\{2\mu + \frac{1}{\tau}\}\)
Parameters:

mu : float

Location parameter.

tau : float

Precision (tau > 0).

class pymc3.distributions.continuous.ChiSquared(nu, *args, **kwargs)

\(\chi^2\) log-likelihood.

\[f(x \mid \nu) = \frac{x^{(\nu-2)/2}e^{-x/2}}{2^{\nu/2}\Gamma(\nu/2)}\]
Support \(x \in [0, \infty)\)
Mean \(\nu\)
Variance \(2 \nu\)
Parameters:

nu : int

Degrees of freedom (nu > 0).

class pymc3.distributions.continuous.HalfNormal(sd=None, tau=None, *args, **kwargs)

Half-normal log-likelihood.

\[f(x \mid \tau) = \sqrt{\frac{2\tau}{\pi}} \exp\left\{ {\frac{-x^2 \tau}{2}}\right\}\]
Support \(x \in [0, \infty)\)
Mean \(\sqrt{\dfrac{2}{\pi \tau}}\) or \(\sigma \sqrt{\dfrac{2}{\pi}}\)
Variance \(\dfrac{1}{\tau}\left(1 - \dfrac{2}{\pi}\right)\) or \(\sigma^2\left(1 - \dfrac{2}{\pi}\right)\)
Parameters:

sd : float

Standard deviation (sd > 0).

tau : float

Precision (tau > 0).

class pymc3.distributions.continuous.Wald(mu=None, lam=None, phi=None, alpha=0.0, *args, **kwargs)

Wald log-likelihood.

\[f(x \mid \mu, \lambda) = \left(\frac{\lambda}{2\pi}\right)^{1/2} x^{-3/2} \exp\left\{ -\frac{\lambda}{2x}\left(\frac{x-\mu}{\mu}\right)^2 \right\}\]
Support \(x \in (0, \infty)\)
Mean \(\mu\)
Variance \(\dfrac{\mu^3}{\lambda}\)

Wald distribution can be parameterized either in terms of lam or phi. The link between the two parametrizations is given by

\[\phi = \dfrac{\lambda}{\mu}\]
Parameters:

mu : float, optional

Mean of the distribution (mu > 0).

lam : float, optional

Relative precision (lam > 0).

phi : float, optional

Alternative shape parameter (phi > 0).

alpha : float, optional

Shift/location parameter (alpha >= 0).

Notes

To instantiate the distribution, specify any of the following combinations (a sketch follows the list)

  • only mu (in this case lam will be 1)
  • mu and lam
  • mu and phi
  • lam and phi
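
A sketch of the four options (parameter values are arbitrary):

>>> import pymc3 as pm
>>> with pm.Model():
...     w1 = pm.Wald('w1', mu=1.0)            # lam defaults to 1
...     w2 = pm.Wald('w2', mu=1.0, lam=2.0)
...     w3 = pm.Wald('w3', mu=1.0, phi=2.0)
...     w4 = pm.Wald('w4', lam=1.0, phi=2.0)
...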

References

[Tweedie1957] Tweedie, M. C. K. (1957). Statistical Properties of Inverse Gaussian Distributions I. The Annals of Mathematical Statistics, Vol. 28, No. 2, pp. 362-377
[Michael1976] Michael, J. R., Schucany, W. R. and Hass, R. W. (1976). Generating Random Variates Using Transformations with Multiple Roots. The American Statistician, Vol. 30, No. 2, pp. 88-90
class pymc3.distributions.continuous.Pareto(alpha, m, *args, **kwargs)

Pareto log-likelihood.

Often used to characterize wealth distribution, or other examples of the 80/20 rule.

\[f(x \mid \alpha, m) = \frac{\alpha m^{\alpha}}{x^{\alpha+1}}\]
Support \(x \in [m, \infty)\)
Mean \(\dfrac{\alpha m}{\alpha - 1}\) for \(\alpha > 1\)
Variance \(\dfrac{m^2 \alpha}{(\alpha - 1)^2 (\alpha - 2)}\) for \(\alpha > 2\)
Parameters:

alpha : float

Shape parameter (alpha > 0).

m : float

Scale parameter (m > 0).

class pymc3.distributions.continuous.InverseGamma(alpha, beta=1, *args, **kwargs)

Inverse gamma log-likelihood, the reciprocal of the gamma distribution.

\[f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{-\alpha - 1} \exp\left(\frac{-\beta}{x}\right)\]
Support \(x \in (0, \infty)\)
Mean \(\dfrac{\beta}{\alpha-1}\) for \(\alpha > 1\)
Variance \(\dfrac{\beta^2}{(\alpha-1)^2(\alpha - 2)}\) for \(\alpha > 2\)
Parameters:

alpha : float

Shape parameter (alpha > 0).

beta : float

Scale parameter (beta > 0).

class pymc3.distributions.continuous.ExGaussian(mu, sigma, nu, *args, **kwargs)

Exponentially modified Gaussian log-likelihood.

Results from the convolution of a normal distribution with an exponential distribution.

\[f(x \mid \mu, \sigma, \nu) = \frac{1}{\nu}\; \exp\left\{\frac{\mu-x}{\nu}+\frac{\sigma^2}{2\nu^2}\right\} \Phi\left(\frac{x-\mu}{\sigma}-\frac{\sigma}{\nu}\right)\]

where \(\Phi\) is the cumulative distribution function of the standard normal distribution.

Support \(x \in \mathbb{R}\)
Mean \(\mu + \nu\)
Variance \(\sigma^2 + \nu^2\)
Parameters:

mu : float

Mean of the normal distribution.

sigma : float

Standard deviation of the normal distribution (sigma > 0).

nu : float

Mean of the exponential distribution (nu > 0).

References

[Rigby2005] Rigby R.A. and Stasinopoulos D.M. (2005). “Generalized additive models for location, scale and shape”. Applied Statistics, 54, part 3, pp 507-554.
[Lacouture2008] Lacouture, Y. and Cousineau, D. (2008). “How to use MATLAB to fit the ex-Gaussian and other probability functions to a distribution of response times”. Tutorials in Quantitative Methods for Psychology, Vol. 4, No. 1, pp 35-45.
class pymc3.distributions.continuous.VonMises(mu=0.0, kappa=None, transform='circular', *args, **kwargs)

Univariate VonMises log-likelihood.

\[f(x \mid \mu, \kappa) = \frac{e^{\kappa\cos(x-\mu)}}{2\pi I_0(\kappa)}\]

where \(I_0\) is the modified Bessel function of order 0.

Support \(x \in [-\pi, \pi]\)
Mean \(\mu\)
Variance \(1-\frac{I_1(\kappa)}{I_0(\kappa)}\)
Parameters:

mu : float

Mean.

kappa : float

Concentration (\(\frac{1}{\kappa}\) is analogous to \(\sigma^2\)).

class pymc3.distributions.continuous.SkewNormal(mu=0.0, sd=None, tau=None, alpha=1, *args, **kwargs)

Univariate skew-normal log-likelihood.

\[f(x \mid \mu, \tau, \alpha) = 2 \Phi((x-\mu)\sqrt{\tau}\alpha) \phi(x,\mu,\tau)\]
Support \(x \in \mathbb{R}\)
Mean \(\mu + \sigma \sqrt{\frac{2}{\pi}} \frac {\alpha }{{\sqrt {1+\alpha ^{2}}}}\)
Variance \(\sigma^2 \left( 1-\frac{2\alpha^2}{(\alpha^2+1) \pi} \right)\)

Skew-normal distribution can be parameterized either in terms of precision or standard deviation. The link between the two parametrizations is given by

\[\tau = \dfrac{1}{\sigma^2}\]
Parameters:

mu : float

Location parameter.

sd : float

Scale parameter (sd > 0).

tau : float

Alternative scale parameter (tau > 0).

alpha : float

Skewness parameter.

Notes

When alpha=0 we recover the Normal distribution, and mu becomes the mean, tau the precision and sd the standard deviation. In the limit of alpha approaching plus or minus infinity we obtain a half-normal distribution.

Discrete

Binomial(n, p, *args, **kwargs) Binomial log-likelihood.
BetaBinomial(alpha, beta, n, *args, **kwargs) Beta-binomial log-likelihood.
Bernoulli(p, *args, **kwargs) Bernoulli log-likelihood
Poisson(mu, *args, **kwargs) Poisson log-likelihood.
NegativeBinomial(mu, alpha, *args, **kwargs) Negative binomial log-likelihood.
ConstantDist(*args, **kwargs)
ZeroInflatedPoisson(theta, psi, *args, **kwargs) Zero-inflated Poisson log-likelihood.
DiscreteUniform(lower, upper, *args, **kwargs) Discrete uniform distribution.
Geometric(p, *args, **kwargs) Geometric log-likelihood.
Categorical(p, *args, **kwargs) Categorical log-likelihood.
class pymc3.distributions.discrete.Binomial(n, p, *args, **kwargs)

Binomial log-likelihood.

The discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.

\[f(x \mid n, p) = \binom{n}{x} p^x (1-p)^{n-x}\]
Support \(x \in \{0, 1, \ldots, n\}\)
Mean \(n p\)
Variance \(n p (1 - p)\)
Parameters:

n : int

Number of Bernoulli trials (n >= 0).

p : float

Probability of success in each trial (0 < p < 1).

class pymc3.distributions.discrete.BetaBinomial(alpha, beta, n, *args, **kwargs)

Beta-binomial log-likelihood.

Equivalent to binomial random variable with success probability drawn from a beta distribution.

\[f(x \mid \alpha, \beta, n) = \binom{n}{x} \frac{B(x + \alpha, n - x + \beta)}{B(\alpha, \beta)}\]
Support \(x \in \{0, 1, \ldots, n\}\)
Mean \(n \dfrac{\alpha}{\alpha + \beta}\)
Variance \(n \dfrac{\alpha \beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}\)
Parameters:

n : int

Number of Bernoulli trials (n >= 0).

alpha : float

alpha > 0.

beta : float

beta > 0.

class pymc3.distributions.discrete.Bernoulli(p, *args, **kwargs)

Bernoulli log-likelihood

The Bernoulli distribution describes the probability of success (x=1) and failure (x=0).

\[f(x \mid p) = p^{x} (1-p)^{1-x}\]
Support \(x \in \{0, 1\}\)
Mean \(p\)
Variance \(p (1 - p)\)
Parameters:

p : float

Probability of success (0 < p < 1).

class pymc3.distributions.discrete.DiscreteWeibull(q, beta, *args, **kwargs)

Discrete Weibull log-likelihood

The discrete Weibull distribution is a flexible model of count data that can handle both over- and under-dispersion.

\[f(x \mid q, \beta) = q^{x^{\beta}} - q^{(x + 1)^{\beta}}\]
Support \(x \in \mathbb{N}_0\)
Mean \(\mu = \sum_{x = 1}^{\infty} q^{x^{\beta}}\)
Variance \(2 \sum_{x = 1}^{\infty} x q^{x^{\beta}} - \mu - \mu^2\)
class pymc3.distributions.discrete.Poisson(mu, *args, **kwargs)

Poisson log-likelihood.

Often used to model the number of events occurring in a fixed period of time when the times at which events occur are independent.

\[f(x \mid \mu) = \frac{e^{-\mu}\mu^x}{x!}\]
Support \(x \in \mathbb{N}_0\)
Mean \(\mu\)
Variance \(\mu\)
Parameters:

mu : float

Expected number of occurrences during the given interval (mu >= 0).

Notes

The Poisson distribution can be derived as a limiting case of the binomial distribution.

class pymc3.distributions.discrete.NegativeBinomial(mu, alpha, *args, **kwargs)

Negative binomial log-likelihood.

The negative binomial distribution describes a Poisson random variable whose rate parameter is gamma distributed.

\[f(x \mid \mu, \alpha) = \frac{\Gamma(x+\alpha)}{x! \Gamma(\alpha)} (\alpha/(\mu+\alpha))^\alpha (\mu/(\mu+\alpha))^x\]
Support \(x \in \mathbb{N}_0\)
Mean \(\mu\)
Parameters:

mu : float

Poisson distribution parameter (mu > 0).

alpha : float

Gamma distribution parameter (alpha > 0).

class pymc3.distributions.discrete.Constant(c, *args, **kwargs)

Constant log-likelihood.

Parameters:

c : float or int

Constant parameter.

class pymc3.distributions.discrete.ZeroInflatedPoisson(theta, psi, *args, **kwargs)

Zero-inflated Poisson log-likelihood.

Often used to model the number of events occurring in a fixed period of time when the times at which events occur are independent.

\[\begin{split}f(x \mid \theta, \psi) = \left\{ \begin{array}{l} (1-\psi) + \psi e^{-\theta}, \text{if } x = 0 \\ \psi \frac{e^{-\theta}\theta^x}{x!}, \text{if } x=1,2,3,\ldots \end{array} \right.\end{split}\]
Support \(x \in \mathbb{N}_0\)
Mean \(\psi\theta\)
Variance \(\psi\theta \left(1 + \theta(1-\psi)\right)\)
Parameters:

theta : float

Expected number of occurrences during the given interval (theta >= 0).

psi : float

Expected proportion of Poisson variates (0 < psi < 1)

class pymc3.distributions.discrete.ZeroInflatedNegativeBinomial(mu, alpha, psi, *args, **kwargs)

Zero-Inflated Negative binomial log-likelihood.

The Zero-inflated version of the Negative Binomial (NB). The NB distribution describes a Poisson random variable whose rate parameter is gamma distributed.

\[\begin{split}f(x \mid \mu, \alpha, \psi) = \left\{ \begin{array}{l} (1-\psi) + \psi \left (\frac{\alpha}{\alpha+\mu} \right) ^\alpha, \text{if } x = 0 \\ \psi \frac{\Gamma(x+\alpha)}{x! \Gamma(\alpha)} \left (\frac{\alpha}{\mu+\alpha} \right)^\alpha \left( \frac{\mu}{\mu+\alpha} \right)^x, \text{if } x=1,2,3,\ldots \end{array} \right.\end{split}\]
Support \(x \in \mathbb{N}_0\)
Mean \(\psi\mu\)
Variance \(\psi\mu \left(1 + \frac{\mu}{\alpha} + \mu(1-\psi)\right)\)
Parameters:

mu : float

Poisson distribution parameter (mu > 0).

alpha : float

Gamma distribution parameter (alpha > 0).

psi : float

Expected proportion of NegativeBinomial variates (0 < psi < 1)

class pymc3.distributions.discrete.DiscreteUniform(lower, upper, *args, **kwargs)

Discrete uniform distribution.

\[f(x \mid lower, upper) = \frac{1}{upper - lower + 1}\]
Support \(x \in \{lower, lower + 1, \ldots, upper\}\)
Mean \(\dfrac{lower + upper}{2}\)
Variance \(\dfrac{(upper - lower + 1)^2 - 1}{12}\)
Parameters:

lower : int

Lower limit.

upper : int

Upper limit (upper > lower).

class pymc3.distributions.discrete.Geometric(p, *args, **kwargs)

Geometric log-likelihood.

The probability that the first success in a sequence of Bernoulli trials occurs on the x’th trial.

\[f(x \mid p) = p(1-p)^{x-1}\]
Support \(x \in \mathbb{N}_{>0}\)
Mean \(\dfrac{1}{p}\)
Variance \(\dfrac{1 - p}{p^2}\)
Parameters:

p : float

Probability of success on an individual trial (0 < p <= 1).

class pymc3.distributions.discrete.Categorical(p, *args, **kwargs)

Categorical log-likelihood.

The most general discrete distribution.

\[f(x \mid p) = p_x\]
Support \(x \in \{1, 2, \ldots, |p|\}\)
Parameters:

p : array of floats

p > 0 and the elements of p must sum to 1. They will be automatically rescaled otherwise.

Multivariate

MvNormal(mu[, cov, tau]) Multivariate normal log-likelihood.
Wishart(n, V, *args, **kwargs) Wishart log-likelihood.
LKJCorr(n, p, *args, **kwargs) The LKJ (Lewandowski, Kurowicka and Joe) log-likelihood.
Multinomial(n, p, *args, **kwargs) Multinomial log-likelihood.
Dirichlet(a[, transform]) Dirichlet log-likelihood.
class pymc3.distributions.multivariate.MvNormal(mu, cov=None, tau=None, *args, **kwargs)

Multivariate normal log-likelihood.

\[f(x \mid \mu, T) = \frac{|T|^{1/2}}{(2\pi)^{k/2}} \exp\left\{ -\frac{1}{2} (x-\mu)^{\prime} T (x-\mu) \right\}\]
Support \(x \in \mathbb{R}^k\)
Mean \(\mu\)
Variance \(T^{-1}\)
Parameters:

mu : array

Vector of means.

cov : array, optional

Covariance matrix.

tau : array, optional

Precision matrix.

class pymc3.distributions.multivariate.MvStudentT(nu, Sigma, mu=None, *args, **kwargs)

Multivariate Student-T log-likelihood.

\[f(\mathbf{x} \mid \nu, \mu, \Sigma) = \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}\left|{\Sigma}\right|^{1/2}} \left[1+\frac{1}{\nu}({\mathbf x}-{\mu})^T{\Sigma}^{-1}({\mathbf x}-{\mu})\right]^{-(\nu+p)/2}\]
Support \(x \in \mathbb{R}^k\)
Mean \(\mu\) if \(\nu > 1\), else undefined
Variance \(\frac{\nu}{\nu-2}\Sigma\) if \(\nu > 2\), else undefined
Parameters:

nu : int

Degrees of freedom.

Sigma : matrix

Covariance matrix.

mu : array

Vector of means.

class pymc3.distributions.multivariate.Dirichlet(a, transform=<pymc3.distributions.transforms.StickBreaking object>, *args, **kwargs)

Dirichlet log-likelihood.

\[f(\mathbf{x}) = \frac{\Gamma(\sum_{i=1}^k \theta_i)}{\prod_{i=1}^k \Gamma(\theta_i)} \prod_{i=1}^{k-1} x_i^{\theta_i - 1} \left(1-\sum_{i=1}^{k-1}x_i\right)^{\theta_k - 1}\]
Support \(x_i \in (0, 1)\) for \(i \in \{1, \ldots, K\}\) such that \(\sum x_i = 1\)
Mean \(\dfrac{a_i}{\sum a_i}\)
Variance \(\dfrac{a_i (a_0 - a_i)}{a_0^2 (a_0 + 1)}\) where \(a_0 = \sum a_i\)
Parameters:

a : array

Concentration parameters (a > 0).

Notes

Only the first k-1 elements of x are expected. Can be used as a parent of Multinomial and Categorical nevertheless.

class pymc3.distributions.multivariate.Multinomial(n, p, *args, **kwargs)

Multinomial log-likelihood.

Generalizes the binomial distribution: instead of each trial resulting in “success” or “failure”, each one results in exactly one of some fixed finite number k of possible outcomes over n independent trials. x[i] indicates the number of times outcome number i was observed over the n trials.

\[f(x \mid n, p) = \frac{n!}{\prod_{i=1}^k x_i!} \prod_{i=1}^k p_i^{x_i}\]
Support \(x \in \{0, 1, \ldots, n\}\) such that \(\sum x_i = n\)
Mean \(n p_i\)
Variance \(n p_i (1 - p_i)\)
Covariance \(-n p_i p_j\) for \(i \ne j\)
Parameters:

n : int or array

Number of trials (n > 0).

p : one- or two-dimensional array

Probability of each one of the different outcomes. Elements must be non-negative and sum to 1 along the last axis. They will be automatically rescaled otherwise.
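
A sketch of the common pattern of placing a Dirichlet prior on the outcome probabilities of a Multinomial likelihood; the observed counts are made-up toy data and the explicit shape arguments are a conservative assumption:

>>> import numpy as np
>>> import pymc3 as pm
>>> counts = np.array([10, 4, 6])                     # toy observed counts for 3 outcomes
>>> with pm.Model():
...     p = pm.Dirichlet('p', a=np.ones(3), shape=3)  # flat prior over outcome probabilities
...     y = pm.Multinomial('y', n=counts.sum(), p=p, shape=3, observed=counts)
...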

class pymc3.distributions.multivariate.Wishart(n, V, *args, **kwargs)

Wishart log-likelihood.

The Wishart distribution is the probability distribution of the maximum-likelihood estimator (MLE) of the precision matrix of a multivariate normal distribution. If V=1, the distribution is identical to the chi-square distribution with n degrees of freedom.

\[f(X \mid n, T) = \frac{{\mid T \mid}^{n/2}{\mid X \mid}^{(n-k-1)/2}}{2^{nk/2} \Gamma_p(n/2)} \exp\left\{ -\frac{1}{2} Tr(TX) \right\}\]

where \(k\) is the rank of \(X\).

Support \(X\), a \(p \times p\) positive definite matrix
Mean \(n V\)
Variance \(n (v_{ij}^2 + v_{ii} v_{jj})\)
Parameters:

n : int

Degrees of freedom, > 0.

V : array

p x p positive definite matrix.

pymc3.distributions.multivariate.WishartBartlett(name, S, nu, is_cholesky=False, return_cholesky=False, testval=None)

Bartlett decomposition of the Wishart distribution. As the Wishart distribution requires the matrix to be symmetric positive semi-definite, it is impossible for MCMC to ever propose acceptable matrices directly.

Instead, we can use the Bartlett decomposition, which samples a lower-triangular matrix. Specifically:

\[\begin{split}\text{If } A \sim \begin{pmatrix} \sqrt{c_1} & 0 & 0 \\ z_{21} & \sqrt{c_2} & 0 \\ z_{31} & z_{32} & \sqrt{c_3} \end{pmatrix} \text{ with } c_i \sim \chi^2(n-i+1) \text{ and } z_{ij} \sim \mathcal{N}(0, 1), \text{ then} \\ L A A^T L^T \sim \text{Wishart}(L L^T, \nu),\end{split}\]

where \(L\) is the Cholesky factor of \(S\), i.e. \(S = L L^T\).

See http://en.wikipedia.org/wiki/Wishart_distribution#Bartlett_decomposition for more information.

Parameters:

S : ndarray

p x p positive definite matrix, or a p x p lower-triangular matrix that is the Cholesky factor of the covariance matrix.

nu : int

Degrees of freedom, > dim(S).

is_cholesky : bool (default=False)

Input matrix S is already Cholesky decomposed as S.T * S

return_cholesky : bool (default=False)

Only return the Cholesky decomposed matrix.

testval : ndarray

p x p positive definite matrix used to initialize
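
A minimal usage sketch, assuming an identity prior scale matrix (the values are illustrative; nu must exceed dim(S)):

>>> import numpy as np
>>> import pymc3 as pm
>>> prior_scale = np.eye(3)
>>> with pm.Model():
...     cov = pm.WishartBartlett('cov', S=prior_scale, nu=5)
...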

class pymc3.distributions.multivariate.LKJCorr(n, p, *args, **kwargs)

The LKJ (Lewandowski, Kurowicka and Joe) log-likelihood.

The LKJ distribution is a prior distribution for correlation matrices. If n = 1 this corresponds to the uniform distribution over correlation matrices. For \(n \to \infty\) the LKJ prior approaches the identity matrix.

Support Upper triangular matrix with values in [-1, 1]
Parameters:

n : float

Shape parameter (n > 0). Uniform distribution at n = 1.

p : int

Dimension of correlation matrix (p > 0).

Notes

This implementation only returns the values of the upper triangular matrix excluding the diagonal. Here is a schematic for p = 5, showing the indexes of the elements:

[[- 0 1 2 3]
 [- - 4 5 6]
 [- - - 7 8]
 [- - - - 9]
 [- - - - -]]

References

[LKJ2009] Lewandowski, D., Kurowicka, D. and Joe, H. (2009). “Generating random correlation matrices based on vines and extended onion method.” Journal of Multivariate Analysis, 100(9), pp.1989-2001.

Mixture

Mixture(w, comp_dists, *args, **kwargs) Mixture log-likelihood
NormalMixture(w, mu, *args, **kwargs) Normal mixture log-likelihood
class pymc3.distributions.mixture.Mixture(w, comp_dists, *args, **kwargs)

Mixture log-likelihood

Often used to model subpopulation heterogeneity

\[f(x \mid w, \theta) = \sum_{i = 1}^n w_i f_i(x \mid \theta_i)\]
Support \(\cap_{i = 1}^n \textrm{support}(f_i)\)
Mean \(\sum_{i = 1}^n w_i \mu_i\)
Parameters:

w : array of floats

w >= 0 and w <= 1, the mixture weights

comp_dists : multidimensional PyMC3 distribution or iterable of one-dimensional PyMC3 distributions

the component distributions \(f_1, \ldots, f_n\)

class pymc3.distributions.mixture.NormalMixture(w, mu, *args, **kwargs)

Normal mixture log-likelihood

\[f(x \mid w, \mu, \sigma^2) = \sum_{i = 1}^n w_i N(x \mid \mu_i, \sigma^2_i)\]
Support \(x \in \mathbb{R}\)
Mean \(\sum_{i = 1}^n w_i \mu_i\)
Variance \(\sum_{i = 1}^n w_i^2 \sigma^2_i\)

Parameters:

w : array of floats

w >= 0 and w <= 1, the mixture weights

mu : array of floats

the component means

sd : array of floats

the component standard deviations

tau : array of floats

the component precisions
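
A hedged sketch of a two-component normal mixture; the data array is a toy stand-in and the hyperparameters are illustrative:

>>> import numpy as np
>>> import pymc3 as pm
>>> data = np.random.randn(200)                       # toy observations
>>> with pm.Model():
...     w = pm.Dirichlet('w', a=np.ones(2), shape=2)  # mixture weights
...     mu = pm.Normal('mu', mu=0., sd=10., shape=2)  # component means
...     sd = pm.HalfNormal('sd', sd=10., shape=2)     # component standard deviations
...     obs = pm.NormalMixture('obs', w=w, mu=mu, sd=sd, observed=data)
...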
pymc3.distributions.mixture.all_discrete(comp_dists)

Determine if all distributions in comp_dists are discrete

Plots

pymc3.plots.traceplot(trace, varnames=None, transform=<function <lambda>>, figsize=None, lines=None, combined=False, plot_transformed=False, grid=False, alpha=0.35, priors=None, prior_alpha=1, prior_style='--', ax=None)

Plot sample histograms and values

Parameters:

trace : result of MCMC run

varnames : list of variable names

Variables to be plotted; if None, all variables are plotted

transform : callable

Function to transform data (defaults to identity)

figsize : figure size tuple

If None, size is (12, num of variables * 2) inches

lines : dict

Dictionary of variable name / value pairs to be overplotted as vertical lines on the posteriors and horizontal lines on the sample values, e.g. means of posteriors or true values of a simulation. If an array of values is given, line colors are matched to posterior colors. Otherwise, a default red line is used.

combined : bool

Flag for combining multiple chains into a single chain. If False (default), chains will be plotted separately.

plot_transformed : bool

Flag for plotting automatically transformed variables in addition to original variables (defaults to False).

grid : bool

Flag for adding gridlines to histogram. Defaults to False.

alpha : float

Alpha value for plot line. Defaults to 0.35.

priors : iterable of PyMC distributions

PyMC prior distribution(s) to be plotted alongside the posterior. Defaults to None (no prior plots).

prior_alpha : float

Alpha value for prior plot. Defaults to 1.

prior_style : str

Line style for prior plot. Defaults to ‘--’ (dashed line).

ax : axes

Matplotlib axes. Accepts an array of axes, e.g.:

>>> fig, axs = plt.subplots(3, 2) # 3 RVs
>>> pymc3.traceplot(trace, ax=axs)

Creates own axes by default.

Returns:

ax : matplotlib axes

pymc3.plots.forestplot(trace_obj, varnames=None, transform=<function <lambda>>, alpha=0.05, quartiles=True, rhat=True, main=None, xtitle=None, xrange=None, ylabels=None, chain_spacing=0.05, vline=0, gs=None, plot_transformed=False)

Forest plot (model summary plot). Generates a “forest plot” of 100*(1-alpha)% credible intervals for either the set of variables in a given model or a specified set of nodes.

Parameters:

trace_obj: NpTrace or MultiTrace object

Trace(s) from an MCMC sample.

varnames: list

List of variables to plot (defaults to None, which results in all variables plotted).

transform : callable

Function to transform data (defaults to identity)

alpha (optional): float

Alpha value for (1-alpha)*100% credible intervals (defaults to 0.05).

quartiles (optional): bool

Flag for plotting the interquartile range, in addition to the (1-alpha)*100% intervals (defaults to True).

rhat (optional): bool

Flag for plotting Gelman-Rubin statistics. Requires 2 or more chains (defaults to True).

main (optional): string

Title for main plot. Passing False results in titles being suppressed; passing None (default) results in default titles.

xtitle (optional): string

Label for x-axis. Defaults to no label

xrange (optional): list or tuple

Range for x-axis. Defaults to matplotlib’s best guess.

ylabels (optional): list or array

User-defined labels for each variable. If not provided, the node __name__ attributes are used.

chain_spacing (optional): float

Plot spacing between chains (defaults to 0.05).

vline (optional): numeric

Location of vertical reference line (defaults to 0).

gs : GridSpec

Matplotlib GridSpec object. Defaults to None.

plot_transformed : bool

Flag for plotting automatically transformed variables in addition to original variables (defaults to False).

Returns:

gs : matplotlib GridSpec

pymc3.plots.autocorrplot(trace, varnames=None, max_lag=100, burn=0, plot_transformed=False, symmetric_plot=False, ax=None, figsize=None)

Bar plot of the autocorrelation function for a trace

Parameters:

trace : result of MCMC run

varnames : list of variable names

Variables to be plotted; if None, all variables are plotted. Vector-valued stochastics are handled automatically.

max_lag : int

Maximum lag to calculate autocorrelation. Defaults to 100.

burn : int

Number of samples to discard from the beginning of the trace. Defaults to 0.

plot_transformed : bool

Flag for plotting automatically transformed variables in addition to original variables (defaults to False).

symmetric_plot : boolean

Plot from either [0, +lag] or [-lag, +lag]. Defaults to False ([0, +lag]).

ax : axes

Matplotlib axes. Defaults to None.

figsize : figure size tuple

If None, size is (12, num of variables * 2) inches. Note this is not used if ax is supplied.

Returns:

ax : matplotlib axes

pymc3.plots.plot_posterior(trace, varnames=None, transform=<function <lambda>>, figsize=None, alpha_level=0.05, round_to=3, point_estimate='mean', rope=None, ref_val=None, kde_plot=False, plot_transformed=False, ax=None, **kwargs)

Plot posterior densities in the style of John K. Kruschke’s book

Parameters:

trace : result of MCMC run

varnames : list of variable names

Variables to be plotted; if None, all variables are plotted

transform : callable

Function to transform data (defaults to identity)

figsize : figure size tuple

If None, size is (12, num of variables * 2) inches

alpha_level : float

Defines the range for the highest posterior density interval

round_to : int

Controls formatting for floating point numbers

point_estimate: str

Must be in (‘mode’, ‘mean’, ‘median’)

rope: list or numpy array

Lower and upper values of the Region Of Practical Equivalence

ref_val: float

If not None, display the percentage of the posterior below and above ref_val

kde_plot: bool

if True plot a KDE instead of a histogram

plot_transformed : bool

Flag for plotting automatically transformed variables in addition to original variables (defaults to False).

ax : axes

Matplotlib axes. Defaults to None.

**kwargs

Passed as-is to the plt.hist() or plt.plot() function, depending on the value of the argument kde_plot. Some defaults are added if not specified; color=’#87ceeb’ will match the style in the book.

Returns:

ax : matplotlib axes
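
A sketch, assuming trace holds samples for a scalar variable named mu; the reference value and ROPE bounds are illustrative:

>>> import pymc3 as pm
>>> pm.plot_posterior(trace, varnames=['mu'], ref_val=0, rope=[-0.1, 0.1], kde_plot=True)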

Stats

Statistical utility functions for PyMC

pymc3.stats.autocorr(pymc3_obj, *args, **kwargs)

Sample autocorrelation at specified lag. The autocorrelation is the correlation of x_i with x_{i+lag}.

pymc3.stats.autocov(pymc3_obj, *args, **kwargs)

Sample autocovariance at specified lag. The autocovariance is a 2x2 matrix with the variances of x[:-lag] and x[lag:] in the diagonal and the autocovariance on the off-diagonal.

pymc3.stats.dic(trace, model=None)

Calculate the deviance information criterion (DIC) of the samples in trace from model. Read more theory here, in a paper by some of the leading authorities on model selection: dx.doi.org/10.1111/1467-9868.00353

pymc3.stats.bpic(trace, model=None)

Calculates the Bayesian predictive information criterion (BPIC) of the samples in trace from model. Read more theory here, in a paper by some of the leading authorities on model selection: dx.doi.org/10.1111/1467-9868.00353

pymc3.stats.waic(trace, model=None, pointwise=False)

Calculate the widely applicable information criterion (WAIC), its standard error, and the effective number of parameters of the samples in trace from model. Read more theory here, in a paper by some of the leading authorities on model selection: dx.doi.org/10.1111/1467-9868.00353

Parameters:

trace : result of MCMC run

model : PyMC Model

Optional model. Default None, taken from context.

pointwise: bool

If True, the pointwise predictive accuracy will be returned. Defaults to False.

Returns:

namedtuple with the following elements:

waic: widely applicable information criterion

waic_se: standard error of waic

p_waic: effective number of parameters

waic_i: an array of the pointwise predictive accuracy, only if pointwise=True

pymc3.stats.loo(trace, model=None, pointwise=False)

Calculates leave-one-out (LOO) cross-validation for out-of-sample predictive model fit, following Vehtari et al. (2015). Cross-validation is computed using Pareto-smoothed importance sampling (PSIS).

Parameters:

trace : result of MCMC run

model : PyMC Model

Optional model. Default None, taken from context.

pointwise: bool

If True, the pointwise predictive accuracy will be returned. Defaults to False.

Returns:

namedtuple with the following elements:

loo: approximated Leave-one-out cross-validation

loo_se: standard error of loo

p_loo: effective number of parameters

loo_i: an array of the pointwise predictive accuracy, only if pointwise=True
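
A sketch of computing both criteria, assuming model and trace come from an earlier pm.sample() run:

>>> import pymc3 as pm
>>> with model:
...     model_waic = pm.waic(trace)  # namedtuple: waic, waic_se, p_waic
...     model_loo = pm.loo(trace)    # namedtuple: loo, loo_se, p_loo
...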

pymc3.stats.hpd(pymc3_obj, *args, **kwargs)

Calculate highest posterior density (HPD) of array for given alpha. The HPD is the minimum width Bayesian credible interval (BCI).

Parameters:

x : Numpy array

An array containing MCMC samples

alpha : float

Desired probability of type I error (defaults to 0.05)

transform : callable

Function to transform data (defaults to identity)

pymc3.stats.quantiles(pymc3_obj, *args, **kwargs)

Returns a dictionary of requested quantiles from array

Parameters:

x : Numpy array

An array containing MCMC samples

qlist : tuple or list

A list of desired quantiles (defaults to (2.5, 25, 50, 75, 97.5))

transform : callable

Function to transform data (defaults to identity)

pymc3.stats.mc_error(pymc3_obj, *args, **kwargs)

Calculates the simulation standard error, accounting for non-independent samples. The trace is divided into batches, and the standard deviation of the batch means is calculated.

Parameters:

x : Numpy array

An array containing MCMC samples

batches : integer

Number of batches

pymc3.stats.summary(trace, varnames=None, alpha=0.05, start=0, batches=None, roundto=3, include_transformed=False, to_file=None)

Generate a pretty-printed summary of the trace.

Parameters:

trace : Trace object

Trace containing MCMC sample

varnames : list of strings

List of variables to summarize. Defaults to None, which results in all variables summarized.

alpha : float

The alpha level for generating posterior intervals. Defaults to 0.05.

start : int

The starting index from which to summarize (each) chain. Defaults to zero.

batches : None or int

Batch size for calculating standard deviation for non-independent samples. Defaults to the smaller of 100 or the number of samples. This is only meaningful when stat_funcs is None.

roundto : int

The number of digits to round posterior statistics.

include_transformed : bool

Flag for summarizing automatically transformed variables in addition to original variables (defaults to False).

to_file : None or string

File to write results to. If not given, print to stdout.

pymc3.stats.df_summary(trace, varnames=None, stat_funcs=None, extend=False, include_transformed=False, alpha=0.05, batches=None)

Create a data frame with summary statistics.

Parameters:

trace : MultiTrace instance

varnames : list

Names of variables to include in summary

stat_funcs : None or list

A list of functions used to calculate statistics. By default, the mean, standard deviation, simulation standard error, and highest posterior density intervals are included.

The functions will be given one argument, the samples for a variable as a 2 dimensional array, where the first axis corresponds to sampling iterations and the second axis represents the flattened variable (e.g., x__0, x__1, ...). Each function should return either

  1. A pandas.Series instance containing the result of calculating the statistic along the first axis. The name attribute will be taken as the name of the statistic.
  2. A pandas.DataFrame where each column contains the result of calculating the statistic along the first axis. The column names will be taken as the names of the statistics.

extend : boolean

If True, use the statistics returned by stat_funcs in addition to, rather than in place of, the default statistics. This is only meaningful when stat_funcs is not None.

include_transformed : bool

Flag for reporting automatically transformed variables in addition to original variables (defaults to False).

alpha : float

The alpha level for generating posterior intervals. Defaults to 0.05. This is only meaningful when stat_funcs is None.

batches : None or int

Batch size for calculating standard deviation for non-independent samples. Defaults to the smaller of 100 or the number of samples. This is only meaningful when stat_funcs is None.

Returns:

pandas.DataFrame with summary statistics for each variable

See also

summary
Generate a pretty-printed summary of a trace.

Examples

>>> import pymc3 as pm
>>> trace.mu.shape
(1000, 2)
>>> pm.df_summary(trace, ['mu'])
           mean        sd  mc_error     hpd_5    hpd_95
mu__0  0.106897  0.066473  0.001818 -0.020612  0.231626
mu__1 -0.046597  0.067513  0.002048 -0.174753  0.081924

Other statistics can be calculated by passing a list of functions.

>>> import pandas as pd
>>> def trace_sd(x):
...     return pd.Series(np.std(x, 0), name='sd')
...
>>> def trace_quantiles(x):
...     return pd.DataFrame(pm.quantiles(x, [5, 50, 95]))
...
>>> pm.df_summary(trace, ['mu'], stat_funcs=[trace_sd, trace_quantiles])
             sd         5        50        95
mu__0  0.066473  0.000312  0.105039  0.214242
mu__1  0.067513 -0.159097 -0.045637  0.062912

Diagnostics

Convergence diagnostics and model validation

pymc3.diagnostics.geweke(pymc3_obj, *args, **kwargs)

Return z-scores for convergence diagnostics.

Compare the mean of the first portion of the series (given by first) with the mean of the last portion (given by last). x is divided into a number of segments for which this difference is computed. If the series has converged, this score should oscillate between -1 and 1.

Parameters:

x : array-like

The trace of some stochastic parameter.

first : float

The fraction of series at the beginning of the trace.

last : float

The fraction of series at the end to be compared with the section at the beginning.

intervals : int

The number of segments.

Returns:

scores : list [[]]

Return a list of [i, score], where i is the starting index for each interval and score is the Geweke score on the interval.

Notes

The Geweke score on some series x is computed by:

\[\frac{E[x_s] - E[x_e]}{\sqrt{V[x_s] + V[x_e]}}\]

where \(E\) stands for the mean, \(V\) the variance, \(x_s\) a section at the start of the series and \(x_e\) a section at the end of the series.

References

Geweke (1992)

pymc3.diagnostics.gelman_rubin(mtrace)

Returns an estimate of \(\hat{R}\) for a set of traces.

The Gelman-Rubin diagnostic tests for lack of convergence by comparing the variance between multiple chains to the variance within each chain. If convergence has been achieved, the between-chain and within-chain variances should be identical. To be most effective in detecting evidence for nonconvergence, each chain should have been initialized to starting values that are dispersed relative to the target distribution.

Parameters:

mtrace : MultiTrace

A MultiTrace object containing parallel traces (minimum 2) of one or more stochastic parameters.

Returns:

Rhat : dict

Returns dictionary of the potential scale reduction factors, \(\hat{R}\)

Notes

The diagnostic is computed by:

\[\hat{R} = \frac{\hat{V}}{W}\]

where \(W\) is the within-chain variance and \(\hat{V}\) is the posterior variance estimate for the pooled traces. This is the potential scale reduction factor, which converges to unity when each of the traces is a sample from the target posterior. Values greater than one indicate that one or more chains have not yet converged.

References

Brooks and Gelman (1998) Gelman and Rubin (1992)

pymc3.diagnostics.effective_n(mtrace)

Returns estimate of the effective sample size of a set of traces.

Parameters:

mtrace : MultiTrace

A MultiTrace object containing parallel traces (minimum 2) of one or more stochastic parameters.

Returns:

n_eff : float

Return the effective sample size, \(\hat{n}_{eff}\)

Notes

The diagnostic is computed by:

\[\hat{n}_{eff} = \frac{mn}{1 + 2 \sum_{t=1}^T \hat{\rho}_t}\]

where \(\hat{\rho}_t\) is the estimated autocorrelation at lag t, and T is the first odd positive integer for which the sum \(\hat{\rho}_{T+1} + \hat{\rho}_{T+2}\) is negative.

References

Gelman et al. (2014)
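
As a sketch, both diagnostics are applied directly to a multi-chain trace; trace is assumed to come from a run with at least two chains (e.g. pm.sample(..., njobs=2)):

>>> import pymc3 as pm
>>> pm.gelman_rubin(trace)   # dictionary of potential scale reduction factors
>>> pm.effective_n(trace)    # effective sample size estimates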

Inference

Sampling

pymc3.sampling.sample(draws, step=None, init='advi', n_init=200000, start=None, trace=None, chain=0, njobs=1, tune=None, progressbar=True, model=None, random_seed=-1)

Draw a number of samples using the given step method. Multiple step methods are supported via compound step methods.

Parameters:

draws : int

The number of samples to draw.

step : function or iterable of functions

A step function or collection of functions. If no step methods are specified, or are partially specified, they will be assigned automatically (defaults to None).

init : str {‘ADVI’, ‘ADVI_MAP’, ‘MAP’, ‘NUTS’, None}

Initialization method to use.

  • ADVI : Run ADVI to estimate starting points and diagonal covariance matrix. If njobs > 1 it will sample starting points from the estimated posterior, otherwise it will use the estimated posterior mean.
  • ADVI_MAP : Initialize ADVI with MAP and use MAP as starting point.
  • MAP : Use the MAP as starting point.
  • NUTS : Run NUTS to estimate starting points and covariance matrix. If njobs > 1 it will sample starting points from the estimated posterior, otherwise it will use the estimated posterior mean.
  • None : Do not initialize.

n_init : int

Number of iterations of the initializer. If ‘ADVI’, the number of iterations; if ‘nuts’, the number of draws.

start : dict

Starting point in parameter space (or partial point). Defaults to trace.point(-1) if a trace is provided and model.test_point if not (defaults to empty dict).

trace : backend, list, or MultiTrace

This should be a backend instance, a list of variables to track, or a MultiTrace object with past values. If a MultiTrace object is given, it must contain samples for the chain number chain. If None or a list of variables, the NDArray backend is used. Passing either “text” or “sqlite” is taken as a shortcut to set up the corresponding backend (with “mcmc” used as the base name).

chain : int

Chain number used to store sample in backend. If njobs is greater than one, chain numbers will start here.

njobs : int

Number of parallel jobs to start. If None, set to the number of CPUs in the system minus 2.

tune : int

Number of iterations to tune, if applicable (defaults to None)

progressbar : bool

Whether or not to display a progress bar in the command line. The bar shows the percentage of completion, the sampling speed in samples per second (SPS), and the estimated remaining time until completion (“expected time of arrival”; ETA).

model : Model (optional if in with context)

random_seed : int or list of ints

A list is accepted if njobs is greater than one.

Returns:

MultiTrace object with access to sampling values
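
A minimal end-to-end sketch with made-up data and an illustrative model; the step method and initialization are assigned automatically:

>>> import numpy as np
>>> import pymc3 as pm
>>> data = np.random.randn(100)                  # toy observations
>>> with pm.Model() as model:
...     mu = pm.Normal('mu', mu=0, sd=10)
...     sd = pm.HalfNormal('sd', sd=10)
...     obs = pm.Normal('obs', mu=mu, sd=sd, observed=data)
...     trace = pm.sample(2000, njobs=2)         # two chains, NUTS assigned automatically
...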

pymc3.sampling.iter_sample(draws, step, start=None, trace=None, chain=0, tune=None, model=None, random_seed=-1)

Generator that returns a trace on each iteration using the given step method. Multiple step methods are supported via compound step methods.

Parameters:

draws : int

The number of samples to draw

step : function

Step function

start : dict

Starting point in parameter space (or partial point). Defaults to trace.point(-1) if a trace is provided and model.test_point if not (defaults to empty dict).

trace : backend, list, or MultiTrace

This should be a backend instance, a list of variables to track, or a MultiTrace object with past values. If a MultiTrace object is given, it must contain samples for the chain number chain. If None or a list of variables, the NDArray backend is used.

chain : int

Chain number used to store sample in backend. If njobs is greater than one, chain numbers will start here.

tune : int

Number of iterations to tune, if applicable (defaults to None)

model : Model (optional if in with context)

random_seed : int or list of ints

A list is accepted if njobs is greater than one.
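
A sketch of consuming the generator, assuming model is an existing PyMC3 model; the trace grows by one sample per iteration:

>>> import pymc3 as pm
>>> with model:
...     step = pm.Metropolis()
...     for trace in pm.iter_sample(500, step):
...         pass                      # inspect the partial trace here, e.g. trace[-1]
...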

pymc3.sampling.sample_ppc(trace, samples=None, model=None, vars=None, size=None, random_seed=None, progressbar=True)

Generate posterior predictive samples from a model given a trace.

Parameters:

trace : backend, list, or MultiTrace

Trace generated from MCMC sampling

samples : int

Number of posterior predictive samples to generate. Defaults to the length of trace

model : Model (optional if in with context)

Model used to generate trace

vars : iterable

Variables for which to compute the posterior predictive samples. Defaults to model.observed_RVs.

size : int

The number of random draws from the distribution specified by the parameters in each sample of the trace.

Returns:

Dictionary keyed by vars, where the values are the corresponding posterior predictive samples.
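
A sketch, assuming model and trace come from an earlier pm.sample() run with an observed variable named 'obs':

>>> import pymc3 as pm
>>> with model:
...     ppc = pm.sample_ppc(trace, samples=500)  # dict keyed by observed variable names
...
>>> ppc['obs']                                   # posterior predictive draws for 'obs'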

pymc3.sampling.init_nuts(init='ADVI', njobs=1, n_init=500000, model=None, random_seed=-1, **kwargs)

Initialize and sample from posterior of a continuous model.

This is a convenience function. NUTS convergence and sampling speed is extremely dependent on the choice of mass/scaling matrix. In our experience, using ADVI to estimate a diagonal covariance matrix and using this as the scaling matrix produces robust results over a wide class of continuous models.

Parameters:

init : str {‘ADVI’, ‘ADVI_MAP’, ‘MAP’, ‘NUTS’}

Initialization method to use.

  • ADVI : Run ADVI to estimate posterior mean and diagonal covariance matrix.
  • ADVI_MAP : Initialize ADVI with MAP and use MAP as starting point.
  • MAP : Use the MAP as starting point.
  • NUTS : Run NUTS and estimate posterior mean and covariance matrix.

njobs : int

Number of parallel jobs to start.

n_init : int

Number of iterations of the initializer. If ‘ADVI’, the number of iterations; if ‘metropolis’, the number of draws.

model : Model (optional if in with context)

**kwargs : keyword arguments

Extra keyword arguments are forwarded to pymc3.NUTS.

Returns:

start, nuts_sampler

start : pymc3.model.Point

Starting point for sampler

nuts_sampler : pymc3.step_methods.NUTS

Instantiated and initialized NUTS sampler object

Step-methods

NUTS

Metropolis

class pymc3.step_methods.metropolis.Metropolis(vars=None, S=None, proposal_dist=<class 'pymc3.step_methods.metropolis.NormalProposal'>, scaling=1.0, tune=True, tune_interval=100, model=None, mode=None, **kwargs)

Metropolis-Hastings sampling step

Parameters:

vars : list

List of variables for sampler

S : standard deviation or covariance matrix

Some measure of variance to parameterize proposal distribution

proposal_dist : function

Function that returns zero-mean deviates when parameterized with S (and n). Defaults to normal.

scaling : scalar or array

Initial scale factor for proposal. Defaults to 1.

tune : bool

Flag for tuning. Defaults to True.

tune_interval : int

The frequency of tuning. Defaults to 100 iterations.

model : PyMC Model

Optional model for sampling step. Defaults to None (taken from context).

mode : string or Mode instance.

Compilation mode passed to Theano functions
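
As a sketch, a step method instance is passed to pm.sample via its step argument; model is assumed to be an existing PyMC3 model:

>>> import pymc3 as pm
>>> with model:
...     step = pm.Metropolis()               # Metropolis-Hastings proposals for the free variables
...     trace = pm.sample(5000, step=step)
...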

class pymc3.step_methods.metropolis.BinaryMetropolis(vars, scaling=1.0, tune=True, tune_interval=100, model=None)

Metropolis-Hastings optimized for binary variables

Parameters:

vars : list

List of variables for sampler

scaling : scalar or array

Initial scale factor for proposal. Defaults to 1.

tune : bool

Flag for tuning. Defaults to True.

tune_interval : int

The frequency of tuning. Defaults to 100 iterations.

model : PyMC Model

Optional model for sampling step. Defaults to None (taken from context).

static competence(var)

BinaryMetropolis is only suitable for binary (bool) and Categorical variables with k=1.

class pymc3.step_methods.metropolis.BinaryGibbsMetropolis(vars, order='random', model=None)

A Metropolis-within-Gibbs step method optimized for binary variables

static competence(var)

BinaryGibbsMetropolis is only suitable for Bernoulli and Categorical variables with k=2.

class pymc3.step_methods.metropolis.CategoricalGibbsMetropolis(vars, proposal='uniform', order='random', model=None)

A Metropolis-within-Gibbs step method optimized for categorical variables. This step method works for Bernoulli variables as well, but it is not optimized for them, like BinaryGibbsMetropolis is. Step method supports two types of proposals: A uniform proposal and a proportional proposal, which was introduced by Liu in his 1996 technical report “Metropolized Gibbs Sampler: An Improvement”.

static competence(var)

CategoricalGibbsMetropolis is only suitable for Bernoulli and Categorical variables.

Slice

class pymc3.step_methods.slicer.Slice(vars=None, w=1.0, tune=True, model=None, **kwargs)

Univariate slice sampler step method

Parameters:

vars : list

List of variables for sampler.

w : float

Initial width of slice (Defaults to 1).

tune : bool

Flag for tuning (Defaults to True).

model : PyMC Model

Optional model for sampling step. Defaults to None (taken from context).

Hamiltonian Monte Carlo

Variational

ADVI

pymc3.variational.advi.advi(vars=None, start=None, model=None, n=5000, accurate_elbo=False, optimizer=None, learning_rate=0.001, epsilon=0.1, mode=None, random_seed=None)

Perform automatic differentiation variational inference (ADVI).

This function implements the meanfield ADVI, where the variational posterior distribution is assumed to be spherical Gaussian without correlation of parameters and fit to the true posterior distribution. The means and standard deviations of the variational posterior are referred to as variational parameters.

The return value of this function is an ADVIfit object, which has variational parameters. If you want to draw samples from the variational posterior, you need to pass the ADVIfit object to pymc3.variational.sample_vp().

The variational parameters are defined on the transformed space, which is required to do ADVI on an unconstrained parameter space as described in [KTR+2016]. The parameters in the ADVIfit object are in the transformed space, while traces returned by sample_vp() are in the original space as obtained by MCMC sampling methods in PyMC3.

The variational parameters are optimized with a given optimizer, which is a function that returns a dictionary of parameter updates as provided to a Theano function. If no optimizer is provided, optimization is performed with a modified version of Adagrad, where only the last (n_window) gradient vectors are used to control the learning rate and older gradient vectors are ignored. n_window denotes the size of the time window and is fixed to 10.

Parameters:

vars : object

Random variables.

start : Dict or None

Initial values of parameters (variational means).

model : Model

Probabilistic model.

n : int

Number of iterations updating parameters.

accurate_elbo : bool

If true, 100 MC samples are used for accurate calculation of ELBO.

optimizer : (loss, tensor) -> dict or OrderedDict

A function that returns parameter updates given loss and parameter tensor. If None (default), a default Adagrad optimizer is used with parameters learning_rate and epsilon below.

learning_rate: float

Base learning rate for adagrad. This parameter is ignored when optimizer is given.

epsilon : float

Offset in denominator of the scale of learning rate in Adagrad. This parameter is ignored when optimizer is given.

random_seed : int or None

Seed to initialize random state. None uses current seed.

mode : string or Mode instance.

Compilation mode passed to Theano functions

Returns:

ADVIFit

Named tuple, which includes ‘means’, ‘stds’, and ‘elbo_vals’.

‘means’ are the variational means and ‘stds’ are the variational standard deviations.

‘elbo_vals’ is the trace of ELBO values during optimization.

References

[KTR+2016] Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., and Blei, D. M. (2016). Automatic Differentiation Variational Inference. arXiv preprint arXiv:1603.00788.
pymc3.variational.advi.sample_vp(vparams, draws=1000, model=None, local_RVs=None, random_seed=None, hide_transformed=True, progressbar=True)

Draw samples from variational posterior.

Parameters:

vparams : dict or pymc3.variational.ADVIFit

Estimated variational parameters of the model.

draws : int

Number of random samples.

model : pymc3.Model

Probabilistic model.

random_seed : int or None

Seed of random number generator. None to use current seed.

hide_transformed : bool

If False, transformed variables are also sampled. Default is True.

Returns:

trace : pymc3.backends.base.MultiTrace

Samples drawn from the variational posterior.
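
A sketch of fitting with ADVI and then drawing from the variational posterior; model is assumed to be an existing PyMC3 model and the iteration counts are illustrative:

>>> import pymc3 as pm
>>> with model:
...     v_params = pm.variational.advi(n=10000)                 # ADVIFit: means, stds, elbo_vals
...     trace = pm.variational.sample_vp(v_params, draws=1000)  # MultiTrace from the variational posterior
...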

ADVI minibatch

pymc3.variational.advi_minibatch.advi_minibatch(vars=None, start=None, model=None, n=5000, n_mcsamples=1, minibatch_RVs=None, minibatch_tensors=None, minibatches=None, global_RVs=None, local_RVs=None, observed_RVs=None, encoder_params=None, total_size=None, optimizer=None, learning_rate=0.001, epsilon=0.1, random_seed=None, mode=None)

Perform mini-batch ADVI.

This function implements a mini-batch automatic differentiation variational inference (ADVI; Kucukelbir et al., 2015) with the meanfield approximation. Autoencoding variational Bayes (AEVB; Kingma and Welling, 2014) is also supported.

For explanation, we classify random variables in probabilistic models into three types. Observed random variables \({\cal Y}=\{\mathbf{y}_{i}\}_{i=1}^{N}\) are \(N\) observations. Each \(\mathbf{y}_{i}\) can be a set of observed random variables, i.e., \(\mathbf{y}_{i}=\{\mathbf{y}_{i}^{k}\}_{k=1}^{V_{o}}\), where \(V_{o}\) is the number of the types of observed random variables in the model.

The next ones are global random variables \(\Theta=\{\theta^{k}\}_{k=1}^{V_{g}}\), which are used to calculate the probabilities for all observed samples.

The last ones are local random variables \({\cal Z}=\{\mathbf{z}_{i}\}_{i=1}^{N}\), where \(\mathbf{z}_{i}=\{\mathbf{z}_{i}^{k}\}_{k=1}^{V_{l}}\). These RVs are used only in AEVB.

The goal of ADVI is to approximate the posterior distribution \(p(\Theta,{\cal Z}|{\cal Y})\) by variational posterior \(q(\Theta)\prod_{i=1}^{N}q(\mathbf{z}_{i})\). All of these terms are normal distributions (mean-field approximation).

\(q(\Theta)\) is parametrized with its means and standard deviations. These parameters are denoted as \(\gamma\). While \(\gamma\) is a constant, the parameters of \(q(\mathbf{z}_{i})\) are dependent on each observation. Therefore these parameters are denoted as \(\xi(\mathbf{y}_{i}; \nu)\), where \(\nu\) is the parameters of \(\xi(\cdot)\). For example, \(\xi(\cdot)\) can be a multilayer perceptron or convolutional neural network.

In addition to \(\xi(\cdot)\), we can also include deterministic mappings for the likelihood of observations. We denote the parameters of the deterministic mappings as \(\eta\). An example of such mappings is the deconvolutional neural network used in the convolutional VAE example in the PyMC3 notebook directory.

This function maximizes the evidence lower bound (ELBO) \({\cal L}(\gamma, \nu, \eta)\) defined as follows:

\[\begin{split}{\cal L}(\gamma,\nu,\eta) & = \mathbf{c}_{o}\mathbb{E}_{q(\Theta)}\left[ \sum_{i=1}^{N}\mathbb{E}_{q(\mathbf{z}_{i})}\left[ \log p(\mathbf{y}_{i}|\mathbf{z}_{i},\Theta,\eta) \right]\right] \\ & - \mathbf{c}_{g}KL\left[q(\Theta)||p(\Theta)\right] - \mathbf{c}_{l}\sum_{i=1}^{N} KL\left[q(\mathbf{z}_{i})||p(\mathbf{z}_{i})\right],\end{split}\]

where \(KL[q(v)||p(v)]\) is the Kullback-Leibler divergence

\[KL[q(v)||p(v)] = \int q(v)\log\frac{q(v)}{p(v)}dv,\]

\(\mathbf{c}_{o/g/l}\) are vectors for weighting each term of ELBO. More precisely, we can write each of the terms in ELBO as follows:

\[\begin{split}\mathbf{c}_{o}\log p(\mathbf{y}_{i}|\mathbf{z}_{i},\Theta,\eta) & = & \sum_{k=1}^{V_{o}}c_{o}^{k} \log p(\mathbf{y}_{i}^{k}| {\rm pa}(\mathbf{y}_{i}^{k},\Theta,\eta)) \\ \mathbf{c}_{g}KL\left[q(\Theta)||p(\Theta)\right] & = & \sum_{k=1}^{V_{g}}c_{g}^{k}KL\left[ q(\theta^{k})||p(\theta^{k}|{\rm pa}(\theta^{k}))\right] \\ \mathbf{c}_{l}KL\left[q(\mathbf{z}_{i})||p(\mathbf{z}_{i})\right] & = & \sum_{k=1}^{V_{l}}c_{l}^{k}KL\left[ q(\mathbf{z}_{i}^{k})|| p(\mathbf{z}_{i}^{k}|{\rm pa}(\mathbf{z}_{i}^{k}))\right],\end{split}\]

where \({\rm pa}(v)\) denotes the set of parent variables of \(v\) in the directed acyclic graph of the model.

When using mini-batches, \(c_{o}^{k}\) and \(c_{l}^{k}\) should be set to \(N/M\), where \(M\) is the number of observations in each mini-batch. Another weighting scheme was proposed in (Blundell et al., 2015) for accelerating model fitting.

To work with ADVI, we need to give the probabilistic model (model), the three types of RVs (observed_RVs, global_RVs and local_RVs), the tensors to which mini-batched samples are supplied (minibatch_tensors), the generator of mini-batches (minibatches), and the parameters of the deterministic mappings \(\xi\) and \(\eta\) (encoder_params) as input arguments.

observed_RVs is an OrderedDict of the form {y_k: c_k}, where y_k is a random variable defined in the PyMC3 model. c_k is a scalar (\(c_{o}^{k}\)) and it can be a shared variable.

global_RVs is an OrderedDict of the form {t_k: c_k}, where t_k is a random variable defined in the PyMC3 model. c_k is a scalar (\(c_{g}^{k}\)) and it can be a shared variable.

local_RVs is an OrderedDict of the form {z_k: ((m_k, s_k), c_k)}, where z_k is a random variable defined in the PyMC3 model. c_k is a scalar (\(c_{l}^{k}\)) and it can be a shared variable. (m_k, s_k) is a pair of tensors of means and log standard deviations of the variational distribution; samples drawn from the variational distribution replace z_k. Note that if z_k has a transformation that changes its dimension (e.g., StickBreakingTransform), the variational distribution must have the same dimension. For example, if z_k follows a Dirichlet distribution with p categories, m_k and s_k have the shape (n_samples_in_minibatch, p - 1).

minibatch_tensors is a list of tensors (which can be shared variables) to which mini-batch samples are assigned during optimization. These tensors are the observations (passed as observed=) of the RVs in observed_RVs.

minibatches is a generator yielding lists of numpy.ndarray. Each item of a yielded list is assigned to the corresponding tensor in minibatch_tensors.

encoder_params is a list of shared variables of the parameters \(\nu\) and \(\eta\). We do not need to include the variational parameters of the global variables, \(\gamma\), because these are automatically created and updated in this function.
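As a rough sketch (all data, shapes, and variable names here are purely illustrative, and the simpler minibatch_RVs interface is used instead of the dictionaries described above), wiring the pieces together might look like this; the example notebooks listed below give complete, maintained versions:

import numpy as np
import theano
import pymc3 as pm

N, batch_size = 1000, 100
X, y = np.random.randn(N, 3), np.random.randn(N)

# Shared variables that receive each mini-batch during optimization.
X_shared = theano.shared(X[:batch_size])
y_shared = theano.shared(y[:batch_size])

with pm.Model() as model:
    beta = pm.Normal('beta', mu=0, sd=10, shape=3)
    obs = pm.Normal('obs', mu=pm.math.dot(X_shared, beta), sd=1,
                    observed=y_shared)

def minibatch_gen():
    # Yields one list per iteration; items match minibatch_tensors in order.
    while True:
        idx = np.random.randint(0, N, batch_size)
        yield [X[idx], y[idx]]

with model:
    v_params = pm.variational.advi_minibatch(
        n=5000,
        minibatch_tensors=[X_shared, y_shared],
        minibatch_RVs=[obs],
        minibatches=minibatch_gen(),
        total_size=N)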

The following is a list of example notebooks using advi_minibatch:

  • docs/source/notebooks/GLM-hierarchical-advi-minibatch.ipynb
  • docs/source/notebooks/bayesian_neural_network_advi.ipynb
  • docs/source/notebooks/convolutional_vae_keras_advi.ipynb
  • docs/source/notebooks/gaussian-mixture-model-advi.ipynb
  • docs/source/notebooks/lda-advi-aevb.ipynb
Parameters:

vars : object

List of random variables. If None, variational posteriors (normal distribution) are fit for all RVs in the given model.

start : Dict or None

Initial values of parameters (variational means).

model : Model

Probabilistic model.

n : int

Number of iterations updating parameters.

n_mcsamples : int

Number of Monte Carlo samples to approximate ELBO.

minibatch_RVs : list of ObservedRVs

Random variables in the model for which mini-batch tensors are set. When this argument is given, both local_RVs and observed_RVs must be None.

minibatch_tensors : list of (tensors or shared variables)

Tensors used to create ObservedRVs in minibatch_RVs.

minibatches : generator of list

Generates a list of mini-batches when next() is called on it. The length of the returned list must equal the number of tensors in minibatch_tensors.

total_size : int

Total size of training samples. This is used to appropriately scale the log likelihood terms corresponding to mini-batches in ELBO.

observed_RVs : OrderedDict

Include a scaling constant for the corresponding RV. See the above description.

global_RVs : OrderedDict or None

Include a scaling constant for the corresponding RV. See the above description. If None, it is set to {v: 1 for v in grvs}, where grvs is list(set(vars) - set(list(local_RVs) + list(observed_RVs))).

local_RVs : OrderedDict or None

Include encoded variational parameters and a scaling constant for the corresponding RV. See the above description.

encoder_params : list of theano shared variables

Parameters of encoder.

optimizer : (loss, list of shared variables) -> dict or OrderedDict

A function that returns parameter updates given loss and shared variables of parameters. If None (default), a default Adagrad optimizer is used with parameters learning_rate and epsilon below.

learning_rate : float

Base learning rate for Adagrad. This parameter is ignored when optimizer is set.

epsilon : float

Offset in the denominator of the Adagrad learning-rate scaling. This parameter is ignored when optimizer is set.

random_seed : int

Seed to initialize random state.

Returns:

ADVIFit

Named tuple, which includes ‘means’, ‘stds’, and ‘elbo_vals’.

References

  • Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. stat, 1050, 1.
  • Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in Stan. In Advances in neural information processing systems (pp. 568-576).
  • Blundell, C., Cornebise, J., Kavukcuoglu, K., & Wierstra, D. (2015). Weight Uncertainty in Neural Network. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 1613-1622).

Backends

Backends for traces

Available backends

  1. NumPy array (pymc3.backends.NDArray)
  2. Text files (pymc3.backends.Text)
  3. SQLite (pymc3.backends.SQLite)

The NDArray backend holds the entire trace in memory, whereas the Text and SQLite backends store the values while sampling.

Selecting a backend

By default, a NumPy array is used as the backend. To specify a different backend, pass a backend instance to sample.

For example, the following would save the sampling values to CSV files in the directory ‘test’.

>>> import pymc3 as pm
>>> db = pm.backends.Text('test')
>>> trace = pm.sample(..., trace=db)

Selecting values from a backend

After a backend is finished sampling, it returns a MultiTrace object. Values can be accessed in a few ways. The easiest way is to index the backend object with a variable or variable name.

>>> trace['x']  # or trace.x or trace[x]

The call will return the sampling values of x, with the values for all chains concatenated. (For a single call to sample, the number of chains will correspond to the njobs argument.)

To discard the first N values of each chain, slicing syntax can be used.

>>> trace['x', 1000:]

The get_values method offers more control over which values are returned. The call below will discard the first 1000 iterations from each chain and keep the values for each chain as separate arrays.

>>> trace.get_values('x', burn=1000, combine=False)

The chains parameter of get_values can be used to limit the chains that are retrieved.

>>> trace.get_values('x', burn=1000, chains=[0, 2])

MultiTrace objects also support slicing. For example, the following call would return a new trace object without the first 1000 sampling iterations for all traces and variables.

>>> sliced_trace = trace[1000:]

The backend for the new trace is always NDArray, regardless of the type of original trace. Only the NDArray backend supports a stop value in the slice.

Loading a saved backend

Saved backends can be loaded using the load function in the module for the specific backend.

>>> trace = pm.backends.text.load('test')

Writing custom backends

Backends consist of a class that handles sampling storage and value selection. Three sampling methods of backend will be called:

  • setup: Before sampling is started, the setup method will be called with two arguments: the number of draws and the chain number. This is useful for setting up any structure that stores the sampling values and requires this information.
  • record: Record the sampling results for the current draw. This method will be called with a dictionary of values mapped to the variable names. This is the only sampling method that must actually do something for the backend to be meaningful.
  • close: This method is called following sampling and should perform any actions necessary for finalizing and cleaning up the backend.

The base storage class backends.base.BaseTrace provides common model setup that is used by all the PyMC backends.

Several selection methods must also be defined:

  • get_values: This is the core method for selecting values from the backend. It can be called directly and is used by __getitem__ when the backend is indexed with a variable name or object.
  • _slice: Defines how the backend returns a slice of itself. This is called if the backend is indexed with a slice range.
  • point: Returns values for each variable at a single iteration. This is called if the backend is indexed with a single integer.
  • __len__: This should return the number of draws.

When pymc3.sample finishes, it wraps all trace objects in a MultiTrace object that provides a consistent selection interface for all backends. If the traces are stored on disk, then a load function should also be defined that returns a MultiTrace object.

For specific examples, see pymc3.backends.{ndarray,text,sqlite}.py.
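As a bare-bones illustration (not one of the bundled backends), the sketch below stores every draw as a dict in a Python list and implements the methods listed above; it skips the shape bookkeeping and model mapping that the bundled backends perform, so treat it as a starting point rather than a drop-in replacement:

import numpy as np
from pymc3.backends import base


class ListTrace(base.BaseTrace):
    """Toy backend keeping each draw as a dict in a Python list."""

    def setup(self, draws, chain):
        # Called once per chain before sampling starts.
        self.chain = chain
        self._draws = []

    def record(self, point):
        # `point` maps variable names to the values of the current draw.
        self._draws.append({name: np.asarray(val)
                            for name, val in point.items()})

    def close(self):
        # Nothing to finalize for an in-memory list.
        pass

    def get_values(self, varname, burn=0, thin=1):
        return np.array([d[varname] for d in self._draws[burn::thin]])

    def _slice(self, idx):
        # Return a new trace containing only the sliced draws.
        sliced = ListTrace(self.name, model=self.model, vars=self.vars)
        sliced.chain = self.chain
        sliced._draws = self._draws[idx]
        return sliced

    def point(self, idx):
        return self._draws[idx]

    def __len__(self):
        return len(self._draws)

Passing an instance to sampling, e.g. pm.sample(1000, trace=ListTrace('toy')), would then route the draws through this class, mirroring the Text example above.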

ndarray

NumPy array trace backend

Store sampling values in memory as a NumPy array.

class pymc3.backends.ndarray.NDArray(name=None, model=None, vars=None)

NDArray trace object

Parameters:

name : str

Name of backend. This has no meaning for the NDArray backend.

model : Model

If None, the model is taken from the with context.

vars : list of variables

Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.

get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters:

varname : str

burn : int

thin : int

Returns:

A NumPy array

point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point, sampler_stats=None)

Record results of a sampling iteration.

Parameters:

point : dict

Values mapped to variable names

setup(draws, chain, sampler_vars=None)

Perform chain-specific setup.

Parameters:

draws : int

Expected number of draws

chain : int

Chain number

sampler_vars : list of dicts

Names and dtypes of the variables that are exported by the samplers.

sqlite

SQLite trace backend

Store and retrieve sampling values in SQLite database file.

Database format

For each variable, a table is created with the following format:

recid (INT), draw (INT), chain (INT), v0 (FLOAT), v1 (FLOAT), v2 (FLOAT) ...

The variable column names are extended to reflect additional dimensions. For example, a variable with the shape (2, 2) would be stored as

recid (INT), draw (INT), chain (INT), v0_0 (FLOAT), v0_1 (FLOAT), v1_0 (FLOAT) ...

The recid column is autoincremented each time a new row is added to the table. The chain column denotes the chain index and starts at 0.

class pymc3.backends.sqlite.SQLite(name, model=None, vars=None)

SQLite trace object

Parameters:

name : str

Name of database file

model : Model

If None, the model is taken from the with context.

vars : list of variables

Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.

get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters:

varname : str

burn : int

thin : int

Returns:

A NumPy array

point(idx)

Return a dictionary of point values at idx for the current chain, with variable names as keys.

record(point)

Record results of a sampling iteration.

Parameters:

point : dict

Values mapped to variable names

setup(draws, chain)

Perform chain-specific setup.

Parameters:

draws : int

Expected number of draws

chain : int

Chain number

pymc3.backends.sqlite.load(name, model=None)

Load SQLite database.

Parameters:

name : str

Path to SQLite database file

model : Model

If None, the model is taken from the with context.

Returns:

A MultiTrace instance

text

Text file trace backend

Store sampling values as CSV files.

File format

Sampling values for each chain are saved in a separate file (under a directory specified by the name argument). The rows correspond to sampling iterations. The column names consist of variable names and index labels. For example, the heading

x,y__0_0,y__0_1,y__1_0,y__1_1,y__2_0,y__2_1

represents two variables, x and y, where x is a scalar and y has a shape of (3, 2).
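Because these are plain CSV files, they can also be inspected directly; assuming the directory 'test' from the earlier example and one file per chain named chain-0.csv, chain-1.csv, and so on (the file-name pattern is an assumption here), something like the following works:

>>> import pandas as pd
>>> df = pd.read_csv('test/chain-0.csv')   # raw values for chain 0
>>> df['x'].head()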

class pymc3.backends.text.Text(name, model=None, vars=None)

Text trace object

Parameters:

name : str

Name of directory to store text files

model : Model

If None, the model is taken from the with context.

vars : list of variables

Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.

get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters:

varname : str

burn : int

thin : int

Returns:

A NumPy array

point(idx)

Return a dictionary of point values at idx for the current chain, with variable names as keys.

record(point)

Record results of a sampling iteration.

Parameters:

point : dict

Values mapped to variable names

setup(draws, chain)

Perform chain-specific setup.

Parameters:

draws : int

Expected number of draws

chain : int

Chain number

pymc3.backends.text.dump(name, trace, chains=None)

Store values from NDArray trace as CSV files.

Parameters:

name : str

Name of directory to store CSV files in

trace : MultiTrace of NDArray traces

Result of MCMC run with default NDArray backend

chains : list

Chains to dump. If None, all chains are dumped.

pymc3.backends.text.load(name, model=None)

Load Text database.

Parameters:

name : str

Name of directory with files (one per chain)

model : Model

If None, the model is taken from the with context.

Returns:

A MultiTrace instance

tracetab

Functions for converting traces into a table-like format

pymc3.backends.tracetab.trace_to_dataframe(trace, chains=None, varnames=None, hide_transformed_vars=True)

Convert trace to Pandas DataFrame.

Parameters:

trace : NDArray trace

chains : int or list of ints

Chains to include. If None, all chains are used. A single chain value can also be given.

varnames : list of variable names

Variables to be included in the DataFrame. If None, all variables are included.

hide_transformed_vars : boolean

If True, transformed variables will not be included in the resulting DataFrame.
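For example, given a MultiTrace from an earlier run containing a variable named x (the trace and variable names here are illustrative), the conversion is a one-liner:

>>> from pymc3.backends.tracetab import trace_to_dataframe
>>> df = trace_to_dataframe(trace, varnames=['x'])
>>> df.describe()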

GLM

GP

ExpQuad(input_dim, lengthscales[, active_dims]) The exponentiated quadratic kernel.
RatQuad(input_dim, lengthscales, alpha[, ...]) The rational quadratic kernel.
Matern32(input_dim, lengthscales[, active_dims]) The Matern kernel with nu = 3/2.
Matern52(input_dim, lengthscales[, active_dims]) The Matern kernel with nu = 5/2.
Exponential(input_dim, lengthscales[, ...]) The Exponential kernel.
Cosine(input_dim, lengthscales[, active_dims]) The cosine kernel.
Linear(input_dim, c[, active_dims]) The linear kernel.
Polynomial(input_dim, c, d, offset[, ...]) The polynomial covariance function.
WarpedInput(input_dim, cov_func, warp_func) Warp the inputs of any covariance function using an arbitrary function defined using Theano.
class pymc3.gp.cov.ExpQuad(input_dim, lengthscales, active_dims=None)

The exponentiated quadratic kernel. Also referred to as the squared exponential or radial basis function (RBF) kernel.

\[k(x, x') = \mathrm{exp}\left[ -\frac{(x - x')^2}{2 \ell^2} \right]\]
class pymc3.gp.cov.RatQuad(input_dim, lengthscales, alpha, active_dims=None)

The rational quadratic kernel.

\[k(x, x') = \left(1 + \frac{(x - x')^2}{2\alpha\ell^2} \right)^{-\alpha}\]
class pymc3.gp.cov.Exponential(input_dim, lengthscales, active_dims=None)

The Exponential kernel.

\[k(x, x') = \mathrm{exp}\left[ -\frac{||x - x'||}{2\ell^2} \right]\]
class pymc3.gp.cov.Matern52(input_dim, lengthscales, active_dims=None)

The Matern kernel with nu = 5/2.

\[k(x, x') = \left(1 + \frac{\sqrt{5(x - x')^2}}{\ell} + \frac{5(x-x')^2}{3\ell^2}\right) \mathrm{exp}\left[ - \frac{\sqrt{5(x - x')^2}}{\ell} \right]\]
class pymc3.gp.cov.Matern32(input_dim, lengthscales, active_dims=None)

The Matern kernel with nu = 3/2.

\[k(x, x') = \left(1 + \frac{\sqrt{3(x - x')^2}}{\ell}\right)\mathrm{exp}\left[ - \frac{\sqrt{3(x - x')^2}}{\ell} \right]\]
class pymc3.gp.cov.Linear(input_dim, c, active_dims=None)

The linear kernel.

\[k(x, x') = (x - c)(x' - c)\]
class pymc3.gp.cov.Polynomial(input_dim, c, d, offset, active_dims=None)

The polynomial covariance function.

\[k(x, x') = [(x - c)(x' - c) + \mathrm{offset}]^{d}\]
class pymc3.gp.cov.Cosine(input_dim, lengthscales, active_dims=None)

The cosine kernel.

\[k(x, x') = \mathrm{cos}\left( \frac{||x - x'||}{ \ell^2} \right)\]
class pymc3.gp.cov.WarpedInput(input_dim, cov_func, warp_func, args=None, active_dims=None)

Warp the inputs of any covariance function using an arbitrary function defined using Theano.

\[k_{\mathrm{warped}}(x, x') = k(w(x), w(x'))\]
Parameters:

cov_func : Covariance

warp_func : function

Theano function of X and additional optional arguments.

args : optional, tuple or list of scalars or PyMC3 variables

Additional inputs (besides X or Z) to warp_func.
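For orientation, here is a small sketch of building and evaluating covariance matrices (it assumes covariance objects can be called on an (n, input_dim) array and return a Theano tensor, and the lengthscale and warp values are purely illustrative):

import numpy as np
import theano.tensor as tt
import pymc3 as pm

X = np.linspace(0, 1, 20)[:, None]              # 20 inputs, input_dim == 1

cov = pm.gp.cov.ExpQuad(1, lengthscales=0.2)
K = cov(X).eval()                               # 20 x 20 covariance matrix

# Warp the inputs with an arbitrary Theano function before applying the kernel.
def warp_func(x, a):
    return tt.log(x + a)

warped_cov = pm.gp.cov.WarpedInput(1, cov_func=cov, warp_func=warp_func,
                                   args=(1.0,))
K_warped = warped_cov(X).eval()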

Math

This submodule contains various mathematical functions. Most of them are imported directly from theano.tensor (see there for more details). Doing any kind of math with PyMC3 random variables, or defining custom likelihoods or priors, requires you to use these Theano expressions rather than NumPy or Python code.
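For instance, a transformed quantity inside a model should be written with these functions rather than with NumPy (a minimal, illustrative sketch):

import numpy as np
import pymc3 as pm

data = np.random.binomial(1, 0.3, size=50)      # toy 0/1 observations

with pm.Model() as model:
    x = pm.Normal('x', mu=0, sd=1)
    # invlogit is a Theano expression from pymc3.math, usable on random variables.
    p = pm.Deterministic('p', pm.math.invlogit(x))
    pm.Bernoulli('obs', p=p, observed=data)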

dot(a, b) Computes the dot product of two variables.
constant(x[, name, ndim, dtype])
flatten(x[, outdim]) Reshapes the variable x by keeping the first outdim-1 dimension size(s) of x the same, and making the last dimension size of x equal to the multiplication of its remaining dimension size(s).
zeros_like(model[, dtype, opt]) equivalent of numpy.zeros_like
ones_like(model[, dtype, opt]) equivalent of numpy.ones_like
stack(\*tensors, \*\*kwargs) Stack tensors in sequence on given axis (default is 0).
concatenate(tensor_list[, axis]) Alias for `join`(axis, *tensor_list).
sum(input[, axis, dtype, keepdims, acc_dtype]) Computes the sum along the given axis(es) of a tensor input.
prod(input[, axis, dtype, keepdims, ...]) Computes the product along the given axis(es) of a tensor input.
lt a < b
gt a > b
le a <= b
ge a >= b
eq a == b
neq a != b
switch if cond then ift else iff
clip Clip x to be between min and max.
where if cond then ift else iff
and_ bitwise a & b
or_ bitwise a | b
abs_ |`a`|
exp e^`a`
log base e logarithm of a
cos cosine of a
sin sine of a
tan tangent of a
cosh hyperbolic cosine of a
sinh hyperbolic sine of a
tanh hyperbolic tangent of a
sqr square of a
sqrt square root of a
erf error function
erfinv inverse error function
dot(a, b) Computes the dot product of two variables.
maximum elemwise maximum. See max for the maximum in one tensor
minimum elemwise minimum. See min for the minimum in one tensor
sgn sign of a
ceil ceiling of a
floor floor of a
det Matrix determinant.
matrix_inverse Computes the inverse of a matrix \(A\).
extract_diag Return the diagonal of a matrix.
matrix_dot(\*args) Shorthand for product between several dots.
trace(X) Returns the sum of diagonal elements of matrix X.
sigmoid Generalizes a scalar op to tensors.
logsumexp(x[, axis])
invlogit(x[, eps])
logit(p)