Stats
Statistical utility functions for PyMC

pymc3.stats.autocorr(pymc3_obj, *args, **kwargs)
Sample autocorrelation at the specified lag. The autocorrelation is the correlation of x_i with x_{i+lag}.
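As a rough illustration of the definition above, the lag autocorrelation can be computed directly with NumPy. This is a minimal sketch under the stated definition, not necessarily how pymc3.stats.autocorr is implemented; the helper name lag_autocorr is hypothetical.

```python
import numpy as np

def lag_autocorr(x, lag=1):
    """Correlation between x_i and x_{i+lag} (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    # Pair each sample with the one `lag` steps later and correlate the pairs.
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

# A chain that flips sign at every step is perfectly anti-correlated at lag 1
# and perfectly correlated at lag 2.
chain = np.array([1.0, -1.0] * 50)
r1 = lag_autocorr(chain, lag=1)
r2 = lag_autocorr(chain, lag=2)
```

For a well-mixing MCMC chain the value should fall toward zero as the lag grows; values that stay high suggest strongly dependent samples.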

pymc3.stats.autocov(pymc3_obj, *args, **kwargs)
Sample autocovariance at the specified lag. The autocovariance is a 2x2 matrix with the variances of x[:-lag] and x[lag:] on the diagonal and the autocovariance on the off-diagonal.
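The 2x2 structure described above is what numpy.cov returns when it is given the series and its lagged copy as two variables. The sketch below assumes NumPy's default normalization (ddof=1), which may differ from the library's choice; lag_autocov is a hypothetical name.

```python
import numpy as np

def lag_autocov(x, lag=1):
    """2x2 covariance matrix of the series and its lagged copy (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    # np.cov treats each argument as one variable, so the result is 2x2:
    # variances on the diagonal, the lagged covariance on the off-diagonal.
    return np.cov(x[:-lag], x[lag:])

chain = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
cov = lag_autocov(chain, lag=1)
```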

pymc3.stats.dic(trace, model=None)
Calculate the deviance information criterion (DIC) of the samples in trace from model. For the underlying theory, see the paper by some of the leading authorities on model selection: dx.doi.org/10.1111/1467-9868.00353

pymc3.stats.bpic(trace, model=None)
Calculate the Bayesian predictive information criterion (BPIC) of the samples in trace from model. For the underlying theory, see the paper by some of the leading authorities on model selection: dx.doi.org/10.1111/1467-9868.00353
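To make the two deviance-based criteria concrete, here is a small NumPy sketch using the commonly cited forms DIC = mean deviance + p_D and BPIC = mean deviance + 2 p_D, where p_D is the mean deviance minus the deviance at the posterior mean. These formulas, the toy Normal model, and the synthetic "posterior draws" are all assumptions made for illustration; the page itself does not state how dic and bpic compute their results.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=50)  # synthetic observations

# Stand-in for posterior draws of mu under y ~ Normal(mu, 1).
# This is NOT a real MCMC trace, just draws with a plausible spread.
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=2000)

def deviance(mu):
    """-2 * log-likelihood of y under Normal(mu, 1)."""
    log_lik = -0.5 * np.log(2.0 * np.pi) - 0.5 * (y - mu) ** 2
    return -2.0 * log_lik.sum()

mean_deviance = np.mean([deviance(mu) for mu in mu_draws])
deviance_at_mean = deviance(mu_draws.mean())

p_d = mean_deviance - deviance_at_mean            # effective number of parameters
dic = 2.0 * mean_deviance - deviance_at_mean      # mean_deviance + p_d
bpic = 3.0 * mean_deviance - 2.0 * deviance_at_mean  # mean_deviance + 2 * p_d
```

With one free parameter, p_d should come out close to 1, and BPIC exceeds DIC by exactly p_d, which reflects its heavier complexity penalty.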

pymc3.stats.waic(trace, model=None, pointwise=False)
Calculate the widely applicable information criterion (WAIC), its standard error, and the effective number of parameters of the samples in trace from model. For the underlying theory, see the paper by some of the leading authorities on model selection: dx.doi.org/10.1111/1467-9868.00353
Parameters:  trace (result of MCMC run) –
 model (PyMC Model) – Optional model. Default None, taken from context.
 pointwise (bool) – if True the pointwise predictive accuracy will be returned. Default False
Returns:  namedtuple with the following elements
 waic (widely applicable information criterion)
 waic_se (standard error of waic)
 p_waic (effective number of parameters)
 waic_i (an array of the pointwise predictive accuracy, only if pointwise is True)
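The four returned quantities can be illustrated from a matrix of pointwise log-likelihoods. The sketch below uses the usual textbook WAIC on the deviance scale (log pointwise predictive density minus the posterior variance of the log-likelihood); it assumes an input array of shape (draws, observations) and is not taken from the library's source. The function name waic_from_log_lik is hypothetical.

```python
import numpy as np

def waic_from_log_lik(log_lik):
    """WAIC from pointwise log-likelihoods of shape (n_draws, n_obs)."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws, n_obs = log_lik.shape
    # Log pointwise predictive density: log of the posterior-mean likelihood.
    lppd_i = np.logaddexp.reduce(log_lik, axis=0) - np.log(n_draws)
    # Penalty: posterior variance of the log-likelihood for each observation.
    p_waic_i = np.var(log_lik, axis=0)
    waic_i = -2.0 * (lppd_i - p_waic_i)
    waic = waic_i.sum()
    waic_se = np.sqrt(n_obs * np.var(waic_i))
    return waic, waic_se, p_waic_i.sum(), waic_i

# Four identical draws: no posterior variation, so the penalty is zero.
demo = np.tile([-1.0, -2.0, -3.0], (4, 1))
waic, waic_se, p_waic, waic_i = waic_from_log_lik(demo)
```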

pymc3.stats.loo(trace, model=None, pointwise=False)
Calculate leave-one-out (LOO) cross-validation for out-of-sample predictive model fit, following Vehtari et al. (2015). Cross-validation is computed using Pareto-smoothed importance sampling (PSIS).
Parameters:  trace (result of MCMC run) –
 model (PyMC Model) – Optional model. Default None, taken from context.
 pointwise (bool) – if True the pointwise predictive accuracy will be returned. Default False
Returns:  namedtuple with the following elements
 loo (approximated leave-one-out cross-validation)
 loo_se (standard error of loo)
 p_loo (effective number of parameters)
 loo_i (an array of the pointwise predictive accuracy, only if pointwise is True)
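The idea behind importance-sampling LOO can be sketched without the Pareto smoothing step. The example below uses raw importance ratios 1 / p(y_i | theta_s), which is a deliberately simplified stand-in for PSIS and can be unstable on real traces; it is meant only to show how the returned quantities relate. The function name and the (draws, observations) input layout are assumptions.

```python
import numpy as np

def is_loo_from_log_lik(log_lik):
    """Plain importance-sampling LOO (no Pareto smoothing), deviance scale."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws, n_obs = log_lik.shape
    # With raw weights 1 / p(y_i | theta_s), the LOO predictive density is the
    # harmonic mean of the pointwise likelihoods over the draws.
    elpd_i = -(np.logaddexp.reduce(-log_lik, axis=0) - np.log(n_draws))
    loo_i = -2.0 * elpd_i
    loo = loo_i.sum()
    loo_se = np.sqrt(n_obs * np.var(loo_i))
    # Effective number of parameters: in-sample lppd minus the LOO estimate.
    lppd = (np.logaddexp.reduce(log_lik, axis=0) - np.log(n_draws)).sum()
    p_loo = lppd - elpd_i.sum()
    return loo, loo_se, p_loo, loo_i

# Two draws, two observations, given as likelihood values for readability.
demo = np.log(np.array([[0.5, 0.2],
                        [0.25, 0.4]]))
loo, loo_se, p_loo, loo_i = is_loo_from_log_lik(demo)
```

PSIS replaces the largest raw weights with values from a fitted generalized Pareto distribution, which is what makes the estimate usable when a few draws dominate.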

pymc3.stats.hpd(pymc3_obj, *args, **kwargs)
Calculate the highest posterior density (HPD) interval of an array for a given alpha. The HPD is the minimum-width Bayesian credible interval (BCI).
Arguments:  x : Numpy array
An array containing MCMC samples
 alpha : float
Desired probability of type I error (defaults to 0.05)
 transform : callable
Function to transform data (defaults to identity)
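The "minimum width" property can be shown with a short NumPy sketch: sort the samples, slide a window that covers a fraction 1 - alpha of them, and keep the narrowest window. The helper name, the rounding of the window size, and the omission of the transform argument are simplifying assumptions; the library may handle edge cases differently.

```python
import numpy as np

def min_width_interval(x, alpha=0.05):
    """Narrowest interval containing about (1 - alpha) of the samples."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    # Number of sorted samples each candidate interval must contain.
    k = max(int(np.floor((1.0 - alpha) * n)), 1)
    # Width of every window of k consecutive sorted samples.
    widths = xs[k - 1:] - xs[:n - k + 1]
    start = int(np.argmin(widths))
    return xs[start], xs[start + k - 1]

# Half of the mass is tightly clustered near zero, the rest is spread out.
samples = np.array([0, 1, 2, 3, 4, 50, 60, 70, 80, 90], dtype=float)
lower, upper = min_width_interval(samples, alpha=0.5)
```

Unlike an equal-tailed interval, this interval follows the densest region of the samples, so for skewed posteriors the two can differ noticeably.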

pymc3.stats.quantiles(pymc3_obj, *args, **kwargs)
Return a dictionary of the requested quantiles from an array.
Arguments:  x : Numpy array
An array containing MCMC samples
 qlist : tuple or list
A list of desired quantiles (defaults to (2.5, 25, 50, 75, 97.5))
 transform : callable
Function to transform data (defaults to identity)
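A dictionary keyed by the requested percentages can be built with numpy.percentile, as in the sketch below. NumPy interpolates linearly between order statistics by default, which may not match the rule the library uses; quantile_dict is a hypothetical name and the transform argument is left out.

```python
import numpy as np

def quantile_dict(x, qlist=(2.5, 25, 50, 75, 97.5)):
    """Map each requested percentage to the corresponding sample quantile."""
    x = np.asarray(x, dtype=float)
    values = np.percentile(x, list(qlist))
    return {q: float(v) for q, v in zip(qlist, values)}

samples = np.arange(101, dtype=float)  # 0, 1, ..., 100
q = quantile_dict(samples)
```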

pymc3.stats.mc_error(pymc3_obj, *args, **kwargs)
Calculate the simulation standard error, accounting for non-independent samples. The trace is divided into batches, and the standard deviation of the batch means is calculated.
Arguments:  x : Numpy array
An array containing MCMC samples
 batches : integer
Number of batches
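The batch-means procedure described above can be sketched as follows. Dividing the standard deviation of the batch means by the square root of the number of batches is the usual way to turn it into a standard error of the overall mean, but that scaling is an assumption here, since the text only mentions the standard deviation. The helper name is hypothetical.

```python
import numpy as np

def batch_means_error(x, batches=5):
    """Batch-means estimate of the Monte Carlo standard error of the mean."""
    x = np.asarray(x, dtype=float)
    batch_len = len(x) // batches
    if batch_len < 1:
        raise ValueError("need at least one sample per batch")
    # Drop trailing samples that do not fill a whole batch, then group.
    trimmed = x[:batch_len * batches].reshape(batches, batch_len)
    batch_means = trimmed.mean(axis=1)
    # Spread of the batch means, scaled to a standard error (assumed scaling).
    return batch_means.std() / np.sqrt(batches)

# Two batches with means 0 and 2: their standard deviation is 1.
trace = np.repeat([0.0, 2.0], 50)
err = batch_means_error(trace, batches=2)
flat_err = batch_means_error(np.ones(100), batches=5)
```

Because consecutive MCMC samples are correlated, batch means vary more than they would for independent draws, so this estimate is larger than the naive sd / sqrt(n).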

pymc3.stats.summary(trace, varnames=None, transform=<function <lambda>>, alpha=0.05, start=0, batches=None, roundto=3, include_transformed=False, to_file=None)
Generate a pretty-printed summary of a trace.
Parameters:  trace (Trace object) – Trace containing MCMC samples
 varnames (list of strings) – List of variables to summarize. Defaults to None, which results in all variables summarized.
 transform (callable) – Function to transform data (defaults to identity)
 alpha (float) – The alpha level for generating posterior intervals. Defaults to 0.05.
 start (int) – The starting index from which to summarize (each) chain. Defaults to zero.
 batches (None or int) – Batch size for calculating standard deviation for non-independent samples. Defaults to the smaller of 100 or the number of samples.
 roundto (int) – The number of digits to round posterior statistics.
 include_transformed (bool) – Flag for summarizing automatically transformed variables in addition to original variables (defaults to False).
 to_file (None or string) – File to write results to. If not given, print to stdout.

pymc3.stats.df_summary(trace, varnames=None, stat_funcs=None, extend=False, include_transformed=False, alpha=0.05, batches=None)
Create a data frame with summary statistics.
Parameters:  trace (MultiTrace instance) –
 varnames (list) – Names of variables to include in summary
 stat_funcs (None or list) –
A list of functions used to calculate statistics. By default, the mean, standard deviation, simulation standard error, and highest posterior density intervals are included.
The functions will be given one argument, the samples for a variable as a two-dimensional array, where the first axis corresponds to sampling iterations and the second axis represents the flattened variable (e.g., x__0, x__1, ...). Each function should return either:
 A pandas.Series instance containing the result of calculating the statistic along the first axis. The name attribute will be taken as the name of the statistic.
 A pandas.DataFrame where each column contains the result of calculating the statistic along the first axis. The column names will be taken as the names of the statistics.
 extend (boolean) – If True, use the statistics returned by stat_funcs in addition to, rather than in place of, the default statistics. This is only meaningful when stat_funcs is not None.
 include_transformed (bool) – Flag for reporting automatically transformed variables in addition to original variables (defaults to False).
 alpha (float) – The alpha level for generating posterior intervals. Defaults to 0.05. This is only meaningful when stat_funcs is None.
 batches (None or int) – Batch size for calculating standard deviation for non-independent samples. Defaults to the smaller of 100 or the number of samples. This is only meaningful when stat_funcs is None.
See also
summary()
 Generate a pretty-printed summary of a trace.
Returns: pandas.DataFrame with summary statistics for each variable
Examples
>>> import pymc3 as pm
>>> trace.mu.shape
(1000, 2)
>>> pm.df_summary(trace, ['mu'])
           mean        sd  mc_error     hpd_5    hpd_95
mu__0  0.106897  0.066473  0.001818 -0.020612  0.231626
mu__1 -0.046597  0.067513  0.002048 -0.174753  0.081924
Other statistics can be calculated by passing a list of functions.
>>> import numpy as np
>>> import pandas as pd
>>> def trace_sd(x):
...     return pd.Series(np.std(x, 0), name='sd')
...
>>> def trace_quantiles(x):
...     return pd.DataFrame(pm.quantiles(x, [5, 50, 95]))
...
>>> pm.df_summary(trace, ['mu'], stat_funcs=[trace_sd, trace_quantiles])
             sd         5        50        95
mu__0  0.066473  0.000312  0.105039  0.214242
mu__1  0.067513 -0.159097 -0.045637  0.062912

pymc3.stats.compare(traces, models, ic='WAIC')
Compare models based on the widely applicable information criterion (WAIC) or leave-one-out (LOO) cross-validation. For the underlying theory, see the paper by some of the leading authorities on model selection: dx.doi.org/10.1111/1467-9868.00353
Parameters:  traces (list of PyMC3 traces) –
 models (list of PyMC3 models) – in the same order as traces.
 ic (string) – Information Criterion (WAIC or LOO) used to compare models. Default WAIC.
Returns:  A DataFrame, ordered from lowest to highest IC. The index reflects the order in which the models are passed to this function. The columns are:
 IC – Information criterion (WAIC or LOO). Smaller IC indicates higher out-of-sample predictive fit (“better” model). Default WAIC.
 pIC – Estimated effective number of parameters.
 dIC – Relative difference between each IC (WAIC or LOO) and the lowest IC (WAIC or LOO). It’s always 0 for the top-ranked model.
 weight – Akaike weights for each model. This can be loosely interpreted as the probability of each model (among the compared models) given the data. Note that these weights are based on point estimates of the IC (uncertainty is ignored).
 SE – Standard error of the IC estimate. For a “large enough” sample size this is an estimate of the uncertainty in the computation of the IC.
 dSE – Standard error of the difference in IC between each model and the top-ranked model. It’s always 0 for the top-ranked model.
 warning – A value of 1 indicates that the computation of the IC may not be reliable; see http://arxiv.org/abs/1507.04544 for details.
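The dIC and weight columns can be illustrated with the standard Akaike-weight formula, in which each weight is proportional to exp(-dIC / 2). The sketch below assumes that formula and uses made-up IC values; the helper name akaike_weights is hypothetical and the library's exact computation is not shown on this page.

```python
import numpy as np

def akaike_weights(ic_values):
    """Differences to the best (lowest) IC and the resulting Akaike weights."""
    ic = np.asarray(ic_values, dtype=float)
    d_ic = ic - ic.min()            # 0 for the top-ranked model
    rel = np.exp(-0.5 * d_ic)       # relative likelihood of each model
    return d_ic, rel / rel.sum()    # weights sum to 1

# Illustrative IC values for three models (not real results).
d_ic, weights = akaike_weights([100.0, 102.0, 110.0])
```

A difference of 2 IC units lowers a model's weight by a factor of exp(-1) relative to the best model, which is why small differences in IC should not be over-interpreted without looking at SE and dSE.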