API Reference

This reference provides detailed documentation for all modules, classes, and methods in the current release of PyMC experimental.

pymc_experimental.bart

class pymc_experimental.bart.BART(name, X, Y, m=50, alpha=0.25, k=2, split_prior=None, **kwargs)[source]

Bayesian Additive Regression Tree distribution.

Distribution representing a sum over trees

Parameters
  • X (array-like) – The covariate matrix.

  • Y (array-like) – The response vector.

  • m (int) – Number of trees

  • alpha (float) – Controls the prior probability over the depth of the trees. Although it can take values in the interval (0, 1), values in the interval (0, 0.5] are recommended. Defaults to 0.25.

  • k (float) – Scale parameter for the values of the leaf nodes. Defaults to 2. Recommended to be between 1 and 3.

  • split_prior (array-like) – Each element of split_prior should be in the [0, 1] interval and the elements should sum to 1. Otherwise they will be normalized. Defaults to None, i.e. all covariates have the same prior probability to be selected.
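
The handling of split_prior described above can be sketched in a few lines. This is a minimal standard-library illustration, not the library's code: the helper name normalize_split_prior and its num_covariates argument are hypothetical, and the error handling is an assumption.

```python
def normalize_split_prior(split_prior, num_covariates):
    """Return covariate selection probabilities that sum to 1.

    Hypothetical helper: it mirrors the documented behaviour of
    ``split_prior`` but is not part of pymc_experimental.
    """
    if split_prior is None:
        # No prior given: every covariate is equally likely to be selected.
        return [1.0 / num_covariates] * num_covariates
    if len(split_prior) != num_covariates:
        raise ValueError("split_prior needs one entry per covariate")
    if any(p < 0 for p in split_prior):
        raise ValueError("split_prior entries must be non-negative")
    total = sum(split_prior)
    if total <= 0:
        raise ValueError("split_prior must contain a positive entry")
    # Rescale so that the entries sum to 1.
    return [p / total for p in split_prior]


uniform = normalize_split_prior(None, 4)
weighted = normalize_split_prior([2, 1, 1], 3)
```

Here weighted gives the first covariate twice the selection probability of each of the other two.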

classmethod dist(*params, **kwargs)[source]

Creates a RandomVariable corresponding to the cls distribution.

Parameters
  • dist_params (array-like) – The inputs to the RandomVariable Op.

  • shape (int, tuple, Variable, optional) –

    A tuple of sizes for each dimension of the new RV.

    An Ellipsis (...) may be inserted in the last position as shorthand for all the dimensions that the RV would get if no shape/size/dims were passed at all.

  • size (int, tuple, Variable, optional) – For creating the RV like in Aesara/NumPy.

Returns

rv – The created RV.

Return type

RandomVariable

logp(*inputs)[source]

Calculate log probability.

Parameters

x (numeric, TensorVariable) – Value for which log-probability is calculated.

Returns

Return type

TensorVariable

class pymc_experimental.bart.PGBART(*args, **kwargs)[source]

Particle Gibbs BART sampling step.

Parameters
  • vars (list) – List of value variables for the sampler.

  • num_particles (int) – Number of particles for the conditional SMC sampler. Defaults to 40.

  • max_stages (int) – Maximum number of iterations of the conditional SMC sampler. Defaults to 100.

  • batch (int or tuple) – Number of trees fitted per step. Defaults to “auto”, which is 10% of the m trees, both during and after tuning. If a tuple is passed, the first element is the batch size during tuning and the second is the batch size after tuning.

  • model (PyMC Model) – Optional model for sampling step. Defaults to None (taken from context).
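
The three accepted forms of batch can be resolved into a (tuning, post-tuning) pair as sketched below. The function name resolve_batch is hypothetical, and flooring the "auto" size at one tree is an assumption made here so that small values of m still fit at least one tree per step.

```python
def resolve_batch(batch, m):
    """Turn ``batch`` into a (during_tuning, after_tuning) pair.

    Hypothetical helper illustrating the documented options.
    """
    if batch == "auto":
        # 10% of the m trees; the floor of 1 is an assumption of this sketch.
        size = max(1, int(m * 0.1))
        return (size, size)
    if isinstance(batch, (tuple, list)):
        # First element: batch size during tuning; second: after tuning.
        tuning, post_tuning = batch
        return (int(tuning), int(post_tuning))
    # A single integer applies to both phases.
    return (int(batch), int(batch))


auto_batch = resolve_batch("auto", 50)
tuple_batch = resolve_batch((10, 3), 50)
int_batch = resolve_batch(7, 50)
```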

static competence(var, has_grad)[source]

PGBART is only suitable for BART distributions.

init_particles(tree_id: int) → numpy.ndarray[source]

Initialize particles.

normalize(particles)[source]

Use the logsumexp trick to compute W_t and a softmax to compute normalized_weights.
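
A minimal standard-library sketch of this kind of normalization follows. It assumes that W_t is the log of the average particle weight, which the text above does not state, and the function name normalize_log_weights is hypothetical.

```python
import math


def normalize_log_weights(log_weights):
    """Return (log of the mean weight, normalized weights).

    Subtracting the maximum before exponentiating (the logsumexp trick)
    avoids underflow when all log-weights are very negative.
    """
    n = len(log_weights)
    max_lw = max(log_weights)
    shifted = [math.exp(lw - max_lw) for lw in log_weights]
    total = sum(shifted)
    # logsumexp(log_weights) = max_lw + log(total); subtract log(n) for the mean.
    log_mean_weight = max_lw + math.log(total) - math.log(n)
    # Softmax: each shifted weight divided by the sum of shifted weights.
    normalized = [s / total for s in shifted]
    return log_mean_weight, normalized


w_t, weights = normalize_log_weights([-1000.0, -1000.0 + math.log(3.0)])
```

Exponentiating the log-weights in this example directly would underflow to zero, whereas the shifted computation recovers the 1:3 ratio between the two particles.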

update_weight(particle, old=False)[source]

Update the weight of a particle.

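Since the prior is used as the proposal, the weight update depends only on the ratio of the new and old likelihoods. In log space this ratio becomes a difference, so the log-weight is updated additively with the difference between the new and old log-likelihoods.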
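
The additive update can be illustrated with a small stand-alone sketch. The ParticleState class and update_log_weight function are hypothetical, and the meaning given to the old flag (resetting the weight of a particle carried over from a previous iteration) is an assumption, not something the text above confirms.

```python
from dataclasses import dataclass


@dataclass
class ParticleState:
    """Hypothetical container for the quantities the update needs."""
    log_weight: float = 0.0
    old_loglik: float = 0.0


def update_log_weight(state, new_loglik, old=False):
    """Update a particle's log-weight from its new log-likelihood."""
    if old:
        # Assumed behaviour: a carried-over particle restarts from its
        # current log-likelihood.
        state.log_weight = new_loglik
    else:
        # Adding the log-likelihood difference multiplies the weight
        # by the likelihood ratio.
        state.log_weight += new_loglik - state.old_loglik
    # Remember the log-likelihood for the next update.
    state.old_loglik = new_loglik
    return state


particle = ParticleState(log_weight=-2.0, old_loglik=-5.0)
update_log_weight(particle, new_loglik=-4.0)
```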

pymc_experimental.bart.plot_dependence(idata, X=None, Y=None, kind='pdp', xs_interval='linear', xs_values=None, var_idx=None, var_discrete=None, samples=50, instances=10, random_seed=None, sharey=True, rug=True, smooth=True, indices=None, grid='long', color='C0', color_mean='C0', alpha=0.1, figsize=None, smooth_kwargs=None, ax=None)[source]

Partial dependence or individual conditional expectation plot.

Parameters
  • idata (InferenceData) – InferenceData containing a collection of BART_trees in the sample_stats group.

  • X (array-like) – The covariate matrix.

  • Y (array-like) – The response vector.

  • kind (str) – Whether to plot a partial dependence plot (“pdp”) or an individual conditional expectation plot (“ice”). Defaults to “pdp”.

  • xs_interval (str) – Method used to compute the values of X at which the predicted function is evaluated. “linear”: evenly spaced values in the range of X. “quantiles”: the evaluation is done at the specified quantiles of X. “insample”: the evaluation is done at the observed values of X. For discrete variables these options are omitted.

  • xs_values (int or list) – Values of X used to evaluate the predicted function. If xs_interval="linear", the number of points in the evenly spaced grid. If xs_interval="quantiles", the quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. Ignored when xs_interval="insample".

  • var_idx (list) – List of the indices of the covariates for which to compute the pdp or ice.

  • var_discrete (list) – List of the indices of the covariates treated as discrete.

  • samples (int) – Number of posterior samples used in the predictions. Defaults to 50.

  • instances (int) – Number of instances of X to plot. Only relevant if kind="ice". Defaults to 10.

  • random_seed (int) – Seed used to sample from the posterior. Defaults to None.

  • sharey (bool) – Controls sharing of properties among y-axes. Defaults to True.

  • rug (bool) – Whether to include a rugplot. Defaults to True.

  • smooth (bool) – If True the result will be smoothed by first computing a linear interpolation of the data over a regular grid and then applying the Savitzky-Golay filter to the interpolated data. Defaults to True.

  • grid (str or tuple) – How to arrange the subplots. Defaults to “long”, one subplot below the other. Other options are “wide”, one subplot next to the other, or a tuple indicating the number of rows and columns.

  • color (matplotlib valid color) – Color used to plot the pdp or ice. Defaults to “C0”.

  • color_mean (matplotlib valid color) – Color used to plot the mean pdp or ice. Defaults to “C0”.

  • alpha (float) – Transparency level, should be in the interval [0, 1]. Defaults to 0.1.

  • figsize (tuple) – Figure size. If None it will be defined automatically.

  • smooth_kwargs (dict) – Additional keywords modifying the Savitzky-Golay filter. See scipy.signal.savgol_filter() for details.

  • ax (axes) – Matplotlib axes.

Returns

axes

Return type

matplotlib axes
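
The three xs_interval modes of plot_dependence can be sketched for a single covariate as follows. This is a standard-library illustration with hypothetical helper names (evaluation_points, _quantile); the linear-interpolation quantile rule is an assumption, and with NumPy the same grids could be built with numpy.linspace and numpy.quantile.

```python
import math


def _quantile(x_sorted, q):
    """Quantile of sorted data using linear interpolation (assumed rule)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("quantiles must be between 0 and 1 inclusive")
    pos = q * (len(x_sorted) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(x_sorted) - 1)
    frac = pos - lower
    return x_sorted[lower] + frac * (x_sorted[upper] - x_sorted[lower])


def evaluation_points(x, xs_interval, xs_values=None):
    """Values of one covariate at which predictions would be evaluated."""
    x_sorted = sorted(x)
    if xs_interval == "linear":
        # xs_values is the number of points in the evenly spaced grid.
        lo, hi = x_sorted[0], x_sorted[-1]
        if xs_values == 1:
            return [lo]
        step = (hi - lo) / (xs_values - 1)
        return [lo + i * step for i in range(xs_values)]
    if xs_interval == "quantiles":
        # xs_values is a single quantile or a sequence of quantiles.
        qs = xs_values if isinstance(xs_values, (list, tuple)) else [xs_values]
        return [_quantile(x_sorted, q) for q in qs]
    if xs_interval == "insample":
        # xs_values is ignored; use the observed values as they are.
        return list(x)
    raise ValueError(f"unknown xs_interval: {xs_interval!r}")


data = [0.0, 1.0, 2.0, 3.0, 4.0]
linear_grid = evaluation_points(data, "linear", 5)
quantile_grid = evaluation_points(data, "quantiles", [0.0, 0.5, 1.0])
insample_grid = evaluation_points([3.0, 1.0, 2.0], "insample")
```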

pymc_experimental.bart.plot_variable_importance(idata, labels=None, figsize=None, samples=100, random_seed=None)[source]

Estimates variable importance from the BART-posterior.

Parameters
  • idata (InferenceData) – InferenceData containing a collection of BART_trees in the sample_stats group.

  • labels (list) – List of the names of the covariates.

  • figsize (tuple) – Figure size. If None it will be defined automatically.

  • samples (int) – Number of predictions used to compute correlation for subsets of variables. Defaults to 100.

  • random_seed (int) – Seed used to sample from the posterior. Defaults to None.

Returns

  • idxs – Indexes of the covariates, ordered from higher to lower relative importance.

  • axes – Matplotlib axes.

pymc_experimental.bart.predict(idata, rng, X_new=None, size=None, excluded=None)[source]

Generate samples from the BART-posterior.

Parameters
  • idata (InferenceData) – InferenceData containing a collection of BART_trees in the sample_stats group.

  • rng (NumPy random generator) –

  • X_new (array-like) – A new covariate matrix. Use it to obtain out-of-sample predictions.

  • size (int or tuple) – Number of samples.

  • excluded (list) – Indexes of the variables to exclude when computing predictions.