API Reference
This reference provides detailed documentation for all modules, classes, and methods in the current release of PyMC experimental.
pymc_experimental.bart
- class pymc_experimental.bart.BART(name, X, Y, m=50, alpha=0.25, k=2, split_prior=None, **kwargs)
Bayesian Additive Regression Tree distribution.
Distribution representing a sum over trees.
- Parameters
X (array-like) – The covariate matrix.
Y (array-like) – The response vector.
m (int) – Number of trees. Defaults to 50.
alpha (float) – Controls the prior probability over the depth of the trees. Although it can take values in the interval (0, 1), values in the interval (0, 0.5] are recommended. Defaults to 0.25.
k (float) – Scale parameter for the values of the leaf nodes. Defaults to 2. Recommended to be between 1 and 3.
split_prior (array-like) – Prior probability of each covariate being selected for a split. Each element of split_prior should be in the [0, 1] interval and the elements should sum to 1; otherwise they will be normalized. Defaults to None, i.e. all covariates have the same prior probability of being selected.
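The split_prior rule above (weights that do not sum to 1 are normalized, and None gives every covariate the same probability) can be sketched in plain Python. The helper normalize_split_prior is hypothetical and is not part of pymc_experimental; it only illustrates the arithmetic, assuming that normalization means dividing each weight by the total.

```python
def normalize_split_prior(split_prior, n_covariates):
    """Return per-covariate selection probabilities that sum to 1.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if split_prior is None:
        # No prior given: every covariate is equally likely to be selected.
        return [1.0 / n_covariates] * n_covariates

    weights = [float(w) for w in split_prior]
    if len(weights) != n_covariates:
        raise ValueError("split_prior needs one entry per covariate")
    if any(w < 0 for w in weights):
        raise ValueError("split_prior entries must be non-negative")

    total = sum(weights)
    if total == 0:
        raise ValueError("split_prior must contain at least one positive entry")
    # Assumed normalization: divide each weight by the sum of all weights.
    return [w / total for w in weights]


uniform = normalize_split_prior(None, 4)
weighted = normalize_split_prior([2, 1, 1], 3)
print(uniform)
print(weighted)
```

Here the unnormalized weights [2, 1, 1] express that the first covariate should be twice as likely to be chosen as either of the other two.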
- classmethod dist(*params, **kwargs)
Creates a RandomVariable corresponding to the cls distribution.
- Parameters
dist_params (array-like) – The inputs to the RandomVariable Op.
shape (int, tuple, Variable, optional) –
A tuple of sizes for each dimension of the new RV.
An Ellipsis (…) may be inserted in the last position as shorthand for all the dimensions that the RV would get if no shape/size/dims were passed at all.
size (int, tuple, Variable, optional) – For creating the RV like in Aesara/NumPy.
- Returns
rv – The created RV.
- Return type
RandomVariable
- class pymc_experimental.bart.PGBART(*args, **kwargs)
Particle Gibbs BART sampling step.
- Parameters
vars (list) – List of value variables for the sampler.
num_particles (int) – Number of particles for the conditional SMC sampler. Defaults to 40.
max_stages (int) – Maximum number of iterations of the conditional SMC sampler. Defaults to 100.
batch (int or tuple) – Number of trees fitted per step. Defaults to “auto”, which is 10% of the m trees, both during and after tuning. If a tuple is passed, the first element is the batch size during tuning and the second is the batch size after tuning.
model (PyMC Model) – Optional model for sampling step. Defaults to None (taken from context).
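The batch rule described above can be sketched as a small function that always returns a pair (batch size during tuning, batch size after tuning). The name resolve_batch is hypothetical and not part of pymc_experimental. Two details are assumptions: 10% is rounded down with a minimum of one tree, and a single int is applied to both phases.

```python
def resolve_batch(batch, m):
    """Return (batch_during_tuning, batch_after_tuning) for m trees.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if batch == "auto":
        # 10% of the m trees in both phases. Rounding down and keeping at
        # least one tree per step are assumptions of this sketch.
        size = max(1, m // 10)
        return (size, size)
    if isinstance(batch, tuple):
        # First element: during tuning. Second element: after tuning.
        tuning, after = batch
        return (int(tuning), int(after))
    # Assumption: a single int means the same batch size in both phases.
    return (int(batch), int(batch))


print(resolve_batch("auto", 50))
print(resolve_batch((10, 2), 50))
print(resolve_batch(7, 50))
```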
- pymc_experimental.bart.plot_dependence(idata, X=None, Y=None, kind='pdp', xs_interval='linear', xs_values=None, var_idx=None, var_discrete=None, samples=50, instances=10, random_seed=None, sharey=True, rug=True, smooth=True, indices=None, grid='long', color='C0', color_mean='C0', alpha=0.1, figsize=None, smooth_kwargs=None, ax=None)
Partial dependence or individual conditional expectation plot.
- Parameters
idata (InferenceData) – InferenceData containing a collection of BART_trees in the sample_stats group.
X (array-like) – The covariate matrix.
Y (array-like) – The response vector.
kind (str) – Whether to plot a partial dependence plot (“pdp”) or an individual conditional expectation plot (“ice”). Defaults to “pdp”.
xs_interval (str) – Method used to compute the values of X at which the predicted function is evaluated. “linear”: evenly spaced values in the range of X. “quantiles”: the evaluation is done at the specified quantiles of X. “insample”: the evaluation is done at the values of X. For discrete variables these options are omitted.
xs_values (int or list) – Values of X used to evaluate the predicted function. If xs_interval="linear", the number of points in the evenly spaced grid. If xs_interval="quantiles", the quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. Ignored when xs_interval="insample".
var_idx (list) – List of the indices of the covariates for which to compute the pdp or ice.
var_discrete (list) – List of the indices of the covariates treated as discrete.
samples (int) – Number of posterior samples used in the predictions. Defaults to 50.
instances (int) – Number of instances of X to plot. Only relevant for kind="ice" plots.
random_seed (int) – Seed used to sample from the posterior. Defaults to None.
sharey (bool) – Controls sharing of properties among y-axes. Defaults to True.
rug (bool) – Whether to include a rugplot. Defaults to True.
smooth (bool) – If True the result will be smoothed by first computing a linear interpolation of the data over a regular grid and then applying the Savitzky-Golay filter to the interpolated data. Defaults to True.
grid (str or tuple) – How to arrange the subplots. Defaults to “long”, one subplot below the other. Other options are “wide”, one subplot next to each other, or a tuple indicating the number of rows and columns.
color (matplotlib valid color) – Color used to plot the pdp or ice. Defaults to “C0”.
color_mean (matplotlib valid color) – Color used to plot the mean pdp or ice. Defaults to “C0”.
alpha (float) – Transparency level, should be in the interval [0, 1].
figsize (tuple) – Figure size. If None it will be defined automatically.
smooth_kwargs (dict) – Additional keywords modifying the Savitzky-Golay filter. See scipy.signal.savgol_filter() for details.
ax (axes) – Matplotlib axes.
- Returns
axes
- Return type
matplotlib axes
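The interaction between xs_interval, xs_values, kind and instances can be illustrated with a plain-Python sketch of the textbook definitions of partial dependence and individual conditional expectation. This is not the code behind plot_dependence: the helper names, the toy data and the stand-in predict function are all hypothetical, and a single deterministic function replaces the average over BART posterior samples.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def evaluation_grid(column, xs_interval, xs_values=None):
    """Values of one covariate at which the predicted function is evaluated."""
    if xs_interval == "insample":
        return list(column)  # xs_values is ignored
    if xs_interval == "linear":
        low, high = min(column), max(column)
        step = (high - low) / (xs_values - 1)  # xs_values = number of points (>= 2)
        return [low + i * step for i in range(xs_values)]
    if xs_interval == "quantiles":
        qs = xs_values if isinstance(xs_values, (list, tuple)) else [xs_values]
        ordered = sorted(column)
        return [quantile(ordered, q) for q in qs]
    raise ValueError(f"unknown xs_interval: {xs_interval!r}")


def ice_curves(predict, X, var_idx, grid, instances):
    """One curve per instance: vary covariate var_idx, keep the others fixed."""
    curves = []
    for row in X[:instances]:
        curve = []
        for value in grid:
            modified = list(row)
            modified[var_idx] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, X, var_idx, grid):
    """Average of the ICE curves over every row of X."""
    curves = ice_curves(predict, X, var_idx, grid, len(X))
    return [sum(c[j] for c in curves) / len(curves) for j in range(len(grid))]


def predict(row):
    """Stand-in for a fitted model (hypothetical)."""
    return 2 * row[0] + row[1]


# Toy covariate matrix: four instances, two covariates.
X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [4.0, 6.0]]
column = [row[0] for row in X]

linear_grid = evaluation_grid(column, "linear", 5)
quantile_grid = evaluation_grid(column, "quantiles", [0.0, 0.5, 1.0])
pdp = partial_dependence(predict, X, 0, linear_grid)
ice = ice_curves(predict, X, 0, linear_grid, instances=2)
print(linear_grid, quantile_grid, pdp, ice, sep="\n")
```

The partial dependence curve is the pointwise mean of the individual curves, which is why a "pdp" plot shows one summary line per covariate while an "ice" plot shows one line per selected instance.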
- pymc_experimental.bart.plot_variable_importance(idata, labels=None, figsize=None, samples=100, random_seed=None)
Estimates variable importance from the BART-posterior.
- Parameters
idata (InferenceData) – InferenceData containing a collection of BART_trees in the sample_stats group.
labels (list) – List of the names of the covariates.
figsize (tuple) – Figure size. If None it will be defined automatically.
samples (int) – Number of predictions used to compute correlation for subsets of variables. Defaults to 100.
random_seed (int) – Seed used to sample from the posterior. Defaults to None.
- Returns
idxs – Indexes of the covariates, ordered from higher to lower relative importance.
axes – Matplotlib axes.
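Because idxs lists covariate positions from most to least important, it can be used to reorder a list of covariate names. The sketch below uses made-up importance values and labels purely to show the ordering convention; it does not call plot_variable_importance and does not reproduce how the library estimates importance.

```python
# Hypothetical relative importances, one per covariate (illustrative values).
importances = [0.1, 0.6, 0.3]
labels = ["age", "dose", "weight"]

# Indexes of the covariates from higher to lower relative importance.
idxs = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)

# Reorder the covariate names to match.
ordered_labels = [labels[i] for i in idxs]
print(idxs)
print(ordered_labels)
```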
- pymc_experimental.bart.predict(idata, rng, X_new=None, size=None, excluded=None)
Generate samples from the BART-posterior.
- Parameters
idata (InferenceData) – InferenceData containing a collection of BART_trees in the sample_stats group.
rng (NumPy random generator) – Random number generator used to sample from the posterior.
X_new (array-like) – A new covariate matrix. Use it to obtain out-of-sample predictions.
size (int or tuple) – Number of samples.
excluded (list) – Indexes of the variables to exclude when computing predictions.