Inference¶
Sampling¶

pymc3.sampling.sample(draws, step=None, init='ADVI', n_init=200000, start=None, trace=None, chain=0, njobs=1, tune=None, progressbar=True, model=None, random_seed=-1, live_plot=False, **kwargs)¶
Draw samples from the posterior using the given step methods.
Multiple step methods are supported via compound step methods.
Parameters:  draws (int) – The number of samples to draw.
 step (function or iterable of functions) – A step function or collection of functions. If there are variables without a step method, step methods for those variables will be assigned automatically.
 init (str {'ADVI', 'ADVI_MAP', 'MAP', 'NUTS', None}) –
Initialization method to use. Only works for autoassigned step methods.
 ADVI: Run ADVI to estimate starting points and diagonal covariance matrix. If njobs > 1 it will sample starting points from the estimated posterior, otherwise it will use the estimated posterior mean.
 ADVI_MAP: Initialize ADVI with MAP and use MAP as starting point.
 MAP: Use the MAP as starting point.
 NUTS: Run NUTS to estimate starting points and covariance matrix. If njobs > 1 it will sample starting points from the estimated posterior, otherwise it will use the estimated posterior mean.
 None: Do not initialize.
 n_init (int) – Number of iterations of the initializer. If 'ADVI', the number of ADVI iterations; if 'NUTS', the number of draws.
 start (dict) – Starting point in parameter space (or partial point). Defaults to trace.point(-1) if a trace is provided and model.test_point if not (defaults to empty dict).
 trace (backend, list, or MultiTrace) – This should be a backend instance, a list of variables to track, or a MultiTrace object with past values. If a MultiTrace object is given, it must contain samples for the chain number chain. If None or a list of variables, the NDArray backend is used. Passing either “text” or “sqlite” is taken as a shortcut to set up the corresponding backend (with “mcmc” used as the base name).
 chain (int) – Chain number used to store sample in backend. If njobs is greater than one, chain numbers will start here.
 njobs (int) – Number of parallel jobs to start. If None, set to the number of CPUs in the system minus 2.
 tune (int) – Number of iterations to tune, if applicable (defaults to None)
 progressbar (bool) – Whether or not to display a progress bar in the command line. The bar shows the percentage of completion, the sampling speed in samples per second (SPS), and the estimated remaining time until completion (“expected time of arrival”; ETA).
 model (Model (optional if in with context)) –
 random_seed (int or list of ints) – A list is accepted if njobs is greater than one.
 live_plot (bool) – Flag for live plotting the trace while sampling
Returns: trace (pymc3.backends.base.MultiTrace) – A MultiTrace object that contains the samples.
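Example
A minimal usage sketch, assuming a toy model; the variable names and data below are illustrative, not part of the API.
import pymc3 as pm

with pm.Model() as model:
    # Toy model: a normal mean with a handful of observations.
    mu = pm.Normal('mu', mu=0, sd=10)
    y = pm.Normal('y', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])

    # Step methods are auto-assigned; init='ADVI' is the default initializer.
    trace = pm.sample(draws=2000, tune=500, njobs=2)

print(trace['mu'].mean())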

pymc3.sampling.iter_sample(draws, step, start=None, trace=None, chain=0, tune=None, model=None, random_seed=-1)¶ Generator that returns a trace on each iteration using the given step method. Multiple step methods are supported via compound step methods.
Parameters:  draws (int) – The number of samples to draw
 step (function) – Step function
 start (dict) – Starting point in parameter space (or partial point). Defaults to trace.point(-1) if a trace is provided and model.test_point if not (defaults to empty dict)
 trace (backend, list, or MultiTrace) – This should be a backend instance, a list of variables to track, or a MultiTrace object with past values. If a MultiTrace object is given, it must contain samples for the chain number chain. If None or a list of variables, the NDArray backend is used.
 chain (int) – Chain number used to store sample in backend. If njobs is greater than one, chain numbers will start here.
 tune (int) – Number of iterations to tune, if applicable (defaults to None)
 model (Model (optional if in with context)) –
 random_seed (int or list of ints) – A list is accepted if njobs is greater than one.
Example
for trace in iter_sample(500, step):
    ...
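A slightly fuller hedged sketch (the model and step method are illustrative assumptions); the generator yields the growing trace after every draw, which allows monitoring while sampling is still running.
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    step = pm.Metropolis()
    # Inspect the partial trace as it grows.
    for trace in pm.sampling.iter_sample(200, step):
        if len(trace) % 50 == 0:
            print(len(trace), trace['mu'][-1])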

pymc3.sampling.sample_ppc(trace, samples=None, model=None, vars=None, size=None, random_seed=None, progressbar=True)¶
Generate posterior predictive samples from a model given a trace.
Parameters:  trace (backend, list, or MultiTrace) – Trace generated from MCMC sampling
 samples (int) – Number of posterior predictive samples to generate. Defaults to the length of trace
 model (Model (optional if in with context)) – Model used to generate trace
 vars (iterable) – Variables for which to compute the posterior predictive samples. Defaults to model.observed_RVs.
 size (int) – The number of random draws from the distribution specified by the parameters in each sample of the trace.
Returns: samples (dict) – Dictionary with the variables as keys. The values are the corresponding posterior predictive samples.
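Example
A hedged sketch of posterior predictive sampling; the model and data are illustrative assumptions.
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    y = pm.Normal('y', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])
    trace = pm.sample(1000)

    # One array of predictive draws per observed RV, keyed by variable name.
    ppc = pm.sample_ppc(trace, samples=500)

print(ppc['y'].shape)   # (500, 3) for the three observations above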

pymc3.sampling.init_nuts(init='ADVI', njobs=1, n_init=500000, model=None, random_seed=-1, **kwargs)¶
Initialize and sample from posterior of a continuous model.
This is a convenience function. NUTS convergence and sampling speed is extremely dependent on the choice of mass/scaling matrix. In our experience, using ADVI to estimate a diagonal covariance matrix and using this as the scaling matrix produces robust results over a wide class of continuous models.
Parameters:  init (str {'ADVI', 'ADVI_MAP', 'MAP', 'NUTS'}) –
Initialization method to use.
 ADVI: Run ADVI to estimate posterior mean and diagonal covariance matrix.
 ADVI_MAP: Initialize ADVI with MAP and use MAP as starting point.
 MAP: Use the MAP as starting point.
 NUTS: Run NUTS and estimate posterior mean and covariance matrix.
 njobs (int) – Number of parallel jobs to start.
 n_init (int) – Number of iterations of the initializer. If 'ADVI', the number of ADVI iterations; if 'NUTS', the number of draws.
 model (Model (optional if in with context)) –
 **kwargs (keyword arguments) – Extra keyword arguments are forwarded to pymc3.NUTS.
Returns:  start (pymc3.model.Point) – Starting point for sampler
 nuts_sampler (pymc3.step_methods.NUTS) – Instantiated and initialized NUTS sampler object
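Example
A hedged sketch of calling init_nuts directly; normally pm.sample does this for you when no step method is given. The model is an illustrative assumption.
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    y = pm.Normal('y', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])

    # Estimate a starting point and scaling matrix, then hand them to sample().
    start, nuts_step = pm.init_nuts(init='ADVI', njobs=1, n_init=20000)
    trace = pm.sample(1000, step=nuts_step, start=start)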
Step-methods¶
NUTS¶

class pymc3.step_methods.hmc.nuts.NUTS(vars=None, Emax=1000, target_accept=0.8, gamma=0.05, k=0.75, t0=10, adapt_step_size=True, max_treedepth=10, **kwargs)¶
A sampler for continuous variables based on Hamiltonian mechanics.
NUTS automatically tunes the step size and the number of steps per sample. A detailed description can be found in [1], “Algorithm 6: Efficient No-U-Turn Sampler with Dual Averaging”.
NUTS provides a number of statistics that can be accessed with trace.get_sampler_stats:
 mean_tree_accept: The mean acceptance probability for the tree that generated this sample. The mean of these values across all samples except the burn-in should be approximately target_accept (the default for this is 0.8).
 diverging: Whether the trajectory for this sample diverged. If there are any divergences after burn-in, this indicates that the results might not be reliable. Reparametrization can often help, but you can also try to increase target_accept to something like 0.9 or 0.95.
 energy: The energy at the point in phase space where the sample was accepted. This can be used to identify posteriors with problematically long tails. See below for an example.
 energy_change: The difference in energy between the start and the end of the trajectory. For a perfect integrator this would always be zero.
 max_energy_change: The maximum difference in energy along the whole trajectory.
 depth: The depth of the tree that was used to generate this sample
 tree_size: The number of leaves of the sampling tree when the sample was accepted. This is usually a bit less than 2 ** depth. If the tree size is large, the sampler is using a lot of leapfrog steps to find the next sample. This can for example happen if there are strong correlations in the posterior, if the posterior has long tails, if there are regions of high curvature (“funnels”), or if the variance estimates in the mass matrix are inaccurate. Reparametrization of the model or estimating the posterior variances from past samples might help.
 tune: This is True if step size adaptation was turned on when this sample was generated.
 step_size: The step size used for this sample.
 step_size_bar: The current best known step size. After the tuning samples, the step size is set to this value. This should converge during tuning.
References
[1] Hoffman, Matthew D., & Gelman, Andrew. (2011). The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.
Parameters:  vars (list of Theano variables, default all continuous vars) –
 Emax (float, default 1000) – Maximum energy change allowed during leapfrog steps. Larger deviations will abort the integration.
 target_accept (float (0,1), default .8) – Try to find a step size such that the average acceptance probability across the trajectories is close to target_accept. Higher values for target_accept lead to smaller step sizes.
 step_scale (float, default=0.25) – Size of steps to take, automatically scaled down by 1/n**(1/4). If step size adaptation is switched off, the resulting step size is used. If adaptation is enabled, it is used as initial guess.
 gamma (float, default .05) –
 k (float (.5, 1), default .75) – scaling of speed of adaptation
 t0 (int, default 10) – slows initial adaptation
 adapt_step_size (bool, default=True) – Whether step size adaptation should be enabled. If this is disabled, k, t0, gamma and target_accept are ignored.
 integrator (str, default "leapfrog") – The integrator to use for the trajectories. One of “leapfrog”, “two-stage” or “three-stage”. The latter two can increase sampling speed for some high-dimensional problems.
 step_scale – Initial size of steps to take, automatically scaled down by 1/n**(1/4).
 scaling (array_like, ndim = {1,2}) – The inverse mass, or precision matrix. One dimensional arrays are interpreted as diagonal matrices. If is_cov is set to True, this will be interpreted as the mass or covariance matrix.
 is_cov (bool, default=False) – Treat the scaling as mass or covariance matrix.
 potential (Potential, optional) – An object that represents the Hamiltonian with velocity, energy, and random methods. It can be specified instead of the scaling matrix.
 model (pymc3.Model) – The model
 kwargs (passed to BaseHMC) –
Notes
The step size adaptation stops when self.tune is set to False. This is usually achieved by setting the tune parameter of pm.sample to the desired number of tuning steps.
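A hedged sketch of reading the sampler statistics listed above from a finished trace; the model, tuning settings, and data are illustrative assumptions.
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    y = pm.Normal('y', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])

    step = pm.NUTS(target_accept=0.9)
    trace = pm.sample(1000, step=step, tune=500)

accept = trace.get_sampler_stats('mean_tree_accept')
divergent = trace.get_sampler_stats('diverging')
print(accept.mean())     # should be close to target_accept after burn-in
print(divergent.sum())   # divergences after burn-in deserve a closer look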
Metropolis¶

class pymc3.step_methods.metropolis.Metropolis(vars=None, S=None, proposal_dist=None, scaling=1.0, tune=True, tune_interval=100, model=None, mode=None, **kwargs)¶
Metropolis-Hastings sampling step
Parameters:  vars (list) – List of variables for sampler
 S (standard deviation or covariance matrix) – Some measure of variance to parameterize proposal distribution
 proposal_dist (function) – Function that returns zero-mean deviates when parameterized with S (and n). Defaults to normal.
 scaling (scalar or array) – Initial scale factor for proposal. Defaults to 1.
 tune (bool) – Flag for tuning. Defaults to True.
 tune_interval (int) – The frequency of tuning. Defaults to 100 iterations.
 model (PyMC Model) – Optional model for sampling step. Defaults to None (taken from context).
 mode (string or Mode instance.) – compilation mode passed to Theano functions
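A hedged sketch of assigning a Metropolis step to one variable; the model and data are illustrative assumptions.
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    y = pm.Normal('y', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])

    # Random-walk Metropolis for mu, proposal scale retuned every 100 iterations.
    step = pm.Metropolis([mu], tune_interval=100)
    trace = pm.sample(5000, step=step)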

class pymc3.step_methods.metropolis.BinaryMetropolis(vars, scaling=1.0, tune=True, tune_interval=100, model=None)¶
Metropolis-Hastings optimized for binary variables
Parameters:  vars (list) – List of variables for sampler
 scaling (scalar or array) – Initial scale factor for proposal. Defaults to 1.
 tune (bool) – Flag for tuning. Defaults to True.
 tune_interval (int) – The frequency of tuning. Defaults to 100 iterations.
 model (PyMC Model) – Optional model for sampling step. Defaults to None (taken from context).

static competence(var)¶
BinaryMetropolis is only suitable for binary (bool) and Categorical variables with k=1.

class pymc3.step_methods.metropolis.BinaryGibbsMetropolis(vars, order='random', model=None)¶
A Metropolis-within-Gibbs step method optimized for binary variables

static competence(var)¶
BinaryGibbsMetropolis is only suitable for Bernoulli and Categorical variables with k=2.

class pymc3.step_methods.metropolis.CategoricalGibbsMetropolis(vars, proposal='uniform', order='random', model=None)¶
A Metropolis-within-Gibbs step method optimized for categorical variables. This step method works for Bernoulli variables as well, but it is not optimized for them, like BinaryGibbsMetropolis is. The step method supports two types of proposals: a uniform proposal and a proportional proposal, which was introduced by Liu in his 1996 technical report “Metropolized Gibbs Sampler: An Improvement”.

static competence(var)¶
CategoricalGibbsMetropolis is only suitable for Bernoulli and Categorical variables.

Slice¶

class pymc3.step_methods.slicer.Slice(vars=None, w=1.0, tune=True, model=None, **kwargs)¶
Univariate slice sampler step method
Parameters:  vars (list) – List of variables for sampler.
 w (float) – Initial width of slice (Defaults to 1).
 tune (bool) – Flag for tuning (Defaults to True).
 model (PyMC Model) – Optional model for sampling step. Defaults to None (taken from context).
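A hedged sketch of using the slice sampler for a continuous variable; the model and data are illustrative assumptions.
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    y = pm.Normal('y', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])

    step = pm.Slice([mu], w=2.0)      # w is the initial slice width
    trace = pm.sample(2000, step=step)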
Hamiltonian Monte Carlo¶

class pymc3.step_methods.hmc.hmc.HamiltonianMC(vars=None, path_length=2.0, step_rand=<function unif>, **kwargs)¶
Parameters:  vars (list of theano variables) –
 path_length (float, default=2) – total length to travel
 step_rand (function: float -> float, default=unif) – A function that takes the step size and returns a new one; used to randomize the step size at each iteration.
 step_scale (float, default=0.25) – Initial size of steps to take, automatically scaled down by 1/n**(1/4).
 scaling (array_like, ndim = {1,2}) – The inverse mass, or precision matrix. One dimensional arrays are interpreted as diagonal matrices. If is_cov is set to True, this will be interpreted as the mass or covariance matrix.
 is_cov (bool, default=False) – Treat the scaling as mass or covariance matrix.
 potential (Potential, optional) – An object that represents the Hamiltonian with methods velocity, energy, and random methods. It can be specified instead of the scaling matrix.
 model (pymc3.Model) – The model
 **kwargs (passed to BaseHMC) –
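A hedged sketch of using HamiltonianMC instead of NUTS; the model is an illustrative assumption, and path_length and the mass matrix usually need manual tuning.
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    y = pm.Normal('y', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])

    step = pm.HamiltonianMC(path_length=4.0)
    trace = pm.sample(1000, step=step)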
Variational¶
ADVI¶

pymc3.variational.advi.advi(vars=None, start=None, model=None, n=5000, accurate_elbo=False, optimizer=None, learning_rate=0.001, epsilon=0.1, mode=None, tol_obj=0.01, eval_elbo=100, random_seed=None)¶
Perform automatic differentiation variational inference (ADVI).
This function implements mean-field ADVI, where the variational posterior distribution is assumed to be a spherical Gaussian without correlation of parameters and is fit to the true posterior distribution. The means and standard deviations of the variational posterior are referred to as variational parameters.
The return value of this function is an ADVIFit object, which holds the variational parameters. If you want to draw samples from the variational posterior, you need to pass the ADVIFit object to pymc3.variational.sample_vp().
The variational parameters are defined on the transformed space, which is required to do ADVI on an unconstrained parameter space as described in [KTR+2016]. The parameters in the ADVIFit object are in the transformed space, while traces returned by sample_vp() are in the original space as obtained by MCMC sampling methods in PyMC3.
The variational parameters are optimized with the given optimizer, which is a function that returns a dictionary of parameter updates as provided to a Theano function. If no optimizer is provided, optimization is performed with a modified version of Adagrad, where only the last n_window gradient vectors are used to control the learning rate and older gradient vectors are ignored. n_window denotes the size of the time window and is fixed to 10.
Parameters:  vars (object) – Random variables.
 start (Dict or None) – Initial values of parameters (variational means).
 model (Model) – Probabilistic model.
 n (int) – Number of iterations updating parameters.
 accurate_elbo (bool) – If true, 100 MC samples are used for accurate calculation of ELBO.
 optimizer ((loss, tensor) -> dict or OrderedDict) – A function that returns parameter updates given loss and parameter tensor. If None (default), a default Adagrad optimizer is used with parameters learning_rate and epsilon below.
 learning_rate (float) – Base learning rate for Adagrad. This parameter is ignored when optimizer is given.
 epsilon (float) – Offset in denominator of the scale of learning rate in Adagrad. This parameter is ignored when optimizer is given.
 tol_obj (float) – Relative tolerance for testing convergence of ELBO.
 eval_elbo (int) – Window for checking convergence of ELBO. Convergence will be checked for every multiple of eval_elbo.
 random_seed (int or None) – Seed to initialize random state. None uses current seed.
 mode (string or Mode instance.) – Compilation mode passed to Theano functions
Returns:  ADVIFit – Named tuple, which includes ‘means’, ‘stds’, and ‘elbo_vals’.
 ‘means’ is the mean. ‘stds’ is the standard deviation.
 ‘elbo_vals’ is the trace of ELBO values during optimization.
References
[KTR+2016] Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., and Blei, D. M. (2016). Automatic Differentiation Variational Inference. arXiv preprint arXiv:1603.00788.
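A hedged sketch of fitting a model with mean-field ADVI; the model and data are illustrative assumptions, and the import path is an assumption based on the module name in the signature above.
import pymc3 as pm
from pymc3.variational import advi

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    y = pm.Normal('y', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])

    # Returns an ADVIFit named tuple with means, stds and elbo_vals.
    v_params = advi(n=10000)

print(v_params.means['mu'], v_params.stds['mu'])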

pymc3.variational.advi.sample_vp(vparams, draws=1000, model=None, local_RVs=None, random_seed=None, hide_transformed=True, progressbar=True)¶
Draw samples from variational posterior.
Parameters:  vparams (dict or pymc3.variational.ADVIFit) – Estimated variational parameters of the model.
 draws (int) – Number of random samples.
 model (pymc3.Model) – Probabilistic model.
 random_seed (int or None) – Seed of random number generator. None to use current seed.
 hide_transformed (bool) – If False, transformed variables are also sampled. Default is True.
Returns: trace (pymc3.backends.base.MultiTrace) – Samples drawn from the variational posterior.
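Continuing the hedged advi sketch above, a sketch of drawing posterior samples from the fitted variational parameters (model and v_params refer to that example):
from pymc3.variational import sample_vp

with model:
    trace = sample_vp(v_params, draws=1000)

print(trace['mu'].mean())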
ADVI minibatch¶

pymc3.variational.advi_minibatch.advi_minibatch(vars=None, start=None, model=None, n=5000, n_mcsamples=1, minibatch_RVs=None, minibatch_tensors=None, minibatches=None, global_RVs=None, local_RVs=None, observed_RVs=None, encoder_params=None, total_size=None, optimizer=None, learning_rate=0.001, epsilon=0.1, random_seed=None, mode=None)¶
Perform minibatch ADVI.
This function implements minibatch automatic differentiation variational inference (ADVI; Kucukelbir et al., 2015) with the mean-field approximation. Auto-encoding variational Bayes (AEVB; Kingma and Welling, 2014) is also supported.
For explanation, we classify random variables in probabilistic models into three types. Observed random variables \({\cal Y}=\{\mathbf{y}_{i}\}_{i=1}^{N}\) are \(N\) observations. Each \(\mathbf{y}_{i}\) can be a set of observed random variables, i.e., \(\mathbf{y}_{i}=\{\mathbf{y}_{i}^{k}\}_{k=1}^{V_{o}}\), where \(V_{o}\) is the number of the types of observed random variables in the model.
The next ones are global random variables \(\Theta=\{\theta^{k}\}_{k=1}^{V_{g}}\), which are used to calculate the probabilities for all observed samples.
The last ones are local random variables \({\cal Z}=\{\mathbf{z}_{i}\}_{i=1}^{N}\), where \(\mathbf{z}_{i}=\{\mathbf{z}_{i}^{k}\}_{k=1}^{V_{l}}\). These RVs are used only in AEVB.
The goal of ADVI is to approximate the posterior distribution \(p(\Theta,{\cal Z}\mid{\cal Y})\) by the variational posterior \(q(\Theta)\prod_{i=1}^{N}q(\mathbf{z}_{i})\). All of these terms are normal distributions (mean-field approximation).
\(q(\Theta)\) is parametrized with its means and standard deviations. These parameters are denoted as \(\gamma\). While \(\gamma\) is a constant, the parameters of \(q(\mathbf{z}_{i})\) are dependent on each observation. Therefore these parameters are denoted as \(\xi(\mathbf{y}_{i}; \nu)\), where \(\nu\) is the parameters of \(\xi(\cdot)\). For example, \(\xi(\cdot)\) can be a multilayer perceptron or convolutional neural network.
In addition to \(\xi(\cdot)\), we can also include deterministic mappings for the likelihood of observations. We denote the parameters of the deterministic mappings as \(\eta\). An example of such mappings is the deconvolutional neural network used in the convolutional VAE example in the PyMC3 notebook directory.
This function maximizes the evidence lower bound (ELBO) \({\cal L}(\gamma, \nu, \eta)\) defined as follows:
\[\begin{split}{\cal L}(\gamma,\nu,\eta) & = \mathbf{c}_{o}\mathbb{E}_{q(\Theta)}\left[ \sum_{i=1}^{N}\mathbb{E}_{q(\mathbf{z}_{i})}\left[ \log p(\mathbf{y}_{i}\mid\mathbf{z}_{i},\Theta,\eta) \right]\right] \\ & - \mathbf{c}_{g}KL\left[q(\Theta)\|p(\Theta)\right] - \mathbf{c}_{l}\sum_{i=1}^{N} KL\left[q(\mathbf{z}_{i})\|p(\mathbf{z}_{i})\right],\end{split}\]where \(KL[q(v)\|p(v)]\) is the Kullback-Leibler divergence
\[KL[q(v)\|p(v)] = \int q(v)\log\frac{q(v)}{p(v)}dv,\]\(\mathbf{c}_{o/g/l}\) are vectors for weighting each term of the ELBO. More precisely, we can write each of the terms in the ELBO as follows:
\[\begin{split}\mathbf{c}_{o}\log p(\mathbf{y}_{i}\mid\mathbf{z}_{i},\Theta,\eta) & = & \sum_{k=1}^{V_{o}}c_{o}^{k} \log p(\mathbf{y}_{i}^{k} \mid {\rm pa}(\mathbf{y}_{i}^{k},\Theta,\eta)) \\ \mathbf{c}_{g}KL\left[q(\Theta)\|p(\Theta)\right] & = & \sum_{k=1}^{V_{g}}c_{g}^{k}KL\left[ q(\theta^{k})\|p(\theta^{k}\mid{\rm pa}(\theta^{k}))\right] \\ \mathbf{c}_{l}KL\left[q(\mathbf{z}_{i})\|p(\mathbf{z}_{i})\right] & = & \sum_{k=1}^{V_{l}}c_{l}^{k}KL\left[ q(\mathbf{z}_{i}^{k}) \| p(\mathbf{z}_{i}^{k}\mid{\rm pa}(\mathbf{z}_{i}^{k}))\right],\end{split}\]where \({\rm pa}(v)\) denotes the set of parent variables of \(v\) in the directed acyclic graph of the model.
When using minibatches, \(c_{o}^{k}\) and \(c_{l}^{k}\) should be set to \(N/M\), where \(M\) is the number of observations in each minibatch. Another weighting scheme was proposed in (Blundell et al., 2015) for accelerating model fitting.
For working with ADVI, we need to give the probabilistic model (model), the three types of RVs (observed_RVs, global_RVs and local_RVs), the tensors to which minibatched samples are supplied (minibatches) and the parameters of the deterministic mappings \(\xi\) and \(\eta\) (encoder_params) as input arguments.
observed_RVs is an OrderedDict of the form {y_k: c_k}, where y_k is a random variable defined in the PyMC3 model. c_k is a scalar (\(c_{o}^{k}\)) and it can be a shared variable.
global_RVs is an OrderedDict of the form {t_k: c_k}, where t_k is a random variable defined in the PyMC3 model. c_k is a scalar (\(c_{g}^{k}\)) and it can be a shared variable.
local_RVs is an OrderedDict of the form {z_k: ((m_k, s_k), c_k)}, where z_k is a random variable defined in the PyMC3 model. c_k is a scalar (\(c_{l}^{k}\)) and it can be a shared variable. (m_k, s_k) is a pair of tensors of means and log standard deviations of the variational distribution; samples drawn from the variational distribution replace z_k. It should be noted that if z_k has a transformation that changes the dimension (e.g., StickBreakingTransform), the variational distribution must have the same dimension. For example, if z_k is distributed with a Dirichlet distribution with p choices, m_k and s_k have the shape (n_samples_in_minibatch, p - 1).
minibatch_tensors is a list of tensors (can be shared variables) to which minibatch samples are set during the optimization. These tensors are the observations (obs=) in observed_RVs.
minibatches is a generator of a list of numpy.ndarray. Each item of the list will be set to the tensors in minibatch_tensors.
encoder_params is a list of shared variables of the parameters \(\nu\) and \(\eta\). We do not need to include the variational parameters of the global variables, \(\gamma\), because these are automatically created and updated in this function.
The following is a list of example notebooks using advi_minibatch; a minimal usage sketch follows the list.
 docs/source/notebooks/GLMhierarchicaladviminibatch.ipynb
 docs/source/notebooks/bayesian_neural_network_advi.ipynb
 docs/source/notebooks/convolutional_vae_keras_advi.ipynb
 docs/source/notebooks/gaussianmixturemodeladvi.ipynb
 docs/source/notebooks/ldaadviaevb.ipynb
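A minimal hedged sketch of the minibatch_RVs interface described above; the model, data, batch size, and generator are illustrative assumptions, and the import path is an assumption based on the module name in the signature above.
import numpy as np
import theano
import pymc3 as pm
from pymc3.variational import advi_minibatch

data = np.random.randn(1000)
batch_size = 100
x_t = theano.shared(np.zeros(batch_size))   # minibatch tensor fed by the generator below

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    y = pm.Normal('y', mu=mu, sd=1, observed=x_t)

def minibatch_gen():
    # Yields one ndarray per entry of minibatch_tensors, forever.
    while True:
        idx = np.random.randint(0, len(data), batch_size)
        yield [data[idx]]

v_params = advi_minibatch(
    model=model, n=5000,
    minibatch_RVs=[y], minibatch_tensors=[x_t],
    minibatches=minibatch_gen(), total_size=len(data))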
Parameters:  vars (object) – List of random variables. If None, variational posteriors (normal distribution) are fit for all RVs in the given model.
 start (Dict or None) – Initial values of parameters (variational means).
 model (Model) – Probabilistic model.
 n (int) – Number of iterations updating parameters.
 n_mcsamples (int) – Number of Monte Carlo samples to approximate ELBO.
 minibatch_RVs (list of ObservedRVs) – Random variables in the model for which minibatch tensors are set. When this argument is given, both of arguments local_RVs and observed_RVs must be None.
 minibatch_tensors (list of (tensors or shared variables)) – Tensors used to create ObservedRVs in minibatch_RVs.
 minibatches (generator of list) – Generates a set of minibatches when calling next(). The length of the returned list must be the same as the number of random variables in minibatch_tensors.
 total_size (int) – Total size of training samples. This is used to appropriately scale the log likelihood terms corresponding to minibatches in ELBO.
 observed_RVs (Ordered dict) – Include a scaling constant for the corresponding RV. See the above description.
 global_RVs (Ordered dict or None) – Include a scaling constant for the corresponding RV. See the above description. If None, it is set to {v: 1 for v in grvs}, where grvs is list(set(vars) - set(list(local_RVs) + list(observed_RVs))).
 local_RVs (Ordered dict or None) – Include encoded variational parameters and a scaling constant for the corresponding RV. See the above description.
 encoder_params (list of theano shared variables) – Parameters of encoder.
 optimizer ((loss, list of shared variables) -> dict or OrderedDict) – A function that returns parameter updates given loss and shared variables of parameters. If None (default), a default Adagrad optimizer is used with parameters learning_rate and epsilon below.
 learning_rate (float) – Base learning rate for Adagrad. This parameter is ignored when optimizer is set.
 epsilon (float) – Offset in denominator of the scale of learning rate in Adagrad. This parameter is ignored when optimizer is set.
 random_seed (int) – Seed to initialize random state.
Returns: ADVIFit – Named tuple, which includes ‘means’, ‘stds’, and ‘elbo_vals’.
References
 Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. stat, 1050, 1.
 Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in Stan. In Advances in Neural Information Processing Systems (pp. 568-576).
 Blundell, C., Cornebise, J., Kavukcuoglu, K., & Wierstra, D. (2015). Weight Uncertainty in Neural Network. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 1613-1622).