Sampling Configuration Guide

This guide covers the configuration options for nutpie.sample and provides practical advice for tuning your sampler. We’ll start with basic usage and move to advanced topics like mass matrix adaptation.

Quick Start

For most models, the sampler defaults work well and need no tuning. Most sampling problems can't easily be solved by changing sampler options and instead require changes to the model. So in most cases, simply use:

trace = nutpie.sample(compiled_model)

Core Sampling Parameters

Drawing Samples

trace = nutpie.sample(
    model,
    draws=1000,          # Number of post-warmup draws per chain
    tune=500,            # Number of warmup draws for adaptation
    chains=6,            # Number of independent chains
    cores=None,          # Number of chains that are allowed to run simultaneously
    seed=12345           # Random seed for reproducibility
)

The number of draws affects both accuracy and computational cost:

  • Too few draws (< 500) may not capture the posterior well
  • Too many draws (> 10000) may waste computation time

If a model samples without divergences, but the effective sample sizes are too small to reach the desired Monte Carlo standard error for your estimates, you can increase the number of chains and/or draws.

If the effective sample size is much smaller than the number of draws, consider reparameterizing the model instead, for instance to remove posterior correlations.
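As a rough sketch of how to size a run: the Monte Carlo standard error of a posterior mean is approximately the posterior standard deviation divided by the square root of the effective sample size. The helper names below (required_ess, draws_per_chain) are hypothetical and not part of nutpie, and the numbers are made up. The ratio of effective samples per draw is something you would estimate from a pilot run.

```python
import math


def required_ess(posterior_sd, target_mcse):
    """Effective sample size needed so that sd / sqrt(ESS) <= target_mcse."""
    return (posterior_sd / target_mcse) ** 2


def draws_per_chain(ess_needed, ess_per_draw, chains):
    """Draws per chain, assuming each draw yields `ess_per_draw` effective samples."""
    return math.ceil(ess_needed / (ess_per_draw * chains))


# Hypothetical numbers: posterior sd of 3.0, desired error of 0.5,
# and a pilot run in which the ESS was about half the number of draws.
ess = required_ess(posterior_sd=3.0, target_mcse=0.5)
print(ess, draws_per_chain(ess, ess_per_draw=0.5, chains=4))
```

If the estimated ratio of effective samples per draw is very low, the required number of draws grows quickly, which is usually the point at which reparameterizing is the better investment.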

Sampler Diagnostics

You can enable more detailed diagnostics when troubleshooting:

trace = nutpie.sample(
    model,
    save_warmup=True,          # Keep warmup draws, default is True
    store_divergences=True,    # Track divergent transitions
    store_unconstrained=True,  # Store transformed parameters
    store_gradient=True,       # Store gradient information
    store_mass_matrix=True     # Track mass matrix adaptation
)

For each of the store_* arguments, additional arrays will be available in trace.sample_stats.

Non-blocking sampling

Settings for HMC and NUTS

trace = nutpie.sample(
    model,
    target_accept=0.8,     # Target acceptance rate
    maxdepth=10,           # Maximum tree depth
    max_energy_error=1000  # Error at which to count the trajectory as a divergent transition
)

The target_accept parameter implicitly controls the step size of the leapfrog steps in the HMC sampler. During tuning, the sampler will try to choose a step size such that the average acceptance rate is equal to target_accept. It has to be between 0 and 1.

The default is 0.8. Larger values will increase the computational cost, but might avoid divergences during sampling. In many diverging models, however, increasing target_accept will only make divergences less frequent, but not solve the underlying problem.

Lowering the maximum energy error to, for instance, 10 will often increase the number of divergences, and make it easier to diagnose their cause. Moreover, with a lower value for max_energy_error, divergences often get reported closer to the critical point(s) in parameter space that are actually causing them.
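To make the link between step size, energy error, and divergences concrete, here is a minimal sketch that is independent of nutpie. It runs a leapfrog integrator on a one-dimensional standard normal target and flags the trajectory as divergent once the energy error exceeds a threshold. The function names are hypothetical, and the real sampler applies this check inside NUTS tree building rather than over a fixed number of steps.

```python
def max_energy_error_along_trajectory(step_size, n_steps=50):
    """Integrate H(q, p) = q**2 / 2 + p**2 / 2 with leapfrog and
    return the largest absolute energy error seen along the way."""
    q, p = 1.0, 0.0
    initial_energy = 0.5 * q * q + 0.5 * p * p
    worst = 0.0
    for _ in range(n_steps):
        p -= 0.5 * step_size * q  # half step for momentum (gradient of q**2 / 2 is q)
        q += step_size * p        # full step for position
        p -= 0.5 * step_size * q  # second half step for momentum
        energy = 0.5 * q * q + 0.5 * p * p
        worst = max(worst, abs(energy - initial_energy))
    return worst


def is_divergent(step_size, max_energy_error=1000.0):
    """Flag the trajectory once its energy error exceeds the threshold."""
    return max_energy_error_along_trajectory(step_size) > max_energy_error


for step_size in (0.1, 1.0, 2.5):
    error = max_energy_error_along_trajectory(step_size)
    print(step_size, error, is_divergent(step_size))
```

For this target the integrator is stable only for step sizes below 2, so the largest step size produces an energy error that grows without bound, while the small step sizes keep the error small. A higher target_accept pushes the adapted step size toward the small, stable regime.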

Mass Matrix Adaptation

Nutpie offers several strategies for adapting the mass matrix, which determines how the sampler navigates the parameter space.

The adaptation strategy is controlled by the adaptation argument, which accepts one of four values:

"diag" (default)

The default strategy. Nutpie estimates a diagonal mass matrix from both draw variance and gradient variance. This is usually the most efficient choice.

trace = nutpie.sample(model)
# equivalent to:
trace = nutpie.sample(model, adaptation="diag")
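The following sketch illustrates why gradient information can help. This guide does not spell out the exact estimator nutpie uses, so the code assumes the form sqrt(Var[draws] / Var[gradients]) for each parameter and compares it with an estimate from draw variance alone on a one-dimensional normal target. Everything here is illustrative and does not call into nutpie.

```python
import math
import random
import statistics

random.seed(0)
sigma = 3.0  # true posterior standard deviation (made up for this example)

# Draws from N(0, sigma**2) and the gradient of the log density at each draw.
draws = [random.gauss(0.0, sigma) for _ in range(20)]
grads = [-x / sigma**2 for x in draws]

# Variance estimate from the draws alone: noisy with only 20 draws.
draw_only = statistics.variance(draws)

# Assumed combined estimator using both draw and gradient variance.
draw_and_grad = math.sqrt(statistics.variance(draws) / statistics.variance(grads))

print(draw_only, draw_and_grad)
```

For a normal target the gradients are an exact rescaling of the draws, so the combined estimate recovers sigma**2 (up to rounding) from any set of draws, whereas the draw-only estimate still carries sampling noise. Real posteriors are not exactly normal, so the advantage is less clear-cut in practice.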

"draw_diag"

A diagonal mass matrix estimated from draw variance only, similar to the adaptation used in Stan and PyMC. This will usually result in less efficient sampling, but the total number of effective samples is occasionally higher.

trace = nutpie.sample(
    model,
    adaptation="draw_diag"
)

"low_rank"

For models with strong parameter correlations you can enable a low-rank modified mass matrix. This extends the diagonal mass matrix with a low-rank update that can capture some posterior correlations. The mass_matrix_gamma parameter controls regularization: more regularization shrinks the effect of the low-rank components, but might work better for higher-dimensional problems.

mass_matrix_eigval_cutoff should be greater than one, and controls how large an eigenvalue of the full mass matrix must be in order for its associated eigenvector to be included in the low-rank mass matrix.

trace = nutpie.sample(
    model,
    adaptation="low_rank",
    mass_matrix_eigval_cutoff=3,
    mass_matrix_gamma=1e-5
)
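As a simplified illustration of what an eigenvalue cutoff does, the sketch below keeps only the components whose eigenvalue exceeds the cutoff. The function select_components and the eigenvalues are hypothetical. nutpie's internal selection rule is not documented in this guide and may differ, for example in how it treats very small eigenvalues.

```python
def select_components(eigenvalues, cutoff=3.0):
    """Return the indices of eigenvalues large enough to keep."""
    if cutoff <= 1:
        raise ValueError("cutoff should be greater than one")
    return [i for i, value in enumerate(eigenvalues) if value > cutoff]


# Made-up eigenvalues: values near 1 mean the diagonal matrix already fits well
# in that direction, so only the clearly larger ones are kept.
eigenvalues = [0.9, 1.2, 2.5, 4.0, 17.0]
print(select_components(eigenvalues, cutoff=3.0))  # → [3, 4]
```

Raising the cutoff keeps fewer components, which makes the low-rank update cheaper but lets it capture less of the correlation structure.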

"flow" (Experimental)

A normalizing-flow reparameterization that is adapted during tuning. This allows sampling from many posteriors where current methods diverge. It is described in more detail here.

trace = nutpie.sample(
    model,
    adaptation="flow"
)

Zarr Storage (Experimental)

Nutpie includes experimental support for writing traces directly to zarr storage, which can be useful for large traces that don’t fit in memory or for distributed storage scenarios. The zarr format provides efficient, chunked, compressed storage for multi-dimensional arrays.

Basic Usage

You can write traces directly to zarr storage by providing a zarr_store parameter:

import nutpie
import pymc as pm

with pm.Model() as model:
    pm.HalfNormal("a")

compiled = nutpie.compile_pymc_model(model, backend="numba")

# Create a local zarr store
path = "trace.zarr"
store = nutpie.zarr_store.LocalStore(path)

trace = nutpie.sample(
    compiled,
    chains=2,
    seed=123,
    draws=100,
    tune=100,
    zarr_store=store
)

Memory Considerations

When using zarr storage, the trace object supports lazy loading:

# The trace is not loaded into memory by default
posterior_data = trace.posterior.a  # Lazy access

# Explicitly load the entire trace into memory (optional)
loaded_trace = trace.load()
posterior_data = loaded_trace.posterior.a  # In-memory access

Available Store Types

Nutpie supports several zarr store backends:

  • nutpie.zarr_store.LocalStore(path) - Local filesystem storage
  • nutpie.zarr_store.S3Store(...) - Amazon S3 storage
  • nutpie.zarr_store.GCSStore(...) - Google Cloud Storage
  • nutpie.zarr_store.AzureStore(...) - Azure Blob Storage
  • nutpie.zarr_store.HTTPStore(...) - HTTP-based storage

Progress Monitoring

Customize the sampling progress display:

trace = nutpie.sample(
    model,
    progress_bar=True,
    progress_rate=500,  # Update every 500ms
)