Using find_MAP on models with discrete variables

Maximum a posteriori (MAP) estimation can be difficult in models that have discrete stochastic variables. Here we demonstrate the problem with a simple model and present a few possible workarounds.

In [1]:
import numpy as np
import pymc3 as mc

We define a simple model of a survey with one data point. We use a \(Beta\) distribution for the \(p\) parameter of a binomial. We would like to know both the posterior distribution of \(p\) and the posterior predictive distribution over the survey outcome.

In [2]:
alpha = 4
beta = 4
n = 20
yes = 15

with mc.Model() as model:
    p = mc.Beta('p', alpha, beta)
    surv_sim = mc.Binomial('surv_sim', n=n, p=p)
    surv = mc.Binomial('surv', n=n, p=p, observed=yes)
Applied logodds-transform to p and added transformed p_logodds_ to model.
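
(The message above tells us that PyMC3 has reparameterized \(p\) onto the log-odds scale, so the free variable in the model is p_logodds_ rather than \(p\) itself.) As a point of reference, the \(Beta\) prior is conjugate to the binomial likelihood, so the marginal posterior of \(p\) given the 15 observed successes is \(Beta(\alpha+15, \beta+5)=Beta(19,9)\), whose mode is \((19-1)/(19+9-2)\approx 0.69\); the MAP estimates we find below should land near this value.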

First let’s try and use find_MAP.

In [3]:
with model:
    print(mc.find_MAP())
{'surv_sim': array(10), 'p_logodds_': array(0.42285684671251805)}

find_MAP defaults to finding the MAP for only the continuous variables, leaving the discrete surv_sim at its starting value; we have to specify explicitly that we would like the discrete variables included.

In [4]:
with model:
    print(mc.find_MAP(vars=model.vars, disp=True))
{'surv_sim': array(14), 'p_logodds_': array(0.7884573537909452)}

We set the disp variable to display a warning that we are using a non-gradient minimization technique, since discrete variables do not provide much gradient information. To demonstrate this, if we use a gradient-based minimizer, fmin_bfgs, with various starting points, we see that the MAP estimate does not converge.

In [5]:
with model:
    for i in range(n+1):
        s = {'p':0.5, 'surv_sim':i}
        map_est = mc.find_MAP(start=s, vars=model.vars, fmin=mc.starting.optimize.fmin_bfgs)
        print('surv_sim: %i->%i, p: %f->%f, LogP:%f'%(s['surv_sim'],
                                                      map_est['surv_sim'],
                                                      s['p'],
                                                      map_est['p'],
                                                      model.logpc(map_est)))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-299cdf766633> in <module>()
      2     for i in range(n+1):
      3         s = {'p':0.5, 'surv_sim':i}
----> 4         map_est = mc.find_MAP(start=s, vars=model.vars, fmin=mc.starting.optimize.fmin_bfgs)
      5         print('surv_sim: %i->%i, p: %f->%f, LogP:%f'%(s['surv_sim'],
      6                                                       map_est['surv_sim'],

/home/wiecki/working/projects/pymc/pymc3/tuning/starting.py in find_MAP(start, vars, fmin, return_raw, model, *args, **kwargs)
     82     if 'fprime' in getargspec(fmin).args:
     83         r = fmin(logp_o, bij.map(
---> 84             start), fprime=grad_logp_o, *args, **kwargs)
     85     else:
     86         r = fmin(logp_o, bij.map(start), *args, **kwargs)

/home/wiecki/working/projects/pymc/pymc3/blocking.py in map(self, dpt)
     50         apt = np.empty(self.ordering.dimensions)
     51         for var, slc, _, _ in self.ordering.vmap:
---> 52             apt[slc] = dpt[var].ravel()
     53         return apt
     54

KeyError: 'p_logodds_'

Once again, because the gradient of surv_sim provides no information to the fmin routine, it is only changed in a few cases, and most of those changes are not correct. Looking at the log probability manually, we can see that the maximum is somewhere around surv_sim\(=14\) and p\(=0.7\). If we employ a non-gradient minimization, such as fmin_powell (the default when discrete variables are detected), we might be able to get a better estimate.
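
One quick way to do that manual check is a crude grid scan over the joint log probability (a sketch; it assumes model.logpc accepts the same kind of point dict, keyed by surv_sim and p_logodds_, that find_MAP returned above).

In [ ]:
def logit(x):
    return np.log(x / (1. - x))   # p is stored on the log-odds scale

# scan every admissible surv_sim value against a coarse grid of p values
best = None
for k in range(n + 1):
    for p_val in np.linspace(.05, .95, 19):
        lp = model.logpc({'surv_sim': k, 'p_logodds_': logit(p_val)})
        if best is None or lp > best[0]:
            best = (lp, k, p_val)
print('max logp %.2f at surv_sim=%i, p=%.2f' % best)

Now the same loop as before, this time with the default fmin_powell: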

In [ ]:
with model:
    for i in range(n+1):
        s = {'p':0.5, 'surv_sim':i}
        map_est = mc.find_MAP(start=s, vars=model.vars)
        print('surv_sim: %i->%i, p: %f->%f, LogP:%f'%(s['surv_sim'],
                                                      map_est['surv_sim'],
                                                      s['p'],
                                                      map_est['p'],
                                                      model.logpc(map_est)))

For most starting values this converges to the maximum log likelihood of \(\approx -3.15\), but for particularly low starting values of surv_sim, or values near surv_sim\(=14\), there is still some noise. The scipy.optimize package contains some more general ‘global’ minimization functions that we can use. The basinhopping algorithm restarts the optimization at points near previously found minima. Because it has a slightly different interface from the other minimization schemes, we have to define a wrapper function.

In [ ]:
def bh(*args, **kwargs):
    result = mc.starting.optimize.basinhopping(*args, **kwargs)
    # basinhopping returns an OptimizeResult; the minimizing point is stored under 'x'
    return result['x']

with model:
    for i in range(n+1):
        s = {'p':0.5, 'surv_sim':i}
        map_est = mc.find_MAP(start=s, vars=model.vars, fmin=bh)
        print('surv_sim: %i->%i, p: %f->%f, LogP:%f'%(s['surv_sim'],
                                                      np.floor(map_est['surv_sim']),
                                                      s['p'],
                                                      map_est['p'],
                                                      model.logpc(map_est)))

By default basinhopping uses a gradient-based minimization technique, fmin_bfgs, which often results in inaccurate estimates. If we force basinhopping to use a non-gradient technique, we get much better results.

In [ ]:
with model:
    for i in range(n+1):
        s = {'p':0.5, 'surv_sim':i}
        map_est = mc.find_MAP(start=s, vars=model.vars, fmin=bh, minimizer_kwargs={"method": "Powell"})
        print('surv_sim: %i->%i, p: %f->%f, LogP:%f'%(s['surv_sim'],
                                                      map_est['surv_sim'],
                                                      s['p'],
                                                      map_est['p'],
                                                      model.logpc(map_est)))

Confident in our MAP estimate, we can sample from the posterior, making sure to use the Metropolis step method for our discrete variable.

In [ ]:
with model:
    step1 = mc.step_methods.HamiltonianMC(vars=[p])
    step2 = mc.step_methods.Metropolis(vars=[surv_sim])
In [ ]:
with model:
    trace = mc.sample(25000, [step1, step2], start=map_est)
In [ ]:
mc.traceplot(trace);
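
As a quick sanity check we can compare the sampled posterior with the MAP values found above (a sketch; it assumes the trace can be indexed by variable name, with \(p\) recovered by back-transforming p_logodds_).

In [ ]:
p_samples = 1. / (1. + np.exp(-trace['p_logodds_']))   # back-transform to the p scale
print('posterior mode of surv_sim: %i' % np.bincount(trace['surv_sim'].astype(int)).argmax())
print('posterior mean of p: %.3f' % p_samples.mean())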