A worked example


An extended version of this example can be found in the DynamicHMCExamples.jl package.

Consider estimating estimating the parameter $0 \le \alpha \le 1$ from $n$ IID observations[4]

\[y_i \sim \mathrm{Bernoulli}(\alpha)\]

We will code this with the help of TransformVariables.jl, and obtain the gradient with ForwardDiff.jl (in practice, at the moment I would recommend ForwardDiff.jl for small models, and Flux.jl for larger ones — consider benchmarking a single evaluation of the log density with gradient).[5]

First, we load the packages we use.

using TransformVariables, LogDensityProblems, DynamicHMC,
    DynamicHMC.Diagnostics, Parameters, Statistics, Random

Generally, I would recommend defining an immutable composite type (ie struct) to hold the data and all parameters relevant for the log density (eg the prior). This allows you to test your code in a modular way before sampling. For this model, the number of draws equal to 1 is a sufficient statistic.

struct BernoulliProblem
    n::Int # Total number of draws in the data
    s::Int # Number of draws `==1` in the data

Then we make this problem callable with the parameters. Here, we have a single parameter α, but pass this in a NamedTuple to demonstrate a generally useful pattern. Then, we define an instance of this problem with the data, called p.[6]

function (problem::BernoulliProblem)(θ)
    @unpack α = θ               # extract the parameters
    @unpack n, s = problem       # extract the data
    # log likelihood, with constant terms dropped
    s * log(α) + (n-s) * log(1-α)

It is generally a good idea to test that your code works by calling it with the parameters; it should return a likelihood. For more complex models, you should benchmark and optimize this callable directly.

p = BernoulliProblem(20, 10)
p((α = 0.5, )) # make sure that it works

With TransformVariables.jl, we set up a transformation $\mathbb{R} \to [0,1]$ for $\alpha$, and use the convenience function TransformedLogDensity to obtain a log density in $\mathbb{R}^1$. Finally, we obtain a log density that supports gradients using automatic differentiation.

trans = as((α = as𝕀,))
P = TransformedLogDensity(trans, p)
∇P = ADgradient(:ForwardDiff, P)
ForwardDiff AD wrapper for TransformedLogDensity of dimension 1, w/ chunk size 1

Finally, we run MCMC with warmup. Note that you have to specify the random number generator explicitly — this is good practice for parallel code. The last parameter is the number of samples.

results = mcmc_with_warmup(Random.GLOBAL_RNG, ∇P, 1000; reporter = NoProgressReport())

The returned value is a NamedTuple. Most importantly, it contains the field chain, which is a vector of vectors. You should use the transformation you defined above to retrieve the parameters (here, only α). We display the mean here to check that it was recovered correctly.

posterior = transform.(trans, results.chain)
posterior_α = first.(posterior)

Using the DynamicHMC.Diagnostics submodule, you can obtain various useful diagnostics. The tree statistics in particular contain a lot of useful information about turning, divergence, acceptance rates, and tree depths for each step of the chain. Here we just obtain a summary.

Hamiltonian Monte Carlo sample of length 1000
  acceptance rate mean: 0.91, 5/25/50/75/95%: 0.63 0.87 0.97 1.0 1.0
  termination: divergence => 0%, max_depth => 0%, turning => 100%
  depth: 0 => 0%, 1 => 63%, 2 => 37%

Usually one would run parallel chains and check convergence and mixing using generic MCMC diagnostics not specific to NUTS. See MCMCDiagnostics.jl for an implementation of $\hat{R}$ and effective sample size calculations.

  • 4Note that NUTS is not especially suitable for low-dimensional parameter spaces, but this example works fine.
  • 5

    An example of how you can benchmark a log density with gradient ∇P, obtained as described below:

    using BenchmarkTools, LogDensityProblems
    x = randn(LogDensityProblems.dimension(∇P))
    @benchmark LogDensityProblems.logdensity_and_gradient($∇P, $x)
  • 6Note that here we used a flat prior. This is generally not a good idea for variables with non-finite support: one would usually make priors parameters of the struct above, and add the log prior to the log likelihood above.