A worked example

Note

An extended version of this example can be found in the DynamicHMCExamples.jl package.

Problem statement

Consider estimating the parameter $0 \le \alpha \le 1$ from $n$ IID observations[4]

\[y_i \sim \mathrm{Bernoulli}(\alpha)\]

We will code this with the help of TransformVariables.jl, and obtain the gradient with ForwardDiff.jl (in practice, at the moment I would recommend ForwardDiff.jl for small models, and Flux.jl for larger ones — consider benchmarking a single evaluation of the log density with gradient).[5]

Coding up the log density

First, we load the packages we use.

using TransformVariables, TransformedLogDensities, LogDensityProblems, LogDensityProblemsAD,
    DynamicHMC, DynamicHMC.Diagnostics, SimpleUnPack, Statistics, Random

Generally, I would recommend defining an immutable composite type (i.e. a struct) to hold the data and all parameters relevant for the log density (e.g. the prior). This allows you to test your code in a modular way before sampling. For this model, the total number of draws together with the number of draws equal to 1 is a sufficient statistic.

struct BernoulliProblem
    n::Int # Total number of draws in the data
    s::Int # Number of draws `==1` in the data
end

Then we make this problem callable with the parameters. Here, we have a single parameter, α, but we pass it in a NamedTuple to demonstrate a generally useful pattern. Further below, we define an instance of this problem with the data, called p.[6]

function (problem::BernoulliProblem)(θ)
    @unpack α = θ               # extract the parameters
    @unpack n, s = problem       # extract the data
    # log likelihood, with constant terms dropped
    s * log(α) + (n-s) * log(1-α)
end

It is generally a good idea to test that your code works by calling it with the parameters; it should return a finite log likelihood. For more complex models, you should benchmark and optimize this callable directly; a benchmarking sketch follows the example call below.

p = BernoulliProblem(20, 10)
p((α = 0.5, )) # make sure that it works
-13.862943611198906
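
For instance, a minimal benchmarking sketch of the callable (assuming BenchmarkTools.jl is installed; α = 0.5 is just an arbitrary test point) could look like this:

using BenchmarkTools
θ_test = (α = 0.5, )
@benchmark $p($θ_test)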

With TransformVariables.jl, we set up a transformation $\mathbb{R} \to [0,1]$ for $\alpha$, and use the convenience function TransformedLogDensity to obtain a log density on $\mathbb{R}^1$. Then we obtain a log density that supports gradients via automatic differentiation, using LogDensityProblemsAD.jl.

trans = as((α = as𝕀,))
P = TransformedLogDensity(trans, p)
∇P = ADgradient(:ForwardDiff, P)
ForwardDiff AD wrapper for TransformedLogDensity of dimension 1, w/ chunk size 1

Finally, we run MCMC with warmup. Note that you have to specify the random number generator explicitly — this is good practice for parallel code. The last parameter is the number of samples.

results = mcmc_with_warmup(Random.default_rng(), ∇P, 1000; reporter = NoProgressReport())

The returned value is a NamedTuple. Most importantly, it contains the field posterior_matrix. You should use the transformation you defined above to retrieve the parameters (here, only α). We display the mean here to check that it was recovered correctly.

posterior = transform.(trans, eachcol(results.posterior_matrix))
posterior_α = first.(posterior)
mean(posterior_α)
0.496979445844135
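
Posterior quantiles can be computed the same way; for example, a sketch of a central 90% posterior interval using quantile from Statistics (already loaded above):

quantile(posterior_α, [0.05, 0.95])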

Using the DynamicHMC.Diagnostics submodule, you can obtain various useful diagnostics. The tree statistics in particular contain a lot of useful information about turning, divergence, acceptance rates, and tree depths for each step of the chain. Here we just obtain a summary.

summarize_tree_statistics(results.tree_statistics)
Hamiltonian Monte Carlo sample of length 1000
  acceptance rate mean: 0.92, 5/25/50/75/95%: 0.68 0.89 0.97 1.0 1.0
  termination: divergence => 0%, max_depth => 0%, turning => 100%
  depth: 0 => 0%, 1 => 65%, 2 => 35%
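
In addition to the tree statistics summary, the energy diagnostic can be checked; a minimal sketch, assuming EBFMI (the energy Bayesian fraction of missing information, with values well below 0.3 generally considered problematic) is provided by DynamicHMC.Diagnostics:

EBFMI(results.tree_statistics)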

Parallel chains and diagnostics

Usually one would run multiple chains and check convergence and mixing using generic MCMC diagnostics not specific to NUTS.

The specifics of running multiple chains are up to the user: various forms of parallel computing can be utilized depending on the problem scale and the hardware available. In the example below we use multi-threading via ThreadTools.jl; other excellent threading packages are also available.
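
As a dependency-free sketch of the same idea, one could instead use Base's task-based threading via Threads.@spawn (the names tasks5 and results5_alt are arbitrary):

# spawn 5 independent chains as tasks; Random.default_rng() is task-local
# in recent Julia versions, so this is safe to use across tasks
tasks5 = map(1:5) do _
    Threads.@spawn mcmc_with_warmup(Random.default_rng(), ∇P, 1000; reporter = NoProgressReport())
end
results5_alt = fetch.(tasks5)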

It is easy to obtain posterior results for use with MCMCDiagnosticTools.jl with stack_posterior_matrices:

using ThreadTools, MCMCDiagnosticTools
results5 = tmap1(_ -> mcmc_with_warmup(Random.default_rng(), ∇P, 1000; reporter = NoProgressReport()), 1:5)
ess_rhat(stack_posterior_matrices(results5))
(ess = [1955.9478522958184], rhat = [1.0008008166589142])

Use pool_posterior_matrices for a pooled sample:

posterior5 = transform.(trans, eachcol(pool_posterior_matrices(results5)))
posterior5_α = first.(posterior5)
mean(posterior5_α)
0.4977048783640028
[4]: Note that NUTS is not especially suitable for low-dimensional parameter spaces, but this example works fine.

[5]: An example of how you can benchmark a log density with gradient ∇P, obtained as described in the main text:

    using BenchmarkTools, LogDensityProblems
    x = randn(LogDensityProblems.dimension(∇P))
    @benchmark LogDensityProblems.logdensity_and_gradient($∇P, $x)

[6]: Note that here we used a flat prior. This is generally not a good idea for variables with non-finite support: one would usually make the priors parameters of the struct above, and add the log prior to the log likelihood.
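
As a brief illustration of this suggestion, a hypothetical variant with a Beta prior could look like the following sketch (the name BernoulliProblemWithPrior is illustrative; Distributions.jl is assumed for Beta and logpdf):

using Distributions, SimpleUnPack

struct BernoulliProblemWithPrior
    n::Int                # total number of draws in the data
    s::Int                # number of draws `== 1` in the data
    prior::Beta{Float64}  # prior distribution for α
end

function (problem::BernoulliProblemWithPrior)(θ)
    @unpack α = θ
    @unpack n, s, prior = problem
    # log posterior (up to a constant): log likelihood + log prior
    s * log(α) + (n - s) * log(1 - α) + logpdf(prior, α)
end

Such a problem could then be wrapped exactly as before, e.g. starting from BernoulliProblemWithPrior(20, 10, Beta(2, 2)) and applying the same transformation and AD steps.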