# A worked example

An extended version of this example can be found in the DynamicHMCExamples.jl package.

Consider estimating the parameter $0 \le \alpha \le 1$ from $n$ IID observations[^4]

\[y_i \sim \mathrm{Bernoulli}(\alpha)\]

We will code this with the help of TransformVariables.jl, and obtain the gradient with ForwardDiff.jl (in practice, at the moment I would recommend ForwardDiff.jl for small models, and Flux.jl for larger ones; consider benchmarking a single evaluation of the log density with gradient).[^5]

First, we load the packages we use.

```julia
using TransformVariables, LogDensityProblems, DynamicHMC,
    DynamicHMC.Diagnostics, Parameters, Statistics, Random
```

Generally, I would recommend defining an immutable composite type (ie a `struct`) to hold the data and all parameters relevant for the log density (eg the prior). This allows you to test your code in a modular way before sampling. For this model, the number of draws equal to `1` is a sufficient statistic.

```julia
struct BernoulliProblem
    n::Int # Total number of draws in the data
    s::Int # Number of draws `==1` in the data
end
```
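For instance, raw 0/1 data can be summarized into these sufficient statistics when constructing the problem (a small sketch; the simulated data and variable name are illustrative):

```julia
y = rand(Bool, 20)                           # illustrative data: 20 fair coin flips
p_raw = BernoulliProblem(length(y), sum(y))  # n = 20, s = number of `true`s
```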

Then we make this problem *callable* with the parameters. Here, we have a single parameter `α`, but pass this in a `NamedTuple` to demonstrate a generally useful pattern. Then, we define an instance of this problem with the data, called `p`.[^6]

```julia
function (problem::BernoulliProblem)(θ)
    @unpack α = θ          # extract the parameters
    @unpack n, s = problem # extract the data
    # log likelihood, with constant terms dropped
    s * log(α) + (n-s) * log(1-α)
end
```

It is generally a good idea to test that your code works by calling it with the parameters; it should return a log likelihood. For more complex models, you should benchmark and optimize this callable directly; a quick benchmarking sketch follows below.

```julia
p = BernoulliProblem(20, 10)
p((α = 0.5, )) # make sure that it works
```

```
-13.862943611198906
```
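As suggested above, the callable can be benchmarked directly; a minimal sketch, assuming BenchmarkTools.jl is available (not part of the original example):

```julia
using BenchmarkTools

# interpolate `p` with `$` so the global variable lookup is not part of the timing
@btime $p((α = 0.5,))
```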

With TransformVariables.jl, we set up a *transformation* $\mathbb{R} \to [0,1]$ for $\alpha$, and use the convenience function `TransformedLogDensity` to obtain a log density in $\mathbb{R}^1$. Finally, we obtain a log density that supports gradients using automatic differentiation.

```julia
trans = as((α = as𝕀,))
P = TransformedLogDensity(trans, p)
∇P = ADgradient(:ForwardDiff, P)
```

```
ForwardDiff AD wrapper for TransformedLogDensity of dimension 1, w/ chunk size 1
```
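To get a feel for the transformation, one can map a point in $\mathbb{R}$ back to the parameter space; a small sketch using the TransformVariables.jl and LogDensityProblems.jl APIs loaded above:

```julia
transform(trans, [0.0])                  # (α = 0.5,), as `as𝕀` maps 0 to logistic(0) = 0.5
LogDensityProblems.logdensity(P, [0.0])  # log density in ℝ, with the log Jacobian added
```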

Finally, we run MCMC with warmup. Note that you have to specify the *random number generator* explicitly; this is good practice for parallel code. The last parameter is the number of samples.

```julia
results = mcmc_with_warmup(Random.GLOBAL_RNG, ∇P, 1000; reporter = NoProgressReport())
```

The returned value is a `NamedTuple`. Most importantly, it contains the field `chain`, which is a vector of vectors. You should use the transformation you defined above to retrieve the parameters (here, only `α`). We display the mean here to check that it was recovered correctly.

```julia
posterior = transform.(trans, results.chain)
posterior_α = first.(posterior)
mean(posterior_α)
```

```
0.5015136700920352
```
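Other posterior summaries follow the same pattern; for example, a central 90% posterior interval via `quantile` from Statistics (loaded above):

```julia
quantile(posterior_α, [0.05, 0.95]) # 90% central posterior interval for α
```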

Using the `DynamicHMC.Diagnostics` submodule, you can obtain various useful diagnostics. The *tree statistics* in particular contain a lot of useful information about turning, divergence, acceptance rates, and tree depths for each step of the chain. Here we just obtain a summary.

```julia
summarize_tree_statistics(results.tree_statistics)
```

```
Hamiltonian Monte Carlo sample of length 1000
  acceptance rate mean: 0.91, 5/25/50/75/95%: 0.62 0.87 0.97 1.0 1.0
  termination: divergence => 0%, max_depth => 0%, turning => 100%
  depth: 0 => 0%, 1 => 62%, 2 => 38%
```

Usually one would run parallel chains and check convergence and mixing using generic MCMC diagnostics not specific to NUTS. See MCMCDiagnostics.jl for an implementation of $\hat{R}$ and effective sample size calculations.
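A sketch of that workflow, assuming the `effective_sample_size` and `potential_scale_reduction` functions of MCMCDiagnostics.jl (the chain count and seeds here are illustrative):

```julia
using MCMCDiagnostics

# run several chains with independent, explicitly seeded RNGs
rngs = [Random.MersenneTwister(i) for i in 1:4]
chains = [mcmc_with_warmup(rng, ∇P, 1000; reporter = NoProgressReport())
          for rng in rngs]
αs = [first.(transform.(trans, c.chain)) for c in chains]

effective_sample_size(first(αs)) # ESS within one chain
potential_scale_reduction(αs...) # R̂ across chains
```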

[^4]: Note that NUTS is not especially suitable for low-dimensional parameter spaces, but this example works fine.

[^5]: An example of how you can benchmark a log density with gradient `∇P`, obtained as described below:

    ```julia
    using BenchmarkTools, LogDensityProblems
    x = randn(LogDensityProblems.dimension(∇P))
    @benchmark LogDensityProblems.logdensity_and_gradient($∇P, $x)
    ```

[^6]: Note that here we used a *flat prior*. This is generally not a good idea for variables with non-finite support: one would usually make priors parameters of the `struct` above, and add the log prior to the log likelihood above.
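
    A minimal sketch of that pattern (the `Beta(a, b)` prior and the type name are illustrative, not part of the original example):

    ```julia
    struct BernoulliBetaProblem
        n::Int     # total number of draws
        s::Int     # number of draws `== 1`
        a::Float64 # Beta prior shape parameters
        b::Float64
    end

    function (problem::BernoulliBetaProblem)(θ)
        @unpack α = θ
        @unpack n, s, a, b = problem
        # log likelihood plus Beta(a, b) log prior, with constant terms dropped
        (s + a - 1) * log(α) + (n - s + b - 1) * log(1 - α)
    end
    ```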