Two tricks for change of variables in MCMC

2017/05/23 MCMC Jacobian Stan

A change of variables is sometimes advantageous, and occasionally unavoidable, in MCMC: it can make sampling more efficient, or you may need to model a distribution that is defined through a transformation. A classic example is the lognormal distribution: when

\[\log(y) \sim N(\mu, \sigma^2)\]

one has to adjust the log posterior by \(-\log y\), since

\[\frac{\partial \log(y)}{\partial y} = \frac{1}{y}\]

and

\[\log(1/y) = -\log(y).\]

In Stan, one would accomplish this as

target += -log(y);
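For context, here is a minimal sketch of a complete Stan program built around that adjustment. The data names `N` and `y` and the absence of explicit priors are assumptions made for illustration; up to a constant, the model block should be equivalent to writing `y ~ lognormal(mu, sigma)`.

```stan
data {
  int<lower=1> N;         // number of observations (hypothetical name)
  vector<lower=0>[N] y;   // positive observations
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // normal density evaluated at the transformed values log(y)
  log(y) ~ normal(mu, sigma);
  // log Jacobian of y -> log(y), summed over observations
  target += -sum(log(y));
}
```

With `y` as data the adjustment is constant in the parameters and does not change the sampling; it matters once `y` is itself a parameter, or when log densities are compared across models.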

In general, when you transform using a multivariate function \(f\), you would adjust by

\[\log\lvert\det J_f(y)\rvert\]

which is the log of the absolute value of the determinant of the Jacobian (the absolute value is left implicit in the formulas below); some texts simply refer to this as "the Jacobian".

The above is well-known, but the following two tricks are worth mentioning.

Chaining transformations

Suppose that you are changing a variable by using a chain of two functions \(f \circ g\). Then

\[ \begin{aligned} \log\det J_{f \circ g}(y) &= \log \bigl(\det J_f(g(y)) \cdot \det J_g(y)\bigr) \\ &= \log\det J_f(g(y)) + \log\det J_g(y) \end{aligned} \]

which means that you can simply add the log determinants of the Jacobians, each evaluated at the appropriate point: \(J_g\) at \(y\) and \(J_f\) at \(g(y)\).

This is very useful when \(f \circ g\) is complicated and \(J_{f\circ g}\) is tedious to derive, or if you want to use multiple \(f\)s or \(g\)s and economize on the algebra. From the above, it is also easy to see that this generalizes to arbitrarily long chains of functions \(f_1 \circ f_2 \circ \dots\).
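A minimal sketch of how this could look in Stan, assuming two hypothetical helper functions `g` and `f`, each paired with a function that returns its own log Jacobian. The particular functions and the `normal(0, 1)` model for the transformed value are illustrative choices, not taken from any specific model.

```stan
functions {
  // g maps (0, inf) to (0, 1)
  real g(real y) {
    return y / (1 + y);
  }
  // log |g'(y)| = log(1 / (1 + y)^2)
  real log_jac_g(real y) {
    return -2 * log1p(y);
  }
  // f maps (0, 1) to the real line
  real f(real u) {
    return logit(u);
  }
  // log |f'(u)| = log(1 / (u * (1 - u)))
  real log_jac_f(real u) {
    return -log(u) - log1m(u);
  }
}
parameters {
  real<lower=0> y;
}
model {
  real u = g(y);
  real z = f(u);
  z ~ normal(0, 1);
  // sum of the log Jacobians: J_f evaluated at g(y), J_g at y
  target += log_jac_f(u) + log_jac_g(y);
}
```

Here \(f(g(y)) = \log(y)\), so the two terms add up to \(-\log(y)\), the same adjustment as in the lognormal example, which makes the sketch easy to check by hand.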

This trick turned out to be very useful when I was fitting a model where a transformation was common to both equilibrium concepts I was using (a noncooperative game and a social planner), so I could save on code. Of course, since #2224 is WIP, I had to copy-paste the code, but I still saved quite a bit of work.

Transforming a subset of variables

Suppose \(x \in \mathbb{R}^m\) and \(y \in \mathbb{R}^n\) are vectors, and you are interested in transforming to

\[ z = f(x,y) \]

where \(x\) and \(z\) have the same dimension. It is useful to think about this transformation as

\[ g(x,y) = [f(x,y), y]^\top \]

where \(g : \mathbb{R}^{m+n} \to \mathbb{R}^{m+n}\). Since \(y\) is mapped to itself,

\[ J_g = \begin{bmatrix} J_{f,x} & J_{f,y} \\ 0 & I \end{bmatrix} \]

has a block structure, where

\[ J_{f,x} = \frac{\partial f(x,y)}{\partial x} \]

and similarly for \(J_{f,y}\). Because \(J_g\) is block upper triangular, its determinant is the product of the determinants of the diagonal blocks, so \(J_{f,y}\) does not enter the calculation. Since \(\log \det I = 0\),

\[ \log\det J_g = \log\det J_{f,x} \]
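
As a small worked example (the transformation is an assumption chosen for illustration), let \(x \in \mathbb{R}\), \(y = (\mu, \sigma)\) with \(\sigma > 0\), and \(z = f(x, y) = \mu + \sigma x\). Then

\[ J_g = \begin{bmatrix} \sigma & 1 & x \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \log\det J_g = \log\det J_{f,x} = \log \sigma, \]

so the derivatives with respect to \(\mu\) and \(\sigma\) in the first row never need to be computed.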
