log1p in Julia
2017/09/13 edit: fixed bogus interaction of MathJax and code highlighting.
edit2: added benchmarks.
This is a follow-up on a question I asked on the Julia forums about calculating
\[ \text{log1p}(x) = \log(1+x) \]
This calculation is tricky because if \(x \approx 0\),
\[ \log(1+x) \approx x \]
while as \(x \to \infty\), \(\log(1+x)\) approaches \(\log(x)\). When \(x \approx 0\), forming \(1+x\) in floating point rounds away the low-order bits of \(x\), so simply using log(1+x)
will not be as accurate as it could be. Numerical analysts have developed specialized methods that deliver an accurate answer over the whole range, and every programming language serious about numerical computing provides an implementation, either in the core language or in a library.
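As a quick illustration (my own example, not part of the experiment below), the naive formula discards a tiny x entirely, while log1p keeps it:
x = 1e-20
1 + x == 1     # true: the low-order bits of x are rounded away
log(1 + x)     # 0.0, all information about x is lost
log1p(x)       # 1.0e-20, accurate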
The documentation of Julia's Base.log1p
currently suggests that Base.Math.JuliaLibm.log1p
may be preferable, which made me wonder: why isn't that the default? So I decided to perform a simple numerical experiment, calculating the error of both implementations, and also to benchmark the two methods.
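For reference, both implementations can be called directly (with the Julia 0.6 module layout this post assumes):
x = 1e-10
log1p(x)                      # Base.log1p, the current default
Base.Math.JuliaLibm.log1p(x)  # the alternative implementation discussed here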
Accuracy
The key question is what to compare the results with. One could compare to an existing "gold standard" implementation, or simply calculate the results using a higher-precision representation. Fortunately, Julia has BigFloats available right out of the box.
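As a sketch of the idea (the full version is in the code at the end of this post, which uses the same two-argument BigFloat(x, prec) constructor), compute a high-precision reference value and measure the relative error of the Float64 result against it:
x   = 1e-8
yb  = log(1 + BigFloat(x, 256))              # 256-bit reference value
rel = Float64(abs(log1p(x) - yb) / abs(yb))  # relative error of the Float64 result
log2(rel)                                    # at most about -53 for a correctly rounded result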
The graph below shows the base-2 logarithm of the relative error of Base.log1p, plotted against \(\log_2(1+x)\); the horizontal lines are log2(eps()) and log2(eps()) - 1. This suggests that Base.log1p
is very accurate, but not as good as it could be when \(x \approx 0\).
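For context on these reference lines (my note, assuming Float64): eps() is the spacing between 1.0 and the next representable number, and a correctly rounded result has a relative error of at most half of that.
log2(eps())      # -52.0, a relative error of one ulp
log2(eps()) - 1  # -53.0, half an ulp, the best a correctly rounded result can do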
The next plot shows the difference of the log2 relative errors above, comparing Base.Math.JuliaLibm.log1p
to Base.log1p
(lower values are better). In these simulations, Base.Math.JuliaLibm.log1p
is never worse, but sometimes significantly better (yielding an extra binary digit of accuracy). This matters especially when \(x \approx 0\).
The next plot confirms this.
Speed
I also evaluated relative speed. The graph below shows the relative runtimes, obtained using BenchmarkTools.@belapsed
. Values below \(1\) mean that Base.Math.JuliaLibm.log1p
is faster: indeed, this seems to be the case except for values very close to \(0\), where there is a 10–20% performance penalty. At other values, Base.Math.JuliaLibm.log1p
is 30–40% faster.
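A minimal sketch of how a single timing is obtained (the full benchmark loop is in the code below): BenchmarkTools.@belapsed returns the minimum elapsed time in seconds, and the $ interpolation keeps the benchmark from measuring access to a global variable.
using BenchmarkTools
x  = 0.5
t1 = @belapsed log1p($x)
t2 = @belapsed Base.Math.JuliaLibm.log1p($x)
t2 / t1        # below 1 means Base.Math.JuliaLibm.log1p is faster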
Conclusion
For values near \(0\), Base.Math.JuliaLibm.log1p
is more accurate, at a slight performance cost. For values away from \(0\), it is at least as accurate as Base.log1p
, and 30–40% faster.
To me, this suggests that Base.Math.JuliaLibm.log1p
should be the default method — mostly because the extra accuracy is more important to me than the slight performance cost.
Code is available below.
download as code.jl
# consistent random numbers
srand(UInt32[0xfd909253, 0x7859c364, 0x7cd42419, 0x4c06a3b6])
"""
err(x, [prec])
Return two values, which are the log2 relative errors for calculating
`log1p(x)`, using `Base.log1p` and `Base.Math.JuliaLibm.log1p`.
The errors are calculated by comparing to `BigFloat` calculations with the given
precision `prec`.
"""
function err(x, prec = 1024)
yb = log(1+BigFloat(x, prec))
e2(y) = Float64(log2(abs(y-yb)/abs(yb)))
e2(log1p(x)), e2(Base.Math.JuliaLibm.log1p(x))
end
z = exp.(randn(20000)*10) # z > 0, Lognormal(0, 10)
x = z .- 1 # x > -1
es = map(err, x) # errors
using Plots; gr() # plots
scatter(log2.(z), first.(es), xlab = "log2(x+1)", ylab = "log2 error of Base.log1p",
legend = false)
hline!(log2(eps())-[0,1])
Plots.svg("Base_log1p_error.svg")
scatter(log2.(z), last.(es) .- first.(es), xlab = "log2(x+1)",
ylab = "improvement (Base.Math.JuliaLibm.log1p)", legend = false)
Plots.svg("JuliaLibm_improvement.svg")
scatter(log2.(z), last.(es), xlab = "log2(x+1)",
ylab = "log2 error of Base.Math.JuliaLibm.log1p", legend = false)
hline!(log2(eps())-[0,1])
Plots.svg("JuliaLibm_log1p_error.svg")
######################################################################
# WARNING: these run for a very long time
######################################################################
using BenchmarkTools
z = exp.(vcat(randn(200)*10, rand(200)*0.1)) # z > 0, more values around 1 (x ≈ 0)
x = z .- 1 # x > -1
b1 = [@belapsed log1p($x) for x in x] # WARNING: takes forever
b2 = [@belapsed Base.Math.JuliaLibm.log1p($x) for x in x] # ditto
scatter(log2.(z), b2 ./ b1, xlab = "log2(x+1)",
ylab = "time Math.JuliaLibm.log1p / log1p", legend = false, yticks = 0:0.2:1.2)
hline!([1])
Plots.svg("relative_time.svg")