Branch prediction: yet another example
2017/10/04Tomas Lycken linked a very nice discussion on StackOverflow about branch prediction as a comment on the previous post. It has an intuitive explanation (read it if you like good metaphors) and some Java benchmarks. I was curious about how it looks in Julia.
The exercise is to sum elements in a vector only if they are greater than or equal to 128.
function sumabove_if(x)
s = zero(eltype(x))
for elt in x
if elt ≥ 128
s += elt
end
end
s
endThis calculation naturally has a branch in it, while the branchless version, using ifelse, does not:
function sumabove_ifelse(x)
s = zero(eltype(x))
for elt in x
s += ifelse(elt ≥ 128, elt, zero(eltype(x)))
end
s
endThe actual example has something different: using tricky bit-twiddling to calculate the same value. I generally like to leave this sort of thing up to the compiler, because it is much, much better at it than I am, and I make mistakes all the time; worse, I don't know what I actually did when I reread the code 6 months later. But I included it here for comparison:
function sumabove_tricky(x::Vector{Int64})
s = Int64(0)
for elt in x
s += ~((elt - 128) >> 63) & elt
end
s
endFollowing the original example on StackOverflow, we sum 2^15 random integers in 1:256. For this, we don't need to worry about overflow. We also sum the sorted vector: this will facilitate branch predicion, since the various branches will be contiguous.
I also benchmark a simple version using generators:
sumabove_generator(x) = sum(y for y in x if y ≥ 128)| random | sorted | |
|---|---|---|
if
| 139 | 28 |
ifelse
| 21 | 21 |
if & sort
| 96 | n/a |
| tricky | 27 | 27 |
| generator | 219 | 168 |
Benchmarks are in the table above. Note that
for the version with
if, working on sorted vectors is dramatically faster (about 5x).the non-branching
ifelseversion beats them hands down, and naturally it does not care about sorting.if you have to use
if, then you are better off sorting, even if you take the time of that into account.generators are susprisingly bad.
the tricky bit-twiddling version is actually worse than
ifelse(which reinforces my aversion to it).
Self-contained code for everything is available below.
download code as code.jl