Lately, I’ve been spending some time filtering data sets in Ruby. A common pattern when filtering data on multiple criteria involves short-circuiting processing at the first match or non-match, depending on whether conditions are being evaluated in an any/or or all/and context, respectively. As a result, I thought I’d run a few quick benchmarks on several implementations of this pattern. The results surprised me, so I thought I would share them here.
This benchmark, like most benchmarks, is contrived. However, I didn’t set out to prove any specific Ruby implementation was better than any other. For that matter, I didn’t even initially set out to compare Ruby implementations, at all. I wanted to see which implementation of the logic I outlined above performed the best, in general. It’s just that RVM is so freaking awesome that it made compiling a comparison of Ruby implementations dead simple, so, hey, why not?
The data set is the set of positive integers from 1 to 1,000,000. I searched this set for 100 integers in multiples of 10,000, using 4 different implementations of the pattern described above:

1. Enumerable#detect
2. Array#each with an explicit early return
3. Enumerable#any?
4. Array#include?
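For the sake of illustration, the four strategies above might be sketched like this (the module names here are my own invention, not necessarily those of the original script):

```ruby
# Four ways to answer "is `needle` in `haystack`?", each short-circuiting
# at the first match.
module DetectDetector
  def self.find(haystack, needle)
    !haystack.detect { |v| v == needle }.nil?
  end
end

module EachDetector
  def self.find(haystack, needle)
    haystack.each { |v| return true if v == needle }
    false
  end
end

module AnyDetector
  def self.find(haystack, needle)
    haystack.any? { |v| v == needle }
  end
end

module IncludeDetector
  def self.find(haystack, needle)
    haystack.include?(needle)
  end
end
```

All four return the same boolean answer; the interesting part is how much each one costs.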
Additionally, I passed each “needle” to be found in the haystack in two different ways:

1. Computed inside the search block, as 10_000 * n, on every iteration (N1)
2. Precomputed outside the block and passed in as a ready-made value (N2)
You can have a look at the benchmark script and its raw results, with the rehearsals removed. I ran this under MRI 1.9.2 and 1.8.7, JRuby 1.6.3, and Rubinius head.
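The script itself isn’t reproduced here, but its general shape was presumably something like this sketch (the harness structure, constant, and method names are my assumptions):

```ruby
require 'benchmark'

# The haystack: positive integers from 1 to 1,000,000.
HAYSTACK = (1..1_000_000).to_a

# Time 100 searches for one strategy, with needles at multiples of 10,000.
# Benchmark.bmbm runs a rehearsal pass before the measured pass, which is
# presumably where the removed "rehearsals" came from.
def time_strategy(label, &strategy)
  Benchmark.bmbm do |bm|
    bm.report(label) do
      (1..100).each { |n| strategy.call(HAYSTACK, 10_000 * n) }
    end
  end
end

time_strategy('include?') { |haystack, needle| haystack.include?(needle) }
```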
We’ll only be looking at the “total” column, for the purposes of this discussion. N1 and N2 refer to the two numbered approaches to supplying the needle to search for, above.
| | MRI 1.9.2 (N1) | MRI 1.9.2 (N2) | MRI 1.8.7 (N1) | MRI 1.8.7 (N2) | JRuby 1.6.3 (N1) | JRuby 1.6.3 (N2) | Rubinius head (N1) | Rubinius head (N2) |
|---|---|---|---|---|---|---|---|---|
| detect | 7.650000 | 6.400000 | 71.040000 | 63.290000 | 4.736000 | 3.052000 | 6.754449 | 5.665881 |
| each | 5.390000 | 5.370000 | 14.680000 | 14.660000 | 3.080000 | 3.127000 | 3.442154 | 3.280476 |
| any? | 7.340000 | 6.040000 | 71.790000 | 63.860000 | 4.755000 | 3.667000 | 6.444793 | 5.590638 |
| include? | 2.620000 | 2.620000 | 3.410000 | 3.410000 | 0.390000 | 0.389000 | 0.729475 | 0.729746 |
Well, for starters, let’s just state the obvious: LOL MRI 1.8.7! Maybe I haven’t been following things as closely as I thought, but I was definitely surprised to see how horrendously this (admittedly contrived) benchmark performed on MRI 1.8.7. I expected poor performance, but not on the order of 10x slower in some cases!
Now, with that out of the way, let’s see what else these numbers tell us.
JRuby is seriously fast. So fast that I initially suspected it was using both CPU cores without any specific requests for threading, which it wasn’t. I don’t claim to be a JRuby expert, but the ridiculous discrepancy in numbers, particularly in the case of Enumerable#include?, led me to think that the JVM is cheating somehow. :) Maybe something to do with the JVM supporting ints as a primitive type? Whatever it’s doing, I wholeheartedly approve.
Rubinius turns in impressive numbers as well, perhaps not as markedly so as JRuby, but then Rubinius is also a much younger Ruby implementation than JRuby is.
It’s not a good idea to use the N1 strategy outlined above, because the block which gets executed (in every case but include?) will recalculate 10_000 * n for each element in the haystack array that has to be evaluated. That said, the performance hit from this sub-optimal code was more pronounced in some Ruby implementations than in others.
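Concretely, the difference between the two needle strategies amounts to something like this (a sketch; the variable names are mine):

```ruby
haystack = (1..1_000_000).to_a
n = 7

# N1: the needle expression lives inside the block, so 10_000 * n is
# re-evaluated for every element the block sees.
found_n1 = haystack.any? { |v| v == 10_000 * n }

# N2: the needle is computed once, outside the block.
needle = 10_000 * n
found_n2 = haystack.any? { |v| v == needle }
```

Both answer the same question; N2 just avoids a multiplication per element scanned.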
| Speedup from N1 to N2 | MRI 1.9.2 | MRI 1.8.7 | JRuby 1.6.3 | Rubinius head |
|---|---|---|---|---|
| detect | 16.3% | 10.9% | 35.6% | 16.1% |
| each | 0.4% | 0.1% | -1.5% (?) | 4.7% |
| any? | 17.7% | 11.0% | 22.9% | 13.3% |
| include? | 0% | 0% | 0.3% | 0% |
In this benchmark, JRuby punishes poorly-optimized code the most severely. Or, I suppose you could say that it rewards well-optimized code most generously, if you’re the glass-half-full type. In either case, it’s interesting to note.
This was perhaps the most surprising result for me, as I had suspected that the internal implementation of Enumerable#any? was essentially a C version of the same short-circuit logic I implemented in the benchmark:
```ruby
module EachDetector
  def self.find(haystack, needle)
    haystack.each do |v|
      return true if v == needle
    end
    false
  end
end
```
Nothing fancy here. Just stop what we’re doing and return true if we get a match. As it turns out, a quick look at MRI 1.9.2’s enum.c seems to validate my assumption:
```c
DEFINE_ENUMFUNCS(any)
{
    if (RTEST(result)) {
        *memo = Qtrue;
        rb_iter_break();
    }
    return Qnil;
}

static VALUE
enum_any(VALUE obj)
{
    VALUE result = Qfalse;
    rb_block_call(obj, id_each, 0, 0, ENUMFUNC(any), (VALUE)&result);
    return result;
}
```
If I understand what I’m reading correctly (and I very well may not; the MRI code is a maze of defines for the uninitiated like me), it looks like it comes down to the overhead involved in having Enumerable#any? call Array#each anyway, plus ENUMFUNC(any) (any_iter_i to its friends) being more involved than simply returning early from a method. I’d welcome any correction from someone with a better grasp of Ruby internals than mine.
UPDATE: So, to further clarify what’s going on with each vs any?, Ruby is basically doing something very similar to this (although in C code):
```ruby
module Enumerable
  def any?
    self.each do |v|
      return true if yield(v)
    end
    false
  end
end
```
For reference, the implementation of EachDetector.find in the benchmark code looked like this:
```ruby
module EachDetector
  def self.find(haystack, needle)
    haystack.each do |v|
      return true if v == needle
    end
    false
  end
end
```
So, by “hard-coding” the short-circuit condition inside the block passed to each, as opposed to needing to yield to the block, we’ve avoided a small bit of overhead and, as a result, end up with the performance gain. The pure-Ruby version of any? shown above does indeed perform more slowly than the C implementation, as expected.
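You can get a feel for this yourself with a quick comparison (renaming the pure-Ruby version so it doesn’t clobber the built-in; the exact numbers will vary by machine and Ruby version):

```ruby
require 'benchmark'

module Enumerable
  # Pure-Ruby re-implementation of any?, under a different name so the
  # built-in C version stays available for comparison.
  def ruby_any?
    each do |v|
      return true if yield(v)
    end
    false
  end
end

haystack = (1..1_000_000).to_a
needle   = 990_000

Benchmark.bm(12) do |bm|
  bm.report('C any?')    { haystack.any?      { |v| v == needle } }
  bm.report('ruby_any?') { haystack.ruby_any? { |v| v == needle } }
  bm.report('include?')  { haystack.include?(needle) }
end
```

The yield on every element is exactly the overhead the hard-coded each block avoids.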
Thanks to Loren Segal for the assist!
Wait, I have to draw a conclusion? Well, OK, although my conclusion is one that’s been drawn many times before: benchmarking and profiling are indispensable tools for any Rubyist who wants to write performant code. If I learned one thing from this experiment, it’s that we shouldn’t take anything for granted, even stuff that seems like it should be common sense.
Thanks for reading!