Benchmarking Ruby’s Enumerable

Posted by Ernie on July 16, 2011 at 4:43 pm

Lately, I’ve been spending some time filtering data sets in Ruby. A common pattern when filtering data on multiple criteria involves short-circuiting processing at the first match or non-match, depending on whether conditions are being evaluated in an any/or or all/and context, respectively. As a result, I thought I’d run a few quick benchmarks on several implementations of this pattern. The results surprised me, so I thought I would share them here.
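
To make that pattern concrete, here's a quick illustration (the numbers are arbitrary, chosen just for this example): any? and detect bail out at the first element for which the block is truthy, while all? bails out at the first element for which it's falsy.

records = (1..1_000_000).to_a

# any? and detect stop iterating at the first truthy result...
records.any?   { |r| r == 500_000 }  # => true, stops once it reaches 500_000
records.detect { |r| r > 500_000 }   # => 500_001, stops there

# ...while all? stops at the first falsy result.
records.all?   { |r| r < 10 }        # => false, stops as soon as it reaches 10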

Disclaimer

This benchmark, like most benchmarks, is contrived. However, I didn’t set out to prove any specific Ruby implementation was better than any other. For that matter, I didn’t even initially set out to compare Ruby implementations, at all. I wanted to see which implementation of the logic I outlined above performed the best, in general. It’s just that RVM is so freaking awesome that it made compiling a comparison of Ruby implementations dead simple, so, hey, why not?

Methodology

The data set is the set of positive integers from 1 to 1,000,000. I searched this set for 100 integers (the multiples of 10,000, from 10,000 up to 1,000,000), using 4 different implementations of the pattern described above:

  1. Enumerable#detect
  2. Array#each (doing the each inside a method that returns true on the first match)
  3. Enumerable#any?
  4. Enumerable#include? (This last one is obviously going to be the winner in any case where a block is unnecessary, but I included it for curiosity’s sake)

Additionally, I passed each “needle” to be found in the haystack in two different ways:

  1. By computing the “needle” via iteration number * 10,000 in the block itself
  2. By pre-computing the needles outside the block

You can have a look at the benchmark script and its raw results, with the rehearsals removed. I ran this under MRI 1.9.2 and 1.8.7, JRuby 1.6.3, and Rubinius head.
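
If you'd rather not click through, here's a rough sketch of the setup. It isn't the actual script (the names and structure here are approximations of what's described above), but it should give you the general idea:

require 'benchmark'

HAYSTACK = (1..1_000_000).to_a
NEEDLES  = (1..100).map { |n| n * 10_000 }  # the pre-computed (N2) needles

module EachDetector
  def self.find(haystack, needle)
    haystack.each { |v| return true if v == needle }
    false
  end
end

Benchmark.bmbm do |x|  # bmbm runs a rehearsal pass before the measured pass
  # N1: the needle is computed inside the block, once per element examined
  x.report('detect (N1)')   { (1..100).each { |n| HAYSTACK.detect { |v| v == n * 10_000 } } }

  # N2: the needle is pre-computed and passed in
  x.report('detect (N2)')   { NEEDLES.each { |needle| HAYSTACK.detect { |v| v == needle } } }
  x.report('each (N2)')     { NEEDLES.each { |needle| EachDetector.find(HAYSTACK, needle) } }
  x.report('any? (N2)')     { NEEDLES.each { |needle| HAYSTACK.any? { |v| v == needle } } }
  x.report('include? (N2)') { NEEDLES.each { |needle| HAYSTACK.include?(needle) } }
end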

Results

We’ll only be looking at the “total” column (CPU time, in seconds), for the purposes of this discussion. N1 and N2 refer to the two numbered approaches to supplying the needle to search for, above.

                MRI 1.9.2             MRI 1.8.7            JRuby 1.6.3          Rubinius head
                   N1         N2         N1         N2         N1         N2         N1         N2
detect       7.650000   6.400000  71.040000  63.290000   4.736000   3.052000   6.754449   5.665881
each         5.390000   5.370000  14.680000  14.660000   3.080000   3.127000   3.442154   3.280476
any?         7.340000   6.040000  71.790000  63.860000   4.755000   3.667000   6.444793   5.590638
include?     2.620000   2.620000   3.410000   3.410000   0.390000   0.389000   0.729475   0.729746

Observations

Well, for starters, let’s just state the obvious: LOL MRI 1.8.7! Maybe I haven’t been following things as closely as I thought, but I was definitely surprised to see how horrendously this (admittedly contrived) benchmark performed on MRI 1.8.7. I expected poor performance, but not on the order of 10x slower in some cases!

Now, with that out of the way, let’s see what else these numbers tell us.

JRuby is FAST

Seriously fast. So fast that I initially suspected it was using both CPU cores without any specific requests for threading, which it wasn’t. I don’t claim to be a JRuby expert, but the ridiculous discrepancy in numbers, particularly in the case of Enumerable#include?, led me to think that the JVM is cheating somehow. :) Maybe it has something to do with the JVM supporting ints as a primitive type? Whatever it’s doing, I wholeheartedly approve.

Rubinius is fast, too!

Perhaps not as markedly so as JRuby, but Rubinius is also a much younger Ruby than JRuby is.

Some implementations are more forgiving than others

It’s not a good idea to use the N1 strategy outlined above, because the block that gets executed (in every case but include?) recalculates 10_000 * n for each element of the haystack array that has to be evaluated. That said, the performance hit from this suboptimal code was more pronounced in some Ruby implementations than in others.

Speedup from N1 to N2
           MRI 1.9.2   MRI 1.8.7   JRuby 1.6.3   Rubinius head
detect     16.3%       10.9%       35.6%         16.1%
each       0.4%        0.1%        -1.5% (?)     4.7%
any?       17.7%       11.0%       22.9%         13.3%
include?   0%          0%          0.3%          0%

In this benchmark, JRuby punishes poorly-optimized code the most severely. Or, I suppose you could say that it rewards well-optimized code most generously, if you’re the glass-half-full type. In either case, it’s interesting to note.

Array#each performs better than Enumerable#any?

This was perhaps the most surprising result, for me, as I suspected that the internal implementation of Enumerable#any? was essentially a C implementation of the same kind of short-circuit logic I implemented in the benchmark:

module EachDetector
  def self.find(haystack, needle)
    haystack.each do |v|
      return true if v == needle
    end
    false
  end
end

Nothing fancy here. Just stop what we’re doing and return true if we get a match. As it turns out, a quick look at MRI 1.9.2’s enum.c seems to validate my assumption:

DEFINE_ENUMFUNCS(any)
{
    if (RTEST(result)) {
        *memo = Qtrue;
        rb_iter_break();
    }
    return Qnil;
}
 
static VALUE
enum_any(VALUE obj)
{
    VALUE result = Qfalse;
 
    rb_block_call(obj, id_each, 0, 0, ENUMFUNC(any), (VALUE)&result);
    return result;
}

If I understand what I’m reading correctly (and I very well may not — the MRI code is a maze of defines for the uninitiated like me) it looks like it comes down to the overhead involved in having Enumerable#any? call Array#each anyway, and ENUMFUNC(any) (any_iter_i to his friends) being more involved than simply returning early from a method. I’d welcome any correction from someone who has a better grasp of Ruby internals than I do.

UPDATE: So, to further clarify what’s going on with each vs any?, Ruby is basically doing something very similar to this (although in C code):

module Enumerable
  def any?(&block)
    self.each do |v|
      return true if yield(v)
    end
 
    false
  end
end

For reference, the implementation of EachDetector.find in the benchmark code looked like this:

module EachDetector
  def self.find(haystack, needle)
    haystack.each do |v|
      return true if v == needle
    end
    false
  end
end

So, by “hard-coding” the short-circuit condition inside the block passed to each, instead of yielding to a block for every comparison, we avoid a small bit of overhead and, as a result, end up with the performance gain. The pure-Ruby version of any? shown above does indeed perform more slowly than the C implementation, as expected.
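
If you want to see that overhead in isolation, a micro-benchmark along these lines (the method names here are made up for illustration) compares the hard-coded comparison against an otherwise identical loop that has to yield out to a block:

require 'benchmark'

HAYSTACK = (1..1_000_000).to_a

# The comparison is hard-coded inside the block passed to each
def find_hardcoded(haystack, needle)
  haystack.each { |v| return true if v == needle }
  false
end

# The same loop, but each comparison has to yield out to the caller's block
def find_yielding(haystack)
  haystack.each { |v| return true if yield(v) }
  false
end

Benchmark.bmbm do |x|
  x.report('hard-coded') { 100.times { find_hardcoded(HAYSTACK, 999_999) } }
  x.report('yielding')   { 100.times { find_yielding(HAYSTACK) { |v| v == 999_999 } } }
end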

Thanks to Loren Segal for the assist!

Conclusion

Wait, I have to draw a conclusion? Well, OK, though my conclusion is one that’s been drawn many times before: benchmarking and profiling are indispensable tools for any Rubyist who wants to write performant code. If I learned one thing from this experiment, it’s that we shouldn’t take anything for granted, even stuff that seems like it should be common sense.

Thanks for reading!
