Confounding URL typists since 2007.
Since the release of ActiveRecord Query Interface 3.0, I think the line has been pretty blurry for the Rails community about where ActiveRecord ends and Arel begins. I’ve seen folks attribute ActiveRecord::PredicateBuilder to Arel, and I’ve seen folks assume that an ActiveRecord::Relation is an Arel object. Neither is the case. Since MetaSearch and MetaWhere both use combinations of ActiveRecord::Relation and Arel under the hood, and I’ve spent some time putting together patches for both Rails core and Arel, the blurry line between them has become more distinct for me. In this post, I hope to clear up some misunderstandings before Rails 3 is released, and the AR 3.0 query interface becomes the norm.
An ActiveRecord::Relation is the basic building block of the AR 3.0 query interface. In any case where you’re chaining things like joins, where, or order on your model, a subclass of ActiveRecord::Base, you’re actually chaining them on an ActiveRecord::Relation, because the very first “relation-y” method you call will be delegated to ActiveRecord::Base#scoped, which returns an ActiveRecord::Relation. In fact, you can get a “blank” ActiveRecord::Relation by calling that method yourself:
ruby-head > Article.scoped.class
=> ActiveRecord::Relation
Now, here’s where the differences start showing up. If you continue to chain methods on this relation, at no point will you ever actually have an Arel object. As you’re probably aware, rows returned by an ActiveRecord::Relations are lazily loaded. When it comes time for an ActiveRecord::Relation to return rows to you, the Arel relation gets built. If we take a look at activerecord/lib/active_record/relation.rb, we see where this happens:
def to_a
return if loaded?
= eager_loading? ? find_with_associations : .find_by_sql(arel.to_sql)
preload =
preload += unless eager_loading?
preload.each {|associations| .send(:preload_associations, , associations) }
# is true only if set explicity. is true if there are JOINS and no explicit SELECT.
readonly = .nil? ? :
.each { |record| record.readonly! } if readonly
= true
end
Without complicating this discussion with how eager loading gets handled, you’ll notice that the records are actually still being returned by trusty old find_by_sql, but arel is doing the SQL generation for us (arel.to_sql).
ActiveRecord::Relation#arel is defined over in activerecord/lib/active_record/relation/query_methods.rb, and it’s short and sweet:
def arel
||= build_arel
end
build_arel is way too large to paste here, but the general idea is that all that your ActiveRecord::Relation was doing while you were happily chaining joins and wheres and includes on it was to keep track of those values in corresponding , arrays. See?
ruby-head > Article.joins(:comments).joins(:moderations).joins_values
=> [:comments, :moderations]
build_arel is (unsurprisingly) the method that does the job of building up an Arel::Relation from all of those accumulated values, and Arel is what then generates the SQL on ActiveRecord’s behalf. We can even call the ActiveRecord::Relation#arel on our relation to get that object back and have a look at it.
ruby-head > Article.joins(:comments).joins(:moderations).arel.class
=> Arel::From
ruby-head > Article.joins(:comments).joins(:moderations).arel.is_a?(Arel::Relation)
=> true
So you don’t actually deal with Arel::Relations. They’re used behind the scenes, as a very real replacement for ActiveRecord’s own custom SQL generation code before. In fact, construct_finder_sql became construct_finder_arel in AR 3.0.
So, where does ActiveRecord::PredicateBuilder fit in? The stock PredicateBuilder is a really short and sweet piece of code (only 43 lines!) that sits in activerecord/lib/active_record/relation/predicate_builder.rb and is responsible for building where and having clauses, as evidenced by it only being used in two places in Rails: ActiveRecord::Base#sanitize_sql_hash_for_conditions (called when a hash is sent to sanitize_sql) and ActiveRecord::Relation#build_where (called by build_arel for where and having clauses).
Anyway, ActiveRecord::PredicateBuilder takes advantage of another feature of Arel: predications. Let’s step through PredicateBuilder a bit, and in the process, learn about Arel predication methods. The important method (and only method, besides initialize) of PredicateBuilder is build_from_hash. Here it is, in its entirety:
def build_from_hash(attributes, default_table)
predicates = attributes.map do |column, value|
table = default_table
if value.is_a?(Hash)
table = Arel::Table.new(column, :engine => )
build_from_hash(value, table)
else
column = column.to_s
if column.include?('.')
table_name, column = column.split('.', 2)
table = Arel::Table.new(table_name, :engine => )
end
unless attribute = table[column]
raise StatementInvalid,
"No attribute named `#{column}` exists for table `#{table.name}`"
end
case value
when Array, ActiveRecord::Associations::AssociationCollection,
ActiveRecord::Relation
values = value.to_a
attribute.in(values)
when Range
attribute.in(value)
else
attribute.eq(value)
end
end
end
predicates.flatten
end
So, as you can see, this method handles the key/value pairs in the hash recursively. It takes a default Arel::Table to start. If it encounters a key whose corresponding value is a hash, then it assumes the key to be a table name, instantiates a new Arel::Table and calls itself again. Otherwise, it expects keys to be the names of columns in that table, apart from one convenience provided (presumably for quicker reference of a one-off condition on another table) where if the column name contains a period, it’s assumed to be a table/column pair.
Arel::Tables have an overloaded [] operator, which expects the value inside the [] to be a column name, and returns an Arel::Attribute corresponding to that column. The attribute knows about the table it came from, so when an Arel::Attribute gets converted to SQL, it knows how to supply its table name as well. Arel::Attributes have a number of methods defined on them, including the predication methods I mentioned earlier. These methods return a kind of Arel::Predicate (things like Equality, In, Match, etc) condition with two operands. Predicates take two operands. In the case of Predicates created by predication methods, the first operand will be the attribute itself. The second is supplied to the predication method by PredicateBuilder.
We can see this further down in the method, where it calls attribute.in or attribute.eq depending on the type of value received. You can see this reflected in an ActiveRecord::Relation‘s array:
ruby-head > Article.where(
:title => 'foo',
:created_at => 3.days.ago..Time.now
).where_values
=> [#
=<Attribute title>, ="foo">,
#
=<Attribute created_at>,
=Sun, 08 May 2010 19:28:03 EDT -04:00..
2010-05-11 19:28:03 -0400>]
Incidentally, it should come as no surprise that a lot of the “magic” provided by MetaWhere comes from a rewritten PredicateBuilder, MetaWhere::Builder.
ActiveRecord::Relation does the Rails developer a great service by hiding much of the complexity of Arel, and keeping the various parts of your query in their most malleable form (what’s simpler to work with than several arrays?) until the last possible moment. However, there may be cases where you want or need to work with Arel directly. Here are a few handy tricks, and some gotchas to keep in mind.
Get the Arel object representing your current query (restated from above):
ruby-head > arel = Article.where(
:title => 'foo', :created_at => 3.days.ago..Time.now
).arel
Get the where clauses from the arel object:
ruby-head > arel.send(:where_clauses)
=> ["(\"articles\".\"title\" = 'foo')",
\"(\"articles\".\"created_at\" BETWEEN '2010-05-08 20:42:53.042691'
AND '2010-05-11 20:42:53.043081')"]
This also works for from_clauses, select_clauses, group_clauses, having_clauses, and order_clauses. For join clauses, see the next tip.
Get the join sql from the arel object as a string:
ruby-head > arel = Article.joins(:comments).arel
ruby-head > arel.joins(arel)
=> "INNER JOIN \"comments\"
ON \"comments\".\"article_id\" = \"articles\".\"id\""
Get the Arel::Table for an ActiveRecord model:
ruby-head > Article.arel_table
=> #
# or, with an alias:
ruby-head > Article.arel_table.as('awesome_articles')
=> #
:as=>"awesome_articles" ... >
Create your own conditions using predicates other than Equality and In
ruby-head > Article.where(Article.arel_table[:title].matches('%foo%')).to_sql
=> "SELECT \"articles\".* FROM \"articles\"
WHERE (\"articles\".\"title\" LIKE '%foo%')"
See this post for more details on using Arel in your conditions.
See the generated SQL for a given query:
I use this one all the time on this site, but still I think it’s worth mentioning.
ruby-head > Article.where(:title => 'foo').to_sql
=> "SELECT \"articles\".* FROM \"articles\"
WHERE (\"articles\".\"title\" = 'foo')"
One gotcha of note with to_sql:
Near the beginning of this post, you may remember I mentioned that I didn’t want to complicate the discussion of the to_a method with eager loading. Well, now it’s time to complicate things.
If eager loading is a factor in your query, either due to explicit eager_loading values, or includes values that are also used in your where conditions, then you’ll notice that the value returned by to_sql doesn’t reflect what actually gets run on the server:
ruby-head > Article.includes(:comments).where(:comments => {:body => 'hey'}).to_sql
=> "SELECT \"articles\".* FROM \"articles\"
WHERE (\"comments\".\"body\" = 'hey')"
ruby-head > Article.includes(:comments).where(:comments => {:body => 'hey'}).all
# From the development.log
Article Load (0.4ms) SELECT "articles"."id" AS t0_r0, "articles"."title" AS t0_r1,
"articles"."body" AS t0_r2, "articles"."created_at" AS t0_r3, "articles"."updated_at"
AS t0_r4, "articles"."lookup_id" AS t0_r5, "comments"."id" AS t1_r0,
"comments"."article_id" AS t1_r1, "comments"."body" AS t1_r2,
"comments"."created_at" AS t1_r3, "comments"."updated_at" AS t1_r4
FROM "articles" LEFT OUTER JOIN "comments" ON
"comments"."article_id" = "articles"."id" WHERE ("comments"."body" = 'hey')
Of course, given the nature of how eager loading and including works, you’d expect the query to differ when they’re involved. So to_sql is a useful tool to display what’s being generated in your queries — just be aware of its limitations and remember that it’s no substitute for checking the logs when confusion arises.
Chaining multiple orders on an Arel::Relation will not result in them lining up left to right as they do in ActiveRecord::Relation.
ruby-head > Article.order('id').order('title').to_sql
=> "SELECT \"articles\".* FROM \"articles\"
ORDER BY id, title"
ruby-head > articles = Article.arel_table
ruby-head > articles.order('id').order('title').to_sql
=> "SELECT \"articles\".\"id\", \"articles\".\"title\",
\"articles\".\"body\", \"articles\".\"created_at\", \"articles\".\"updated_at\",
\"articles\".\"lookup_id\"
FROM \"articles\" ORDER BY title, id"
With Arel, the last applied order is given precedence. As Bryan Helmkamp explained it to me:
Think of it like a spreadsheet. You click on a column, and it sorts by that column. You click on another column, and that sort takes precedence.
Some methods don’t build on the previous values at all, but replace them entirely.
Adding additional wheres to an Arel::Relation works as you might expect from dealing with ActiveRecord::Relation, but adding new groups, havings, etc will only use the last supplied value.
ruby-head > articles.where('id > 1').where('id < 3').to_sql
=> "SELECT \"articles\".\"id\", \"articles\".\"title\", \"articles\".\"body\",
\"articles\".\"created_at\", \"articles\".\"updated_at\",
\"articles\".\"lookup_id\" FROM \"articles\"
WHERE id > 1 AND id < 3"
uby-head > articles.having('id > 1').having('id < 3').to_sql
=> "SELECT \"articles\".\"id\", \"articles\".\"title\", \"articles\".\"body\",
\"articles\".\"created_at\", \"articles\".\"updated_at\",
\"articles\".\"lookup_id\" FROM \"articles\"
HAVING id < 3"
That’s about it, for now. Hope this helps sharpen up that blurry line between ActiveRecord 3.0 and Arel for a few of you out there!