module ActiveRecord
  class Base
    # Step through and instantiate each member of the class and execute on it,
    #   but instantiate no more than per_page instances at any given time.
    # Safe for destructive actions or actions that modify the fields
    # your :order or :conditions clauses operate on.
    def each_by_page per_page, options = {}, &block
      # By-id for model-modifying blocks
      # Build SQL to get ids of all matching records using the options provided by the user
      sql = construct_finder_sql(options.dup.merge({ :select => "`#{table_name}`.#{primary_key} AS id" }))
      # Get the results as an array of tiny hashes { "id" => "1" } and flatten them out to just the ids
      all_ids = connection.select_all(sql).map { |h| h['id'] }
      at_a_time = 0..(per_page-1)
      # chop apart the all_ids array a segment at a time
        ids = all_ids.slice!(at_a_time)
        ids_cases = []
        ids.each_with_index { |id, i| ids_cases << "WHEN #{id} THEN #{i}" }
        ids_cases = ids_cases.join(' ')
        # Do the deed on this page of results
        find(:all, options.merge(
          :conditions => [ "#{primary_key} IN (?)", ids ],
          :order => "CASE id #{ids_cases} END"
        )).each &block
      end until all_ids.empty?

This piece of code is perhaps named poorly. It arose from a need to operate on a large number of models during a migration.

Many migrations that operate on a large number of rows can accomplish their operation en-masse using an update_all.

User.update_all('filename = \'default.png\'', 'default_avatar = true')

However, sometimes the operation being performed is more complex. Perhaps its a destructive migration. Perhaps it operates on a complex graph of objects, or over a polymorphic association, or perhaps you’re relying on the propagation of ActiveRecord’s callbacks.

In any of these cases, you can easily exhaust available RAM with code such as this:

User.find(:all, :conditions => { :default_avatar => true } ) do |user| user.select_default_avatar end

ActiveRecord will first do a SQL SELECT with the provided parameters, instantiate each row of the result as an object in memory, and once all rows are instantiated, will iterate over the set, applying the block to each element.

Except if the number of rows returned is large enough, it never gets to the iterating part.

Instead of instantiating all the models at once, we instead instantiate them in bite-sized pieces.

User.each_by_page(:conditions => { :default_avatar => true } ) do |user| user.select_default_avatar end

One Comment

  1. duncanbeevers

    Updated this code to preserve order in the internal operation. I used a vicious SQL CASE statement to keep the internal operation unbound to the potentially complex ORDER clause of the original condition and explicitly specified the table name from which the id should be selected.

    Anyone have a better way of doing this?

Leave a Comment

Enclose code in <code lang="ruby"></code> if you care.
Preview your comment using the button below.