Ruby enumerators

A quick refresher on Enumerable and Enumerators in Ruby.

Enumerable

class Pippo
  include Enumerable

  def each
    yield "a"
    yield "b"
    yield "c"
    yield "d"
  end
end

z = Pippo.new

puts z.take(3)

The class must provide a method each, which yields successive members of the collection. If Enumerable#max, #min, or #sort is used, the objects in the collection must also implement a meaningful <=> operator.
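As a minimal sketch of the <=> requirement, the Version and Releases classes below are hypothetical names made up for illustration. The collection only defines each, and the elements only define <=>, which is enough for max, min and sort:

```ruby
# Hypothetical value object; <=> is all that min, max and sort need from it.
class Version
  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # Compare by major number first, then by minor number.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end

  def to_s
    "#{major}.#{minor}"
  end
end

# Hypothetical collection; each is the only method Enumerable requires.
class Releases
  include Enumerable

  def each
    yield Version.new(2, 1)
    yield Version.new(1, 9)
    yield Version.new(2, 0)
  end
end

releases = Releases.new
newest  = releases.max.to_s
oldest  = releases.min.to_s
ordered = releases.sort.map(&:to_s)

puts newest, oldest, ordered.inspect
```

Without the <=> method, max, min and sort would have no way to order Version objects and would fail.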

This is the classic iterator pattern. An iterator can be external or internal: it is external when the client controls the iteration (basically a visible for loop over the collection), and internal when the collection object controls it through its own iterating methods, such as each.
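The difference can be sketched on a plain Array. The variable names are only illustrative, and the external half relies on Array#each returning an Enumerator when it is called without a block:

```ruby
letters = %w[a b c]

# Internal iteration: the collection drives the loop and calls our block.
internal = []
letters.each { |letter| internal << letter.upcase }

# External iteration: the client pulls one element at a time with next.
cursor = letters.each   # each without a block returns an Enumerator
external = []
external << cursor.next.upcase
external << cursor.next.upcase
external << cursor.next.upcase

puts internal.inspect, external.inspect
```

Both arrays end up with the same content; what changes is who decides when the next element is fetched.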

Enumerators

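Enumerator is a class that wraps an Enumerable object and adds external iteration, i.e. the next method. Passing the object to Enumerator.new, as in Enumerator.new([1, 2, 3]), was deprecated and no longer works in Ruby 3, where Enumerator.new requires a block. Instead, make an Enumerator from an Enumerable with the to_enum method:

e = [1, 2, 3].to_enum

It works on any Enumerable, including the Pippo class above:

e = Pippo.new.to_enum
puts e.next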

If you call next after the last element, a StopIteration exception is raised. Note that the loop method rescues it.
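A small sketch of both behaviours, using a two-element array so the end is reached quickly (the variable names are only illustrative):

```ruby
e = [1, 2].each

first  = e.next
second = e.next

# A third call has nothing left to return and raises StopIteration.
exhausted = begin
  e.next
  false
rescue StopIteration
  true
end

# Kernel#loop rescues StopIteration, so this loop ends on its own.
e.rewind            # move the enumerator back to the first element
seen = []
loop { seen << e.next }

puts first, second, exhausted, seen.inspect
```

The rewind call is needed because the enumerator keeps its position between calls to next.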

A less common way to create an Enumerator is with Object#enum_for(method = :each), an alias of to_enum, which creates an Enumerator that iterates by calling the method passed as parameter. Many iterator methods, when called without a block, actually return an Enumerator, like

[1, 2, 3].each

The same goes for Integer#times, Integer#upto, String#each_char and more.
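A short sketch of both routes, with arbitrary sample values: enum_for with an explicit method name (and an extra argument), and iterator methods called without a block.

```ruby
# enum_for with an explicit method: iterate a String by each_char.
chars = "abc".enum_for(:each_char)
first_char = chars.next

# Extra arguments are passed on to the method, here each_slice(2).
pairs = [1, 2, 3, 4].enum_for(:each_slice, 2).to_a

# Calling an iterator method without a block returns an Enumerator.
counter = 1.upto(3)
indexed = %w[x y].each_with_index.to_a

puts first_char, pairs.inspect, counter.class, indexed.inspect
```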

You can build an Enumerator with this syntax too:

x = Enumerator.new do |y|
  loop do
    y << 1
  end
end

The y block parameter is the yielder. You hand each value to it with its yield method or with <<, which does the same but returns the yielder so calls can be chained.

e = Enumerator.new do |y|
  loop do
    y.yield 1
  end
end

puts e.take(10)

An enumerator that doubles the value at each step:

e = Enumerator.new do |y|
  x = 1
  loop do
    y << x
    x *= 2
  end
end

puts e.take(10)

Lazy enumerators

Some operations never finish on an infinite enumerator because they need every element of the collection before they can return. Lazy enumerators produce values only when they are needed. They implement map, select, take, etc. lazily: each call returns another lazy enumerator instead of performing the operation, and the work happens only when something asks for the results:

enum = Enumerator.new do |y|
  i = 1
  loop do
    y << i
    i += 1
  end
end

# odd = enum.select { |x| x % 2 == 1 }  # left commented out: it would hang, because enum is infinite

lazy = enum.lazy.select { |x| x % 2 == 1 }            # returns immediately, nothing is iterated yet
lazy = enum.lazy.select { |x| x % 2 == 1 }.take(100)  # same, still nothing is iterated

odd = lazy.to_a # now the enumerator computes the 100 elements and expands them into an array

The methods that expand the enumerator to an array are to_a and its alias force.
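To see when the work actually happens, here is a sketch that records every input the chain processes. It uses an infinite Range instead of a hand-written Enumerator purely for brevity, and the evaluated array is a hypothetical tracing aid, not part of any API:

```ruby
evaluated = []

# Lazy chain over an infinite range; nothing runs until a value is requested.
squares = (1..Float::INFINITY).lazy.map do |n|
  evaluated << n   # record which inputs were actually processed
  n * n
end
even_squares = squares.select(&:even?)

untouched = evaluated.empty?          # building the chain did no work

first_three = even_squares.first(3)   # pulls just enough elements
after_first = evaluated.dup           # inputs consumed so far

forced = even_squares.take(2).force   # force is an alias for to_a

puts untouched, first_three.inspect, after_first.inspect, forced.inspect
```

Besides to_a and force, first(n) also triggers evaluation, but only for as many elements as it needs.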

Suppose you want the first 100 powers of 2 that end in 4:

enum = Enumerator.new do |y|
  i = 1
  loop do
    y << i
    i *= 2
  end
end

# puts enum.select { |x| x % 10 == 4 }.take(100)         # left commented out: this would hang!
puts enum.lazy.select { |x| x % 10 == 4 }.take(100).to_a  # this works