Rubyish hashes

Published
2012-05-06
Tagged

Something that’s probably not new, but was interesting for me to discover.

There is no to_hash method in ruby, which is probably because you hardly ever need to convert something to a hash. Here’s a sample method, if we wanted to define a to_hash method on an Array:

1
class Array
2
  def to_hash
3
    return_hash = {}
4
    each{ |k,v| return_hash[k] = v
5
    return_hash
6
  end
7
end

This assumes that each entry in your array is itself an array with two elements, in a [key, vale] pattern.

In my view, there’s a prettier, “rubier” way to write our to_hash method:

1
class Array
2
  def to_hash
3
    reduce({}){ |hsh, (k,v)| hsh.merge(k=>v) }
4
  end
5
end

Now we have it on one line. Pretty, compact, better, right? Let’s benchmark:

1
#!/usr/bin/env ruby
2
3
require 'benchmark'
4
5
# Generate some sample data
6
sample_array = []
7
10000.times do
8
  sample_array << [rand(1000), rand(1000)]
9
end
10
11
Benchmark.bmbm do |x|
12
  x.report('Array#each:') do
13
    hash = {}
14
    sample_array.each do |k,v|
15
      hash[k] = v
16
    end
17
  end
18
19
  x.report('Array#reduce:') do
20
    hash = sample_array.reduce({}){ |hsh,(k,v)| hsh.merge(k=>v)}
21
  end
22
end
23
24
Rehearsal -------------------------------------------------
25
Array#each:     0.000000   0.000000   0.000000 (  0.003334)
26
Array#reduce:   4.550000   0.210000   4.760000 (  5.099619)
27
---------------------------------------- total: 4.760000sec
28
29
                    user     system      total        real
30
Array#each:     0.000000   0.000000   0.000000 (  0.003777)
31
Array#reduce:   4.430000   0.150000   4.580000 (  4.644474)

Ouch.

So what causes the problem? Is it the use of reduce, or of merge? Here’s a method that could test this:

1
class Array
2
  def to_hash
3
    reduce({}){ |hsh, (k,v)| hsh[k] = v; hsh}
4
  end
5
end

If this runs as fast as our each-using to_hash method, we know the problem lies in merge. Benchmark results:

1
Rehearsal -------------------------------------------------
2
Array#each:     0.000000   0.000000   0.000000 (  0.003309)
3
Array#reduce:   0.010000   0.000000   0.010000 (  0.005039)
4
-------------------------------------
5
---
6
total: 0.010000sec
7
8
                    user     system      total        real
9
Array#each:     0.000000   0.000000   0.000000 (  0.003523)
10
Array#reduce:   0.010000   0.000000   0.010000 (  0.004758)

It’s very, very definitely the merge that’s causing the slowdown. So how much slower does our reduce method go than the each method? Time to pull out graphing on varying size arrays:

OK, so it’s nothing to write home about. Especially when we compare it with the reduce + merge method shown above:

Slightly worse performance.