Rubyish hashes

Published: 2012-05-06
Tagged: code

Something that’s probably not new, but was interesting for me to discover.

There is no to_hash method in ruby, which is probably because you hardly ever need to convert something to a hash. Here’s a sample method, if we wanted to define a to_hash method on an Array:

class Array
  def to_hash
    return_hash = {}
    each{ |k,v| return_hash[k] = v
    return_hash
  end
end

This assumes that each entry in your array is itself an array with two elements, in a [key, vale] pattern.

In my view, there’s a prettier, “rubier” way to write our to_hash method:

class Array
  def to_hash
    reduce({}){ |hsh, (k,v)| hsh.merge(k=>v) }
  end
end

Now we have it on one line. Pretty, compact, better, right? Let’s benchmark:

1	#!/usr/bin/env ruby
2
3	require 'benchmark'
4
5	# Generate some sample data
6	sample_array = []
7	10000.times do
8	sample_array << [rand(1000), rand(1000)]
9	end
10
11	Benchmark.bmbm do \|x\|
12	x.report('Array#each:') do
13	hash = {}
14	sample_array.each do \|k,v\|
15	hash[k] = v
16	end
17	end
18
19	x.report('Array#reduce:') do
20	hash = sample_array.reduce({}){ \|hsh,(k,v)\| hsh.merge(k=>v)}
21	end
22	end
23
24	Rehearsal -------------------------------------------------
25	Array#each: 0.000000 0.000000 0.000000 ( 0.003334)
26	Array#reduce: 4.550000 0.210000 4.760000 ( 5.099619)
27	---------------------------------------- total: 4.760000sec
28
29	user system total real
30	Array#each: 0.000000 0.000000 0.000000 ( 0.003777)
31	Array#reduce: 4.430000 0.150000 4.580000 ( 4.644474)

Ouch.

So what causes the problem? Is it the use of reduce, or of merge? Here’s a method that could test this:

class Array
  def to_hash
    reduce({}){ |hsh, (k,v)| hsh[k] = v; hsh}
  end
end

If this runs as fast as our each-using to_hash method, we know the problem lies in merge. Benchmark results:

1	Rehearsal -------------------------------------------------
2	Array#each: 0.000000 0.000000 0.000000 ( 0.003309)
3	Array#reduce: 0.010000 0.000000 0.010000 ( 0.005039)
4	-------------------------------------
5	---
6	total: 0.010000sec
7
8	user system total real
9	Array#each: 0.000000 0.000000 0.000000 ( 0.003523)
10	Array#reduce: 0.010000 0.000000 0.010000 ( 0.004758)

It’s very, very definitely the merge that’s causing the slowdown. So how much slower does our reduce method go than the each method? Time to pull out graphing on varying size arrays:

OK, so it’s nothing to write home about. Especially when we compare it with the reduce + merge method shown above:

Slightly worse performance.

1	class Array
2	def to_hash
3	return_hash = {}
4	each{ \|k,v\| return_hash[k] = v
5	return_hash
6	end
7	end

1	class Array
2	def to_hash
3	reduce({}){ \|hsh, (k,v)\| hsh.merge(k=>v) }
4	end
5	end

1	class Array
2	def to_hash
3	reduce({}){ \|hsh, (k,v)\| hsh[k] = v; hsh}
4	end
5	end