Implementing DSL in Ruby, part 1: CSV importer

Jacek Galanciak

May 20, 2018

ruby,dsl

Ruby is famous, among many other things, for its beautiful domain-specific languages (DSLs). The popularity is well deserved: everyone loves DSLs for their efficient, often declarative description of intent, free of the syntactic noise typical of imperative implementations. Surprisingly, though, few Ruby programmers can implement a DSL on their own. Let's change that, one blog post at a time.

In this project, we'll implement a simple CSV importer with type coercion. We'll use only simple techniques such as blocks and proc objects, with no actual metaprogramming (no define_method, no instance_eval, and so on).

The syntax

Here's what we'd like to achieve:

CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
end

I like to take an outside-in approach when implementing DSLs. Worrying about the details too early only distracts us from creating the proper structure for the project. The best way to start is to create empty, no-op classes and methods so that Ruby can run our program without crashing.

The no-op skeleton

class CSVImport
  def self.from_file(filepath)
  end
end

CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
  puts 'please, call me!'
end

Running this snippet makes the program exit without doing anything. Note, however, that the puts 'please, call me!' line never runs: from_file never calls the block that contains it. This points to an important rule for building no-op skeletons: make sure that every block and every method actually gets called.
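Why does this rule matter? Ruby lets you pass a block to any method, even one that never calls it, so a missing yield produces no error at all. A small sketch with hypothetical method names:

```ruby
# A method that accepts a block but never yields to it.
def configure
end

# A method that actually calls the block it receives.
def configure_and_yield
  yield
end

called = false
configure { called = true }
puts called # prints "false": Ruby silently ignored the block

configure_and_yield { called = true }
puts called # prints "true"
```

This is why the skeleton needs a visible marker such as puts inside every block: without one, a forgotten yield goes unnoticed.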

Let's look at how the DSL passes the configuration. The CSVImport class has a from_file class method that takes a block with one argument (config). That object responds to several methods (string, integer, decimal) and is used to describe the schema of the imported CSV file. First, let's define the schema class:

class CSVImportSchema
  def string(name, column:)
    puts 'string'
  end

  def integer(name, column:)
    puts 'integer'
  end

  def decimal(name, column:)
    puts 'decimal'
  end
end

Then, make sure from_file calls the configuration block with a schema object:

class CSVImport
  def self.from_file(filepath)
    schema = CSVImportSchema.new
    yield schema
  end
end

The from_file method receives an implicit block. That block expects to be called with a schema object as its argument, which we do by calling yield with the object we want to pass to the block.

Here's what the method would look like in a more explicit form (which is rarely used):

def self.from_file(filepath, &block)
  schema = CSVImportSchema.new
  block.call(schema)
end
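The two forms are interchangeable for our purposes. The sketch below uses a hypothetical Schema struct as a stand-in for CSVImportSchema and checks that yield and block.call hand the same object to the block. It also shows block_given?, which lets a method fail with a clear message when the caller forgets the block:

```ruby
# Hypothetical stand-in for CSVImportSchema.
Schema = Struct.new(:fields)

# Implicit block, invoked with yield.
def with_yield
  schema = Schema.new([])
  yield schema
  schema
end

# Explicit block, captured as a Proc and invoked with #call.
def with_block(&block)
  schema = Schema.new([])
  block.call(schema)
  schema
end

# Raise a descriptive error instead of the LocalJumpError a bare yield gives.
def with_guard
  raise ArgumentError, 'a configuration block is required' unless block_given?

  yield Schema.new([])
end

a = with_yield { |s| s.fields << :first_name }
b = with_block { |s| s.fields << :first_name }
p a.fields # => [:first_name]
p b.fields # => [:first_name]
```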

Running the program will result in the following output:

string
string
integer
decimal
please, call me!

Now we have a true no-op skeleton. We designed our DSL, and we made sure that every piece of it is appropriately called. Our next step is implementing the process of importing the CSV file.

Storing the schema

We decided at the beginning that the importer will coerce the type of each column, so we need a way to store each column's type along with the code that coerces it. A strictly object-oriented approach would most likely result in a set of classes with a polymorphic interface. But since the only thing a type needs to store is its coercion code, we can simplify radically and use functions as types. Or lambdas, to be more Ruby-specific.

str = ->(x) { x.to_s }
int = ->(x) { x.to_i }

str.call(123)   # => "123"
int.call('123') # => 123

str # => #<Proc:0x00007fd5ac989c00@(irb):1 (lambda)>
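Since a type is just a lambda, types can also be kept in an ordinary Hash and looked up by name. A minimal sketch; TYPES and coerce_row are hypothetical names, not part of the importer:

```ruby
# Each "type" is nothing more than a coercion lambda.
TYPES = {
  string:  ->(x) { x.to_s },
  integer: ->(x) { x.to_i },
  decimal: ->(x) { x.to_f }
}.freeze

# Coerce an array of raw strings according to a list of type names.
# Hash#fetch raises KeyError for an unknown type name.
def coerce_row(raw_values, type_names)
  raw_values.zip(type_names).map { |value, type| TYPES.fetch(type).call(value) }
end

p coerce_row(%w[Oleta 38 118600], %i[string integer decimal])
# => ["Oleta", 38, 118600.0]
```

No class hierarchy is needed: adding a new type means adding one more lambda.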

CSVImportSchema could therefore be implemented like this:

class CSVImportSchema
  attr_reader :columns
  Column = Struct.new(:name, :col_number, :type)

  def initialize
    @columns = []
  end

  def string(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_s })
  end

  def integer(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_i })
  end

  def decimal(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_f })
  end
end
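One consequence of using to_i and to_f as coercions is that they never fail: a malformed cell such as 'n/a', or a missing one (nil), silently becomes 0 or 0.0. If that matters for your data, Kernel#Integer and Kernel#Float are stricter alternatives. A minimal sketch; the constant names are hypothetical, and plugging these lambdas into the schema's integer and decimal methods would be our own variation, not part of the original design:

```ruby
# Lenient coercion, as used in CSVImportSchema.
LENIENT_INTEGER = ->(x) { x.to_i }

# Hypothetical strict alternatives. Passing base 10 keeps Integer from
# reading a leading zero as octal (Integer('010') would return 8).
STRICT_INTEGER = ->(x) { Integer(x, 10) }
STRICT_DECIMAL = ->(x) { Float(x) }

p LENIENT_INTEGER.call('n/a')   # => 0
p LENIENT_INTEGER.call(nil)     # => 0
p STRICT_INTEGER.call('010')    # => 10
p STRICT_DECIMAL.call('118600') # => 118600.0

begin
  STRICT_INTEGER.call('n/a')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```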

Data processing

Now that the schema stores each column with its type coercion, we can read the CSV file (with a semicolon as the column separator) and turn each row into a hash shaped by the schema:

require 'csv'

class CSVImport
  attr_reader :schema

  def initialize
    @schema = CSVImportSchema.new
  end

  def self.from_file(filepath)
    import = new
    yield import.schema
    rows = CSV.read(filepath, col_sep: ';')
    import.process(rows)
  end

  def process(rows)
    rows.map { |row| process_row(row) }
  end

  private

  # Build a hash for one row; column numbers in the schema are 1-based.
  def process_row(row)
    obj = {}
    @schema.columns.each do |col|
      obj[col.name] = col.type.call(row[col.col_number - 1])
    end
    obj
  end
end
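To see the process_row logic in isolation, without a file on disk, here is a minimal sketch that parses an in-memory string with CSV.parse instead of CSV.read. The sample row is made up to fit the schema, and the columns array stands in for schema.columns. Note the col_number - 1: column numbers in the DSL are 1-based, while Ruby arrays are 0-based.

```ruby
require 'csv'

# Same shape as CSVImportSchema::Column.
Column = Struct.new(:name, :col_number, :type)

# Stand-in for schema.columns; column numbers are 1-based, as in the DSL.
columns = [
  Column.new(:first_name, 1, ->(x) { x.to_s }),
  Column.new(:age, 4, ->(x) { x.to_i }),
  Column.new(:salary, 5, ->(x) { x.to_f })
]

# Hypothetical sample data: one semicolon-separated row.
data = "Oleta;Raynor;unused;38;118600\n"
rows = CSV.parse(data, col_sep: ';')

# Equivalent to process_row, written with each_with_object.
records = rows.map do |row|
  columns.each_with_object({}) do |col, obj|
    obj[col.name] = col.type.call(row[col.col_number - 1]) # shift to 0-based index
  end
end

records.each { |record| p record }
```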

Now when we run the importer against an actual file:

people = CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
end

people.each { |person| p person }

We'll get the following output:

...
{:first_name=>"Oleta", :last_name=>"Raynor", :age=>38, :salary=>118600.0}
{:first_name=>"Tyra", :last_name=>"Johns", :age=>34, :salary=>65700.0}
{:first_name=>"Reinhold", :last_name=>"Koch", :age=>27, :salary=>51900.0}
{:first_name=>"Mary", :last_name=>"Stanton", :age=>58, :salary=>46800.0}
{:first_name=>"Lyric", :last_name=>"Kub", :age=>52, :salary=>105300.0}
{:first_name=>"Violette", :last_name=>"Lakin", :age=>69, :salary=>81300.0}
...

Our work is almost done.

Refactoring

The code works as intended, but its structure could be improved:

  • Wrap all classes in a namespace
  • Split the importer from the orchestration
  • Have the importer accept an already-built schema object, leaving the building of the schema to the orchestration
  • Expose the orchestration from the top-level module
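The refactoring steps above can be sketched as follows. This is a hypothetical layout: the module name CSVImporter and the Schema and Importer class names are illustrative and not necessarily those used in the author's refactored version.

```ruby
require 'csv'

# Hypothetical namespace wrapping all classes.
module CSVImporter
  Column = Struct.new(:name, :col_number, :type)

  # Collects column definitions through the DSL methods.
  class Schema
    attr_reader :columns

    def initialize
      @columns = []
    end

    def string(name, column:)
      add(name, column, ->(x) { x.to_s })
    end

    def integer(name, column:)
      add(name, column, ->(x) { x.to_i })
    end

    def decimal(name, column:)
      add(name, column, ->(x) { x.to_f })
    end

    private

    def add(name, column, type)
      @columns << Column.new(name, column, type)
    end
  end

  # Turns raw rows into hashes; it only receives an already-built schema.
  class Importer
    def initialize(schema)
      @schema = schema
    end

    def process(rows)
      rows.map do |row|
        @schema.columns.each_with_object({}) do |col, obj|
          obj[col.name] = col.type.call(row[col.col_number - 1])
        end
      end
    end
  end

  # Orchestration exposed at the top level: build the schema, read, import.
  def self.from_file(filepath, col_sep: ';')
    schema = Schema.new
    yield schema
    rows = CSV.read(filepath, col_sep: col_sep)
    Importer.new(schema).process(rows)
  end
end
```

Usage stays the same apart from the receiver: CSVImporter.from_file('people.csv') { |config| ... } returns an array of hashes. Because Importer only receives a finished schema, it can be exercised with plain arrays of rows, with no file and no DSL block.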

You can see the result of the refactoring on GitHub.

Wrapping up

The implementation of our DSL takes about 50 lines of code, so it fits on one screen. DSLs don't have to involve heavy metaprogramming. They are part of what makes Ruby so beautiful, and an essential component of designing clean, usable APIs for other developers.

Jacek Galanciak © 2020