Implementing a DSL in Ruby, part 1: CSV importer

Ruby is famous, among other things, for its beautiful domain-specific languages (DSLs). The popularity is well-deserved: everyone loves DSLs for their efficient, often declarative, description of intent, free of the syntactic noise of typical imperative implementations. However, surprisingly few Ruby programmers can implement a DSL on their own. Let's change that, one blog post at a time.

In this project, we'll implement a simple CSV importer with type coercions. We're going to use only simple techniques such as blocks and proc objects, with no actual metaprogramming (no define_method, no instance_eval, etc.).

The syntax

Here's what we'd like to achieve:

CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
end

I like to take an outside-in approach when implementing DSLs. Worrying about the details early would only distract us from creating the proper structure for our project. The best way to start is to create empty no-op classes and methods so that Ruby can run our program without crashing.

The no-op skeleton

class CSVImport
  def self.from_file(filepath)
  end
end

CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
  puts 'please, call me!'
end

Running this snippet results in our program exiting without doing anything. Notably, the puts 'please, call me!' line never runs, because from_file never calls the block that contains it. This points to an important rule for constructing no-op skeletons: we need to make sure that every block and every method actually gets called.

Let's look at the way we pass the configuration in the DSL. The CSVImport class has a from_file class method that takes a block with one argument (config). This object has several methods (string, integer, decimal) and is used to describe the schema of our imported CSV files. First, let's define the schema class:

class CSVImportSchema
  def string(name, column:)
    puts 'string'
  end

  def integer(name, column:)
    puts 'integer'
  end

  def decimal(name, column:)
    puts 'decimal'
  end
end

Then, make sure from_file calls the configuration block and passes it a schema object:

class CSVImport
  def self.from_file(filepath)
    schema = CSVImportSchema.new
    yield schema
  end
end

The from_file method receives an implicit block as an argument. This block expects to be called with a schema object as its argument. We can do that by calling yield with the object we'd like to pass to the block.

Here's what the method would look like in a more explicit form (which is rarely used):

def self.from_file(filepath, &block)
  schema = CSVImportSchema.new
  block.call(schema)
end

Running the program will result in the following output:

string
string
integer
decimal
please, call me!

Now we have a true no-op skeleton. We designed our DSL, and we made sure that every piece of it is appropriately called. Our next step is implementing the process of importing the CSV file.
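One caveat about the yield form used in from_file: if the caller forgets to pass a block, yield raises a LocalJumpError. A minimal sketch of one way to guard against that with block_given?, assuming a clearer error is preferable. The SketchSchema and SketchImport names and the error message are made up for this illustration and are not part of the importer:

```ruby
# Hypothetical stand-ins for CSVImportSchema and CSVImport (illustration only).
class SketchSchema
  def string(name, column:)
    puts "string #{name} -> column #{column}"
  end
end

class SketchImport
  def self.from_file(filepath)
    # block_given? is true only when the caller supplied a block.
    raise ArgumentError, 'from_file requires a configuration block' unless block_given?

    schema = SketchSchema.new
    yield schema
    schema
  end
end

# With a block, the schema is yielded as before.
SketchImport.from_file('people.csv') { |config| config.string :first_name, column: 1 }

# Without a block, we get our own error instead of a LocalJumpError.
begin
  SketchImport.from_file('people.csv')
rescue ArgumentError => e
  puts e.message
end
```

Whether the guard is worth adding is a design choice; for a small DSL, letting yield fail on its own may be acceptable.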

Storing the schema

We decided at the beginning that the importer will do type coercion on each column, so we need a way to store each type along with the code that coerces a value to it. A strictly object-oriented approach would most likely result in a set of classes with a polymorphic interface. But since the only thing a type needs to store is its coercion code, we can radically simplify things and use functions as types. Or lambdas, to be more Ruby-specific.

str = ->(x) { x.to_s }
int = ->(x) { x.to_i }

str.call(123)   # => "123"
int.call('123') # => 123

str             # #<Proc:0x00007fd5ac989c00@(irb):1 (lambda)>

CSVImportSchema could therefore be implemented like this:

class CSVImportSchema
  attr_reader :columns
  Column = Struct.new(:name, :col_number, :type)

  def initialize
    @columns = []
  end

  def string(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_s })
  end

  def integer(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_i })
  end

  def decimal(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_f })
  end
end
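To see what the schema holds after a couple of declarations, here is a small sketch that builds one by hand (without the DSL block) and inspects its columns. The class body repeats the one above so the snippet runs on its own; the sample values are made up:

```ruby
class CSVImportSchema
  attr_reader :columns
  Column = Struct.new(:name, :col_number, :type)

  def initialize
    @columns = []
  end

  def string(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_s })
  end

  def integer(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_i })
  end

  def decimal(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_f })
  end
end

schema = CSVImportSchema.new
schema.string :first_name, column: 1
schema.integer :age, column: 4

# Each entry is a Struct holding the name, the 1-based column number and a lambda.
schema.columns.each do |col|
  puts "#{col.name}: column #{col.col_number}, lambda? #{col.type.lambda?}"
end

age = schema.columns.last
p age.type.call('38')  # => 38
p age.type.call('n/a') # => 0, because String#to_i is lenient and never raises
```

The last line is worth keeping in mind: with to_i and to_f as coercions, malformed input silently becomes 0 or 0.0 instead of producing an error.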

Data processing

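Now that we have our schema with type coercion defined, we can import the CSV file and process each row to produce objects defined by the schema. Two details to note in the code: column numbers in the DSL are 1-based, so process_row subtracts 1 to get the array index, and the file is read as semicolon-separated (col_sep: ';').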

require 'csv'

class CSVImport
  attr_reader :schema

  def initialize
    @schema = CSVImportSchema.new
  end

  def self.from_file(filepath)
    import = new
    yield import.schema
    rows = CSV.read(filepath, col_sep: ';')
    import.process(rows)
  end

  def process(rows)
    rows.map { |row| process_row(row) }
  end

  private

  def process_row(row)
    obj = {}
    @schema.columns.each do |col|
      obj[col.name] = col.type.call(row[col.col_number - 1])
    end
    obj
  end
end
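The row processing can also be tried without a file on disk. The sketch below feeds process with rows parsed from an in-memory string via CSV.parse. The sample data (including the unused third column) is invented for illustration, and the classes are repeated in compact form so the snippet is self-contained:

```ruby
require 'csv'

class CSVImportSchema
  attr_reader :columns
  Column = Struct.new(:name, :col_number, :type)

  def initialize
    @columns = []
  end

  def string(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_s })
  end

  def integer(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_i })
  end

  def decimal(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_f })
  end
end

class CSVImport
  attr_reader :schema

  def initialize
    @schema = CSVImportSchema.new
  end

  def process(rows)
    rows.map { |row| process_row(row) }
  end

  private

  # Builds one hash per row; DSL columns are 1-based, array indexes 0-based.
  def process_row(row)
    @schema.columns.each_with_object({}) do |col, obj|
      obj[col.name] = col.type.call(row[col.col_number - 1])
    end
  end
end

# Made-up sample rows; the layout of the real people.csv is an assumption.
data = <<~ROWS
  Oleta;Raynor;oleta@example.com;38;118600
  Tyra;Johns;tyra@example.com;34;65700
ROWS

import = CSVImport.new
import.schema.string :first_name, column: 1
import.schema.integer :age, column: 4
import.schema.decimal :salary, column: 5

records = import.process(CSV.parse(data, col_sep: ';'))
records.each { |record| p record }
```

Each record comes back as a hash keyed by the names declared in the schema, with values already coerced.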

Now when we run the importer against an actual file:

CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
end

from_file returns an array of hashes. Printing each one gives the following output:

...
{:first_name=>"Oleta", :last_name=>"Raynor", :age=>38, :salary=>118600.0}
{:first_name=>"Tyra", :last_name=>"Johns", :age=>34, :salary=>65700.0}
{:first_name=>"Reinhold", :last_name=>"Koch", :age=>27, :salary=>51900.0}
{:first_name=>"Mary", :last_name=>"Stanton", :age=>58, :salary=>46800.0}
{:first_name=>"Lyric", :last_name=>"Kub", :age=>52, :salary=>105300.0}
{:first_name=>"Violette", :last_name=>"Lakin", :age=>69, :salary=>81300.0}
...

Our work is almost done.

Refactoring

The code works as intended, but its structure could be improved:

  • Wrap all classes in a namespace
  • Split the importer from the orchestration
  • Importer should only take an already-formed schema object, leaving the forming of the schema to the orchestration
  • Expose the orchestration from the top-level module

You can see the result of the refactoring on GitHub.
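For a rough idea of what the steps listed above could produce, here is one possible shape. This is a sketch under assumptions, not the code from the author's repository: the CSVImporter module name, the Schema and Importer class names and the private add helper are all hypothetical.

```ruby
require 'csv'

# Hypothetical namespace and class names; the actual refactoring may differ.
module CSVImporter
  class Schema
    Column = Struct.new(:name, :col_number, :type)
    attr_reader :columns

    def initialize
      @columns = []
    end

    def string(name, column:)
      add(name, column, ->(x) { x.to_s })
    end

    def integer(name, column:)
      add(name, column, ->(x) { x.to_i })
    end

    def decimal(name, column:)
      add(name, column, ->(x) { x.to_f })
    end

    private

    def add(name, column, type)
      @columns << Column.new(name, column, type)
    end
  end

  # The importer only applies an already-formed schema to rows.
  class Importer
    def initialize(schema)
      @schema = schema
    end

    def process(rows)
      rows.map do |row|
        @schema.columns.each_with_object({}) do |col, obj|
          obj[col.name] = col.type.call(row[col.col_number - 1])
        end
      end
    end
  end

  # Orchestration, exposed from the top-level module:
  # form the schema, read the file, hand both to the importer.
  def self.from_file(filepath)
    schema = Schema.new
    yield schema
    Importer.new(schema).process(CSV.read(filepath, col_sep: ';'))
  end
end
```

Under these assumed names, the DSL call would become CSVImporter.from_file('people.csv') do |config| ... end, with the block body unchanged.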

Wrapping up

The implementation of our DSL came to about 50 lines of code, which fits on one screen. DSLs don't have to involve heavy metaprogramming. They are part of what makes Ruby so beautiful, and they are an essential tool for designing clean and usable APIs for other developers.

Posted on May 20, 2018 in ruby, dsl by Jacek Galanciak
