Implementing a DSL in Ruby, part 1: CSV importer
Ruby is famous, among many things, for its beautiful domain-specific languages (DSLs). The popularity is well-deserved: everyone loves DSLs for their efficient, often declarative, description of intent, free of the syntactic noise typical of imperative implementations. However, surprisingly few Ruby programmers can implement a DSL on their own. Let's change that, one blog post at a time.
In this project, we'll implement a simple CSV importer with type coercions. We're going to use only simple techniques such as blocks and proc objects, and no actual metaprogramming (no define_method, instance_eval, etc.).
The syntax
Here's what we'd like to achieve:
```ruby
CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
end
```
I like to take an outside-in approach when implementing DSLs. Worrying about the details at this stage would only distract us from creating the proper structure for our project. The best way to start is to create empty, no-op classes and methods so that Ruby can run our program without crashing.
The no-op skeleton
```ruby
class CSVImport
  def self.from_file(filepath)
  end
end

CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
  puts 'please, call me!'
end
```
Running this snippet will result in our program exiting without doing anything. However, the puts 'please, call me!' line never runs, because from_file never calls the block that contains it. This brings us to an important rule for constructing no-op skeletons: we need to make sure that every block and every method actually gets called.
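To see why this rule matters, here is a minimal sketch with two hypothetical methods (skeleton_without_yield and skeleton_with_yield are illustrative names, not part of the importer). Ruby silently accepts a block passed to a method that never yields, so nothing warns us that the block is dead code; block_given? plus yield is one way to make sure it runs.

```ruby
# Hypothetical skeleton that accepts a block implicitly but never calls it.
def skeleton_without_yield
end

# Hypothetical skeleton that calls the block when one is given.
def skeleton_with_yield
  yield if block_given?
end

calls = []
skeleton_without_yield { calls << :first }  # block is silently ignored
skeleton_with_yield { calls << :second }    # block runs
p calls # => [:second]
```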
Let's look at how the configuration is passed in the DSL. The CSVImport class has a from_file class method that takes a block with one argument (config). This object has several methods (string, integer, decimal) and is used to describe the schema of our imported CSV files. First, let's define the schema class:
```ruby
class CSVImportSchema
  def string(name, column:)
    puts 'string'
  end

  def integer(name, column:)
    puts 'integer'
  end

  def decimal(name, column:)
    puts 'decimal'
  end
end
```
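The signature def string(name, column:) makes column: a required keyword argument, so a DSL line that forgets or misspells it fails immediately instead of being silently misconfigured. The sketch below reopens a stripped-down copy of the schema whose string method returns its arguments rather than printing them, purely so the behaviour is easy to inspect; that change is an illustration, not part of the article's code.

```ruby
# Stripped-down copy of the no-op schema; returns instead of printing.
class CSVImportSchema
  def string(name, column:)
    [:string, name, column]
  end
end

schema = CSVImportSchema.new
p schema.string(:first_name, column: 1) # => [:string, :first_name, 1]

# Omitting the required keyword raises ArgumentError.
missing =
  begin
    schema.string(:first_name)
    nil
  rescue ArgumentError => e
    e
  end

# A misspelled keyword (colum: instead of column:) raises ArgumentError too.
typo =
  begin
    schema.string(:first_name, colum: 1)
    nil
  rescue ArgumentError => e
    e
  end

puts missing.message
puts typo.message
```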
Then, make sure the configuration block actually gets called with a schema object:
```ruby
class CSVImport
  def self.from_file(filepath)
    schema = CSVImportSchema.new
    yield schema
  end
end
```
The from_file method receives an implicit block argument. This block expects to be called with a schema object as its argument, which we do by calling yield with the object we'd like to pass to the block.
Here's what the method would look like in a more explicit form, which is less common:
```ruby
def self.from_file(filepath, &block)
  schema = CSVImportSchema.new
  block.call(schema)
end
```
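A quick way to convince yourself that the two forms are equivalent is to call both with the same block. In this sketch, with_yield and with_explicit_block are hypothetical stand-ins for from_file that skip the file entirely and pass a bare Object.new in place of the schema. Both hand the object to the block, and both return whatever the block returns.

```ruby
# Implicit block: invoked with yield.
def with_yield
  schema = Object.new
  yield schema
end

# Explicit block: captured as a Proc via &block and invoked with #call.
def with_explicit_block(&block)
  schema = Object.new
  block.call(schema)
end

received = []
with_yield { |config| received << config.class }
with_explicit_block { |config| received << config.class }
p received # => [Object, Object]

# The value of the last expression in the block is returned from the method.
p with_yield { |_config| :done } # => :done
```

The explicit &block form becomes useful when the block has to be stored or forwarded to another method; for a single immediate call, yield is shorter.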
Running the skeleton program now results in the following output:
```
string
string
integer
decimal
please, call me!
```
Now we have a true no-op skeleton. We designed our DSL, and we made sure that every piece of it is appropriately called. Our next step is implementing the process of importing the CSV file.
Storing the schema
We decided at the beginning that the importer will coerce the value of each column to a type, so we need a way to store each column's type along with the code that performs the coercion. A strictly object-oriented approach would most likely result in a set of classes with a polymorphic interface. But since the only thing a type needs to store is its coercion code, we can radically simplify things and use functions as types. Or lambdas, to be more Ruby-specific.
```ruby
str = ->(x) { x.to_s }
int = ->(x) { x.to_i }

str.call(123)   # => "123"
int.call('123') # => 123
str             # => #<Proc:0x00007fd5ac989c00@(irb):1 (lambda)>
```
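Because each type is just a lambda, the types can also live in an ordinary hash keyed by type name. The TYPES table below is a hypothetical illustration, not part of the importer; its :decimal entry follows the article in coercing to Float via to_f. It also shows two properties worth knowing when lambdas serve as coercions: String#to_i returns 0 instead of raising on non-numeric input, and, unlike plain procs, lambdas enforce their arity.

```ruby
# Hypothetical lookup table of coercion lambdas, keyed by type name.
TYPES = {
  string:  ->(x) { x.to_s },
  integer: ->(x) { x.to_i },
  decimal: ->(x) { x.to_f }
}.freeze

p TYPES[:string].call(42)      # => "42"
p TYPES[:integer].call('42')   # => 42
p TYPES[:decimal].call('3.5')  # => 3.5
p TYPES[:integer].call('abc')  # => 0 (String#to_i never raises)

# A plain proc silently drops extra arguments...
p proc { |x| x.to_i }.call('1', '2') # => 1

# ...while a lambda rejects the wrong number of arguments.
arity_error =
  begin
    TYPES[:integer].call('1', '2')
    nil
  rescue ArgumentError => e
    e
  end
puts arity_error.message
```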
The CSVImportSchema could, therefore, be implemented like this:
```ruby
class CSVImportSchema
  attr_reader :columns

  Column = Struct.new(:name, :col_number, :type)

  def initialize
    @columns = []
  end

  def string(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_s })
  end

  def integer(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_i })
  end

  def decimal(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_f })
  end
end
```
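The schema can be exercised on its own, without any CSV file, by calling its methods directly, exactly as the DSL block would. The snippet repeats the class above so that it runs standalone; the sample value '7' is arbitrary.

```ruby
# Copy of CSVImportSchema from above so this snippet is self-contained.
class CSVImportSchema
  attr_reader :columns

  Column = Struct.new(:name, :col_number, :type)

  def initialize
    @columns = []
  end

  def string(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_s })
  end

  def integer(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_i })
  end

  def decimal(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_f })
  end
end

# Fill the schema by hand, the same way the DSL block does.
schema = CSVImportSchema.new
schema.string :first_name, column: 1
schema.integer :age, column: 4

# Each stored column carries its name, 1-based position and coercion lambda.
schema.columns.each do |col|
  puts "#{col.name}: column #{col.col_number}, '7' -> #{col.type.call('7').inspect}"
end
# first_name: column 1, '7' -> "7"
# age: column 4, '7' -> 7
```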
Data processing
Now that we have our schema with type coercion defined, we can read the CSV file and turn each row into a hash keyed by the column names from the schema. Note that the file is read with a semicolon (;) as the column separator:
```ruby
require 'csv'

class CSVImport
  attr_reader :schema

  def initialize
    @schema = CSVImportSchema.new
  end

  def self.from_file(filepath)
    import = new
    yield import.schema
    rows = CSV.read(filepath, col_sep: ';')
    import.process(rows)
  end

  def process(rows)
    rows.map { |row| process_row(row) }
  end

  private

  # Builds one hash per row. col_number is 1-based in the DSL,
  # while Ruby arrays are 0-based, hence the "- 1".
  def process_row(row)
    obj = {}
    @schema.columns.each do |col|
      obj[col.name] = col.type.call(row[col.col_number - 1])
    end
    obj
  end
end
```
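The heart of process_row is a mapping from 1-based column numbers to coerced values. Here is a minimal sketch of that step on a made-up, two-line semicolon-separated string parsed with CSV.parse, so no file is needed. It builds each hash with Enumerable#to_h and a block (Ruby 2.6+), which is equivalent to the obj = {} loop above.

```ruby
require 'csv'

# Same shape as CSVImportSchema::Column.
Column = Struct.new(:name, :col_number, :type)

columns = [
  Column.new(:first_name, 1, ->(x) { x.to_s }),
  Column.new(:age, 4, ->(x) { x.to_i })
]

# Made-up sample data using ';' as the separator, like from_file expects.
data = "Ada;Lovelace;London;36;1000\nAlan;Turing;London;41;2000\n"
rows = CSV.parse(data, col_sep: ';')

records = rows.map do |row|
  # 1-based DSL column numbers become 0-based array indices.
  columns.to_h { |col| [col.name, col.type.call(row[col.col_number - 1])] }
end

records.each { |record| p record }
```

If a row is shorter than a column number, row[...] returns nil, and nil.to_s and nil.to_i quietly yield "" and 0, so short rows do not raise.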
Now, when we run the importer against an actual file and print each resulting hash:
```ruby
people = CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
end

people.each { |person| p person }
```
We'll get the following output:
```
...
{:first_name=>"Oleta", :last_name=>"Raynor", :age=>38, :salary=>118600.0}
{:first_name=>"Tyra", :last_name=>"Johns", :age=>34, :salary=>65700.0}
{:first_name=>"Reinhold", :last_name=>"Koch", :age=>27, :salary=>51900.0}
{:first_name=>"Mary", :last_name=>"Stanton", :age=>58, :salary=>46800.0}
{:first_name=>"Lyric", :last_name=>"Kub", :age=>52, :salary=>105300.0}
{:first_name=>"Violette", :last_name=>"Lakin", :age=>69, :salary=>81300.0}
...
```
Our work is almost done.
Refactoring
The code works as intended, but its structure could be improved:
- Wrap all classes in a namespace
- Split the importer from the orchestration
- The importer should only take an already-built schema object, leaving the construction of the schema to the orchestration
- Expose the orchestration from the top-level module
You can see the result of the refactoring on GitHub.
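For readers who want to try the refactoring themselves, here is one possible shape under stated assumptions. This is a hypothetical sketch, not the code from the repository: the names CSVImporter, Schema and Importer are illustrative. It applies the four points above: everything lives in one namespace, Importer receives a fully built schema and knows nothing about the DSL, and the orchestration is exposed as CSVImporter.from_file.

```ruby
require 'csv'

# Hypothetical namespace; names are illustrative only.
module CSVImporter
  # Collects column definitions from the DSL block.
  class Schema
    Column = Struct.new(:name, :col_number, :type)
    attr_reader :columns

    def initialize
      @columns = []
    end

    def string(name, column:)
      add(name, column, ->(x) { x.to_s })
    end

    def integer(name, column:)
      add(name, column, ->(x) { x.to_i })
    end

    def decimal(name, column:)
      add(name, column, ->(x) { x.to_f })
    end

    private

    def add(name, column, type)
      @columns << Column.new(name, column, type)
    end
  end

  # Turns rows into hashes; receives an already-built schema.
  class Importer
    def initialize(schema)
      @schema = schema
    end

    def process(rows)
      rows.map do |row|
        @schema.columns.to_h { |col| [col.name, col.type.call(row[col.col_number - 1])] }
      end
    end
  end

  # Orchestration: build the schema from the block, read the file,
  # and hand both to the importer.
  def self.from_file(filepath)
    schema = Schema.new
    yield schema
    rows = CSV.read(filepath, col_sep: ';')
    Importer.new(schema).process(rows)
  end
end
```

The DSL call site stays the same apart from the receiver: CSVImporter.from_file('people.csv') do |config| ... end. Keeping Importer free of the DSL also makes it testable with plain arrays of rows.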
Wrapping up
The implementation of our DSL came to about 50 lines of code, small enough to fit on one screen. DSLs don't have to involve heavy metaprogramming. They are part of what makes Ruby so beautiful and an essential component of designing clean, usable APIs for other developers.