Implementing DSL in Ruby, part 1: CSV importer
Ruby is famous, among other things, for its beautiful domain-specific languages (DSLs). The popularity is well-deserved: everyone loves DSLs for their efficient, often declarative description of intent, without the syntactic noise of typical imperative implementations. However, surprisingly few Ruby programmers can implement a DSL on their own. Let's change that, one blog post at a time.
In this project, we'll implement a simple CSV importer with type coercions. We're going to use only simple techniques such as blocks and proc objects, with no actual metaprogramming (no `define_method`, `instance_eval`, etc.).
The syntax
Here's what we'd like to achieve:
```ruby
CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
end
```
I like to take an outside-in approach when implementing DSLs. Worrying about the details too early only distracts us from creating the proper structure for the project. The best way to start is to create empty no-op classes and methods so that Ruby can run our program without crashing.
The no-op skeleton
```ruby
class CSVImport
  def self.from_file(filepath)
  end
end

CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
  puts 'please, call me!'
end
```
Running this snippet exits without doing anything: the `puts 'please, call me!'` line never runs, because we never call the block that contains it. This illustrates an important rule for building no-op skeletons: make sure that every block and every method actually gets called.
Let's look at how we pass the configuration in the DSL. The `CSVImport` class has a `from_file` class method that takes a block with one argument (`config`). This object has several methods (`string`, `integer`, `decimal`) and is used to describe the schema of our imported CSV files. First, let's define the schema class:
```ruby
class CSVImportSchema
  def string(name, column:)
    puts 'string'
  end

  def integer(name, column:)
    puts 'integer'
  end

  def decimal(name, column:)
    puts 'decimal'
  end
end
```
Then, make sure to pass the configuration block:
```ruby
class CSVImport
  def self.from_file(filepath)
    schema = CSVImportSchema.new
    yield schema
  end
end
```
The `from_file` method receives an implicit block as an argument. This block expects to be called with a schema object as an argument. We can do that by calling `yield` with the object we'd like to pass to the block.
Here's what the method would look like in a more explicit form (which is rarely used):
```ruby
def self.from_file(filepath, &block)
  schema = CSVImportSchema.new
  block.call(schema)
end
```
Running the full program (the skeleton classes plus the DSL call from before) now produces the following output:
```
string
string
integer
decimal
please, call me!
```
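As an aside on `yield`: if the caller forgets the block, a bare `yield` raises `LocalJumpError`. The sketch below shows one way to fail with a clearer message using `block_given?`. The `ArgumentError`, the trimmed-down `CSVImportSchema` stand-in, and returning the schema are assumptions made for this illustration only, not part of the design above.

```ruby
# Stand-in for the schema class; it only records the calls it receives.
class CSVImportSchema
  attr_reader :calls

  def initialize
    @calls = []
  end

  def string(name, column:)
    @calls << [:string, name, column]
  end
end

class CSVImport
  def self.from_file(filepath)
    # Assumption of this sketch: treat a missing block as a caller error.
    raise ArgumentError, 'a configuration block is required' unless block_given?

    schema = CSVImportSchema.new
    yield schema
    schema # returned here only so the example can inspect it
  end
end

schema = CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
end

p schema.calls # => [[:string, :first_name, 1]]
```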
Now we have a true no-op skeleton. We designed our DSL, and we made sure that every piece of it is appropriately called. Our next step is implementing the process of importing the CSV file.
Storing the schema
We decided at the beginning that the importer will coerce the type of each column, so we need a way to store each type along with the code that performs the coercion. A strictly object-oriented approach would most likely result in a set of classes with a polymorphic interface. But since the only thing a type needs to store is its coercion code, we can radically simplify things and use functions as types. Or lambdas, to be more Ruby-specific.
```ruby
str = ->(x) { x.to_s }
int = ->(x) { x.to_i }

str.call(123)   # => "123"
int.call('123') # => 123

str # #<Proc:0x00007fd5ac989c00@(irb):1 (lambda)>
```
The `CSVImportSchema` class could therefore be implemented like this:
```ruby
class CSVImportSchema
  attr_reader :columns
  Column = Struct.new(:name, :col_number, :type)

  def initialize
    @columns = []
  end

  def string(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_s })
  end

  def integer(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_i })
  end

  def decimal(name, column:)
    @columns << Column.new(name, column, ->(x) { x.to_f })
  end
end
```
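To make the "lambdas as types" idea concrete, here is a small sketch of how a stored column can coerce a raw value. It reuses the same `Column` struct shape at the top level. The `strict_int` lambda is a hypothetical alternative added for illustration: `to_i` and `to_f` silently turn malformed input into zero, which may or may not be what you want from an importer.

```ruby
Column = Struct.new(:name, :col_number, :type)

age    = Column.new(:age, 4, ->(x) { x.to_i })
salary = Column.new(:salary, 5, ->(x) { x.to_f })

# The "type" is just a callable, so coercion is a single call.
p age.type.call('38')        # => 38
p salary.type.call('118600') # => 118600.0

# to_i never raises on bad input; it falls back to zero.
p age.type.call('n/a')       # => 0

# Hypothetical stricter type: Kernel#Integer rejects malformed input.
strict_int = ->(x) { Integer(x) }
begin
  strict_int.call('n/a')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Because every type shares the same one-method interface (`call`), swapping a lenient coercion for a strict one does not require touching the rest of the importer.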
Data processing
Now that we have our schema with type coercion defined, we can import the CSV file and process each row to produce objects defined by the schema:
```ruby
require 'csv'

class CSVImport
  attr_reader :schema

  def initialize
    @schema = CSVImportSchema.new
  end

  def self.from_file(filepath)
    import = new
    yield import.schema # let the caller describe the columns
    rows = CSV.read(filepath, col_sep: ';')
    import.process(rows)
  end

  def process(rows)
    rows.map { |row| process_row(row) }
  end

  private

  # Builds one hash per row; DSL columns are 1-based, arrays are 0-based.
  def process_row(row)
    obj = {}
    @schema.columns.each do |col|
      obj[col.name] = col.type.call(row[col.col_number - 1])
    end
    obj
  end
end
```
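It may help to see the raw data that `process` receives. The sketch below uses `CSV.parse` on an in-memory string instead of `CSV.read` on a file; both return an array of arrays of strings. The sample rows are made up for illustration, since the article does not show the contents of `people.csv`, and `n/a` stands in for the unused third column.

```ruby
require 'csv'

# Hypothetical sample data in the same semicolon-separated layout.
data = "Oleta;Raynor;n/a;38;118600\nTyra;Johns;n/a;34;65700\n"

rows = CSV.parse(data, col_sep: ';')

# Every cell arrives as a String, which is why the schema needs coercions.
p rows.first # => ["Oleta", "Raynor", "n/a", "38", "118600"]

# Column 4 in the DSL is index 3 in the row array.
age_column = 4
p rows.first[age_column - 1].to_i # => 38
```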
Now when we run the importer against an actual file:
```ruby
CSVImport.from_file('people.csv') do |config|
  config.string :first_name, column: 1
  config.string :last_name, column: 2
  config.integer :age, column: 4
  config.decimal :salary, column: 5
end
```
The call returns an array of hashes, one per row. Printed one per line, they look like this:
```
...
{:first_name=>"Oleta", :last_name=>"Raynor", :age=>38, :salary=>118600.0}
{:first_name=>"Tyra", :last_name=>"Johns", :age=>34, :salary=>65700.0}
{:first_name=>"Reinhold", :last_name=>"Koch", :age=>27, :salary=>51900.0}
{:first_name=>"Mary", :last_name=>"Stanton", :age=>58, :salary=>46800.0}
{:first_name=>"Lyric", :last_name=>"Kub", :age=>52, :salary=>105300.0}
{:first_name=>"Violette", :last_name=>"Lakin", :age=>69, :salary=>81300.0}
...
```
Our work is almost done.
Refactoring
The code works as intended, but its structure could be improved:
- Wrap all classes in a namespace
- Split the importer from the orchestration
- Importer should only take an already-formed schema object, leaving the forming of the schema to the orchestration
- Expose the orchestration from the top-level module
You can see the result of the refactoring on GitHub.
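For readers who want a rough picture without leaving the page, here is one possible shape of that refactoring. This is a sketch written for this article's bullet points, not the code from the GitHub repository; the names `Schema` and `Importer` and the choice of a module as the namespace are assumptions.

```ruby
require 'csv'

module CSVImport
  Column = Struct.new(:name, :col_number, :type)

  # Collects column definitions; this is the object the DSL block receives.
  class Schema
    attr_reader :columns

    def initialize
      @columns = []
    end

    def string(name, column:)
      @columns << Column.new(name, column, ->(x) { x.to_s })
    end

    def integer(name, column:)
      @columns << Column.new(name, column, ->(x) { x.to_i })
    end

    def decimal(name, column:)
      @columns << Column.new(name, column, ->(x) { x.to_f })
    end
  end

  # Knows nothing about the DSL; it applies an already-formed schema to rows.
  class Importer
    def initialize(schema)
      @schema = schema
    end

    def process(rows)
      rows.map { |row| process_row(row) }
    end

    private

    def process_row(row)
      @schema.columns.each_with_object({}) do |col, obj|
        obj[col.name] = col.type.call(row[col.col_number - 1])
      end
    end
  end

  # Orchestration, exposed from the top-level module.
  def self.from_file(filepath)
    schema = Schema.new
    yield schema
    rows = CSV.read(filepath, col_sep: ';')
    Importer.new(schema).process(rows)
  end
end

# Usage without touching the filesystem (the row is made-up sample data):
schema = CSVImport::Schema.new
schema.string :first_name, column: 1
schema.integer :age, column: 4

p CSVImport::Importer.new(schema).process([%w[Oleta Raynor n/a 38 118600]])
```

With this split, the DSL call from the beginning of the article stays exactly the same, while `Importer` can be exercised in isolation with plain arrays.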
Wrapping up
The implementation of our DSL came to about 50 lines of code, which fits on one screen. DSLs don't have to involve heavy metaprogramming. They are part of what makes Ruby so beautiful, and they are an essential component of designing clean and usable APIs for other developers.