Ruby on Rails — Importing Data from an Excel File
Importing data from an Excel spreadsheet doesn’t need to be difficult. While Rails doesn’t include a native utility to handle these file types, there are several gems that make it quite easy to read/write Excel spreadsheets.
In this post, we will create a rake task and use the roo gem to read from an Excel file and import data into the database.
Setup
We’ll start by adding the roo gem to our project. If you’re using Bundler 1.15 or higher, you can use the bundle add command.
bundle add roo
This will add roo to the Gemfile and install it. Alternatively, you can update the Gemfile and install it manually.
Add this line to your Gemfile.
gem 'roo', '~> 2.8'
And install.
bundle install
Creating the Rake Task
Now that we’ve added roo, we can start working on the actual feature.
Generate the rake task.
rails g task import data
This will generate the file import.rake in lib/tasks, where “import” is the namespace and “data” is the task name.
Now that our task is created, let’s update the description. We’ll also add a simple puts command and ensure that it runs.
namespace :import do
  desc "Import data from spreadsheet" # update this line
  task data: :environment do
    puts 'Importing Data' # add this line
  end
end
Run this command in your console.
bundle exec rails import:data
Importing the Data
To keep it simple, I have a very basic user schema where a user has two fields, name and email. I’ve added the spreadsheet file at lib/data.xlsx, which contains several rows, each representing a new user to be created.
The first row of the spreadsheet holds the headers. Everything else is the actual data that we need. Let’s start using roo so we can iterate over this data and create the users.
First, we will require roo and open the spreadsheet within the task.
require 'roo' # add this line

namespace :import do
  desc "Import data from spreadsheet"
  task data: :environment do
    puts 'Importing Data'
    data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
  end
end
Next, let’s grab the first row of the spreadsheet since we know this is the header row. We’ll use this later to build a hash for each data row as we iterate over the spreadsheet.
data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
headers = data.row(1) # get header row
Now we can map over the spreadsheet rows and extract the user data. Here is what the code looks like.
data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
headers = data.row(1) # get header row

data.each_with_index do |row, idx|
  next if idx == 0 # skip header

  # create hash from headers and cells
  user_data = Hash[[headers, row].transpose]

  if User.exists?(email: user_data['email'])
    puts "User with email '#{user_data['email']}' already exists"
    next
  end

  user = User.new(user_data)
  puts "Saving User with email #{user.email}"
  user.save!
end
Let’s walk through this.
We iterate over each row in the spreadsheet using data.each_with_index.
If we’re on the first row (idx == 0), we want to continue to the next iteration because this is the header row.
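The header-skipping step can be tried in isolation. This is a minimal sketch that uses plain arrays as stand-ins for spreadsheet rows (the sample data and variable names are assumptions, not part of the task), and compares the index check with Enumerable#drop, which is one other way to leave out the first element.

```ruby
# Plain arrays stand in for the rows a spreadsheet would yield (hypothetical data).
sample_rows = [
  ['name', 'email'],
  ['John', 'john@test.com'],
  ['Jane', 'jane@test.com']
]

# Approach used in the task: track the index and skip row 0.
kept_with_index = []
sample_rows.each_with_index do |row, idx|
  next if idx == 0 # skip header
  kept_with_index << row
end

# Alternative: drop the first element before iterating.
kept_with_drop = sample_rows.drop(1)

p kept_with_index
p kept_with_index == kept_with_drop
```

Both approaches leave only the two data rows. Note that drop builds a new array first, so the index check may be the better fit when you want to process rows one at a time.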
Next, we create a new hash that contains the data from the current row. We build this hash using the headers array that we grabbed earlier and the current row that we are on in the loop. So, what is this line doing?
Let’s start with Array#transpose. I’m not going to dive into the transpose method but it essentially turns columns into rows when you have a multi-dimensional array. Here’s a quick visual.
This is what the array looks like before calling transpose.
[ ['name', 'email'], ['john', 'john@test.com'] ]
And this is what is returned when calling transpose.
[ ['name', 'john'], ['email', 'john@test.com'] ]
When passing the above array to Hash[], we get back a new hash where each key is the first item in a nested array and its value is the second. In our case, the hash matches the user fields we need to create a new database entry.
user_data = Hash[[["name", "John"], ["email", "john@test.com"]]]
p user_data # {"name"=>"John", "email"=>"john@test.com"}
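For comparison, the same hash can be built with Array#zip and to_h. The sketch below is an alternative to the transpose approach, not what the task uses, and the short_row case is a hypothetical example of a row with fewer cells than there are headers.

```ruby
headers = ['name', 'email']
row     = ['John', 'john@test.com']

# The approach from the task: pair up columns with transpose, then build a hash.
from_transpose = Hash[[headers, row].transpose]

# Alternative: zip pairs each header with the cell at the same position.
from_zip = headers.zip(row).to_h

p from_transpose
p from_zip

# Hypothetical edge case: a row with fewer cells than headers.
short_row = ['Jane']

# zip fills the missing cell with nil.
padded = headers.zip(short_row).to_h
p padded

# transpose raises IndexError when the arrays differ in length.
begin
  [headers, short_row].transpose
rescue IndexError => e
  puts "transpose failed: #{e.message}"
end
```

For rows of equal length the two approaches return the same hash. They differ only when the lengths do not match, which is worth knowing if your data can contain ragged rows.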
Then we check to see if a user already exists with the current email address. If it does, we print some text to the console and move on to the next iteration without saving.
# next if user exists
if User.exists?(email: user_data['email'])
  puts "User with email #{user_data['email']} already exists"
  next
end
And finally, if the user doesn’t already exist, we can create a new user instance with the fields generated from the current row and save the new user in the database.
user = User.new(user_data)
puts "Saving User with email '#{user.email}'"
user.save!
The complete code should look something like this.
require 'roo'

namespace :import do
  desc "Import data from spreadsheet"
  task data: :environment do
    puts 'Importing Data'

    data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
    headers = data.row(1) # get header row

    data.each_with_index do |row, idx|
      next if idx == 0 # skip header row

      # create hash from headers and cells
      user_data = Hash[[headers, row].transpose]

      # next if user exists
      if User.exists?(email: user_data['email'])
        puts "User with email #{user_data['email']} already exists"
        next
      end

      user = User.new(user_data)
      puts "Saving User with email '#{user.email}'"
      user.save!
    end
  end
end
That’s it! Now you can run this task and import users from the spreadsheet.
While this approach uses a very simple data structure, it’s a good starting point even for more complex situations.
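If you want to experiment with the loop logic without a database or a spreadsheet, here is a rough sketch that runs in plain Ruby. The import_rows helper is hypothetical: it mirrors the loop in the task but takes plain arrays instead of a roo spreadsheet, and a Set of known emails stands in for the User.exists? check and the users table.

```ruby
require 'set'

# Hypothetical helper that mirrors the loop in the rake task.
# rows:            array of arrays, first element is the header row
# existing_emails: Set standing in for the users table lookup
def import_rows(rows, existing_emails)
  headers = rows.first
  created = []

  rows.each_with_index do |row, idx|
    next if idx == 0 # skip header row

    # create hash from headers and cells
    user_data = Hash[[headers, row].transpose]

    # next if a user with this email was already seen
    if existing_emails.include?(user_data['email'])
      puts "User with email '#{user_data['email']}' already exists"
      next
    end

    puts "Saving User with email '#{user_data['email']}'"
    existing_emails << user_data['email']
    created << user_data
  end

  created
end

demo_rows = [
  ['name', 'email'],
  ['John', 'john@test.com'],
  ['Jane', 'jane@test.com'],
  ['Johnny', 'john@test.com'] # duplicate email, should be skipped
]

created_users = import_rows(demo_rows, Set.new)
p created_users.map { |u| u['name'] }
```

The third data row shares an email with the first, so it is reported as existing and skipped. In the real task, the same decision is made by User.exists? against the database.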