Back

Ruby on Rails — Importing Data from an Excel File

Gabriel Martin
Gabriel Martin
January 13, 2020
Ruby on Rails — Importing Data from an Excel File

Photo by Rich Tervet on Unsplash

Importing data from an Excel spreadsheet doesn’t need to be difficult. While Rails doesn’t include a native utility to handle these file types, there are several gems that make it quite easy to read/write Excel spreadsheets.

In this post, we will create a rake task and use the roogem to read from an Excel file and import data into the database.

Setup

We’ll start by adding the roo gem to our project. If you’re using Bundler 1.15 or higher, you can use the bundle add command.

bundle add roo

This will add roo to the Gemfile and install it. Alternatively, you can update the Gemfile and install it manually.

Add this line to your Gemfile.

gem 'roo', '~> 2.8'

And install.

bundle install

Creating the Rake Task

Now that we’ve added roo, we can start working on the actual feature.

Generate the rake task.

rails g task import data

This will generate the file import.rake in lib/tasks where “import" is the namespace and “data" is the task name.

Now that our task is created, let’s update the description. We’ll also add a simple puts command and ensure that it runs.

namespace :import do desc "Import data from spreadsheet" # update this line task data: :environment do puts 'Importing Data' # add this line end end

Run this command in your console.

bundle exec rails import:data

Importing the Data

To keep it simple, I have a very basic user schema. Where a user has two fields, name and email. I’ve added the spreadsheet file at lib/data.xlsx which contains several rows, each representing a new user to be created.

As you can see, the first row of the spreadsheet represents the headers. Everything else is the actual data that we need. Let’s start implementing roo so we can map over this data and create the users.

First, we will require roo and open the spreadsheet within the task.

require 'roo' namespace :import do desc "Import data from spreadsheet" task data: :environment do puts 'Importing Data' # add this line data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet end end

Next, lets grab the first row of the spreadsheet since we know this is the header row. We’ll use this later to create a hash when mapping over the rows with the data we need.

data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet headers = data.row(1) # get header row

Now we can map over the spreadsheet rows and extract the user data. Here is what the code looks like.

data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet headers = data.row(1) # get header row data.each_with_index do |row, idx| next if idx == 0 # skip header # create hash from headers and cells user_data = Hash[[headers, row].transpose] if User.exists?(email: user_data['email']) puts "User with email '#{user_data['email']}' already exists" next end user = User.new(user_data) puts "Saving User with email #{user.email}" user.save! end

Let’ walk through this.

We map over each row in the spreadsheet using data.each_with_index-->.

If we’re on the first row (`idx == 0`), we want to continue to the next iteration because this is the header row.

Next, we create a new hash that contains the data from the current row. We build this hash using the `headers` array that we grabbed earlier and the current `row` that we are on in the loop. So, what is this line doing?

Let’s start with Array#transpose. I’m not going to dive into the transpose method but it essentially turns columns into rows when you have a multi-dimensional array. Here’s a quick visual.

This is what the array looks like before calling `transpose`.

[ ['name', 'email'], ['john', 'john@test.com'] ]

And this is what is returned when calling `transpose`.

[ ['name', 'john'], ['email', 'john@test.com'] ]

When passing the above array to `Hash[]`, we get a back a new hash where the `key` is the first item in each nested array and its value is the second. In our case, the hash matches the user fields we need to create a new database entry.

user_data = Hash[[["name", "John"], ["email", "john@test.com"]]] p user_data # {"name"=>"John", "email"=>"john@test.com"}

Then we check to see if a user already exists with the current email address. If it does, we print some text to the console and move on to the next iteration without saving.

# next if user exists if User.exists?(email: user_data['email']) puts "User with email #{user_data['email']} already exists" next end

And finally, if the user doesn’t already exist, we can create a new user instance with the fields generated from the current row and save the new user in the database.

user = User.new(user_data) puts "Saving User with email '#{user.email}'" user.save!

The complete code should look something like this.

require 'roo' namespace :import do desc "Import data from spreadsheet" task data: :environment do puts 'Importing Data' data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet headers = data.row(1) # get header row data.each_with_index do |row, idx| next if idx == 0 # skip header row # create hash from headers and cells user_data = Hash[[headers, row].transpose] # next if user exists if User.exists?(email: user_data['email']) puts "User with email #{user_data['email']} already exists" next end user = User.new(user_data) puts "Saving User with email '#{user.email}'" user.save! end end end

That’s it! Now you can run this task and import users from the spreadsheet.

While this approach uses a very simple data structure, it’s a good starting point even for more complex situations.

Share this post

Interested in working with us?

Give us some details about your project, and our team will be in touch within a day or two.