In this article, we’ll see how to scrape a webpage using Ruby on Rails and Nokogiri.
I’m going to assume you have some knowledge of Ruby and/or Ruby on Rails, and that you have Ruby and Rails installed on your machine. See here for how to do that.
Ruby on Rails
The first thing we need to do is set up our Rails project. Rails has a handy generator command to help with that. Navigate to a directory where you have permission to create files and run `rails new scrapertutorial`.
That creates a folder named scrapertutorial with all our Rails files in it. To check that everything installed properly, `cd` into your new project folder and run `rails server`, then open http://localhost:3000 to see your app running.
Now, let’s install the Ruby gems we will be using. We’ll need the Nokogiri gem to help us with parsing the pages we scrape. You can read more about Nokogiri here. Add `gem 'nokogiri'` to the Gemfile in your project’s root directory and run `bundle install`.
Next up is writing the code that scrapes our target website and displays the result. Before we do that, though, I want to outline exactly what we will be getting from abokifx.com.
We are interested in the current Parallel exchange rate for USD to Naira (which can be found here) and the current BDC exchange rate for USD to Naira (which can be found here).
That means our Rails app will have two controllers, each scraping one page (Parallel and BDC). We could easily have written everything in one controller, but I think splitting it up makes the code easier to follow.
We are going to create the two controllers now: one for parallel rates and one for BDC rates. Again, Rails has a handy generator for that: run `rails generate controller parallel` and `rails generate controller bdc`.
That generates a bunch of files for both controllers. If you look in `app/controllers`, you should see both of the controllers we just created. Let’s start with `parallel_controller.rb`: open it in your text editor or preferred IDE and copy this code into it.
Let’s run through the code above and see what’s happening. `ParallelController` inherits from `ApplicationController`, which in turn inherits from `ActionController::Base` and provides a number of helpful methods. We then define an action called `parallelrate` that contains our scraping logic.
Requiring `open-uri` gives us the ability to open web pages from our code (on Ruby 3.0 and later, call `URI.open(url)`, because plain `open(url)` no longer fetches URLs). We then define a `doc` variable that holds the page, fetched with open-uri and parsed by Nokogiri into a document we can query.
Before we continue, it’s important to understand how Nokogiri works: it parses the page’s HTML into a tree of nodes, and we pick out the parts we want using CSS selectors (or XPath expressions).
If you visit abokifx.com and inspect this particular block of rates, you’ll see a `div` with a class of `lagos-market-rates-inner` that wraps the whole block.
That makes it a good starting point for our selector.
I then defined another variable, `rate`, to narrow down what we are looking for. Inspecting the block again, we get something like this.
As you can see, the whole block is essentially a table with many child elements, which is where narrowing down helps. In the controller code above, the line that assigns `rate` does that narrowing.
It says: inside `.lagos-market-rates-inner`, find the first `table`, then the second `tr` in that table, then the second `td` in that `tr`, and take just its text content.
`rate` now contains the current rate, which was `359 / 362` at the time of writing, but we are only interested in the selling rate (the second number), and `@formattedrate = rate[6..8]` extracts it.
The string `359 / 362` is 9 characters long, spaces included, so its indices run from 0 to 8. Ruby string indices are zero-based, so `rate[6..8]` takes the characters at indices 6 through 8 (the seventh to ninth characters), which gives `362`.
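The slicing can be checked in plain Ruby. The fixed range only works while both rates are three digits wide, so the sketch also shows a hypothetical `selling_rate` helper (not part of the original code) that splits on `/` instead.

```ruby
rate = "359 / 362"

p rate.length   # => 9, so valid indices are 0..8
p rate[6..8]    # => "362" (indices 6, 7 and 8)

# Hypothetical helper: split on "/" instead of relying on fixed offsets,
# so the selling side survives a change in the width of the numbers.
def selling_rate(rate)
  rate.split("/").last.strip
end

p selling_rate(rate)           # => "362"
p "1000 / 1005"[6..8]          # => " 10" (the fixed slice grabs the wrong characters)
p selling_rate("1000 / 1005")  # => "1005"
```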
`@formattedrate` is an instance variable, which makes it available in the view we’ll create later. Lastly, the controller renders the `home` template in the `parallel` views folder instead of the default template, which would be named after the action (`parallelrate`).
Now open `bdc_controller.rb` and put the code below into it. It’s essentially the same, apart from the action name and the page being scraped.
Now we have all our controllers wired up. Let’s work on the views and styles.
Open `app/assets/stylesheets/application.css` and paste the code below into it to add some styles.
Next up are the views. In our controllers, we specified the template views we want to use, but they don’t exist yet, so navigate to `app/views/bdc` and `app/views/parallel` and create a `home.html.erb` in each folder. For the parallel rate view, paste the code below into `home.html.erb`.
The `.erb` (Embedded Ruby) extension means the view is processed with Ruby’s ERB templating, so we can embed Ruby in our HTML with tags like `<%= @somevariable %>`. You can read more here.
Here we use the `@formattedrate` variable set in the controller, which contains the rate we scraped.
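To see the idea outside Rails, here is a rough sketch using Ruby’s standard `erb` library. `FakeView` and the template markup are hypothetical. Rails’ ActionView does much more behind the scenes, but it likewise makes a controller’s instance variables available to the template, which is what this stand-in imitates by evaluating the template against an object’s binding.

```ruby
require 'erb'

# Hypothetical stand-in for a Rails view: it holds the instance variable
# a controller would set, and evaluates a template against its own binding.
class FakeView
  def initialize(formattedrate)
    @formattedrate = formattedrate
  end

  def render(template)
    ERB.new(template).result(binding)
  end
end

template = '<p class="rate">1 USD = <%= @formattedrate %> NGN</p>'
html = FakeView.new("362").render(template)

puts html  # <p class="rate">1 USD = 362 NGN</p>
```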
Do the same for `app/views/bdc/home.html.erb`, using the code below.
Next, we’ll configure the routes so the Rails app knows which controller action should handle each request from the browser. Open `config/routes.rb` and paste the code below into it.
With `root 'parallel#parallelrate'` we are telling Rails that requests to the app’s homepage should be handled by the `parallelrate` action in the `parallel` controller, and that requests to `/bdcrate` should be handled by the `bdcrate` action in the `bdc` controller.
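Going by that description, `config/routes.rb` would look roughly like the following sketch. The `get ... to:` form is one of several equivalent ways to write the second route, so the version in the repository may be spelled differently.

```ruby
Rails.application.routes.draw do
  # GET / is handled by ParallelController#parallelrate
  root 'parallel#parallelrate'

  # GET /bdcrate is handled by BdcController#bdcrate
  get '/bdcrate', to: 'bdc#bdcrate'
end
```

Running `rails routes` afterwards lists both routes, which is a quick way to confirm they point at the intended actions.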
Now we can finally run `rails server` in the terminal. The app should be working, and you should see something like this below.
Now that we are sure everything works, you can go ahead and deploy it to Heroku or DigitalOcean to take the web app online.
If you want all the code above in one place, you can find it in this GitHub repository.