It is not uncommon for websites to provide APIs for retrieving their data. In fact, Facebook, Google, LinkedIn, and Twitter are just a few of the established websites that offer this service.
If you are looking to get data from a website or web page that doesn’t have a scraping API, this brief how-to is for you. Many online tools perform this task these days, some paid, some free.
I will be focusing on a mainly free tool called Web Scraper. Web Scraper is a Chrome extension that is well suited to scraping data off web pages.
Downloading the Web Scraper extension
If you do not already have this extension, you can get it by searching Google for “web scraper tool” or by searching for Web Scraper in the Chrome Web Store. Once installed, the extension icon is displayed in the top-right corner of your Chrome browser.
A step-by-step process for web scraping with Web Scraper
To scrape data with this extension, follow these steps:
Step1:- Visit the website you wish to scrape data from.
Step2:- Right-click on any blank space on the web page and select the Inspect option. A new pane opens at the bottom or at the right side of the window.
Step3:- In the new pane, click the “Web Scraper” menu and navigate to “Create new sitemap” > “Create sitemap”.
Step4:- Fill in the text boxes for “sitemap” and “start URL”. It is best to copy the URL from the address bar at the top of your browser. The “+” button to the right of the start URL text box can be used to add multiple URLs.
Step5:- Next, click on “Add new selector”.
Step6:- Next, enter a name in the “id” field. Then, under the “type” drop-down list, choose the type of data you intend to scrape; in this example, I choose a link. Check the “Multiple” checkbox. Click on the “Select” button and select three of the items you want to scrape; other items of the same type will be selected automatically. Check the checkbox beside the “done selecting” button and click “done selecting”. Click “save selector”.
Step7:- After step 6, click on “Add new selector”. In this example, I will click on the name of a therapist whose details I want to scrape; this navigates my browser to the page with the therapist’s details.
Step8:- Name the “id” field and choose a type (if it is the name you need to scrape, for example, leave it as Text). Click “select” and, on the web page, select the item needed. Click “done selecting”. Click “save selector”.
Step9:- Repeat steps 7 and 8 for all the data you need from each object, in this case, the therapist. (You could repeat them for a phone number or for a website.)
Step10:- The steps above ensure that every item of the chosen type on that URL is scraped. Very often, however, lists run across multiple web pages. To ensure the same format is followed when scraping these other pages, click “sitemaps”.
Step11:- Click on the URL.
Step12:- Fill in the “id” field, change the “type” to link, check the “Multiple” checkbox, and click “select”. Next, select the pages you want the scrape to cover. Once you select the first two, the other pages are selected automatically. Check the checkbox beside “done selecting” and click “done selecting”. Next, click “save selector”.
Step13:- Next, navigate to sitemap > Scrape > start scrape. A pop-up window appears and the scraping starts. Depending on the amount of data, this may take a while.
Step14:- After scraping is done, navigate to sitemap > “Export sitemap” or “Export data as CSV” > download.
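For readers curious about what the extension stores behind the steps above, here is a minimal Python sketch of a sitemap with the same shape: a pagination link, a link to each therapist, and a text field read from the therapist's page. The key names (`_id`, `startUrl`, `selectors`, `parentSelectors`) and the type names (`SelectorLink`, `SelectorText`) reflect my understanding of what “Export sitemap” produces, so treat them as assumptions and compare them with your own export. The URL, the selector ids, and the CSS selectors are hypothetical placeholders.

```python
import json


def build_sitemap(sitemap_id, start_urls, pagination_css, item_css, name_css):
    """Return a dict shaped like an exported sitemap (assumed structure)."""
    return {
        "_id": sitemap_id,
        "startUrl": list(start_urls),
        "selectors": [
            {
                # Step 12: the link selector that walks the result pages.
                # Listing itself as a parent lets it follow page after page.
                "id": "pagination",
                "type": "SelectorLink",
                "parentSelectors": ["_root", "pagination"],
                "selector": pagination_css,
                "multiple": True,
            },
            {
                # Step 6: the link to each therapist, found on every page.
                "id": "therapist_link",
                "type": "SelectorLink",
                "parentSelectors": ["_root", "pagination"],
                "selector": item_css,
                "multiple": True,
            },
            {
                # Steps 7 and 8: a text field read from the detail page.
                "id": "name",
                "type": "SelectorText",
                "parentSelectors": ["therapist_link"],
                "selector": name_css,
                "multiple": False,
            },
        ],
    }


# Hypothetical values, used only for illustration.
sitemap = build_sitemap(
    sitemap_id="therapists",
    start_urls=["https://example.com/therapists"],
    pagination_css="a.page-link",
    item_css="a.therapist",
    name_css="h1",
)
print(json.dumps(sitemap, indent=2))
```

Each extra field from step 9 (a phone number or a website, say) would be one more entry in `selectors` with `therapist_link` as its parent.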
Once these steps are followed, you will have your data scraped from the website or web pages. This is one of many scraping options, but in my experience it is very efficient and mainly free.
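Once you have the CSV file, a little tidying is often useful. Below is a minimal Python sketch, using only the standard library, that drops rows with an empty name and removes duplicates. The column names `name` and `phone` are assumptions that stand in for whatever ids you gave your selectors, and the sample rows are invented for illustration.

```python
import csv
import io


def load_rows(handle, required="name"):
    """Read CSV rows, skipping blanks and duplicates in the required column."""
    rows = []
    seen = set()
    for row in csv.DictReader(handle):
        value = (row.get(required) or "").strip()
        if not value or value in seen:
            continue  # skip rows with no value, or a value already kept
        seen.add(value)
        rows.append(row)
    return rows


# Invented sample standing in for an exported file.
SAMPLE = (
    "name,phone\n"
    "A. Example,555-0100\n"
    ",555-0101\n"
    "A. Example,555-0100\n"
    "B. Example,555-0102\n"
)

rows = load_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["name"], row["phone"])
```

To use it on a real export, pass an open file instead of the sample, for example `open("therapists.csv", newline="", encoding="utf-8")`, where the file name is again a placeholder.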