The Problem
Two days ago, a client came to me with a problem. They wanted to pull some specific data from a third-party provider. I typically recommend using the third party’s official API to ingest data. In this case, however, the third party was still in the process of adding this data to its API, and it would not be available for at least a few months. That is good news in the long run, but it does not help the client put together a report this week. So they called me in to see whether I could scrape the data directly from the third party’s web portal using Python and Selenium.
The Solution
The short answer is: yes, you can absolutely do this! The real question is how long it will take to build. If building the web scraper takes 10 hours and you only need to pull the data once, then compiling it manually may be more efficient. If building it takes 10 hours but you need to pull the data once per week for six months, then it is almost certainly worth the time investment.
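One rough way to frame that trade-off is a break-even count: how many pulls it takes before the build time is paid back. The sketch below uses the 10-hour build estimate from the scenario above; the 1.5 hours of manual effort per pull is purely an assumption for illustration, so plug in your own numbers.

```python
import math


def break_even_pulls(build_hours, manual_hours_per_pull, automated_hours_per_pull=0.0):
    """Smallest number of pulls at which automating costs no more total time than manual work."""
    saved_per_pull = manual_hours_per_pull - automated_hours_per_pull
    if saved_per_pull <= 0:
        raise ValueError("automation must save time on each pull")
    return math.ceil(build_hours / saved_per_pull)


BUILD_HOURS = 10     # build estimate from the scenario above
MANUAL_HOURS = 1.5   # assumption: time to compile the data by hand once
WEEKLY_PULLS = 26    # roughly six months of weekly pulls

needed = break_even_pulls(BUILD_HOURS, MANUAL_HOURS)
print(f"Automation pays off after {needed} pulls")
print("Worth automating a one-off pull:", 1 >= needed)
print("Worth automating weekly pulls for six months:", WEEKLY_PULLS >= needed)
```

With these assumed numbers, a one-off pull does not justify the build, while six months of weekly pulls clears the break-even point comfortably.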
Part 1: Scraping Data from the HTML
By inspecting the page
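As a minimal sketch of this kind of scrape, assume that inspecting the page shows the data sitting in an ordinary HTML table. The URL and the `table#report-data` selector below are hypothetical placeholders, not the client's actual portal, and any login step is left out. The helper only relies on Selenium's `find_element` and `find_elements` calls.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def scrape_table(driver, table_selector):
    """Return each non-empty row of the matched table as a list of cell strings."""
    table = driver.find_element(CSS, table_selector)
    rows = []
    for tr in table.find_elements(CSS, "tr"):
        cells = tr.find_elements(CSS, "th, td")
        if cells:
            rows.append([cell.text.strip() for cell in cells])
    return rows


def scrape_portal(url, table_selector):
    """Open the page in Chrome, scrape one table, and always close the browser."""
    # Imported here so scrape_table can be reused or tested without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(url)
        return scrape_table(driver, table_selector)
    finally:
        driver.quit()


# Hypothetical usage:
# rows = scrape_portal("https://portal.example.com/report", "table#report-data")
```

Keeping the row extraction separate from the browser setup makes it easy to point the same helper at a different table once you have found the right selector in the browser's developer tools.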
Part 2: Scraping Data with JavaScript
Using the Highcharts API
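When the numbers live inside a chart rather than in the HTML, one option is to ask the page's own JavaScript for them. The sketch below assumes the portal exposes the global `Highcharts` object and that each series keeps its points in `series.data`; on some pages (for example with very large datasets) that array may be empty, so treat this as a starting point rather than a guaranteed recipe. The function and field names on the Python side are illustrative.

```python
# JavaScript run inside the page via Selenium's execute_script.
EXTRACT_SERIES_JS = """
return Highcharts.charts
    .filter(function (chart) { return chart; })  // destroyed charts leave empty slots
    .map(function (chart) {
        return chart.series.map(function (s) {
            return {
                name: s.name,
                points: s.data
                    .filter(function (p) { return p; })
                    .map(function (p) { return [p.x, p.y]; })
            };
        });
    });
"""


def extract_chart_series(driver):
    """Return one list of {name, points} dicts per Highcharts chart on the current page."""
    return driver.execute_script(EXTRACT_SERIES_JS) or []


def series_to_rows(charts):
    """Flatten the nested chart/series structure into one flat record per point."""
    rows = []
    for chart_index, series_list in enumerate(charts):
        for series in series_list:
            for x, y in series["points"]:
                rows.append({"chart": chart_index, "series": series["name"], "x": x, "y": y})
    return rows


# Hypothetical usage, given a Selenium driver already on the chart page:
# rows = series_to_rows(extract_chart_series(driver))
```

Returning plain lists and dicts from the script keeps the result JSON-friendly, so Selenium can hand it back to Python without any extra parsing.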