Sometimes, you may need to scrape some structured data from the internet with Excel. Honestly, Excel. Honestly, Excel should be your last weapon of choice there, as far as it is really not a web-scraping software. Still, it is quite possible, if you give it a try.
What you need is some understanding about how HTML works, its structure and some advanced VBA skills. Let’s say you would like to the title and the link of the VBA books on Amazon.com. Then, you open Amazon.com with a search in it like this one:
https://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=vba
and start examining the browser’s elements with F12. After some time, you realize that the element you need are the following 2 classes:
- s-results-list-atf
- a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal
How do you realize this? By moving your mouse to the elements and seeing what is being highlighted (I use Google Chrome for this). Then the code, which prints the books’ titles and their links is quite short:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
Sub TestMe() Dim appIE As Object Set appIE = CreateObject("InternetExplorer.Application") With appIE .Navigate "https://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=vba" .Visible = True End With Do While appIE.Busy DoEvents Loop Dim allData As Object Set allData = appIE.document.getElementById("s-results-list-atf") Dim book As Object For Each book In allData.getElementsByClassName("a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal") Debug.Print book.innerText Debug.Print book.Href Debug.Print vbCrLf Next End Sub |
Yup, it prints everything in the immediate window:
Seems easy and nice. Then why is not VBA a good tool for scraping?
Well, if one decides to prints the prices of the books, an additional search for the prices and their class should be carried out. Additionally, not all books in Amazon have all prices – e.g. some are not available on Kindle and some are only on Kindle. This means, that a simple loop through the prices should be a bit meaningless, they should be mapped to the books. Which is possible, but takes some time to be written. Then another problem pops up – amazon has many pages of results:
which is quite tough – you have to think about some logic that clicks on the pages and scrapes all the data again. This is not rocket science, but still requires some time. And probably some additional code, which would be hardly reusable anywhere else. Thus, one may really say that there should be other, better ways to do it. The inspiration for this article came to me after studying for UiPath certification – I have noticed that with that software this task is literally a 5 minute job with no coding required. Thus, I have decided to take a look how quickly I can do something like that in VBA. Well, it was quick (I would not lie), but I was quite picky when choosing the task – I did not write any prices nor additional information and I have decided not automate the Next clicking.
So, take home message from this article – there is an automation software on the market which does not require any programming for standardized tasks like the one above.