Although there are plenty of other tools for extracting data from a website (take a look at Beautiful Soup), VBA is somehow good, because … well, because it is somehow challenging to do it every time. Yesterday, I answered a question in StackOverflow about extracting financial data, and today I have considered to rewrite the code a bit and write an article about it. Fun, eh?
The idea:
- Extract all the data for Brent Crude Oil(ICE) from the table in the site mrci.com/ohlc/ohlc-all.php.
- Write it to Excel
This is the result in Excel, scraping the shown data:
So, how did I manage to do the “magic”? If you take a look at the Main() function, it is somehow self-explanatory:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
Sub Main() Dim url As String: url = "https://www.mrci.com/ohlc/ohlc-all.php" Dim target As String: target = "Brent Crude Oil(ICE)" Dim appIE As Object Set appIE = StartIE(url, appIE) Dim allRowsOfData As Variant allRowsOfData = appIE.document.getElementsByClassName("strat") Dim headers As Variant headers = GenerateHeaders(allRowsOfData) Dim title As String title = GenerateTitle(appIE) Dim targetPeriods As Object Set targetPeriods = CreateObject("Scripting.Dictionary") Set targetPeriods = GenerateTargetPeriods(allRowsOfData, headers, target, targetPeriods) WriteToExcel targetPeriods, headers, target, title appIE.Quit End Sub |
At first, the website is loaded. Then the headers are recorded and written to an array called headers. Title is generated. A dictionary titlePeriods is filled with data. At the end, the data is written to Excel. Now let’s start function by function.
StartIE
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
Public Function StartIE(url As String, appIE As Object) As Object Set appIE = CreateObject("InternetExplorer.Application") With appIE .Navigate url .Visible = True End With WaitSomeMilliseconds 2000 Do While appIE.Busy: DoEvents: Loop Set StartIE = appIE End Function |
The idea is to keep away some lines from the Main() function and to start the Internet Explorer separately. WaitSomeMilliseconds 2000 is somehow brutal, but the DoEvents sometimes runs quite fast and appIE is not even starting to be “Busy”.
GenerateHeaders and GenerateTitle
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
Public Function GenerateHeaders(allRowsOfData As Variant) As Variant Dim headers As Variant headers = Split(allRowsOfData.Rows(2).innerText, vbCrLf) headers = RemoveEmptyElementsFromArray(headers) GenerateHeaders = headers End Function Public Function GenerateTitle(appIE As Object) As String Dim titleObj As Variant titleObj = appIE.document.getElementsByClassName("title1") GenerateTitle = titleObj.innerText End Function |
Here we get the headers and the title for the report. The headers are the second row from the table:
After getting them, we split them by new line (because they come into different lines from .innerText property). Cleaning and removing the empty values in the array is done with RemoveEmptyElementsFromArray(headers). Concerning the title, it is quite self-explanatory – “title1” is the class name of the element and its innerText is “Daily Future Price Listing Fri November 29, 2019”.
GenerateTargetPeriods
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
Public Function GenerateTargetPeriods(allRowsOfData As Variant, headers As Variant, target As String, targetPeriods As Object) As Variant Dim child As Variant Dim child2 As Variant Dim myKey As Variant Dim i As Long Dim found As Boolean: found = False For Each child In allRowsOfData.Children For Each child2 In child.Children If InStr(1, child2.innerText, target) Then found = True If InStr(1, child2.outerhtml, "th class=") And (Not InStr(1, child2.outerhtml, target) > 0) Then found = False End If If found And child2.Cells.Length = UBound(headers) + 1 Then myKey = target & child2.Children(0).innerText targetPeriods.Add myKey, New Collection For i = LBound(headers) To UBound(headers) targetPeriods(myKey).Add (child2.Children(i).innerText) Debug.Print child2.Children(i).innerText Next End If Next Next Set GenerateTargetPeriods = targetPeriods End Function |
Generating the target periods is the “meat” (I am not vegetarian) of the code. It returns a dictionary with key string and value list. In the key, I am writing the target and the first cell of the row:
Once it is generated, we return to writing the data to Excel.
WriteToExcel
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
Public Sub WriteToExcel(targetPeriods As Object, headers As Variant, target As String, title As String) Dim row As Long: row = 1 Dim col As Long: col = 1 Dim element As Variant Dim unit As Long Dim child As Variant With Worksheets(1) .Cells.Delete .Cells(row, col) = title row = row + 1 .Cells(row, col) = target row = row + 2 For Each element In headers .Cells(row, col) = element col = col + 1 Next row = row + 1 For Each child In targetPeriods col = 1 For unit = 1 To targetPeriods(child).Count .Cells(row, col) = targetPeriods(child)(unit) col = col + 1 Next row = row + 1 Next child Dim startingCol As String Dim endingCol As String startingCol = NumberToLetters(2) endingCol = NumberToLetters(UBound(headers) + 1) .Columns(startingCol & ":" & endingCol).EntireColumn.AutoFit End With End Sub |
Writing to Excel is nothing advanced – we loop through the worksheet, taking into account its rows and columns, incrementing the columns and the rows, whenever we need to. Probably it is a bit tricky at the beginning, as far as the columns should always start from 1 on the new row and the rows should be always incremented, but after some time it becomes clear.
Auxiliary Functions
-
WaitSomeMilliseconds
This is an old function, that I use for “asking” Excel to freeze for a couple of seconds, to compensate for some slow internet connection:
1 2 3 4 5 |
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr) Public Sub WaitSomeMilliseconds(Optional Milliseconds As Long = 1000) Sleep Milliseconds End Sub |
-
NumberToLetters
This one is used in the column autofit of the WriteToExcel function – .Columns(startingCol & ":" & endingCol).EntireColumn.AutoFit
1 2 3 |
Public Function NumberToLetters(number As Long) As String NumberToLetters = Split(Cells(1, number).Address, "$")(1) End Function |
-
RemoveEmptyElementsFromArray
This is an interesting function, which removes the empty elements of an array. And it does it so nicely, I will add it to my VBA_personal repo at GitHub:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
Public Function RemoveEmptyElementsFromArray(myArray As Variant) As Variant Dim i As Long, j As Long ReDim newArray(LBound(myArray) To UBound(myArray)) For i = LBound(myArray) To UBound(myArray) If Trim(myArray(i)) <> "" Then j = j + 1 newArray(j) = myArray(i) End If Next i ReDim Preserve newArray(LBound(myArray) To j - 1) RemoveEmptyElementsFromArray = newArray End Function |
Pretty much that’s it. The whole code is available in GitHub here:
https://github.com/Vitosh/VBA/tree/master/ScrapingDataFromWebsiteWithVBA
Enjoy it!