VBA – Extracting financial data from a website in table format

Although there are plenty of other tools for extracting data from a website (take a look at Beautiful Soup), VBA is somehow good, because … well, because it is somehow challenging to do it every time. Yesterday, I answered a question in StackOverflow about extracting financial data, and today I have considered to rewrite the code a bit and write an article about it. Fun, eh?

The idea:

  • Extract all the data for Brent Crude Oil(ICE) from the table in the site mrci.com/ohlc/ohlc-all.php.
  • Write it to Excel

This is the result in Excel, scraping the shown data:


So, how did I manage to do the “magic”? If you take a look at the Main() function, it is somehow self-explanatory:

Sub Main()
    
    Dim url As String: url = "https://www.mrci.com/ohlc/ohlc-all.php"
    Dim target As String: target = "Brent Crude Oil(ICE)"
    
    Dim appIE As Object
    Set appIE = StartIE(url, appIE)

    Dim allRowsOfData As Variant
    allRowsOfData = appIE.document.getElementsByClassName("strat")
    
    Dim headers As Variant
    headers = GenerateHeaders(allRowsOfData)
    
    Dim title As String
    title = GenerateTitle(appIE)
    
    Dim targetPeriods As Object
    Set targetPeriods = CreateObject("Scripting.Dictionary")
    Set targetPeriods = GenerateTargetPeriods(allRowsOfData, headers, target, targetPeriods)
    
    WriteToExcel targetPeriods, headers, target, title
    
    appIE.Quit

End Sub

At first, the website is loaded. Then the headers are recorded and written to an array called headers. Title is generated. A dictionary titlePeriods is filled with data. At the end, the data is written to Excel. Now let’s start function by function.

StartIE

Public Function StartIE(url As String, appIE As Object) As Object
    
    Set appIE = CreateObject("InternetExplorer.Application")
    With appIE
        .Navigate url
        .Visible = True
    End With

    WaitSomeMilliseconds 2000
    Do While appIE.Busy: DoEvents: Loop
    
    Set StartIE = appIE
    
End Function

The idea is to keep away some lines from the Main() function and to start the Internet Explorer separately. WaitSomeMilliseconds 2000 is somehow brutal, but the DoEvents  sometimes runs quite fast and appIE is not even starting to be “Busy”.

GenerateHeaders and GenerateTitle

Public Function GenerateHeaders(allRowsOfData As Variant) As Variant

    Dim headers As Variant
    headers = Split(allRowsOfData.Rows(2).innerText, vbCrLf)
    headers = RemoveEmptyElementsFromArray(headers)
    GenerateHeaders = headers
    
End Function

Public Function GenerateTitle(appIE As Object) As String
    
    Dim titleObj As Variant
    titleObj = appIE.document.getElementsByClassName("title1")
    GenerateTitle = titleObj.innerText
    
End Function

Here we get the headers and the title for the report. The headers are the second row from the table:

After getting them, we split them by new line (because they come into different lines from .innerText property). Cleaning and removing the empty values in the array is done with RemoveEmptyElementsFromArray(headers). Concerning the title, it is quite self-explanatory – “title1” is the class name of the element and its innerText is “Daily Future Price Listing Fri November 29, 2019”.

GenerateTargetPeriods

Public Function GenerateTargetPeriods(allRowsOfData As Variant, headers As Variant, target As String, targetPeriods As Object) As Variant

    Dim child As Variant
    Dim child2 As Variant
    Dim myKey As Variant
    Dim i As Long
    Dim found As Boolean: found = False
    
    For Each child In allRowsOfData.Children
        For Each child2 In child.Children
        
            If InStr(1, child2.innerText, target) Then found = True
            If InStr(1, child2.outerhtml, "th class=") And (Not InStr(1, child2.outerhtml, target) > 0) Then
                found = False
            End If
            
            If found And child2.Cells.Length = UBound(headers) + 1 Then
                
                myKey = target & child2.Children(0).innerText
                targetPeriods.Add myKey, New Collection
                
                For i = LBound(headers) To UBound(headers)
                    targetPeriods(myKey).Add (child2.Children(i).innerText)
                    Debug.Print child2.Children(i).innerText
                Next
                
            End If
        Next
    Next
    
    Set GenerateTargetPeriods = targetPeriods

End Function

Generating the target periods is the “meat” (I am not vegetarian) of the code. It returns a dictionary with key string and value list. In the key, I am writing the target and the first cell of the row:

Once it is generated, we return to writing the data to Excel.

WriteToExcel

Public Sub WriteToExcel(targetPeriods As Object, headers As Variant, target As String, title As String)
    
    Dim row As Long: row = 1
    Dim col As Long: col = 1
    Dim element As Variant
    Dim unit As Long
    Dim child As Variant
    
    With Worksheets(1)
        .Cells.Delete
        .Cells(row, col) = title
        row = row + 1
        .Cells(row, col) = target
        row = row + 2
        
        For Each element In headers
            .Cells(row, col) = element
            col = col + 1
        Next
        row = row + 1
        
        For Each child In targetPeriods
            col = 1
            For unit = 1 To targetPeriods(child).Count
                .Cells(row, col) = targetPeriods(child)(unit)
                col = col + 1
            Next
            row = row + 1
        Next child
        
        Dim startingCol As String
        Dim endingCol As String
        startingCol = NumberToLetters(2)
        endingCol = NumberToLetters(UBound(headers) + 1)
        
        .Columns(startingCol & ":" & endingCol).EntireColumn.AutoFit
    End With
    
End Sub

Writing to Excel is nothing advanced – we loop through the worksheet, taking into account its rows and columns, incrementing the columns and the rows, whenever we need to. Probably it is a bit tricky at the beginning, as far as the columns should always start from 1 on the new row and the rows should be always incremented, but after some time it becomes clear.

Auxiliary Functions

  • WaitSomeMilliseconds

This is an old function, that I use for “asking” Excel to freeze for a couple of seconds, to compensate for some slow internet connection:

Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr)

Public Sub WaitSomeMilliseconds(Optional Milliseconds As Long = 1000)
    Sleep Milliseconds
End Sub
  • NumberToLetters

This one is used in the column autofit of the WriteToExcel  function – .Columns(startingCol & “:” & endingCol).EntireColumn.AutoFit

Public Function NumberToLetters(number As Long) As String
    NumberToLetters = Split(Cells(1, number).Address, "$")(1)
End Function
  • RemoveEmptyElementsFromArray

This is an interesting function, which removes the empty elements of an array. And it does it so nicely, I will add it to my VBA_personal repo at GitHub:

Public Function RemoveEmptyElementsFromArray(myArray As Variant) As Variant
    
    Dim i As Long, j As Long
    ReDim newArray(LBound(myArray) To UBound(myArray))
    
    For i = LBound(myArray) To UBound(myArray)
        If Trim(myArray(i)) <> "" Then
            j = j + 1
            newArray(j) = myArray(i)
        End If
    Next i
    
    ReDim Preserve newArray(LBound(myArray) To j - 1)
    RemoveEmptyElementsFromArray = newArray
    
End Function

Pretty much that’s it. The whole code is available in GitHub here:

https://github.com/Vitosh/VBA/tree/master/ScrapingDataFromWebsiteWithVBA

Enjoy it!