Long time ago, an astronomer called Benford was taking a look at logarithm books, noticing that the earlier pages (starting with 1) are pretty much more worn out than the later ones. He thought about the result, and managed to come up with a law, that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.
What does this mean? Pretty much, that if you randomly choose 100 numbers from 0 to +infinity, about there is about 0.30 probability, that the leading digit of the number you have chosen is 1 and and 0.176 that the leading digit is 2. These probabilities are quite higher than the 0.11, which one would expect, dividing 1/9. Thus, the formula for the first digit could be calculated like this in the VBEditor immediate window for the digit “3”:
1 2 3 |
digit = 3 ?Round(WorksheetFunction.Log10(1 + 1 / digit), 3) 0,125 |
Why is that actually important and where is it applicable? It works quite perfectly for detecting fake numbers and financial frauds. In general, if someone decides to fabricate numbers in a report, then the first digits are distributed quite uniformly and a comparison with the Benford’s expected distribution shows the fraud. It seems like a joke or fake news, but in a lot of countries this is one of the “weapons” of the financial authorities, in order to locate financial frauds. In US, evidence based on Benford’s law is admitted in court. Thus, you better not fabricate numbers in reports.
The task
I have decided to make a small example of the Benford’s law in VBA, gathering numeric data from Excel range and checking its first digits, and comparing their distribution against the expected one. The final report looks like this:
The code in a module
The code in a module consists of 3 Subs, MainBenfordCheck being the starting point of the program. CreateLogFile generates the report and CodifyTime makes sure that the report always has a unique name, thus one less thing to take care of:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 |
Public Sub MainBenfordCheck(myRange As Range) Dim myCell As Range Dim benford As New BenfordModel For Each myCell In myRange If IsNumeric(myCell) Then benford.IncrementValue Abs(myCell.value) benford.IncrementCount End If Next myCell CreateLogFile benford.CreateBenfordLawReport End Sub Public Sub CreateLogFile(Optional report As String) On Error GoTo CreateLogFile_Error Dim newFilePath As String newFilePath = "\tests_info" Dim filename As String filename = ThisWorkbook.Path & newFilePath & CodifyTime(True) If Dir(ThisWorkbook.Path & newFilePath, vbDirectory) = vbNullString Then MkDir ThisWorkbook.Path & newFilePath Dim fs As Object Set fs = CreateObject("Scripting.FileSystemObject") Dim notepad As Object Set notepad = fs.CreateTextFile(filename, True) Dim header As String header = Now & vbCrLf & "Created by: " & Environ("USERNAME") notepad.WriteLine header notepad.WriteLine report notepad.Close Dim shellCommand As String shellCommand = "C:\WINDOWS\notepad.exe " shellCommand = shellCommand & filename Shell shellCommand On Error GoTo 0 Exit Sub CreateLogFile_Error: Debug.Print "Error " & Err.Number & " (" & Err.Description & ") in procedure CreateLogFile of Sub mod_TDD_Export" End Sub Public Function CodifyTime(Optional makePath As Boolean = False) As String On Error GoTo codify_Error Dim timePart01 As Double Dim timePart02 As Double Dim timePartNow As Double timePartNow = Round(Now(), 8) timePart01 = Split(CStr(timePartNow), ",")(0) timePart02 = Split(CStr(timePartNow), ",")(1) CodifyTime = Hex(timePart01) & "_" & Hex(timePart02) If makePath Then CodifyTime = "\" & CodifyTime & ".txt" On Error GoTo 0 Exit Function codify_Error: MsgBox "Error " & Err.Number & " (" & Err.Description & ") in procedure codify of Function TDD_Export" End Function |
The code in class “BenfordModel”
The class is responsible for processing the model and return the expected distribution of the first significant digit. Pretty much, it gathers the digits, assigns them to a benfordCheckValues array and at the end calculates the percentage vs the expected percentage.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 |
Private benfordCheckValues As Variant Private benfordCount As Long Sub Class_Initialize() Dim counter As Long ReDim benfordCheckValues(9) For counter = LBound(benfordCheckValues) To UBound(benfordCheckValues) benfordCheckValues(counter) = 0 Next counter End Sub Function InitialValuesBenford(val As Long) As Double '1 = "30,1%" '2 = "17,6%" '3 = "12,5%" '4 = " 9,7%" '5 = " 7,9%" '6 = " 6,7%" '7 = " 5,8%" '8 = " 5,1%" '9 = " 4,6%" InitialValuesBenford = Round(WorksheetFunction.Log10(1 + 1 / val), 3) End Function Function PercentageFixer(valToReturn As Double) As String If valToReturn > 0.1 Then PercentageFixer = Trim(Format(valToReturn, "##.0%")) ElseIf valToReturn = 0 Then PercentageFixer = " " & Format(valToReturn, "0.0%") Else PercentageFixer = " " & Format(valToReturn, "#.0%") End If End Function Function CreateBenfordLawReport() As String Dim line As String: line = "---------------------------------" On Error GoTo CreateBenfordLawReport_Error Dim counter As Long CreateBenfordLawReport = line & line & line & vbCrLf _ & line & line & line & vbCrLf _ & line & line & line & vbCrLf _ & "Benford's Law" & vbCrLf & "https://en.wikipedia.org/wiki/Benford%27s_law" & vbCrLf For counter = LBound(CheckValues) To UBound(CheckValues) If counter = 0 Then Dim header As String header = CreateBenfordLawReport & vbCrLf & "#" & vbTab & _ "-> " & "Val." & vbTab & "Real%" & vbTab & "Expected" CreateBenfordLawReport = header Else CreateBenfordLawReport = CreateBenfordLawReport & vbCrLf & counter & vbTab & _ "-> " & CheckValues(counter) & vbTab & _ PercentageFixer(Round(CheckValues(counter) / Me.Count, 3)) & vbTab & _ PercentageFixer(InitialValuesBenford(counter)) & vbTab & "|" End If If counter = 0 Or counter = 9 Then CreateBenfordLawReport = CreateBenfordLawReport & vbCrLf & line End If Next counter On Error GoTo 0 Exit Function CreateBenfordLawReport_Error: CreateBenfordLawReport = "Not enough data..." End Function Property Get CheckValues() As Variant CheckValues = benfordCheckValues End Property Property Get Count() As Long Count = benfordCount End Property Sub IncrementCount() benfordCount = benfordCount + 1 End Sub Sub IncrementValue(valToInput As Variant) Dim leftDigit As Variant leftDigit = Left(valToInput, 1) benfordCheckValues(leftDigit) = benfordCheckValues(leftDigit) + 1 End Sub |
The result
If we use some natural numbers like all points of the ranked table tennis man (source ) we may try to see how good the Benford’s law works there.
After running the model on all 1222 ranked players, who have at least 1 point for the table tennis ranking, this is the result we get:
The real percentages are deviating to the expected ones, but still the results are not 11,1% for any digit, nor we get 57% anywhere as in the randomly generated values by Excel from the first report.
The code is available in GitHub, including the table tennis data.