VBA – Benford Analysis (First-Digit Law)

Long time ago, an astronomer called Benford was taking a look at logarithm books, noticing that the earlier pages (starting with 1) are pretty much more worn out than the later ones. He thought about the result, and managed to come up with a law, that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.

What does this mean? Pretty much, that if you randomly choose 100 numbers from 0 to +infinity, about there is about 0.30 probability, that the leading digit of the number you have chosen is 1 and and 0.176 that the leading digit is 2. These probabilities are quite higher than the 0.11, which one would expect, dividing 1/9. Thus, the formula for the first digit could be calculated like this in the VBEditor immediate window for the digit “3”:

Why is that actually important and where is it applicable? It works quite perfectly for detecting fake numbers and financial frauds. In general, if someone decides to fabricate numbers in a report, then the first digits are distributed quite uniformly and a comparison with the Benford’s expected distribution shows the fraud. It seems like a joke or fake news, but in a lot of countries this is one of the “weapons” of the financial authorities, in order to locate financial frauds. In US, evidence based on Benford’s law is admitted in court. Thus, you better not fabricate numbers in reports.

The task

I have decided to make a small example of the Benford’s law in VBA, gathering numeric data from Excel range and checking its first digits, and comparing their distribution against the expected one. The final report looks like this:

The report is in Notepad, the input is of random numbers, thus Expected and Real percentages deviate highly

The code in a module

The code in a module consists of 3 Subs, MainBenfordCheck being the starting point of the program. CreateLogFile generates the report and CodifyTime makes sure that the report always has a unique name, thus one less thing to take care of:

 

The code in class “BenfordModel”

The class is responsible for processing the model and return the expected distribution of the first significant digit. Pretty much, it gathers the digits, assigns them to a benfordCheckValues array and at the end calculates the percentage vs the expected percentage.

 

The result

If we use some natural numbers like all points of the ranked table tennis man (source ) we may try to see how good the Benford’s law works there.

The ranking in table tennis is quite different than the one in court tennis, actually

After running the model on all 1222 ranked players, who have at least 1 point for the table tennis ranking, this is the result we get:

The number of points, starting with digit “1” is 413

The real percentages are deviating to the expected ones, but still the results are not 11,1% for any digit, nor we get 57% anywhere as in the randomly generated values by Excel from the first report.

The code is available in GitHub, including the table tennis data.

Tagged with: , ,