Background and Aims

Data can have characteristics that surprise us. One such surprise is that, across numbers representing many types of data, the first digit has a systematic and predictable distribution. In particular, it follows a logarithmic distribution such that about 30% of the time the first digit is 1, about 18% of the time it is 2, and so on down to about 5% for the digit 9. This was first shown by the engineer Frank Benford, so this first-digit distribution is known as Benford’s law. Since he demonstrated it in Benford (1938), it has been shown to hold for a large amount of data.
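
The percentages quoted above follow from the standard statement of Benford’s law, in which the probability of a leading digit d is log10(1 + 1/d). The short sketch below simply tabulates that formula; the function name `benford_probability` is an illustrative choice, not something taken from Benford (1938) or Nigrini (1999).

```python
import math


def benford_probability(d: int) -> float:
    """Expected proportion of numbers whose first digit is d under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("the first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    # Print the expected share of each leading digit as a percentage.
    for digit in range(1, 10):
        print(digit, f"{benford_probability(digit):.1%}")
```

Because the nine probabilities are logarithms of ratios that multiply to 10, they sum to exactly 1, which is a quick sanity check on the formula.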

Nigrini (1999) reports on the use of Benford’s law as a tool for detecting fraud. If most normal financial data conform to Benford’s law, then deviation from Benford’s law could be evidence of human manipulation of the data. He presents examples of Benford’s law being used successfully to detect fraud. However, this application of Benford’s law assumes that people do not produce numbers that fit Benford’s law when they try to generate data. In the experiment we ran in ATHK1001 tutorials in Week 2, we tested this assumption by having participants generate numerical answers to questions that few knew the answer to. As well as general knowledge questions, we asked participants to estimate quantities. If these numbers follow Benford’s law, then it may be harder to use Benford’s law to detect fraud, because it would demonstrate that people may naturally conform to the law, at least under some circumstances.
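
To make "deviation from Benford’s law" concrete, the sketch below extracts the first digit of each number, tabulates the observed proportions, and summarises their distance from the expected proportions as a mean absolute deviation. This is one possible summary chosen for illustration; it is an assumption, not necessarily the measure used by Nigrini (1999) or in this experiment, and all function names are hypothetical.

```python
import math
from collections import Counter


def first_digit(x) -> int:
    """Return the leading nonzero digit of a nonzero number."""
    # The first nonzero character in the printed form is the leading digit,
    # whether the number prints as '0.00052', '314' or '1e+22'.
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no leading digit")


def first_digit_proportions(values) -> dict:
    """Observed proportion of each first digit (1 to 9) in values."""
    counts = Counter(first_digit(v) for v in values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one value is required")
    return {d: counts[d] / total for d in range(1, 10)}


def mean_abs_deviation_from_benford(values) -> float:
    """Average absolute gap between observed and Benford proportions."""
    observed = first_digit_proportions(values)
    return sum(
        abs(observed[d] - math.log10(1 + 1 / d)) for d in range(1, 10)
    ) / 9


if __name__ == "__main__":
    # Illustrative inputs only, not data from the experiment.
    powers_of_two = [2 ** k for k in range(1, 101)]
    sequential = list(range(1, 100))
    print(mean_abs_deviation_from_benford(powers_of_two))
    print(mean_abs_deviation_from_benford(sequential))
```

Smaller values indicate closer agreement with Benford’s law. The powers of two should score lower than the run of consecutive integers, whose first digits are spread evenly across 1 to 9.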

Nigrini (1999) also pointed out that not all numbers fit Benford’s law; in particular, arbitrary numbers such as receipt numbers would not be expected to. So in this experiment we tested whether people’s responses would also show a distinction between meaningful and meaningless numbers.

If people do generate data that tend to conform to Benford’s law, then there may be individuals who do so more than others, and they may be consistent across tasks. Therefore we examined whether there was a correlation between a measure of how closely individuals conformed to Benford’s law on meaningful items and the same measure on estimation items.