Talk:Benford's law/Archive 1


Deletion explained

I deleted this statement (in the restricted data set analysis): " In the case of the villages it could be applied, but expecting only first numbers to be 3, 4, 5, 6, 7, 8, 9 each getting the same relative probabilities as in the general case."

This is incorrect. In this case, the first digit is the one-digit approximation to the distribution over the restricted range. The first digit won't follow Benford's law, but will instead follow the underlying distribution, whatever it is: for example, for the villages case, it might be a normal distribution (proportional to exp(-x^2)).

Benford's law does not apply to data that span less than an order of magnitude. For example, the first digits of the heights of adult male humans in meters, or in feet, do not follow Benford's law, although the first digits of heights in mm do.

'Fraid not. The first digit of height in mm is the same as the first (non-zero) digit of the height in meters.--Henrygb 21:05, 18 October 2006 (UTC)

I removed the sentence about the spread of numbers above as the example is wrong. The reason the height of the human population does not obey Benford's law is that it is a normal distribution (as explained elsewhere in the article). If the assertion is true, then somebody will need to find a better example. 195.212.29.171 16:47, 21 August 2007 (UTC) Richard Nicholas.

First digit or all digits

I thought this law only applies to the first digit of numbers, but the article seems to suggest it works for all digits. Is this correct? AxelBoldt 11:32, 4 February 2002 (UTC)

Well, it suggests that 13 would be more common than 15, for instance, yes. That's just the same law applied to base 100. —Simetrical (talk • contribs) 02:18, 14 February 2006 (UTC)
It applies to the second digit, etc., but with a much smaller deviation from a uniform distribution. The deviation from uniform of the first digit is so profound that it is easy to statistically detect it in real data. I think you would be hard pressed to statistically detect it in the second digit in real data. Bubba73 (talk), 02:23, 14 February 2006 (UTC)
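To make the size of that difference concrete, here is a small sketch (my own addition, not part of the exchange above) that prints the Benford probabilities for the first digit alongside the standard generalized-Benford probabilities for the second digit:

```python
# Illustrative check: first-digit Benford vs. the generalized second-digit law.
import math

first = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
second = {d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
          for d in range(10)}

print("first digit :", {d: round(p, 3) for d, p in first.items()})
print("second digit:", {d: round(p, 3) for d, p in second.items()})
# First-digit probabilities run from about 0.301 down to 0.046;
# second-digit probabilities only from about 0.120 down to 0.085.
```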

Links wanted

I am pleased to see how my initial, very basic and imprecise article has in a few days evolved into something quite decent. But I think this law deserves to be much more widely known, so it would be nice if there were more links to it from other pages. At the moment there is just one! Maybe the Wikipedians who have delved more deeply into the articles about statistics and probability have an idea about where suitable links could be placed? Calypso 15:51, 25 February 2002 (UTC)

Proof vs. demonstration

"This can be proven mathematically: if one repeatedly "randomly" choses a probability distribution and then randomly choses a number according to that distribution, the resulting list of numbers will obey Benford's law."
is that proof or a demonstration? -- Tarquin 01:44, 26 June 2002

Explanations

The first two 'explanations' are patently absurd. The second shows that Benford's law is a limiting form of the zeta distribution but doesn't say why it works. The first never gives Benford's law. More precisely, as you count, the proportion of 1s increases, then the proportion of 2s until it equals the 1s, then the proportion of 3s, and so on. At each stage there are at most 3 different proportions.

As far as I can see, Hill's explanation is the most likely.--Terry Moore (218.101.112.28 (talkcontribs)) 10:17, 6 March 2004 (UTC)

Right. So if you had an infinite run of digits (1 to infinity), then Benford's law wouldn't hold - each initial digit would occur an equal number of times. But no real-life data set is infinite, so all real-life data sets have a highest (maximum) figure. Suppose that the maximum is set at random; then, because of the way the proportions work (first mostly 1s, then 2s catching up, then 3s catching up), whatever range you end up with will tend to favour 1s over 2s, 2s over 3s and so on. That's why it works with real-life data, but not, say, with the phone book. Toby W 10:25, 6 Mar 2004 (UTC)
Incorrect. Benford's law applies precisely because the set of values is unbounded and not constrained by a particular distribution; a collection of values which may be infinitely large will still follow this law. ᓛᖁ♀ 22:43, 9 November 2005 (UTC)

Not quite--there are only three possible proportions at each stage. However, you get Benford's law if you count geometrically. Suppose you invest $1 at 7% compound interest. Then your investment doubles roughly every 10 years. For the first 10 years your monthly statement will show a first digit of 1, but it goes through 2 and 3 over the next 10 years. When it reaches 5 it only takes 10 years to cycle through 5, 6, 7, 8 and 9 before getting to 1 again.--Terry Moore (192.195.12.4 (talkcontribs)) 00:08, 18 May 2004 (UTC)
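A rough sketch (my own addition, not Terry's) of this compound-interest picture: track the leading digit of a balance growing at 7% per year and count how often each digit leads. The 1000-year horizon is just an illustrative parameter.

```python
# Leading digits of a balance compounding at 7% per year (illustrative sketch).
import math
from collections import Counter

balance = 1.0
counts = Counter()
for year in range(1000):              # many decades of growth
    counts[str(balance)[0]] += 1      # leading digit of the current balance
    balance *= 1.07

total = sum(counts.values())
for d in "123456789":
    print(d, round(counts[d] / total, 3),
          "Benford:", round(math.log10(1 + 1 / int(d)), 3))
```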

Sequential access

Shyamal's recent addition: Benford was an astronomer, and it is generally believed that the law was discovered when he noticed that the early pages of the book of logarithms were more used than the later ones. However, it has been argued that any book that is sequentially accessed would show more wear and tear on the earlier pages. This story might thus be apocryphal, just like Newton's supposed discovery of gravity from observation of a falling apple.

True, any book that is sequentially accessed would show more wear and tear on the earlier pages. But isn't that the point? Logarithm tables aren't sequentially accessed, are they? You turn to the page which has the number whose log you want to know (like a dictionary); you don't read it from start to finish (like a novel). So, if you more frequently want to know log 1 or 10 or 100 than you do 9 or 99 or 999, the earlier pages will show more wear and tear than the later ones. Besides, isn't the historical question of whether the law occurred to Benford when he looked at the wear and tear on a logarithm book independent of the question of whether he was right to jump to the conclusion he did in fact jump to? Toby W 09:40, 25 Mar 2004 (UTC)
It was Simon Newcomb who was the astronomer, and when I was in graduate school in astronomy, the discovery was indeed attributed to the wear on the early pages of his log tables, for the reasons stated. Newcomb was the director of the United States Naval Observatory around the turn of the 20th century. It would not surprise me if the actual log tables still existed in the USNO archives.
Frank Benford was not an astronomer; he was a physicist and electrical engineer, according to his WikiPedia entry. I've never heard the story about the log tables attributed to him. Bill Jefferys (talk) 18:37, 27 May 2008 (UTC)
You are correct, but you are responding to a comment from four years ago - the article hasn't had those comments in a long, long time. - DavidWBrooks (talk) 18:49, 27 May 2008 (UTC)

Incorrect statement removed

I have taken out The product of n uniform random numbers will conform to Benford's law. (The sum of n uniform random numbers tends towards a normal distribution.) since it is not really true (try uniform on [0,1] and n=2, or uniform on [10,11] and n=7). In fact it tends to a log-normal distribution, and only comes close to Benford's law when the variance is large, i.e. for large enough n. --Henrygb 16:25, 21 Jun 2004 (UTC)
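For anyone who wants to check this numerically, here is a minimal sketch (my own addition) of Henrygb's counterexamples: the product of 7 factors uniform on [10,11] stays nowhere near Benford's law, while a product of many factors uniform on [0,1] gets close because the variance of its logarithm is large. The trial counts are illustrative.

```python
# Leading digits of products of uniform random factors (illustrative sketch).
import math
import random
from collections import Counter

def leading_digit_freqs(n_factors, low, high, trials=100_000):
    counts = Counter()
    for _ in range(trials):
        product = 1.0
        for _ in range(n_factors):
            product *= random.uniform(low, high)
        counts[f"{product:e}"[0]] += 1    # first significant digit
    return {d: round(counts[d] / trials, 3) for d in "123456789"}

print("Uniform[10,11], n=7 :", leading_digit_freqs(7, 10, 11))
print("Uniform[0,1],  n=20 :", leading_digit_freqs(20, 0, 1))
print("Benford             :",
      {d: round(math.log10(1 + 1 / int(d)), 3) for d in "123456789"})
```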

Unclear statement

The following statement is unclear:

"The precise form of Benford's law can be explained if one assumes that the logarithms of the numbers are uniformly distributed."

To be uniformly distributed, a variable must have a largest and smallest possible value. These are not specified in the discussion.

This objection is correct. (One could also give a meaning to uniform distributions on unbounded intervals, but there is no need of such an assumption here). The sentence above should be made precise this way:

"The precise form of Benford's law can be explained if one assumes that the MANTISSAS of the numbers (that is, the fractional parts of logarithms) are uniformly distributed in the unit interval." PMajer 10:33, 28 December 2006 (UTC)


The explanation based on scale-invariance is unclear as well. The preceding unsigned comment was added by Pvh (talk • contribs) 20:25, 10 May 2005 (UTC).

Right, the product of two uniformly-distributed numbers is not a Benford distribution, but it is approaching it. Multiplying n independent numbers approaches Benford's law, and n=3 is already pretty close (close enough to be a statistical match, IIRC). Bubba73 (Bubba73 (talkcontribs)) 18:20, 11 May 2005 (UTC)

Problems

If I understand this law correctly, if I go to the telephone book (White Pages) and start listing the first digit of each house number for sequential entries, skipping entries of the same surname at the same address, I will have a sequence of first digits that should be distributed according to Benford's law. It does not happen. Frankly I don't see why it should.

The reason it doesn't is that Benford's law only applies to *naturally occurring* lists, not lists that are assigned specific numbers by people. For instance, addresses are assigned by people, within a given range. Invoice amounts, however, are NOT specifically assigned by people, and hence follow this distribution. —Preceding unsigned comment added by 208.65.192.1 (talk) 21:31, 12 February 2008 (UTC)

Solutions!

Software developer here! I thought this too and didn't understand the explanation, so, annoyed, I wrote a program to pick random numbers to prove this law wrong. I decided to use values between 0 and 9999. Halfway through writing it I suddenly realized that 9999 is an artificially selected number, and in realizing this, I also realized why this law works. The phonebook example doesn't follow this pattern because if you list all possible phone numbers they range between 111-1111 and 999-9999 (depending on country). This 999-9999 is an artificially selected maximum. If you started handing out phone numbers beginning with 111-1111 and incrementally counting upward, though, we'd have a lot of numbers starting with 1 and not a lot starting with 9. The real world doesn't pick maximum numbers made entirely of 9's. Here's an explanation for non-math, non-statistics people if that doesn't clarify things.

If we simply count all the numbers between 1 and 9999, 11% of the numbers (1, 10-19, 100-199, and 1000-1999) start with 1 (which is expected because there are only 9 possible digits a number can start with; numbers don't start with 0). HOWEVER, 9999 is a very carefully selected number. If we LOWER our maximum value by, oh, 8,000 and count from 1 to 2,000, we again have 1, 10-19, 100-199, and 1000-1999 (again 1,111 numbers) starting with 1, but there are only 2,000 numbers, so 1 is the first digit with a 55.5% frequency. If we had gone HIGHER by 8,000 instead of lower by 8,000 (to 18,000), we would have added 8,000 more numbers beginning with 1 (10,000-17,999), resulting in 9,111 out of 18,000 (again about 50%). In fact, regardless of which direction we pick (up or down), the farther we travel from 9999, the higher the proportion of numbers beginning with 1.

If anyone wants to write software to simulate this, simply make sure that your maximum possible value is randomly selected before picking random numbers. Also keep in mind that if your maximum possible value is chosen as a number between 1 and 9999, this is already disrupting the law. As the number of randomizations done to the maximum possible value increases to infinity prior to picking random numbers, the distribution approaches Benford's Law. —Preceding unsigned comment added by 76.171.61.100 (talk) 16:04, 17 December 2007 (UTC)
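Here is one way such a simulation might look (my own sketch, not the commenter's program; the range of maxima and the trial count are illustrative choices). The maximum is drawn on a logarithmic scale so that it is not itself biased toward any particular leading digit.

```python
# Pick a random maximum first (log-uniformly over several decades), then a
# uniform random number up to that maximum, and tabulate leading digits.
import math
import random
from collections import Counter

counts = Counter()
trials = 100_000
for _ in range(trials):
    maximum = int(10 ** random.uniform(1, 7))   # between 10 and 10,000,000
    n = random.randint(1, maximum)
    counts[str(n)[0]] += 1

for d in "123456789":
    print(d, round(counts[d] / trials, 3),
          "Benford:", round(math.log10(1 + 1 / int(d)), 3))
```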

Newton's apple

Some version of Newton's apple story is probably true. See

So I removed mention of it here. --Mmm 05:35, 25 March 2006 (UTC)


Explanation (again)

Someone (above) mentioned their dissatisfaction with the explanation given for this "law". I am not happy with it either. I think I can improve on the reasoning, although I'm not all the way there yet. I'll present my partial result here, with the wish that it may stimulate discussion and eventual consensus on a convincing and (hopefully) easily understood explanation:


BL struck me immediately as sensible, although the explanations published in the article tend to miss the point as far as I'm concerned. Here are a couple of examples to show how BL works:

Street Numbers: A suburban developer decides to number his streets (as opposed to naming them). He will start at 1st Street and continue on with some generally small sequence. Taking the totality of all developments of this type, it is very obvious that 1st Street will occur much more frequently than 65th Street, as all such developments will have a 1st Street, whereas few will be large enough to have a 65th Street. Of course, this affects the distribution of first digits. E.g., 1 will occur more frequently than 6 because more developments will stop at, say, 15th Street than go up to 65th Street.
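A quick sketch (my own addition) of this street-number argument, with an assumed, illustrative distribution of development sizes; it shows the pooled leading digits heavily favoring 1, though whether the exact logarithmic form emerges depends on how the development sizes are distributed (see the objection further down).

```python
# Pool street numbers 1..N over many developments of assumed random size N.
import math
import random
from collections import Counter

counts = Counter()
for _ in range(10_000):                              # hypothetical developments
    n_streets = int(10 ** random.uniform(0.5, 2.5))  # roughly 3 to 300 streets
    for street in range(1, n_streets + 1):
        counts[str(street)[0]] += 1

total = sum(counts.values())
for d in "123456789":
    print(d, round(counts[d] / total, 3),
          "Benford:", round(math.log10(1 + 1 / int(d)), 3))
```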

Billing: It is frankly more difficult to convincingly account for the preponderance of lower digits occurring in day-to-day bills. Nonetheless, I do have some thoughts. Perhaps they may inspire another reader to arrive at a rigorous explanation.

I suspect that something similar to the 'street number argument' may account, in part at least, for a preponderance of smaller digits in the leading digit of household bills. To understand this it is useful to remember that the value of a unit of currency is not a random variable, but rather is chosen, from motives of convenience, to correspond roughly to the price of small, day-to-day purchases:

Imagine you were to find yourself at a grocery store checkout counter in an unknown country, having to purchase a single red apple. You have no idea of the currency value. Still, it would be reasonable for you to expect the bill to come in at something like 1 currency unit, and you would probably be right to feel some suspicion if the cashier were to hand you a bill for 57,844 currency units.

Of course, the 57,000 CU apple does happen in places. But this is always an indication that the currency has departed severely from its original value (runaway inflation). Usually this is a temporary anomaly. At some point the government will introduce a new currency, thereby rescaling prices.

It is also true that certain private purchases (real estate, for instance) will require exchanges in the thousands or even millions of currency units. But the point is that these purchases are likely to be for infrequently purchased items. It seems reasonable to expect that the phone bill, the electricity bill, etc. will be a small number of currency units and, hence, disproportionately likely to begin with a small digit. Later digits in a bill amount may indeed be evenly distributed (or nearly so), but a preponderance of low digits in the leading figure is enough to influence the cumulative result.

In addition to this argument, I suspect another effect contributes to enhance the probability of low digits in household bills. This effect is an outcome of the fact that the standard deviation in billing amounts is generally proportional to the average bill amount, rather than being some constant amount.

To take an example, suppose that electricity bills typically cluster within about 25% of some average value. If this average happened to be, say, 15 CU, then the range from 11.25 CU to about 18.75 CU is common, so the majority of bills will begin with a 1. If, on the other hand, the average bill were 85 CU, then the leading digit would not be so highly concentrated at 8. Rather, there would be a range of high-frequency leading digits (6, 7, 8, 9 and 1). Of course, 8 will also occur as a leading digit with a fairly high frequency if the average bill were 75 CU or 95 CU as well. Nevertheless, as the spread of leading digits is greater for the higher average bill, the set of circumstances (average and standard deviation of bill amounts) giving rise to a higher leading digit is relatively narrower, so lower digits are more likely to occur.
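A small numeric check (my addition) of this last argument, assuming bills are normally distributed around the average with a 25% relative spread; the averages and trial count are illustrative.

```python
# Leading digits of bills clustered around an average with a 25% relative spread.
import random
from collections import Counter

def leading_digit_spread(average, rel_sd=0.25, trials=50_000):
    counts = Counter()
    for _ in range(trials):
        bill = random.gauss(average, rel_sd * average)
        if bill > 0:
            counts[f"{bill:e}"[0]] += 1   # first significant digit
    total = sum(counts.values())
    return {d: round(counts[d] / total, 2) for d in sorted(counts)}

print("average 15 CU:", leading_digit_spread(15))  # dominated by 1s
print("average 85 CU:", leading_digit_spread(85))  # spread over several digits
```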

--Philopedia 10:19, 8 May 2006 (UTC)

I'm dubious about the street number argument. The usual discussions of Benford's law that I have seen rely on the idea that a unit of measure is used, e.g., areas of countries, so that rescaling the unit of measurement, e.g., square kilometers to square miles, ought only to shuffle things around, but not affect the distribution. This invariance property yields a logarithmic distribution of first digits since the measure dy/y is invariant to scale changes. But with street numbers, this argument isn't available since you can't rescale them. I'm willing to believe that 1's are more common than 9's in developments, but I don't see where a logarithmic distribution would come from. Bill Jefferys (talk) 22:49, 21 May 2008 (UTC)

Naive explanation?

It is not clear to me why the explanation should "assume" that it is the log of the first numeral that should be evenly distributed rather than the numeral itself. However, couldn't the explanation be simply this: In most counts, quantities, or measurements, unless the datum is 0 (which can only occur in the 10^0 place, and there are many populations where the datum must be > 0), the first digit (regardless of which place) must be ≥ 1, but it needn't be ≥ 2; if it isn't 1, then it must be ≥ 2, but it needn't be ≥ 3; and so on up to 9. A logarithmic distribution would capture this perfectly.

By the way, in binary (or notches on a stick) the first numeral will always be 1 unless the datum is 0. While this result is obvious, it is consistent with Benford's law and a logarithmic distribution.
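A tiny check (my addition, not Finell's) of the binary remark: the base-b form of Benford's law gives leading digit d probability log_b(1 + 1/d), so in base 2 the only possible leading digit, 1, gets probability 1.

```python
# Base-b Benford probabilities: P(d) = log_b(1 + 1/d) for d = 1 .. b-1.
import math

def benford(base):
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

print("base 2 :", benford(2))   # {1: 1.0} -- the leading binary digit is always 1
print("base 10:", {d: round(p, 3) for d, p in benford(10).items()})
```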

Finell (Talk) 22:30, 17 September 2006 (UTC)

number of digits in percentages

This is a response to a comment on my own talk page from User:Das my, who wondered why I reverted his three-decimal-place expansion of the percentage likelihood of each digit, turning it back into one decimal place. (The rollback button doesn't allow Edit Summaries; a longtime complaint)

I did it because they struck me as an example of unnecessary accuracy. It might be more accurate in a mathematical sense to say the probability of starting with 1 is 30.103% instead of 30.1%, but it doesn't make the issue any clearer to the reader (the difference, after all, is only one part in 10,000). It just throws in more digits that increase the chance of mis-reading a number. Since these are log calculations, you could expand it to 100 decimal places if you wanted, but so what?

The chart lets casual readers get a quick sense of how the likelihood decreases as digits increase. Keeping one decimal place is the coarsest we can be and still demonstrate how the probability declines with every number - otherwise I'd suggest rounding to the nearest integer. - DavidWBrooks 20:02, 4 October 2006 (UTC)
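For reference (my addition), the exact first-digit percentages under discussion and their one-decimal roundings:

```python
# Exact percentages vs. the one-decimal-place values shown in the chart.
import math

for d in range(1, 10):
    pct = 100 * math.log10(1 + 1 / d)
    print(d, f"{pct:.3f}%", "->", f"{pct:.1f}%")
```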

Suggestion for a better explanation

Like others in this page I find the explanation not entirely satisfactory.

For instance, I find it more difficult to accept that the digits of all measurements performed in the decimal system do not follow the uniform distribution in that system than to accept that the distribution of the measurements is not invariant under a change of units.

However I did recently read a book that gave a more reasonable explanation.

The explanation has to do with the transformation of distributions under multiplication, which is quite common in laws dealing with quantities found in our world.

Consider the product of any number of independent random variables (we are only concerned with their mantissas here). The book demonstrates that:

a) If any of the variables follows the reciprocal distribution, then the product also follows the same distribution. (The persistence of the reciprocal distribution once it is established.)

b) Even if none of the variables follows the reciprocal distribution, the distribution of the product cannot have a greater distance from the reciprocal distribution than any of the factor variables.

The same is then proved for division and may hold for other operators but the book doesn't go into that.

So unless all our quantities are "fundamental", their distribution tends to be close to the reciprocal (the more complex their calculation the closer).

The book is "Numerical Methods for Scientists and Engineers" by R.W.Hamming and is quite respected in its field as far as I know. So if anyone has access to it, you could consider that section (2.8 in the book) as an addition to the article.

[Edit] Apparently the same author has written an article in the Bell System Technical Journal (On the Distribution of Numbers, vol. 49, no. 8, pp. 1609-1625, October 1970) on this subject, so more information could be found there. Panayk 17:21, 5 May 2007 (UTC)
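A sketch (my own, loosely illustrating the claim attributed to Hamming above rather than reproducing his proof): multiply more and more independent, decidedly non-Benford factors together and watch the leading-digit distribution move toward the reciprocal/Benford form.

```python
# Deviation from Benford's law shrinks as more independent factors are multiplied.
import math
import random
from collections import Counter

def digit_freqs(n_factors, trials=50_000):
    counts = Counter()
    for _ in range(trials):
        product = 1.0
        for _ in range(n_factors):
            product *= random.uniform(1, 10)   # a decidedly non-Benford factor
        counts[int(f"{product:e}"[0])] += 1
    return [counts[d] / trials for d in range(1, 10)]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
for n in (1, 2, 3, 5):
    deviation = max(abs(p - b) for p, b in zip(digit_freqs(n), benford))
    print(f"n = {n}: max deviation from Benford = {deviation:.3f}")
```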

One reason there is never going to be a fully satisfactory explanation for Benford's law is that it is supposed to apply to "real-world" data, and there will never be a mathematically precise way to characterize such data. And in fact it doesn't really apply to all real-world data by any means. Consider the distribution of the heights of a human population 21 years and older as measured in feet. Almost all the data will have a first significant digit between 5 and 7 (or another suitably short range if applied to a population that is an outlier height-wise, such as pygmies or Australian bushmen).

Conclusion: Whatever the "explanation(s)" given in the article, they should all be characterized as heuristic. Even if there are precise mathematical conditions that can be proved to yield the Benford distribution of digits, that's not an explanation of the law until it is connected to the distribution of "real-world" data -- and that can never be done in a precise manner, since just what real-world data consists of can never be made entirely precise. Daqu 19:26, 13 May 2007 (UTC)

A More Concise Introductory Sentence

Benford's law, also called the First-Digit Law, states that in lists of numbers from many real-life sources of data, the leading digit is 1 almost one third of the time (about 30% of the time), and larger digits occur as the leading digit with less and less frequency as they grow in magnitude, to the point that 9 is the first digit (of any number within such lists) less than 5% of the time.

Bayesian probability theory predicts Benford's law

And it has been predicting it since the early 20th century. To me Benford's law validates the Bayesians.

To quote E.T. Jaynes:

Chap. 6: Elementary Parameter Estimation, section "The Jeffreys Prior": Harold Jeffreys (1939; Chap. 3) proposed a different way of handling this problem. He suggests that the proper way to express complete ignorance of a continuous variable known to be positive is to assign uniform prior probability to its logarithm; i.e., the prior probability density is

p(s | I_j) = 1/s    (0 < s < ∞)

Of course, you can't normalize this, but that doesn't stop you from using it. In many cases, including the present one, it can be used directly because all the integrals involved converge. In almost all cases we can approach this prior as the limit of a sequence of proper (normalizable) priors, with mathematically well-behaved results. If even that does not yield a proper posterior distribution, then the [analysis] is warning us that the data are too uninformative about either very large s or very small s to justify any definite conclusions, and we need to get more evidence before any useful inferences are possible. Jeffreys justified [the prior] on the grounds of invariance under certain changes of parameters; i.e. instead of using the parameter s, what prevents us from using t = s^2, or t = s^3? Evidently, to assign a uniform prior probability density to s, is not at all the same thing as assigning a uniform prior probability to t; but if we use the Jeffreys prior, we are saying the same thing whether we use s or any power s^m as the parameter. There is the germ of an important principle here; but it was only recently that the situation has been fairly well understood. When we take up the theory of transformation groups in Chapter 12, we will see that the real justification of Jeffreys' rule cannot lie merely in the fact that the parameter is positive; but that our desideratum of consistency in the sense that equivalent states of knowledge should be represented by equivalent probability assignments, uniquely determines the Jeffreys rule in the case when s is a scale parameter. Then marginalization theory will reinforce this by deriving it uniquely -- without appealing to any principles beyond the basic product and sum rules of probability theory -- as the only prior for a scale parameter that is completely uninformative about other parameters that may be in the model. These arguments and others equally cogent all lead to the same conclusion: the Jeffreys prior is the only correct way to express complete ignorance of a scale parameter.

from [1] also available on amazon.com.

--BenE 15:45, 30 July 2007 (UTC)
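A small sketch (my addition, not part of BenE's comment) of the connection numerically: sampling from a 1/s density truncated to a whole number of decades makes log10(s) uniform, so the leading digits of the samples follow Benford's law. The truncation range and trial count are illustrative.

```python
# Samples from a truncated Jeffreys prior p(s) proportional to 1/s on [1, 10^6].
import math
import random
from collections import Counter

counts = Counter()
trials = 200_000
for _ in range(trials):
    s = 10 ** random.uniform(0, 6)   # log10(s) uniform, i.e. density 1/s
    counts[f"{s:e}"[0]] += 1

for d in "123456789":
    print(d, round(counts[d] / trials, 3),
          "Benford:", round(math.log10(1 + 1 / int(d)), 3))
```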