
Talk:Benford's law: Difference between revisions

From Wikipedia, the free encyclopedia




Hi, the article says that the mathematical constants shown in the chart also follow the Benford's law pattern, BUT it says the chart uses "the first significant number" of each constant. Does that mean it's excluding the leading "0" of the constant? I don't know of any constants higher than 5, yet the chart implies there are constants that start with 6, 7, 8, and even 9, and I do not know of any constant that starts with that high a number. Can the article explain that it's excluding the leading "0" of the constant, or am I reading the chart wrong?





Revision as of 05:28, 25 July 2012

WikiProject Mathematics: B-class, Low-priority
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-class on Wikipedia's content assessment scale.
This article has been rated as Low-priority on the project's priority scale.
WikiProject Statistics: B-class, Low-importance
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-class on Wikipedia's content assessment scale.
This article has been rated as Low-importance on the importance scale.

Explanation

The section entitled "Explanation" is difficult to understand. Please remember that Wikipedia is a general encyclopædia and is equally likely to be viewed by individuals of average mathematical knowledge as it is to be viewed by specialists. 68.49.208.76 06:30, 2 September 2007 (UTC)[reply]

You are correct - but accurate, easy-to-understand explanations of technical issues are very, very hard to write. Wikipedia is full of excellent articles that are useless to 99% of humanity for that very reason. And Benford's Law is particularly tricky to explain to laymen; it's so counter-intuitive. - DavidWBrooks 11:53, 2 September 2007 (UTC)[reply]

Please take a look into: http://www.dspguide.com/ch34.htm. In short, it explains the "law" as an artifact of the manipulation of the data. Pretty well written and easy to understand. Marco 12:51, 14 September 2008 (UTC)[reply]

I dunno. This is the heart of its argument, quoted:
In answer to our question, the logarithmic pattern of leading digits derives solely from sf(g) and the convolution, and not at all from pdf(g).
Putting that into layman's terms will be - well, interesting. Feel free to take a shot at it, though. - DavidWBrooks (talk) 12:54, 14 September 2008 (UTC)[reply]
When you get past all the making-a-lot-out-of-quantitative-details, he's just saying that if a probability distribution is broad and reasonably flat on a log-scale, then Benford's law is expected to hold. That's an easy point to explain and get across. Just put a broad, smooth distribution on a properly-labeled log scale. Maybe, for comparison, put a really sharp distribution on a log scale. Readers will be able to look at the log scale and see how Benford's law should hold in the first case but not the second.
The problem is that a "properly-labeled log scale" includes labels for 1,2,3,4,5,6,7,8,9,10,20,30,etc., and it's a bit tricky to do that in the programs I have. I'll try though.... --Steve (talk) 19:22, 14 September 2008 (UTC)[reply]
The logarithmic-scale probability density function for an exponential decay process. The area under the curve between two points is proportional to the probability that the function has a value between those two points. Note that this looks different from the conventional depiction of exponential decay. This is because the x-axis is distorted by the logarithmic scaling, so the height also has to be distorted for the area under the curve to be correct.
Here's a start. There's a published paper that says that exponential-decay probability functions satisfy Benford's Law to within a few percent. This matches up with the fact that the probability-density function is reasonably smooth over about two orders of magnitude, as shown in this pic. The text would describe how, since 30.1% of the horizontal (logarithmic) number line lies between 1's and 2's, it's not surprising that around 30.1% of the area under this particular curve lies between 1's and 2's. --Steve (talk) 20:50, 14 September 2008 (UTC)[reply]
OK, I decided against that particular picture but added the appropriate explanation and citations with diagrams. Hope it's helpful. --Steve (talk) 05:36, 9 October 2008 (UTC)[reply]
The new material looks good. Derek farn (talk) 11:12, 9 October 2008 (UTC)[reply]
Unfortunately, the new material is original research. We therefore must forget it. Wikipedia is an advanced technology for collecting conservative garbage! --Javalenok (talk) 12:23, 24 October 2011 (UTC)[reply]
Not original research: The argument is the same as the cited article in "The American Statistician". [Well, the discussion above was in 2008, and the "American Statistician" article was in 2009. So if you had objected in 2008, you might have had a point... But right now it is definitely not original research.] --Steve (talk) 21:42, 24 October 2011 (UTC)[reply]
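The claim above, that exponential-decay distributions satisfy Benford's law to within a few percent, can be checked directly from the cumulative distribution function. A minimal Python sketch, assuming an exponential distribution whose mean `scale` is an illustrative parameter (not taken from the cited paper): it sums the probability mass in each interval [d·10^k, (d+1)·10^k) over a wide range of decades k.

```python
import math


def benford(d: int) -> float:
    """Benford probability of leading digit d in base 10."""
    return math.log10(1 + 1 / d)


def exponential_first_digit_probs(scale: float = 1.0, kmin: int = -12, kmax: int = 12) -> list[float]:
    """P(leading digit = d), d = 1..9, for X ~ Exponential with mean `scale`.

    Sums CDF differences over the intervals [d*10^k, (d+1)*10^k) for k in
    [kmin, kmax]; decades outside that range carry negligible mass.
    """
    def cdf(x: float) -> float:
        return -math.expm1(-x / scale)  # 1 - exp(-x/scale), accurate for tiny x

    probs = []
    for d in range(1, 10):
        p = sum(cdf((d + 1) * 10.0**k) - cdf(d * 10.0**k) for k in range(kmin, kmax + 1))
        probs.append(p)
    return probs


probs = exponential_first_digit_probs(scale=1.0)
for d, p in enumerate(probs, start=1):
    print(f"{d}: exponential {p:.3f}  Benford {benford(d):.3f}")
```

The agreement is close but not exact, and it shifts slightly as `scale` changes, which fits the "within a few percent" wording rather than an exact law.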

Dispersion should not be too small

Normally never mentioned: the dispersion or variance should not be "too small". A kind of proof in Nordisk Matematisk Tidskrift from 1965 (or thereabouts) includes that condition in the proof. —Preceding unsigned comment added by 130.226.230.8 (talk · contribs) 16:18, 16 May 2008

Error

I'm not convinced the log10 should change to log100 just because we look at two digits instead of one. The number is still base10.

Here's an example: Numbers that start with 1 should comprise 30.1% of the numbers. If we subdivide all the numbers beginning with 1 into 10,11,12,...19, we should expect these ten sub-numbers to add up to the 30.1% expectation of all numbers beginning with 1. Using Log10 (and NOT Log100) yields:

10 - 4.14%
11 - 3.78%
12 - 3.48%
13 - 3.22%
14 - 3.00%
15 - 2.80%
16 - 2.63%
17 - 2.48%
18 - 2.35%
19 - 2.23%

Summing these ten probabilities yields 30.1%, which is exactly what we would expect.

Using Log100, on the other hand, will yield only half of the expected value. You can duplicate this result for all the ranges 1-9.

Caleb B caleb@tcad.net —Preceding unsigned comment added by 69.29.42.173 (talk) 21:21, 10 June 2008 (UTC)[reply]

You are right. If you take log 100, then the cumulative probability is log_{100} (100) - log_{100} (10) = 1/2, whereas it should be 1. The right probability of a group of digits n (=10,...,99) is log_{10} (n+1) - log_{10} (n), which is also the probability mentioned in Hill's paper. 131.155.15.29 (talk) 09:46, 9 July 2008 (UTC)[reply]
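Caleb's table and the telescoping argument in the reply can be reproduced in a few lines. A minimal Python sketch; `two_digit_prob` is a hypothetical helper name, and its `log_base` parameter exists only to show what goes wrong when base 100 is used:

```python
import math


def two_digit_prob(n: int, log_base: float = 10.0) -> float:
    """P(first two digits = n) for n in 10..99, computed as log(n+1) - log(n) in the given base."""
    return math.log(n + 1, log_base) - math.log(n, log_base)


# Caleb's table: the ten two-digit blocks 10..19 under log10.
for n in range(10, 20):
    print(f"{n} - {100 * two_digit_prob(n):.2f}%")

block_log10 = sum(two_digit_prob(n) for n in range(10, 20))          # telescopes to log10(2)
block_log100 = sum(two_digit_prob(n, 100.0) for n in range(10, 20))  # telescopes to log100(2)
total_log10 = sum(two_digit_prob(n) for n in range(10, 100))         # log10(100) - log10(10)
total_log100 = sum(two_digit_prob(n, 100.0) for n in range(10, 100)) # log100(100) - log100(10)
print(f"10..19: log10 {block_log10:.4f}, log100 {block_log100:.4f}")
print(f"10..99: log10 {total_log10:.4f}, log100 {total_log100:.4f}")
```

With log10 the block 10..19 reproduces P(first digit 1) = log10(2) and all of 10..99 sums to 1; with log100 both come out at exactly half.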

scale invariance

I'm not sure the example with feet and yards is correct. I'm not a statistician, but what I understand based on Harold Jeffreys's book (1939; Chap. 3) is that the problem arises when you are dealing with scales that have a different number of dimensions. There is no reason to think that a measurement squared should have a different distribution than a measurement cubed. Imagine that you have cubes of random sizes. There is no reason to think that the distribution of the volumes of the cubes should look different from the distribution of the areas. If the measurements gave a uniform distribution for the area (equal numbers of each digit), they would give a non-uniform distribution for the volume, and vice versa. The only distribution that doesn't change when you transform from the area to the volume (or take any other power) is the one that assigns uniform probability to the logarithm. This also illustrates why Benford's law only applies to scales that can't be negative. A cube can't have a negative volume or a negative area. There is an asymmetry in things that can only be positive: when you go towards the negatives you hit a wall at zero, whereas you can go to infinity towards the more positive values. Measurements that can be negative don't have this asymmetry. Going towards the negatives mirrors going towards the positives, and thus things tend to get distributed uniformly.--BenE (talk) 04:50, 6 December 2008 (UTC)[reply]
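The area-versus-volume argument can be illustrated numerically. The Python sketch below uses evenly spaced grids as deterministic stand-ins for random samples (the grid size and the 1 to 1000 range for the sides are arbitrary assumptions). Sides spread evenly in log(side) keep the Benford first-digit distribution when squared or cubed, while sides spread evenly in the side itself change their first-digit distribution when squared.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant decimal digit of a positive number."""
    exponent = math.floor(math.log10(x))
    return min(9, max(1, int(x / 10.0**exponent)))  # clamp guards against float round-off


def digit_freqs(values: list[float]) -> list[float]:
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]


N = 10_000   # grid points per decade (arbitrary)
DECADES = 3  # sides range over [1, 1000)
size = N * DECADES
grids = {
    "log-uniform": [10 ** ((k + 0.5) / N) for k in range(size)],  # even in log10(side)
    "uniform": [1 + 999 * (k + 0.5) / size for k in range(size)],  # even in side
}
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

results = {}
for name, sides in grids.items():
    for power, label in [(1, "side"), (2, "area"), (3, "volume")]:
        freqs = digit_freqs([s**power for s in sides])
        results[(name, label)] = freqs
        dev = max(abs(f - b) for f, b in zip(freqs, benford))
        print(f"{name:11s} {label:6s} P(1) = {freqs[0]:.3f}, max |dev from Benford| = {dev:.3f}")
```

Raising a log-uniform value to a power only stretches its logarithm, so it stays log-uniform over a whole number of decades; that is the invariance the comment describes.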

Strictly, cubes can have negative volume, if they can have negative sides. But the whole thing is related to scale invariance which happens to be associated with power laws and so in turn follows the solution to the dimension problem. Whether you prefer
log(2)-log(1) = log(6)-log(3), or
log(2)-log(1) = log(sqrt(2))-log(sqrt(1))+log(sqrt(20))-log(sqrt(10)) = log(cbrt(2))-log(cbrt(1))+log(cbrt(20))-log(cbrt(10))+log(cbrt(200))-log(cbrt(100))
as an illustration is a matter of personal choice, though I think the former is easier to understand in the context of first digits.--164.36.38.240 (talk) 10:26, 8 January 2009 (UTC)[reply]
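The square-root and cube-root decompositions of log(2) − log(1) can be checked numerically. A minimal Python sketch; the helper `cbrt` is defined locally (rather than relying on `math.cbrt`, which only exists in Python 3.11+):

```python
import math


def cbrt(x: float) -> float:
    """Real cube root of a non-negative number."""
    return x ** (1.0 / 3.0)


log = math.log10
target = log(2) - log(1)
ratio_form = log(6) - log(3)  # same 2:1 ratio, shifted along the axis
# Sides in [1, 10) whose square starts with 1 lie in [1, sqrt 2) or [sqrt 10, sqrt 20).
square_form = (log(math.sqrt(2)) - log(math.sqrt(1))) + (log(math.sqrt(20)) - log(math.sqrt(10)))
# Sides in [1, 10) whose cube starts with 1 lie in [1, cbrt 2), [cbrt 10, cbrt 20) or [cbrt 100, cbrt 200).
cube_form = ((log(cbrt(2)) - log(cbrt(1)))
             + (log(cbrt(20)) - log(cbrt(10)))
             + (log(cbrt(200)) - log(cbrt(100))))
print(target, ratio_form, square_form, cube_form)
```

All four values agree: each square-root piece contributes log(2)/2 and each cube-root piece log(2)/3, so the pieces add back up to log(2).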


The point about scale invariance should be trivially obvious. For things like river length or income there is no obvious natural unit to count in. We could measure river length in miles, kilometres, or any historical measure and the law still applies. Similarly income will usually be measured in the local currency but the law will hold true whether we use USD, Indian Rupees or gold ounces (for the same data). This is in contrast to things like population where the obvious unit to count in is people.

The "why" link looks like some individual is just showing their mathematical naivete. Either put an explanation there or leave it alone but asking "why" is basically saying "I don't get it" - which is fine but not something anyone else needs to know. —Preceding unsigned comment added by 88.104.110.246 (talk) 17:28, 1 March 2011 (UTC)[reply]
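The unit-change point can be illustrated with two hypothetical datasets of lengths. The Python sketch below (the dataset ranges, grid size and the approximate km-to-miles factor are assumptions chosen for illustration) converts each set from kilometres to miles and measures how much the first-digit frequencies move.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant decimal digit of a positive number."""
    exponent = math.floor(math.log10(x))
    return min(9, max(1, int(x / 10.0**exponent)))  # clamp guards against float round-off


def digit_freqs(values: list[float]) -> list[float]:
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]


MILES_PER_KM = 0.621371  # approximate conversion factor
N = 20_000

# Hypothetical lengths spread evenly in log10 over 1 km .. 10,000 km.
log_spread_km = [10 ** (4 * (k + 0.5) / N) for k in range(N)]
# Hypothetical lengths spread evenly in km over 100 km .. 1000 km, for contrast.
linear_km = [100 + 900 * (k + 0.5) / N for k in range(N)]

shifts = {}
for name, km in [("log-spread", log_spread_km), ("linear", linear_km)]:
    in_km = digit_freqs(km)
    in_miles = digit_freqs([x * MILES_PER_KM for x in km])
    shifts[name] = max(abs(a - b) for a, b in zip(in_km, in_miles))
    print(f"{name:10s} P(1): km {in_km[0]:.3f}, miles {in_miles[0]:.3f}; largest change {shifts[name]:.3f}")
```

Multiplying by a constant just slides log(x) along the axis, so a set spread evenly over whole decades in log(x) keeps the same first-digit frequencies in any unit, while the linearly spread set does not.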

Paper on the arXiv

I have added a link to a paper on the arXiv, which discusses Shannon entropy and Benford's law and whatnot. Perhaps someone might wish to add something to the page from that, and move the link from "external links" to "references". —Preceding unsigned comment added by Paul Murray (talk · contribs) 00:08, 23 January 2009 (UTC)[reply]

Nonsense in the text

It is simply not true that the probability of something is just the area under a curve when drawn on a logarithmic scale.

Making a substitution x = 10^y (y is, hence, the coordinate along the logarithmic axis) yields

P(a < x < b) = ∫_a^b p(x) dx = ∫_{log10 a}^{log10 b} ln(10) 10^y p(10^y) dy

I.e. the probability is the area under the curve f(y) = ln(10) p(10^y) 10^y, which is completely different —Preceding unsigned comment added by 147.231.27.150 (talk) 14:19, 11 February 2009 (UTC)[reply]

Right, the transformation needs to be accounted for in the integrand. Where is the problem in the article? Baccyak4H (Yak!) 14:37, 11 February 2009 (UTC) improved OP math markup Baccyak4H (Yak!) 14:41, 11 February 2009 (UTC)[reply]
This is Footnote [5] in the article:
"Note that if you have a regular probability distribution (on a linear scale), you have to multiply it by a certain function to get a proper probability distribution on a log scale: The log scale distorts the horizontal distances, so the height has to be changed also, in order for the area under each section of the curve to remain true to the original distribution. See, for example, [1]"
--Steve (talk) 09:23, 12 February 2009 (UTC)[reply]
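The change of variables discussed above can be checked numerically. A minimal Python sketch, assuming an exponential density with mean 1 and the interval from 1 to 2 (both arbitrary choices): the area under f(y) = ln(10)·10^y·p(10^y) on the log axis matches the linear-axis probability, while simply plotting p(10^y) without the extra factor does not.

```python
import math


def p(x: float) -> float:
    """Illustrative density on a linear axis: Exponential with mean 1 (an assumption)."""
    return math.exp(-x)


def f(y: float) -> float:
    """The same density on a log10 axis: substitute x = 10**y and include dx/dy."""
    x = 10.0**y
    return math.log(10) * x * p(x)


def midpoint_integral(g, a: float, b: float, n: int = 100_000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


a, b = 1.0, 2.0
exact = math.exp(-a) - math.exp(-b)                            # closed form for this p
linear_area = midpoint_integral(p, a, b)                       # area on the linear axis
log_area = midpoint_integral(f, math.log10(a), math.log10(b))  # area under f(y)
naive_area = midpoint_integral(lambda y: p(10.0**y), math.log10(a), math.log10(b))  # omits the factor
print(f"exact {exact:.6f}  linear {linear_area:.6f}  log axis {log_area:.6f}  naive {naive_area:.6f}")
```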

"Mathematical statement"

Benford's law is a loosely-formulated implication: it says that if you consider numbers drawn from certain natural sources, then their first digits will conform to a certain specific distribution. The section called "mathematical statement" gives a precise formulation of the conclusion. Making the hypothesis mathematically precise is much more problematic. It's claimed that Hill was the first to give a precise mathematical proof of Benford's law, but in fact what he did was to give a mathematical proof of a mathematical statement, which one may or may not agree captures the essence of Benford's law.

By the way, within the article it's hard to figure out what earlier material is cited by the claim that "Ted Hill proved the result about mixed distributions mentioned above." The word "mixed" doesn't appear elsewhere in the article, and I'm guessing that the reference is to the section on multiple probability distributions. Ishboyfay (talk) 03:44, 20 February 2009 (UTC)[reply]

I agree, if we say that Benford's law is an approximate empirical statement, then it can't be mathematically proven as such. This calls for more careful phrasing than we have now. For your second question, yes it's the multiple probability distributions, I put in a link to clarify. :-) --Steve (talk) 07:23, 20 February 2009 (UTC)[reply]
(I'm basically agreeing with the above.) The first sentence of the misnamed section Mathematical statement reads:
"More precisely, Benford's law states that the leading digit d (d ∈ {1, …, b − 1}) in base b (b ≥ 2) occurs with probability P(d) = log_b(d + 1) − log_b(d) = log_b((d + 1)/d)."
But as much as this "kind of" states it "correctly", this is not a mathematical statement at all, since it is extremely unclear what the word "probability" means here. Alas, the word "probability" has no meaning at all here.
I'm not saying that Benford's law *cannot* be stated mathematically, but that is a very slippery endeavor.
I strongly recommend that Wikipedia stick to true statements and avoid false and (like the above sentence) meaningless ones. Daqu (talk) 18:26, 11 May 2009 (UTC)[reply]
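What the quoted formula does pin down, at least, is a valid probability distribution on the digits 1 to b − 1: the terms telescope to log_b(b) = 1. A minimal Python sketch (the helper name `benford_pmf` is hypothetical); note that it says nothing about which "random" numbers the distribution is supposed to describe, which is the point raised above:

```python
import math


def benford_pmf(base: int) -> dict[int, float]:
    """P(d) = log_b((d + 1) / d) for leading digits d = 1 .. b - 1 in base b >= 2."""
    if base < 2:
        raise ValueError("base must be at least 2")
    return {d: math.log((d + 1) / d, base) for d in range(1, base)}


for base in (2, 3, 10, 16):
    pmf = benford_pmf(base)
    # The sum telescopes: log_b(2/1) + log_b(3/2) + ... + log_b(b/(b-1)) = log_b(b) = 1.
    print(base, round(pmf[1], 4), round(sum(pmf.values()), 12))
```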

Prime numbers may follow benfords law

http://www.physorg.com/news160994102.html —Preceding unsigned comment added by 208.71.237.254 (talk) 17:58, 11 May 2009 (UTC)[reply]

Not really. The first digits of primes up to 10^n are fairly evenly distributed for large n. The picture is different for primes up to 2×10^n for large n, or 3×10^n, etc., but even then it does not approach Benford's law. --Rumping (talk) 12:07, 20 November 2009 (UTC)[reply]
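The claim about primes up to 10^n can be spot-checked for one value of n. A minimal Python sketch using a plain sieve of Eratosthenes; n = 6 is an arbitrary choice, and a single n is an illustration, not a statement about the limit:

```python
import math
from collections import Counter


def primes_below(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p < n."""
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, math.isqrt(n - 1) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytes(len(range(i * i, n, i)))  # cross out multiples of i
    return [i for i in range(n) if sieve[i]]


def first_digit_freqs(numbers: list[int]) -> list[float]:
    """Fraction of numbers whose leading decimal digit is 1, 2, ..., 9."""
    counts = Counter(int(str(x)[0]) for x in numbers)
    return [counts[d] / len(numbers) for d in range(1, 10)]


primes = primes_below(10**6)
freqs = first_digit_freqs(primes)
for d, f in enumerate(freqs, start=1):
    print(f"{d}: primes {f:.3f}  Benford {math.log10(1 + 1 / d):.3f}")
```

Up to 10^6 every leading digit gets roughly a ninth of the primes, far from Benford's 30.1% for the digit 1.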

Income differences?

Can income really be counted among "distributions that cover many orders of magnitude rather smoothly"? Is there a significant portion of the population that has ten times more, or ten times less, income than the average? Mumiemonstret (talk) 12:59, 25 September 2009 (UTC)[reply]

Well in the US, see here, 80% earn 15,000 to 150,000 dollars. So yeah, I guess that's mostly within one order of magnitude. Can you think of a better example? Or we could soften the wording: "distributions that span several orders of magnitude rather smoothly" :-) --Steve (talk) 03:02, 26 September 2009 (UTC)[reply]

Linkfarm cleanup

I saw my removal of the WP:LINKFARM was partially reverted, so I wanted to make it clear why I pulled out the links that have been restored.

The primary problem is with ELNO#1: "Any site that does not provide a unique resource beyond what the article would contain if it became a featured article," which I would say describes all the remaining links. I don't think any of them contain anything special that would be unavailable if someone took the time to flesh out this article. Some of them would definitely be worth using to add or cite material, though, which would improve the article and have the nice side effect of retaining the links.

Some of the links have problems beyond ELNO#1, though:

5. Links to web pages that primarily exist to sell products or services, or to web pages with objectionable amounts of advertising. For example, the mobile phone article does not link to web pages that mostly promote or advertise cell-phone products or services.
11. Links to blogs, personal web pages and most fansites, except those written by a recognized authority. (This exception is meant to be very limited; as a minimum standard, recognized authorities always meet Wikipedia's notability criteria for biographies.)
(Additionally, it seems likely that this link violates the policy's directive not to link to pages that violate copyright law.)
8. Direct links to documents that require external applications or plugins (such as Flash or Java) to view the content, unless the article is about such file formats. See rich media for more details.
13. Sites that are only indirectly related to the article's subject: the link should be directly related to the subject of the article. A general site that has information about a variety of subjects should usually not be linked to from an article on a more specific subject. Similarly, a website on a specific subject should usually not be linked from an article about a general subject. If a section of a general website is devoted to the subject of the article, and meets the other criteria for linking, then that part of the site could be deep linked.

There is one notable exception, the Benford Online Bibliography. The front page, which is what we linked to, doesn't have very much information, but clicking through provides some great resources. I should have been more careful to keep it the first time.

Thoughts? — Bdb484 (talk) 21:14, 4 February 2010 (UTC)[reply]

I have done quite a lot of spam cleanup, and in general I would be inclined to agree with you. But for this particular article, the links are helpful and each has a very high content-to-noise ratio. Benford's law is a very strange observation and each of the external links provides some useful insight. We could spend a couple of hours here and reluctantly agree to remove maybe two or three of the links – what particular benefit would arise from that? I know about WP:OTHERSTUFF, but the time spent rearranging entirely innocent links here would have much better effect by cleaning some real spam, for example, WT:WikiProject Spam. For an example of a true linkfarm, see here (now cleaned up). In summary, all Wikipedia's procedures involve the application of common sense (with very few exceptions, see WP:5P), and the current external links are not linkspam and they each have different but useful information that assists the reader, so my opinion is that none should be removed. If you really want, the couple of links requiring Java or whatever can be flagged, however none of the links go directly to a page that requires some application to see what the page offers. Johnuniq (talk) 00:49, 5 February 2010 (UTC)[reply]
I agree with Johnuniq. All of the current links are very useful to the reader (myself included). In fact, I would be in favour of adding a few more, as long as they help to further illustrate this peculiar ratio. It pops up all over the place. A list of places (with associated links) would further this article. --Thorwald (talk) 01:54, 5 February 2010 (UTC)[reply]
I'm 100 percent with you on the primacy of common sense over Wikipedia "rules," but this isn't a case where the two conflict. If the links are that helpful, then the information can be pulled into the article and cited. As it stands, the external link section is no more useful than googling Benford and seeing what pops up. Thanks to the Benford Bibliography link, we already have a collection of high-quality links that easily surpasses what we have here.
The benefit from working out which links should stay and which should go is simple: improving the article. If you find something particularly useful in one of these links, then go ahead and add it to the article with a citation. Then the reader has access to more information that is better organized -- all without losing links to the pages in question.
That way, everybody gets what they want, no? — Bdb484 (talk) 02:10, 5 February 2010 (UTC)[reply]
Sounds good. I agree that is a better approach. Just don't delete the links until we have time to include the relevant information in the article (with citations). --Thorwald (talk) 02:26, 5 February 2010 (UTC)[reply]
My instinct is to agree that useful content from an external link should be incorporated into the article, with the link used as a reference – that is one of the first things we say to people who add links to their website on fifty different articles. However, I think this topic has rather unusual attributes that make that procedure unworkable since most of the external links have too much detail for a general article here, yet they each have something useful to say. The Benford Online Bibliography link that you moved to the top of the list is only of interest to a serious researcher (I would be inclined to restore it to the "More mathematical" section). Johnuniq (talk) 02:35, 5 February 2010 (UTC)[reply]
I'm not in any rush to pull any of the links. They were initially pulled down as part of WP:BRD cycle, so I'm happy to give anyone plenty of time to work it out.
From my review of the bibliography, it seems that it has plenty for both the lay reader and the experienced mathematician. For example, the first link they offer is to the Radio Lab segment, which presented a very accessible introduction to Benford. It also provides a lot of links to plain-language news coverage from the Wall Street Journal, the New York Times, and the Washington Post. I'd be inclined to leave it where it is, but I wouldn't object if you feel strongly about it. — Bdb484 (talk) 02:45, 5 February 2010 (UTC)[reply]
I could be mistaken, but it seems like there hasn't been anything done in the way of clean-up in the two months-plus since we talked about this. Is anybody working on this? — Bdb484 (talk) 20:03, 6 April 2010 (UTC)[reply]
Above I have explained that your suggestion, while admirable in general, is difficult to implement (and possibly unhelpful) in this particular case. You could try getting more opinions at WP:ELN. Johnuniq (talk) 03:17, 7 April 2010 (UTC)[reply]
I do remember you offering that opinion, though I don't remember you offering anything to substantiate it. Just the same, when it was requested that I allow some time for the relevant material to be included in the article, I was happy to do so. After more than two months, no apparent effort has been made to that end.
The folks at ELN generally support abiding by WP:ELNO, so I'm not sure what more they would have to offer to the discussion. But if you think they'd believe there's a reason to disregard the rules that are good enough for every other article, you should probably take that route yourself, as the burden for establishing a reason to keep each of these links falls on you. If not, though, I'll be happy to handle the EL clean-up myself. — Bdb484 (talk) 04:16, 7 April 2010 (UTC)[reply]

I compared the current external links list with what existed one year ago. In the last 12 months:

  • Eight links are the same (with some tweaking, but same site).
  • Four links have been removed ([2], [3], [4], [5]).
  • Three links have been added:
  1. Benford Online Bibliography, an online bibliographic database on Benford's Law.
  2. Benford’s law, Zipf’s law, and the Pareto distribution by Terence Tao
  3. From Benford to Erdös, WNYC radio segment

The first new link seems essential, the second is by Terence Tao, which automatically qualifies it, and the third seems hard to disagree with (although I have not heard it). Given that eight links have been considered satisfactory for at least a year, and the three new ones seem totally suitable, I do not see why any should be pruned. I notice that Sbyrnes321 reverted your removal of the links, and Thorwald posted in agreement with keeping the links above (although the second post supports converting them into cited material). I think it fair to conclude that consensus favors keeping the links. Johnuniq (talk) 08:52, 7 April 2010 (UTC)[reply]

Example #2

Here is an example using factorials. From the OEIS A008905 reference, Noe has a list of the first 1000 or so leading digits of the consecutive factorials. Here is the distribution seen in that sample vs Benford:

digit Benford n!
1 0.30 0.29
2 0.18 0.18
3 0.13 0.12
4 0.10 0.10
5 0.08 0.07
6 0.07 0.09
7 0.06 0.05
8 0.05 0.05
9 0.05 0.05

--Billymac00 (talk) 04:53, 25 April 2010 (UTC)[reply]
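A table like the one above can be regenerated with exact big-integer arithmetic. A minimal Python sketch, assuming the sample is 1! through 1000! (Noe's list is described only as "1000 or so", so the exact range is an assumption):

```python
import math
from collections import Counter

N = 1000  # first N factorials, 1! .. N!
counts = Counter()
factorial = 1
for n in range(1, N + 1):
    factorial *= n                       # exact big-integer n!
    counts[int(str(factorial)[0])] += 1  # leading decimal digit

print("digit  Benford  n!")
for d in range(1, 10):
    print(f"{d}      {math.log10(1 + 1 / d):.2f}     {counts[d] / N:.2f}")
```

Converting with `str()` is fine at this size (1000! has 2568 digits); much larger N may run into the integer-to-string digit limit introduced in Python 3.11.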

Bizarre crashing problems

This is really weird, but I've found that, for some reason (at least on my computer, an iMac running Mac OS X Snow Leopard), this page crashes when viewed in Chrome, but the talk page loads fine. Meanwhile, Safari can load the page itself, but not the talk page. I haven't tried any other browsers or operating systems, but I'd still like to know if anyone else has encountered this and if any explanation is known. 75.69.192.96 (talk) 05:00, 23 May 2010 (UTC)[reply]

Best to post this at WP:Village pump (technical) (with a link to Benford's law). Johnuniq (talk) 05:08, 23 May 2010 (UTC)[reply]


This crashing issue is happening for me, too. Only the Benford's Law page, and the page loads fine, but the moment you attempt to scroll down the thread crashes. It happens with the current Chrome and Chrome-dev release, as well as the current daily build of Chromium. Oh, and Safari and Firefox, too. apraetor —Preceding undated comment added 01:43, 16 June 2010 (UTC).[reply]

  • I'm in Safari and Chrome 5 on 10.6.4 and I don't see a problem with either page. But I would post to VPT if you are sure that this problem is unique to this page and that browser setup. 184.59.8.54 (talk) 18:52, 16 June 2010 (UTC)[reply]
    I was going to post on VPT on behalf of the two people with the problem, but when I searched that page for "crash" I saw that similar complaints have been made, with the reply that it is a browser bug and the supplier of the browser should be notified. Johnuniq (talk) 00:29, 17 June 2010 (UTC)[reply]

Explanation 2

see also discussion above

I just read in a post the following: "Benford’s law arises naturally if the data under consideration span several orders of magnitude—for example, the first digits of the powers of two obey Benford's law" – this seems to me much more intuitive than the current "This distribution of first digits arises whenever a set of values has logarithms that are distributed uniformly, as is approximately the case with many measurements of real-world values". Are they equivalent, or perhaps complementary? For me, as a layman, the "spanning orders of magnitude" thing made much more sense and instantly gave me a vague but intuitive grasp at why the law works. Do you think it could be integrated in the lead? --Waldir talk 00:55, 15 December 2010 (UTC)[reply]

Phrasing like that is in Benford's law#Limitations, but not in the lead right now. I agree, it should be. --Steve (talk) 01:10, 15 December 2010 (UTC)[reply]
I'm not very familiar with this topic, so I'd rather not add it myself. Do you think you could do that? --Waldir talk 15:57, 17 December 2010 (UTC)[reply]
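The powers-of-two example mentioned above is easy to check with exact integers. A minimal Python sketch over 2^0 through 2^999 (the sample size is an arbitrary choice):

```python
import math
from collections import Counter

N = 1000  # exponents 0 .. N-1
counts = Counter(int(str(2**n)[0]) for n in range(N))  # leading digit of each power
for d in range(1, 10):
    print(f"{d}: powers of 2 {counts[d] / N:.3f}  Benford {math.log10(1 + 1 / d):.3f}")
```

The count for the digit 1 can also be seen without a computer: for n ≥ 1, 2^n starts with 1 exactly when doubling 2^(n-1) adds a digit, so the count tracks the number of digits of 2^N, which grows like N·log10(2).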

Graphic caption wrong?

A logarithmic scale bar. Picking a random x position on this number line, roughly 30% of the time the first digit of the number will be 1.

I understand this (the 1 to 2 zone is bigger), but isn't that only due to the scale of the graph? If x is chosen randomly by visual distance, then I agree, but if x is a random variable, shouldn't it, over time, hit all the digits equally? —Preceding unsigned comment added by 216.7.125.201 (talk) 17:09, 28 January 2011 (UTC)[reply]

The caption should be reworded to be clearer. Maybe "Roughly 30% of this line consists of numbers that begin with the digit 1: Numbers between 0.1 and 0.2, between 1 and 2, between 10 and 20, etc."? What do you think?
Also the picture is pretty hard to read. There doesn't seem to be a better option: [6]. It should really be SVG... --Steve (talk) 20:04, 28 January 2011 (UTC)[reply]
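The difference between the two readings of "random x" in this thread can be simulated. A minimal Python sketch with a fixed seed (the seed, the number of draws and the 1 to 1000 range are arbitrary assumptions): picking a uniformly random position on a log-scale bar gives a leading 1 about 30.1% of the time, while picking a uniformly random value gives about 11%.

```python
import math
import random

rng = random.Random(2011)  # fixed seed so the run is reproducible
DRAWS = 200_000


def leading_digit(x: float) -> int:
    """First significant decimal digit of a positive number."""
    exponent = math.floor(math.log10(x))
    return min(9, max(1, int(x / 10.0**exponent)))  # clamp guards against float round-off


# Caption's reading: a uniformly random *position* on a log bar spanning 1 .. 1000.
positions = [10 ** rng.uniform(0, 3) for _ in range(DRAWS)]
# Questioner's reading: a uniformly random *value* between 1 and 1000.
values = [rng.uniform(1, 1000) for _ in range(DRAWS)]

share_pos = sum(leading_digit(x) == 1 for x in positions) / DRAWS
share_val = sum(leading_digit(x) == 1 for x in values) / DRAWS
print(f"leading 1, uniform position on log bar: {share_pos:.3f} (log10 2 = {math.log10(2):.3f})")
print(f"leading 1, uniform value in [1, 1000):  {share_val:.3f} (111/999 = {111 / 999:.3f})")
```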

Scale invariance argument

It appears to be based on the paper of R.S. Pinkham, On the distribution of first significant digits. Ann. Math. Statist., 32:1223-1230, 1961 (open access). But someone should check the details before adding it as a source. Tijfo098 (talk) 02:30, 23 March 2011 (UTC)[reply]

I was right, see slide #26 here. Probably a more citable secondary ref exists somewhere. Tijfo098 (talk) 22:28, 23 March 2011 (UTC)[reply]

Election fraud

Quite amusing to see that the exact same "magic box" claimed to prove fraud in the Iranian elections can be used to prove that Obama "stole" the US presidential election. Deckert, Myagkov and Ordeshook go over this in quite some detail. The basic error in using BL as a "magic box" is the assumption that voters are i.i.d., which is especially untrue in small precincts. Heck, they even show that by that analysis one would conclude that elections have been stolen in some US precincts for decades in a row. Tijfo098 (talk) 03:50, 23 March 2011 (UTC)[reply]

Amusing...? I tend to find that thing quite depressing... blatant misunderstandings like that. JaeDyWolf ~ Baka-San (talk) 15:47, 23 March 2011 (UTC)[reply]

Explanations

I think that sections 4.1 and 4.2 should be reversed in order. The reason is that scale invariance is the primary reason for this phenomenon. It's true that exponential growth processes display the phenomenon, but that is because exponential growth is one of a number of ways that scale invariance can arise. Putting exponential growth processes first overemphasizes this particular way of obtaining the phenomenon at the expense of the more fundamental underlying reason.

I was led to look at the article because of a recent entry on Andrew Gelman's statistics blog where Benford's law was discussed. One of the comments indicated that exponential growth was the cause of the phenomenon (I responded just below it about the more general reason). It may be that the person who made that comment had read part, but not all, of the Wikipedia entry (which had been cited by Andrew Gelman), and came away thinking that exponential growth was "the" explanation. Reversing the order of the two entries would put the early emphasis where it belongs: on the scale invariance. Bill Jefferys (talk) 13:18, 13 October 2011 (UTC)[reply]

I disagree that scale invariance is "the primary reason for this phenomenon". The height of human adults does not satisfy Benford's law when the heights are measured in feet, and it also does not satisfy Benford's law when the heights are measured in meters. I don't see anything within the "scale invariance" argument that would explain why Benford's law does apply to the lengths of rivers but does not apply to the heights of human adults. If something deserves to be called "the primary reason for this phenomenon", it should of course explain both why it works when it works, and why it doesn't work when it doesn't work. I think the real "primary explanation" is the one in the "Limitations" section (poor organization there), although of course I'm biased. :-) --Steve (talk) 17:21, 20 October 2011 (UTC)[reply]
Scale invariance obviously only applies when the phenomenon being measured involves numbers varying by at least several orders of magnitude. This is the case with rivers, and not the case with human heights, which don't vary over even one order of magnitude, even counting newborn infants and basketball players as the extremes.
My point is that exponential growth processes are only one mechanism that produces scale invariance, so putting them first is misleading (see the Gelman blog entry). They should be second (and your comment doesn't refute this). Bill Jefferys (talk) 17:54, 20 October 2011 (UTC)
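To illustrate the orders-of-magnitude point with made-up numbers, here is a minimal Python sketch. It assumes, purely hypothetically, adult heights spread uniformly between 1.45 m and 2.05 m; the height model and the helper names `first_digit` and `digit_shares` are illustrative, not data or anything from the article. Because that range spans far less than one decade, the leading digit is fixed by where the range happens to fall: nearly every value starts with 1 in metres, but with 4, 5 or 6 in feet, and neither pattern looks like Benford's law.

```python
import random
from collections import Counter

def first_digit(x: float) -> int:
    """Leading nonzero digit of a positive float.

    Scientific notation with 18 significant digits puts that digit first
    and never rounds a value such as 9.99... up to the next decade.
    """
    return int(f"{x:.17e}"[0])

def digit_shares(values):
    """Fraction of values whose leading digit is d, for d = 1..9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

random.seed(1)
# Hypothetical adult heights: uniform on 1.45-2.05 m, far less than one decade.
heights_m = [random.uniform(1.45, 2.05) for _ in range(50_000)]
heights_ft = [h / 0.3048 for h in heights_m]  # the same people, measured in feet

shares_m = digit_shares(heights_m)
shares_ft = digit_shares(heights_ft)
print("metres:", {d: round(s, 3) for d, s in shares_m.items() if s > 0})
print("feet:  ", {d: round(s, 3) for d, s in shares_ft.items() if s > 0})
```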
"Scale invariance obviously only applies when the phenomenon being measured involves numbers varying by at least several orders of magnitude." This is not "obvious" based on the wikipedia article explanation as currently written. (At least, not obvious to me!) The current explanation just says, "if it is indeed true that the first digits have a particular distribution, it must be independent of the measuring units used", but does not say a word about why or whether "it is indeed true that the first digits have a particular distribution". You seem to have a deeper understanding of the "scale invariance argument" than just what's written here now, and I hope you take some time to improve that part of the article. :-) --Steve (talk) 21:42, 24 October 2011 (UTC)[reply]
Actually, the "limitations" section you pointed to gives a reasonably good discussion of why the phenomenon has to vary over several orders of magnitude.
I'm thinking that a (perhaps retitled) "limitations" section might come before the section we're discussing, and then the two entries in that section could be reordered to put exponential growth processes second (and last). Again, my motivation here is that the person who commented on Andrew Gelman's blog may well have read halfway through that section and decided that exponential growth was the explanation, when the whole issue is much more than that: exponential growth is only one way that the phenomenon can arise, and there are more fundamental considerations, as both you and I point out. Bill Jefferys (talk) 23:09, 24 October 2011 (UTC)

Scale invariance partly circular

The scale invariance section says, "The law can alternatively be explained by the fact that, if it is indeed true that the first digits have a particular distribution, it must be independent of the measuring units used (otherwise the law would be an effect of the units, not the data). "

It's a proof by contradiction, assuming that "the law is an effect of the units" is false, but it never proves this step. We clearly need better sources for this section. Superm401 - Talk 23:28, 31 January 2012 (UTC)
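The easy direction of the scale-invariance argument can be written out explicitly. This is a standard sketch, not a sourced addition: it shows that if the fractional part $U$ of $\log_{10} X$ is uniform, the leading digit $D$ follows Benford's law and keeps the same distribution after any change of units $X \mapsto cX$. It does not show the converse (that any units-independent digit distribution must be this one), which is the step questioned above.

```latex
% Leading digit D of X > 0, read off the fractional part of log10 X
U = \log_{10} X - \lfloor \log_{10} X \rfloor \in [0, 1), \qquad
D = d \iff \log_{10} d \le U < \log_{10}(d+1).

% If U is uniform on [0, 1):
P(D = d) = \log_{10}(d+1) - \log_{10} d = \log_{10}\!\left(1 + \tfrac{1}{d}\right).

% Change of units X -> cX with c > 0 shifts U by a constant modulo 1:
U' = \left(U + \log_{10} c\right) - \left\lfloor U + \log_{10} c \right\rfloor,
% and a uniform variable on [0, 1) shifted modulo 1 is again uniform,
% so P(D = d) is unchanged.
```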

Question about distributions

The article states that the law holds only for data sets whose logarithms are uniformly and randomly distributed. It further states that data sets following normal distributions would not follow the law. That makes sense, but the text does not elaborate on what kind of data sets would have their logarithms uniformly and randomly distributed. Do I understand correctly that a set of completely random numbers would not follow this law? If so, what properties should a set of data have in order for the logarithms of those data to be uniformly and randomly distributed? In other words, what is it exactly that makes real-life financial data, for example, not distributed in a uniform manner? I think a short sentence addressing this would do this otherwise helpful article a lot of good. Any takers?—Ëzhiki (Igels Hérissonovich Ïzhakoff-Amursky) • (yo?); March 29, 2012; 17:32 (UTC)

A simple set that satisfies Benford's law and trivially has uniformly distributed logs is just $a_n = e^{x_n}$, where $x_n$ is uniformly distributed at random in the range [0, 10]. The $a_n$ then span several decades and satisfy Benford's law. I need to check this, but I'm pretty sure normally distributed data does follow Benford's law -- but only if the width of the distribution spans several decades, which is unlikely for a single stock to do. This is kinda a tricky issue, but I'll see what I can do on it later. a13ean (talk) 14:47, 30 March 2012 (UTC)
Thanks, this helps somewhat and I'll be looking forward to your addition. By the way, "e" in "$e^{x_n}$" above is Euler's number, correct? If so, couldn't it be any other arbitrarily selected constant?—Ëzhiki (Igels Hérissonovich Ïzhakoff-Amursky) • (yo?); March 30, 2012; 15:01 (UTC)
It is Euler's number, and it's an arbitrary choice here, made just because it plays nicely with the natural log. There are lots of other functions that are mostly log-distributed across several decades and similarly follow Benford's law. a13ean (talk) 15:45, 30 March 2012 (UTC)
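A minimal Python sketch of the $a_n = e^{x_n}$ construction above, comparing leading-digit shares with Benford's $\log_{10}(1 + 1/d)$. One caveat the simulation makes visible: $x_n \in [0, 10]$ in natural-log units covers $10/\ln 10 \approx 4.34$ decades, not a whole number, so the match is only approximate (the share of leading 1s comes out near 0.35 rather than 0.301). Taking the upper bound as a whole multiple of $\ln 10$ (here four decades) makes $\log_{10} a_n$ uniform over whole decades and reproduces Benford's law up to sampling noise. The helper names and sample size are illustrative assumptions.

```python
import math
import random
from collections import Counter

def first_digit(x: float) -> int:
    """Leading nonzero digit of a positive float (18 significant digits avoid rounding up a decade)."""
    return int(f"{x:.17e}"[0])

def exp_uniform_shares(upper: float, n: int = 100_000, seed: int = 0):
    """Leading-digit shares of a_n = e**x_n with x_n ~ Uniform[0, upper]."""
    rng = random.Random(seed)
    counts = Counter(first_digit(math.exp(rng.uniform(0, upper))) for _ in range(n))
    return {d: counts[d] / n for d in range(1, 10)}

# Benford's law: P(D = d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# The range from the comment: [0, 10], about 4.34 decades.
shares_10 = exp_uniform_shares(10.0)
# A range covering exactly four decades: [0, 4 ln 10].
shares_4dec = exp_uniform_shares(4 * math.log(10))

print("d  Benford  [0,10]  [0,4ln10]")
for d in range(1, 10):
    print(d, f"{benford[d]:.3f}", f"{shares_10[d]:.3f}", f"{shares_4dec[d]:.3f}")
```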


MORE QUESTIONS ABOUT THE BENFORD'S LAW DISTRIBUTION


Hi, the article explains that when the mathematical constants are charted, as in the chart shown, the constants also follow the Benford's law pattern. BUT it says the chart uses "the first significant number" of each constant. Does that mean it's excluding a leading "0"? I don't know of any constants higher than 5, yet the chart implies there are constants that start with 6, 7, 8, and even 9, and I do not know of any constant that starts with such a high number. Can the article explain that it's excluding the leading "0" of the constant, or am I reading the chart wrong?


173.238.43.211 (talk) 05:26, 25 July 2012 (UTC)
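If "first significant" is read in its usual sense, the first nonzero digit, then leading zeros are skipped, so a constant such as ln 2 = 0.693... is counted under 6. A minimal Python sketch, assuming that reading; the function name `first_significant_digit` is illustrative, and the handful of constants below is only an example, not the set used for the article's chart.

```python
import math

def first_significant_digit(x: float) -> int:
    """First nonzero digit of x, skipping the sign, leading zeros and the decimal point."""
    if x == 0:
        raise ValueError("0 has no significant digit")
    # Scientific notation puts the first significant digit first: 0.693 -> '6.93...e-01'.
    # 18 significant digits exceed double precision, so nothing rounds up a decade.
    return int(f"{abs(x):.17e}"[0])

constants = {
    "pi": math.pi,                               # 3.14159...
    "e": math.e,                                 # 2.71828...
    "ln 2": math.log(2),                         # 0.69314...
    "1/phi": (math.sqrt(5) - 1) / 2,             # 0.61803...
    "Euler-Mascheroni gamma": 0.5772156649015329,
    "Catalan's constant G": 0.915965594177219,
}

for name, value in constants.items():
    print(f"{name:>24}: {value:.6f} -> {first_significant_digit(value)}")
```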