Talk:Cumulative frequency analysis

WikiProject Statistics (Rated C-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.


This article was first created at 11:56, 12 June 2007 (UTC).

various problems

I have a number of qualms about the way this article is now written. For now I'll limit myself to the easy one: style of mathematical notation. Consider this:

When the data are arranged in descending order, the maximum first and the minimum last, and Rd is the rank number, the cumulative frequency is written as FP<X = Rd/(n+1).
When the data are arranged in ascending order, the minimum first and the maximum last, and Ra is the rank number, the cumulative frequency is written as FP<X = 1-Ra/(n+1).

Here's a way of writing it that is consistent with conventional Wikipedia usage:

When the data are arranged in descending order, the maximum first and the minimum last, and Rd is the rank number, the cumulative frequency is written as FP < X = Rd/(n + 1).
When the data are arranged in ascending order, the minimum first and the maximum last, and Ra is the rank number, the cumulative frequency is written as FP < X = 1 − Ra/(n + 1).

Contrast three styles with each other:

FP<X = 1-Ra/(n+1).
FP < X = 1 − Ra/(n + 1).

The second and third are consistent with Wikipedia's conventions. The second fits well with "inline" math notation; the third with "displayed" math notation. The third form has been known to cause various problems when used in an "inline" as opposed to "displayed" setting.

To be continued.... Michael Hardy 20:09, 18 June 2007 (UTC)

PS: See also Wikipedia:Manual of Style (mathematics). More later... Michael Hardy 20:17, 18 June 2007 (UTC)

Thanks Michael Hardy. I see your point. I am new to Wikipedia + I am learning + I will do my best to adjust.
R.J.Oosterbaan 07:44, 19 June 2007 (UTC)

Changes were made on the basis of Michael Hardy's comments. R.J.Oosterbaan 22:47, 19 June 2007 (UTC)


This article should be renamed to something like "Cumulative Frequency Analysis" or "CDF Estimation" since there is already a Cumulative Distribution Function article.--Adoniscik (talk) 22:49, 27 January 2008 (UTC)

To begin with, I have added an internal link to Cumulative Distribution Function under See Also. R.J.Oosterbaan (talk) 19:16, 28 January 2008 (UTC)
The page Cumulative frequency has been renamed to Cumulative frequency analysis, and the original page Cumulative frequency is now a disambiguation page linking to Cumulative Distribution Function and the renamed page. R.J.Oosterbaan (talk) 23:21, 5 February 2008 (UTC)


I very much doubt that this article has a solid basis. It seems a rather layman-like summing up of several techniques that are described better elsewhere. (talk) 21:22, 21 June 2010 (UTC)

I've had similar suspicions about this article, but I haven't looked it over carefully yet. Michael Hardy (talk) 22:32, 21 June 2010 (UTC)
From a brief skim-read, it strikes me it's a disguised advert for a particular software package. Qwfp (talk) 07:23, 22 June 2010 (UTC)
This last comment might apply to the section marked, but not to the whole article, I think. The main part of the article is rather a single-author affair, seemingly starting from a rather isolated or first-principles approach within a particular field of application (which is at least partly suppressed). It probably does need to be merged with other existing material here. One possibility would be univariate analysis; at least both of these seem to be looking for a better role to play. Melcombe (talk) 08:53, 22 June 2010 (UTC)
I also want to express my doubts about this article. The only relevant thing is the empirical cumulative distribution, which has its own article. The rest is about estimating probabilities, albeit presented in a rather clumsy way. Nijdam (talk) 10:03, 2 February 2011 (UTC)


The article was reorganized and the software list was extended. R.J.Oosterbaan (talk) 19:50, 28 June 2010 (UTC)


All the above-mentioned criticism still holds. Let me try to come to an understanding step by step. First question: what does fitting to a probability distribution have to do with the subject? Nijdam (talk) 21:19, 29 October 2011 (UTC)

  • Strange question: the fitting of a cumulative frequency distribution to a cumulative probability distribution is part of the cumulative frequency analysis (the title of this article) because it allows interpolation and extrapolation of the data series, which may be useful in many branches of science and engineering.
  • Nijdam removed the section "Ranking" from the article claiming it was irrelevant. Well, by ranking data one can obtain the cumulative frequency of the data using the rank number, and the determination of the cumulative frequency is, logically, part of the cumulative frequency analysis, hence highly relevant. Nijdam needs to restore the removed section rapidly. — Preceding unsigned comment added by (talk) 14:11, 25 November 2011 (UTC)
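
The ranking step described in the bullets above can be sketched as follows. This is only an illustration, assuming the rank/(n + 1) formulas quoted at the top of this page, with Ra the ascending rank (1 = minimum) and Rd the descending rank (1 = maximum). The function name and the sample data are hypothetical, and ties are simply given consecutive ranks.

```python
def rank_frequencies(values):
    """Rank the data and apply the two rank/(n + 1) formulas quoted on this page.

    Returns one tuple per value, in ascending order:
    (value, Ra, Rd, Rd/(n + 1), 1 - Ra/(n + 1)).
    Assumption: ties receive consecutive ranks in sorted order.
    """
    n = len(values)
    ordered = sorted(values)
    rows = []
    for i, x in enumerate(ordered):
        ra = i + 1                 # ascending rank: 1 = minimum
        rd = n - i                 # descending rank: 1 = maximum
        f_desc = rd / (n + 1)      # formula quoted for descending order
        f_asc = 1 - ra / (n + 1)   # formula quoted for ascending order
        rows.append((x, ra, rd, f_desc, f_asc))
    return rows


# Hypothetical sample data, for illustration only.
for row in rank_frequencies([12.0, 7.5, 9.1, 15.3]):
    print(row)
```

Because Ra + Rd = n + 1 for every value, the two quoted formulas give the same number: an estimate of the fraction of the data exceeding that value. Its complement, Ra/(n + 1), estimates the fraction not exceeding it.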


I placed the tag "Disputed" as nothing has been done since I first criticized this article. The term "Cumulative frequency analysis" is not a common term in the statistical literature. The article itself is a gathering of some statistical terms and methods, brought in a very simplified and imprecise manner. I think the targeted analysis is something like 'extreme values' and possibly the 'empirical distribution function', for which articles already exist. Nijdam (talk) 07:31, 8 May 2014 (UTC)

The notion "frequency" is well known and it can be analysed, so there is nothing wrong with the notion "frequency analysis". The same holds for "cumulative frequency" and "cumulative frequency analysis". All these terms are very normal and common. Therefore these terms can be used in Wikipedia. Regarding the expression "brought in a very simplified and imprecise manner", see my observations in the next section. Asitgoes (talk) 10:13, 19 May 2014 (UTC)
Frequency, cumulative and analysis are indeed well-known terms, but this does not imply that every combination of them is also well known or meaningful. Nijdam (talk) 09:03, 21 May 2014 (UTC)

Agree with disputed

To add, I've been reading the literature of the second footnote that was accessible online, a paper published by Wageningen University. The latter paper unfortunately suffers from the same problems as this article on Wikipedia. Basically, they don't use key concepts of mainstream statistical theory, such as hypothesis testing and statistical modelling, but instead they invent a slang of their own. Marcocapelle (talk) 16:33, 11 May 2014 (UTC)

The above reference stems from an engineering handbook. It deals with Applied statistics. The Wikipedia article on applied statistics starts with: "Statistics is the study of the collection, organization, analysis, interpretation and presentation of data".
That is exactly what the disputed article is about.
In the article Applied science one reads: "Applied science is a discipline of science that applies existing scientific knowledge to develop more practical applications, such as technology or inventions. Within natural science, disciplines that are basic science, also called pure science, develop information to predict and perhaps explain—thus somehow understand—phenomena in the natural world. Applied science applies the basic science toward practical endeavors. Applied science is typically engineering, which develops technology, although there might be feedback between basic science and applied science: research and development (R&D)".
Hence, the methods used in “applied science”, which is also science even though it may deviate somewhat from "pure science", are perfectly legitimate.
The article Applied physics gives the following definition: "Applied physics is physics which is intended for a particular technological or practical use. It is usually considered as a bridge or a connection between "pure" physics and engineering. "Applied" is distinguished from "pure" by a subtle combination of factors such as the motivation and attitude of researchers and the nature of the relationship to the technology or science that may be affected by the work".
Similarly, “applied statistics” is distinguished from “pure statistics” by a subtle combination of factors such as the motivation and attitude of researchers and the nature of the relationship to the technology or science that may be affected by the work. In the article under discussion one can find that combination. Only the reader who expects the disputed article to be concerned with pure statistics and who rejects applied statistics will be disappointed in it.
Further, the article Applied mathematics states that: "Applied mathematics is a branch of mathematics that concerns itself with mathematical methods that are typically used in science, engineering, business, and industry. Thus, "applied mathematics" is a mathematical science with specialized knowledge. The term "applied mathematics" also describes the professional specialty in which mathematicians work on practical problems; as a profession focused on practical problems, applied mathematics focuses on the formulation and study of mathematical models. In the past, practical applications have motivated the development of mathematical theories, which then became the subject of study in pure mathematics, where mathematics is developed primarily for its own sake. Thus, the activity of applied mathematics is vitally connected with research in pure mathematics".
In agreement with this, “pure statistics” is developed primarily for its own sake, whereas “applied statistics” focusses on practical applications, as is done in the present article.
Contrary to Marcocapelle's statements (hypothesis testing and statistical modeling are not used), this article pays ample attention to confidence analysis for hypothesis testing and distribution fitting for statistical modeling.
Marcocapelle also believes that: "they invent a slang of their own". Indeed, the article tries to avoid the dialect of "pure statisticians" as much as possible for the benefit of the reader who is not a professional statistician, and it tries to use common language wherever feasible. The word "slang" seems to be misplaced here.
Asitgoes (talk) 10:13, 19 May 2014 (UTC)


Thank you, that confirms Nijdam's earlier conclusion: "The article itself is a gathering of some statistical terms and methods, brought in a very simplified and imprecise manner." Marcocapelle (talk) 20:47, 26 May 2014 (UTC)
Time to remove the article? Nijdam (talk) 09:35, 28 May 2014 (UTC)
I'm not that familiar with formal procedures on Wikipedia yet. If I understood correctly (...) the page is deleted automatically after confirmatory judgment by an independent moderator after a discussion on a dedicated deletion discussion page. Marcocapelle (talk) 20:49, 28 May 2014 (UTC)
Sorry, I meant: do you agree with my proposal for deletion? Nijdam (talk) 09:33, 30 May 2014 (UTC)

Copy from Wikipedia:Requests for undeletion

A Google search for this common tool of mathematics yields 2.9M results. The page was informative and instructive when I last reviewed it, and the reason given for deletion was "Non Existing Subject", which I cannot understand. - (talk) 17:55, 15 January 2015 (UTC)

Done - as a contested proposed deletion, the article has been restored on request. I will notify user Nijdam (talk), who proposed it, and who may choose to nominate it at WP:Articles for deletion, which would start a debate lasting seven days, to which you would be welcome to contribute. JohnCD (talk) 19:14, 15 January 2015 (UTC)
The Google search includes sources containing only the separate terms 'cumulative', 'frequency', and 'analysis'. Searching for the exact combination "cumulative frequency analysis" results in 9 hits, 3 of them from Wikipedia, and none of them referring to a relevant theory. Nijdam (talk) 13:25, 18 January 2015 (UTC)
Before I start a nomination for deletion, I ask the advocates to show me a serious source that shows the existence of the subject as a theory and explains the meaning of the subject. Nijdam (talk) 13:36, 18 January 2015 (UTC)
Searching for "cumulative frequency analysis" in Google books yields 2,520 results and in Google scholar yields 305 results, so the term does seem to be used in the literature. As far as I can tell, "cumulative frequency distribution" is a somewhat rare synonym for "empirical distribution function" used in some circles, e.g., [1]. So I would think that at the very least, a redirect from this article to Empirical distribution function would be warranted.
Beyond the definition, it seems like there should be a place on WP for a discussion of estimation techniques, confidence intervals, and hypothesis tests (like the KS test) regarding empirical distribution functions. I agree that this article is idiosyncratic, in part because there are few sources that discuss CFA as a topic separate from eCDF analysis. Perhaps the discussion of the statistics of eCDFs would be better placed in the Empirical distribution function, as it already has some convergence results. --Mark viking (talk) 20:27, 1 March 2015 (UTC)
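
As an illustration of the kind of material meant here, the following is a minimal sketch of an empirical distribution function with a simultaneous confidence band. The band uses the Dvoretzky–Kiefer–Wolfowitz inequality, which is a choice made for this sketch and is not named in the discussion above. The function names and the sample are hypothetical.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F


def dkw_band(sample, alpha=0.05):
    """Return band(x) -> (lower, upper) using the DKW inequality.

    The half-width is sqrt(ln(2 / alpha) / (2 n)), clipped to [0, 1].
    """
    n = len(sample)
    eps = math.sqrt(math.log(2 / alpha) / (2 * n))
    F = ecdf(sample)

    def band(x):
        fx = F(x)
        return max(0.0, fx - eps), min(1.0, fx + eps)

    return band


# Hypothetical sample, for illustration only.
data = [3, 1, 2, 2]
F = ecdf(data)
band = dkw_band(data)
print(F(2), band(2))
```

With only four observations the band is very wide, which is the expected behaviour: the half-width shrinks in proportion to 1/sqrt(n).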
I would agree with a redirect of this term to Empirical distribution function. Nijdam (talk) 15:12, 2 March 2015 (UTC)