Wikipedia:Articles for deletion/Statsmodels

This is an old revision of this page, as edited by Josefpktd (talk | contribs) at 18:06, 23 February 2014 (→‎Statsmodels). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Statsmodels

Notability not established. The main source, a SciPy conference paper, has been cited only three times according to GScholar. The other source is the topic's website. QVVERTYVS (hm?) 13:12, 22 February 2014 (UTC)

  • statsmodels is used in industry and research, though it is not always cited. For example:

Dabdoub, S. M., A. A. Tsigarida, and P. S. Kumar. 2013. “Patient-Specific Analysis of Periodontal and Peri-Implant Microbiomes.” Journal of Dental Research 92 (12 suppl): 168S–175S. doi:10.1177/0022034513504950. Quote: "Single and multiple comparisons of distributions were carried out with the statistical facilities provided by JMP (SAS Institute Inc.), as well as the Python libraries SciPy, pandas, and statsmodels." — Preceding unsigned comment added by 96.127.225.218 (talk) 14:14, 22 February 2014 (UTC)

That paper only has one citation. We need something better to satisfy WP:NSOFT. QVVERTYVS (hm?) 15:46, 22 February 2014 (UTC)
  • statsmodels is an established tool used by many researchers, including Nobel Prize winners. Many researchers use it without giving it proper credit in their publications. It is part of the Enthought distribution package for scientists: [1]. Nobel Prize laureate Prof. Thomas Sargent mentions it on his website as one of the most useful Python modules: [2]. It is part of the open source movement, and it would be a mistake for Wikipedia to remove this article. Matplotlib (talk) 15:05, 22 February 2014 (UTC)
That webpage only mentions statsmodels once, in a list, and WP:NSOFT clearly states that "Inclusion of software in lists of similar software generally does not count as deep coverage" and is not sufficient to establish notability. The rest of your argument is irrelevant, I'm afraid. QVVERTYVS (hm?) 15:46, 22 February 2014 (UTC)
I respectfully disagree. There are well over 10,000 Python modules, and he cites only 4 of them. Clearly, it is a great endorsement by one of the most relevant academics of our time. Econometricians reading this discussion would be rolling their eyes. Matplotlib (talk) 02:22, 23 February 2014 (UTC)
Ok, that might be useful. QVVERTYVS (hm?) 15:46, 22 February 2014 (UTC)
I collected the list from what I found with Google Scholar. There are two kinds of articles: those that use parts of statsmodels and usually mention the statsmodels homepage in brackets or a footnote, and those that mention statsmodels as part of the Python ecosystem and, in some cases, for further analysis. I will add more comments about this. — Preceding unsigned comment added by Josefpktd (talk • contribs) 14:49, 23 February 2014 (UTC)
I found another one that was not on Google Scholar: the authors mention using Python and R in the main article, but statsmodels is cited only in the Supplementary Material, which is not indexed by Google Scholar as far as I can see. http://bioinformatics.oxfordjournals.org/content/29/14/1825.full?sid=46bb91f0-38f6-493c-a38c-c202b0dbfc34 — Preceding unsigned comment added by Josefpktd (talk • contribs) 15:46, 23 February 2014 (UTC)
  • statsmodels is just a traditional statistics and econometrics package written in Python. It has less coverage than R or Stata, but (together with scipy.stats) it covers most of the commonly used models and hypothesis tests. There is no hype associated with it. For a bit of background see http://stats.stackexchange.com/questions/47913/pandas-statsmodel-scikits-learn/48578#48578

    The number of articles that use or mention statsmodels shows that statsmodels has found acceptance in the research communities of various fields. Of course, the citation or usage count is much smaller than that of long-established packages like R or Stata. We, the statsmodels developers, never emphasized getting citations. As pointed out on our mailing list, we don't even have the conference article citation displayed prominently on the documentation website. Statsmodels is also used in a few university courses that teach Python for the field, but I don't have a list of those.

    Ecosystem: WP:NSOFT states that "It is not unreasonable to allow relatively informal sources for free and open source software, if significance can be shown".
    I think what Matplotlib pointed out in the comment above is important. statsmodels is an established and important part of the Python-in-science and Python-for-data-analysis ecosystems. NumPy, SciPy, pandas, scikit-learn, statsmodels, PyMC, and some others are the general-purpose packages, which are complemented by field-specific or application-specific packages. So lists of recommended packages will often include statsmodels and the other main packages. statsmodels has participated in the Google Summer of Code in each of the last five years under the umbrella of the Python Software Foundation, for the first year or two as a SciPy project.
    Statsmodels is included in all science-oriented Python distributions, but most of the "spreading the word" goes through blogs and mailing lists.
    One illustrative example: http://www.automatedtrader.net/articles/software-review/144328/utopian-quantopian (Automated Trader Magazine, Issue 30, Q3 2013) reviews an open source finance package written by a startup. It mentions statsmodels next to SciPy, and then points out a limitation of statsmodels in the next paragraph.

    Statsmodels is treated as a tool library, which is necessary but does not require special emphasis.