Wikipedia:Articles for deletion/Statsmodels

This is an old revision of this page, as edited by Josefpktd (talk | contribs) at 20:02, 24 February 2014 (→‎Statsmodels). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Statsmodels

Notability not established. The main source, a SciPy conference paper, has been cited only three times according to GScholar. The other source is the topic's website. QVVERTYVS (hm?) 13:12, 22 February 2014 (UTC)

  • statsmodels is used in industry and research, though it is not always cited. For example:

Dabdoub, S. M., A. A. Tsigarida, and P. S. Kumar. 2013. “Patient-Specific Analysis of Periodontal and Peri-Implant Microbiomes.” Journal of Dental Research 92 (12 suppl): 168S–175S. doi:10.1177/0022034513504950. Quote: "Single and multiple comparisons of distributions were carried out with the statistical facilities provided by JMP (SAS Institute Inc.), as well as the Python libraries SciPy, pandas, and statsmodels." — Preceding unsigned comment added by 96.127.225.218 (talk) 14:14, 22 February 2014 (UTC)

That paper only has one citation. We need something better to satisfy WP:NSOFT. QVVERTYVS (hm?) 15:46, 22 February 2014 (UTC)
  • statsmodels is an established tool used by many researchers, including Nobel Prize winners. Many researchers use it without giving it proper credit in their publications. It is part of the Enthought distribution package for scientists: [1]. Nobel Prize Laureate Prof. Thomas Sargent mentions it on his website as one of the most useful Python modules: [2]. It is part of the open source movement, and it would be a mistake for Wikipedia to remove this article. Matplotlib (talk) 15:05, 22 February 2014 (UTC)
That webpage only mentions statsmodels once, in a list, and WP:NSOFT clearly states that "Inclusion of software in lists of similar software generally does not count as deep coverage" and is not sufficient to establish notability. The rest of your argument is irrelevant, I'm afraid. QVVERTYVS (hm?) 15:46, 22 February 2014 (UTC)
I respectfully disagree. There are well over 10,000 Python modules, and he cites only 4 of them. Clearly, it is a great endorsement by one of the most relevant academics of our time. Econometricians reading this discussion would be rolling their eyes. Matplotlib (talk) 02:22, 23 February 2014 (UTC)
Ok, that might be useful. QVVERTYVS (hm?) 15:46, 22 February 2014 (UTC)
I collected the list from what I found with Google Scholar. There are two kinds of articles. The first kind uses parts of statsmodels and usually mentions the statsmodels homepage in brackets or a footnote. The second kind mentions statsmodels as part of the Python ecosystem and, in some cases, for further analysis. I will add more comments about this. — Preceding unsigned comment added by Josefpktd (talk | contribs) 14:49, 23 February 2014 (UTC)
I found another one that was not on Google Scholar: the authors mention using Python and R in the main article, but statsmodels is cited only in the Supplementary Material, which is not indexed by Google Scholar as far as I can see. http://bioinformatics.oxfordjournals.org/content/29/14/1825.full?sid=46bb91f0-38f6-493c-a38c-c202b0dbfc34 — Preceding unsigned comment added by Josefpktd (talk | contribs) 15:46, 23 February 2014 (UTC)
  • statsmodels is just a traditional statistics and econometrics package written in Python. It has less coverage than R or Stata but covers most of the commonly used models and hypothesis tests (together with scipy.stats). There is no hype associated with it. For a bit of background see http://stats.stackexchange.com/questions/47913/pandas-statsmodel-scikits-learn/48578#48578

    The number of articles that use or mention statsmodels shows that statsmodels has found acceptance in the research communities of various fields. Of course the citation or usage count is much smaller than that of long-established packages like R or Stata. We, the statsmodels developers, never emphasized getting citations. As pointed out on our mailing list, we don't even have the conference article citation displayed prominently on the documentation website. Statsmodels is also used in a few university courses on using Python in the field, but I don't have a list of those.

    Ecosystem: Referring to "It is not unreasonable to allow relatively informal sources for free and open source software, if significance can be shown" WP:NSOFT.
    I think what Matplotlib pointed out in the comment above is important. statsmodels is an established and important part of the Python-in-science and Python-for-data-analysis ecosystems. NumPy, SciPy, pandas, scikit-learn, statsmodels, PyMC, and some others are the general-purpose packages, which are complemented by field- or application-specific packages. So lists of recommended packages will often include statsmodels and the other main packages. statsmodels has participated in each of the last five years in the Google Summer of Code under the umbrella of the Python Software Foundation, the first year or two as a SciPy project.
    Statsmodels is included in all science-oriented Python distributions, but most of the "spreading the word" goes through blogs and mailing lists.
    One example as illustration: http://www.automatedtrader.net/articles/software-review/144328/utopian-quantopian reviews an open source package in finance written by a startup. (Automated Trader Magazine Issue 30 Q3 2013) It mentions statsmodels next to scipy, and then points out a limitation of statsmodels in the next paragraph.

    Statsmodels is treated as a tool library, which is necessary but does not require special emphasis. — Preceding unsigned comment added by Josefpktd (talk | contribs) 18:06, 23 February 2014 (UTC)
  • FYI: I tried to add statsmodels to Wikipedia in 2011, see User:Josefpktd/Statsmodels for my draft. At the time I did not try to show notability because, although we were already well established in the numerical Python community, we did not have a wider user base yet. After an additional two and a half years of growth, I think statsmodels is widespread and known well enough to justify "notability" for Wikipedia. Also note that this time it was not a statsmodels developer who started the Wikipedia page. Josefpktd (talk) 22:02, 23 February 2014 (UTC)
Note: This debate has been included in the list of Software-related deletion discussions. • Gene93k (talk) 03:01, 24 February 2014 (UTC)
Blogs are usually not accepted per WP:SPS, unless they're company blogs or the blog of a major figure in industry, academia or the OSS world. Same goes for StackExchange and similar crowdsourced Q&A websites. I've cited automatedtrader.net. To be honest, I'm convinced that statsmodels is a relatively major library; the question is whether an encyclopedic article can be written about it (but I'm moving towards a "yes" on that question). QVVERTYVS (hm?) 09:21, 24 February 2014 (UTC)
Related to WP:SPS: as far as I understand, this refers to publications like blogs written, in the case of software, by the developers or the developing company. I referenced the three blogs (or rather one set of lecture notes and two blogs) to **illustrate** the significance of statsmodels for open source statistical analysis. Those were written by users who are not directly involved in statsmodels. However, since it's open source, the first author contacted the mailing list when he was writing his course notes. The second improved a function in statsmodels when he found, while writing his blog post, that our previous version was slow. I only found the last blog now, while searching for sources to establish notability. I emphasized "illustrate" because statsmodels, which does traditional statistics and econometrics, is not newsworthy or hyped enough to make it into the New York Times or Wall Street Journal, and most of the examples and comments on statsmodels are on blogs. Josefpktd (talk) 16:05, 24 February 2014 (UTC)
Actually, R did make it to the NYT. But coming back to SPS, it makes the exception that "Self-published expert sources may be considered reliable when produced by an established expert on the subject matter, whose work in the relevant field has previously been published by reliable third-party publications." To me, that means that Thomas Sargent's website is an acceptable source, but J. Random User's blog is not, regardless of whether they're involved with statsmodels. The reason for SPS, as I've always understood it, is that it's too easy to create a blog, post what you want on Wikipedia on the blog, then cite it — not so much to stop promotional editing. QVVERTYVS (hm?) 16:43, 24 February 2014 (UTC)
I thought "Self-" in SPS refers to the subject of the Wikipedia page, and we (self) didn't write those blogs so we can get into Wikipedia. (Aside: R made it into the NYT after 16 years plus another 17 years of S as background. I hope statsmodels makes it sooner. :) I know that many of our sources are not strictly defined as "reliable sources". I'm not sure what "relatively informal sources for free and open source software" means. However, what I tried to show with the wide range of sources is that statsmodels has been "noted" by academic researchers, data analysts and companies, so it should be "notable" enough for Wikipedia in my opinion. Josefpktd (talk) 20:02, 24 February 2014 (UTC)
QVVERTYVS, is there anything missing that would help to convince you? Or should we wait another year, for another 10 or 30 publications that use statsmodels, and until Tom Sargent includes some statsmodels examples in his quant-econ site? Josefpktd (talk) 20:02, 24 February 2014 (UTC)
I usually pay more attention to the content of a blog than to its origin. I just saw that the London School of Economics "syndicated" an article on the use of Python for statistical analysis, http://blogs.lse.ac.uk/impactofsocialsciences/2014/02/24/on-the-future-of-statistical-languages/ (Note that it contains the disclaimer that it's not an official position.) — Preceding unsigned comment added by Josefpktd (talk | contribs) 12:42, 24 February 2014 (UTC)
I'm adding one more example of blogs. There are several companies and startups that use or are starting to use Python for data analysis. I know of a few but do not have an overview of who is using statsmodels as one of the backend tools. cbinsights looks like an analytics company that has never been in contact with statsmodels development, as far as I know: http://www.cbinsights.com/team-blog/python-tools-machine-learning/ Josefpktd (talk) 16:14, 24 February 2014 (UTC)
Just for completeness, a WP:SPS: this is my blog, http://jpktd.blogspot.ca/, where I occasionally add explanations or descriptions of statistics that are under development. Except for the release announcements, it is mostly technical and "boring" statistics. I am one of the two main developers and maintainers of statsmodels. http://www.ohloh.net/p/statsmodels/contributors?query=&sort=commits_12_mo Josefpktd (talk) 16:53, 24 February 2014 (UTC)
  • Related to adding a page on statsmodels: I'm a frequent user of statistics pages on Wikipedia, and if statsmodels has its own page, it will be easier to add it to statistics pages that have an implementation section. For example, searching Wikipedia for "~statsmodels" shows several pages where Wikipedia contributors have added statsmodels as the Python implementation. — Preceding unsigned comment added by Josefpktd (talk | contribs) 12:16, 24 February 2014 (UTC)