Wikipedia talk:Statistics

This is an old revision of this page, as edited by On Wheezier Plot (talk | contribs) at 00:44, 26 December 2006 (→‎100,000,000 edit counts! Yeehah!: The bigger title, the better!). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Discussion of article size from 2002

OK, somebody has to say this: The fact that we are patting ourselves on the back for intentionally undercounting our articles is just plain silly. I just now went and looked under "short pages" at all 28 pages with exactly 100 bytes, and 13 of them contained a comma. Not a single one of them deserves to be called an article, but almost half are counted. Next I looked at all 33 pages with exactly 200 bytes, and 27 of those contained a comma. A few of them (not eighty percent!) might be considered articles under an extremely lenient definition of article, but does anyone outside of Wikipedia consider a single, brief paragraph to be an article? Are ANY of Britannica's articles under 500 bytes?

I estimate our median article size as 1000 bytes, because that's the size of our 18943rd longest page according to long pages. (18943 would be the median of 37886 total articles.) To my mind, a conservative count of articles would place a 1000-byte minimum, rather than a 1000-byte median, which would trim our total article count in half. But no matter how we count articles, let us at least prominently post the median size of the articles which are included in the count. And please, please don't call the count "unimpeachable". (For reference, my little tirade (including this sentence) is 1367 bytes long, i.e. rather longer than our median article.)

--Fritzlein 02:55 Aug 17, 2002 (PDT)
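
The rank arithmetic above can be checked with a short sketch. It assumes the page sizes are available as a plain list of byte counts; the `sizes` values below are made up for illustration, and for an even number of pages the function picks the longer of the two middle pages, which matches the 18943 figure quoted above.

```python
def median_rank(total_pages):
    """1-based rank, counting down from the longest page, of the median page."""
    return (total_pages + 1) // 2


def median_size(sizes):
    """Size of the page sitting at the median rank in a longest-first listing."""
    ordered = sorted(sizes, reverse=True)
    return ordered[median_rank(len(ordered)) - 1]


# 37886 is the article total quoted in the comment above.
print(median_rank(37886))  # 18943

# Hypothetical byte counts, only to show the lookup.
sizes = [200, 5000, 1000, 750, 3000]
print(median_size(sizes))  # 1000
```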

I agree.


I'm sure that I found information about the total database size of Wikipedia recently, but I can't find it again. Could this be made available again, please.

It would be good to have the info on this page, or a link on this page to where it can be found.

If I get an answer - this time I'll try to keep the info! David Martland 15:20 Dec 13, 2002 (UTC)

Don't know if this is what you want, but:

$ ls -l
total 2801904
-rw-rw----    1 mysql    mysql        8852 Aug  9 19:27 archive.frm
-rw-rw----    1 mysql    mysql    21077440 Dec 13 17:19 archive.MYD
-rw-rw----    1 mysql    mysql        1024 Dec 13 17:19 archive.MYI
-rw-rw----    1 mysql    mysql        8586 Jul 20 19:30 brokenlinks.frm
-rw-rw----    1 mysql    mysql     7599784 Dec 13 17:58 brokenlinks.MYD
-rw-rw----    1 mysql    mysql     6398976 Dec 13 17:58 brokenlinks.MYI
-rw-rw----    1 mysql    mysql        9114 Nov 22 08:21 cur.frm
-rw-rw----    1 mysql    mysql    451396440 Dec 13 17:59 cur.MYD
-rw-rw----    1 mysql    mysql    201449472 Dec 13 17:59 cur.MYI
-rw-rw----    1 mysql    mysql        8756 Jul 20 19:30 image.frm
-rw-rw----    1 mysql    mysql        8586 Jul 20 19:30 imagelinks.frm
-rw-rw----    1 mysql    mysql      192136 Dec 13 17:51 imagelinks.MYD
-rw-rw----    1 mysql    mysql      175104 Dec 13 17:51 imagelinks.MYI
-rw-rw----    1 mysql    mysql      457448 Dec 13 15:53 image.MYD
-rw-rw----    1 mysql    mysql      215040 Dec 13 15:53 image.MYI
-rw-rw----    1 mysql    mysql        8706 Jul 20 19:30 ipblocks.frm
-rw-rw----    1 mysql    mysql        7300 Dec 12 15:23 ipblocks.MYD
-rw-rw----    1 mysql    mysql        3072 Dec 12 15:23 ipblocks.MYI
-rw-rw----    1 mysql    mysql        8582 Jul 20 19:30 links.frm
-rw-rw----    1 mysql    mysql    41151856 Dec 13 17:59 links.MYD
-rw-rw----    1 mysql    mysql    26686464 Dec 13 17:59 links.MYI
-rw-rw----    1 mysql    mysql        8898 Nov 22 08:43 old.frm
-rw-rw----    1 mysql    mysql        8790 Jul 20 19:30 oldimage.frm
-rw-rw----    1 mysql    mysql       54436 Dec 13 09:51 oldimage.MYD
-rw-rw----    1 mysql    mysql       11264 Dec 13 09:51 oldimage.MYI
-rw-rw----    1 mysql    mysql    2082194432 Dec 13 17:59 old.MYD
-rw-rw----    1 mysql    mysql    19528704 Dec 13 17:59 old.MYI
-rw-rw----    1 mysql    mysql        8598 Jul 20 18:49 random.frm
-rw-rw----    1 mysql    mysql       85000 Dec 13 04:47 random.MYD
-rw-rw----    1 mysql    mysql        1024 Dec 13 04:47 random.MYI
-rw-rw----    1 mysql    mysql        8964 Oct 28 08:06 recentchanges.frm
-rw-rw----    1 mysql    mysql     3014196 Dec 13 17:59 recentchanges.MYD
-rw-rw----    1 mysql    mysql     1556480 Dec 13 17:59 recentchanges.MYI
-rw-rw----    1 mysql    mysql        8700 Jul 20 18:49 site_stats.frm
-rw-rw----    1 mysql    mysql          29 Dec 13 17:59 site_stats.MYD
-rw-rw----    1 mysql    mysql        2048 Dec 13 17:59 site_stats.MYI
-rw-rw----    1 mysql    mysql        8874 Aug 24 02:54 user.frm
-rw-rw----    1 mysql    mysql     1838736 Dec 13 17:48 user.MYD
-rw-rw----    1 mysql    mysql      174080 Dec 13 17:48 user.MYI
-rw-rw----    1 mysql    mysql        8590 Nov 27 15:15 watchlist.frm
-rw-rw----    1 mysql    mysql      291006 Dec 13 17:37 watchlist.MYD
-rw-rw----    1 mysql    mysql      471040 Dec 13 17:37 watchlist.MYI



I'm sure I've seen some info regarding the total size of Wikipedia (in bytes, or Mbytes) and the availability of downloading the whole database.

I'd be glad if someone could point me in the direction of that info or the relevant page again. I'll save the info this time!

It'd be good to have that info on this page - or a link - too. David Martland 15:18 Dec 13, 2002 (UTC)

There really needs to be a more conservative total article count that makes a distinction between encyclopedia articles and almanac articles and that excludes more certain pages. What particularly troubles me is that there are now thousands of year almanac pages and that most of them can't be considered to even be almanac articles because they are just templates.

I therefore propose the following (in addition to the current criteria); 1) any page linked to centuries should be excluded from the total article count and should be given its own line in special:statistics at least until most of these pages become almanac articles (the vast majority are either templates or templates with one or two entries). 2) any page with a link to Wikipedia:Disambiguation be excluded from the count. 3) any page that is less than 500 bytes be excluded from the count (E. coli is 610 bytes). and 4) there should be three "total article counts" for everything not excluded by the above; one for anything with the string, list, chart, timeline or table in their titles (these would be "almanac-like" articles), one for everything left over (these would be "encyclopedia articles") and one grand total count that would still be the number displayed on the Main Page.

Our current count is exaggerating the true number of articles we have and is harming the project as a result. We need to be honest with our article counts and very conservative -- otherwise we will lose credit with passers-by who are at first impressed by our article count but then find out that it is bloated. --mav 13:36 Aug 28, 2002 (PDT)

I think that sounds pretty reasonable. --Brion 13:47 Aug 28, 2002 (PDT)

As long as the criteria you select are easily computable, I'm happy to make whatever change in the software is necessary to reflect a better count. I also don't think anyone is making any claims about the accuracy of the count--the statistics page itself is careful to point out that these are just estimates. But I agree, a more conservative estimate is entirely warranted. --LDC

Great! While you are at it, a link to Wikipedia:What is an article under the first occurrence of that word on the special stats page would be nice. --mav
I never thought I'd say this, but I'd like something to be more conservative.  :-) (Just the article count, not any of Bush's cabinet). --KQ
From a random sampling of pages, I would say that something like a third of our pages would truly count as useful articles in the eyes of a new user (that agrees with earlier observations from Kajakit on the mailing list). That would mean we have something like 10,000-15,000 'useful articles' in the database. We could proxy that by counting, say, articles over 1500 characters long. At the time of writing, that would give 14,148 'useful articles' compared to a headline number on the main page of 39,654.
The 1500 character threshold has the advantage of being long enough to cover most of the non-articles according to the criteria suggested by mav (century pages, disambiguation pages etc) automatically.
I would not like to see the headline count on the main page reduced - I think that would be confusing for new users and perhaps a bit demotivating for the rest of us. We could consider changing the main page wording to something like:
... We started in January 2001 and are already working on 6,862,837 articles, with more being added and improved all the time. We want to make over 100,000 complete articles, so let's get to work! Anyone, including you, can edit any article ....
That would let us keep the headline count without creating the suggestion that they are all finished, polished articles. We could keep a running total of '1500 character articles', and perhaps other sizes too, on the statistics pages for those that are interested.
Enchanter 17:19 Aug 28, 2002 (PDT)

I don't think 1500 characters is a particularly useful number -- there are many subjects for which 500 characters would suffice as a minimum article size (although something in the range of 500 - 1000 characters wouldn't bother me too much). Could you maybe run some quick numbers to see how much of a reduction would occur if my proposal were to be enacted (this could be done easily if there were a page count in "what links here")? I was thinking about a reduction of 5 - 7 (maybe 10) thousand. Even if it is more than that I don't think that a temporary reduction in the total article count would hurt. We've already been through one round of this back when we upgraded to phase II and it didn't hurt anybody's morale that I know of. All that we have to do is write up an announcement that we are enacting a far more conservative definition of what we consider to be an article as far as automatic detection goes. --mav

Mav - here are some figures broadly following your proposal above:
Total articles        45179  (including without comma)
<500                 -14314
-Disambiguation        -289
-Year in review       -1292  (pages with numbers as title)
-'list' in title       -386
-'century' in title     -60
-'timeline' in title    -59
-'table' in title       -55
TOTAL                 28903
The main message here is that what really drives the numbers is the threshold for article size that you use. The other exclusions make a relatively small difference (although I'm sure there are in fact many more 'list like' non articles that aren't picked up by these criteria).
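
As a rough illustration of how exclusions like these might be tallied, here is a minimal sketch. The page records, the `exclusion_reason` helper and the first-match-wins ordering are all assumptions made for the example, not the method used to produce the figures above. Plain substring matching on titles will also catch unrelated words (for example "table" inside "vegetable"), so real criteria would need more care.

```python
from collections import Counter
from typing import Optional

# Title keywords taken from the proposal above.
KEYWORDS = ("list", "century", "timeline", "table")


def exclusion_reason(title: str, size: int, is_disambiguation: bool) -> Optional[str]:
    """Return the first exclusion rule a page matches, or None if it is counted."""
    if size < 500:
        return "<500 bytes"
    if is_disambiguation:
        return "disambiguation"
    if title.isdigit():  # pages with numbers as title (year pages)
        return "year in review"
    lowered = title.lower()
    for keyword in KEYWORDS:
        if keyword in lowered:
            return f"'{keyword}' in title"
    return None


# Hypothetical records: (title, size in bytes, links to Wikipedia:Disambiguation)
pages = [
    ("E. coli", 610, False),
    ("1984", 2300, False),
    ("List of rivers", 4100, False),
    ("Mercury", 900, True),
    ("Stub", 120, False),
    ("20th century", 800, False),
]

reasons = [exclusion_reason(*page) for page in pages]
tally = Counter(reason for reason in reasons if reason is not None)
remaining = len(pages) - sum(tally.values())

for reason, count in tally.items():
    print(f"{reason}: -{count}")
print("counted articles:", remaining)
```

Because each page is assigned to one rule only, the per-rule numbers depend on the order of the checks.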

I also don't think we could ever be able to teach a computer how to determine just what is, or is not, a 'useful' article. --mav

I agree. That's why I think the best process is to:
  • Decide the proportion of pages we want to count, by randomly sampling recent changes and making subjective decisions.
  • Choose an article size that gives broadly the number we want.
That's how I came up with the threshold of 1500. I absolutely agree that some articles that are shorter than 1500 are worthwhile, but these are offset by the longer articles that are not much use (according to my relatively strict subjective definition of an article).
The impression I get is that the average quality of Wikipedia has been fairly constant. That is, the tendency for the average quality to rise as articles are improved and the tendency for average quality to fall as new stubs are added broadly cancel out. If so, then picking an article size threshold should give a reasonably stable indicator of articles up to a certain quality.
Enchanter 01:39 Aug 29, 2002 (PDT)
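
The two-step process described above (decide a proportion by sampling, then choose a size that gives roughly that proportion) could be sketched as follows. The function name and the sample sizes are hypothetical, and the one-third target is only the rough figure mentioned earlier in this thread.

```python
def threshold_for_fraction(sizes, target_fraction):
    """Size cut-off such that about target_fraction of pages are at least that long."""
    ordered = sorted(sizes, reverse=True)
    keep = max(1, round(len(ordered) * target_fraction))
    return ordered[keep - 1]


# Hypothetical page sizes in characters.
sizes = [120, 300, 450, 800, 1500, 1600, 2400, 5000, 9000]

# Suppose sampling suggested that about a third of pages are 'useful'.
cutoff = threshold_for_fraction(sizes, 1 / 3)
counted = sum(1 for size in sizes if size >= cutoff)
print(cutoff, counted)  # 2400 3
```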
Thanks for doing the numbers -- this should give the developers plenty to chew on. I think a figure of 28,000 is about right. --mav
I like the format of this count. I'd love to see it replicated (i.e. automatically generated) on the statistics page. I agree that the size threshold is the most important decision we have to make, although excluding the other types of articles should also be done if it doesn't bog down the server to cut them out on the fly. Just to make things more confusing, I vote for a minimum of 1000 bytes, which cuts our count roughly in half. However, I would support either 500 or 1500 in preference to what we have now.
I don't care a great deal what number we use for the headline count, as long as the more detailed statistics are only a click deeper. --Karl Juhnke

Would it be possible to now and again run a query (perhaps from crontab) that showed how many "article" pages we have with c characters, c<1000, how many 1000<=c<2000,2000<=c<3000, et cetera? DanKeshet

At the moment:
      =0:     2
     <16:     3   (     1-15:     1)
     <31:    21   (    16-30:    18)
     <63:   111   (    31-62:    90)
    <125:  1222   (   63-124:  1111)
    <250:  4646   (  125-249:  3424)
    <500: 14138   (  250-499:  9492)
   <1000: 25474   (  500-999: 11336)
   <2000: 34739   (1000-1999:  9265)
   <4000: 40849   (2000-3999:  6110)
   total: 45172   (4000+:      4323)
The queries are of the form SELECT COUNT(*) FROM cur WHERE LENGTH(cur_text)<500 AND cur_is_redirect=0 AND cur_namespace=0 --Brion 20:38 Aug 28, 2002 (PDT)
Thanks, Brion! I think it's pretty interesting how it works out. DanKeshet
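
For anyone wanting to reproduce a breakdown like the one above, here is a small sketch that runs the same form of query for each limit and derives the per-bucket counts by subtraction. It uses an in-memory SQLite table as a stand-in for the real `cur` table, with made-up rows. Note that SQLite's `LENGTH` counts characters for text, whereas MySQL's `LENGTH` counts bytes, so results on the live database could differ for non-ASCII text.

```python
import sqlite3

# In-memory stand-in for the cur table; only the three columns the query uses.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cur (cur_text TEXT, cur_is_redirect INTEGER, cur_namespace INTEGER)"
)
rows = [("x" * n, 0, 0) for n in (0, 10, 40, 300, 700, 1500, 4500)]
rows.append(("#REDIRECT [[Foo]]", 1, 0))  # redirect: excluded by the query
rows.append(("y" * 800, 0, 1))            # non-article namespace: excluded
conn.executemany("INSERT INTO cur VALUES (?, ?, ?)", rows)


def count_below(limit):
    """Number of non-redirect article-namespace pages shorter than limit."""
    sql = (
        "SELECT COUNT(*) FROM cur WHERE LENGTH(cur_text) < ? "
        "AND cur_is_redirect = 0 AND cur_namespace = 0"
    )
    return conn.execute(sql, (limit,)).fetchone()[0]


limits = [16, 31, 63, 125, 250, 500, 1000, 2000, 4000]
previous = 0
for limit in limits:
    cumulative = count_below(limit)
    # Each bucket is the difference between successive cumulative counts.
    print(f"<{limit}: {cumulative} (this bucket: {cumulative - previous})")
    previous = cumulative
```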



Can we have some figures for mean article word/character counts (ignoring markup and HTML), please? This would enable better comparisons with existing encyclopedias: see the article text for comparisons.



Character count... bah. That's unreliable too. Can I suggest a simple, effective, and _working_ solution? Yeah, that's what I thought. Since Wikipedia is user-moderated, why not add the option for registered users, or maybe unregistered ones too, to vote on how useful they found the article, including a reason why? Think Slashdot: (-7, too short), (+5, well written), etc. It wouldn't be hard to implement, and I think it would be good, and then you could count the "real" articles based on their user approval. Of course this affects articles which are voted really low and then majorly updated to fix this, yet still have a low score. That's why I think ratings should be cleared every time there is a major update (e.g. not minor, plus some type of change comparison with the .diff file). Ideas, anyone? I think this is pretty good, and I'd be willing to implement it if people like the idea. I don't know how long it'd take me, because I'm not familiar with the codebase, but I'm very proficient in PHP and DB work as well as other programming languages. I am already registered on SF too... so if anyone likes this idea, leave a comment here, on my talk page, or [e-mail me].

Lightning, Sept 29 3:17
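
To make the rating proposal above a little more concrete, here is a minimal sketch of the bookkeeping it implies. The class name, the `changed_fraction` measure and the 0.3 threshold are all hypothetical choices made for illustration; nothing like this exists in the codebase as far as this thread shows.

```python
from dataclasses import dataclass, field


@dataclass
class ArticleRatings:
    """Votes for one article, each stored as a (score, reason) pair."""

    votes: list = field(default_factory=list)

    def vote(self, score, reason):
        self.votes.append((score, reason))

    def on_edit(self, is_minor, changed_fraction, major_threshold=0.3):
        # Hypothetical rule: a non-minor edit that changes at least
        # major_threshold of the text counts as a major update and
        # clears the old ratings.
        if not is_minor and changed_fraction >= major_threshold:
            self.votes.clear()

    def average(self):
        """Mean score, or None when there are no votes yet."""
        if not self.votes:
            return None
        return sum(score for score, _ in self.votes) / len(self.votes)


ratings = ArticleRatings()
ratings.vote(-7, "too short")
ratings.vote(5, "well written")
print(ratings.average())  # -1.0

ratings.on_edit(is_minor=True, changed_fraction=0.9)   # minor edit: votes kept
ratings.on_edit(is_minor=False, changed_fraction=0.5)  # major update: votes cleared
print(ratings.average())  # None
```

A count of "real" articles would then be the number of articles whose average is above some agreed cut-off, which is a separate decision.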

Alexa

move to wikipedia talk:statistics

According to a recent Wikipedia:Announcement Wikipedia is as popular as Slashdot. I was quite surprised! Is it really true? Anyone know how Alexa measures popularity? I see they offer a toolbar to download... do they extrapolate data from toolbar downloaders? Are Wikipedians more likely to have a toolbar than other users? Alexa Website Pete 12:05, 4 Sep 2003 (UTC)

Yes, everyone who has the Alexa toolbar installed effectively sends the URL currently being viewed to the Alexa server, thus allowing them to monitor which sites are visited, and how often. How valid these data are can of course be debated - those who worry about privacy will probably not install it. But around the 1000th most popular site I doubt that a few very active Wikipedians with the toolbar can make that much difference anymore; around the 100,000th it has much more impact. andy 12:21, 4 Sep 2003 (UTC)
Alexa installs by default, whether you want it or not, and without your permission, simply as part of a Windows install. Ad-Aware and other similar spyware protection programs disable it, however. Tannin 12:40, 4 Sep 2003 (UTC)
Thanks for the info guys. I wonder if the nature of Wikipedia, where each edit means two page views (or more if you preview!), has an inflationary effect on our figures. I am pretty sure if we got another slashdotting we would still have to batten down the hatches pretty hard because of weight of numbers. And Tannin, just to check.. did you mean Alexa is activated with every installation of the Windows OS?? That's a lot of data! Pete 14:49, 4 Sep 2003 (UTC)
Alexa does separate between page views (e.g. the numerous views in an edit process) and number of viewers (independent IP addresses) - and then adds both together in a magic formula to get the actual rank. But don't forget that a big percentage of viewers will not edit, but just view. andy 14:53, 4 Sep 2003 (UTC)
Pete: this page (which I found more or less at random on Google) has quite a bit of detail. Someone should write this up for the 'pedia. I see (from another page) that there is a class action against Alexa pending. As spyware goes, there are worse ones. But just the same, I don't like people messing with my computer without my knowledge, and (I understand) neither does the law in most countries. I think Alexa is installed as part of Internet Explorer, rather than as part of Windows - not that that distinction makes much of a difference these days. Tannin
Re Pete, "nature of Wikipedia" - most users of Wikipedia never edit an article... Martin 19:22, 4 Sep 2003 (UTC)
Wow, I guess I had always just assumed that we were all writers and no readers... but Wikipedia:Statistics informs me that there are 40 page views per edit... This thread has certainly reduced my Doubting Thomas stance. Pete 23:28, 4 Sep 2003 (UTC)
It should also be noted that editing a page is a different thing from reading one. Thus it is fair to count it twice. --mav

Erik's statistics pages are just what the doctor ordered for the geeky, stats-inclined people such as myself. Another popular page is Wikipedia:Wikipedians by number of edits. However it is rarely updated as the SQL script required to create the page is apparently fraught with difficulties. Looking at some of the stats on Erik's page (e.g. recently active contributors) it might be possible to provide the data currently at Wikipedia:Wikipedians by number of edits using Erik's code, making for much more frequent and hassle-free updates. Anyone else think this idea is worth doing? I would email Erik myself, but won't do so just in case he has been bombarded with similar requests since setting up the stats pages. Pete 10:52, 16 Oct 2003 (UTC)


Wikipedia in March 2004: a month in stats

I've been perusing the en.wikipedia stats for the last month's trends. Here are the headline figures:

  • 115,080,901 hits
  • 952,395,093 Kb transferred
  • Daily average: 3,712,287 hits/day

Top 10 pages (excluding Main Page, Current Events, Special pages, admin pages):

  1. 100px-Beowulf.firstpage.jpeg - does anyone know why??
    • appears to be empty, or am I missing something? --Phil | Talk 16:42, Apr 1, 2004 (UTC)
    I think that was a conflation of a URL and an image link; I've corrected it, but the actual image page is Image:Beowulf.firstpage.jpeg. Marnanel 17:04, Apr 1, 2004 (UTC)
    • My suspicion is that somebody has been leeching this - including it inline in something other than a Wikipedia article (a manuscript as a forum avatar?). A hunt through the logs for the referer on requests for it would soon confirm that, and tell us who the culprit is. Either that, or it's a pretty weird bug in the log analyser. - IMSoP 12:06, 2 Apr 2004 (UTC) (oh dear, must resist the urge to tidy up leeching and disambig avatar properly: too much work to do...)
      • In fact, looking at the most popular referer stats, I'd say it was someone on this messageboard here - IMSoP 12:13, 2 Apr 2004 (UTC)
        • Hmm. That site really doesn't have anything to do with Beowulf. I'm guessing few if any of the people featured are spear-danes (although I believe I did see Grendel's Mother) -- Finlay McWalter | Talk 20:38, 2 Apr 2004 (UTC)
          • Um, I'm not sure if you're joking or not, but given that I haven't time to put any decent info on avatar, I'll explain briefly for anyone who is confused. People will put any image that they think looks cool into their preferences for a messageboard, just to make them stand out from the crowd. I notice one member there has a (badly squashed) image of a bank-note, for instance. You'll note that the image in question is a thumbnail, not the original - perfect size for such a use. This kind of leeching can actually be a real nuisance for smaller websites, because of the huge amount of bandwidth it eats - a friend of mine almost had to pay his host for excess use because someone liked his b3ta submission, but didn't even scale it down! It's perhaps not such a big deal for Wikipedia, but if it's still happening, it might be worth tracking down the user responsible (through, as I say, the referer logs) and politely asking them to host the image themselves.
          • If, on the other hand, you were making a subtle comment about the somewhat adult content of that messageboard, I apologise - I meant to warn readers when I realised, but became ensnared in other matters. - IMSoP 22:10, 2 Apr 2004 (UTC)
            • Yes, I was trying to be a smart-alec, but only those who've followed the link (which hopefully is no-one) will get it. I have to go wash my eyeballs out now... -- Finlay McWalter | Talk 22:24, 2 Apr 2004 (UTC)
              • Oh, come on, it's not that bad - it's not like it's some kind of goatse fan forum or something (if you don't know, you don't want to, trust me). In fact I glanced at their FAQ or whatever, and they seemed to have pretty decent rules, considering. - IMSoP 22:29, 2 Apr 2004 (UTC)
                • Seriously, that's pretty damn tame. I was expecting a whole lot worse ;) →Raul654 22:33, Apr 2, 2004 (UTC)
  2. Seven dirty words
  3. United States
  4. World War II
  5. Goatse.cx
  6. March 11, 2004 Madrid attacks
  7. List of sex positions
  8. Wiki
  9. Sheikh Ahmed Yassin
  10. Mathematics

Top 10 search terms:

  1. wikipedia
  2. wiki
  3. the answer to life the universe and everything
  4. encyclopedia
  5. penthouse
  6. saddam hussein
  7. ahmed yassin
  8. sheikh ahmed yassin
  9. sexual intercourse
  10. free encyclopedia

More at http://en.wikipedia.org/stats/usage_200403.html .
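
The hunt through the referer logs suggested in the Beowulf thread above could be sketched roughly like this. It assumes the web server writes Apache-style combined log lines, which has not been confirmed here; the sample lines, hosts and the `top_referers` helper are made up for illustration.

```python
import re
from collections import Counter

# Matches the request, status, size and referer fields of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def top_referers(lines, path_fragment, n=5):
    """Most common referers for requests whose path contains path_fragment."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and path_fragment in match.group("path"):
            counts[match.group("referer")] += 1
    return counts.most_common(n)


# Made-up log lines; forum.example.org stands in for an external site.
sample_lines = [
    '10.0.0.1 - - [02/Apr/2004:12:00:00 +0000] "GET /upload/thumb/100px-Beowulf.firstpage.jpeg HTTP/1.1" 200 5120 "http://forum.example.org/thread/1" "Mozilla/4.0"',
    '10.0.0.2 - - [02/Apr/2004:12:00:05 +0000] "GET /upload/thumb/100px-Beowulf.firstpage.jpeg HTTP/1.1" 200 5120 "http://forum.example.org/thread/1" "Mozilla/4.0"',
    '10.0.0.3 - - [02/Apr/2004:12:00:09 +0000] "GET /upload/thumb/100px-Beowulf.firstpage.jpeg HTTP/1.1" 200 5120 "http://en.wikipedia.org/wiki/Beowulf" "Mozilla/4.0"',
    '10.0.0.4 - - [02/Apr/2004:12:00:11 +0000] "GET /wiki/Wiki HTTP/1.1" 200 20480 "-" "Mozilla/4.0"',
]

for referer, hits in top_referers(sample_lines, "Beowulf.firstpage.jpeg"):
    print(hits, referer)
```

If one external referer dominates the requests for a thumbnail, that would support the hotlinking explanation rather than a bug in the log analyser.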

I'd just like to give the obligatory plug for the autoupdating web links I wrote:
  • [http://wikimedia.org/stats/en.wikipedia.org/url_{{CURRENTYEAR}}{{CURRENTMONTH}}.html Current month's hits]
  • [http://wikimedia.org/stats/en.wikipedia.org/usage_{{CURRENTYEAR}}{{CURRENTMONTH}}.html Current month's webalizer]
  • [http://mail.wikimedia.org/pipermail/wikien-l/{{CURRENTYEAR}}-{{CURRENTMONTHNAME}}/date.html Autoupdating link to the mailing list]
→Raul654 16:21, Apr 1, 2004 (UTC)
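
Outside the wiki, the same month-stamped addresses can be built by hand. The sketch below mirrors the three link patterns above, assuming that {{CURRENTMONTH}} expands to a zero-padded month number and {{CURRENTMONTHNAME}} to the English month name. The `stats_links` helper is hypothetical, and whether the URLs still resolve is not something this sketch checks.

```python
from datetime import date

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def stats_links(today):
    """Build the three monthly links for the month containing `today`."""
    stamp = f"{today.year}{today.month:02d}"  # e.g. 200404
    month_name = MONTH_NAMES[today.month - 1]
    return {
        "hits": f"http://wikimedia.org/stats/en.wikipedia.org/url_{stamp}.html",
        "webalizer": f"http://wikimedia.org/stats/en.wikipedia.org/usage_{stamp}.html",
        "mailing_list": f"http://mail.wikimedia.org/pipermail/wikien-l/{today.year}-{month_name}/date.html",
    }


for name, url in stats_links(date(2004, 4, 1)).items():
    print(name, url)
```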
It would be really good if we could change the header of each page so that it says $PAGENAME - Wikipedia, the free encyclopedia - BROWSER SPECIFIC TAG at the top toolbar instead of just $PAGENAME - Wikipedia - BROWSER SPECIFIC TAG... it would be nice not to have such a low google rank for "encyclopedia" - and this might help a notch. Pete/Pcb21 (talk) 16:27, 1 Apr 2004 (UTC)
I don't think this would be an improvement; I expect the opposite. Using just the title as the HTML title (without Wikipedia) should increase our relevance for the searches matching the title. No need to get a higher ranking for the search phrase 'wikipedia', there's nothing better than #1. Including 'Encyclopedia' in the title of the main page and/or in default keywords in the header of each page could help to improve the ranking for that search term though. A small skin hack could do this. -- Gabriel Wicke 13:33, 2 Apr 2004 (UTC)
Your more refined approach sounds good. A specialized hack for the main page sounds really good because "Main Page - Wikipedia" is awful. Pete/Pcb21 (talk) 13:52, 2 Apr 2004 (UTC)

When will Wikipedia reach??

  • 300,000 articles??
  • 400,000 articles??
  • 500,000 articles??
  • 600,000 articles??
  • 700,000 articles??
  • 800,000 articles??
  • 900,000 articles??

A million articles??

66.245.104.154 02:09, 10 Apr 2004 (UTC)

I thought the statistics that allowed us to look at the number of hits to each page were really cool, however, there's one problem. That file is quite long, many many megabytes.

I know it would be a bit of work, but a really cool potential addition to Wikipedia would be something which allows us to request the number of hits to a given site. For example, we could input "downsizing" and it would tell us that there were 103 hits to that site in the month of March 2004, for example.

Also cool would be a feature which include the rank of each site, that would say that the site was, for example, the 46381st most-visited site during that month, out of 221682 (another made-up number). Mike Church 07:12, 18 Apr 2004 (UTC)
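
A lookup along the lines requested above might look like this once the monthly hit counts have been loaded into a dictionary. The page titles and numbers are invented, in the same spirit as the made-up figures in the request, and `hits_and_rank` is a hypothetical helper rather than an existing feature.

```python
def hits_and_rank(hits_by_page, title):
    """Return (hits, rank, total pages) for one page; rank 1 is the most visited."""
    hits = hits_by_page[title]
    # Rank is one more than the number of pages with strictly more hits.
    rank = 1 + sum(1 for other in hits_by_page.values() if other > hits)
    return hits, rank, len(hits_by_page)


# Made-up monthly hit counts.
march_hits = {
    "United States": 50000,
    "Wiki": 42000,
    "Downsizing": 103,
    "Obscure stub": 7,
}

hits, rank, total = hits_and_rank(march_hits, "Downsizing")
print(f"{hits} hits, ranked {rank} out of {total}")  # 103 hits, ranked 3 out of 4
```

Scanning every page for each lookup is fine for a sketch; for hundreds of thousands of pages a pre-sorted list would be the more sensible design.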

How many "Real" articles in the English Wikipedia

I see the number of articles every time I come to the English Wikipedia. It is now approaching 300,000 articles.

My question is: How many of these are actually real encyclopedia articles?

If I do a random page, is it actually random? If I do a hundred or a thousand random pages and keep track of how many are just summaries of census information for US geographical places, how many are detailed descriptions of some character in a video game, how many "really" belong on E! online (music and movies), how many have really no information on them (stubs), and how many are actually "real" encyclopedia articles, would that be a good estimate?

I know I am being a little snooty here, and I know that what I am thinking about is not the only true goal of the Wikipedia project, but I have just been thinking about how to evaluate this stuff from the point of view of someone who is using Wikipedia instead of another encyclopedia such as Encarta or Encyclopedia Britannica.

There was some discussion of the true article count a couple of years ago on this talk page, and if the discussion is somewhere else, please just point me to it.

Thanks. nroose Talk 17:29, 10 Jun 2004 (UTC)

The most recent survey in this area that I am aware of is at m:English Wikipedia Quality Survey conducted by Adam Carr in October 2003. According to that data, and if you have a reasonably exacting standard of what constitutes a "real" article, at least 20% and probably as much as 30% of articles are "real". Thus 60,000 seems a reasonable ball-park figure for number of real articles. I am sure a lot of people would be interested in an updated and expanded (1000 instead of 200 articles?) survey but these things take time. Pcb21| Pete 18:54, 10 Jun 2004 (UTC)
Well, I don't have time to do in-depth analysis. I am really just curious about what a good estimate would be of the number of articles I would consider to be real articles. I am not saying that other articles should not be in Wikipedia. Actually I think it is great that Wikipedia has a broader range of stuff than other encyclopedias. But, since I was curious, I wrote an HTML/Javascript page (http://home.earthlink.net/~eroose/wikisurvey.htm - it resizes the page, so you probably don't want it to come up in this browser window) to make it easier to do a survey. Just click on real or not real for each page that comes up, and it keeps track of how many of each and immediately shows you the next random page in a different window. It does not send any information back to the server. It works OK in IE, but I have not tried it in other browsers. The result I got from doing 100 pages was that 65% of them were "Real" to me. I'm not very picky about length or completeness. Perhaps it was too few to really provide good stats. nroose Talk 06:37, 14 Jun 2004 (UTC)
I did a similar exercise a couple of times, once a couple of years ago and once more recently. I rated 100 random pages for how much they resembled encyclopedia articles. I reckoned that about one third of articles were of real encyclopedia quality, about one third were nowhere near encyclopedia quality, with the third in the middle as promising works in progress. Interestingly, I could see no obvious sign that the average quality of articles was getting any better or worse; it looks to me that the increase in average quality through editing and the decrease through new stubs roughly cancelled out. Enchanter 18:48, Jun 14, 2004 (UTC)
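
On the question of whether 100 pages is enough, a quick way to see the uncertainty in such a survey is to put an interval around the sampled proportion. The sketch below uses the Wilson score interval, which is one common choice and not something anyone in this thread used. It assumes the sampled pages are a genuinely random draw, and the 300,000 total is only the approximate article count mentioned above.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# 65 of 100 sampled pages judged "real", as reported above.
low, high = wilson_interval(65, 100)
print(f"proportion: {low:.2f} to {high:.2f}")

# Scaled to roughly 300,000 articles.
total_articles = 300_000
print(f"real articles: {low * total_articles:,.0f} to {high * total_articles:,.0f}")
```

With only 100 pages the interval spans close to twenty percentage points, so a larger sample would narrow the estimate considerably. The differences between the surveys in this thread are more likely down to how strictly each person defines "real" than to sample size alone.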

Hi, I don't know where to point this out, but there seems to be a fluke with the usage stats. Please check it out and correct it.

http://wikimedia.org/stats/en.wikipedia.org/

Notice that the stats for the last 12 months (pasted below) are crazy for Jan and Dec. These are not the stats for the last 12 months but rather a mixture that includes Jan 2003 and Dec 2002. Maybe it's a select statement bug? Can it be corrected? I wanted to use the stats for a statistics project.
regards

<pasted>

Jul 2004 6786854 5903348 4128308 221202 1057155 333444835 1548418 28898156 41323437 47507980
Jun 2004 8708699 7606574 5281634 295800 2406100 777885880 3845409 68661252 98885464 113213099
May 2004 3872154 3412794 2437365 274077 3506025 742530837 5755636 51184678 71668674 81315244
Apr 2004 2528076 2244552 1604463 189930 3484303 648796021 5697918 48133914 67336582 75842288
Mar 2004 3712287 3269369 2406952 280845 5134274 952395093 8706201 74615518 101350462 115080901
Feb 2004 3085573 2583379 3012502 265556 3861758 663607322 6638905 75312562 64584497 77139341
Jan 2003 258494 223980 141012 21995 409498 125092770 681872 4371372 6943405 8013325
Dec 2002 389507 343187 196038 49208 959749 146121521 1525456 6077188 10638822 12074727
Nov 2003 1630590 1157635 856691 139330 2090602 250582681 3343942 20560585 27783244 39134168
Oct 2003 1507552 1201196 721562 157068 2997839 337440023 4869113 22368431 37237095 46734120
Sep 2003 1479193 1181710 608869 155665 2894993 305957717 4669978 18266091 35451313 44375793
Aug 2003 998932 798697 421723 90404 1746797 228149162 2802537 13073417 24759634 30966904

<pasted />
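
The mislabelled rows in the pasted table can be picked out mechanically. This sketch assumes the table is meant to list twelve consecutive months counting back from the first row, and that the first row's label is correct; `mislabelled` is a hypothetical helper written for this example.

```python
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def month_index(label):
    """Convert a label like 'Jul 2004' into a running month count."""
    name, year = label.split()
    return int(year) * 12 + MONTHS[name] - 1


def mislabelled(labels):
    """Labels that do not fit a run of consecutive months counting back from the first."""
    start = month_index(labels[0])
    return [
        label
        for offset, label in enumerate(labels)
        if month_index(label) != start - offset
    ]


# Row labels in the order they appear in the pasted table.
labels = [
    "Jul 2004", "Jun 2004", "May 2004", "Apr 2004", "Mar 2004", "Feb 2004",
    "Jan 2003", "Dec 2002", "Nov 2003", "Oct 2003", "Sep 2003", "Aug 2003",
]
print(mislabelled(labels))  # ['Jan 2003', 'Dec 2002']
```

This only checks the labels. Whether the figures in those two rows belong to the labelled months or to Jan 2004 and Dec 2003 would still need checking against the underlying logs.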


Wikipedia's headline stats for July 2004

The July stats are in (see http://wikimedia.org/stats/en.wikipedia.org/usage_200407.html ) and they make some interesting reading...

July was the English Wikipedia's busiest month ever (I think), with:

  • 9,439,508 hits
  • 8,208,960 files were downloaded
  • 5,672,051 pages were served
  • 316,295 visits (not clear if this refers to unique visitors or just page impressions)
  • 2,083,869,414 Kb of data was downloaded

Excluding project and special pages (and the Main Page), the 10 most requested articles were:

  1. Nick Berg (Iraq hostage)
  2. John Kerry (new entry)
  3. Kim Sun-il (Iraq hostage)
  4. OS-tan (deeply bizarre; a must-read) (new entry)
  5. List of sex positions
  6. United States
  7. Crushing by elephant (yay, go elephants! ;-)
  8. Bobby Fischer (former chess champion) (new entry)
  9. Wikipedia
  10. Wiki

For comparison, the 10 most requested for June were:

  1. Paul Johnson (hostage)
  2. Kim Sun-il
  3. Paul Marshall Johnson, Jr.
  4. Beheading
  5. Decapitation
  6. Redmond, Washington
  7. Goatse.cx
  8. SpaceShipOne
  9. Wikipedia
  10. United States

The top 10 search terms for July were:

  1. wikipedia
  2. wiki
  3. nick berg
  4. cristiano ronaldo
  5. teresa heinz kerry
  6. encyclopedia
  7. beheading
  8. harry potter and the half blood prince
  9. marlon brando
  10. ken jennings

From this, it looks pretty clear that Wikipedia is being heavily used as a resource for major ongoing news events, particularly Iraq. -- ChrisO 16:37, 2 Aug 2004 (UTC)

Editing experience

I've been keeping an eye on Combined live stats, and none of the graphs there seem to reflect the overall "slowness" I experience when browsing or editing. The second one, which deals with server response time, would seem in theory to reflect the overall experienced "slowness", but it doesn't seem to. Is there another stat that would be more meaningful for what I'm trying to look at? P.Riis 21:24, 31 Aug 2004 (UTC)

500,000 Articles!!!

Wikipedia has finally reached 500,000 articles! What do people have to say, I wonder? --Andrew 22:03, 17 Mar 2005 (UTC)

yi ha

Is there a way to find the articles that have the fewest or no links to them? Falphin 20:20, 9 Jun 2005 (UTC)

Orphaned pages Nroose 12:46, 26 Jun 2005 (UTC)

Stats are over a month old

It appears that the Stats have not been updated in over a month (since May 16th). Why is that? Nroose 12:48, 26 Jun 2005 (UTC)

How to get statistics for a Wikipedia article

How do you get the statistics for a particular article, such as the one at http://en.wikipedia.org/wiki/Don_Saklad ? I mean hits, hourly hits, referrers, and so on.

Major error on special page

The special page for statistics lists this page [http://en.wikipedia.org/wikistats/EN/Sitemap.htm], which hasn't been updated since 16 May, as a page that updates automatically. Could someone with the authority to do so sort this out please? Osomec 19:35, 3 August 2005 (UTC)

User statistics as of September 15, 2005

See: User:JIP/User statistics. JIP | Talk 11:54, 15 September 2005 (UTC)[reply]

Edit count

Why has my favorite, Kate's Tools, stopped working? Vald 10:10, 14 November 2005 (UTC)

The new link to wikiside.com is to an independent site that carries ads and I don't think the stats look correct or up to date anyway. I'm thinking of removing it. Any comments? Calsicol 16:53, 5 January 2006 (UTC)

Active users

How many users have actually made an edit this month? Probably like 3% are real users.Voice of AllT|@|ESP 02:54, 13 January 2006 (UTC)[reply]

Compare to the number of those who have checked their watchlist over the same time period. Creating an account is the only way to get a portable watchlist. --James S. 17:51, 21 January 2006 (UTC)[reply]

Believe it or not

Special:Statistics has historically shown administrators making up slightly more than 1 in every 1000 logged-in users. Now, however, it is fewer than 1 in 1000. I think we should add a new figure, which is the number of logged-in users who are not indefinitely blocked. Any thoughts? Georgia guy 16:32, 14 January 2006 (UTC)

See ya', cnn.com! Next stop, Gatesburg....

So, how are the squids doing? --James S. 17:48, 21 January 2006 (UTC)[reply]

Page organization

Somebody promoted the "Manually updated statistics" above the "Periodically updated statistics", then moved "Search engine statistics" from one category to the other. First of all, if you know what these pages are and note the descriptions given, the recategorization was simply wrong, because the groups are of different types. As to placement on the page, while recognizing that people like to follow Alexa stuff, I think the breadth and depth of Wikipedia statistics in the other section deserves the higher placement. There are lots of places you can get data comparing Wikipedia against the rest of the internet, and the statistics here are just occasional glimpses at things that other sites do better. On the other hand, this is the best place to go for in-depth statistics specifically about Wikipedia (Erik Zachte's in particular) and I think that's what we should be featuring. --Michael Snow 18:15, 8 February 2006 (UTC)[reply]

Care to offer any examples of "places you can get data comparing Wikipedia against the rest of the internet"? I'm inclined to doubt that any good ones exist. At the moment all but one of the stats in the top section are at least 8 weeks old, which looks pretty useless in my opinion. 62.31.55.223 17:49, 9 February 2006 (UTC)[reply]
Alexa to start with; getting data from the horse's mouth seems preferable to me as opposed to looking at a secondhand compilation. Beyond that, data get released to the media periodically by companies like comScore, Hitwise, and Nielsen//NetRatings (or whatever they're called now). Whereas for detailed numbers about Wikipedia itself, there aren't really outside sources, and a compilation like Erik Zachte's is as good as you'll find anywhere from anybody. Also, seeing as how you undid my change, you still don't seem to have understood why "Search engine statistics" doesn't fit in the category you moved it to. --Michael Snow 03:26, 10 February 2006 (UTC)[reply]
The pages here are vastly better than raw Alexa info as they contain a great deal of well organised information covering a long period of time. The pages contain scores of links to Alexa comparison graphs which are not immediately available direct from Alexa (and how would you know what to compare?). Really your comments are a hurtful insult to all of the effort that has been expended. Your other examples are feeble as you admit the information is merely "released periodically". Those sources are also almost entirely U.S. centric, and therefore grossly misleading to the point of being worse than useless as sources about Wikipedia on a worldwide basis. 62.31.55.223 00:18, 12 February 2006 (UTC)[reply]
Really now, I don't see why it's such a hurtful insult to suggest that we should prioritize statistics generated by Wikipedians about Wikipedia over collections of statistics copied from outside sources and still available from those same sources. I would think it was much more insulting to describe all the work people have done to produce internal statistics as "pretty useless" simply because the updates depend on having a new database dump available to run their scripts on. --Michael Snow 00:35, 13 February 2006 (UTC)[reply]
There is nothing insulting in Michael's remarks. His arguments are certainly valid and his opinion worthwhile considering. Nevertheless I would also prefer the version with the frequently updated statistics on top and the less frequently further down. --Donar Reiskoffer 09:16, 13 February 2006 (UTC)[reply]
The "second-hand" compilation offers far more than the Alexa site because it is updated most days with information that is only briefly available on Alexa and is organised especially for people who are interested in Wikipedia. It is hard to believe you have even looked at it and your remarks continue to be unrepentantly hurtful and insulting. Why is information produced by Wikipedians by processing internally generated data so superior to information produced by Wikipedians from reputable external data? Your position makes no sense to me at all. And the data you value so much hasn't been updated since this discussion started. 62.31.55.223

Question that someone should be able to answer

Why is the number of edits that the Special:Statistics page keeps track of larger than the numbers of the edits kept track of in the page histories?? Georgia guy 20:14, 24 February 2006 (UTC)[reply]

Because it counts deleted edits and various administrative actions as "edits" - it is a non-decreasing function of time. – ABCDe 06:40, 1 March 2006 (UTC)


Very Bad News

Now the number of registered users is ahead of the number of articles. Georgia guy 22:16, 27 February 2006 (UTC)[reply]

Why do you feel this is bad news? I'd take it as neutral at worst; I doubt we would want one billion articles if there were one billion registered users. User:Ceyockey (talk to me) 04:42, 28 February 2006 (UTC)[reply]
Yeah. Turn it around the other way - in theory (sockpuppets excepted), on average one person in 6000 on the entire planet is a Wikipedian. Surely that's something to celebrate? Grutness...wha? 22:31, 28 February 2006 (UTC)[reply]

The Big 1,000,000

As of 21:56, 1 March 2006 (UTC), we are at 999,720. Looks like it'll be either today or tomorrow! --TKE

Stub statistics

I couldn't find many statistics related to the number of stubs on Wikipedia, so I tallied a few myself. See User:Dantheox/Stub percentages. Includes a chart of article count over time with overall stub count superimposed. Also includes a chart of the percentage of articles that are stubs over time. Enjoy, --Dantheox 06:45, 1 March 2006 (UTC)[reply]

There is a huge page that provides manually maintained statistics on stub types — Wikipedia:WikiProject Stub sorting/Stub types. Granted, there are not aggregate statistics on this page. Also, there are varying definitions of "stub", a major split being between a "structural" definition based on page length (which is implementable via personal preferences) and a "functional" definition based on being labeled a stub; both the page I mention here and that mentioned above by Dantheox are based on a "functional" definition. User:Ceyockey (talk to me) 11:45, 1 March 2006 (UTC)
p.s. It would be useful if we had a method for automatically generating statistics that could be used in this page; the counts are binned so as to give a coarse and comparable view of stub counts, to make data collection easier for humans and to provide less distraction to persons viewing the page (less distraction to see several instances of "<200" than "198", "175", and "188"). Automated methods could provide a binned number and an exact number with several potential options for viewing (binned number presented with mouseover for exact; separate pages for binned and exact numbers; preferences level selection of view based on some javascript; etc.) User:Ceyockey (talk to me) 11:51, 1 March 2006 (UTC)[reply]
That can be done... Contact me on my user page. User:Gnome (Bot) is more than capable of doing this. The code will only take 2 or 3 hours to write, as soon as I know what criteria to use (as in how and where to put the number)! Eagle (talk) (desk) 20:59, 6 March 2006 (UTC)
Sounds like an interesting idea. BTW, just to expand slightly on what Ceyockey said, it's not so much that the binned numbers are less distracting - there are two reasons for them: logistical and functional:
  • logistical - since the counts are done by hand, exact numbers would become outdated very rapidly whereas bins will probably be accurate for much longer;
  • functional - the main reason for these counts is to give WP:WSS an idea of which categories are too big (and needing splitting) or too small (and needing deletion/merging). All that's really needed for that are bins to give a rough idea of size - exact figures aren't needed for that task. That's also the reason why the bins go from actual stub numbers to bolded category page numbers above 800 stubs - it makes it far easier to spot the really big categories.
For those reasons, I'd actually argue that exact figures aren't useful on that page, and would actually distract slightly from what's being done. It would be good to be able to automatically update the bin sizes with a bot, but to bins, not to exact figures. Mind you, a note at the top of the page saying exactly how many articles are marked as stubs in total (similar to the article count on the main page) would be useful! Grutness...wha? 23:00, 6 March 2006 (UTC)


The program already uses bins

The bins are different sizes, but it is only a matter of changing a couple of if-then statements. I agree with the bins; the total count also sounds great to me. Will you support a trial run of the bot on WT:BOT? Eagle (talk) (desk) 01:59, 7 March 2006 (UTC) PS: unless a bot flag is not needed... the policy really confuses me. My program would make only one edit each time I run it (every week, or whatever).

Thanks for the support!!!!Eagle (talk) (desk) 02:08, 7 March 2006 (UTC)[reply]

potential Problems

I really can't do this unless I get consensus to change the formatting of the project page (something consistent that I can regex for).

The actual articles are not the problem; it's keeping the data together.

  1. First there is the category link itself. This is no problem, as these automatically have a definite beginning and end point that code can find easily.
  2. Next there is the description of the category. Right now there is no definite beginning and end point to these (there may be more than one sentence). Unless the description is the only thing in parentheses, I find it very hard to keep these with the category. Are these even necessary to the project? I will waste time coding to find these, but if I don't have to, I will like this project even more! (Another page can hold the descriptions, or we can make sure that there is a specific start symbol, say (, and a specific end symbol, say ). These MUST occur only once per countable category.)
I'm all ears for your suggestions. I may be overlooking something simple. If I am, feel free to tell me I'm an idiot :-) Eagle (talk) (desk) 02:12, 7 March 2006 (UTC)
  3. Lastly there is the number of articles (bin). This is absolutely no problem, as I will write over this each time.

All the above is from Eagle (talk) (desk) 02:08, 7 March 2006 (UTC)[reply]

Question

I have code that will automatically get the categories and count them, right now. The problem is getting all the other stuff on the page.

Please help by giving suggestions, but remember that each section must end with a unique symbol, such as $%^&*()+[]{}, etc., or else the code can't find all of the pieces. As a result it would ruin the page, and we would have to revert. (That would be a waste of my time, so let's get this right the first time!)

Here is a proposed page syntax, there can be more than one *, if the indent needs to go further.

  • [[:Category:<name> stubs]] {{<Link to stub template>}} (<Comment on what the stub category is for, etc. This is the description I am having problems with>) <Binned article count>

No special symbols are needed for the article count, as this is added by my code, (I don't have to find this on the page)

When the article count exceeds 800, it will change to bold: '''<Binned article count>'''. NOTE: right now, the program uses all numbers.
Bin values are: 5, 10, 25, 50, 100, 200, 300, 400, 500, 600, 700, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800, 3000, "CATEGORY IS WAY TO BIG".
Suggestions on these are welcome, but if you want me to count by number of pages, please give me a VERY good argument for that instead of the numbers. Remember if it is too large the number will be in bold. (also I can do this if it is too small)
Personally I prefer the numbers, and as I am the one who is doing the programming, I will go with my preference unless a strong reason is given otherwise.
  • I will keep a database on my computer to keep track of trends, using the real numbers. That way, if a category is just not getting populated, or there is a specific question, I will be able to answer it. (The database will be public on a page somewhere, but that is for another day.)

All the above is my edits. People, please give input; I really need it to feel confident about what I am doing. Eagle (talk) (desk) 03:23, 7 March 2006 (UTC)
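
For illustration only, here is a minimal Python sketch of the binning described above. It is not the bot's actual code. It assumes each count is reported as "<N", where N is the smallest listed bin value strictly greater than the count (matching the "<200" style already used on the stub types page), and that "exceeds 800" means strictly more than 800. The names BIN_EDGES, BOLD_THRESHOLD and bin_label are hypothetical.

```python
from bisect import bisect_right

# Bin values exactly as listed in the proposal above.
BIN_EDGES = [5, 10, 25, 50, 100, 200, 300, 400, 500, 600, 700, 800,
             1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800, 3000]
BOLD_THRESHOLD = 800                      # assumption: bold when count > 800
OVERFLOW_LABEL = "CATEGORY IS WAY TO BIG"  # label text quoted from the proposal


def bin_label(count: int) -> str:
    """Return the wikitext label for a stub category holding `count` articles."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Index of the first bin value strictly greater than count.
    i = bisect_right(BIN_EDGES, count)
    if i == len(BIN_EDGES):
        return OVERFLOW_LABEL
    label = f"<{BIN_EDGES[i]}"
    # Triple apostrophes are wikitext bold markup.
    return f"'''{label}'''" if count > BOLD_THRESHOLD else label
```

Under these assumptions a category with 198 stubs is labelled "<200" and one with 801 stubs is labelled "'''<1000'''". Changing the bins then only means editing the list, not a chain of if-then statements.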


Couldn't you just ignore the stuff other than the category name and number of articles? For now, the number of articles is always (at least in theory) one of three forms:
  • <\d+
  • '''\d+\s*pages'''
  • ''new''
You could just match and replace those, and leave the rest as is...Mairi 04:06, 7 March 2006 (UTC)[reply]
Problem is, what happens when the theory is not reality? Result: a messed-up page. Eagle (talk) (desk) 18:35, 7 March 2006 (UTC)
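
As a rough Python sketch of the match-and-replace idea, the three forms listed above can be combined into one pattern anchored at the end of the line. To address the "theory is not reality" worry, the function reports whether it found a count and leaves the line untouched when it did not, so a caller could skip or log such lines instead of damaging the page. The names COUNT_RE and replace_count, and the example category names in the usage note, are hypothetical.

```python
import re

# The three count forms listed above, anchored to the end of the line.
COUNT_RE = re.compile(r"(?:<\d+|'''\d+\s*pages'''|''new'')(?=\s*$)")


def replace_count(line, new_count):
    """Swap the trailing count token on one list line.

    Returns (new_line, matched). If no known count form is found,
    the line is returned unchanged and matched is False.
    """
    # A function replacement avoids backslash processing in new_count.
    new_line, hits = COUNT_RE.subn(lambda m: new_count, line, count=1)
    return new_line, hits == 1
```

For example, replace_count("* [[:Category:Foo stubs]] {{foo-stub}} <200", "<300") gives the same line ending in "<300" together with True, while a line whose count is written some other way comes back unchanged with False.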

I'm a little confused by this - the only stuff on each line of the page is the category, the template, one of four codes if there are associated wikiprojects, redirects or child categories (always one letter followed by "*"), and the count. What is this "Comment on what the stub category is for" business? (Oh, and no problem with using article numbers rather than page numbers, as long as the larger ones are bolded.) Grutness...wha? 11:52, 7 March 2006 (UTC)

The category [[]], the template {{}}. OK, I'll have to add the template to the code (that will be more work). Eagle (talk) (desk) 18:36, 7 March 2006 (UTC)

Also, forget the comments, I was looking at the wrong page! Oops.

The Program works

Only one problem: my regex statements are not correct yet for one of the values, but other than that everything works. I will begin the automated counts on Saturday, unless someone objects (the bot will do only one edit). Eagle (talk) (desk) 21:18, 8 March 2006 (UTC)

I'm a bit concerned about the resource-efficiency of this; isn't this going to require many thousands of page-loads if you're doing this via the wiki, on the live database? Note that Conscious is also working on this (see his recent update), via a script from off-line stats; it'd be good to co-ordinate the effort on this, at least. I'm also somewhat reluctant to impose too great a burden on people updating the page to ensure it's "machine-readable", which isn't its primary purpose. Alai 04:40, 10 March 2006 (UTC)[reply]
It's already fixed up. The regex statements are quite broad. I had to do a little formatting, but it was very minor. The bot can read everything on the page.
Though I will say one thing: the bot was designed to relieve some of the burden on the project (sure, I will have to make sure that the bot can read new entries, but I will do that now). In addition, manual counting is now a thing of the past. Eagle (talk) (desk) 05:07, 10 March 2006 (UTC)
Noted about the database. (I am limiting the bot to loading a page every 25 seconds.) Plus I will now only run the bot on Saturday, at times of little server load (the bot will make only one edit, and that is at the end).
It's no harm to put the page in a consistent format, no problems there. But the trouble is, it has to be updated manually when types are added, renamed, moved around the hierarchy, etc; if we end up telling people "keep it in exactly this format, otherwise the bot won't like it!", they'll get annoyed, be less likely to update it at all, etc. Mind you, if you can automate that part too... Alai 05:16, 10 March 2006 (UTC)[reply]
Alai, please realize that I have spent over 8 hours now programming this thing. Let me get it working correctly first. Yes, I can make the page format an automated process, but one thing at a time, please. Eagle (talk) (desk) 05:24, 10 March 2006 (UTC)

Automatic growth analysis

Since the analysis of wikipedia growth page tends to get outdated easily, I have made a ruby/gnuplot script that automatically makes a graph of the growth and fits a few models to it. However, this needs a file of article creation dates to run, and I don't have a regular supply of those. Perhaps someone here who can easily create such dumps would be interested in making them and analysing them with the script on a regular basis? The whole process should be easily automated. The script and an example of what it produces can be found at the bottom of my user page. Amaurea 09:37, 27 March 2006 (UTC)

Foreign languages

I am finding it hard to get from here to a page which shows the sizes of wikipedia in all its languages, rather than just English. Can someone put in a link please? --MacRusgail 18:36, 29 March 2006 (UTC)[reply]

Okay, I don't know where the community pages are to discuss things like this so... on the special statistics page the first link ("Detailed tables and charts of Wikipedia statistics") is either broken or just not working now. If there is a better place I could have put this please reply on my talk page. TXAggie 03:22, 10 April 2006 (UTC)


Realtime statistics

The "Realtime statistics showing daily, monthly, and yearly global traffic across all Wikimedia projects" haven't worked for me the last several times I have tried to look at them. Are they working for other people? CalJW 04:41, 15 April 2006 (UTC)[reply]

If you are talking about the graphs at noc.wikimedia.org, I don't think these are available anymore, as far as I know. However, I recently came across a link to very similar statistics at Wikipedia:Village pump (technical)#Falling off the edge of a cliff in Alexa. See http://tools.wikimedia.de/~leon/stats/reqstats/ I will try to update the project page accordingly.--GregRM 02:57, 26 June 2006 (UTC)[reply]

Look at the following graph:

http://www.google.com/trends?q=%22Wikipedia%22&ctab=0&date=all&geo=all

Far too good information not to use... But where and how?

preceding added by 132.231.54.1

Total and English? LossIsNotMore 23:55, 5 August 2006 (UTC)[reply]

word count

I have long given up lobbying for a switch from "article" to "word" count as the main gauge of WP's growth. However, the main statistics page, if at all possible, should list the number of words (some 400M now?) along with the "article number". Comparing "numbers of articles" has become far too widespread and is often used irresponsibly. dab () 14:31, 27 June 2006 (UTC)

Broken link?

The link to Erik Zachte's Wikipedia Statistics Sitemap isn't working for me. Any comments? 62.31.55.223 19:58, 15 July 2006 (UTC)[reply]

I don't know if the problem has been going on the whole time, but it isn't working for me, either, almost two months after you pointed it out. I'm getting a 403 (Forbidden) on every page, although I seem to remember it working once in the past week or so. Chances are that was my imagination.
In the meantime, it's on the Google cache (retrieved on October 5, interestingly). I'm going to make a note about this in the article. — supreme_geek_overlord 02:48, 13 October 2006 (UTC)[reply]

Some statistics

I compiled some statistics: how many articles from non-English Wikipedias are translated into English, and how many notable topics from specialized databases are covered on Wiki so far. My conclusions: there are about 2 million articles in need of translation, and more than 400 million specialized topics in need of creation :) See User:Piotrus/Wikipedia interwiki and specialized knowledge test for details.-- Piotr Konieczny aka Prokonsul Piotrus  talk  18:38, 22 July 2006 (UTC)

Most watched articles

Would someone who has access to watchlist data please compile a list of the top-5000-or-so most watched articles? I know the least watched need to be kept secret to defend against vandalism, but the most watched will be profoundly interesting and will help answer some pressing questions. This has been asked on Wikipedia:Village pump (technical)#Most watched articles without results. LossIsNotMore 00:07, 6 August 2006 (UTC)[reply]

I would like to see this also. Does this information exist? — Jonathan Kovaciny (talk|contribs) 20:41, 30 August 2006 (UTC)[reply]

Job queue

Thread moved to Wikipedia:Village_pump_(technical)#Job_queue. --kingboyk 11:56, 2 September 2006 (UTC)[reply]

Wikipedia article rankings in search engines

Wikipedia articles are ever more frequently showing up in Google's top 10 search results. I'd like to see some stats on Wikipedia articles' search engine rankings when a search for the article's title is executed. For example, if you google for world war i, Wikipedia's World War I shows up second, while a google for music puts the Music article at #14.

I'd like to see a count or even a list of the Wikipedia articles whose corresponding Google search lists the article in the #1 position, #2, #3, etc. Something like this:

  1. 9.7% (1,330,093) of articles are the #1 search result on a Google query for the article's title.
  2. 13.6% (1,619,002)
  3. 7.3% (930,177)
  4. 2.1% (301,990)
  5. 1.3%
  6. etc
  7. etc
  8. etc
  9. etc
  10. etc
  • 43.8% of articles are not returned in the top 10 search results.

This shouldn't be too hard to do; just a database dump and a little Google API magic. Any thoughts? — Jonathan Kovaciny (talk|contribs) 19:59, 18 September 2006 (UTC)
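
To make the proposed breakdown concrete, here is a small Python sketch of the tallying step only. It assumes the ranks have already been collected somehow into a mapping from article title to search position (None when the article was not found); how to obtain them from a search engine is left open. The function name rank_distribution and the label format are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def rank_distribution(ranks: Dict[str, Optional[int]],
                      top_n: int = 10) -> Dict[str, float]:
    """Return the percentage of articles found at each search position.

    `ranks` maps an article title to the position of that article in a
    search for its own title, or None if it was not found at all.
    """
    if not ranks:
        return {}
    counts: Counter = Counter()
    for rank in ranks.values():
        if rank is not None and 1 <= rank <= top_n:
            counts[f"#{rank}"] += 1
        else:
            # Anything missing or below the cutoff goes in one bucket.
            counts[f"not in top {top_n}"] += 1
    total = len(ranks)
    return {label: 100.0 * n / total for label, n in counts.items()}
```

With just the two examples mentioned above, rank_distribution({"World War I": 2, "Music": 14}) puts half the articles at "#2" and half in "not in top 10".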

Access to Wikipedia Graphs and Charts is now Forbidden

This link: Charts and Graphs is no longer accessible; however, there is still a link to it on the special page Statistics.--tequendamia 03:23, 16 October 2006 (UTC)

Why are these stats forbidden (on all Wikimedia projects)? Anybody? --195.210.251.91 17:03, 17 October 2006 (UTC)[reply]

Once again. Why are statistics unavailable?! --213.250.11.131 09:46, 19 October 2006 (UTC)[reply]

According to meta:Wikimedia_site_feedback#Where_are_the_STATS.3F, the page accidentally contained confidential information and therefore, it was disabled. The problem is only temporary. Tra (Talk) 18:01, 24 October 2006 (UTC)[reply]
A change of policy, I guess. They became confidential.--tequendamia 21:25, 3 November 2006 (UTC)
No, what it is is that not all of the Wikimedia wikis are open to public viewing. The Internal wiki only allows unregistered users to see the main page, because it contains confidential information (this has always been like this). The statistics information accidentally contained confidential information about this wiki so it had to be disabled. I presume when the problem has been fixed, the page can be enabled again. Tra (Talk) 22:28, 3 November 2006 (UTC)[reply]
So is this ever coming back or what..? --Winterus 14:18, 12 November 2006 (UTC)[reply]

Author locations

I am wondering something that doesn't seem to be covered anywhere. Where are the 60-odd thousand article authors located? Are they mostly in English speaking countries? Where are Spanish or Portuguese language article authors located? Spain/Portugal, Latin America, the U.S., Africa? This ought to be easy enough to find out. Where are German language article authors located, etc., etc. Do "overseas Chinese" write disproportionately many articles in the Chinese language Wikipedia?

user count

Although millions of accounts have been created, some of them are sockpuppets or vandals, some made a few edits and left, some don't edit at all, and only a handful of those user accounts have made over 100 edits. The total doesn't necessarily tell me how many active users there are.--PrestonH 02:53, 3 November 2006 (UTC)

Forget it, I'll ask these questions at the reference desk.--PrestonH 05:12, 9 November 2006 (UTC)

page views

It would be nice if a wikipedia article/page in addition to saying "This page was last modified..." at the bottom of the page, stated the page views to know how popular a particular article is. Idleguy 08:43, 25 November 2006 (UTC)[reply]

This has been suggested many times. That particular feature has been disabled for performance reasons. There is, however, wikicharts which gives this information for the top 100 most-viewed pages. Tra (Talk) 13:53, 25 November 2006 (UTC)[reply]

page view frequency

I just noticed this and it disturbs me on privacy grounds. I thought that showing (or keeping track of) how many times individual pages had been viewed was done experimentally a few times several years ago and then it stopped. I see it's started up again. This does not seem like a good idea. Libraries (in the US at least) don't keep track of how many times individual books are looked at (they are forbidden by law from doing this). They can track books being borrowed but not if you just look at the book in the library, and reference books usually cannot be borrowed. See the stat faq of arxiv.org for some more about this, and about why Arxiv doesn't keep these statistics.

Generating these numbers requires processing the server logs which are private, so info like this shouldn't be disclosed without careful consideration and discussion. I personally don't think it's a good idea to release them (or even generate or examine them internally) on any regular basis. 67.117.130.181 22:42, 21 December 2006 (UTC)[reply]

Looking at the link you mentioned, the main privacy arguments are that:
  1. People may be embarrassed to find that an article they wrote is not read often
  2. People could manipulate the results using a bot
  3. People don't like the idea of 'Big Brother' watching them
To address these points:
  1. Only the top 1000 articles are available through Wikicharts, there are 6,862,837 articles in Wikipedia so the vast majority will not show up, so people shouldn't be too concerned if their article doesn't show up.
  2. Yes, they probably could quite easily, by sending multiple requests to the toolserver, and without even needing to visit the page itself. However, the results generated are reasonably accurate so I don't think this has been too much of a problem.
  3. The tool does not connect any page requests to an IP address or Wikipedia username and the results are totally anonymous.
Tra (Talk) 23:03, 21 December 2006 (UTC)[reply]

100,000,000 edit counts! Yeehah!

Have a Toast! Cheers! :)

Again another Wikipedian statistical phenomenon has arrived! However, vandalisms and inactive users aside. First off for my third compliment and perhaps the fourth for this Wikipedia itself, I truly praise, commend and greatly congratulate this English Wikipedia once again for surpassing yet another Wiki-record of the One Hundred Millionth (or in figures: 100,000,000) mark of the total Wikipedians' Edit Counts!!! Yet this whopping number of what both users and Wikipedians have made up of this big free encyclopedia ever since July 2002 and yet they never stop growing (as stated and based on/in the Wikipedian User Statistics)! WOW, what else can I say to express here, man!!? Thus, Congratulations and Kudos to the English Wikipedia! Keep the numbers going and keep on editing and contributing for more! Yaaahooooo!!! --onWheeZierPLot 00:40, 26 December 2006 (UTC)