Wikipedia talk:Top 25 Report/Archive 1


Removed pages

  • Traffic spam on random articles is a long-standing known issue, which is why this Top 25 list has human oversight. For example, see a 2009 discussion of it with regard to The Beatles: User_talk:Numbo3-bot#The_Beatles. If an article gets very high views in a short period of time, when it was previously not popular, and there is no human-based explanation for it (sudden burst in news coverage, Google Doodle subject, etc.), it is likely a computer-based anomaly which should be disregarded. It is possible some of these decisions will require careful editorial discretion, but so far the cases for removal are very obvious. Note, however, that the page for "Wikipedia" is not removed from the list (when it is sufficiently popular to appear), because those appear to be legitimate views even if it is mostly reached by people wishing to find other subjects once they reach the page.--Milowenthasspoken 14:51, 21 January 2013 (UTC)
It is conceivable that some seemingly random and not that easily explained spikes could be due to some form of Slashdot effect – mention of an obscure topic on xkcd, Big Bang Theory, etc. Have you been able to spot any such instances? --Florian Blaschke (talk) 21:37, 12 March 2013 (UTC)
Tracking it would be extremely difficult. Really the only thing we can do is mention an odd spike in the notes section (like 2012 Summer Olympics this week or Sherlock Holmes last week) and hope that someone posts on the talk page with an explanation. If one is found, then we can put it back in. Serendipodous 22:02, 12 March 2013 (UTC)

Nice page

Nice page. :-) --MZMcBride (talk) 06:53, 17 February 2013 (UTC)

Week of Feb 16-23

Some weird ones this week. What do you think?

  • Seether: Suddenly gets 800,000+ hits in one day, then dies. Probably just spam.
  • amazon.com: Traffic to that page has doubled since the beginning of February. Possibly related to disputes with states over paying sales tax.
  • Still Life at the Penguin Cafe: Out of nowhere, sudden spike between 20-22 Feb, then dies back. Unless the DYK was on that day, I can't think of what could have caused it.
  • World War II: Strange; it hasn't spiked in popularity recently, yet I don't remember seeing it in the top 25 before.

Thoughts? Serendipodous 12:24, 24 February 2013 (UTC)

  • OK, I have the new WP:TOP25 up now, feel free to add a summary paragraph if you'd like. On the above notes - I think Seether and Still Life at the Penguin Cafe are most likely non-human views. When something has a burst of human popularity, the views don't drop to almost nothing after a huge spike; there is going to be some noticeable tail. Like I had to see why Michael Jordan was in the Top 25, but the stats showed a steady rise in views before his 50th birthday, and then a fall again; that makes sense. A Google Doodle will usually cause a massive spike and drop, but that is always documented so you can confirm it happened. Amazon.com is a harder case, I kept it in the Top25 for this week -- Alexa lists Amazon at #9 worldwide, so it's not impossible for it to have more views recently for some reason. World War II is a steadily popular article, it appeared once prior this year (Wikipedia:5000/Top25Report/January 27-February 2, 2013) - but whether it makes the top 25 seems to depend on whether it's a heavy news week, when the threshold to reach the Top 25 is higher. One entry this week I found interesting was Argo (2012 film). It was the most viewed article on any movie nominated for the Best Picture Oscar, and it won. I wonder if we'd find any trends like that if we pulled stats for prior years' nominees.--Milowenthasspoken 05:44, 25 February 2013 (UTC)
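The "no noticeable tail" test described above could be roughed out in code. This is only a sketch under stated assumptions: the function name, the two ratio thresholds, and the sample numbers are all hypothetical and are not part of how the report is actually compiled.

```python
from typing import Sequence


def looks_like_tailless_spike(
    daily_views: Sequence[int],
    spike_ratio: float = 10.0,  # assumed threshold, not from the discussion
    tail_ratio: float = 0.05,   # assumed threshold, not from the discussion
) -> bool:
    """Return True when the busiest day towers over a typical day and the
    day right after it keeps almost none of that traffic."""
    if len(daily_views) < 3:
        return False
    peak_index = max(range(len(daily_views)), key=lambda i: daily_views[i])
    peak = daily_views[peak_index]
    # Typical traffic: the (upper) median of all days except the peak.
    others = sorted(v for i, v in enumerate(daily_views) if i != peak_index)
    baseline = others[len(others) // 2]
    if peak < spike_ratio * max(baseline, 1):
        return False  # no dramatic spike at all
    if peak_index == len(daily_views) - 1:
        return False  # peak is the last day, so the tail cannot be judged yet
    return daily_views[peak_index + 1] <= tail_ratio * peak
```

A True result is only a prompt for a human look. A Google Doodle produces the same spike-and-drop shape, so the documented explanations still have to be checked by hand.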

Week of Feb 24-Mar 2

Please please tell me someone isn't spamming Sherlock Holmes :'-( I'm guessing that Ernst Litfass is a recurring spam, since I haven't seen him in your previous reports. And do you think it's time to remove G-force? Serendipodous 11:23, 3 March 2013 (UTC)

  • Looks like Sherlock had one big view day, no Google Doodle to explain it, so that will be off. Litfass is recurring spam, it has bursts of huge views, even though the de.wikipedia version has very few views. And on G-Force, I'm perplexed as to why, but we've given it the benefit of the doubt for a while, and I am comfortable removing it from the list this time, which I'll work on this evening (Pacific time, US). If you have other thoughts or input, feel free to weigh in!--Milowenthasspoken 21:26, 3 March 2013 (UTC)
    • To me, G-force is just another cat anatomy. =) Biosthmors (talk) 19:56, 10 March 2013 (UTC)

Question about automated views

What causes the automated hits -- misconfigured bots? Clueless spammers? Trivialist (talk) 21:47, 4 March 2013 (UTC)

We'd really have to consult with the WMF analytics folks if we wanted to know for sure (who could tell us whether the hits are coming from a single, or well-defined range of IP addresses). I think mis-configured bots are part of the problem. DDOS attacks have been known to happen, but those articles tend to be controversial, as with the Jyllands-Posten_Muhammad_cartoons_controversy. See also the "Double Mention" thread at User_talk:West.andrew.g/Popular_redlinks -- I think a lot of people under-estimate the fact that many websites query Wikipedia on-demand so they can re-serve the content inside their own interfaces (another possible spot for a coding bug and infinite query loop). I really don't see how tactics like this play into the hand of spammers. Thanks, West.andrew.g (talk) 06:36, 5 March 2013 (UTC)

Week of March 9

I've been talking about this page over at my wiki meetup. There are a few thoughts I'd like to jot down before I forget them:

1. a possible source for sudden bursts of interest is the reddit mainpage; we should check it out

It most definitely is. However, the "main page" is far too dynamic a notion to be useful. Even the top n pages have far too much turnover to be useful by the time statistics are compiled on Sunday mornings. I also couldn't find a way (or API access) that enabled me to reverse-search by URL. West.andrew.g (talk)
  • I wonder how frequently a site like reddit could get something into the Top25 without any other site noticing it and reporting on it. Certainly such sites can get something well up in the WP:5000 (at least I definitely have seen times they have), but maybe not often into the top25.--Milowenthasspoken 01:55, 13 March 2013 (UTC)

2. We should move this one level up to just "Wikipedia/Top25Report"; it should increase view counts

I have no real objection, but I don't see how such a move would directly impact view counts. The Top 5000 needs to stay in my user-space due to issues with automated editing and BRFAs. West.andrew.g (talk)
Another advantage is that it makes the talk page easier to access. So for me it resolves a lot of frustration. Serendipodous 16:50, 11 March 2013 (UTC)
The name move looks good, the prior one I chose is too cumbersome, and we are finding more people interested in these reports, so ease of access is a no-brainer.--Milowenthasspoken 01:55, 13 March 2013 (UTC)

3. We should look for categories to list this page in the Wikipedia mainspace to get this page noticed

Agreed. Be bold. West.andrew.g (talk)
  • Anything to increase awareness is good. I think the signpost article's popularity showed an appetite for this content.--Milowenthasspoken 01:55, 13 March 2013 (UTC)

4. We should get this page involved with the Wikipedia Cup. Serendipodous 15:45, 10 March 2013 (UTC)

Agreed. Be bold. West.andrew.g (talk) 18:40, 10 March 2013 (UTC)

Total view counts over time?

Once there is a fairly standard methodology on which articles to include and exclude, I think it would be very cool to keep track of the weekly sum of article page hits over time. A graph of this would be cool. There is a sum at WP:5000 for all articles, but that number includes Main Page hits and obvious non-human views. Trends on the Top 25 would therefore be more meaningful. Biosthmors (talk) 20:17, 10 March 2013 (UTC)

  • Coming up with the sum of hits isn't much more work, since the viewcounts are listed in the WP:TOP25 chart already. I am not sure what this data would really show, but we may need the chart first to brainstorm.--Milowenthasspoken 02:04, 13 March 2013 (UTC)
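As a rough illustration of the weekly sum discussed above, the totals could be computed from the view counts already listed in each chart. The data layout, the function name, and the figures in the usage note are hypothetical; this is a sketch, not the report's actual tooling.

```python
from typing import Dict, Set


def weekly_totals(
    reports: Dict[str, Dict[str, int]],
    excluded: Set[str],
) -> Dict[str, int]:
    """Sum the listed view counts for each week, skipping excluded titles.

    `reports` maps a week label to {article title: views}; this layout is an
    assumption made for the example.
    """
    totals: Dict[str, int] = {}
    for week, entries in reports.items():
        totals[week] = sum(
            views for title, views in entries.items() if title not in excluded
        )
    return totals
```

For instance, `weekly_totals({"week 1": {"Main Page": 900, "Argo (2012 film)": 100}}, {"Main Page"})` would give `{"week 1": 100}`, which keeps Main Page hits out of the trend line.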

Can this be added to the signpost pls?

Nergaal (talk) 18:52, 11 March 2013 (UTC)

You should contact the Signpost directly with such a request. Thanks, West.andrew.g (talk) 20:17, 11 March 2013 (UTC)
  • Nergaal, feel free to advocate for this if you wish, I would be supportive of it. The Signpost had success with the Special Report we did on article popularity last month. Serendipodous has started adding a summary paragraph at the top of each chart which I think is quite nice, that might be a springboard for some more regular Signpost coverage, if they see fit.--Milowenthasspoken 02:10, 13 March 2013 (UTC)

Bot network story

  • [1] ("Massive bot network is draining $6 million a month from online ad industry, says report"). Don't know if this sheds any light on anomalous high view count articles, but it's interesting.--Milowenthasspoken 16:01, 19 March 2013 (UTC)

Limonana and March 17-23, 2013

The Limonana article got pushed on the exclusion list this week, but this is one I can explain as legitimate. It was posted on Reddit in the TIL="Today I Learned" section, where it gained sufficient upvotes to make it to the front page. I know this, because I was one of the readers that arrived via that channel! Thanks, West.andrew.g (talk) 01:52, 25 March 2013 (UTC)

  • Interesting. This is a good lesson for us. Serendipodous has updated the report. I had advised Serendipodous that it should probably be removed because it was a one day spike that I saw no reason for (it does not even come up in any recent news articles). Where I erred, firstly, is in failing to see that the article was edited 12 times on March 19. Though mostly minor edits, that is an indication that the views were human views. (cf. the huge spike Sherlock Holmes had on February 24[2] -- but no edits at all between Feb 20 and Feb 28). Also, almost all the edits were from IP editors, which means they are not Wikipedia regulars. But I am also having trouble finding evidence of recent popularity on reddit. I see a thread from 9 months ago[3] but must be missing its recent popularity. However, I do see a spike in tweets on the article starting on March 18, peaking on March 19, and rapidly dying. Thus, in future cases like this one, I think we must check tweet popularity and recent article edits (in addition to recent news searches, Google Doodles, etc.), before ruling out human views.--Milowenthasspoken 13:40, 25 March 2013 (UTC)
  • I found the recent reddit thread, I am concerned it wasn't an immediate search result for me. [4]--Milowenthasspoken 13:43, 25 March 2013 (UTC)
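The edit-activity check suggested above (count the edits around the spike, and note how many came from IP editors) might look something like the following. The `Edit` record, the function name, and the window size are assumptions made for illustration; nothing here queries Wikipedia itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass(frozen=True)
class Edit:
    """One article edit; a hypothetical record type for this sketch."""
    day: date
    by_ip: bool  # True when the edit came from an unregistered (IP) editor


def edit_activity_near_spike(
    edits: Iterable[Edit],
    spike_day: date,
    window_days: int = 1,  # assumed window, not from the discussion
) -> Dict[str, int]:
    """Count all edits, and IP edits, within `window_days` of the spike."""
    nearby = [e for e in edits if abs((e.day - spike_day).days) <= window_days]
    return {
        "edits": len(nearby),
        "ip_edits": sum(1 for e in nearby if e.by_ip),
    }
```

A spike with a dozen edits, mostly from IPs, points toward human readers; a spike with no edits at all for days on either side is one more reason for suspicion, though not proof on its own.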

Munich massacre

I have toned down the speculation about why the Munich massacre article has entered the list at number 4. We can't possibly know that "thousands of users, apparently unaided, decided to reflect on that tragedy's similarity to the recent Boston Marathon bombings". Apart from both incidents taking place at sporting events, there are actually very few similarities between them. Gandalf61 (talk) 16:00, 30 April 2013 (UTC)

  • It's actually unclear to me why this article was popular, because most views came on April 25. However, since the Top25 was started, I've come to realize that I was oblivious to the power of reddit to make an article popular without showing much spread outside reddit.--Milowenthasspoken 19:06, 1 May 2013 (UTC)
There may be a less poetic reason for "munich massacre" getting so many hits on that date. On April 25, Bayern Munich beat Barcelona FC 4-0 in the first leg of the Champions League semi-final. The papers somewhat tastelessly referred to this as the "Munich massacre".
There are times when I weep for humanity. Serendipodous 15:35, 2 May 2013 (UTC)
Yes. Sadly, the football explanation does seem more probable. Gandalf61 (talk) 05:49, 3 May 2013 (UTC)

Cat anatomy

According to Reddit [5], high school zoology classes in many parts of the United States dissect cats around this time of year and are tested on it. So that would explain the continuing high view counts. --PiMaster3 talk 23:54, 19 May 2013 (UTC)

Possibly, but this has always been a high-rated article, at least since January, and it peaked in views last month. If that's when cat dissections happened, then maybe that's what's going on. Serendipodous 06:25, 20 May 2013 (UTC)

Top 25 check (May 19)

LOTS of weird spikes this week:

  • Haggis: I appreciate the aftertaste of lung as much as anyone, but why is this so high? Is it another "Google Nose"-style prank?
  • Smart glass: could people be mistaking it for Google Glass?
  • Fulla Nayak: The world's oldest woman, who died in 2006. No reason why she would be big this week.

Definite removals as they link to redirects:

Maybe it's just me:

  • Attack on Titan: It fits the bill; popular TV series ending this month, but I'm a bit perplexed as to how it ended up on the English Wikipedia, since as far as I can tell it hasn't been shown outside Japan.

Any thoughts? Thanks in advance. Serendipodous 08:11, 19 May 2013 (UTC)

The Fulla Nayak article was a post on the popular "Today I Learned (TIL)" section of Reddit last week. However that post seems to lack the "upvotes" necessary for great prominence and Reddit driven traffic. However, in personal experience, most authors of TIL posts just poach their content from other Internet or real-life news stories, so perhaps the catalyst for this post and most traffic was something elsewhere on the Internet. Otherwise, I have no ideas regarding the spikes. Thanks, West.andrew.g (talk) 05:14, 21 May 2013 (UTC)

Top 25 check (May 26)

  • Facebook: Always a popular article of course, but there seems to be no single reason why it finished top of the list this week
  • Naked Came The Stranger: There was a TIL about this; is there a way to tell if it appeared on the main page?
17,000 up-votes (and 30,000 votes total) is sufficient for much prominence. West.andrew.g (talk) 12:08, 26 May 2013 (UTC)
  • Cult: Can't explain this at all.

Oh, and someone is definitely spamming haggis. Maybe Haggis spam could be the next big thing. I'll call Hormel.

Serendipodous 08:02, 26 May 2013 (UTC)

really need a second opinion this week

I'm pretty sure that Juliane Köpcke is due to a TIL with over a thousand replies [6], since it's a redirect that links there. However, there are a lot I'm not sure of:

  • Jim Lovell: Massive spike on June 2 and 3. No recent TILs to explain it, and the latest he's appeared in the news was on 15 May.
  • Rosetta Stone: Anomalous 1-day spike on June 7; perhaps not coincidentally the day after Rosetta Stone Inc. announced the pricing of its stocks. But since a) the link was to the actual Rosetta Stone, not the company, b) similar pages like Rosetta Stone (software) and Rosetta Stone (company) don't get nearly those numbers and c) it was a solitary spike with no tapering off, I can't be sure whether the spike was due to a spambot.
  • Second Congo War: Always in the news, as it never really ended, despite peace being declared a decade ago. The only TIL was 11 days ago, with only 3 replies.

Serendipodous 12:12, 9 June 2013 (UTC)

I agree completely on the Juliane Köpcke point from Reddit, and also that this week's exclusions are a bit trickier than usual. Regarding Rosetta Stone, it would be incredibly interesting if algorithmic trading platforms were mining companies' Wikipedia articles for sentiment analysis or something similar (although this particular statistic would not necessarily indicate that).
Moreover, none of the three outstanding articles are so dull that I believe they *can't* be the result of human views. All three of the topics probably have some "TIL" type tidbits in them, although I couldn't find any recent sources. The "second screen effect" is really hard to source from afar, but we know it can drive a huge number of views. For example, I visited the related Congo article this week given that Anthony Bourdain's heavily-advertised CNN television show will feature it (but this is not "the war"). A single prime-time documentary on the History Channel could rocket any topic near the top. How to programmatically search for this stuff is an open question.
None of this really answers the inclusion question; and I am not sure there is ever a right answer (short of including them on the Top-25 and seeing if we can crowd-source the answer). West.andrew.g (talk) 13:38, 9 June 2013 (UTC)

OK, open forum this week

Wikipedia is being attacked. I don't really know what else to call it. A Polish user is spamming Wikipedia with weird redlinks; this started last week and has exploded now. Not only are there redlinks, there are hordes of unexplained spikes for seemingly random articles. If this continues it may no longer be possible to construct the Top 25, at least not objectively. I request 2nd opinions on the following articles:

This is painfully tough to sort through. I think a term like "attack" may be a bit over the top. There is no evidence that this is a (D)DOS attack, given that the traffic is comparatively quite small (compared to a known DDOS attack on Muhammed_cartoon_controversy described in our Signpost article) and the subject matter seems to lack controversy. The point of a DDOS attack is not to drive up numbers -- but to prevent access to normal users.
What we may be facing is intentional "gamesmanship" of the popularity list. It would be easy to write a bot to continually fetch these articles and drive up their numbers, but our reports are a tiny forum for such promotional efforts. Also, there is a lack of commercial focus. It seems unlikely that Latent inhibition and Conjugate gaze palsy have too many commercial advocates out there.
I think we are certainly missing many sources of legitimate traffic, which may not be easily searchable in hindsight. Reddit is a popular website, but there are many others that could drive similar traffic numbers. For example, Drudge Report, SlashDot, and Fark.com are all similar in format to Reddit and probably in the same popularity sphere. There are probably tens of other examples. This doesn't explain away pop-culture items or "Yahoo!", but maybe some of the odder stuff. It's probably the case that CNN and other news sites link to Wikipedia prominently. However, these views fall in line with current events, so we think little about these external sources. One could spend hours trying to trace down the origin of any spike, but that is probably not a sane use of our time.
I think redlinks are an entirely different class. The fact we see so many Polish redlinks this week doesn't scream "spam" to me, but a mis-configured content scraper working on behalf of a Polish language website/content-farm. It can't be "spam" if there is no entity to promote! This is virtually impossible to validate, but I do not think that red-links should ever be a part of the top 25.
I have no great guidance on how to handle all the above cases. I feel like spikes *should* be explainable, and I am therefore inclined to include them in the list (although the description would be lacking). However, when spikes have a lack of editing activity in the same time period (especially by IPs), that comes across a bit odd to me. We could do limited talk page posts asking "how did you get here?" -- but many times the spike might already have subsided. It is the Yahoo! case that bothers me more than any other.
Moving forward, I think we should get in contact with the WMF analytics folks (they have a mailing list). It would be nice to know if they save referrer headers. With a very small sampling and complete anonymization we could make considerable traction on this problem. User agent strings might also be helpful in distinguishing humans from automated views. If you are willing to wrestle with the bureaucracy, I am willing to crunch any data they would give us into a digest-able/intuitive format. I also have admin privileges, if that does anything to squelch any privacy fears they may have. Thanks, West.andrew.g (talk) 13:53, 8 July 2013 (UTC)
It looks like I have a lot of learning to do. My monthly Wikipedia meetup is next week; I could use it to network with the right Wikimedia people. Serendipodous 18:52, 8 July 2013 (UTC)

List format

Great work; but please take care not to leave blank lines between items in bulleted lists. I've fixed this week's and the last few reports. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:39, 11 July 2013 (UTC)

Rename

Any reason this page could not be moved from "Top25Report" to "Top 25 report"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:02, 11 July 2013 (UTC)

Good point. It was originally a user subpage. Serendipodous 20:13, 11 July 2013 (UTC)

Attack on Titan

For several weeks the Top 25 report has said that Attack on Titan is ending or has ended in June. This is not the case. The anime is slated for two seasons, probably ending in September/October. Also, it is being streamed online with English subtitles through official channels. _dk (talk) 04:20, 23 July 2013 (UTC)

Thanks. Sometimes I really only have Wikipedia to go on and obviously that's not always enough. Serendipodous 05:51, 23 July 2013 (UTC)

Second opinion this week

  • Scipio Africanus: Why the defeater of Hannibal would suddenly get 555,868 views in two days I cannot fathom. Perhaps related to the Spartacus TV series?
  • Death of Marvin Gaye: No reason that I can determine for this article to shoot up in popularity, though it seems like it may be due to a forum thread somewhere.
  • Lycos: Why this vegetative web portal would suddenly come to life this week I have no idea, however its view history does appear to indicate it is natural. Serendipodous 08:05, 4 August 2013 (UTC)
I didn't personally view any of these articles this week. The Death of Marvin Gaye in particular looks like it is ripe for a "TIL" type posting (being killed by his father). Most troubling though is Lycos with its natural growth. I know you like to get a second perspective on these, but absent a personal experience with one of the articles in question, I have no greater perspective than you! A firm inclusion/exclusion policy (though undoubtedly arbitrary) might simplify the task. I'm off to Wikimania, and I'll see if I can bug any of the analytics folks there. Thanks, West.andrew.g (talk) 16:02, 6 August 2013 (UTC)

Cat anatomy

Hey, is Cat anatomy still piling up the views? Or was this a weird blip in the spring? I just wondered if it was still popular. Thanks! Liz Read! Talk! 16:25, 25 August 2013 (UTC)

No, it's still there. Serendipodous 16:30, 25 August 2013 (UTC)

Yahoo!

What were the "shady means" that propelled Yahoo! into the top 25? Trivialist (talk) 02:17, 26 August 2013 (UTC)

It's speculative, but when Yahoo! bought Tumblr back in May, views of its article suddenly rose 44-fold, stayed that way for 23 days, and then shot right back down again. This looked artificial to me, and pretty much everyone else who looked at it. Yahoo! is a popular website, but it has always been a popular website, even before it came into the top 25. So yeah, the current view surge may be genuine, but I don't trust it anymore. Serendipodous 08:23, 26 August 2013 (UTC)
@Trivialist and Serendipodous: with all due respect where is Wikipedia:Neutral point of view in: Lycos and Yahoo!: geriatric web portals seem to be back en vogue, for no apparent reason... XOttawahitech (talk) 18:42, 3 September 2013 (UTC)
This isn't an article; it's not in the mainspace and it's not intended for the public at large, so the rules don't apply. There is absolutely no neutrality in how it is put together; there couldn't possibly be. I have to make judgement calls; I decide whether a sudden rush of views is legitimate or due to non-human activity, and the evidence I have to go on is minimal. Since no one ever bothers to double check what I do, my POV is going to dominate. Serendipodous 19:03, 3 September 2013 (UTC)
Giggle giggle. I think we also could consider the documented interest in Wikipedia hits and stock market directions: doi:10.1038/srep01801. For Yahoo! at least, these "shady" techniques could be attempts to influence trading behavior through Wall Street trading algorithms. Who knows. Biosthmors (talk) 19:11, 3 September 2013 (UTC)
  • I like your style of journalism, Serendipodous. Thanks for reporting. Blue Rasberry (talk) 18:38, 5 September 2013 (UTC)
  • Just seems to me that Yahoo! has been on a general roll lately, between the Tumblr acquisition and reports that it actually had more traffic than Google for a few days recently. A comeback, I guess. Don't know if I'd attribute suspect motives to its page view count; I'd probably just chalk it up to being a perennially popular article, like Google and Facebook. oknazevad (talk) 02:18, 7 September 2013 (UTC)
Just to pile on, I think the snark User:Serendipodous imparts is a crucial element in the enjoyment of the top 25. If you want objective, you can stick to the statistical boredom of the WP:5000. It's obvious other people appreciate it as well, as the page is quite popular (although stats.grok.se seems to be down at the moment). In response to "since no one ever bothers to double check what I do, my POV is going to dominate.", I'll agree that I certainly don't try to influence what goes here, but I am always eager to see the new week's report and give it a once over for typos and other glaring errors. West.andrew.g (talk) 14:28, 10 September 2013 (UTC)

Cat anatomy

Wow! It's gone....Where did it go?-Seonookim (What I've done so far) (I'm busy here) (Tell me your requests) 06:41, 13 September 2013 (UTC)

no. 1,836. :-0) Serendipodous 13:52, 13 September 2013 (UTC)

Top 25

The Top 25 list is always so much more interesting than the Top 10. I am surprised that Arrow just missed the list...that TV show has hardly gotten any media attention and it's a surprise that it wasn't canceled after the first season. I'm a fan but there aren't many of us. Liz Read! Talk! 22:26, 18 October 2013 (UTC)

P.S. Looking at User:West.andrew.g/Popular_pages, why did you leave off Conan the Barbarian? That was a curious one, for sure. Liz Read! Talk! 22:28, 18 October 2013 (UTC)
Because it got 800 thousand views in one day, and nothing the next. That just doesn't happen naturally. BTW, I list all my exclusions at the bottom of the page, including my rationales for them. Serendipodous 23:03, 18 October 2013 (UTC)
Thanks, Serendipodous. Having put together similar lists for Twitter trends, I know you have your reasons, sometimes I'm just curious to know why random articles chart so high. Maybe Conan was mentioned on a TV show or something, causing a temporary rush to WP. Liz Read! Talk! 12:18, 19 October 2013 (UTC)
The reason I post my exclusions at the bottom of the page is because I want other people to suggest alternate reasons. As far as I can tell, there is no suggestive internet activity, either on the web or the news feed, that could suggest a reason for the spike. If someone can figure it out, that would be great, but I can't. Serendipodous 12:31, 19 October 2013 (UTC)
A reboot of Conan the Barbarian (starring Schwarzenegger!) was just announced so maybe fans got word early and wanted to look at the page. Hard to imagine him getting into fighting shape...in the original, Conan was mostly shirtless. He was 34 years old in the original and will be 67 in the remake. Liz Read! Talk! 11:11, 25 October 2013 (UTC)
Except that was announced today; the spike was two weeks ago. I suppose MAYBE there was some kind of early leak but surely that would have been all over the web? Serendipodous 11:16, 25 October 2013 (UTC)

Rule of three

This week requires me to invoke my rule of three: if I have three or more exclusions I can't explain, I ask for a second opinion before posting.

Lena Horne (1,247,428 views) When a dead celebrity comes top of the list, it's usually because of a Google Doodle or some anniversary. But I can find neither. Nor can I find a recent Reddit thread
June and Jennifer Gibbons (295,508 views) These kinds of obscure but interesting/tragic stories usually rise in popularity because of Reddit. But there are no recent Reddit threads big enough to explain this
XXX (film): This has been bubbling under for some time, and thus is probably likely to be due to automated views (old movies don't get that kind of sustained attention). Of course I could be wrong. Serendipodous 09:20, 27 October 2013 (UTC)
Here's the Reddit thread for the twins. _dk (talk) 10:30, 27 October 2013 (UTC)
Wow! Thanks! For future reference, how did you find it? Serendipodous 10:44, 27 October 2013 (UTC)
By being on reddit all day ._. But seriously, I googled the keywords with domain:reddit.com, though I couldn't have known what to search for without seeing the original thread first. _dk (talk) 16:26, 27 October 2013 (UTC)
I searched Twitter for another Lena Horne death announcement--she died in 2010 but it was reannounced in May 2013 and received a lot of circulation--but couldn't find anything since then. Sometimes, a celebrity will drop a reference to some person, place or event and that will cause a rush of fans to look it up if they are unfamiliar with it. But I couldn't find that either. Liz Read! Talk! 17:10, 1 November 2013 (UTC)

Top 25

Da-yum! 9+ million views? That's incredible! It's memorable to get over a million or two. It makes me think I should start some record-keeping. Google Doodles really rule! Liz Read! Talk! 02:21, 12 November 2013 (UTC)

I don't know what to think about this; part of me is thinking the numbers are too high to be real. Serendipodous 04:59, 12 November 2013 (UTC)
I know that Google Doodles are, to some extent, localized. While the U.S. gets a certain Doodle for, e.g., Veterans Day, this is not what the folks in India are going to see even in an English language edition. I wonder if the Shakuntala Devi Doodle just might have unusually crossed WP's largest viewing populations in this fashion? Didn't the Doodle also appear during Diwali? Could that somehow influence Indian browsing patterns? West.andrew.g (talk) 20:16, 12 November 2013 (UTC)
But I was just looking at Most Viewed Pages in 2012 and the cut-off for the Top 10 was 18M, and that was over all of 2012, the entire year. The fact that one page got 9M views in one week? Either something is wrong with the stats or the totals for Most Viewed Pages in 2013 are going to be outrageously high. Liz Read! Talk! 02:37, 14 November 2013 (UTC)
Wikipedia's year-over-year traffic growth has been trending upward for some time. However, we need some help from analytics folks to tell us whether this Shakuntala Devi business is wholly legitimate. It would be interesting to know whether: (1) the stats are aggregated in near real-time, or (2) some process is run hourly over the raw logs to produce the summaries. If the latter, it wouldn't be an overwhelming request to try to count the number of unique IP accesses for a given page. Our only hope in these situations is additional data. West.andrew.g (talk) 15:05, 14 November 2013 (UTC)
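To make the unique-IP idea above concrete, here is a minimal sketch of what such a count could look like if raw logs were available. The log format (one `ip page_title` pair per line) and the function name are purely hypothetical; the real WMF log layout is not described in this discussion.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def hits_and_unique_ips(log_lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {page: (total hits, distinct IPs)} from 'ip page' log lines.

    The two-field line format is an assumption made for this example.
    """
    hits: Dict[str, int] = defaultdict(int)
    ips: Dict[str, Set[str]] = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines rather than guess at their meaning
        ip, page = parts
        hits[page] += 1
        ips[page].add(ip)
    return {page: (hits[page], len(ips[page])) for page in hits}
```

A page whose hits vastly outnumber its distinct IPs would be a strong candidate for automated traffic, while millions of hits spread over millions of addresses would suggest the views are real.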

2nd opinion

Not my rule of three, but I'm having problems deciding whether I should keep or drop these two articles:

Both are new to the top 25 and both are related; that would suggest a similar origin. While their watch patterns aren't that similar over long periods [7][8], they both appear to have reached similar numbers in the past week. While this list has some bot infestation, the lack of capitals suggests that if this is a bot, it isn't the same one. Thanksgiving may be a reason for their popularity, however this seems a rather tangential take on the celebration. Serendipodous 09:42, 17 November 2013 (UTC)

  • Both look suspicious to me. Meat has had a huge influx since October, something is steadily going at that one. Vegetarian Cuisine is even more suspect, a huge rise in the past week or so. I think you could remove them.--Milowenthasspoken 18:06, 17 November 2013 (UTC)

Are the global warming appearances spam?

The climategate article is doing too well; it should have faded with the typhoon coverage, which has disappeared. Instead global warming continues to choke up the list. I'm beginning to wonder if the numbers are real. Serendipodous 20:20, 24 November 2013 (UTC)

  • I think it's spam, and probably has been for this recent spike.--Milowenthasspoken 00:49, 25 November 2013 (UTC)

Rule of three, again

  • Juniper berries: links to redirects indicate a single point of origin, which usually means a Reddit thread or a bot. And there are no recent Reddit threads that I can see that have enough notice to drive that kind of traffic.
  • Java: No recent tsunamis, earthquakes or volcanoes; no recent upheavals, religious conflicts or treaties. All I can think is that it might be due to a bot searching for the programming language.
  • Plastic: Massive one-day spike on the 22nd. No reason that I can see. Serendipodous 11:07, 24 November 2013 (UTC)
Oops sorry for the delay. I don't quite know what the rule of three is here, but it seems like you have a valid methodology. Biosthmors
Despite some pretty thorough investigation on my part, I could find no legitimate explanation for these items. West.andrew.g (talk) 09:14, 27 November 2013 (UTC)

What I do know, however, is that entries like Entertainment_Culture and Hospitality_Recreation are either (a) bot-driven, or (b) driven by a single source. These were redlinks in the last report, and have since been made redirects. West.andrew.g (talk) 09:20, 27 November 2013 (UTC)

Add a section for "spurious" popularity

Popular articles are interesting and worth noting even when they are driven by a single source; they can just have their own section at the end. – SJ + 18:07, 2 December 2013 (UTC)

Articles with one source are listed if I can find it (as with the one generated by a Reddit thread); the only times I exclude them are when I can't find the source, which means I can't tell if it's due to automated views. All excluded articles are listed at the bottom. Serendipodous 19:15, 2 December 2013 (UTC)

Bitcoin

As much as I love all the snide that goes into the top 25 list, I don't think it is fair to characterize bitcoin as "the oddball digital currency that is mostly beloved of child porn addicts, illegal drug consumers and radical libertarians". Negative stereotype much (well, except the last part)? Besides, now there're more speculators than anything else. _dk (talk) 08:12, 12 December 2013 (UTC)

Well to be fair I was only paraphrasing "The Economist", but I suppose you're right and the focus has shifted. Serendipodous 09:48, 12 December 2013 (UTC)

Tops in 2013

Is there an end-of-the-year list published? I searched for a 2012 list but couldn't find one. I'm sure it takes a lot more data processing than a weekly list and I'm not sure whether it's done on en.wiki or over at http://stats.wikimedia.org/. Liz Read! Talk! 14:00, 13 December 2013 (UTC)

We weren't producing these lists for all of 2012, I don't think. Also this will be straightforward per your other post: User_talk:West.andrew.g#Popular_pages. West.andrew.g (talk) 19:16, 13 December 2013 (UTC)

XXX and variants

I wonder if there's a more prosaic reason for XXX (film)'s appearance. Given that XXX, XXXX and .xxx also appear reasonably highly in the list, along with the names of various porn sites, I wonder if people are cluelessly Googling "xxx film" and blindly clicking on the first link that comes up... Smurrayinchester 21:44, 17 December 2013 (UTC)

That's frighteningly likely. As likely as "G" being for Google. So yeah. I may add it in. Serendipodous 22:27, 17 December 2013 (UTC)

Oddball overload this week (Need help!)

I have a rule of three: if there are three or more articles within the top 25 whose presence I can't explain, I post them on the talk page for a second opinion before posting the list. Well look at what Santa brought me for Christmas this week:

Wowza. No explanations jumped to mind on any of these for me. While Reddit tends to be a popular entry point, Sa'idi Arabic in particular got me wondering if there might be Middle Eastern, Asian, Indian, etc. equivalents that primarily operate in non-English but might link to English Wikipedia articles. If these traffic sources are legitimate we'd expect to see a bit of a tail on the distributions, right? Doesn't this happen in the Reddit cases? In what context could a dramatic one-day spike w/o tail ever look legitimate? West.andrew.g (talk) 23:10, 29 December 2013 (UTC)
Those are weird and I have no idea what could cause them. Earlier explanations have been some kind of spam or the action of bots. Are there any thoughts about what purpose there might be for spam or bots to go to a Wikipedia article? I heard a supposition that some botnet out there was programmed to go to Cat anatomy, then the botnet's manager could look at the traffic on Cat anatomy and from that, determine the number of bots in the net. That seems possible, but unlikely. I'm interested in any other ideas. SchreiberBike talk 02:48, 30 December 2013 (UTC)
Not a direct answer to the question, but I think content scrapers are an often overlooked part of the traffic composition. Imagine you run a music service like Pandora and want to serve band histories/summaries to your users. Why write these yourselves when you can just query the Wikipedia API? Moreover, you can make this happen on the client side to save your bandwidth, and make them do it on every single request to make sure the user is served a "current" version. Pandora doesn't do this, but plenty of popular sites do. In particular I am aware of a popular rap/urban music website that does. It's interesting to consider how these demographics might differ from those who browse WP natively. These views are human, but still to a certain extent mechanized via the API. West.andrew.g (talk) 13:51, 30 December 2013 (UTC)

The Attack (Animorphs) could be legit; it does have a slight tail-off suggestive of a Reddit thread or similar. HTTP follows a typical pattern for a one-day spike, but it's to a redirect, which most people wouldn't visit directly. So, I have to put the list up today; what do you think I should do? Serendipodous 15:07, 30 December 2013 (UTC)

For the redirect HTTP, there's no accompanying spike in the article it redirects to, Hypertext Transfer Protocol, so it doesn't seem like a human pattern. SchreiberBike talk 19:43, 30 December 2013 (UTC)
  • For what it's worth, and too late to be of much help, I agree with your removal of The Attack (Animorphs). The second-day views could just be a one-hour (or one-minute) spillover of the bot; without looking at hourly data (which we don't look at), removal was the right call.--Milowenthasspoken 00:06, 3 January 2014 (UTC)
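The "spike without a tail" question raised in this thread can be sketched as a simple check on daily view counts. This is a minimal sketch, not a method anyone here actually used: the function name `has_tail`, the `baseline` argument, and the 10% threshold are all assumptions, and the sample numbers are invented.

```python
def has_tail(daily_views, baseline, tail_fraction=0.1):
    """Return True if the day after the peak keeps a share of the peak's excess.

    baseline is the article's normal daily view count; tail_fraction is an
    arbitrary threshold chosen for illustration.
    """
    if not daily_views:
        return False
    peak_index = max(range(len(daily_views)), key=daily_views.__getitem__)
    if peak_index + 1 >= len(daily_views):
        return False  # no data for the day after the peak
    peak_excess = daily_views[peak_index] - baseline
    if peak_excess <= 0:
        return False  # nothing rises above the baseline
    next_excess = daily_views[peak_index + 1] - baseline
    return next_excess >= tail_fraction * peak_excess


# Invented series: an abrupt one-day spike, and a spike that decays over days.
abrupt = [1000, 1000, 800000, 1100, 1000]
decaying = [1000, 200000, 60000, 15000]
abrupt_has_tail = has_tail(abrupt, baseline=1000)
decaying_has_tail = has_tail(decaying, baseline=1000)
```

As noted above, a second day of elevated views can also be spillover from a burst that straddles midnight, so daily data alone cannot settle the question; a tail is only weak evidence of human traffic.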

Hi, three comments:

  • For hourly data and much more, have a look at wikiviewstats: Stephen Amell, today
  • Check suspicious stats against google trends, see oil of cloves. (Aussies?), Stephen Amell
  • Gaming of page views might soon become a problem, especially with high profile reporting ("Top viewed Wikipedia articles of 2013" "Why G?" newsreports ^^) --Atlasowa (talk) 16:47, 9 January 2014 (UTC)

Wow. That could prove really useful if grok.se doesn't come back online. Thanks! Serendipodous 23:41, 9 January 2014 (UTC)

Rule of three in spades

I recall Carl Tanzler being mentioned in an AskReddit thread about what is the creepiest Wikipedia article (or something like that), so that's probably where the spike came from. EDIT: It's this thread. _dk (talk) 19:15, 13 January 2014 (UTC)
  • My thoughts -- I used the hourly tool cited in the above section, which is fantastic for seeing granular data on these questionable candidates. I know we haven't used this hourly data tool before (is it new?), but it's very useful and I would suggest doing so for 2014 and onwards.
  • E. T. A. Hoffmann: Drop. Views jumped from 17 in one hour to 28,000+ the next, at 9:00 UTC on Jan 8. On Jan 10 at 9:00 UTC they dropped back to normal again. A 48 hour spike, exactly, with no edits to the article done during the spike. I would remove.
  • The Godfather. Drop. Using same tool, crazy 8 hour spike. Would drop.
  • Pornography. Drop. Unexplained jump starting on Jan 8, continues to now. Would drop.
  • Alliance (Firefly): Keep. The hourly view pattern of this one varies a lot from the above ones: views start increasing at 4:00 UTC and climb over the next few hours, then drop a bit, then vary a bit, then tail off into the next day. Also, the article was edited multiple times on Jan 8, mostly by IP editors, which is usually the sign of Reddit or similarly sourced popularity. I vote to keep.
  • Netbook. Drop. All views in a big four hour spike without incline or tail.
  • Carl Tanzler: Keep. Seems more like Alliance (Firefly) in hourly view patterns, along with multiple edits on day of spike. Also, its German counterpart de:Carl von Cosel had a views spike on the same day (we don't normally see bot view spikes across languages, especially when the article has a different title). I would keep.
  • Memory (song). Drop. Has a big two hour spike. Then drops. Then comes back, then drops completely again. No edits to the article during the spike.
--Milowenthasspoken 20:31, 13 January 2014 (UTC)
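The hourly reasoning in the list above (an instant jump with no incline versus a climb over several hours) can be sketched as a check on the largest hour-over-hour increase. This is only an illustration under stated assumptions: the function names and the 100x threshold are hypothetical, and apart from the 17 and 28,000 figures quoted for E. T. A. Hoffmann, the sample numbers are invented.

```python
def largest_hourly_jump(hourly_views):
    """Return (hour_index, ratio) for the largest hour-over-hour increase."""
    best_index, best_ratio = None, 0.0
    for i in range(1, len(hourly_views)):
        previous = max(hourly_views[i - 1], 1)  # avoid dividing by zero
        ratio = hourly_views[i] / previous
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    return best_index, best_ratio


def looks_abrupt(hourly_views, ratio_threshold=100.0):
    """Flag a series whose views multiply by ratio_threshold within one hour.

    The threshold is an arbitrary value chosen for this sketch.
    """
    _, ratio = largest_hourly_jump(hourly_views)
    return ratio >= ratio_threshold


# 17 -> 28,000 is the jump described above; the other values are made up.
abrupt = [20, 17, 28000, 28500, 27900, 18]
# Invented series that climbs over a few hours and then tails off.
gradual = [50, 120, 400, 900, 1500, 1200, 800, 300]

abrupt_jump_hour, abrupt_ratio = largest_hourly_jump(abrupt)
abrupt_flag = looks_abrupt(abrupt)
gradual_flag = looks_abrupt(gradual)
```

A check like this only captures the shape of the curve. The other signals used above, such as edits during the spike or a matching spike on another language edition, would still need to be weighed by hand.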

Deaths in 2013 in the December 29, 2013 to January 4, 2014 report

I see you left out Deaths in 2013 in that report (with 309,906 views, it would have been 14th place). Between the fact that the week was about half in 2013 (precisely how much would depend on one's time zone), and the fact that people's interest isn't as much of "the current year's deaths" as in "recent deaths", I think that this would make perfect sense - and this article should have been on the list. עוד מישהו Od Mishehu 18:26, 22 January 2014 (UTC)

According to the initial report, it only got 21,000 views. Serendipodous 20:27, 22 January 2014 (UTC)
I'd recheck that @Serendipodous:. On your link it comes in 22nd place (raw) with 309,906 views. West.andrew.g (talk) 21:03, 22 January 2014 (UTC)
Huh. Well, OK. Must have missed that in my initial workthrough. It happens. Serendipodous 21:21, 22 January 2014 (UTC)

Rule of three, again

Sigh. There are times when I really hate this job. The hour-by-hour viewing tool that I made such a fuss about last week is now down, which means I'm back to erring on the side of exclusion. Also, someone has decided that anyone with a name remotely similar to Rick Jones is worth 300,000+ views. And aside from that, there are a lot of questions this week, many of which should be traced to Reddit. Are we certain there aren't any other forums that we could search?

Serendipodous 13:41, 26 January 2014 (UTC)

Is it possible to see HTTP referers of the traffic?
It would make clear where the traffic comes from, which pages have bot-generated traffic and which pages have been mentioned on high-traffic social or news sites.
There could be some Indian or Chinese newspaper in place where you suspect Reddit. 109.132.0.146 (talk) 14:34, 26 January 2014 (UTC)
Very possibly, but Wikipedia analytics are terrified of taking too much information from users, so we're never going to get that information. BTW, to those watching this page, I must publish this by tomorrow, so any advice now would be helpful. Serendipodous 20:38, 27 January 2014 (UTC)
Human behavior is so perplexing sometimes. Unless it is a spamming fluke. I didn't know how much Reddit influenced views on Wikipedia. Liz Read! Talk! 20:18, 13 February 2014 (UTC)

Rule of three

Any of these four celebs could have logical reasons for interest:

particularly McConaughey; however, they all owe their presence in the list to massive surges of views on the exact same two days, which suggests that this isn't a coincidence. Serendipodous 16:09, 16 February 2014 (UTC)

Well, I did a quick search and McConaughey is supposedly doing a great job starring in HBO's True Detective, which airs on Sunday nights. Not sure about the other three. I did a search for Washington and didn't find anything, and for Presley, I found that 2/17 is the anniversary of his first gold record, but I can only see that being relevant if it is the question on a radio call-in quiz. Ever think that spikes might be people searching for specific Wikipedia articles to answer quiz questions? Of course, given the number of radio stations, along with social media, across the world, it would be impossible to track this down. Liz Read! Talk! 20:43, 17 February 2014 (UTC)
I have nothing to explain spikes on any of the 4 articles above. However, towards Liz's point, I previously took a very informal look at how Jeopardy questions were reflected in the statistical data (surely the USA's most popular quiz show, I'd guess). One could observe minor "bumps" as the show aired across multiple timezones, but nothing that would amount to a "spike" or be sufficient to push an article to the heights of the Top25. West.andrew.g (talk) 00:11, 18 February 2014 (UTC)

IPv6

I think IPv6 is artificial. I haven't seen any news stories about it lately, and it's been on the list too long to be a Reddit thread. And it's the kind of topic (like Java) that could be due to a malformed bot. -- Ypnypn (talk) 17:15, 21 February 2014 (UTC)

Yeah, I'm starting to agree. I've been including it as the kind of topic that would be of interest to computer aficionados (à la bitcoin), but I have to face the obvious at this point. Serendipodous 19:00, 21 February 2014 (UTC)

Article traffic guru input needed

See User_talk:West.andrew.g/Popular_pages#Slightly_off-topic_question_about_page_view_stats --ThaddeusB (talk) 03:27, 3 March 2014 (UTC)

Last week

The link to the Top 25 for February 23 - March 1 seems to be missing, and I'm having trouble finding it by reconstructing the title. Powers T 17:45, 11 March 2014 (UTC)

Fixed. Serendipodous 18:06, 11 March 2014 (UTC)

Commentary

Your comments are usually the most interesting aspect of the chart...although they are a bit too cynical for my taste, they are usually right on the money. But I wish you'd lay off "actor-of-the-moment Matthew McConaughey" because it seems dismissive. You've used that qualifier for him three or four times now which makes me wonder whether it's an observation or an evaluation. Could he return to just being "actor" or "popular actor" McConaughey? Liz Read! Talk! 20:23, 18 March 2014 (UTC)

He is very much the current actor of note, having just won an Oscar. Serendipodous 20:53, 18 March 2014 (UTC)

Pornography

http://en.wikipedia.org/wiki/Pornography is #1 in Google with search term "porn", so no wonder it is a popular article. — Preceding unsigned comment added by 188.162.65.28 (talk) 09:11, 21 March 2014 (UTC)

If that were the case, it would always have been popular, but wasn't. It just suddenly turned up in the top 25 one day. Serendipodous 10:30, 21 March 2014 (UTC)
It is definitely the case. And google updates its algorithms almost every day — Preceding unsigned comment added by 188.162.65.28 (talk) 12:10, 21 March 2014 (UTC)
I have no doubt that it is popular, but its popularity never translated into Wiki searches until a few months ago. Serendipodous 12:59, 21 March 2014 (UTC)
That means that another day the page was not #1 in the google feed but #10 or #13141.
Oh, it seems that the popularity of porn googling is decreasing since August 2013... And on User:West.andrew.g/2013_popular_pages, Porn is only #7.902 of #10.000. What happened to The Internet is for Porn? :-P --Atlasowa (talk) 20:52, 21 March 2014 (UTC)

Rule of three

@Serendipodous: I don't see any reasons why these articles would have spiked. Stereoscope is certainly an aberration (only a little more than two days for a spike), and I'm thinking the same for automobile (the spikes over the last 60 days are completely random, and the real average is between 3000 and 5000). I'm less sold on High-temperature superconductivity, but I also can't find any logical reason for the obscene number of views. Ed [talk] [majestic titan] 22:15, 6 April 2014 (UTC)