User talk:EpochFail

From Wikipedia, the free encyclopedia

Welcome to The Wikipedia Adventure!

Hi! We're so happy you wanted to play to learn, as a friendly and fun way to get into our community and mission. I think these links might be helpful to you as you get started.

-- 02:13, Sunday, September 25, 2016 (UTC)

Get Help
About The Wikipedia Adventure | Hang out in the Interstellar Lounge


  In the area? You're invited to the
   Art+Feminism meetup
  Date: Sunday, March 8, 2015
  Time: 12:00 - 4:00pm
  Place: Walker Art Center

Improving POPULARLOWQUALITY efficiency

Would you please have a look at this discussion on the talk page for WP:POPULARLOWQUALITY? Is there a way to have some measure of article popularity restored to the deprecated pageviews field of the Quarry database, or added as a new field? I suspect that iterating over a list of the most popular articles, picking out the top N that the ORES wp10 model predicts are stub-class, and sorting them by their start-class prediction confidence would be easier than downloading 24 hours of full dumps at a time. EllenCT (talk) 15:02, 16 June 2016 (UTC)
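The selection EllenCT describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an existing tool: the `Article` record, its `predictions` dictionary, and the idea that ORES wp10 class probabilities have already been fetched for every title are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record: a title, its daily views, and wp10 class probabilities."""
    title: str
    views: int
    predictions: dict  # e.g. {"Stub": 0.6, "Start": 0.3, "C": 0.1}


def top_stub_candidates(articles, n):
    """Walk articles from most to least viewed, keep the first n whose most
    likely wp10 class is 'Stub', then order them by 'Start' confidence."""
    stubs = []
    for art in sorted(articles, key=lambda a: a.views, reverse=True):
        predicted = max(art.predictions, key=art.predictions.get)
        if predicted == "Stub":
            stubs.append(art)
            if len(stubs) == n:
                break  # stop early instead of scanning the whole list
    # Highest Start-class confidence first: these are closest to promotion.
    return sorted(stubs, key=lambda a: a.predictions.get("Start", 0.0), reverse=True)
```

Stopping as soon as N stubs are found is what makes walking a popularity-ordered list cheaper than processing full dumps.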

Hi EllenCT. Have you seen the Pageview API? It might serve this need nicely. --EpochFail (talk • contribs) 20:25, 20 June 2016 (UTC)
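For reference, the Pageview API's "top" endpoint returns one day's most-viewed articles for a project and access type. The sketch below assumes the response layout `items[0].articles[*]` with `article` and `views` fields; the default project, date, and User-Agent string are only examples, and the network call is illustrative rather than tested.

```python
import json
import urllib.request

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"


def top_url(project, access, year, month, day):
    """Build the 'top' endpoint URL for a single day."""
    return f"{API}/{project}/{access}/{year:04d}/{month:02d}/{day:02d}"


def fetch_top(project="en.wikipedia", access="all-access", year=2016, month=6, day=19):
    """Fetch the day's most-viewed articles as (title, views) pairs."""
    req = urllib.request.Request(
        top_url(project, access, year, month, day),
        headers={"User-Agent": "top-stub-sketch (example contact)"},  # Wikimedia asks for a descriptive UA
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [(a["article"], a["views"]) for a in data["items"][0]["articles"]]
```

As EllenCT notes in the next reply, this list stops at 1,000 articles per day, which is the limitation the rest of the thread is about.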
Yes; sorry, it only provides the top 1000 articles, which typically contain only a handful of articles ORES wp10 predicts are stub-class. I need the top 200,000 to get about a hundred stub-class articles, which I intend to sort by aggregating their start-class confidence and pageview-based popularity. I would love to include a global importance measure. I can't use per-WikiProject measures of importance.
When I asked for the Pageview API to be extended to provide the top 100,000, I was told that could not be done because it would leak personally identifiable information, which is completely preposterous. Much finer-grained pageview data, from which the full popularity ranking of all articles can be derived, is released every hour. Could you please explain this to Jdforrester and reiterate my request? EllenCT (talk) 20:54, 20 June 2016 (UTC)
You can file a Phabricator task and tag it with "analytics" so the team can look at it. I cannot think of any privacy issues off the top of my head, but adding 99,000 new rows per project per day per access point (mobile-app, desktop, mobile-web) is not as trivial as you might think. That is 100,000 × 3 × 800 rows every day, two orders of magnitude more than we currently store and load. NRuiz (WMF) (talk) 22:39, 20 June 2016 (UTC)
@NRuiz (WMF): can you please explain what that means in terms of kilobytes of additional storage and additional CPU time required? I do not wish to figure out how to file a Phabricator request and would ask that you please do that for me. EllenCT (talk) 01:08, 21 June 2016 (UTC)
@EllenCT: I agree with NRuiz (WMF) that a top-200,000 endpoint in the API is not trivial from a storage perspective. Currently the top endpoint represents about 1 GB of data per month for 1,000 articles per project (this is a lower bound). Growing it to 200,000 values per project would require 200 times the storage, i.e. 200 GB per month. While this might be feasible in terms of traffic and load (the top endpoint takes good advantage of caching in Varnish), from a storage perspective it would require us to reassess the API scaling forecast: we planned our system to grow linearly in storage for at least one year, so we can't take a 200 GB/month hit right now. --JAllemandou (WMF) (talk) 09:16, 21 June 2016 (UTC)
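The two sizing estimates in this exchange (rows per day from NRuiz (WMF), storage per month from JAllemandou (WMF)) reduce to simple multiplication. A small sketch, assuming the figures quoted above (about 800 projects, 3 access points, about 1 GB/month for the top 1,000) and that storage scales linearly with the number of ranked articles:

```python
PROJECTS = 800              # approximate project count quoted above
ACCESS_POINTS = 3           # desktop, mobile-app, mobile-web
CURRENT_TOP_N = 1_000
CURRENT_GB_PER_MONTH = 1.0  # quoted lower bound for the top-1,000 endpoint


def rows_per_day(top_n: int) -> int:
    """One row per ranked article, per access point, per project, per day."""
    return top_n * ACCESS_POINTS * PROJECTS


def storage_gb_per_month(top_n: int) -> float:
    """Assumes storage grows linearly with the number of ranked articles."""
    return CURRENT_GB_PER_MONTH * top_n / CURRENT_TOP_N


print(rows_per_day(100_000))                         # 240000000
print(rows_per_day(100_000) // rows_per_day(1_000))  # 100, i.e. two orders of magnitude
print(storage_gb_per_month(200_000))                 # 200.0
```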
@JAllemandou (WMF): how would you feel about computing it first, and then reviewing whether there is any point in storing the larger list? I only want to store subsets such as ORES wp10 stub-class predictions (with their start-class confidence), membership in WP:BACKLOG categories, and perhaps WikiProject importance when available (@EpochFail: perhaps with the revision ID and date when the importance was evaluated? Do you think we could create per-WikiProject importance coefficients or other transforms to approximate a global importance useful for ranking cross-topic backlog lists?). I would certainly love to see just that subset stored, and I am sure it would be much smaller and well worth it. EllenCT (talk) 12:06, 21 June 2016 (UTC)
@EllenCT: Unfortunately it's not as simple as "doing it first" :) While the computation can easily be done ad hoc and the results provided in files for a one-shot trial, having that data flow through our production system involves storage and serving endpoints, which are problematic. Are you after a regular data stream or a one-shot test? --JAllemandou (WMF) (talk) 09:49, 22 June 2016 (UTC)
@JAllemandou (WMF): something like User talk:EllenCT/Top 594 stub predictions from 20160531-230000 but daily instead of hourly, plus all the articles in at least the WP:BACKLOG categories listed on Wikipedia:Community portal/Opentask, with the redirects and disambiguation pages filtered out, and 1000 instead of 594 articles, please? In the future when we have good importance prediction models we can scale each WikiProject's importance by some coefficient for a third ranking score as part of the score normalization and standardization process prior to combining. EllenCT (talk) 12:47, 22 June 2016 (UTC)
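One way to read the "normalization and standardization process prior to combining" is to convert each score to z-scores and add them with weights. The sketch below is a hypothetical illustration, not a method anyone in this thread has adopted: the log transform on pageviews (because view counts are heavy-tailed), the equal default weights, and the pre-scaled `importance` input are all assumptions.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardize to mean 0 and population std dev 1; constant input maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def combined_rank(titles, start_conf, views, importance=None, weights=(1.0, 1.0, 1.0)):
    """Order titles by a weighted sum of standardized scores, highest first.

    importance, if given, is a hypothetical per-article score already rescaled
    by some per-WikiProject coefficient.
    """
    columns = [zscores(start_conf), zscores([math.log1p(v) for v in views])]
    if importance is not None:
        columns.append(zscores(importance))
    totals = [sum(w * col[i] for w, col in zip(weights, columns))
              for i in range(len(titles))]
    return [t for _, t in sorted(zip(totals, titles), reverse=True)]
```

Standardizing first keeps a score measured in raw views from swamping one measured as a probability between 0 and 1.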
@EllenCT: My understanding is that you are looking for a regular feed of the 1,000 most-viewed articles predicted to be stubs (one caveat: we can't currently filter out disambiguation pages or redirects). While I think such a project would be very useful for various reasons, the amount of work required and our current priorities mean it is unlikely to be picked up any time soon. This task is in our backlog so that we keep the idea alive. --JAllemandou (WMF) (talk) 08:57, 24 June 2016 (UTC) signature added by EpochFail (talk • contribs) 18:07, 24 June 2016 (UTC)
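On the client side, redirects and disambiguation pages can be dropped from a candidate list with the MediaWiki Action API. This sketch assumes a `prop=info|pageprops` query with `formatversion=2` (where redirects carry `"redirect": true`) and a wiki running the Disambiguator extension, which sets the `disambiguation` page property; the helper names are hypothetical and only the response parsing is exercised.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"
PARAMS = {
    "action": "query",
    "prop": "info|pageprops",   # info flags redirects; pageprops exposes 'disambiguation'
    "ppprop": "disambiguation",
    "format": "json",
    "formatversion": "2",
}


def query_url(titles):
    """URL for one batch of titles (the API caps how many titles one request may carry)."""
    return API + "?" + urlencode({**PARAMS, "titles": "|".join(titles)})


def keep_articles(response):
    """Return titles that exist and are neither redirects nor disambiguation pages."""
    kept = []
    for page in response.get("query", {}).get("pages", []):
        if page.get("missing") or page.get("redirect"):
            continue
        if "disambiguation" in page.get("pageprops", {}):
            continue
        kept.append(page["title"])
    return kept
```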

Bad edit but not Vandalism

Hello EpochFail, I am participating in the ORES project, and I have a question about how to tag edits that are not vandalism and are made in good faith, but should be undone because they do not follow policy or style recommendations. An example would be this edit. The user changed a wikilink into a bare-URL reference; the problem is that Wikipedia itself is not a reliable source for Wikipedia, so the edit should be undone. Should I tag the edit as damaging even though it is not vandalism? What criteria should I apply in other cases? Regards.--Crystallizedcarbon (talk) 18:26, 12 August 2016 (UTC)

Hi Crystallizedcarbon. I think you have the idea exactly right. You should tag edits like these as "damaging" and "goodfaith". We actually intend to use the models to look specifically for these types of edits, because good-faith new users who would respond positively to training and guidance tend to make these types of mistakes. When you think about "damaging", consider what patrollers would want to review and clean up. There are a lot of edits that are well meaning (and maybe not even all that bad!) but still damage the article.
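The guidance above treats "damaging" and "goodfaith" as two independent judgments, so a single edit can be both. A hypothetical sketch of how a patrolling tool might act on the label pair; the action strings are illustrative and not part of ORES:

```python
def triage(damaging: bool, goodfaith: bool) -> str:
    """Suggest a patrolling action from a pair of damaging/goodfaith labels."""
    if not damaging:
        return "no action needed"
    if goodfaith:
        # e.g. replacing a wikilink with a bare-URL reference to Wikipedia
        return "clean up and offer guidance"
    return "revert as vandalism"
```

The damaging-but-goodfaith case is the one the models are meant to surface, since those editors are the likeliest to respond well to guidance.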
Thanks for asking this question. If you wouldn't mind, I'd appreciate it if you'd add these notes (and your example) to the documentation at es:Wikipedia:Etiquetando/Valorar_calidad. --EpochFail (talk • contribs) 18:46, 12 August 2016 (UTC)
Thank you for your quick answer. Will do. Regards. --Crystallizedcarbon (talk) 18:50, 12 August 2016 (UTC)
Should I add it on the project page itself or on the talk page instead? --Crystallizedcarbon (talk) 18:54, 12 August 2016 (UTC)
Crystallizedcarbon, I'd add it to the project page and then make a post about it on the talk page. It might be nice to kick off a conversation to check whether anyone else has a question. --EpochFail (talk • contribs) 18:55, 12 August 2016 (UTC)
Good idea.--Crystallizedcarbon (talk) 18:56, 12 August 2016 (UTC)
Done --Crystallizedcarbon (talk) 19:38, 12 August 2016 (UTC)

RfC: Protect user pages by default

A request for comment is available on protecting user pages by default from edits by anonymous and new users. I am notifying you because you commented on this proposal when it was either in idea or draft form. Funcrunch (talk) 17:34, 31 August 2016 (UTC)