
User:Excirial/Blocknote

From Wikipedia, the free encyclopedia


My Blocknote!

Blocknote Section

This section is mainly a junkyard of copy and paste. While the sketchbook is structured, I use this section to save information I find on the net and need later on, but can't dump anywhere else.

Blocknote

300 Bowling and Entertainment Centers AMA Foundation Leadership Award American Institute of Chemists Business Information Services Library Clearwater Valley High School Comparison of e-book readers Comparison of revision control software Freshman fifteen Hemas Holdings Human hair growth James Flack Norris Award James Flack Norris Award for Outstanding Achievement in the Teaching of Chemistry Jana Skinny Water Jari-Veikko Kauppinen Primavera Systems Pushdo Pushdo botnet Reflexstock Rustock botnet Srizbi Srizbi botnet Stanislaw Komorowski Stanisław Komorowski The Ort Institute Transatel Trove Categorization Tweetnation Watershed segmentation algorithm Winchester 1200 User:Excirial/Activity User:Excirial/ArchiveTemplate User:Excirial/Awards User:Excirial/Blocknote User:Excirial/Content User:Excirial/Dashboard User:Excirial/Dashboard/Content User:Excirial/Links User:Excirial/Mail User:Excirial/Navigation User:Excirial/NoPATolerance User:Excirial/Playground User:Excirial/Playground2 User:Excirial/Playground4 User:Excirial/Sketchbook User:Excirial/Status User:Excirial/UserBoxes User:Excirial/dashboard.js User:Excirial/friendlytag.js User:Excirial/huggle.css User:Excirial/monobook.js User:Excirial/recent2.js File:CVHS Logo.jpg File:Computer Icon.png File:Globe Icon.png File:Primavera logo.png File:Reflexlogo2.png File:Sketchbook1 Icon.png File:Sketchbook2 Icon.png

Coreva RFBA

[[Category:Wikipedia bot requests for approval|]]


Operator: Excirial (Contact me, Contribs) 18:39, 7 January 2009 (UTC)

Automatic or Manually Assisted: Fully automatic, with the possibility to manually override the bot's behavior if desired.

Programming Language(s): C#.net, DotNetWikiBot Framework (might be obsolete in future versions)

Function Summary:

  • Query the Wikipedia API every X minutes (idea: every 5-10 minutes) for new pages.
  • If the bot is cold-started, fetch the new page list with the last X (idea: 25-50) pages. (See: Note 1)
  • If the bot is already running, only fetch the list of new pages since the last visit.
  • If the bot has found any new pages, load the page content and start parsing it.
  • The bot parses the content to determine whether any maintenance tags have to be placed.
  • If a maintenance tag is needed, add the tag to the article and resume with the next article. (A rough sketch of this polling loop follows this list.)
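For illustration, here is a minimal C# sketch of the polling loop described above. It is not Coreva's actual code: ParseAndTag is a hypothetical stand-in for the parser described under Function Details, and the raw WebClient calls would in practice go through the DotNetWikiBot framework.

<syntaxhighlight lang="csharp">
using System;
using System.Net;
using System.Threading;
using System.Xml;

class NewPagePoller
{
    const string Api = "https://en.wikipedia.org/w/api.php";

    static void Main()
    {
        string lastTimestamp = null; // null = cold start: just take the latest batch

        while (true)
        {
            // list=recentchanges with rctype=new lists newly created pages,
            // newest first; rcend bounds the query at the last seen timestamp.
            string url = Api + "?action=query&list=recentchanges&rctype=new"
                       + "&rcnamespace=0&rclimit=50&format=xml"
                       + (lastTimestamp == null ? "" : "&rcend=" + lastTimestamp);

            var doc = new XmlDocument();
            using (var client = new WebClient())
                doc.LoadXml(client.DownloadString(url));

            foreach (XmlNode rc in doc.SelectNodes("//rc"))
                ParseAndTag(rc.Attributes["title"].Value); // hypothetical parser entry point

            XmlNode newest = doc.SelectSingleNode("//rc");
            if (newest != null)
                lastTimestamp = newest.Attributes["timestamp"].Value;

            Thread.Sleep(TimeSpan.FromMinutes(5)); // poll every 5-10 minutes
        }
    }

    static void ParseAndTag(string title) { /* see Parser Table */ }
}
</syntaxhighlight>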

Edit period(s) (e.g. Continuous, daily, one time run): Continuous

Edit rate requested: At most 1 edit per new page. (Estimated 1-5 edits a minute, with an average of 2 edits a minute while working on new pages only)

Already has a bot flag (Y/N): (Not applicable, new bot)

Function Details:
Note: Coreva-Bot had a previous bot request, located here. Prototypes of the previous idea behind Coreva showed that it would be virtually useless. This request is for a functionally completely different bot (but with an identical name).

Coreva's main task is placing maintenance tags on new pages that require them, similar to the way most new page patrollers work their beat. Coreva will regularly (every 5-10 minutes) check the new page list for new articles, fetch each new article's content, parse the content (see: Parser Table) and finally update the article, adding the required maintenance tags.

Just like the previous Coreva, this one should also be quite light on server resources. The bot queries the server's new page list every 5-10 minutes, and (so far) each article requires two server queries (one to get the article's content, and one to check whether the article is an orphan). Category counts, link counts et cetera are handled internally by the bot. Additionally, the bot will require one database write to add the templates (in case this is required). The estimated edit rate for the bot will be 2 edits per minute on average. (See: Note 2)
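As an example of the second query, the orphan check could be done with the API's backlinks list; the two-backlink threshold below matches the Orphan row in the parser table. This is an assumed implementation, not Coreva's published code:

<syntaxhighlight lang="csharp">
using System;
using System.Net;
using System.Xml;

static class OrphanCheck
{
    // Orphan row from the parser table: linked to by 0-2 articles.
    public static bool IsOrphan(string title)
    {
        string url = "https://en.wikipedia.org/w/api.php?action=query&list=backlinks"
                   + "&bltitle=" + Uri.EscapeDataString(title)
                   + "&blnamespace=0&bllimit=3&format=xml";

        var doc = new XmlDocument();
        using (var client = new WebClient())
            doc.LoadXml(client.DownloadString(url));

        // bllimit=3 suffices: a third backlink already means "not an orphan".
        return doc.SelectNodes("//bl").Count <= 2;
    }
}
</syntaxhighlight>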

Coreva is not a miracle, and will never replace a living new page patroller. Coreva cannot patrol for WP:CSD and does not understand hoaxes, advertising or vandalism. However, a lot of articles slip off the new page list without having any form of maintenance tags. About half the pages on the new page list show as not being patrolled, and even though this is a very rough guess, that equals more than 2,000 pages a day. (See: Note 3) Since adding maintenance tags is thoroughly boring work, I think Coreva could spare quite a few patrollers a bit of boredom :). (Unlike CSD tags, which require at least some use of your brain, maintenance tags require nothing more than checking about 20 indicators, most of them nothing more than: present/not present.)

Finally, just like the old Coreva, this one is still pretty much a work in progress, which is only done in spare time. While the progress on this Coreva is much faster than on the previous one, I assume it will still take a few months before it is capable of running as a fully automated bot. Even once it is technically capable of doing so, it will not run fully automatically until I have tested it thoroughly (a few weeks, I guess) in assist mode, which means Coreva would only give me feedback on what tag it would place on every page it checks. This way any annoying mistakes in the parser should be ironed out, while at the same time allowing me to improve the parser code.

Parser Table

This table gives an overview of the templates Coreva will be placing on articles, along with the current criteria configuration for doing so. Note that this is still pretty much in the beta stage; templates may be added and removed depending on tests. Also, the criteria are still based on very simple algorithms. Coreva's tests are conducted on a very small and varied set of locally stored articles, thus the criteria are still general. In their current form they should, however, produce very few false positives (but would likely have quite a few false negatives). So all in all: work in progress! (See: Note 4) A simplified code version of a few of these criteria follows the table.

Tag | Criteria | Comment
Wikify | No internal links | Number of internal links = 0
Uncat | No categories in the article | Number of categories = 0
Unreferenced | No references in the article | No ref tags or references/notes header detected.
Footnotes | Article contains a standard "Notes" or "References" header, but no ref tags | -
Internal links | Article contains fewer than (number of words / number of links) internal links | Percentage not yet set.
Orphan | Article is linked to by 0-2 articles | -
Stub | Article size is smaller than X | Suggestion: < 1 kB / 100 words / 1,000 characters (incl. spaces)
Sections | Article contains too few sections for readability's sake (note: a section equals a line break) | < 6 sections && (number of sections * 2500) > article size.
Too many links | To be determined | For this I still need to analyze the guidelines, and the appropriate category.
Too many categories | Number of categories > X | X: 10? 20? 30? Depends quite a bit on the article size. Perhaps a base of, say, 10, and another category for every X words. (For example, World War II has 42 categories, but it's a huge article.)
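To make a few of these rows concrete, here is a simplified C# version of the Wikify, Uncat, Unreferenced, Footnotes and Stub checks. The regular expressions and the 1,000-character stub threshold are assumptions derived from the table, not Coreva's actual parser:

<syntaxhighlight lang="csharp">
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagParser
{
    public static List<string> SuggestTags(string wikitext)
    {
        var tags = new List<string>();

        int links      = Regex.Matches(wikitext, @"\[\[(?!Category:)[^\]]+\]\]").Count;
        int categories = Regex.Matches(wikitext, @"\[\[Category:[^\]]+\]\]", RegexOptions.IgnoreCase).Count;
        bool refTags   = Regex.IsMatch(wikitext, @"<ref[\s>]", RegexOptions.IgnoreCase);
        bool refHeader = Regex.IsMatch(wikitext, @"==\s*(References|Notes)\s*==", RegexOptions.IgnoreCase);

        if (links == 0)              tags.Add("Wikify");       // no internal links at all
        if (categories == 0)         tags.Add("Uncat");        // no categories
        if (!refTags && !refHeader)  tags.Add("Unreferenced"); // neither ref tags nor a references/notes header
        if (refHeader && !refTags)   tags.Add("Footnotes");    // header present, but no ref tags
        if (wikitext.Length < 1000)  tags.Add("Stub");         // suggested 1,000-character threshold

        return tags;
    }
}
</syntaxhighlight>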

Notes

  • Note 1: A second idea is to let the bot store its last query time permanently, and query all new (non-patrolled) pages created since the bot went offline. These pages could then be processed at a lower priority, meaning that they would only be processed once the bot runs out of new pages to process, with a limit on the number of pages processed each minute. (So, if the bot limited itself to at most 5-10 edits a minute, 3-8 old low-priority pages could be processed per minute; a sketch of such a queue follows these notes.) Being in the CET timezone, this would translate to a queue of 250 pages or so that could be processed during the remainder of the day.
  • Note 2: I am currently unsure whether the bot should notify the user with a template in case maintenance tags are placed, encouraging the article creator to recheck the page while it is still "warm". This would double the bot's database writes, and at the same time I cannot predict whether users are averse to being templated, or whether anyone would change an article (or ask for help). On the other hand: if a user created a page on the basis of web sites, it would save other users a whole lot of time otherwise wasted verifying all the article's content through web searches.
  • Note 3: This is based on statistics from May 2006. During that time Wikipedia got 3,600 new articles a day. Nowadays the number is most certainly quite a bit higher, but due to the difference between peak and normal hours, it's quite hard to make a guess based on Special:NewPages. :)
  • Note 4: It's rather obvious, but since I didn't mention it: Coreva does not add templates to pages marked for CSD, and does not add templates that already exist.
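The low-priority backlog from Note 1 could be shaped roughly like this; the edit budget of 10 per minute is a hypothetical value taken from the note:

<syntaxhighlight lang="csharp">
using System.Collections.Generic;

class EditScheduler
{
    const int EditsPerMinute = 10;  // hypothetical cap (Note 1: at most 5-10 edits a minute)

    readonly Queue<string> newPages = new Queue<string>(); // high priority: fresh new pages
    readonly Queue<string> backlog  = new Queue<string>(); // low priority: pages missed while offline

    // Called once a minute; returns the pages to process during that minute.
    public List<string> NextBatch()
    {
        var batch = new List<string>();

        // Fresh pages always go first, up to the per-minute cap.
        while (batch.Count < EditsPerMinute && newPages.Count > 0)
            batch.Add(newPages.Dequeue());

        // Any leftover budget drains the offline backlog.
        while (batch.Count < EditsPerMinute && backlog.Count > 0)
            batch.Add(backlog.Dequeue());

        return batch;
    }
}
</syntaxhighlight>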

Discussion

Admin To-Do list

  • Complete WP:NAS course.  Done
  • Check my pages for texts that need to be changed after promotion.  Done
  • Update userpage and userboxes.  Done
  • Correct User:Excirial/Mail's future incorrectness.  Done
  • List myself for recall. ☒N Not done and not likely to be done
  • To keep things simple, and so I don't have to write an entire rationale. Everyone is free to comment on my conduct, of course, and if this seems to happen too often, I will gladly surrender the tools.
  • Need to get some more experience with this process first.
  • Read the WP:BLOCK, WP:BAN and WP:Protect policy another time to make sure everything is fresh in my mind.  Done
  • Thankspam(?) -> Feels a bit of a waste of everyone's time. Maybe a message thanking everyone on my talk page will suffice. ☒N Not done and not likely to be done
  • See WP:Thankspam. Thanking every person involved would just take away time that can also be used for "real" contributions. I think I'll just say "Thank You" by making sure I won't end up on ANI in a negative sense. :)