Wikipedia:Bots/Requests for approval/Ganeshbot 5

Operator: Ganeshk (talk · contribs)

Automatic or Manually assisted: Automatic

Programming language(s): AutoWikiBrowser and CSVLoader

Source code available: Yes, available at WP:CSV and WP:WRMS

Function overview: To create gastropod species and genera articles based on data downloaded from the WoRMS database. The bot will run under the supervision of the Gastropods project.

Links to relevant discussions (where appropriate):

Edit period(s): Weekly

Estimated number of pages affected: 500 per week

Exclusion compliant (Y/N): N/A

Already has a bot flag (Y/N): Y

Function details: The bot will create species and genera articles under the supervision of WikiProject Gastropods. Here are the steps:

  1. The bot operator will propose a new family that needs to be created on the project talk page.
  2. Gastropod project members will approve the family and provide an introduction sentence. Here is an example.
  3. The bot operator will download the species data from WoRMS using AutoWikiBrowser and the WoRMS plugin. Only accepted species will be downloaded.
  4. The bot operator will run AutoWikiBrowser with the CSV plugin to create the articles using a generic stub template, the project-provided introduction sentence and the data downloaded from WoRMS (a rough sketch of this step follows the list).
  5. The bot operator will maintain a log on the User:Ganeshbot/Animalia/History page.
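
A rough sketch of step 4, for illustration only (this is not the actual AutoWikiBrowser/CSVLoader code; the column names, the stub skeleton and the helper function are invented for the example, and in practice the skeleton would be the generic stub template plus the project-provided introduction sentence):

    import csv
    from string import Template

    # Hypothetical stub skeleton standing in for the project's generic stub template.
    STUB = Template(
        "{{Taxobox\n| genus = $genus\n| species = $species\n| authority = $authority\n}}\n"
        "'''$genus $species''' is a species of sea snail, a marine gastropod "
        "mollusk in the family $family.\n\n==References==\n* $worms_citation\n"
    )

    def build_stubs(csv_path):
        """Yield (title, wikitext) pairs for each accepted name in a WoRMS export."""
        with open(csv_path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                if row.get("status") != "accepted":  # only accepted species are used
                    continue
                yield "%s %s" % (row["genus"], row["species"]), STUB.substitute(row)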

About 10,000 species articles are yet to be created.

Note: This bot was approved to create a smaller set of similar stubs in March 2010. This request seeks approval to cover all new families approved by the Gastropods project.

Discussion

Note: For anyone new to the discussion, in Ganeshbot 4 (later amended at Wikipedia talk:Bot Approvals Group/Archive 7#Wrong way of the close a BRFA) this bot was approved to create about 580 stubs for the genus Conus. Despite stating that "about 580 stubs will be created (nothing more)",[1] Ganeshk was somehow under the impression that further approval was not required to create additional articles. When this was brought to the community's attention at various places, including WT:Bots/Requests for approval#Wikipedia:Bots/Requests for approval/Ganeshbot 4, Ganeshk stopped creating the articles without approval. Anomie 02:09, 19 August 2010 (UTC)[reply]


I'm almost positive something like this involving plant articles or insects cratered badly when it was found the source database had errors. Are we sure this database is reliable to use? MBisanz talk 01:56, 19 August 2010 (UTC)[reply]
From what I hear from members of the Gastropods project, WoRMS has the best experts in the world. Their database is not infallible, but overall beneficial. Ganeshk (talk) 02:05, 19 August 2010 (UTC)[reply]
You're probably thinking of anybot, although there were other issues that contributed to that situation becoming a major clusterfuck. Anomie 02:16, 19 August 2010 (UTC)[reply]

Needs wider discussion. I see you already informed WikiProject Gastropods. Please advertise this request at WP:VPR to solicit input from the wider community. Anomie 02:09, 19 August 2010 (UTC)[reply]

I have posted a note at the VPR, Wikipedia:Village pump (proposals)#Species_bot. Ganeshk (talk) 02:39, 19 August 2010 (UTC)[reply]
I understand your concern MBisanz, but in my understanding, no database is completely invulnerable to mistakes. Though this may be contested, WoRMS is a somewhat reliable source, considering it is gradually revised by several specialists. Information provided by WoRMS may or may not change with time. It evolves, as does Wikipedia. We from project gastropods aim to closely observe those changes, so that the information contained in the gastropod articles is true to its source, at least. I recognize that a large number of stub articles created in an instant can make things difficult, mainly because we are just a few active members, but then again I think this bot is very beneficial to the project, if used with caution. --Daniel Cavallari (talk) 02:30, 19 August 2010 (UTC)[reply]

How many articles can the gastropod project check with just a few active members? The bot created about 100 stubs a day for the past few months, for a total of 15000 stubs? Have these 15000 stubs been checked? I checked a few and found concerns. I volunteered to point out problems if I could speak directly to the gastropod family experts, but I was insulted by a gastropod member for my poor spelling, repeatedly insulted. I think the inability to work with other members of the community, the unwillingness to accept criticism and the tendency to focus on personal insults over taxonomic issues spell disaster for this bot. The bot is either continuing to run or is being operated as an account assistant by its operator; this also makes it hard to know what the bot is doing. The operator will have to have all rules of bot operation explicitly outlined, as he took his own statement of "580 articles (nothing more)" to mean 15000 articles. What other bot rules will be misinterpreted? —Preceding unsigned comment added by JaRoad (talkcontribs) 03:05, 19 August 2010 (UTC)[reply]

I am also concerned that gastropod members are using what they consider a "somewhat reliable" resource that is evolving through time like Wikipedia. Wikipedia is not considered a reliable source for Wikipedia articles. Writers are expected to use reliable, stable, and non-primary sources, not "somewhat reliable" sources. —Preceding unsigned comment added by JaRoad (talkcontribs) 04:20, 19 August 2010 (UTC)[reply]

JaRoad, This link will show you that the bot has actually stopped creating articles as of 8/15/10. Ganeshk (talk) 04:29, 20 August 2010 (UTC)[reply]
If quality control is being questioned, I suggest that members of the gastropod project agree on an acceptable percentage of defective articles generated. Then, select and examine random articles that were produced by Ganeshbot. Determine the percentage of defectives and take it from there. Anna Frodesiak (talk) 05:29, 19 August 2010 (UTC)[reply]
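
As a sketch of the sampling idea above (purely illustrative; the list of bot-created titles and the notion of "defective" are placeholders for whatever the project agrees on, and the review itself remains manual):

    import random

    def estimate_defect_rate(bot_created_titles, is_defective, sample_size=100, seed=42):
        """Estimate the share of defective stubs from a random sample.

        `is_defective` stands in for the manual review a project member performs
        on each sampled article; it is not something a bot can decide.
        """
        random.seed(seed)
        sample = random.sample(bot_created_titles, min(sample_size, len(bot_created_titles)))
        defects = sum(1 for title in sample if is_defective(title))
        return defects / len(sample)

    # If, say, 2 of 100 sampled stubs fail review, the estimated defect rate is 2%,
    # to be compared against the agreed acceptable percentage.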
  • Comment Although my general reaction is very much against bot-creation of articles (I think it is crazy), I was impressed with the couple of species articles I looked at. However, I know little to nothing about gastropods (or bots). It is dismaying that the original BAG approval was so badly misunderstood: it seemed quite clear to me. I wonder, from a broader point of view, whether this is a wise thing to be doing at all. What happens when the WoRMS[2] database is enhanced or corrected? How do such changes get here? A content fork on WP is not helpful. What about changes in the WP articles: can relevant changes be fed back to WoRMS? What do the denizens of WoRMS think about all this? Similar thoughts for WikiSpecies[3] (FAQ[4]). I have seen some discussion about why this data should be going into WP rather than WikiSpecies, but since the latter is supposed to drive the former I don't understand the rationale for the data coming to WP first. What do the WikiSpecies folks think? Anyway, just my thoughts. Thincat (talk) 10:41, 19 August 2010 (UTC)[reply]
With regard to your question about how changes on WoRMS can get here, I have plans to write a bot that will compare Wikipedia with WoRMS and list articles that will need updating. I intend to file a BRFA for that in the future. WoRMS was happy to hear that Wikipedia was making use of their database as a 'trustworthy' taxonomic data source. We are listed under the user list for their web services functionality. Ganeshk (talk) 00:00, 20 August 2010 (UTC)[reply]
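
As a minimal sketch of what such a comparison bot might do (the WoRMS lookup is a placeholder rather than a real API call; the actual bot would go through WoRMS's web services, and the stored-record format is assumed for the example):

    def find_out_of_sync(stored_records, worms_lookup):
        """Return articles whose stored WoRMS data no longer matches the database.

        `stored_records` maps an article title to the data recorded when the stub
        was created, e.g. {"Conus ebraeus": {"status": "accepted", "family": "Conidae"}};
        `worms_lookup` is a placeholder for a WoRMS web-service call that returns
        the current record for that name, or None if the name is gone.
        """
        discrepancies = []
        for title, stored in stored_records.items():
            current = worms_lookup(title)
            if current is None or any(current.get(key) != value for key, value in stored.items()):
                discrepancies.append((title, stored, current))
        return discrepancies

    # The resulting list could be posted to a report subpage for human editors,
    # rather than having the bot edit the articles directly.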

Support As I have written previously, there is unified support for this among WikiProject Gastropods members. The bot has been running since March 2010 without any problems. I would like to thank User:JaRoad, who found a "mistake" affecting 6 articles (or at most up to 10 additional articles, and some think it was not even a mistake) in a highly specialized theme in the family Category:Velutinidae. The "mistake" was made by one of the WikiProject Gastropods members. It was made neither by the bot nor by the bot operator. We have remedied it and we have taken precautions. The bot is specialized in creating articles on extant (living) marine gastropods, which is only a small part of the project. The bot works systematically according to its operator's instructions. Additionally, the bot works in cooperation with WoRMS http://www.marinespecies.org/users.php (see Wikipedia listed there). That also guarantees automatic or semi-automatic updates in the future, if necessary. It may seem to other Wikipedians that nobody takes care of those generated articles. That would be an incorrect prejudice. See for example the history of "List of Conus species", where it says exactly "all species checked". For example, last month one user uploaded ~1000 encyclopedic images and added them mostly to those articles started by this bot. This bot is doing exactly the same thing that human members of WikiProject Gastropods would do. There are no known real issues with this bot. Feel free to formally approve it. Thank you. --Snek01 (talk) 13:21, 19 August 2010 (UTC)[reply]

Support The core of the gastropod team stands by the accuracy of the articles, and so do I. I watched as the first batch was prepared. It was meticulously fact-checked by JoJan and others before the bot generated the stubs. The bot is an asset to the project, and ought to continue. Furthermore, the introductory statement to this page has an objectionable tone of indictment. Anna Frodesiak (talk) 13:46, 19 August 2010 (UTC)[reply]

Support I find the bot stubs to be very good, certainly as good as (or better than) stubs that are created manually by project members or other contributors. We are using the most up-to-date system of taxonomy. And yes, as Anna says, we reviewed the process very carefully over many weeks before it was put into effect, because we understand the possible dangers of mass bot generation of stubs. This is not our first experience with bot-generated stubs; a good number were created back in 2007. Thanks, Invertzoo (talk) 17:10, 19 August 2010 (UTC)[reply]

Oppose Due to the misunderstanding, there are now fifteen thousand stub articles about slugs and snails, largely unchecked, and for which there is frequently no information to be added. The aim of the WikiProject is to have a similar article for all 100,000 entries in the database. I cannot personally see any reason for this. We should have articles about gastropods that have more information about them, where the article can be fleshed out and more content added. I share the concern about the WoRMS database, and do not think that there is any need to reproduce it in Wikipedia. Elen of the Roads (talk) 18:09, 19 August 2010 (UTC)[reply]

All of them are checked (by me or another project member) prior to creation. By the way, the task of the bot is to create fewer than 19,000 articles (according to the information at Wikipedia_talk:WikiProject_Gastropods/Archive_3#More effective importing), the majority of which are already done. There is only a need to finish the task in progress. --Snek01 (talk) 19:17, 19 August 2010 (UTC)[reply]
Yes, but the task of the bot was only to create 600 articles, not nineteen thousand of the things. The bot was allowed to operate on the basis that the WikiProject would expand the entries - and it was only supposed to create entries on Conus sp., which are rather better documented than most. "I have checked prior to creation" does not really address the requirement to check after creation, and add further information. There is no reason to duplicate the WoRMS database in Wikipedia. Elen of the Roads (talk) 21:44, 19 August 2010 (UTC)[reply]
Elen, I think Wikipedia has the potential to realize E. O. Wilson's vision of creating an Encyclopedia of Life, "an electronic page for each species of organism on Earth", each page containing "the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits".[5][6] If the bot is creating accurate articles of species that have been reviewed at WoRMS (please note that the bot only downloads records that are marked as accepted), what is the harm in having a page for that species on Wikipedia? The page will develop over time as people come in and add additional information. The bot gives the page a good starting point. Ganeshk (talk) 00:20, 20 August 2010 (UTC)[reply]
Elen, but we are expanding those stubs and checking them when needed (usually the only thing that needs checking is the wikilinks, when they link to homonyms). For example Conus ebraeus and Conus miliaris were expanded nicely, as were others. Even your presumption "There is no reason to duplicate the WoRMS database in Wikipedia." is wrong. If there is encyclopedic content somewhere that is useful for Wikipedia, then we will normally duplicate it; for example, we duplicate some images from Flickr on Wikimedia Commons, just as we duplicate encyclopedic text content from any other free source. Look for example at the article Velutina velutina, how it is "duplicated" from WoRMS, and tell me what you are unsatisfied with. You have written "I cannot personally see any reason for this." Is the fact that I would not be able to make such Start-class articles without Ganeshbot reason enough for you? Even if you still do not see any reason for this, you do not need to disagree with it, because other people consider it not only reasonable, but also necessary. I have started about ~2000 articles by myself and I am not a bot. Of course I have also expanded many more. I must say that starting them was sometimes quite tiresome. I would also like to enjoy expanding articles, as you do. Would you be so generous as to allow me to focus on expanding gastropod articles instead of starting them, please? --Snek01 (talk) 00:50, 20 August 2010 (UTC)[reply]
Elen, I am sorry, but I don't understand what you mean when you say about these new stubs that "there is frequently no information to be added". On the contrary, I think every single one of them "can be fleshed out and more content added". That is the whole purpose of creating them, so that we can easily add images or more info with more references. Invertzoo (talk) 02:58, 21 August 2010 (UTC)[reply]
If that's the case, you won't object to the bot creating articles only at the pace that you can flesh them out, and you'll be OK with finishing fleshing out the 15,000 it has already created before it is allowed to create any more. Elen of the Roads (talk) 11:07, 22 August 2010 (UTC)[reply]
You yourself are very much against the idea of a large number of stubs, I can see that, but as far as I know, there does not appear to be a WP guideline against stubs. And, unlike many other kinds of stubby articles, these species stubs have a fact-filled taxobox and intro sentence, as well as a decent reference, so they are actually already quite rich in information, despite being physically short still. It may not seem so to you, but these stubs are already quite useful to a reader who is curious to find out more about a certain species. I also think you will find that throughout most of Wikipedia's biology coverage of individual species, especially those of invertebrates and lower plants, stubs are the norm rather than the exception. At the Gastropods project we have been creating rather similar stubs by hand for a very long time without any objections. Thanks for your interest, Invertzoo (talk) 15:42, 24 August 2010 (UTC)[reply]

Support – The bot doesn't do anything other than what we, the members of the project, have been doing manually all these years. The Gastropoda is one of the largest taxonomic classes in the animal world. Without a bot, we're facing an impossible task. The data from WoRMS are very reliable, made by the best experts in the world. You won't find a better expert anywhere to check these data, so who do you want to check them? As to the so-called mistake in Velutina, I advise the community to read the discussion at Wikipedia talk:WikiProject Gastropods#Phalium articles. The integrity of the content generated by the bot is not at stake; the bot permission is the real issue. This bot has saved the members of this project perhaps thousands and thousands of hours of work, generating all those new articles. Once an article exists, it is much easier to add information. I'm in the process of uploading to the Commons about 2,500 photos of shells of sea snails from an internet source with a license suitable for the Commons. This is an enormous job that can't be done by a bot because each name has to be checked to see whether it is a synonym. I cannot insert these photos into Wikipedia unless there is already an article about the genus or the species in question. Otherwise, this would take me years if I have to create all those articles. For most people consulting Wikipedia about gastropods, and certainly for shell collectors, the photo is the most important part of the article. The text is more a matter for experts or knowledgeable amateurs, who understand what a nodose sculpture or a stenoglossan radula represents. JoJan (talk) 18:57, 19 August 2010 (UTC)[reply]

Support – As I see it, the bot is not a mere addendum, but a necessity. Taking into account the number of species described, we're dealing with the second most diversified animal phylum, the phylum Mollusca, and its largest class, the class Gastropoda. There are tens of thousands of extant and fossil gastropod species, and creating each one of those stubs would be an inhuman task... That's why we need a bot. WoRMS is not absolute, but it is one of the most reliable online databases available. I understand that, with proper supervision and due caution, no harm will come of Ganeshbot. Daniel Cavallari (talk) 00:10, 20 August 2010 (UTC)[reply]

Oppose as currently implemented. The lack of prior approval and the poor communication skills of the bot operator and project will continue to be a problem. The bot operator has now posted a list of hundreds of problematic articles, various types of synonyms that should be redirects rather than articles. The project members could have spent time looking for problems and readily found these, instead of fighting to protect the bot. It would have established a Wikipedia-beneficial future method for dealing with bad bot articles. These articles need to be fixed now; no bad taxonomic article should sit on Wikipedia while editors know it is bad. The bot operator created no plan for fixing these articles. Neither did the WikiProject.

In my opinion a bot set up to scour multiple species databases at the request of a human editor could greatly benefit writers of species articles. The human editor could verify a dozen species in an hour or two, then ask the bot to create just the formatted article with taxonomy box, categories, and stub tags. This could save the human editor many hours of tedious work. The bot could get species from algae, molluscs, plants, dinosaurs. It could even be multiple bots, with a central page for requests. This would be the best of both worlds: more articles, decided by humans, tedium by bots. JaRoad (talk) 01:41, 22 August 2010 (UTC)[reply]

Let me just say two things in response to JaRoad's comments. Firstly his assessment of our "communication skills" is based solely on his current personal perspective over the last several days, and as such it is arguably not at all relevant to the bot issue. Secondly and more importantly: if you talk to any invertebrate zoologist who is actually a taxonomist, he or she will tell you that articles or entries that use what may or may not be a synonym name are an extremely common occurrence, not only here on Wikipedia but throughout all writings on biological taxa, especially at the genus and species level. I think you will find this same issue within every large taxon of invertebrates that has not been exhaustively studied, whether the articles or entries are or were created by humans or by a bot. I would not even call these "bad" articles or "bad bot articles". The nomenclatural issues on many species of gastropods are extremely complex. First rate experts within the field very often disagree in quite polarized ways as to what the "correct" name should be for a species of gastropod. I can give you several examples if you like. There really isn't a way to simply "verify" species names as JaRoad suggests. Thank you, Invertzoo (talk) 03:13, 22 August 2010 (UTC)[reply]

The topic is the bot not me. Taxonomy is not the topic either. Editors make decisions about species validity on wikipedia. My suggestion is that only editors make these decisions. Although my suggestion is a counter proposal to this bot, this bot could make a useful tool as part of this counter proposal. I have not suggested any way to simply verify species names. JaRoad (talk) 04:49, 22 August 2010 (UTC)[reply]

No, I am sorry, but you are quite wrong on this point, which is indeed about taxonomy and nomenclature. Editors on Wikipedia must not and can not make decisions about which species are valid; that counts as Original Research, which is not allowed here. All we can do is to cite a reliable reference to back up the use of a name as it is currently applied to a certain morphotype. The validity of a species and a species name is a weighty scientific opinion, which can only be determined by an expert researcher who knows the relevant historical primary literature well, who has consulted the relevant type descriptions in that family, and who has examined the actual type material for all of the claimed taxa, by visiting the various museums throughout the world that have the types of the relevant species and supposed synonyms and carefully examining that material. Invertzoo (talk) 15:24, 22 August 2010 (UTC)[reply]

Yes, they do. Wikipedia editors decide that WoRMS is a reliable resource and that its listing of species versus synonyms is going to be used; therefore the WoRMS listing of accepted names is a source for valid species. Then, if WoRMS is in disagreement with another secondary or tertiary source, the editor decides which of the two sources is the correct one for the name of the article, and how and why the other source earns a mention as to the controversy rather than being the name for the article. Mollusc editors have already decided that the chosen taxonomists on WoRMS will be the deciders of species names on Wikipedia; hence you have chosen to confer validity on the WoRMS set of species names, not all of which are accepted 100% by all mollusc taxonomists. This is done for all controversial species of any type of organism on Wikipedia. Maybe you only create articles about noncontroversial species.

Back to the suggestion I raised. This removes the wholesale stamp of validity on one database and returns it to where it belongs: to the editors creating the articles through secondary and tertiary resources. JaRoad (talk) 16:18, 22 August 2010 (UTC)[reply]

This surmise of yours is wrong. Decisions about articles always rest with human editors, who try to independently evaluate the available information. They then make their own human decisions when one source is in disagreement with another. Things are being done exactly as you wish them to be done. --Snek01 (talk) 09:43, 23 August 2010 (UTC)[reply]

Arbitrary section break

To summarize the discussion so far:

  • WikiProject Gastropods fully intends to create all these stubs anyway, and in much the same manner as the bot does. The bot just saves the tedium of actually copying and pasting the infobox and such.
  • There is some concern over the accuracy of the WoRMS database, but it has been contended that the database is populated and actively maintained by experts in the field and thus should be reliable. Is there any reason to doubt this?
  • There is concern that the 15000 already-created stubs have not been reviewed by the project. Is there work on reviewing this backlog, and if so what is the progress? Is there any reason not to accept the suggestion that bot creation of more articles should wait until that backlog is taken care of?
    • Note that that does not mean this BRFA should be postponed until that time, just that a condition of approval be "the bot will not start creating more articles until the existing backlog is taken care of".
  • There is some concern that, as gastropod classification is changed and species are merged, no one will bother to update the many stubs created here. Is this a legitimate concern? Is this being considered?
  • There is some concern that the classification system used by WoRMS is not generally accepted by mainstream scientists in the field. Is this a legitimate concern? Even if so, does the bot creation of these articles actually prevent proper weight being given to other mainstream classification systems?

Did I miss anything? Anomie 16:17, 24 August 2010 (UTC)[reply]

A few opinions:

  • "...thus should be reliable. Is there any reason to doubt this?..." Again, why not do what a factory does, and check random samples, and set a standard for acceptable % of faulty articles. Or, at least figure out the magnitude of this problem within the 15,000 articles. We might be talking about only 30 articles.
  • Tag specific groups of articles with an incertae sedis template that states something like "This is a group/clade/family of which the taxonomy may be in flux.."
  • Establish a plan for the very valid concern that classifications WILL change.
  • Keep producing articles. Incoming content and images will otherwise have nowhere to land.
  • So, is this a debate over WoRMS and their people, or the endemic flux of the whole class? If it is the latter, then we should wait 30 years before producing stubs. We know that's not going to happen. So, if it is the latter, then produce the stubs, and work around the problem.
  • Anna Frodesiak (talk) 01:01, 25 August 2010 (UTC)[reply]


I am not at all clear as to what is supposed to constitute a "faulty article". To my mind, not that they are absolutely perfect (is there such a thing?), but the great majority of all of our stubs are currently at a (relatively) good level of correctness, bearing in mind how crazy and how vast the field of gastropod malacology is, and this level of correctness applies both to those stubs that were made by hand and to those that were produced by automation. Such synonym articles as currently exist can not really be considered "faulty" because the information is completely verifiable, even though it may not represent some kind of ultimate biological truth (if there even is such a thing). The supposed error in the few Velutina stubs is arguably not an error at all. The set-up of each family is checked before the bot is run. If we are going to demand 100% perfection in accuracy in all stubs, or perhaps in all articles in the whole encyclopedia, then most work on Wikipedia will grind to a halt. We certainly do agree that it turned out the bot was not authorized to create so many stubs, and this is unfortunate, but almost all of us at the Project had no idea that the authorization was lacking. I feel it is important not to impose some kind of punitive demands as "retribution" for what was a genuine misapprehension. Thanks for your patience and understanding, Invertzoo (talk) 04:13, 26 August 2010 (UTC)[reply]
The WikiProject is being asked to take some ownership of the issue, and to give a plausible assurance of planning and quality checking, in the terms outlined by Anna Frodesiak. "Faulty" means that when a clueful human reads and digests the article, including checking sources, that errors or strong defects are noticed. Not just a quick "that looks good", but a thoughtful appraisal of whether the article is sound and warrants inclusion in Wikipedia (Is it sufficient as it stands? Would it need significant improvement to merit being an article? How likely is it that thousands of such articles would ever be improved? Instead of articles, would the topics be better handled some other way, such as a list? Is it likely that classifications will change? How could that feasibly be handled?). Johnuniq (talk) 07:18, 26 August 2010 (UTC)[reply]
Thank you Johnuniq for a very clear and cogent message that is also constructive and helpful in tone; that was a very welcome contribution to the discussion. Yes, the project can certainly set something up along the lines that Anna and you have suggested in terms of checking. Just so you know, Daniel and I for the last year have made our way through 6,000 of the older pre-existing stubs (many machine made dating from 2007, and many handmade from 2004 onwards) updating those stubs and fixing them up to reach a better quality and a standardized format. That work has included updating the taxonomy using the most recent overall system and many other improvements. So two of us at least are already used to working for a year with one approach to quality control. If you can give the Project some time to work out what would be the best system to check new stubs and the best system for updating taxonomy and nomenclature, and who will do what, that would be good. Unfortunately I am currently on vacation (until September 6th), so I cannot spare anywhere near as much time on here each day as I would at home. Best wishes to all, Invertzoo (talk) 16:23, 26 August 2010 (UTC)[reply]
  • There are no known real issues with this bot. The generated stubs are useful, complete and valuable as they are. Nobody has provided evidence of any problem.
  • Nothing needs to be done with the generated stubs. Normal continuous checking for taxonomic updates would be fine for every species article, whether human-created or bot-created, but it is not necessary.

--Snek01 (talk) 00:26, 27 August 2010 (UTC)[reply]


By "faulty article" I mean a small error in the taxobox or such. That's all. After all, these stubs usually contain only a single sentence stating that the subject is a gastropod, and what family it is etc. Simple.
If 1 out of 1,000 stubs gets something wrong in the taxobox, I do not see that as a reason to stop the bot. It is doing more good than harm. Wikipedia must have an acceptable margin for error. I think, upon examination, that gastropod articles have fewer errors than general articles.
Johnuniq wonders if such simple articles are worth existing if they consist of so little. Each species needs to be represented, even if only visited once a year. Articles get drive-by improvements from the large body of occasional users. The sooner Wikipedia has all species represented the better. The world needs a comprehensive, centralized database. I'm thinking of the state of things in 10 years. Let's get critical mass. This whole problem of conflicting species info is related to lack of centralization.
I would like to hear what Ganeshk says about bots handling sweeping changes to groups of articles when classifications change.
Also, it would be nice to see an automated system for checking articles, if necessary. Anything to assist with or avoid manual checks.
The bottom line for me, is, if we deem WoRMS a good source within a reasonable margin of error, create all 100,000 articles, and deal with problems en masse with bots.
Finally, any comment on my suggestion for an incertae sedis template? Anna Frodesiak (talk) 01:35, 27 August 2010 (UTC)[reply]
Anna Frodesiak, I am slightly concerned by your "the world needs a comprehensive, centralised database". You do realise that Wikipedia cannot fulfil this function (Wikipedia does not consider itself a reliable source). Elen of the Roads (talk) 09:23, 27 August 2010 (UTC)[reply]
An unreliable source now. But Wikipedia is only a few years old. In a decade or two, who knows? Critical mass might be just what this class of animals needs. Anna Frodesiak (talk) 11:15, 27 August 2010 (UTC)[reply]
Anna, to your question about the bot handling the changes, it will be difficult for the bot to update an article where the humans have done subsequent edits (the order is lost). The bot can create subpages similar to the unaccepted page to alert the human editors about discrepancies in status, taxonomy etc. Ganeshk (talk) 11:47, 27 August 2010 (UTC)[reply]
But when classifications change, doesn't that usually just mean a search and replace? Anna Frodesiak (talk) 13:24, 27 August 2010 (UTC)[reply]
It is not just a case of search and replace. Here is an example. I had to change the introduction sentence, add a new category and make other edits to accommodate the classification change. The bot cannot make these decisions. It will make a mess. Ganeshk (talk) 13:58, 28 August 2010 (UTC)[reply]
Ganesh is right when stating that the bot cannot handle the changes in taxonomy, only report them on a subpage. These changes have to be done manually (as I have been doing in the last few days) because there are sometimes ramifications into other genera. Every change has to be checked thoroughly. Also the new name may not have an article yet, either for itself or for the whole genus. This has complicated my task, keeping me busy for hours on one change from a synonym to the accepted valid name. That's why it's such a shame that the bot has been halted temporarily. It could have created these articles in seconds while it took me hours to do so.
And as to the disputed need for all these stubs, I can state these aren't really stubs, since they already contain a lot of information: the latest taxonomy (most handbooks and websites are running far behind in this) and, where applicable, synonyms (again very useful for checking the validity of a name). From the moment they exist, it's easy to add the type species or even a photo. These are important characteristics, wanted by most readers of these articles (such as shell collectors). Text can be added at a later stage, and eventually it will be. Of course our ultimate goal is to add the finishing touch to each article, but that's a goal for the far future, unless a few hundred new collaborators join our project. JoJan (talk) 14:03, 27 August 2010 (UTC)[reply]
I'm hearing two things:
  • 1. The bot cannot handle changes.
  • 2. The bot can create articles in seconds.
My questions:
  • How could a bot help with what you are doing right now?
  • (Big picture): If the bot creates 90,000 more articles, and there are classification shifts, what then? Will we have an ocean of inaccurate articles with no automated way of fixing them? Anna Frodesiak (talk) 14:23, 27 August 2010 (UTC)[reply]
  1. The bot cannot help us with the changes, as this involves many things, such as deleting the talk page of the synonym (CSD G6) (which I can, as I'm an administrator) and creating new articles for a genus that was referred to (as I just did for Brocchinia). The new synonyms have to be included in the taxobox of the accepted name (and the accession date for WoRMS changed in the template). While doing so, I have sometimes noticed that there were additional new synonyms for the accepted name. These other synonyms have to be changed too. Furthermore, one has to choose between making a redirect to the already existing article of the accepted name or moving the synonym to the not-yet-existing article of the accepted name. As you can see, this involves a lot of things that can only be done by us and not by a bot.
  2. I think Ganesh is best placed to answer this question. But, in my opinion, this shouldn't be too difficult for a bot to accomplish. JoJan (talk) 15:02, 27 August 2010 (UTC)[reply]
So the answer to Anna's second question is yes, there might be a surfeit of articles needing changes, at least for a while? Ganesh and yourself both at one point seemed to be saying that it was not possible for a bot to make the changes, although having the bot make articles for all the new synonyms would be possible. —Preceding unsigned comment added by Elen of the Roads (talkcontribs)
The answer is yes, it will take time for the changes to be fixed. But I won't call it an ocean of inaccurate articles. Out of 15,000 stubs, only 300 articles had a classification change in the last 6 months. If the Gastropod team continues to review the articles as the bot is creating articles, we will not end up with a mountain of articles that need fixing. Already 30 articles out of 300 have been fixed. Ganeshk (talk) 18:37, 28 August 2010 (UTC)[reply]
Elen of the Roads:
  • Why "...a surfeit of articles needing changes, at least for a while..."? Why just for a while?
  • Why do bots make articles for new synonyms? Don't we get rid of those articles?
Ganeshk:
  • If the bot makes another 15,000 articles, won't 2% have problems, just like the first 15,000? Won't the sum total then be 30,000 articles all experiencing a 2% per six-month classification change? It seems that JoJan spent a lot of energy fixing 30 out of 300. I'm still a bit unclear about how to maintain 100,000 articles with such labour-intensive checking.
If I am talking nonsense, please say. Anna Frodesiak (talk) 22:10, 28 August 2010 (UTC)[reply]
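
For rough scale, taking the 2% per six months figure quoted above at face value and assuming it also holds across a full 100,000 articles (both are assumptions rather than established facts):

    100\,000 \times 0.02 = 2\,000 \text{ classification changes per six months} \approx 11 \text{ per day}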
Anna, Wikipedia is mostly text based. That makes it difficult for computer programs (bots) to analyze and update. If MediaWiki (the software that runs Wikipedia) had some kind of database support, the bot could have easily kept the articles in sync with WoRMS. Ganeshk (talk) 01:37, 29 August 2010 (UTC)[reply]

Overview from Wikipedia:Village_pump_(proposals)#Species_bot:

  • Another five Wikipedians have shown support for this task of Ganeshbot.
  • One Wikipedian (User:Shimgray) has shown support for genus (generic) articles only, while being "opposed to creating a large number of individual species articles".
  • No one has expressed disagreement at the village pump.

--Snek01 (talk) 15:45, 29 August 2010 (UTC)[reply]


I would like to thank all for their comments (including those who have never edited any gastropod-related article and those who have never created any gastropod-related article, and so have experience neither with this bot nor with gastropods). I will summarize the task (so that everybody can be sure that it is OK):

  • 1) preparing:
    • 1a) Ganeshk will post at the WikiProject which taxa are available to import, and Wikipedians (usually WikiProject Gastropods members) will check the list and approve or modify it (or refuse it, with a recommendation of another source),
    • 1b) or Wikipedians (usually WikiProject Gastropods members) will write what needs to be done, and if nobody disagrees, then it is OK.
  • 2) creating stubs: operator Ganeshk will do the task with Ganeshbot. Result: the created articles are considered autonomous, valuable articles.
  • 3) additional checking and improving.
    • Checks for NEW changes in the source are made yearly, half-yearly or continuously, and those NEW things are implemented if they are considered OK.
    • Articles are improved during the normal editing process.

This describes the real situation: how it has been working and how it works.

Everybody can comment on any phase of this process at any time. Usual possibilities are like this:

  • For a not-yet-created taxon article: "Better/updated source for the taxon EXAMPLE-TAXON-1 is the source EXAMPLE-SOURCE-1. Use that source instead of WoRMS."
  • For already created taxon articles: "Better/updated source for the taxon EXAMPLE-TAXON-2 is the source EXAMPLE-SOURCE-2. Update it (manually or robotically)."

WikiProject Gastropods members will be happy to do it. Put your notice at Wikipedia talk:WikiProject Gastropods.

Consider that this (formal) request for approval deals with phases 1) and 2) only. If somebody has comments on phase 3), then feel free to share your opinions at Wikipedia talk:WikiProject Gastropods. Thanks. --Snek01 (talk) 15:45, 29 August 2010 (UTC)[reply]