Wikipedia talk:Version 1.0 Editorial Team/Article selection


/Archive1: January to April 2007
/Archive2: October 2007 to August 2008

Problem with articles under WP:F1 scope

For some reason, all articles under WP:F1 scope shown here have "Unused" importance, despite the fact that the importance is shown clearly in the actual template on the talkpage, see here and here for instance. Dunno why that problem is occurring. Also, it says that it was last updated at "Sunday, 14 September 2008, 04:25 UTC." - that is wrong as 1995 Japanese Grand Prix has been a featured article for several weeks, yet it shows up at the link as a good article. D.M.N. (talk) 16:44, 15 September 2008 (UTC)

Thanks for pointing this out. There are two issues that are coming into play here.
The reason that all of the importance ratings show up as "Unused" is a categorization issue. Your project was unlucky to find a situation where the logic used by SelectionBot is not the same as the logic used by the WP 1.0 bot. I will be running an update on Tuesday to fix any of these newly-discovered bugs. So far this is the first one. (Specifically, the issue is that Category:Formula One articles by importance is not in Category:Wikipedia 1.0 assessments.)
The reason that the 1995 Japanese Grand Prix is showing up under an older rating is that the selection is made from the most recent database dump, which is out of date at the moment. This issue will be resolved for future releases by developing different code for the selection bot, but unfortunately there is not time to develop a new system for this release. Nobody suspected the database dump system would be offline for an extended period of time. I will add clarification to the HTML that the data is based on the last database dump.
You can use Wikipedia:Release Version Nominations to nominate any new FAs that weren't picked up. In the case of the 1995 Japanese Grand Prix article, it is already going to be selected, so there's no need to nominate it. — Carl (CBM · talk) 17:06, 15 September 2008 (UTC)
Hello. Thanks for the response to the above. I'll check again in the next few days to see if the fixes have been made. Kind regards, D.M.N. (talk) 17:50, 15 September 2008 (UTC)
I have not forgotten about this. I am compiling a list (which is short, so far) of issues like this, and I'm planning to run a new upload to resolve them all at once. I will do that either today or tomorrow. Since the ending date for working on articles is Oct 20, there will be at least one month between that new upload and that deadline. — Carl (CBM · talk) 17:28, 17 September 2008 (UTC)
OK. Thanks, D.M.N. (talk) 18:54, 17 September 2008 (UTC)

Why aren't these bot edits hidden from my watchlist?

I wanted to hide these edits from my watchlist but the hide bot option isn't working - have they not been tagged correctly?--Matilda talk 23:41, 15 September 2008 (UTC)

I think I understand now - these are posts to project talk pages - just there are a lot of them as I subscribe to a lot of projects :-( --Matilda talk 23:44, 15 September 2008 (UTC)
Thanks for pointing this out - it may be a behavior difference with the new api.php editing system. I will look into it. — Carl (CBM · talk) 23:45, 15 September 2008 (UTC)
IIRC, there were some complaints about the behavior of the API being backward compared to the normal UI, that you have to specify &bot=1 or something like that. Though it seems like it would be helpful to have it run without marking the edits as bot. I know I don't check the talk pages of projects I'm associated with very often, so seeing the edit on the watchlist helped. Mr.Z-man 00:14, 16 September 2008 (UTC)
If I had gone out of my way to choose, I would probably have chosen to make them visible, since like you said people may not check the talk pages otherwise. In reality, I simply didn't think about it, since I am so used to the automatic bot flag for the old editing system. Matilda's comment reminded me I need to remember to turn on the 'bot' setting for other purposes. — Carl (CBM · talk) 00:21, 16 September 2008 (UTC)
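For readers unfamiliar with the flag being discussed: in the current MediaWiki API, `action=edit` accepts a `bot` parameter, and boolean parameters count as set whenever they are present in the request. A minimal sketch of building such a request follows. The helper name `build_edit_params` and its arguments are hypothetical, nothing is sent over the network, and this is not the code SelectionBot uses.

```python
def build_edit_params(title, text, token, mark_as_bot):
    """Build POST parameters for a MediaWiki api.php edit request."""
    params = {
        "action": "edit",
        "format": "json",
        "title": title,
        "appendtext": text,  # append to the page instead of replacing it
        "token": token,
    }
    if mark_as_bot:
        # Boolean API parameters are true when present, so the key is
        # omitted entirely when the edit should stay visible on watchlists.
        params["bot"] = "1"
    return params
```

Leaving `mark_as_bot` false matches the behavior people saw here: the notices appear on watchlists even when bot edits are hidden.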

Importance=Top shown as Unassessed

Orienteering is assessed Top class within WikiProject Orienteering, but in the SelectionBot listing it appears as Unassessed. Is that normal behavior for this bot? --Una Smith (talk) 02:16, 16 September 2008 (UTC)

This is a categorization issue, which is also visible at the statistics table made by WP 1.0 bot. The problem is that Category:Top-importance Orienteering articles is not in Category:Orienteering articles by importance. However, since the Orienteering article already had enough points to be selected, and is the only Top-importance article for that project, it turns out to be a non-issue here. — Carl (CBM · talk) 02:25, 16 September 2008 (UTC)
We probably should fix that category tree to avoid this problem in the future, though. Titoxd(?!? - cool stuff) 04:27, 16 September 2008 (UTC)
Yes, of course. The current WP 1.0 bot has some code that tries to keep the tree tidy, but apparently it doesn't cover this situation. At some point, I'll write a maintenance script to double-check all the categories. — Carl (CBM · talk) 11:51, 16 September 2008 (UTC)
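The kind of check discussed in this thread can be sketched in a few lines. This is only an illustration, assuming the category tree has already been loaded into a plain dictionary that maps each category to its parent categories. The function name `missing_parents` and the sample data are hypothetical and are not taken from WP 1.0 bot or SelectionBot.

```python
def missing_parents(parents, expected):
    """Return (category, expected_parent) pairs that are absent from the tree.

    parents  -- dict: category name -> set of parent category names
    expected -- dict: category name -> parent it should be filed under
    """
    problems = []
    for category, parent in expected.items():
        if parent not in parents.get(category, set()):
            problems.append((category, parent))
    return problems


# Hypothetical snapshot of the tree, mirroring the Orienteering case above.
parents = {
    "Top-importance Orienteering articles": set(),
    "Orienteering articles by importance": {"Wikipedia 1.0 assessments"},
}
expected = {
    "Top-importance Orienteering articles": "Orienteering articles by importance",
    "Orienteering articles by importance": "Wikipedia 1.0 assessments",
}

for category, parent in missing_parents(parents, expected):
    print(f"Category:{category} is not in Category:{parent}")
```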

USS Nevada (BB-36)

It is my guess that Nevada will not meet the requirements set out here to be released, but I don't see this article on the list of non-selected articles either! Just a curious question, -talk- the_ed17 -contribs- 04:23, 16 September 2008 (UTC)

It's listed here with a score of 952. The reason the quality looks wrong is that the selection was made from a database dump before the article was promoted to GA. With the GA rating the score would go up to 1202, still under the 1250 threshold. I am not personally familiar with the standards for adding extra articles to the selection, but you can raise the issue on Wikipedia talk:Release Version Nominations and someone more knowledgeable will be able to help. — Carl (CBM · talk) 12:04, 16 September 2008 (UTC)
That's why I missed it; I was looking through at only the GA's. Thanks! -talk- the_ed17 -contribs- 19:38, 16 September 2008 (UTC)


Could the word "holiday season" be replaced by a more specific and region-neutral time frame? "Holiday season" as in the western hemisphere is not the same across the world. Thank you. =Nichalp «Talk»= 06:33, 16 September 2008 (UTC)

I'd second that, though as the message has gone out to all the WikiProject Talk pages I'm not sure it's worth changing now. Something to note for next time perhaps. "Holiday season" is US English and little-used and potentially confusing even in the UK, where many would take it to mean July and August. How about "mid-December"? --Qwfp (talk) 10:09, 16 September 2008 (UTC)
I've just checked Christmas and holiday season and read that it "is generally considered to begin with Thanksgiving", which surprised me and means I just unwittingly demonstrated my own point! So we have a month or so less than I'd thought. Qwfp (talk) 10:25, 16 September 2008 (UTC)

I'm sorry for the faux pas. Several people edited the announcement, but we all failed to notice this issue. I think Qwfp is right that it's too late now, but I'll remember this in the future. — Carl (CBM · talk) 11:55, 16 September 2008 (UTC)

Wikipedia 0.7 articles have been selected for The Who

According to an automated message — Wikipedia 0.7 articles have been selected for The Who — you've asked for help in reviewing articles related to The Who. Please note that WP:THEWHO is inactive and has been for at least a year; it is marked as such using {{inactive}}. Perhaps the bot should be improved so that inactive projects are better detected. (talk) 08:15, 16 September 2008 (UTC).

Good point. — Carl (CBM · talk) 11:53, 16 September 2008 (UTC)
Thanks for the ACK; of course it isn't URG. :-) (talk) (fka (contribs)) 06:27, 17 September 2008 (UTC)


Should importance not be replaced by priority? Kittybrewster 12:39, 16 September 2008 (UTC)

Nobody seems to agree on that. Some projects use one term, some use the other. For simplicity we have just been using 'importance' in the selection bot, but it's just a piece of jargon. — Carl (CBM · talk) 13:22, 16 September 2008 (UTC)

Top-Importance for WikiProject Lepidoptera

Please note that the BOT, or whatever is calculating the scores, is not registering the Top importance category, e.g. on Lepidoptera. GRM (talk) 17:22, 16 September 2008 (UTC)

Thanks. This is a categorization issue; the Formula One project ran into the same problem. I am going to run the data again to fix this. — Carl (CBM · talk) 21:36, 16 September 2008 (UTC)
Er, when will you be re-running the data? And ... what is the cut-off date for upgrading articles to 1250 points? Thanks—GRM (talk) 15:30, 10 October 2008 (UTC)

Missing Articles?

I've been reviewing the long version of the list for the Bristol WikiProject, and it doesn't appear to contain any of the articles tagged by the project during August. Is this intentional? Fortunately, they would not have met the inclusion criteria anyway, but it may be a problem in the future.

NullofWest (talk) 18:56, 16 September 2008 (UTC)

It was not intentional, but it is a known problem. The current selection system uses database dumps, which were unexpectedly halted at the end of July and are not back online yet. So the selection data reflects the July 22 database dump. If articles have been reassessed since then, and should be included in the release, they can be nominated at Wikipedia talk:Release Version Nominations. In the future, we'll use a different system to collect the data, one that does not depend on database dumps. — Carl (CBM · talk) 21:40, 16 September 2008 (UTC)

At the time of the last database dump only 26 articles in the Melbourne project had been assessed as Top or High importance. In the last two months I have assessed the importance of all 2546 Melbourne articles which now has 193 articles rated as Top or High importance. Is there some way that a "bot boffin" can update the Melbourne articles to reflect the new assessments? - Cuddy Wifter (talk) 01:48, 17 September 2008 (UTC)

I'm sure we can work out some solution. Let me wait a couple days to see how common this sort of issue is before I move forward. — Carl (CBM · talk) 02:01, 17 September 2008 (UTC)
It's been a couple of days, and I've checked the Addendum and New Articles, but cannot find any new Melbourne articles. What's the story? - Cuddy Wifter (talk) 22:48, 22 September 2008 (UTC)
This should be corrected in the latest version. — Carl (CBM · talk) 18:35, 28 September 2008 (UTC)

Article score formula can penalize importance assessments

Hi! While taking a look at WPTC's selected articles, I noticed some weird behavior with the formulas being used for importance scoring. From Wikipedia:Version 1.0 Editorial Team/SelectionBot:

When assessed: Importance_score = Assessed_importance_points + External_interest_points.
When unassessed: Importance_score = External_interest_points * (4/3).

The problem is, it's possible for a project to un-assess an article and raise the importance score. It's also possible for this to affect whether or not the article gets selected. For example, take Effects of Hurricane Katrina on New Orleans, with scores taken from . It scored 1229.

Its importance score was 829, composed of 200 from the "Mid" importance rating and 629 from external interest.

If the project were to un-assess the article, it would instead score 629*4/3 = 839 total importance points, raising its score by ten points. (While this wouldn't affect selection in this case, I think the example illustrates that it's certainly plausible.)

This doesn't seem like desired behavior. Perhaps for an assessed article, take the maximum of the two scores? —AySz88\^-^ 20:49, 16 September 2008 (UTC)
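The two formulas and the proposed "take the maximum" variant can be compared with a short sketch. The function names are hypothetical. The Mid = 200 and Top = 400 point values are quoted on this page, while High = 300 and Low = 100 are assumed values consistent with them. This is an illustration of the arithmetic, not SelectionBot's actual code.

```python
# Importance points per rating. Mid = 200 and Top = 400 are quoted on this
# page; High = 300 and Low = 100 are assumed values consistent with them.
ASSESSED_POINTS = {"Top": 400, "High": 300, "Mid": 200, "Low": 100}


def importance_score(external_interest, rating=None):
    """Current scheme: add the rating's points, or scale by 4/3 if unassessed."""
    if rating is None:
        return external_interest * 4 / 3
    return ASSESSED_POINTS[rating] + external_interest


def importance_score_max(external_interest, rating=None):
    """Proposed variant: never score an assessed article below the 4/3 guess."""
    unassessed = external_interest * 4 / 3
    if rating is None:
        return unassessed
    return max(ASSESSED_POINTS[rating] + external_interest, unassessed)


assessed = importance_score(629, "Mid")   # 200 + 629
unassessed = importance_score(629)        # 629 * 4/3
proposed = importance_score_max(629, "Mid")
print(assessed, round(unassessed), round(proposed))
```

For the Katrina example this gives 829 under the current scheme and about 839 both when unassessed and under the maximum variant, so removing the rating would no longer change the score.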

This is a valid concern. The same problem would be more dramatic if the article went from being ranked low-importance to having no importance rank. But any Top or High importance article that would benefit from having the importance rating removed would almost certainly already be selected.
The problem of what to do with articles that could have an importance assigned but currently don't have one is very difficult. The option of simply giving them 0 points above the external interest points was rejected. But since we don't know what importance they should have, we have to guess somewhat. The experience of this release, and feedback about the algorithm, will help shape the algorithm for the "real" WP 1.0 release in the future. — Carl (CBM · talk) 21:50, 16 September 2008 (UTC)
I noticed that the tail end of my comment was cut off when it was moved here. What do you think of taking the maximum of the two schemes? This would boost articles which have external interest scores higher than parity for their assessed importances, while low-interest articles can still be boosted with assessed importance. —AySz88\^-^ 01:01, 17 September 2008 (UTC)
Sorry - didn't mean to cut it off. Personally, I favor the "don't use the 4/3, just treat them as unassessed" scheme, but I will implement any system that has consensus for the next release. It's too late to make any major changes for this release, though. We're using Wikipedia:Release Version Nominations to correct any issues with the scoring system this time. — Carl (CBM · talk) 01:32, 17 September 2008 (UTC)
Here is a possible solution for unassessed articles, that may also discourage de-assessing an article. From assessed articles, compute linear equation y = ax + b (or similar) where y = assessed score and x = external interest score; use equation to assign pseudo assessed score to unassessed articles. I would be interested to see a graph of the data used to compute the equation; it would give an idea of how many assessed articles may be assessed too low or too high. In some projects, certain of the housekeeping articles are assessed Top importance but no editor finds them interesting enough to develop beyond Start class. If those articles' external interest is low too, that may allow the projects to reassess and "downgrade" those articles. --Una Smith (talk) 19:31, 17 September 2008 (UTC)
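A minimal sketch of the linear-fit idea, using an ordinary least-squares fit written out by hand. The data points are made up purely for illustration, and the names `fit_line` and `pseudo_score` are hypothetical. A real analysis would use the actual assessed articles.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up (external interest, assessed importance points) pairs.
external = [300, 500, 700, 900, 1100]
assessed = [100, 200, 200, 300, 400]

a, b = fit_line(external, assessed)
pseudo_score = a * 629 + b  # estimate for an unassessed article with x = 629
print(f"y = {a:.3f}x + {b:.1f}; pseudo assessed score: {pseudo_score:.0f}")
```

Plotting the assessed articles against the fitted line would also show which ones sit far above or below it, which is the graph asked for above.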
Your suggestion of plotting the equation and studying it is certainly a good idea. We want to keep refining the system, but we also don't want to make it unwieldy - I think we can probably study it in a more detailed way as we go forward. We believe we have something that is good enough for Version 0.7, but it may turn out that it's not really good enough for Version 1.0. But we've come a long way from simple guesswork, which we used on V0.5!
We found problems with earlier iterations, if we didn't compensate for non-assessment; articles like goat, rabbit and camel which are obvious articles to include were failing to make it in, simply because they were getting a zero or a 200 score instead of the 4/3 correction. I accept the 4/3 correction is rough, but at least we're not losing major topics now. Maybe the graph Una proposes will give us something better. We are most concerned for the articles close to the score threshold, and in that score range it works fairly well; it may work very badly for an article well outside that range.
As for the problem of a Low or Mid actually penalizing a score, I expected that to occur in some cases, though again these are really only likely with articles that are a long way below the threshold. There are two possibilities:
  1. The article was badly assessed; the external interest is indicating that perhaps it is more important than the assessor thought. For example Bisphenol A has been in the news a lot this year, and scored 74 inlinks, 10 interwikis and 50,700 hits, but was ranked only Mid-Class by WP:Chem, so even as a B-Class it fell below the threshold. Someone pointed out that this was too low, and I upped it to High. In this case, the SelectionBot work may be helpful in providing new data for projects to look at.
  2. Alternatively, it may be that the topic is a major one in some other area, but relatively minor for the project that assessed it. My favourite example here is Albert Einstein, ranked Mid-Class by the New Jersey WikiProject but Top-Class by Physics. That's right - his impact on Physics was immense, his impact on NJ was moderate. The article has 2,126 inlinks, 110 interwikis, and 286,590 hits, putting it into the top ten biography articles for importance. In the NJ list the importance score is only 1399, whereas in physics the importance score is a huge 1646. That to me is right; clearly NJ is penalising the article, but that represents the situation accurately, and the article should make it into the collection primarily on his importance to physics, not to NJ.
As you can see, I think about this stuff way too much! Una, if you're interested in the refining work, you're welcome to work with us! Walkerma (talk) 21:51, 17 September 2008 (UTC)
The first scenario was what I had in mind when proposing taking the maximum score out of the two formulae. But I think the second suggests that there could be problems in general with using "importance to the WikiProject" as a proxy for "importance to the encyclopedia", since the algorithm may be missing articles where the most relevant wikiproject has neglected to add the article (or that project doesn't exist yet). It may be useful to generate plots of all the external-interest scores of Top/High/Mid/Low/NA articles (taking just the largest one), to see how the distributions are and how well they correlate. How important is keeping the manual importance ratings over the external importance scores? —AySz88\^-^ 00:47, 18 September 2008 (UTC)

Oh, this is probably apparent now, but to be clear about the bad behavior:

  • If the article has over 1200 external interest, rating it Top will decrease its total score.
  • If the article has over 900 external interest, rating it High will decrease its total score.
  • If the article has over 600 external interest, rating it Mid will decrease its total score.
  • If the article has over 300 external interest, rating it Low will decrease its total score.

The main worry seems to arise if you have fair-quality Mid articles, the Mid line above (for example, 700 external interest + 200 Mid importance + 300 B-class = 1200, almost at the threshold). —AySz88\^-^ 01:15, 18 September 2008 (UTC)
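The four thresholds follow from comparing the two formulas. Writing E for the external interest points and P for the assessed importance points (assuming P = 400, 300, 200, 100 for Top, High, Mid, Low, which matches the figures in the list), an assessed article scores lower than it would unassessed exactly when:

```latex
E + P < \tfrac{4}{3}E
\iff P < \tfrac{1}{3}E
\iff E > 3P
```

So the break-even points are 3 × 400 = 1200 for Top, 3 × 300 = 900 for High, 3 × 200 = 600 for Mid, and 3 × 100 = 300 for Low.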

What the frell?

I'm not convinced that this assessment is giving sane results for a lot of projects. Why didn't you just ask the projects, then weed out ones that were problematic? Shoemaker's Holiday (talk) 03:37, 17 September 2008 (UTC)

Could you explain in more detail what your concerns are? The selection criteria are detailed on the other side of this talk page; the more detailed you can be, the better. Also, remember that there is a Wikipedia:Release Version Nominations to allow projects to manually select articles in addition to the automatic selection. — Carl (CBM · talk) 04:00, 17 September 2008 (UTC)
The article selections, in many, perhaps most cases aren't really sensible ones, and there doesn't seem to be any quality component whatsoever, excluding featured articles in favour of B-class or lower in many cases. For instance, Trial by Jury (FA) not included - The Mikado (weak B) included. By prioritising weak (but popular) work in favour of quality work, it does seem a slap in the face to those of us working hard on quality.
As well, using "importance" assessments as done by the projects themselves leads to very different criteria. For instance, what's classed as "Medium" importance to the G&S project would, in all likelihood, be "high" or even "top" in other projects, because we use a bottom-heavy scheme. Shoemaker's Holiday (talk) 04:17, 17 September 2008 (UTC)
It is true that the selection favors importance over quality to some extent. How else could we get a release that covers a broad selection of the topics that readers are most likely to look for? This isn't to say quality is not important - an FA class article gets 500 points for quality, while a Top-importance article gets only 400 points for importance. So in that regard high quality counts more than high importance.
The difference in importance ratings between different projects is an issue, I agree. After this release, I am planning to generate some statistics for which projects have unusually high or low percentages of Top-importance articles. One goal of those stats is for projects with a lower percentage of Top-importance articles to have a chance to change their rankings before WP 1.0. — Carl (CBM · talk) 04:28, 17 September 2008 (UTC)
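How the quality points combine with the importance score can be sketched as follows. This is an assumption-laden illustration: FA = 500 is quoted above, GA = 400 and B = 300 are inferred from the 100-point and 200-point gaps mentioned elsewhere on this page, the 1250 threshold is the one quoted in the USS Nevada thread, and treating the threshold as inclusive is a guess. The function names are hypothetical.

```python
# FA = 500 is quoted above; GA = 400 and B = 300 follow from the
# 100-point and 200-point gaps mentioned elsewhere on this page.
QUALITY_POINTS = {"FA": 500, "GA": 400, "B": 300}
THRESHOLD = 1250  # assumed here to be inclusive


def total_score(importance_score, quality):
    """Total score = importance score + quality points."""
    return importance_score + QUALITY_POINTS[quality]


def is_selected(importance_score, quality):
    """True when the total reaches the selection threshold."""
    return total_score(importance_score, quality) >= THRESHOLD


# Example from this page: 700 external interest + 200 for Mid importance.
importance = 700 + 200
for quality in ("B", "GA", "FA"):
    print(quality, total_score(importance, quality), is_selected(importance, quality))
```

With these numbers the B-class version totals 1200 and misses the threshold, while the same article at GA or FA would clear it.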
Wait, did you just say your solution to this problem was to get Wikiprojects to follow how your team feels ratings should be used? Shoemaker's Holiday (talk) 03:37, 19 September 2008 (UTC)
Doesn't it make sense to include the most well-known comic opera by G&S? Trial by Jury is much less known (I've never even seen it), and it did receive an extra 200 points for being an FA vs B. So we certainly DO try to promote quality, but we also need to have a balanced selection that includes important topics. The Mikado is important for us to have. However, we do hope to refine the algorithm still further. Walkerma (talk) 04:37, 17 September 2008 (UTC)
Well, yes, except that there's a reason that H.M.S. Pinafore, The Pirates of Penzance, and the Mikado are called the "big three". If anything, I'd say the other two are even better known. If you're going to exclude Pinafore and Pirates, then you may as well select by quality. It's not that including the Mikado is inherently bad, it's including *only* the Mikado that makes it weird. Shoemaker's Holiday (talk) 08:14, 17 September 2008 (UTC)
It is certainly worth investigating. Looking at the data here, all three articles have the same quality and importance assessments, so the difference comes from different external interest points. The thing that seems to have made the difference is that Mikado has 6 interwiki links, while Penzance and Pinafore have only 4 each. So either (1) there are interwiki links missing or (2) there are other projects that do include only Mikado but not Pinafore or Penzance. Going from 4 to 6 interwikis adds about 50 points to an article's score in the current algorithm. In the end, I think we will always have to have some sort of manual nomination process for "articles that complete a set". — Carl (CBM · talk) 14:51, 17 September 2008 (UTC)
I think interwiki links may be overrated. Is having four extra entries in other languages really the equivalent of the difference between a GA and an FA for the current article? GreenReaper (talk) 03:48, 18 September 2008 (UTC)
We could remove Mikado and have it redirect to Gilbert and Sullivan in the release. After all, it does have five paragraphs, a picture and an audio file in there. (This is probably the better option when over half of a set is unselected, if a summary article exists.) GreenReaper (talk) 03:48, 18 September 2008 (UTC)
I think I'd propose having the "big three" included; these are all B-Class, and are much better known than the others. The reason the interwikis count more at that level is because it is a log function, and going from 4 to 6 is a big jump (50 points); going from 20 to 22 would be a much smaller change. But the jump from B to FA is 200 points, and from GA to FA is 100 points. Walkerma (talk) 02:32, 19 September 2008 (UTC)
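The effect of a logarithmic scale can be illustrated with a small sketch. The real SelectionBot formula is not given in this discussion, so the base-10 logarithm and the scale factor below are assumptions, chosen only so that going from 4 to 6 interwiki links adds 50 points as stated above. The name `interwiki_points` is hypothetical.

```python
import math

# Hypothetical scale factor, chosen only so that going from 4 to 6
# interwiki links adds 50 points as stated in the discussion.
SCALE = 50 / math.log10(6 / 4)


def interwiki_points(links):
    """Points for a given number of interwiki links on a log scale."""
    return SCALE * math.log10(links)


gain_small = interwiki_points(6) - interwiki_points(4)
gain_large = interwiki_points(22) - interwiki_points(20)
print(round(gain_small), round(gain_large))
```

Under these assumptions the same two extra links are worth about 12 points at the 20-link level, compared with 50 points at the 4-link level.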

Bugs in the SelectionBot?

Hi, I just wanted to alert you to the possibility of bugs in the SelectionBot? First, as I wrote back in June, whole WikiProjects seem to be missing from the Release Version, such as Wikipedia:WikiProject Electronics. Second, some articles are missing, such as the Featured List, List of scientific publications by Albert Einstein. Third, I noticed that the evaluation data are sometimes out of date, e.g., Newton's theorem of revolving orbits which has been a Good Article for almost a month but which is listed as a Start class on the most recent SelectionBot evaluation [1]. It hasn't been a Start-class article since July. Thanks for looking into these, and good luck with the publication! :) Willow (talk) 16:26, 17 September 2008 (UTC)

Very strange. We did resolve how to deal with projects such as Electronics, and I expected them to be included. So I don't know why it isn't in the latest data, but will make sure it is in the final 'bugfix' upload that will be done soon. I'm collecting a list of issues like this to fix all at once.
We do know that the selection data being used is out of date; it's a very unfortunate result of the current selection system relying on database dumps, and a complete failure of the database dump system that still is not resolved. Articles that have been reassessed much higher since the last dump can be handled at Wikipedia:Release Version Nominations, which is the manual side of the selection. — Carl (CBM · talk) 17:23, 17 September 2008 (UTC)
There is a problem with WP:Primates as well, though I know that project and Electronics were in the earlier iterations of the list. Strange! WP:Palaeontology got missed because they didn't have the categories set up right; that has been fixed, but can you be sure to include them in a mini-run if you do one, CBM? Thanks, Walkerma (talk) 04:10, 18 September 2008 (UTC)

Pier Gerlofs Donia

Select the Pier Gerlofs Donia article. It is of great importance to the piracy project, the Frisian project and the Netherlands project. It has been rated of Top importance and B-class quality. Thanks in advance,

Jouke Bersma —Preceding unsigned comment added by (talk) 07:59, 18 September 2008 (UTC)

SelectionBot bug fixes

I have run a new version of the selection data, which is available for preview here. I will double-check it tomorrow when my eyes are fresh, but I wanted to give others a chance to look over it as well. The following issues should be fixed:

  1. FL articles were not given the right importance bonus.
  2. Categorization issues with the WP 1.0 categories were resolved. These affected projects such as Formula One (higher on this page). The most visible symptom was that importance was always shown as "unused".
  3. Some projects were inadvertently left off the original selection, including Electronics. These are included in the new data.

I am also checking to verify that all articles previously selected are still selected. There will be a handful (under 20) that the fix to #2 causes to no longer be selected, but I will handle those with the nominations page myself.

Once I double-check the data, I will overwrite the old selection data so that links will still work correctly (having two copies that differ slightly would be a continuing source of confusion). — Carl (CBM · talk) 00:43, 21 September 2008 (UTC)

Ah, thanks for resolving the issues. D.M.N. (talk) 07:21, 21 September 2008 (UTC)

I've updated this to add the Palaeontology project. The current plan is:

  • Notify the projects that are now included but were not included before.
  • For the ~80 new articles in other projects that are now selected, but weren't before, I will nominate them manually. This includes projects such as Formula One.

— Carl (CBM · talk) 02:44, 22 September 2008 (UTC)

Update: I have uploaded two tables:

— Carl (CBM · talk) 13:49, 22 September 2008 (UTC)

College Football articles

Hello, the college football wikiproject has several Featured Articles. I am surprised to see that it looks like only one was selected. For instance, could you please help me understand why 2005 Texas Longhorn football team was not chosen? thanks very much, Johntex\talk 17:37, 28 September 2008 (UTC)

It's because that article was promoted very recently. The automated side of the selection was written to use database dumps, and the database dump system had an unforeseen total failure at the end of July. The manual side of the selection, at Wikipedia:Release Version Nominations, can handle articles that have been promoted to FA after the selection data was dumped. We're going to switch to a different method for future release selections, so that we are no longer reliant on database dumps. — Carl (CBM · talk) 18:39, 28 September 2008 (UTC)
Thanks for the reply, but I am not sure that matches up. The 2005 Texas Longhorn football team was promoted over a year ago: September 9, 2007. So, it was promoted prior to any issues occurring in July of this year.
The topic of college football currently has 12 featured articles. Half were promoted in 2007, the other half in 2008. Can they all be included in this selection? Best, Johntex\talk 07:09, 19 October 2008 (UTC)


Are lists like this one ever updated? I'm asking because after this list was published, I made many changes in article assessment (especially importance) and having an update with those changes included would make room for more changes (if needed). -- Ynhockey (Talk) 15:41, 1 October 2008 (UTC)

That page is static. If pages should be included in WP 0.7 that were not included in the automated selection, you can nominate them at Wikipedia:Release Version Nominations. — Carl (CBM · talk) 16:55, 1 October 2008 (UTC)
In that case, is there any way to update that page manually, regardless of WP 0.7? -- Ynhockey (Talk) 19:23, 2 October 2008 (UTC)


I have a couple of questions:

  • I noticed that the output of the bot does not include "law" as a category. Did I miss something?
  • How can I display the score of an article that I have written or contributed to? (talk) 15:23, 2 October 2008 (UTC)
The category is called "Legal" so the law articles are listed there. There is no tool to display the score like that, though I was going to ask for such a tool to be written. Currently you need to find what the article was tagged for, then go and look under that project. If the article is new, it won't appear. The bot currently has to use a dump from July for some data, and so even something written in August may not be in there. We may be able to do more in the future, but remember that this is a new bot! Cheers, Walkerma (talk) 16:18, 2 October 2008 (UTC)
Found it. Thank you. (talk) 19:12, 2 October 2008 (UTC)

Have you screwed the whole thing up?

According to this, not only was Pacific typhoon selected when it clearly had issues (i.e. a tag), but I found fundamental flaws in the bot assessment itself, that have spread across all articles. Using this as an example:

  1. How on earth did it find 550 internal links in Pacific_typhoon, or even the older version [2]? I found 38.
  2. Where are the interwiki links? I found none. Why is this bot including the interlanguage links - 15 I counted?

I took a couple of hours off just to clear some of the backlog and then I've found this - I think I have a reason to be ticked off. What can be done to fix this mess? Or was this the way it's supposed to be assessed and is this what needs to be fixed? Ncmvocalist (talk) 07:06, 4 October 2008 (UTC)

First of all, let me thank you for helping out with the listings! I'm snowed under with work myself (grading 80 organic chem exams right now), so it's hard to commit a lot of time. The toolserver seems to be down, so I can't check the details right now, but I can address many of your concerns:
  • Links-in or inlinks refers to the number of other articles that link to that article, which can be found here. The hypothesis is that an article on a more important topic will tend to have more articles linking to it than if it covers an obscure topic.
  • The interwiki count is the number of foreign language versions of that article. It usually provides a good, objective measure of how important the topic is around the world.
It's true that some of the articles have cleanup tags on them, and WikiProjects have been given a list of articles that have them, and this list is automatically updated every hour (when the toolserver is working!) as projects fix these problems. These articles are typically on important topics, and to exclude them would leave a gap in our content, which may be worse than having an article with a cleanup tag. We may have to tighten up our requirements as we move towards our first full 1.0 release, but for now we want to give projects the chance to fix things. Cheers, Walkerma (talk) 16:47, 4 October 2008 (UTC)


May I request that the following articles be removed: Bender (Futurama), Philip J. Fry, and Doctor Zoidberg? They're B-Class (though they look C-Class to me), and I don't think they'll be fixed to the deadline. - A Link to the Past (talk) 17:32, 11 October 2008 (UTC)

Interesting Example Data for Study of SelectionBot

I became uber interested in how scores were attained by the SelectionBot tonight and came up with this little study of example data for overall scores. Interesting how Project Scope doesn't account for much and everything else is 20% +/-5. Lwoodyiii (talk) 22:40, 1 January 2009 (UTC)

Jpeg of Excel Doc
Thanks! This is really helpful! It would be really interesting to see how this affects a wide range of articles, from obscure articles to popular ones, and from big projects to small ones. I know that with a "large scope" project, the scope counts quite a lot. As we think about the next release, we will want to tweak the formula, and this sort of visualization is very helpful. Cheers, Walkerma (talk) 23:19, 1 January 2009 (UTC)
No problem!! I really enjoyed doing it. Where would be the most appropriate place to publish more visualizations? Thanks Lwoodyiii (talk) 00:56, 2 January 2009 (UTC)
I would say that most should be done here, but if we have a really nice one, it might be posted on the 1.0 talk page which is much more highly watched. If you were doing a whole series, it should be on a subpage - but I don't want to distract you from your main work! If you want to do more, probably the first thing would be to decide what to show. Firstly, can you clarify what data are represented here? Thanks, Walkerma (talk) 18:12, 2 January 2009 (UTC)
I would just like to be useful to the 1.0 Release version. I'm an extremely new Wikipedian (~1 week), but I find this particular project to be of great importance for distributing a quality version of Wikipedia. As far as the data, the first line is for the Software engineering article. I was interested to see where it was in the list and how it got there in the first place. The other rows are examples of numbers for each part of the score, starting with the minimum or likely minimum and going to the maximum. I wanted to see how increasing certain parts of the formula would affect the overall score. Thanks for any guidance you can provide. Lwoodyiii (talk) 18:55, 2 January 2009 (UTC)

Another run/custom runs?

We lowered our priorities and quality ratings over at WP:FURRY after the last run, as well as removing a bunch of articles. It would be interesting to know how we're doing now, and in the future. Is there any plan to run this bot regularly, or to allow for the use of custom runs that (say) use the API rather than database dumps to get information for a particular project? GreenReaper (talk) 19:44, 8 April 2009 (UTC)

Unfortunately it is not possible to use the API instead of database dumps, because the scoring algorithm requires things like a count of incoming links that cannot be efficiently fetched over the API. However, the new version of the WP 1.0 bot, which is under development and almost ready for public tests, is going to include the ability to generate scores dynamically, so they will be updated all the time. This bot has been delayed somewhat because my job-related workload is very high and because the WP 0.7 release required much of the time I had for Wikipedia, but the new bot has been moving forward slowly but surely. Once the academic semester ends, I will be able to devote more time to the bot in May. Of course, if anyone else would like to help with the coding I would be glad to find a way for them to do so; I have already had a great deal of help from one other person. — Carl (CBM · talk) 00:39, 9 April 2009 (UTC)


No comma needed between "October" and "2010". Rich Farmbrough, 08:03, 7 October 2010 (UTC).

Bot still checking old project title

SelectionBot just popped into the Ontario Roads WikiProject, except it still thinks that it's the Golden Horseshoe WikiProject. Not sure what to do here; figured this is the best place to let somebody know. - ʄɭoʏɗiaɲ τ ¢ 19:25, 6 November 2010 (UTC)

Longevity COI

A discussion about longevity WP:COI has been initiated at Wikipedia talk:WikiProject World's Oldest People#End COI. As a recent contributor to this page, your comments are solicited. JJB 20:18, 11 November 2010 (UTC) (Redirected from User talk:SelectionBot) JJB 20:18, 11 November 2010 (UTC)

Including articles from V0.5

Sorry if this is the wrong place to post these questions - should it be at Wikipedia talk:Version 1.0 Editorial Team/Index#Help finding removed? Thanks for reading. --trevj (talk) 15:22, 23 March 2011 (UTC)

  • Aha... or does this mean that they'll be included? (And if so, then why isn't WP:RISCOS linked to from that list?) --trevj (talk) 15:25, 23 March 2011 (UTC)
  • I've just noticed that the 2008 list was the 'Selected articles', but the 'Selection data' doesn't include the relevant articles either. --trevj (talk) 15:31, 23 March 2011 (UTC)

Score = Importance + Quality + Popularity

Just to offer an alternative interpretation of what an article selection score means, it's also plausible to say the score is a function of three key factors: importance, quality and popularity. I would also suggest that's a more descriptive and useful way to say what basic criteria are used to select articles. None of the calculations change; they're just grouped differently in the presentation. Here are the basics.

Overall article score = Importance score + Quality score + Popularity score.
Importance score = Base_importance_points.
Quality score = Quality_rating_points.
Popularity score = WikiProject_scope_points + External_interest_points.

The WikiProject scope points are based on the External interest points. Another name for external interest is popularity. Hence, the Popularity score fully encompasses the notion of External interest.
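As a rough illustration of the grouping above, here is a minimal Python sketch. The point values are made up, and the class and function names are hypothetical; only the four component names and the way they are grouped come from the formulas above. It is not the bot's implementation.

```python
from dataclasses import dataclass


@dataclass
class ArticlePoints:
    """The four point components named in the formulas above."""
    base_importance_points: float
    quality_rating_points: float
    wikiproject_scope_points: float
    external_interest_points: float


def grouped_scores(p: ArticlePoints) -> dict:
    """Regroup the four components into the three proposed factors."""
    return {
        "importance": p.base_importance_points,
        "quality": p.quality_rating_points,
        "popularity": p.wikiproject_scope_points + p.external_interest_points,
    }


def overall_score(p: ArticlePoints) -> float:
    """Overall score = Importance score + Quality score + Popularity score."""
    return sum(grouped_scores(p).values())


# Made-up values, purely for illustration.
example = ArticlePoints(
    base_importance_points=300,
    quality_rating_points=200,
    wikiproject_scope_points=150,
    external_interest_points=450,
)

print(grouped_scores(example))
print(overall_score(example))
```

Because the regrouping only changes which components are summed together, the overall total is the same as adding the four components directly.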

I see two key advantages to this presentation style of the article selection score. First, the Importance score is returned to its original meaning on Wikipedia. It's a simple numeric representation of the Top-, High-, Mid- or Low-Importance ratings. Second, the Popularity score is an intuitively purer factor related to the set of External interest measures added to the established WikiProject assessment factors of importance and quality.

A handy bonus for WikiProjects is this Popularity factor can be added to Importance and Quality ratings of articles without any additional manual work, because the calculations are totally automated. So, some time down the road, the concept of popularity can be more fully folded into the overall WikiProject efforts to offer important, high-quality articles that are of interest to contemporary readers. --RichardF (talk) 17:52, 7 July 2011 (UTC)

Missing external interest data

Many articles seem to lack external interest data, such as in this list of unassessed athletics articles. Why is that? --Klättermusen (talk) 09:39, 2 December 2015 (UTC)