User:Smallbones/vital articles
Starting page to remind me of promises at [[1]]
This discussion has been closed. Please do not modify it. |
---|
Thanks Smallbones - in your ever widening research you may also want to take a look at enWP; Wikipedia:Vital articles/Level/1: (I do not know if those Icons are up to date) and see what the improvement history of those are. You could also compare those across projects or go deeper into the 1000 (which admittedly have a bit of serendipity in selection). Also maybe write it up for the internal notice. Alanscottwalker (talk) 16:37, 23 February 2016 (UTC) That's an excellent suggestion Alan, though I'd go further than Level 1.♦ Dr. Blofeld 17:49, 23 February 2016 (UTC)
|
Initial plan:
- Collect data on the above 10 articles, and
- 10 random articles from vital articles level 4 (get enough to match years started, throw out the rest)
- 10 random from Wikipedia sample of 1000 (earliest 10 non-vital to match years (almost))
- data includes the mORES score as of Dec 31 for the first 3 years and the last 5 years
- (Dec 31 1st month, and 2015, mORES bytes, word count - maybe later)
- pageviews Jan. 2015
- questions
- do the three classes have equal starting quality?
- do they improve at the same rate?
mORES class: if the ORES stub probability is greater than 0.70, the article is classified as a sub-stub. The final quality classes I use range from 1 to 7:
1 (sub-stub); 2 (stub); 3 (start); 4 (C); 5 (B); 6 (GA); and 7 (FA).
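The classification rule above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: the function name, the dictionary layout, and the choice to take the most probable ORES class for everything that is not a sub-stub are my assumptions. Only the 0.70 stub threshold and the 1 to 7 scale come from the description above.

```python
# Hypothetical sketch of the mORES rule; names and input layout are assumptions.
MORES_LEVELS = {"Stub": 2, "Start": 3, "C": 4, "B": 5, "GA": 6, "FA": 7}
SUB_STUB_THRESHOLD = 0.70  # from the rule above: stub probability > .70


def mores_level(probabilities):
    """Map a dict of ORES class probabilities to a mORES level from 1 to 7."""
    # A very confident "Stub" prediction is treated as a sub-stub (level 1).
    if probabilities.get("Stub", 0.0) > SUB_STUB_THRESHOLD:
        return 1
    # Assumption: otherwise use the most probable class.
    predicted = max(MORES_LEVELS, key=lambda c: probabilities.get(c, 0.0))
    return MORES_LEVELS[predicted]
```

For example, a revision with a stub probability of 0.85 would be scored 1, while one with a stub probability of 0.60 (still the most probable class) would be scored 2.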
Data
# | Article | year | mORES 1st 12/31 | 2nd 12/31 | 3rd | 2011 | 2012 | 2013 | 2014 | 2015 | Page views | Notes to self |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Earth | 2001 | 5 | 5 | 5 | 7 | 7 | 7 | 7 | 7 | ||
2 | Life | 2001 | 3 | 3 | 5 | 7 | 7 | 7 | 7 | 7 | ||
3 | Human | 2001 | 3 | 5 | 5 | 7 | 7 | 7 | 7 | 7 | ||
4 | History of the world | 2004 | 5 | 5 | 5 | 7 | 7 | 7 | 7 | 7 | ||
5 | Culture | 2001 | 1 | 3 | 5 | 7 | 7 | 7 | 4 | 4 | ||
6 | Language | 2001 | 2 | 3 | 5 | 4 | 7 | 7 | 7 | 7 | ||
7 | The arts | 2004 | 1 | 3 | 5 | 4 | 4 | 4 | 4 | 4 | ||
8 | Science | 2001 | 3 | 3 | 5 | 7 | 7 | 7 | 7 | 7 | ||
9 | Technology | 2001 | 3 | 3 | 3 | 7 | 7 | 7 | 7 | 7 | ||
10 | Mathematics | 2001 | 5 | 5 | 5 | 7 | 7 | 7 | 7 | 7 | ||
20 | Afonso I of Portugal | 2001 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | ||
21 | Iraq | 2001 | 1 | 2 | 5 | 6 | 6 | 6 | 7 | 6 | ||
22 | Korean War | 2001 | 3 | 3 | 5 | 7 | 7 | 7 | 7 | 7 | ||
23 | Phonetics | 2001 | 1 | 3 | 3 | 4 | 4 | 5 | 5 | 5 | ||
24 | James VI and I | 2001 | 1 | 3 | 5 | 7 | 7 | 7 | 7 | 7 | ||
25 | Computer architecture | 2001 | 3 | 3 | 3 | 5 | 5 | 5 | 4 | 4 | ||
26 | Meteorite | 2001 | 1 | 3 | 3 | 7 | 7 | 7 | 7 | 7 | ||
27 | Wars of the Roses | 2001 | 1 | 3 | 5 | 4 | 7 | 7 | 7 | 7 | ||
28 | Rainforest | 2002 | 2 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | ||
29 | Hundred Years' War | 2002 | 3 | 3 | 5 | 4 | 7 | 7 | 7 | 7 | ||
30 | Chain rule | 2001 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 5 | ||
31a | Clark, New Jersey | 2002 | 3 | 3 | 3 | 5 | 6 | 6 | 6 | 6 | | replaces Cy Young Award, which was a list |
32 | Ninth Fort | 2001 | 1 | 1 | 1 | 3 | 4 | 4 | 4 | 4 | ||
33 | East Machias, Maine | 2002 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | ||
34 | Economy of Jordan | 2002 | 3 | 3 | 3 | 4 | 5 | 7 | 7 | 7 | ||
35 | Transport in Swaziland | 2002 | 1 | 1 | 1 | 3 | 2 | 2 | 3 | 3 | ||
36 | Mary River National Park | 2002 | 1 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | ||
37 | Zero divisor | 2002 | 1 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | ||
38 | Rockwood, Tennessee | 2002 | 3 | 3 | 3 | 4 | 4 | 4 | 6 | 6 | ||
39 | Morganville, New Jersey | 2002 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | ||
Discussion
How do Wikipedia articles improve over time? My interest in that question led me to do the exploratory investigation shown at User:Smallbones/1000 random results. Alanscottwalker asked whether the most important articles on Wikipedia, as listed at WP:Vital articles, might develop differently over time. This question was seconded by Dr. Blofeld.
Their concern is that the most important articles may be too large or complex to develop in the way that other articles usually do. Perhaps a different method or a special effort should be used for these articles? I address the following questions:
- Do vital articles start their lives as higher or lower quality than other articles?
- Do these articles improve in quality at a faster or slower rate than other articles?
- Do they currently have higher or lower quality?
The measure of quality used is a modified form of the ORES article quality rating system. ORES was trained by machine learning to predict how editors would rate articles, based on easily observed article attributes such as text length and the number of internal and external links. Its strengths are its consistency and its ability to quickly update or guide a new quality rating. To my knowledge I'm the first person to use ORES to rate earlier versions of articles, and this exercise might help show whether it is also useful in this area. I also used the "Random in category" function, which I had not used before, to randomly select articles in the Level 4 Vital articles category.
The ten Level 1 Vital articles are arguably the most important articles on Wikipedia. These articles form one of the three groups of articles that were examined. They are remarkable in that 8 of them were started in 2001, Wikipedia's first year. The two others, History of the world and The arts, were first edited in 2004.
Ten Level 4 Vital articles were selected by the "Random in category" function, nominally out of 9,812 Level 4 articles, although only 8,762 articles (89%) were actually contained in Category:All Wikipedia level-4 vital articles. Remarkably, 8 of the selected articles were also started in 2001, with 2 started in 2002.
The third group of articles examined were randomly selected earlier as part of the User:Smallbones/1000 random dataset. To match the other two groups, the earliest 10 articles in that dataset were selected. Remarkably, two of the selected articles were also vital articles - Karl Marx, and Polymerase chain reaction. These articles were removed and replaced with the next earliest articles. The article Cy Young Award was also removed and replaced when it was discovered that it is classified as a list rather than as an article. This group consists of 2 articles started in 2001 and 8 started in 2002.
It is truly remarkable that these vital articles were almost exclusively started during the encyclopedia's first 2 years, and that articles from the encyclopedia's first 2 years tend to be classed as vital. This investigation may be limited by this focus on early articles.
A modified ORES score was calculated for each of the 30 articles for the December 31 version in each of the following years: the first year of the article's life, the second year, and the third year, as well as for the last 5 years, 2011-2015. The scores were then averaged for each group and are summarized in the table below.
Group | Avg. 1st year | Avg. 3rd year | Avg. annual improvement (yr. 1-3) | Avg. 2011 | Avg. annual improvement (middle period) | Avg. 2015 | Avg. annual improvement (2011-15) |
---|---|---|---|---|---|---|---|
Vital-L1 | 3.1 | 4.8 | 0.85 | 6.4 | 0.21 | 6.4 | 0 |
Vital-L4 | 1.9 | 4.0 | 1.05 | 5.2 | 0.15 | 5.8 | 0.15 |
Non-vital | 2.2 | 2.4 | 0.10 | 3.3 | 0.12 | 4.5 | 0.30 |
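As a check on the arithmetic, the group averages can be recomputed from the Data table. The sketch below is illustrative only: the variable and function names are mine, and it assumes that rows 1-10, 20-29 and 30-39 of the Data table form the three groups (that grouping matches the start-year counts given in the text). Each tuple holds an article's scores for its first year, its third year, 2011 and 2015. The middle-period column is left out because its length in years differs from article to article.

```python
# Scores copied from the Data table: (1st year, 3rd year, 2011, 2015).
# Assumption: rows 1-10, 20-29 and 30-39 are the three groups.
GROUPS = {
    "Vital-L1": [(5, 5, 7, 7), (3, 5, 7, 7), (3, 5, 7, 7), (5, 5, 7, 7),
                 (1, 5, 7, 4), (2, 5, 4, 7), (1, 5, 4, 4), (3, 5, 7, 7),
                 (3, 3, 7, 7), (5, 5, 7, 7)],
    "Vital-L4": [(3, 3, 4, 4), (1, 5, 6, 6), (3, 5, 7, 7), (1, 3, 4, 5),
                 (1, 5, 7, 7), (3, 3, 5, 4), (1, 3, 7, 7), (1, 5, 4, 7),
                 (2, 3, 4, 4), (3, 5, 4, 7)],
    "Non-vital": [(3, 3, 4, 5), (3, 3, 5, 6), (1, 1, 3, 4), (3, 3, 3, 4),
                  (3, 3, 4, 7), (1, 1, 3, 3), (1, 1, 1, 3), (1, 3, 3, 3),
                  (3, 3, 4, 6), (3, 3, 3, 4)],
}


def summarize(rows):
    """Return group averages and average annual improvements."""
    n = len(rows)
    first, third, y2011, y2015 = (sum(col) for col in zip(*rows))
    return {
        "avg_first": first / n,
        "avg_third": third / n,
        # Two year-to-year steps separate the 1st and 3rd Dec 31 versions.
        "improve_yr1_3": (third - first) / (2 * n),
        "avg_2011": y2011 / n,
        "avg_2015": y2015 / n,
        # Four year-to-year steps separate 2011 and 2015.
        "improve_2011_15": (y2015 - y2011) / (4 * n),
    }


if __name__ == "__main__":
    for name, rows in GROUPS.items():
        print(name, summarize(rows))
```

Run as a script, this prints one summary per group; the values agree with the corresponding columns of the table above.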
Conclusions
While the 10 "most vital" articles in Wikipedia did start their lives at a fairly high quality level, the "Level 4 vital articles", a tiny subset of all Wikipedia articles, appear to have started their lives at about the same quality level as, or at a slightly lower level than, a randomly selected sample of non-vital articles.
Both samples of the vital articles did increase in quality very quickly in their first three years of life, roughly increasing by one quality level each year. The randomly selected non-vital articles did not increase as quickly, averaging an annual increase of only about 0.10 level until 2011.
Once the vital articles reached the level of 5 or 6 (approximately "B" or "GA" class) the quality increase naturally slowed - the highest level possible is only 7. However, once the non-vital articles reached level 3 on average (roughly "Start" class), their annual quality increase almost tripled to 0.30 levels per year.
More than 50% of all Wikipedia articles are classed as "Stubs", so it is clear that the vital articles examined are well above average quality. Their early starts (2001-2004) are partly responsible for this quality - they have had a long time to mature and get better. But the non-vital articles of a similar age did not increase in quality as quickly. Indeed, they grew very slowly until, on average, they reached level 3. The early starts of all the articles examined here limit our ability to generalize about other Wikipedia articles, which typically are much younger.
The concerns about the quality of vital articles should not be dismissed because of the findings shown here. These are all exceptionally important topics - as shown by their exceptionally early start dates. The tools used here show that compared to typical articles they are high quality. But it should be recognized that, because of their exceptional importance, vital articles should all be of exceptional quality.