Featured Editorial: "Drive-By Reviews": A Retrospective About Reviews During Backlog Drives
- by ΛΧΣ21™
Guidance is sometimes needed to keep the reviews at a desired standard. This is mostly done to avoid the so-called "rubber-stamp" or "drive-by" reviews that are commonly seen at drives.
A review is the process by which a user verifies that an article meets a specific standard. Within the good article process, the reviewer's task is to make sure that the article meets, at the time of its promotion, the good article criteria, which consist of six elements: the article has to be well written, factually accurate and verifiable, broad in its coverage, neutral in its point of view, stable, and illustrated [if possible]. Usually, some of the nominations listed already meet these guidelines, and they can be promoted without further comment. However, disapproval of this practice has recently become widespread among both reviewers and regular contributors to the process.
The main concern that those users raise is the existence of the so-called "rubber-stamp" [or "drive-by"] reviews. Rubber-stamp reviews are those in which the reviewers slightly review an article against the criteria, and overlook critical issues the article may have. This leads to the promotion of articles that, one way or another, fall short of the standard expected of good articles. As history has proven, and previous backlog elimination drives have shown, criterion 4 [neutrality] is the most difficult to evaluate, and criterion 6 [images] is the one that gets overlooked most often. Several other criteria are measured differently from user to user, and some users may have higher promotion standards than others.
Because of this, promoting articles without leaving some comments, even if the comments go beyond the criteria, is strongly discouraged, and most users recommend against it. Why? The main reason is that leaving a review empty, stating only that the article is being promoted, can be confused with the rubber-stamp reviews mentioned before and does not demonstrate how the reviewer measured the article against the criteria. But where does the rubber-stamp review come from? The origins of this peculiar type of review date back to the beginnings of backlog elimination drives, and it is the direct result of competition between the users participating in such drives.
This claim was borne out in the most recent drive, in June-July 2012, when, after the addition of a leaderboard, users started reviewing a considerable number of articles in a very short time span solely to reach first place in the drive. So, one of the main reasons for the existence of drive-by reviews is competition: a race for first place that leaves reviewers, in their attempt to achieve the highest number of reviews, unable to take the time to check the articles against the criteria and spot all the problems they may have.
How do we solve this? Drive-by reviews are not a big issue by themselves, and represent a small percentage of all reviews done inside the process. Some users may argue that the number is much higher, others that it is much lower, but in general, drive-by reviews are only a big issue when backlog elimination drives are running. This has caused several never-ending discussions about the very existence of such drives and all the undesirable phenomena that appear while they run. Currently, after the request for comment that ran for 15 days in November 2012, the new format designed for drives appears to have eliminated drive-by reviews from them, although new challenges have appeared. This demonstrates that the only way to find solutions is to make changes and tweak the current system until we find one that fits what the community asks for. We now have several balance issues regarding the highlighting of older nominations over newer ones, but that is just another issue that gained attention after the community found consensus on the issues that had previously overshadowed it.
What on Earth is this based on?
- "Usually, around 35% of the nominations listed already meet these guidelines, and they can be promoted without further comment." Says who? Is this just a statistic plucked from thin air? J Milburn (talk) 11:29, 4 December 2012 (UTC)
- "Drive-by reviews are not a big issue by themselves, and represent hardly 5% of all reviews done inside the process." More statistics plucked from some unstated location. Further, as you have failed to define "drive-by review", this statistic is useless. J Milburn (talk) 11:34, 4 December 2012 (UTC)
- I haven't failed. This is a drive-by review: "Rubber-stamp reviews are those in which the reviewers slightly review an article against the criteria, and overlook critical issues the article may have." Those statistics come from my personal investigation at GAN over the last 5 years, covering approximately 5,000 reviews. Same for the other statistic above. — ΛΧΣ21™ 14:06, 4 December 2012 (UTC)
- If a "rubber-stamp review" is the same as a "drive-by review", then no, you haven't. You didn't tell us that, though. However, how have you had the time to personally review 5,000 reviews and accurately judge whether the original reviewers have "slightly review[ed] an article against the criteria", but "overlook[ed] critical issues"? If you have, I'd be interested in reading up on your methods, and taking a look at your findings. Surely, research of that level deserves better than a few numbers in a short article like this... J Milburn (talk) 15:18, 4 December 2012 (UTC)
- Well. I've had time. I enjoyed checking reviews and that was the way I learned how to review articles [although I still make mistakes]. I can explain my methods carefully to you if you wish, because such data is interesting. As an example, my method and my research provided me the information about criterion 4 and criterion 6. Notwithstanding, as I said above, you should take my numbers with a grain of salt. This essay represents my vision about a specific matter related to reviews. Oh, about "rubber-stamp review" and "drive-by review", they are, indeed, the same. Actually, the name isn't important. Such types of reviews have received very different names from 2005 to date. What is important is the essence: how the reviews are done, where they come from, and why. That is what I discussed in the essay. — ΛΧΣ21™ 15:24, 4 December 2012 (UTC)
- If that's all you're doing, you need to be far, far more careful with your language. "Usually, around 35% of the nominations listed already meet these guidelines, and they can be promoted without further comment." That is taken from the article, and is a normative claim. Now you seem to be implying that you only meant to claim that 35% of nominations were promoted without further comment. That's a descriptive claim, based on statistics from "your personal investigation", the methodology of which has barely been explained. The normative claim, I would argue, is sheer bollocks, while the descriptive claim may or may not be; I don't have access to your statistics, so I can't possibly try to analyse them. J Milburn (talk) 15:37, 4 December 2012 (UTC)
- Well, what I tried to say was that around 35% of the articles currently listed at GAN can be promoted without further comment because they are up to standard. This is true, with or without my investigation, I guess. My original claim is normative, not descriptive, although it should be the latter. Of course, English is not my native language and I still have some issues with the correct interpretation of my writings, as you can see :/. Talking about my methodology, this is mostly what I did: check what the reviewers mostly commented on, what they mostly overlooked, what they were focusing on, specific concerns about the topic of the article, etc. Then, I collected those numbers and evaluated the results. I did that from 2005 to 2007, and then from 2009 to 2011, if my memory works well. I checked articles from 2008 as well, but I was busy that year. I am thinking about doing another scan in 2013 to build up a more accurate statistic and post my findings for everyone to see. — ΛΧΣ21™ 15:43, 4 December 2012 (UTC)
- "I tried to say was that around 35% of the articles currently listed at GAN can be promoted without further comment because they are up to standard." That's exactly the kind of claim I want to dispute. If you genuinely believe that around 170 of the around 480 currently listed at GAC can be promoted without further comment, then you have a vastly different GAC philosophy to myself and, I would dare say, the vast majority of other reviewers. I have completed over 100 reviews, and, if I remember correctly, only once did I promote off the bat- even then, I left comments. This includes articles by prolific FA producers like Sasata, Ucucha, Ealdgyth and others. In any case, how could you possibly know something like that unless you had actually reviewed a good hunk of the articles sitting in the queue right now? J Milburn (talk) 15:54, 4 December 2012 (UTC)
- Well. You made your point. Although that's why I said can and not are promoted. I don't promote articles without commenting further, although of my 100 reviews, I have promoted around 20 with no more than two nitpicky comments. Hmmmmm. Now that you added the numbers [170 and 480], seems like I made a mistake... Let me do some searching... — ΛΧΣ21™ 16:00, 4 December 2012 (UTC)
- Self-fulfilling prophecy? Hahc21, you initiated the RfC, you closed the RfC, you provided those "35%" values that nobody could verify, and now you wrote this piece to "prove" that you're right? Seriously? OhanaUnitedTalk page 14:49, 4 December 2012 (UTC)
- Ehm no. The 35% comes from scans I have made of the GAN page. Also, this is an essay, so the numbers may be taken with a grain of salt. This piece expresses my opinions about the matter, not the truth. Why is everyone thinking I am writing something that is universally true? And also, I initiated the RFC because we, the GAN community, needed to find consensus on some issues that were raised a long time ago, and I closed the RFC with community consensus. The results of that RFC are what the community thinks, not me. In my opinion, I'd prefer the "New Reviewer Recruitment and Training Drive" that was proposed by The Special One, but the community wants drives. I did not push any proposal because, personally, I did not have any favourite [a main reason why I did not vote on the page, if you check]. Although, I expected users to blame me for it... — ΛΧΣ21™ 15:19, 4 December 2012 (UTC)
- "Also, this is an essay, so the numbers may be taken with a grain of salt. This piece expresses my opinions about the matter, not the truth. Why is everyone thinking I am writing something that is universally true?" If you throw your opinions out like this, you're just inviting criticism. If you want to have your views and be insulated from criticism, don't publish them; especially not as an opinion piece. J Milburn (talk) 15:37, 4 December 2012 (UTC)
- I have nothing against criticism; actually, I'm glad you are commenting on my writing. I was expecting criticism for this opinion piece. What scares me is criticism about the RFC. — ΛΧΣ21™ 15:45, 4 December 2012 (UTC)
- After realizing that exact numbers are original research and may not be accurate, I have removed them. I believe that they add nothing to the context and to what I was trying to explain with the editorial. — ΛΧΣ21™ 18:18, 4 December 2012 (UTC)
- Comment: I find it telling (and of concern) that you chose to say enforcement is necessary instead of guidance is necessary. You're speaking of inexperienced (in article review, anyway) editors trying to help with a backlog. They make a mistake, and you "enforce" rather than gently help them to understand criteria? No wonder you have backlogs, and sometimes find it hard to get new editors interested in this sort of work. The attitude is hostile to newcomers. And yes, I'm aware it is a caption, and may be intended to be funny, but I don't see how that makes it any more acceptable. Even a lion tamer image would be more acceptable as that implies training not military "enforcement". One puppy's opinion. KillerChihuahua?!? 18:51, 5 December 2012 (UTC)
- Oh. I changed it to "guidance", you are right. I try not to be hostile to newcomers. I usually welcome them and offer as much help as I can. It was intended to be funny, but your point is good. Thanks. — ΛΧΣ21™ 23:28, 5 December 2012 (UTC)
- Yeah, I figured it was supposed to be funny. I'm glad you made the change; sometimes it is hard for an insider to see how phrasing would appear to an outsider. KillerChihuahua?!? 12:49, 6 December 2012 (UTC)