Wikipedia talk:Cherrypicking

WikiProject Essays
This page is within the scope of WikiProject Essays, a collaborative effort to organise and monitor the impact of Wikipedia essays. If you would like to participate, please visit the project page, where you can join the discussion.
This page has not yet received a rating on the project's impact scale.
 

consequently misrepresenting what the source says

I am not sure cherry picking is limited to what a specific source says. If I go across a series of books and pick opinions which support my view, while ignoring blatant caveats within that series of books, then that is also cherrypicking, because it presents controversial ideas as if they are established truths.--Inayity (talk) 20:31, 29 January 2014 (UTC)

Correct, with respect to a series that is meant to be used by a typical reader as one series, such as a multivolume encyclopedia, but not for, e.g., books on a large topic by different authors offered by a publisher as one series (often under one editor) but with the whole series rarely acquired or read by one reader. (The latter type of series is often identifiable by a legend on the copyright page in the Library of Congress cataloguing information or a frontmatter page being devoted to listing titles in the series.) As a practical matter, within that limitation, I doubt the problem is often significant, but it can arise. I'll try to draft a clarification for the essay and, hopefully, add it this or next weekend.
Whether cherrypicking causes noncontroversial content to be replaced by controversial content or causes controversial content to be replaced by noncontroversial content doesn't make a difference to whether it is cherrypicking, although it could matter in how we edit or in whether the article is, say, neutral or properly weighted.
While cherrypicking outside of Wikimedia could include cherrypicking across sources that are related only in being about the same subject, we shouldn't criticize that as cherrypicking in Wikipedia but instead should retrieve the unrepresented source and add it. That's because no editor is expected to know of all of the important sources on a subject, even though it's helpful if they do.
Nick Levinson (talk) 02:40, 30 January 2014 (UTC) (Edit Summary only: 02:50, 30 January 2014 (UTC))
Done. Thank you. Nick Levinson (talk) 20:53, 2 February 2014 (UTC)

Definition

In the Multiple sources section, the text currently states: "When editing Wikipedia, it's not cherrypicking in general to miss contradictory or qualifying information from a different source than a source that had information already being used, because we don't expect editors to be familiar with all of the possible sources that could be cited on a topic. Therefore, to have gotten information from one source without acknowledging that it was contradicted or qualified in another source is not a valid criticism of an editor's work in Wikipedia as cherrypicking. However, feel free to edit an article consistently with the different source, if the source is otherwise eligible to be used in Wikipedia."

It states that, but referring only to picking certain points from one source is not usually what Wikipedia editors (or people in general) mean when they state that a person was engaging in cherry picking; in my experience on Wikipedia, they usually mean that an editor has knowingly prioritized what some sources state over what other sources state without the matter being an appropriate use of WP:Due weight. For example, in this recent discussion, I argued that WP:Cherry picking can mean knowingly prioritizing what some sources state over what other sources state; when I read the lead of the WP:Cherry picking essay, and the Multiple sources section lower in the essay, I qualified my statement in parentheses by indicating that the editor is aware of sources that state differently. Are others, such as you, Nick Levinson, open to making the WP:Cherry essay a bit broader, but with care? I notice that the Multiple sources section uses the words "in general," but it's not clear what the non-general case applies to; presumably it is editors who knowingly prioritize what some sources state over what other sources state without the matter being an appropriate use of WP:Due weight. Flyer22 (talk) 12:47, 12 June 2014 (UTC)

I am open to the possibility and will draft something soon. I'll read the discussion you referenced; so far, I've only glanced at it. I'm just getting through some time-consuming life crises and am only beginning to resume doing more than checking my watchlist. Maybe a clarification will work; maybe a substantive edit is appropriate.
One concern about the proposal is this: Suppose editor A is aware of two sets of sources but personally judges one set to be valid and the other to be nonsensical. Editor A is free to add from the first set and not to add from, or even hint at the existence of, the second. Then editor B comes along and decides that the second set (maybe from independent knowledge) should be cited, too, and edits accordingly. Assuming all other Wikipedia criteria have been met, such as that on due weight, editor A has no business deleting content based on the second source set (and, of course, editor B has no business deleting content based on the first source set). A hypothetical example would be a physician describing treatments of choice for a condition but considering alternative medicines to be dangerous because their use delays good treatments and someone else adding the alternatives because the German government considers them safe and mildly efficacious. Another hypothetical would be adding history based on scholarship and someone else adding on the same history subject but citing children's books (some use of children's literature for adult topics has been accepted for Wikipedia although I mostly disagree and wouldn't use it). I don't think we can require that editor A add from the second set because we'd be imposing a huge research burden on editors to dig up a huge range of tangential sources as a precondition to adding any substantial content. If we had never had an article on Mars and someone were to create one, the burden would almost mean that only a professor assisted by a classroom of students could write the first acceptable version.
Trying to define all nongeneral cases, on first impression, might be too much to do. We generally allow exceptions where circumstances justifying them can be pointed out without requiring that a policy list all possible cases, but I'll consider that aspect, too.
The issue is good and should be addressed. I'll need a little time. Thank you for raising it.
Nick Levinson (talk) 20:12, 14 June 2014 (UTC)
I clarified the lead to distinguish the two senses. I think the principles and language of due weight and neutrality already serve where cherrypicking is beyond a single source.
The date rape article may have had a problem with the deletion of sources; I didn't look into that, I didn't read prior revisions of the article, but in principle that would be a different problem. If someone were to replace all the content and sources of the rape article with a single-source statement that a woman who says she was raped is just an adulteress or a prostitute and therefore couldn't have been raped and therefore rape doesn't exist, that would deneutralize the article but we wouldn't consider it cherrypicking, because the last editor may simply have been unfamiliar (even though probably deliberately unfamiliar) with the other sources and we don't require editors to know all the relevant sources, even sources cited in Wikipedia. We could call it cherrypicking, but that would be redundant of calling it POV editing or UNDUE. Rather than introduce a synonym into Wikipedia practice, I think it's more useful to recognize the narrower problem of cherrypicking from a source with which an editor should have been familiar (e.g., if a source is only 200 words long, an editor should have read the whole thing) by having a word for the distinct problem, thus the essay's focus.
Physics presents a related case. Lots of sources support string theory. But a few years ago I read a modern book by a professor of physics who doubts its validity, arguing that there is a lack of physical products confirming string theory, which he said are required for the Nobel prize in physics, and that the number of spatial dimensions (beyond three) keeps increasing and the size of atom smashers keeps growing as proof keeps eluding our reach. Yet finding that book was an act of serendipity for me; most discussions I see of string theory don't mention that minority view. Maybe most up-to-date physicists could cite the critique, but editing Wikipedia on physics, even editing it well, does not require personally being an up-to-date physicist, because neutrality can be achieved by editors who know only of string theory editing accordingly and an editor who knows of the critique editing accordingly, the two sets of editors together producing a neutral and comprehensive article.
Maybe even better would be a hypothetical consensus among notable chess masters that in a certain circumstance the queen should be moved but where one notable grand master, who's alone, says that would be idiotic. The others don't like being called idiots and respond by not acknowledging the lone view. Editors following the consensus probably would have no clue that there's even a debate. We shouldn't blame the editors for not knowing and it would be unfair to say they were cherrypicking, which is rather accusatory. If an editor encounters that lone view on chess in a source, that editor should add it.
Religion presents another kind of example. Wikipedia reports on Islam and on Hinduism. A great many editors who adhere to each faith doubtless know how to find secondary sources that are studies of the other faith, and maybe can even name them, but may disbelieve them, and we do not expect anyone editing on either faith to avoid cherrypicking by retrieving, researching, and citing sources they don't consider valid. And we don't ban an editor on Islam for refusing to edit on Hinduism, or vice versa.
Regarding the clause "in general", where the essay says, "[w]hen editing Wikipedia, it's not cherrypicking in general to miss contradictory or qualifying information from a different source than a source that had information already being used", an example of an exception to the general case where an editor should be held adversely responsible may be this: An editor has cited a first edition of a book, which states two facts, and has also cited the second edition, which contradicts one of the facts, but cites the second edition for the uncontradicted fact without editing Wikipedia about the contradiction. But explaining that in the essay would require quite an expanded explanation for a case that would be rather rare, and for which a remedy already exists (someone else can edit for the contradiction); and we'd have to figure out when someone should be adversely responsible for missing the chance to edit.
If a Wikipedia article should be criticized for POV or UNDUE, I would do that without adding another word for the same thing. If a deletion of sourced content was unexplained, editors who revert or restore often say that in an edit summary. If a Wikipedia article attributes to the source what's not in the source, that comes within not misrepresenting a source. I reserved cherrypicking for a narrower kind of misrepresentation, for which we otherwise don't have a term. It may be that when I came across the term on talk pages prior to creating this essay I understood the term as being used in the narrower sense and that other editors understood it as being used in the wider sense, and that kind of ambiguity would be confusing. I opt for encouraging a common understanding of the term, preferably the understanding for which we don't have another word.
Nick Levinson (talk) 19:42, 15 June 2014 (UTC)
Hi Nick, boy I disagree with you! I work on a lot of health-related articles, and one of the biggest problems I come across is editors who have a strong POV, and constantly add primary sources that support that POV, and add loads of content based on them. Cherrypicking sources is a big problem, especially in selecting one primary source among the many that exist that say different or contradictory things. This behavior is against the deep preference on WP for using secondary sources to determine weight, but primary sources are of course allowed, so editors can do this over and over by pointing to that allowance. A discussion about cherrypicking sources - especially primary sources - that reflect a particular POV - would be very helpful to include in this essay - and to include in the lead of it. Editors do have a responsibility to take time and study secondary sources about topics they want to edit, so they can accurately represent weight. Above you talk about Hindu and Muslim editors.... I would say that if the Hindu editor chooses to edit content about some topic related to Islam, he or she does have a responsibility to study the secondary literature on that topic and edit according to those reliable sources and to not just cherrypick primary (or other) sources that support whatever position they like, or that the primary source in front of them happens to state... do you see what I mean? Jytdog (talk) 11:01, 16 July 2014 (UTC)
Quick response: We agree on the importance of the article accurately reflecting sources generally and on not having an article be a POV article. Where we disagree is on the responsibility of one editor, as viewed in Wikipedia's policies and guidelines. What must be neutral is the article, not an editor. If an editor inserts a POV and thereby deneutralizes the article, the rest of us can neutralize it. To hypothesize a good-faith extension, suppose an editor claims that our solar system has 40 planets and produces a source saying so; it being a fringe claim, we delete it; the 40-planet editor in good faith adds another source to like effect, we delete that, too, and so on, so that it isn't edit-warring but it misrepresents either the consensus view or significant dissents. If explaining neutrality, fringe, etc. doesn't work and the editor persists, we can address that editor's misbehavior. This essay's purpose was to define cherrypicking for a case where an editor has presumably read the source the editor cited but cherrypicked specifically from within that source. It isn't useful to accuse an editor of cherrypicking from all sources because generally we exempt any particular editor from being familiar with most of the sources on a subject, relying instead on the community of editors to achieve neutrality in an article, including determining proper weight. Regarding Hindu/Islam, about which perhaps I wasn't clear enough, both are religions and therefore both come within the single discipline of comparative religion, but we don't require one editor to have that level of expertise so that they should be familiar with the sources in the different religions before editing on one of the religions; instead, we rely on the community of editors to bring the article to where it achieves the necessary neutrality on religions. 
Elsewhere in Wikipedia, I was in a discussion about editing by children on topics of interest to adults (such as music); we allow children to edit even though they likely have less knowledge than adults have. We, of course, don't tag an article as having been edited by a child and therefore that readers should accommodate the effort by the child, much as we might in someone's home when a child walks around showing off their new artwork, but we probably edit the article to bring it up to standards where the child's effort doesn't meet Wikipedia's article standards.
I'll likely be back online Saturday and maybe I'll clarify the article relative to your comments by then. Thank you for bringing them up.
Nick Levinson (talk) 02:02, 17 July 2014 (UTC)
I've clarified, in body and lead, on familiarity with numerous sources being helpful to editing.
On sources as primary, that doesn't affect the problem of cherrypicking by any definition either within a source or across all sources relevant to an article. Secondary and tertiary sources are just as susceptible to cherrypicking. Although you wrote especially of primary sources, I think it's possible you would agree on this nondistinction, but, if you don't, please clarify why cherrypicking would be more offensive when done from primary sources than from others. If it's simply that cherrypicking is more common when done from primary sources, I can address that concern in the essay, although I hadn't noticed that to be more frequently relevant to charges of cherrypicking, and perhaps that's a function of which articles you or I edit. Let me know if that's the concern.
Probably editors who cherrypick tend to do lower-quality editing generally, and it's possible to include a caution against that, i.e., if someone is cherrypicking they should check what else they're doing wrong, but I doubt it's more true of cherrypicking than it is of using unreliable blogs, failing to attribute quotations, and so on: all serious problems, but a caution against multiple violations of policies could be added to all policies but then it would appear to be redundant and I suspect such cautions would be deleted as unnecessary. We could write a separate essay on not committing multiple violations, but I'm not sure what practical purpose it would serve that the separate policies don't serve now. There's no claim that, for example, being neutral cancels the need for verifiability. It's generally accepted that all policies apply simultaneously.
On cherrypicking as meaning omitting "different" information, different is too vague for this context. We agree on not omitting "contradictory" information. But to require reporting "different" information would be far too burdensome. For example, requiring that no editor report on a drug that is for treating a condition unless they also report on (say) the other 13 drugs that are also used for the same purpose and on the possibly hundreds of other drugs that could be used off-label for the purpose would mean that most editing would come to a halt while we wait for an expert to edit.
If we require every editor (or every editor of substance, since we could leave out editors who edit only on grammar, style, and so on) to know the range of sources on a subject before editing, we would essentially require that editors be experts. We would then need to know why an editor self-proclaims being an expert, so we'd need editors to post their credentials on their user pages. Statistically, some would appear to be exaggerated or invented, so we'd quickly then want peer or community review of editors' credentials. To confirm claims to credentials, anonymity would probably have to come to an end or a committee would have to be created that is entitled to break editors' anonymity to confirm claims to credentials. If an article's subject specialty is of interest to very few editors, limiting editing to credentialled experts and trying to populate a committee to confirm credentials may result in no one able to edit the article. What to do with old articles that don't meet a new requirement that editors be experts when experts cannot be found would require a procedure that may not yet exist. All this could be implemented, but only against Wikipedia's policies. Amending policies can be proposed, but I doubt the proposals would last long.
Nick Levinson (talk) 21:05, 19 July 2014 (UTC)