
Wikipedia talk:Plagiarism: Difference between revisions

From Wikipedia, the free encyclopedia
→Accidental plagiarism in a wiki environment: Possible, yes, but highly improbable and not worth worrying about.
→Plagiarism or excellent article?: obviously not. If the source is cited, it is not plagiarism. There may be other problems, but not "plagiarism".
==Plagiarism or excellent article?==
Our FA on [[Rabindranath Tagore]] extensively [[Rabindranath_Tagore#Citations|cites a single work, Dutta & Robinson]]. Is this plagiarism, or is it just an excellent article, based on the most authoritative source available? <font color="#0000FF">[[User:Jayen466|Jayen]]</font>''<font color=" #FFBF00">[[User_Talk:Jayen466|466]]</font>'' 19:22, 3 May 2009 (UTC)
::::If the source is cited, it is not plagiarism. There may be other problems, but not "plagiarism". --[[User:SmokeyJoe|SmokeyJoe]] ([[User talk:SmokeyJoe|talk]]) 12:00, 12 May 2009 (UTC)
:Without access to the text, how is one to know? Google book, unfortunately, doesn't offer a preview. :/ --[[User:Moonriddengirl|Moonriddengirl]] <sup>[[User talk:Moonriddengirl|(talk)]]</sup> 19:41, 3 May 2009 (UTC)
::I thought there might be a potential problem of "substantial taking", independently of the quality of any paraphrasing, on any topic where there is really only one authoritative standard work. For example, this might be the one available, definitive biography of a minor historical figure, or perhaps also a book or paper by a highly-regarded theoretical physicist who is considered to have written the most authoritative work on a particular topic. For an apparent example of the former, see this GA: [[Hugh Trenchard, 1st Viscount Trenchard]]. More than two-thirds of its 150 footnotes are to Boyle 1962. Even assuming there is good paraphrasing, most of the article is clearly based on that work. By some of the academic standards we have discussed here, this is plagiarism. <font color="#0000FF">[[User:Jayen466|Jayen]]</font>''<font color=" #FFBF00">[[User_Talk:Jayen466|466]]</font>'' 21:11, 3 May 2009 (UTC)

Revision as of 12:00, 12 May 2009

Userbox?

I got bored and played around with making an anti-plagiarism userbox. Here it is, if anyone wants to improve it or to use it:

This user removes copied and plagiarized material on sight and will never apologize for doing so.

{{User:Politizer/Plagiarism box}}

I'm thinking I should try to find a way to get a link to this project page worked into it. (I tried having the image link there, but {{click}} doesn't seem to work inside userboxes.) —Politizer talk/contribs 02:03, 16 November 2008 (UTC)

Well... one of the options for plagiarised material is to rewrite it yourself, so it's not completely cut-and-dried as to simple removal. We haven't really thrashed out here what exactly to do about plagiarised material, nor what to do about serial plagiarisers. (Although it certainly seems that just linking this page is sufficient to send people into fits :( ). Franamax (talk) 02:22, 16 November 2008 (UTC)
True; that's what I've done with some material I find (especially when it's just a couple sentences, such as the issue that sparked the big fight in the section above). It's generally when I come across an entire multi-paragraph section or article that I get mad and remove it (especially if it's an article that I have little interest in and don't want to take the time to clean it up) with a note on the talk page explaining where the material was taken from. In a perfect world I would be able to sit down every time I find plagiarized material and turn it into good prose with references back to the source, but generally I don't have the time or attention span (and I get indignant too easily). I know a lot of people object to simple removal of plagiarism, which is the reason for "will never apologize for it" in the box.
Thanks for your input! —Politizer talk/contribs 02:31, 16 November 2008 (UTC)
Yes, that's another piece of the puzzle on "what to do about it": the preferred option is to just rewrite the offending material yourself, especially if it can be easily done. For more complex passages or in topic areas with which you're not familiar, plagiarized material can be removed from the article - however, this should only be done if you additionally leave a note on the article talk page pointing to your removal and describing your rationale. This is necessary so that editors more familiar with the article can rescue the material themselves. Now if only that could be said in half the word-count! :) Franamax (talk) 02:49, 16 November 2008 (UTC)
On the technical side, you could just change the two links to be WP:Copyright#Contributors.27_rights_and_obligations and WP:Plagiarism (and there should be a seealso on this project page linking to the mainspace Plagiarism article). Franamax (talk) 02:27, 16 November 2008 (UTC)
(edit conflict) Good idea, that's a lot better than the links I have there now. —Politizer talk/contribs 02:31, 16 November 2008 (UTC)

Another view, and a plea for guidance

I've been adding new articles to WP as part of the DNB sub-project of the missing articles project. I always start by adding the original DNB source to Wikisource. I then start my new WP article by dumping the original Wikisource text into WP, adding a general attribution template and an external link to the Wikisource article. I then begin editing the article just as I would any article that needs copyediting, with no attempt to distinguish the DNB text from newer text.

In my opinion, the original DNB authors and publishers have no legal or (Berne Convention "moral") rights to the DNB material, because copyright has expired. However, the original authors have essentially the same (non-legal, non-Berne Convention) moral rights as any other contributor to Wikipedia. That is, if we leave petty legalisms aside, we have a moral obligation to see to it that an interested reader can determine which contributor wrote each and every word in an article. However, we have no more (and no less) obligation in this regard to a PD author than we do to a GFDL author. That means that there is no need to keep everything in quotes, and there is no need to attribute every single word in the article, inline, to the individual author. If we did this for GFDL editors, none of our articles would be readable at all. Of course, if something is in quotes, it must be attributed (to make any sense at all of quoting) and it must not be edited. But we should not put PD passages in quotes unless we are somehow commenting on the quoted material, or if the fact that a particular author made a particular statement is itself notable. Except in these unusual cases, we should treat the PD author as a proto-Wikipedian who would have added the material to WP under the GFDL, given the chance.

Now to the Plea: Is there something wrong with my methodology? -Arch dude (talk) 00:41, 7 January 2009 (UTC)

I don't know anything about the legality of this stuff (although what you are describing seems to be comparable to the approach taken towards 1911?) but my personal opinion is that just because we legally can copy PD text doesn't necessarily mean we should. I view PD sources as sources for information, not as scaffolding for the article—and I treat them by using a lot of the information and citing them a whole lot as refs, not by copying the text. That is just my personal editing style, though. Copying text from PD sources may not constitute plagiarism per se, but I do believe it constitutes undesirable (not necessarily "poor") writing style. No offense meant to you as an editor—at the rate at which you are adding new articles, and the wide variety of topics, what you're doing is probably the only way to churn out that much material—but my general feeling is that WP should be a new source, not an agglomeration of other sources (even if those sources are tweaked, copyedited, and improved, which I assume is your intention in incorporating DNB text). Again, I have no policy or anything behind me saying that, it's just the way I feel. Politizer talk/contribs 01:30, 7 January 2009 (UTC)
Thanks for your candid response. I think I understand your position, and I am happy that you seem to think of this as an aesthetic problem rather than an ethical problem (assuming that I understand your position). In this spirit may I ask you: in your opinion, how does making a change to the words written by (say) Leslie Stephen in 1890 differ from making a change to the words written by (say) User:Politizer in 2008? WP (in my opinion) has the equivalent moral and ethical obligations to these authors. Note that the Oxford University Press, when updating the DNB text to create the ODNB, does not appear to feel any moral obligation to preserve Stephen's text with full attribution. Why should we? -Arch dude (talk) 02:23, 7 January 2009 (UTC)
That's a good question. I don't know all the real logic behind this, but my intuition is that I write my contributions for Wikipedia, and Stephen didn't. While there's nothing wrong with using Stephen's words for WP, I personally prefer to use stuff like that as a cited source. I agree with you that, in cases of PD text like this, it's more a style issue than an ethical issue.
The other thing I have against using PD text is that a lot of readers may not understand copyright law and how public domain works, and when seeing a WP article that's similar to an old source they might think it was plagiarized; even if they're incorrect in thinking that, WP loses credibility for that reader. I know that was my own reaction the first time I encountered PD text (from DANFS) before I was more familiar with WP and copyright issues. Politizer talk/contribs 02:27, 7 January 2009 (UTC)
I've said this before somewhere: I think it helps to conceptualize the wikipedia editors as one collective of authors. Politizer's words do not need to be distinguished from my words; "we" wrote the article together. However, it is all a lot clearer if we keep track of other authors' words, whether the other is an individual or another PD or private collective. I definitely appreciate your asking for views here, and for your sensitivity that is already clear. I do think it is a moral issue, though, and it is a marketing-type issue too: for wikipedia to be a credible source it is / would be better for wikipedia to function like one careful collective of authors giving credit where credit is due to others (for wording as well as for content). I believe that it has become practice that for articles to reach FA status, they cannot include non-quoted passages of other text. So, I would hope you would just put the passages from the DNB source (what is that, anyhow?) into block quotes and keep track of which words are theirs. It should make it easier, in the long run, to develop those articles. doncram (talk) 02:44, 7 January 2009 (UTC)
I believe DNB refers to the Dictionary of National Biography. Politizer talk/contribs 02:49, 7 January 2009 (UTC)
Sorry, but I conceptualize the WP editors as members of the human intellectual collective. "We" are not separate from scholars, authors, and editors that came before us, and "our" moral obligations to each other are no different than our moral obligations to our predecessors. We are all part of a chain of scholarship that extends back to the first written word. Yes, we need to preserve attribution, but we also need to help our predecessors continue to inform our readers. If every author rigidly preserves quoted content forever, then we end up with a morass of quotations 30 levels deep with 30 levels of footnotes: clearly an insane result. Note: the Dictionary of National Biography (DNB) is a 63-volume compendium of British biographies, written over a period of 15 years from 1885 to 1900. -Arch dude (talk) 03:02, 7 January 2009 (UTC)
(e/c) I may have the indent level wrong, but I'll comment anyway:
  • Our basic principle is that we attribute the original words of the authors to those authors. Further changes can be traced through article history, but the original creative contribution has to be attributed - and by that I mean the specific words. In the case of PD-by-time-elapsed work, I doubt there's a legal requirement, I do feel there is an ethical requirement, and of course here on this page we're (slowly) working toward a policy requirement.
  • For PD works, the single best approach is to rewrite in your own words what the original, and almost always anyway outdated, source has to say. Next best is to blockquote the text. Next after that, make a single edit to insert the copied text, along with a PD-attribution template.
  • Moving something out of blockquotes, or changing something within the quotes, is problematic, unless it's fully rewritten. Extra care must be taken if this is done, such as long edit summaries or accompanying talk page notes. This satisfies the GFDL History section requirements.
  • And in any case, I think the best approach is to just ask yourself: "Is there any way someone could take these words as being mine, when I know myself that they're not?" There's tons of server space, so just make sure that when you incorporate PD text, you make sure that a reader in the future can trace back exactly (on this site) what text was copied. Nutshell: give credit where credit is due; assume no undue credit.
Just my opinions, but there they are. Franamax (talk) 03:09, 7 January 2009 (UTC)
In addition to what Franamax said about the minimum standards for citation, I consider that using significant parts of an old DNB biography in a WP article is an inadequate way to proceed if there are better sources--which for UK historical figures would include as a minimum the current edition of the DNB, and a check for other reputable biographies. The old DNB articles were in most cases prepared over 100 years ago, and the view of what facts are reliable may have changed, as well as the interpretation.
With respect to interpretation, I would use the old DNB only for the purpose of showing what the accepted UK academic view of people was at the period when it was written, and that by the named individual who wrote the article there--Stephen did not write the DNB, but edited it--the articles are all signed. Additionally, an errata volume was published in 1966.
There is an exception--if a contemporary reliable academic source says that the old DNB bio of an individual is still reliable, it could be used--if accompanied by that quotation. DGG (talk) 22:34, 7 January 2009 (UTC)
Stephen was the original editor. There are more than 600 authors of original DNB articles. Stephen did in fact write a few of the articles, including at least one of the articles (James Stephen (undersecretary)) I added to WP. Our current discussion here is about plagiarism and attribution, not about whether or not a DNB source dump is a good way to create an article. I personally think that it is a lot better than not having an article, but we should have that discussion over at Wikipedia:WikiProject Missing encyclopedic articles/DNB (and I agree with you that your way can create a better article). So, what do you think about attribution? I believe that starting the article with a raw dump of the DNB text, with attribution in the edit summary, an attribution template on the page itself, a link to the text at Wikisource, AND a comment on the talk page (yes, all four) is more than adequate to fulfill our moral obligation to the original author. Note that the (tiny) project DNB team over at Wikisource is spending a very large amount of effort to positively identify the individual authors for each of the 29,000 articles, so in a sense we are doing a better job of attribution than the original DNB did. -Arch dude (talk) 01:18, 8 January 2009 (UTC)
Now I'd say that's barely adequate, since the original author didn't agree to the terms that their text could be mercilessly edited. The question arises as to when, if ever, the PD-attribution template can be removed: every word may have changed, but the original author's article structure may still be intact (their creative contribution to "approaching the subject"). However in practical terms, starting with a straight dump and including the "all four" seems to me to satisfy the GFDL requirements, in particular the History section which allows traceback and attribution. Arch dude, it seems that nothing is lost by your approach, so (just at this moment at least :) it satisfies me. Franamax (talk) 01:35, 8 January 2009 (UTC)
We need to be clear about what we are trying to accomplish. This discussion is about the WP plagiarism policy. This is distinct from "the GFDL requirements", from copyright law, and even from academic plagiarism requirements. In order:
  • GFDL does not mention plagiarism. There is nothing in the GFDL that prevents direct copying of a PD work.
  • Copyright law places no restriction on PD work.
  • Academic plagiarism rules are not binding on WP, although I think we should use them to guide WP policy and guidelines.
WP policy is what we are working on here. Speaking personally, I feel no more obligation to the DNB authors than I do the WP editors. The DNB authors' works are released under PD, and the WP editors' work is released under GFDL. Note that the DNB authors' work is also being "relentlessly edited without attribution" by Oxford University Press in later editions of the DNB; OUP purchased the copyrights in 1917 from the first publisher, who had acquired the rights from the original authors. As it happens, I feel that we do have an obligation to our readers to preserve attribution, but that obligation is to our reader: that's what a plagiarism policy is all about. But our readers already know (or should know) that WP is a collaborative work, and any reader that so desires can learn how to discover exactly how an article has evolved. By ensuring that the original PD work is fully cited when initially added to the article, we have fulfilled our obligation to our readership. -Arch dude (talk) 11:05, 10 January 2009 (UTC)

Coming in late, comments

Kudos to whoever introduced the phrase "Copyright and free are short words but complex concepts." I like. :)

In terms of "free works", I think it's important to note that licensed is not the same as public domain. GFDL works remain copyrighted, though the license for reuse is liberal. If they are used outside of license, copyright infringement still occurs. I'm not going to implement that directly, though, since I'm new to this conversation and want to get a feel for it before diving in. :) Also, it's probably worth noting that there are compatibility issues with CC on Wiki text. We're still unable to use material that is "attribution share-alike". I'm not real clear on the differences between the various CC licenses (though it looks like I'm going to have to learn 'em), but I think it's probably important to clarify here to prevent people reading this and thinking that all CC text is free to use.

Wikipedia doesn't currently insist that all copied work is attributed (ETA: wait: I'm thinking on rereading my own comment that I'm taking attribution in the "inline" sense, whereas what's meant here is Attribution (copyright). If so, we might want to clarify for folk like me that general attribution, not line-by-line, is what is meant, by contrast to my reading of WP:NFC, where inline is required for quotes). We like it to be credited, sure. But we have a whole slew of PD attribution templates that rather vaguely note that some of the text on the page is copied from a PD source. I think we need to soften that wording. I'm inclined to think that exhortations like "it is imperative that their work is distinguishable from the prose of the Wikipedia article" is likely to meet difficulty finding consensus, since it is evidently quite a change from common practice. I do like "it is important to retain an anchor to the originally copied text, so that subsequent changes can be traced", which fits very well with the spirit of GFDL, but I expect that ultimately material will be melted together just as our contributions are. But it seems likely that no "anchor" (in terms of an active internet presence, as I understand it) will exist, which would perhaps make it better to urge people to retain the citation?

With respect to "What is Not Plagiarism": IANAL. Lists of information may or may not be copyrighted, depending on the nature of the list. :) If creativity has gone into the selection of elements (in terms of which facts are included and order of presentation), then it may be protected by copyright (see Feist Publications v. Rural Telephone Service). If the list is a comprehensive list--such as an alphabetical directory of telephone numbers or a list of ingredients in a recipe--it is not. If the list is a list of words or song titles, it's not protected. If the information is presented in sentence form, it probably is. But that's a question of copyright. The question of plagiarism is different. Copyright does not protect "sweat of the brow," or the labor that goes into compiling information, but it seems to me that plagiarism probably does. Unless a list is a common compilation (which one might find freely reproduced anywhere, say), I'm inclined to think we ought to give credit. Thoughts?

Speaking of jargon, "Elgoog hcraes?"

Copying of copyrighted works: "Such additions can be dealt with either by attribution, turning it into a quote with a source, or by truncation or removal of the copied material." This seems to suggest that each of these remedies is equal. I'd suggest appending something like "depending on the nature of the infringement." I like the next bit, which seems straightforward and clear-cut (though we may need to note that extensive quotation can also refer to overuse of short quotations from one source; I can provide examples :)). I get a bit caught here, though: "This may still be a violation of copyright as a derivative work, though the same concerns about plagiarism would apply if the phrases, concepts and ideas in the copied material were not attributed to the original author." The word "though" seems to indicate a contrasting idea. Where's the contrast? I'm not quite sure what's meant to be conveyed by this, but wonder if rephrasing would be useful: "This may still be a violation of copyright as a derivative work. Structure, presentation, and phrasing of the information should be your own original creation. The same concerns about plagiarism apply if the phrases, concepts and ideas in the copied material are not attributed to the original author." (I have utilized some language from the current proposed revision to the copyright FAQ, here.) I wonder if it would be helpful to incorporate some of User:Dcoetzee's suggestions from this talk thread here as well. I think they're very good.

Look forward to feedback. Lacking any, I may WP:BOLD. :) But, again, I'm not familiar with the work-environment of this proposal, so I want to introduce myself to it before diving in. Hi. --Moonriddengirl (talk) 12:51, 7 January 2009 (UTC)

Short answer before the longer ones after some analysis:
  • MRG, I'll accept your kudos, thank you. :) Nice to see you, what took ya so long? I'm sure you can add some valuable perspective, given your involvement with many of these issues.
  • The work-environment is much like some government offices, one person at the counter, the rest of us having a coffee-break. :) I'll try to give some feedback anon, but go ahead and make judicious changes as you see fit.
  • The jargon bit reflects our incomplete development process, there are still quite a few raw bits here. I think maybe the OP didn't want to indicate a preference for an individual search-engine. I'd have used the more sophisticated "Oogle-gay Earch-say", but to each his own. :)
More soon (hopefully). Franamax (talk) 01:44, 8 January 2009 (UTC)
Ah--word puzzles! Got you. :) Having tested the bathwater, I've climbed in and splashed about a bit. Judiciously? One hopes. :) My primary purpose was to condense and structure. I'm new to this policy proposal and still energetic enough to try to bull it through the molasses of wiki-apathy, so here's hoping we can get it in place soon, as I think it is sorely needed. But I'm repeating myself. See below. --Moonriddengirl (talk) 15:55, 10 January 2009 (UTC)

Longer answer to your points:

  • GFDL works remain copyrighted - I would have said "true, but is that relevant to a discussion of plagiarism?" but I'm OK with the wording you've introduced. Note though that your restructuring has subtly altered the message, since the "In all cases..." wording now appears to be part of the sub-section rather than its original intent to be applicable to both PD and free works.
  • Your para. 2: the current wording of the "In all cases..." sentence is a bit of a mish-mash resulting from some passionate debate above about the history of article development (such as the big EB1911 dump); philosophical approaches to using PD sources such as whether we wish to acknowledge Berne Convention-style moral rights that survive the copyright limits and prevent us from modifying the original authors' work; methodological approaches such as putting PD-text into WikiSource first, transcribing directly into en:wiki, transcribing and modifying when making the first entry; &c. The whole issue is somewhat vexed and you might gain some insight by reading through some of the long discussions above. This is probably the weakest part of the whole guideline; I don't think we actually arrived at agreement and there are some strong feelings on the subject. (And yes, it included attribution templates)
  • The "retain an anchor" sentence I believe was mine, and if so what I meant was that there needs to be a single place on-wiki where you can see exactly what the PD-text was before it was modified (like, say, an oldid field in the {{PD-attribution}} template). Again, this conflicts with actual and historical practice.
  • Lists - this is a difficult one. I'm not sure if I've ever written an original FUR (which is a list of items) - I've just copy/pasted other people's work and changed a few words. Plagiarism never occurred to me, although I did credit the source for the last FUR I created, since I was feeling a little guilty. By your comment, whoever first created the FUR I copied indeed contributed "sweat of the brow". And what if I go to the NASA website and find a "moon landings" page to create "Chronological List of Moon Landings"? My list would contain the identical information, date/vessel/country, in the identical order. Did I plagiarise a PD work? I don't have to ref the source either, since they are uncontroversial facts and I wiki-linked all the lander names to their articles. The danger here is that in a later discussion I say that I took the list from a NASA page and someone else says "OMG you're a plagiarist!" Given the very strong connotations of the term on-wiki, we need to consider this carefully.

I'll shortly (or longly) try to summarize some of the outstanding general issues from my notes. Thanks for your help and I like your rewrites! Franamax (talk) 06:00, 11 January 2009 (UTC)

Quoting other policies

Where possible, it is better to link to other policies and guidelines than quote them, as there is danger that the other document will change to deviate from the text here, creating a policy fork. For that reason, in my bold restructuring of the section on addressing these concerns, I've removed the quote from NFC. The link should be sufficient. YMMV, of course, but I thought to explain my reasoning here. --Moonriddengirl (talk) 14:48, 10 January 2009 (UTC)

I think that when we started working on this, we were working from existing text in other policies and felt the need to ensure that we were basing our edits here on the actual text when we read it there. Also, I think we were all doing gut-checks along the lines of "if I believe in this, am I actually doing it?" and we may have ended up being very specific about sources and attribution. I also think you're right about policy forks: if this becomes a guideline it will be cross-linked and will properly evolve in tandem, so there's much less need for direct quotes. Franamax (talk) 06:09, 11 January 2009 (UTC)

Resources box?

Although we don't have a "plagiarism" infobox like {{Wikipedia copyright}}, I'm wondering if a specially made one would be helpful. I like them for their quick visual catch and the handy compilation possibilities. (See one I put together on my userpage here.) Thoughts? Would this be a good thing or window dressing? --Moonriddengirl (talk) 15:51, 10 January 2009 (UTC)

I like the content of your sample box and it might also help to visually break up the "wall of text" at the start of the guideline, which can be off-putting for some readers. I'd say give it a try; I recently discovered an extra button called "Undo" :) Franamax (talk) 06:03, 11 January 2009 (UTC)

Proceeding with implementation

This has been around a while; perhaps it is time to expedite matters? I can't see that an RfC has been conducted on this. The image plagiarism section needs a bit of work, but it looks to me like it's about at a point to chuck it in the pool and see if it swims. :) It's certainly (in my not really humble at all opinion) much needed. Thoughts? If others feel it's ready, I'd be happy to do a bit to the image section, launch an RfC and drop it at VP again, with a hope to remove that "proposed" tag in the near future. --Moonriddengirl (talk) 15:51, 10 January 2009 (UTC)

I've tackled the image stuff somewhat and also requested feedback from User:Dcoetzee, who has (unfortunately for him) become one of my go-to guys. (He's also an admin at Commons, and so is presumably experienced with images. :)) --Moonriddengirl (talk) 14:56, 11 January 2009 (UTC)
Nice work, that was an area somewhat lacking. Some notes:
  • Do we properly define yet what is plagiarism but not a copyvio? You touch on it with the image taken from an 1825 textbook example. I imagine the scenario there would be that the image description is "I made this myself" or some such, without the addition of "by scanning page 714 from Principiae Mathematicae (Newton, Isaac ca.1643)"? Does your text make clear to the novice editor exactly where plagiarism comes into the picture? And do they actually acquire rights of any kind when they make a 2D representation of a 2D object in the PD?
  • Can you expand to touch on other media (since we no longer have "Image:"'s now, we have "File:"'s)? What is the definition of plagiarism of an audio clip? Also, even though they have an image format, should charts and diagrams be treated separately? If I copy a molecular structure or a planetary orbit diagram without specifying the paper page I was looking at, am I plagiarizing?
  • And is there any way to better separate the copyright issues from plagiarism in general, especially with an eye to shortening the text a bit? I'm not sure how that could be done, just asking. :) Franamax (talk) 22:38, 11 January 2009 (UTC)
The problem with separating copyright infringement from plagiarism for me is that they often seem to go hand in hand--particularly with images. While with text, you can copy not quite enough to create infringement but enough to create plagiarism, with images that's hard to do. The only case I could come up with of plagiarism that is not infringement is when an image is pd or otherwise free for use, but not properly credited. I suppose we could strip all the reference to licensing from the text to abbreviate it and focus only on sourcing? Any ideas you have for improving, diminishing or refining are fine with me, even if it means that merciless editing thing we're warned about when we contribute. I'm not likely to feel ruffled. :) It's all for the good of the project, and we can talk about it if I disagree. I don't know that charts & diagrams need separate treatment, although you may be thinking of reasons that I'm not. It seems to me that it's the same with a photograph: we need to know where it came from. If you don't say, you're not working within policy (whether it's plagiarism or copyright infringement). As for audio clips, I'm having trouble imagining plagiarism: maybe if they took a clip of a government speech and claimed it was their own? Do you think this needs separate handling, or should we just generalize most references to images to "non-text media" or something? Maybe Dcoetzee will have some good ideas. He helped out with the recent change to the copyright FAQ and made some good suggestions to the (probably) pending change to WP:C. :) --Moonriddengirl (talk) 23:02, 11 January 2009 (UTC)

"image" to file?

Image: to File: - a reference to the change in the formal name of the storage space, precisely because not all entries are images per se. Audio clips - lifting a clip from the National Archives and using the summary "this is my performance of The Well-tempered Clavier", I admit it's far-fetched. Charts and diagrams - if I look at a diagram of the structure of insulin published in Nature and reproduce it exactly in my own graphics program, with the same colours, shading and legend, upload it with the summary "I made this diagram of the insulin molecule" - am I plagiarising? If I do it for methane? (probably not) Rubisco? (take a look at the lead image - I'm sure ARP did generate it himself, but what if I just scanned the image from a journal and uploaded it?) Franamax (talk) 23:24, 11 January 2009 (UTC)
You'll have to bear with me on excessive detail and questioning (if you can). It's the legacy of a background in computer science and engineering, both of which are best served by examining all the possibilities and making sure there are no unexpected gaps. :) Franamax (talk) 23:28, 11 January 2009 (UTC)
Well, text is my region, so I'm by no means any kind of expert on image/file/sound issues. But plagiarism is basically claiming credit for the work of others, or as the headline of this proposed policy says, "the copying of material produced by others without attributing that material to the original author, whether verbatim or with only minimal changes." That seems to apply to diagrams and performances of Bach as well as anything else. Do you believe we need examples of non-image media files that may be plagiarised, or do you think we would be served best by just generalizing? --Moonriddengirl (talk) 23:35, 11 January 2009 (UTC)

What parts of this document are different from Wikipedia:Copyrights and Wikipedia:Citing sources? Why should another policy page be added when it is entirely covered by existing policies? —Centrxtalk • 23:43, 11 January 2009 (UTC)

This is what WP:C says about plagiarism: "Note that copyright law governs the creative expression of ideas, not the ideas or information themselves. Therefore, it is legal to read an encyclopedia article or other work, reformulate the concepts in your own words, and submit it to Wikipedia. However, it would still be unethical (but not illegal) to do so without citing the original as a reference. See plagiarism and fair use for discussions of how much reformulation is necessary in a general context." Wikipedia:Citing sources is not a policy, but a "style guideline." Policies are standards; guidelines are advisory. I tend to think that coverage in existing policy is rather skimpy. --Moonriddengirl (talk) 23:51, 11 January 2009 (UTC)
Insofar as Wikipedia:Citing sources is only a style guideline, the principle is covered under Wikipedia:Verifiability. I don't see how the quote from WP:C is relevant; it is simply deferring the issue of uncopyrighted copies to citation. I cannot find a single part of this proposed guideline that does not belong in, and almost invariably is already found in, another guideline. —Centrxtalk • 06:58, 12 January 2009 (UTC)
It seems relevant to me in response to your question. You asked how it differs from WP:C. I told you exactly what WP:C says about plagiarism. --Moonriddengirl (talk) 12:58, 12 January 2009 (UTC)
We're not in any way trying to create policy here. We're trying to create a guideline that summarizes the specific issues, offers advice, and outlines best practice. I can point to quite a few places where the issue of plagiarism is discussed, both on the AN boards and specific incident requests on this talk page. Centrx, you may feel that any issue can be dealt with through appropriate interpretation of WP:COPY and WP:CITE, but it seems as though quite a few other editors disagree. The word arises often enough that we can contemplate offering some consensus-built specific guidance to our editors, especially those less experienced in nuances and interpretation than you. We want to build a centralized resource, rather than send editors through a wiki-hunt. Franamax (talk) 00:29, 12 January 2009 (UTC)
The word is simply a dictionary definition, repeated incorrectly in the first sentence of this guideline. If this is not the creation of a new policy and is merely to be a centralized resource, it belongs at some page Wikipedia:Copying, since "plagiarism" means there was an intent to take someone else's work and pass it off as one's own, which is beside the point. An editor who innocently copies some text to improve Wikipedia, but without attribution, is not committing plagiarism, but still runs afoul of copyright or citation. —Centrxtalk • 06:58, 12 January 2009 (UTC)
Your definition of plagiarism as requiring intent is not universal...or necessarily legally defensible in the US (not that it could ever come to that here). In 1992, a Princeton student took her case to court in part over the question of intent; Princeton had withheld her degree for a year because she had plagiarized in a Spanish paper, correctly citing her source for some passages but failing to do so in others. She lost.(See [2] and plentiful other sources; the trial judge did indicate a personal opinion that Princeton's punishment was too severe, with which I heartily agree, under the circumstances.) Purdue University distinguishes between "deliberate and accidental plagiarism" here--and they aren't alone. Cf. [3] and [4]. If plagiarism requires intent, there can be no "accidental plagiarism." (See also this 2008 article, which to Princeton adds Yale, UC Berkeley, Cornell, Vanderbilt and Richmond as among those universities which do not include intent in their definition of plagiarism.) That said, I can see why you would be uncomfortable with the title of the document if you believe "plagiarism" is an accusation of bad faith, while those who don't ascribe intent don't see it that way. Evidently, the matter is somewhat contentious: see [5]. Given divergence in definition, obviously we should be clear in our terminology or in our definitions that plagiarism is not necessarily bad faith. In my experience, copyright infringement is often not committed by bad faith users, but by individuals who don't understand what they're doing wrong. --Moonriddengirl (talk) 12:58, 12 January 2009 (UTC)
In the academic context, they are zealous and will exact the same punishment for accidental copying as they would for plagiarism. This does not change the fact that the meaning of "plagiarism" includes intent, that even with the loosest interpretation still implies intent, and that otherwise the word is reduced to having no distinct meaning, since it would be covered by the much clearer "copying".
Copyright infringement requires no intent, so it is not analogous to "plagiarism". —Centrxtalk • 03:49, 17 January 2009 (UTC)
I've placed plagiarism & definition in a google book search and looked at the first four hits to come my way. "Intent" is not part of this definition, this definition, or this definition. This one doesn't offer a definition, but tabulates the various definitions that exist. I understand that by your understanding of plagiarism, intent is required, but I reiterate that this definition is not universal. And, lo, in good Wikipedia fashion, I provide WP:RS to WP:V that. :) --Moonriddengirl (talk) 19:40, 20 January 2009 (UTC)

break

I've combined addressing problems into one section. It loses detail, but hopefully provides pointers to the relevant guidelines where necessary. Centrx's question does remind me of my original purpose here, though. WP:C currently directs users to article space to find out how to revise to avoid plagiarism. This hardly seems appropriate, since article space doesn't have either the force of policy or guideline. It seems that a subsection on that might be appropriate here. --Moonriddengirl (talk) 00:21, 12 January 2009 (UTC)

Does silence on this point mean that others don't agree or that others don't want to write it? :) I'm inclined to think that such a section would be one of the primary benefits of having such a guide, since our WP:C policy explicitly doesn't answer the question. --Moonriddengirl (talk) 13:34, 12 January 2009 (UTC)
Or silence as in being asleep? ;) Do you mean a section on "How not to plagiarise?" Currently, we just have the links in Resources (on both the project and talk pages), which is certainly an obscure place, so I'd agree a section like that would be a good thing. Beyond linking to resources such as Univ. of Indiana as above, I'm not sure exactly what to say though.
Wrt Centrx's comments above, maybe we will also need a "I've been told that I'm plagiarising, what now?" section, or something to indicate that it's not the same as being accused of clubbing baby seals, since you may have done it accidentally, read our section on How not to plagiarise, &c. Franamax (talk) 17:42, 12 January 2009 (UTC)

Comments

Hey all, just looked over this page, a few comments:

  • It claims "anything you contribute to any WMF wiki" is free; the danger here is that some WMF wikis such as Wikinews use licenses incompatible with the GFDL. I'm suspicious of the claim that some Creative Commons licenses are considered compatible with the GFDL - as far as I know we've never accepted text contributions under any CC license. Is there official word on this somewhere?
  • Regarding metadata: it is implied that the primary purpose of metadata is to credit a source and that editing this metadata (which does not require a hex editor - there are tools for editing metadata, e.g. ExifTool) constitutes circumvention of access-control technology under the DMCA. This seems dubious all around to me. Contributors naturally should be permitted to revise metadata where it contains errors or may be augmented. The important thing is that it is not modified to remove or alter a valid attribution (and the DMCA isn't applicable at all). Of course, metadata should never contradict information on the image description page (if it is in fact the metadata that is in error, it should be corrected) - either this or a digital watermark, visible or invisible, is a sign that the image description page information is not correct. Another thing to check is consistency - if a user's uploads indicate different camera data for each image, it's doubtful they actually have 100 different cameras lying around.
  • It's worth emphasizing somewhere that sometimes copyright violations occur on Wikipedia that are not plagiarism; for example, if an article is copied wholesale from a copyrighted source, and a link to it is placed at the bottom. I've seen this happen a few times.
  • Another good example of a PD source used in building the English Wikipedia is the United States Census Bureau census reports, used to build the Rambot articles on U.S. towns. This is interesting to note as it's a source falling in the second category (ineligible rather than old).
  • Note that plagiarism is fundamentally a moral concern with many viewpoints; at one end of the spectrum are people who believe that authors should always have the last word on how their works are used, and at the other end are people who believe that even the requirement to give attribution stifles the circulation of free information. If this were made policy, it would be necessary to seek a wide consensus to ensure that it reflects the moral attitudes of Wikipedians at large. I like the current approach of phrasing the justification in terms of practical issues, like facilitating the consulting of sources.
  • It's worth noting that our policy is to delete images without source information (CSD F4, Category:Images with unknown source), which is generally understood to include author information.
  • On the other hand, it's also worth noting that when author information is unavailable, it need not be specified, as long as the license does not require it; it's important to specify the source so that the license can be verified, but this is a best-effort thing, if the source did not give author information neither can we.
  • In terms of discussing plagiarism with contributors who may have fallen afoul of it, I'd recommend the terms "copying without attribution" and/or "content missing attribution." They're less loaded terms and place the emphasis on the content and on the necessary corrective action (attributing it).
  • It may be worth noting somewhere that plagiarism is primarily concerned with attributing the author of a work, whereas copyright is primarily concerned with the copyright holder. If the author transferred their copyright to a publisher or a client, that doesn't eliminate the moral requirement to attribute them, nor is there any moral requirement to attribute the copyright holder (I'd go so far as to say the former copyright holder of a public domain work is completely irrelevant).

I was skeptical at first that this page would be useful, but it really does appear to address situations that other policies do not concern themselves with - in particular, how plagiarism arises on Wikipedia, and what the appropriate response is. Dcoetzee 20:32, 12 January 2009 (UTC)

Thank you for weighing in. :) (a) I'm responsible for any misunderstandings of metadata, I'm afraid, and really welcome any corrections. Until I waded in, all it said was "consistency of EXIF data." I'm still not entirely sure what that means. I've altered that section, and if you don't think it's helpful for investigating plagiarism, it can perhaps be truncated further or removed. What would one look for there to help in determining plagiarism? (b) Do you have an opinion on a good placement for the emphasis that CV may not be plagiarism? (c) With respect to the divergence of opinion, this has been touched on above in the Wikipedia talk:Plagiarism#Proceeding with implementation section (and probably sooner; I just got here). Do you have an opinion on whether this is better handled by placing this under a different title or by retaining the current title and explaining that some defs. require intent, some do not, etc? I like your suggestion about how to approach discussing the matter; those seem like good terms. (d) Good point on the diff between author and copyright holder. (e) (out of order) I know CC-By 2.0 and CC-by-SA are not compatible with text on Wikipedia per. I recently had cause to ask User:Stifle if this was the case with CC-By 3.0, and he confirmed that it is. I don't know the full scope of CC, though, so I have no idea if they have something that is compatible with text. This page used to say, "There are several CC licenses which the author can pick. Some of these do not require attribution, however Wikipedia does not recognize this aspect, we insist that all copied work is attributed." Should this simply be altered back with the word "author" changed to "media uploader" or some such? (Still dreading our probable conversion. Whole new game to learn!) --Moonriddengirl (talk) 21:01, 12 January 2009 (UTC)
No problem. :-) Regarding Creative Commons licenses, the original text was mistaken; CC does not offer any license that does not require attribution (with the exception of their Public Domain Dedication, which isn't really a CC license), and if CC-by is incompatible, it's quite likely the others are too as they all build on it. I'm on the fence about changing the page title; if I were to move it it would be to something like "Attribution," "Misattribution," or "Attributing sources." On the other hand, plagiarism is probably the first term a lot of people would think of, if not necessarily the most unambiguous or civil one. I did some editing on the metadata section, and I do feel like metadata is useful to discuss here; feel free to refine it as you please. As for where to discuss CV that isn't plagiarism, perhaps just a sentence or two in the intro section - this isn't a big deal. Dcoetzee 22:24, 12 January 2009 (UTC)
On the CC licenses, I believe that the original text is again mine [6] (which is strange, I'm used to seeing only my "the"'s and "when"'s staying on pages ;). I may have misread the "mix-and-match" bit, currently the choices are all "CC-by", which requires attribution. However, there is a set of CC licenses which don't require attribution, it's just that they were retired in 2004. If it would simplify the guideline, my text can be retired - so long as the intent in those original words is discarded. That would mean that if you do find CC-licensed material without the "by", it's fair game for copying and it's not plagiarism to fail to attribute the source. Alternatively, change to "Some retired licenses do not require attribution, however...".
The EXIF stuff has been reworded well to better indicate which clues it provides.
Any DMCA (turns to the side, spits on the ground ;) and involved discussion of copyright issues, I'm a little leery on including at length. Altering EXIF data and copying images of others while claiming them as your own to me is much better handled mostly on the copyright policy pages, where the clear legal issues can be covered. Beyond the difficulty of maintaining parallel pages on the same topic, it's much less morally fraught to just say "that's a copyvio" than it is to read this guideline first and say "you're a plagiarist". I'd much rather see more discussion on this page of scanning something from that 1825 textbook and claiming that you made it yourself, or copying an image of a molecule. (See just above, in the bit before the thread got blurred with other objections) EXIF/copyright issues are pretty well covered by the very first line on the page, "Plagiarism may also be a copyright violation..." - to me, the more we stick as closely as possible to the moral issue of plagiarism (and how to avoid it), the more clear and better the guideline. Franamax (talk) 23:53, 12 January 2009 (UTC)
Hmmm. If the primary purpose of this is to discuss plagiarism as opposed to copyright, I wonder if the section on "acceptable sources" is off-topic. It seems that the whole section could be wrapped up in a sentence like "It doesn't matter where you find information or ideas—whether it is copyrighted or free content—you should acknowledge your source." In terms of plagiarism merely, there really are no "unacceptable" sources. (In terms of WP:V, now....) As far as the EXIF stuff, really, my only point in turning that into text was to try to make clear what it meant. Since I had no clue what EXIF was when I read it, I thought it might be more helpful to readers of a guideline/policy/whatevah to specify what it is and how you check it for consistency. If I've gone off-target in some of what I included, please, yank it. --Moonriddengirl (talk) 01:48, 13 January 2009 (UTC)
Yeah, I agree, I took a stab at it. I think there are two separate topics being addressed here, one is "how do I borrow material from a free source without plagiarizing" and the other is "how do I detect and repair plagiarizing in existing content." I separated these out and also separated text and images. I also removed most of the stuff about Creative Commons; ideally we'd have a place to link describing what licenses are acceptable, but that doesn't appear to exist yet. What do you think? For what it's worth I like the EXIF section as it stands, this isn't discussing them in the context of copyright so much as plagiarism detection. Dcoetzee 05:02, 13 January 2009 (UTC)
Ok, on reviewing, I'm stuck at "Attributing media borrowed from other sources". I'm (literally) sitting here with my 1910 version of Conan Doyle's Best Books with a picture of Arthur (frontispiece) that I want to scan. We're saying here that I can't use any "self" templates - so what do I use? Keep in mind that I'm goal-oriented, I'm gonna upload it anyway, and I'm gonna be pissed off at whatever bot puts the colour-y thing on my talk page. Franamax (talk) 11:53, 13 January 2009 (UTC)[reply]
You have a 1910 Conan Doyle? Cool! Actually, I'd like to know the answer to that one. I find images hard to work with. :) Wikipedia:Upload doesn't seem to have an option for "It's really, really old." --Moonriddengirl (talk) 12:17, 13 January 2009 (UTC)[reply]
Heh - when I moved here to Kitsilano two years ago, the first thing I did was check out the local music and book shops. Two blocks and five shops away from my place, honest to God, I walked in and asked the same thing I've asked for 20 years: "Do you have The White Company by Conan Doyle?" She said, "Yes, it's over here." That was my welcome to Vancouver. :) Franamax (talk) 12:30, 13 January 2009 (UTC)
Whew - on further review, that's a pretty radical rewrite. Lots of good stuff is added, although discussion of free sources is gone, it's moved over to "borrowing" now, and - well, just too difficult to assess in one single diff. Dcoetzee, I'd be inclined to revert and ask you to make your changes section-by-section and reshuffle-by-reshuffle so it would be easier to discuss, but that would set back the work you've done. I guess I have a choice here to either sum up my to-do list and ask back some of the original people (what I was planning to do), or just shrug and walk away. In any case, I'm off for a few days and will have only occasional access. Good luck! Franamax (talk) 12:15, 13 January 2009 (UTC)
Ack! You're leaving just as I'm getting into it? Enjoy whatever you're up to. :) Obviously, I'm not as familiar with what existed as you are, but the only thing I see that's been wholesale removed is the section on acceptable sources. Do you disagree that a discussion of acceptable sourcing is off topic for addressing plagiarism? As I indicated above, I'm inclined to think that a simple note of "It doesn't matter where you find information or ideas—whether it is copyrighted or free content—you should acknowledge your source" about covers the question of acceptable sources as it applies to plagiarism. I worry that if we seem to have "instruction creep" we might contribute some confusion to the question of whether this document is necessary or redundant. (And I certainly acknowledge that my first major addition crept far into the land of copyvio....) --Moonriddengirl (talk) 12:34, 13 January 2009 (UTC)
My apologies for doing too much in one edit! I can sum up for you the gist of what I did: I created two main sections, one for creating new content based on free sources and one for repairing existing plagiarism; I eliminated most of the explanation of acceptable sources to copy text from, such as Copyright-expired works/Copyright-ineligible works, since that's a copyright concern; I moved the content "How to properly attribute public domain material" into a subsection of the new section "Attributing text borrowed from other sources"; I moved the last paragraph of this section regarding images into the new section "Attributing media borrowed from other sources", to separate discussion of text and media; I moved the first paragraph of "How to address copied text or images" regarding the EB 1911 articles into its own subsection of "Attributing text borrowed from other sources" and expanded on it a bit. If you have anything else you'd like me to do please just let me know.
As for your public domain image, the simplest thing to do is to go to the upload page, click "Other", and then add your own license tag (see Wikipedia:Image_copyright_tags/Public_domain for the full list). Most PD images are also eligible for upload on Commons; it doesn't have a nice list, but it has Commons:Category:License_tags and Commons:Licensing. Dcoetzee 20:11, 13 January 2009 (UTC)

Plagiarism, definition and intent; guidance on how to paraphrase

I have tried to address some of the concerns about the definition of plagiarism, including intent, in the proposal, here. I'm still wondering if we ought to try to define how to properly paraphrase here or if it is sufficient to provide links to universities that do so as we currently have. This, again, is raised by the fact that WP:C currently directs users to article space to find out how to revise to avoid plagiarism. This hardly seems appropriate, since article space doesn't have either the force of policy or guideline. Franamax seems to agree here that it might be useful, and I'm willing to try to tackle it if others don't think it's wandering too much. It could be a new third section, after "Attributing text borrowed from other sources". It would probably primarily be a (cited) rehash of various university guidelines, such as this one. (Oh, and I've archived this talk page because it was almost 350 kb long. I also added the "talkheader" so as to have a handy place to keep archives.) --Moonriddengirl (talk) 14:51, 13 January 2009 (UTC)

A dissertation on the meaning of "plagiarism" is not relevant to Wikipedia policy, and would serve just as well in the article plagiarism linked from Wikipedia:Copyrights. Name-calling "plagiarism" and lengthy asides do not belong in Wikipedia policy. While a page on copying may be appropriate to summarize, connect, or supplant existing policies on copyrights and citation, the current proposed policy seems to be alternatively a) redundant, at length, with those policies; and b) irrelevant to Wikipedia policy.

More specifically:

  • Almost the entire lengthy introduction is reducible to "Material on Wikipedia copied from other sources must be GFDL-compatible and must be cited."
  • Section "Plagiarism defined" is a) a folksy definition for students that is inappropriate for a Wikipedia policy; and is b) not relevant to Wikipedia policy, which to avoid confusion should be concerned only with defining disallowed actions, that is, copying.
  • Section "Attributing text borrowed from other sources" and section "Attributing media borrowed from other sources" are entirely redundant with Wikipedia:Copyrights and Wikipedia:Citing sources.
  • Section "What is not plagiarism" is downright wrong: these examples, while not copyright infringement, still should have sources; indeed, sources are especially important for the pure facts of infoboxes. This is an example of the confused purpose of this proposed policy: it worries so much about "plagiarism" that it recommends bad practice for citations merely because omitting citations would not be plagiarism! This section is, however, a good example of a pithy summary of a subordinate (proposed) policy.
  • Section "How to respond to plagiarism" has promise, if it concerned itself exclusively with copying, though I suspect it is redundant with Wikipedia:Copyright problems and some of its subpages. —Centrxtalk • 04:29, 17 January 2009 (UTC)
Addendum: To drive home that "plagiarism" is not the proper subject of this policy: Much copying and alleged copyright infringement is actually done by the creators or copyright holders themselves, such as for advertisement. Even if you subscribe to an expanded definition of "plagiarism" the copied work must still be someone else's. All the policy on citation and copying, and the how-to on identifying copies, must apply to that popular mode of copying. If you notice academic policies against a student using parts of his own previous essays, this reveals how special the schools' definition of plagiarism is: the university's "plagiarism" essentially means "any academic act that violates policy". They may wish to invent such a title, but it is a tautology that does not add constructive meaning, and Wikipedia should not import confused disparagement. —Centrxtalk • 04:39, 17 January 2009 (UTC)
(1) I agree that the introduction could be shortened. (2) I disagree that this section is irrelevant and not simply because I drafted it. :) As long as the document keeps its current title, I think it's necessary to alleviate concerns of those who feel that plagiarism is a mens rea matter. Otherwise, the very title of the document does become potentially bitey. Perhaps it could be shortened and incorporated into an abbreviated intro. (3) There is nothing in Wikipedia:Copyrights or Wikipedia:Citing sources about "Copying within Wikipedia". Other material in that section probably could be shortened with a pointer to Wikipedia:Citing sources, although I think the pd attribution templates are handy to point out. I'm inclined to think the "media" section is beautifully brief and to the point. (4) I may agree with you that the "What is not plagiarism" section overstates a bit, unless what is meant by it is that no inline citation is necessary. But citations are not required for "common knowledge." "Puppies drink milk" is common knowledge and also a simple logical deduction (mammals drink milk; puppies are mammals; puppies drink milk). I'm inclined to think that this could be condensed and included in another section (which would eliminate what you do like about it :)). (5) The closest redundancy that I can think of (and I spend most of my time at WP:CP and its various subpages) is Wikipedia:Copyright problems/Advice for admins, but information on how to evaluate for copying seems useful for all editors. I do not believe the media infringement material is redundant.
Your final note seems to be on the point that this guideline should be retitled. I don't know how others feel about that. I myself don't care what we call it, so long as it addresses what to me are the salient points: (a) unacknowledged borrowing is bad practice, (b) unacknowledged borrowing can be fixed, (c) here's how. (With, of course, subpoints of those as necessary to define/explain/expand.) --Moonriddengirl (talk) 20:50, 20 January 2009 (UTC)
  • Mens rea is irrelevant to Wikipedia policy. The policy is the same regardless of motive, and repeated infractions require the same response regardless of motive. See also Wikipedia:Assume good faith. Furthermore, this policy covers actions that do not require any mens rea.
  • The small parts of the section "Copying within Wikipedia" that are not already present there belong in Wikipedia:Copyrights or Wikipedia:Citing sources, if they belong anywhere.
  • Common knowledge requires sources just as much as anything else; knowledge considered common is often very much wrong, and any statement of common knowledge can in fact be required to have a source: if it is challenged, a source must be provided. To wit, it is even misleading to say "puppies drink milk": some puppies do not drink milk, though they can, and puppies that drink non-dog milk become sick. Sources clarify this, and leaving such a bare assertion is worse than plagiarism.
  • Evicting the facile notion of "plagiarism" and minimizing redundancy requires more than retitling the page. Unless Wikipedia:Copyrights and Wikipedia:Citing sources are also restructured, essentially all that would properly remain on this page is the how-to. —Centrx (talk) 19:54, 21 January 2009 (UTC)
You were the one who kept adding "intentional" to the definition. Mens rea seemed very important to you; perhaps I have misinterpreted that action. At this point, if I'm understanding you correctly to say that failing to source a commonsense assertion such as "Puppies drink milk" "is worse than plagiarism", I think perhaps the communication gap between us, at least, may be uncrossable. :) Perhaps an WP:RFC will help bring wider response. --Moonriddengirl (talk) 20:10, 21 January 2009 (UTC)
  • "Intentional" is accurate for the definition of "plagiarism", but irrelevant to Wikipedia policy. Either this page is about "plagiarism" and refers to intent, and is not a policy, or this page is about copying and does not refer to intent.
  • Misleading, superficial statements imported from the vague intuition of one's own "common knowledge" are indeed worse in an encyclopedia than copied text from a reliable source. —Centrx (talk) 22:46, 21 January 2009 (UTC)
Sometimes, reliable sources don't even seem to cut it. --Moonriddengirl (talk) 23:24, 21 January 2009 (UTC)
"Schaum's Quick Guide to Writing Great Research Papers" and "Guiding Students from Cheating and Plagiarism to Honesty and Integrity" are popularized how-to books for students that adopt the specialized meaning of "plagiarism" used in that industry. They are not reliable sources for the meaning in general; they are not even authoritative works within the field of academia; they are irrelevant, and even if they were not, they would not prove your assertion very well. The "Historian's Toolbox" specifically states "stealing" and "representing...as one's own", which require intent and misrepresentation. "Student Plagiarism in an Online World" is a tiny survey of student opinions, not a reliable source on the meaning of a word. —Centrx (talk) 05:46, 22 January 2009 (UTC)
In any event, it matters not: even under the broad academic meaning, "plagiarism" is not a fruitful topic for a policy. Those student handbooks refer to turning in "someone else's" work, for assignments that are under one's name, "as one's own". On Wikipedia, submissions are anonymous and appear under no one's name, and apparent copyright infringements or uncited text from a company website are prohibited even if the copier wrote the text and owns the company. —Centrx (talk) 06:05, 22 January 2009 (UTC)
Prove my assertion that the definition requiring intent is not universal? I think they do, quite handily, and if we took it to WP:RSN I suspect I'd get consensus on that. :) Proving a universal definition for a term that is utilized in many disciplines in many cultures is a bit difficult. (Here's a book that addresses at length cultural differences in defining plagiarism.) But you're right that quibbling over the definition is fruitless. --Moonriddengirl (talk) 12:35, 22 January 2009 (UTC)
To be clear, as I said above, a meaning of "plagiarism" without intent is still not productive for this page as a policy. That said, the readable part of the book you cite discusses the "concept" of plagiarism and "theft of intellectual property rights" in general; it does not appear to discuss the meaning of the actual word. As for reliable sources on the general meaning of the word (a universal definition being irrelevant), the OED and Webster are clear. In any event, I repeat, submitting your own writings is far from plagiarism, yet would be caught up under this proposed policy, and the ways of identifying "plagiarism" and of dealing with "plagiarism" on Wikipedia are the same as the ways of identifying and dealing with copyrighted and uncited works in general. —Centrx (talk) 04:53, 23 January 2009 (UTC)
Okay. Shorter OED (1993; alas, the most recent I possess) says, "plagiarize...E18. 1. v.t. Take and use as one's own (the thoughts, writings, inventions, etc., of another person); copy (literary work, ideas, etc.) improperly or without acknowledgment; pass off the thoughts, works, etc., of (another person) as one's own." Improper copying doesn't require bad intent. The book above is focused on concept, but also differing definitions, including the term "inadvertent plagiarism" as it is used in the field of psychology (Cryptomnesia). "Inadvertent plagiarism", obviously, is an oxymoron if the essential definition of plagiarism involves intent. In some definitions, it would be. In others, it would not. As far as self-plagiarism and this document, this document currently says, "Plagiarism is the taking of someone else's work and passing it off as one's own, whether verbatim or with only minimal changes" and refers to "duplicating the work of others without credit", so there already seems to be language in place intended to prevent charges of self-plagiarism. If you can identify the section that you fear will be used to prevent self-recycling, maybe we can clarify that. Of course, we do have the challenge of verifying identity. Since copyright is a legal matter, we can't take people's word for it when they say they are Dr. Imminent Authority, publisher of "Important Document", but must have external verification. Perhaps plagiarism will allow for more assumption of good faith; I don't know. The other matter--which seems to come down to overall redundancy--seems to be one on which we simply disagree. WP:C is narrowly focused on the matter of legal concerns. It doesn't care if you cite your public domain source or not. Wikipedia:Citing sources is specifically a Style guide (so it says), and in my opinion is not really the proper place to carry the load of defining academic integrity on Wikipedia. A separate document seems ideal for that to me. 
I respect that your opinion on this matter differs, but this may be, again, a matter that will require wider community input to resolve as we find out where the will of the community lies. --Moonriddengirl (talk) 12:17, 23 January 2009 (UTC)
It is not a matter of opinion. Any non-short coherent policy on this subject must necessarily cover violations that are not plagiarism. This policy ought to be designed for the concept that covers these violations, not confused with plagiarism. Whether "Important Document" is plagiarism makes no difference for identifying and correcting its uncited unvouched presence. If the purpose is an accessible summary of Wikipedia:Copyrights, or to create a policy that all text copied from elsewhere must be attributed, those purposes are not accomplished by a page on "plagiarism". —Centrx (talk) 03:47, 24 January 2009 (UTC)
So, we're back to the question of title. --Moonriddengirl (talk) 12:39, 24 January 2009 (UTC)
No, the entire page is infused with the limited concept of "plagiarism". There is even an entire section on its definition! —Centrx (talk) 21:37, 25 January 2009 (UTC)

Paraphrasing considered harmful

I'm uncomfortable with having guidelines for paraphrasing to avoid plagiarism. Here is my reasoning:

Paraphrasing is used to avoid copyright violation, not to avoid plagiarism. There is no need to paraphrase to avoid plagiarism. To avoid plagiarism, you cite your source. If you cite your source, you have not committed plagiarism. If you fail to cite your source, you are likely to be committing plagiarism and you are certainly violating Wikipedia's rules, even if you paraphrase.

Plagiarism and copyright infringement are independent: you can commit either one, or neither, or both, depending on the situation. Copyright prohibits using the creative aspects of certain works without permission: if you have permission of the copyright holder, you can copy verbatim without attribution without violating copyright (but not on Wikipedia). You can copy unattributed non-copyrighted work verbatim without breaking the law (but not on Wikipedia).

Paraphrasing is a mechanism that is intended to permit a writer to extract the non-creative portion of a copyrighted work. There is no equivalent for plagiarism: if you use a source, you either cite it or you are plagiarizing, whether or not you have "removed" the original creative element. As an example, copyright does not protect "sweat of the brow." That is, even if someone goes to a great deal of effort to compile a huge database, the database is not thereby copyrightable (in the US). However, if for instance you are a scientist and you use such data without attribution, you will be severely censured.

In my opinion, when a work is not in copyright, we pay much more respect to the original author by copying verbatim than by paraphrasing. We should paraphrase only when we are forced to do so by copyright law. -Arch dude (talk) 01:50, 21 January 2009 (UTC)

Paraphrasing also exists to more concisely summarize material, say in compilation with other sources...which is often part of what encyclopedists do. We may often be put into the position of needing to paraphrase even public domain material, unless we're going to compete with Wikisource. :) And, of course, even if you cite your sources, you can plagiarize whilst paraphrasing if you give the impression that you are summarizing when you are actually reproducing. For example, see this and this (which notes "Even with attribution, plagiarism can exist if the writer paraphrases excessively or quotes without using quotation marks.") --Moonriddengirl (talk) 02:38, 21 January 2009 (UTC)
You are correct: paraphrasing can and should be used to create a more encyclopedic presentation. But this has little to do with plagiarism: just cite the source. Your other argument is that even a cited source can result in plagiarism if the result appears to be your own work. This is relevant in an academic environment, but I do not believe that it is relevant to the Wikipedia environment. In academia, a paper is assumed to be the work of the listed author or authors. Here at WP, the article does not have listed authors, and the reader should already presume that an article is a collaborative work. Any reader interested in actual authorship will need to look at the edit history, and any editor who copies or paraphrases a PD work should attribute the work in the edit summary. Again, paraphrasing cannot mitigate plagiarism and should not be encouraged for this purpose. -Arch dude (talk) 00:11, 22 January 2009 (UTC)
I find your note about the different expectations of academia persuasive. Given that, I think that on reflection I agree: paraphrasing is more an issue for copyright on Wikipedia than plagiarism. --Moonriddengirl (talk) 00:31, 22 January 2009 (UTC)
It depends on exactly what "paraphrasing" means. A paraphrase can be plagiarism if the source isn't cited; even if cited, it should be made clear that the structure and selection of ideas are taken from elsewhere (and not only information). In fact, a close paraphrase can even be a copyright violation in certain circumstances, because creative expression isn't limited to literal wording. So we should encourage editors to express things in their own words, but not give the impression that lengthy close paraphrasing is necessarily "safe". --Amble (talk) 03:04, 21 January 2009 (UTC)
But again, your argument is about paraphrasing to avoid copyright infringement, not about any possible relationship between paraphrasing and plagiarism. My complaint is that there is no such relationship. -Arch dude (talk) 00:11, 22 January 2009 (UTC)
Shouldn't it be judged by the same standard, though? To judge the sufficient level of paraphrase to avoid plagiarism, the standard to avoid copyright infringement would be the same. (In other words, "pretend that it's copyrighted".) Otherwise, blockquote or make explicit that "this is a direct copy" in one edit, modify in the next. That is the crux for me: am I representing these words/this structure as my own? I have very few choices, a source, an edit summary, my pseudonym on the edit, so how do I indicate honestly that I'm copying stuff almost word-for-word, with a few changes? Franamax (talk) 00:50, 22 January 2009 (UTC)
No. Plagiarism and copyright violation are not mutually exclusive. When a close paraphrase preserves structure and selection of material, there is a risk of plagiarism. In some extreme cases, it may also constitute copyright violation as well. The entire point of a policy on plagiarism is that our standards are higher than the bare minimum demanded by copyright law. My point is simply that organization and choice of material matter too, not only literal wording. This is true for copyright and it's true for plagiarism policy. I think we agree that it's not helpful to present paraphrase as a blanket solution. --Amble (talk) 01:01, 22 January 2009 (UTC)
But structure and selection comprise elements of copyright too, crucially so in some cases. Minimal rewording while using the same structure is still copyvio (while also being plagio). Do you have a specific example of the distinction for a copyright work, where a modification is not a copyvio but is still a plagio?
This guideline developed largely around usage of PD works, and the grey area of GFDL/CC-BY stuff, with attention to the general issue of plagiarism (as in accusations thereof). The distinction between copyvio, plagiarism, moral rights, and whether or not Wikipedia aspires to a higher "ethical" standard than the rule of law is a bone of contention, so please do expand on your thoughts on higher standards. Those standards are important to forming consensus on this guideline. Franamax (talk) 02:29, 22 January 2009 (UTC)
Yes, copying the structure while paraphrasing the wording could be a copyvio. However, the only case I know of, Salinger v. Random House, was an extreme case concerning unpublished personal letters. Perhaps a simpler case of closely paraphrasing an encyclopedia article would also be found to be a copyvio; I don't know. The one example I know of was from an article (now deleted) in which paragraphs were constructed by stringing together sentences from a few (cited) sources, with clauses rearranged and a few words replaced with synonyms. Several editors believed that this sort of paraphrase was acceptable, but I argued (and the consensus seemed to be) that it's unacceptable plagiarism regardless of whether or not it's provably a copyright violation. My main concern is that our guidelines not encourage people to build articles in this way. --Amble (talk) 05:38, 22 January 2009 (UTC)
Wherever it may wind up, I'd also very much like to see people discouraged from this practice. :) --Moonriddengirl (talk) 12:37, 22 January 2009 (UTC)
No. Copyvio and plagiarism should NOT be judged by the same standard. We should directly copy and cite PD works explicitly to ensure that we preserve and attribute the original author's words to avoid plagiarism. If the law allowed it, we would do the same for copyrighted works, but the law does not allow it, so we are forced to paraphrase to remove the copyrightable creative elements. After material is incorporated by either of these means, we may choose to (further) paraphrase for editorial reasons to make the article better. Neither of the reasons for paraphrasing (copyvio avoidance and editorial improvement) has anything to do with plagiarism, so a discussion of paraphrasing is not appropriate for the plagiarism policy. -Arch dude (talk) 00:30, 23 January 2009 (UTC)
I think we are talking about different issues here. I have in mind use of copyrighted works where excessive close paraphrase constitutes plagiarism, but may or may not reach the level of copyright violation. From my limited knowledge, the application of copyright law to paraphrases has not been widely tested and doesn't give a clear guide. Your concerns are somewhat different, since you are discussing the use of public domain works as sources of article text. I don't disagree with your points regarding public domain works. --Amble (talk) 01:28, 23 January 2009 (UTC)
Fine. In that case, we need to make it clear that if you paraphrase to avoid copyright infringement, you must still cite your source: paraphrasing, for any reason, does not relieve you of your responsibility to cite your source. This is an even stronger reason to avoid recommending that editors paraphrase to "avoid plagiarism." Plagiarism avoidance is never a reason to paraphrase. The plagiarism policy should have a statement similar to the following: "Paraphrasing does not mitigate plagiarism and should not be used for this purpose. If you need to paraphrase for some other reason (to avoid copyright infringement or to improve the encyclopedic content or tone), you must still cite the original source." Since paraphrasing is not used to mitigate plagiarism, there should be no encouragement of the practice and no "how to paraphrase" section in this policy. If a "how to paraphrase" section is added to another policy (e.g., the style manual or the copyright violation avoidance policy), then that section should point back to the plagiarism policy to clarify the (non)relationship between paraphrasing and plagiarism. -Arch dude (talk) 04:42, 23 January 2009 (UTC)

←Just to note that there is now an essay on the subject at Wikipedia:Close paraphrasing. --Moonriddengirl (talk) 00:06, 31 January 2009 (UTC)

Best practices

The essay now contains a passage that is misleading in certain ways. It states: "Material from public domain and free sources is welcome on Wikipedia, provided it is properly identified and attributed. The best practice is to copy free content verbatim and indicate in the edit summary the source of the material. Further changes such as modernizing language and correcting errors should be done in separate edits after the original insertion of text. This allows a clear comparison to be made between the original source text and the current version in the article."

What is meant as best practice is that, in the evolution of articles relying on public domain material, it is better for text-tracking purposes, and as a favor to later editors, for any public domain text to be added in one clearly labelled chunk. That is better than pasting in PD text and, in the same edit, changing some of the wording, which makes it hard for others to separate later what came from the original source, if issues come up.

That is not at all, however, best practice in my view, and I think many would agree with me. In my view, if public domain material is going to be introduced, it is best practice to put any such passage from public domain into blockquotes or quotation marks. Then, in later edits, editors can reword material and remove it from quotation marks, when the source no longer needs/requires crediting for its wording. Proper attribution of public domain text involves giving credit both for the source of facts and ideas as is done by footnoting and for wording, which is done by quote marks or blockquoting supplemented by footnotes as to the exact pages of source material.

I argue this latter approach is "best practice" because it is far more rational for the development of articles that may ever become featured articles. Featured article standards have evolved to disallow use of public domain text covered by a generic PD template. I believe that from some interactions at FAC (though I was never involved very much there, and I am not sure how strongly ingrained this is). Also, it should be mentioned somewhere that articles built of pasted-in material are NOT eligible for DYK consideration. Pasting in is not "best practice" if you would like your work to be highlighted in DYK or FA.

Also, I assert that the "best practice" for use of Federal government material on historic sites is not to paste in public domain material, but rather to go by the second approach. On this I believe I am speaking for most wp:NRHP editors. It may be that "best practices" for use of public domain material vary across types of public domain material and across wikiprojects involved in handling those types of sources.

So I think the paragraph should be revised to keep its good idea about separating pasted-in text from subsequent edits, and it should describe more than one approach to using PD text, rather than advocating only the paste-in and free-for-all approach that it describes. Also, it should not so broadly assert that PD material is unconditionally welcome in Wikipedia. This passage has just now caused some difficulty between a new editor trying to do right, who bumped up against evolved standards on the use of Federal material on historic sites, and some experienced editors, including me, who don't want to loosen our standards. doncram (talk) 00:44, 24 February 2009 (UTC)

A recent case at DYK and wp:NRHP has made it abundantly clear that the "welcome" message serves new editors poorly. It is obvious that in at least three areas of Wikipedia (NRHP subject articles, DYK-nominated articles on any subject, and Featured Articles), pasted-in public domain material not clearly separated from other text by quotation marks or blockquotes is generally rather unwelcome. Pasted-in material from DANFS may or may not still be as welcome as it once was in wp:SHIPS articles; I believe it is no longer welcome in FAC articles from that WikiProject. Pasted-in material from eb1911 was once welcome, but it is widely believed that mistakes were made in how it was brought in, and that it is very costly later to try to weed out the pasted-in material, as many have been doing. I am also sure that there are vast areas of public domain material that are unreliable, offensive, or otherwise unencyclopedic. If there are no objections, I will edit this draft guideline to modify, considerably, the welcoming of any and all public domain material. doncram (talk) 01:25, 26 February 2009 (UTC)
I strongly disagree. I'll come back with more later, but I agree with the discussion above stating that Wikipedia is a collaborative project, and incorporating public domain works allows us to collaborate to a greater extent. If someone else has already compiled research for us, it is a waste of our time, pure and simple, to rewrite it (unless issues of reliability, style, etc., so dictate). There is a lot of public domain content on areas that are undercovered in Wikipedia. The quickest way to jumpstart our coverage is to copy-paste, and subsequent editing should get such articles into fine shape. Consideration of the reliability of the source is appropriate, of course, but this is true for any writing here. Calliopejen1 (talk) 07:00, 26 February 2009 (UTC)
doncram, that wording resulted from extensive discussions about use of PD text. Perhaps it should say "best practice for when you need to modify the PD text". There is an essential tension here: we are allowed to freely incorporate PD text into our articles (as witness the thousands of articles created with {{EB1911}}); we must accurately attribute all sources; we must correct factual errors / update with new information; and we should reword old sources for contemporary use.
A blockquote is a great way to go iff it can be maintained. Perhaps a minor correction can be inserted following the quote, as discussed in the archives. PD text can be completely rewritten over time and morph into something completely different, an example case is discussed in the archive (Anadyr River). And I can't find it in the archive right now, but another approach would be to read the PD-source, change it on the fly while typing it in, then attribute the entered text to the PD source. It's that last method that the "best practice" wording is meant to discourage. We need to indicate at some point what exactly was from the public domain. Many of the contributors here agreed that was a clear line: this was from the public domain; here is where it was changed.
There is also (or at least was recently) text indicating that blockquotes are best for verbatim copying. However we also need a best practice for how to introduce and then extensively modify PD text. It's fine to say the best way is to completely rewrite it, but we also need to be open to different editing methods - and as also noted in the archives, lots of people want to contribute, but they're not all good writers.
As far as featured content goes, while I do appreciate your concern in that area, and I would agree that no-one should get a DYK for copying 1500 words out of an old book (or should they maybe? they've created an article and attention is attracted so that it can be rapidly improved, the purpose of DYK, right?), I think those considerations might be better expressed in the FA and DYK guidelines, where they will be directly relevant. Franamax (talk) 08:05, 26 February 2009 (UTC)
No, the consensus in DYK is that no one should get credit for that. Blockquotes are specifically excluded from DYK word-counting formulae, and failing to put stuff into blockquotes that should be in blockquotes does not earn an exception. I do agree that it would be nice if there were very clear FA and DYK guidelines which could be pointed to. I will look for specific guidelines there, and I can invite regular participants from there to comment here. I will also look to clarify public guidelines in the wp:NRHP area. I don't expect you to simply take my word on it; please allow me to assert that I know from some experience in FA/FAC reviews, in DYK, and in wp:NRHP that copied-in PD text is often not welcome. My main point is that the current "PD is welcome" message is a gross overstatement of the actual state of affairs, in at least some process areas of Wikipedia (DYK and FAC), in some content areas (wp:NRHP), and with respect to some sources of public domain material (material that is offensive, unreliable, and unencyclopedic). There are new editors who come to Wikipedia and believe they can make a contribution by pasting in text that they believe is PD. It is often not helpful and is counter to practice in some areas of Wikipedia, and they should not be unduly encouraged to come and paste stuff in. In practice, it is very touchy telling a new editor that the contributions they are making are not helpful, or are problematic for purposes of building articles towards GA and FA quality. It is better for them, and for the "regular" editors in an area, not to encourage them incorrectly to believe that mixing in PD text will be appreciated. Encouraging the addition of PD text where it is not actually welcome, as this wp:plagiarism draft guideline currently does, is in fact mean to new editors. 
I'm not commenting on paraphrasing or not; I'm not saying material must be paraphrased. It is just clear that practice in some areas of Wikipedia is now that verbatim text, if used, is put into quotation marks or blockquotes and credited both for the wording and the source, not just for the source. It is, in practice, mean to some new editors to assert otherwise in this guideline. doncram (talk) 08:38, 26 February 2009 (UTC)
I (think I) know where you're coming from - and I'm not suggesting you change the DYK criteria, I was just riffing there. I can see from the point where you work that you are faced with many, and often new, people looking for recognition for contributions which turn out to be not their own. To that subject, the current wording may actually be good, since it encourages people to clearly note the exact text they have copied and thus allows you, the reviewer, to ably evaluate how well they have adapted and rephrased it. Otherwise, it seems to me the incentive would be to copy PD with enough subtle changes on entry as to make it very difficult for you to trace down the true source. So we wish instead to present a clear "correct" path to using PD text.
As regards the exact wording of "best practice", indeed it needs to be matched to the desirability of using blockquotes - but recall that bq's preclude incremental changes. It's great to say that it either has to be within an inviolable quote or completely rewritten, and I kind of agree, but that's not really compatible with the style of a wiki, which moves in fits and starts. Fundamentally, we wish to encourage addition of content and remove obstacles to adding it, so if we can draw the clearest path to adding PD works, we should do so.
Now as to editors who "believe" text is PD, that's always going to be a problem. But at the very least, if we encourage them to copy verbatim and cite the source, we can evaluate the text and look for the overlap to WP:COPYVIO.
As far as being mean to new editors, heh, look at the top of the window where it says Wikipedia - that's another word for "mean to new editors", there are dozens of mean places and FAC/DYK is not excluded. But you may be talking about those new editors who come seeking rewards such as DYKs and think there may be an easy path. There just isn't, but I'd respectfully suggest that is your problem, not mine. Put another way, the purpose of the Plagiarism essay is to address the general concept and show the path to adding PD content to en:wiki properly. This essay is not concerned with qualifying for FA/GA/DYK status - that is procedural, not encyclopedic. The vast majority of editors here, and I would suggest the large majority of editors who ever read this, are not concerned with that particular area and are instead amateurs in the true sense of the word. Franamax (talk) 09:27, 26 February 2009 (UTC)
I think it is a basic aspect of respect to mention the source and author of a work that previously contained the information being used, whether PD or not. If Wikipedia wants to be a better encyclopedia than Britannica, it would be best to follow these practices. What would this mean exactly? Well, we can work out those details later. But I think we should strive for this ideal. Ottava Rima (talk) 15:25, 26 February 2009 (UTC)
I agree with this statement. We need to get away from just adding a note at the bottom re: PD text, to actually using footnotes to explicitly say which parts come from where. When I create articles from PD sources (see Mali for an example that uses a significant amount), I use the template {{PD-notice}} (which I created) to mark footnotes where PD text has been incorporated. Calliopejen1 (talk) 20:39, 26 February 2009 (UTC)
(I haven't read this whole thread, I'm just jumping in) I know there's no legal problem with using PD text, but I have always maintained that it's still not desirable and, while we don't necessarily have to edit this guideline to say "you will burn in hell if you use PD text," we should at least not be encouraging it. For one thing, it reflects poorly on the encyclopedia: most readers don't know a lot about the difference between PD and copyrighted text, and when they look up an article and find it to be the same as something they read elsewhere will think "ah, Wikipedia sucks, bunch of plagiarizers"—indeed, the first time I encountered a WP:SHIPS article that was entirely DANFS text I almost marked it for speedy deletion because I thought it was copyvio. Secondly, even if it's not "wrong" to import PD text, there's no good reason to do it if you're article-building. I'm often struggling to find good references for an article and have a nice, meaty Footnotes section...so why throw away a reference by simply copying it, when you could instead stick it in <ref></ref> tags and beef up your references section and build a nice article out of it? rʨanaɢ talk/contribs 17:13, 26 February 2009 (UTC)
The answer to your question is the History of Cambodia series, and many similar areas of Wikipedia. Not using (as in copy-pasting) this is nearly as good as throwing it away, because no one in the near future (five years? ten years? who knows) is going to research the damn thing themselves. Where we have reliable tertiary sources in the public domain, this is a GREAT reason to use the text for article building. Calliopejen1 (talk) 20:33, 26 February 2009 (UTC)
And just to clarify, I don't have any big problem with explicitly saying PD text is not welcome in DYK (though to a certain extent that is foolish, because tracking down and formatting PD text can take TONS of time, as I know from personal experience, and is often worth rewarding) or FA. These processes should be able to set their rules however they want. My big problem is with no longer saying that PD text is welcome as a general matter. Calliopejen1 (talk) 20:37, 26 February 2009 (UTC)
I appreciate what Calliopejen1 is saying with respect to pasted-in material providing something for obscure subjects in Wikipedia. It is Calliopejen1's personal opinion, not a fact, though, that putting PD text in place for those subjects speeds the day when a better article is in place. I happen to believe that in practice, having pasted-in PD text often blocks progress. For one thing, it tends to be daunting for new editors if there is massive material in place written in a certain way. Also, it tends to engender edit wars when others wish to develop the article, perhaps by wiping the slate clean of mixed-up PD text and other writing. Also, to whatever extent others do edit the PD text mixed into an article, that work is wasted if the entire material is later wiped out by those wishing to start fresh, in which case it would seem better for the PD text not to have been put in in the first place. There are differences of taste present, and it is a matter of opinion which process of article development works best or fastest. doncram (talk) 00:15, 27 February 2009 (UTC)
Okay, I took a shot at revising the "welcome" passage and a bit more, to make it clear that pasting in text is not always welcome. I have tried to put forward a positive example of one place where public domain text has been welcomed, in ships articles using the DANFS source. I've asked one ships editor to check what i wrote. I don't want to discuss whether or not wp:ships and wp:NRHP should take the differing positions that they do about different public domain sources that are available in their areas; I mainly want to get across that there are differences and that PD text is not universally welcomed, irregardless of its quality and the status of the wikipedia articles to which it might be added. Also, I think it needs to be said that it is okay to treat public domain text like other text. I have been involved in situations where another person, adamant that PD text "can" be mixed in without violating copyright law, took the ridiculous position that the PD text cannot be quoted and footnoted like other text. It needs to be said, you can quote from PD sources and treat them just like other sources, as is done generally now i believe in wp:ships articles brought up to FA status. doncram (talk) 00:15, 27 February 2009 (UTC)[reply]

Doncram, I have some major concerns about your recent changes:

  • You've vastly increased the length of text. People just don't read that much in one shot.
  • You're hedging around the plain fact that PD text is acceptable, provided it's properly attributed. Whether or not it's relevant is an editorial decision.
  • You are leading off the section by indicating that PD must add to the existing article - but that's most often not the case, PD is more often used to start an article. Some (short) wording about "welcome if it adds significantly" would be good though.
  • You're discussing DANFS, but EB1911 is already in there as an example of generating articles.
  • You're discussing FA and DYK criteria. Those have nothing to do with a guideline on plagiarism, they have only to do with FA and DYK. Have you beefed up those guidelines so that editors interested in those achievements are aware of the requirements? The most needed here is along the lines of "use of PD text may affect article assessment". People reading this are looking for information about plagiarism, not how to get pretty stars for their userpage.
  • You're changing the message. When PD text is copied verbatim, it must be attributed, either in the edit summary or elsewhere. If it's not, that is the very definition of plagiarism. That may not have been worded emphatically enough in the existing text, but it looks to me to have been diluted further.

Please revisit your changes. I'm inclined at this point to just revert them and start over to address some of the concerns raised above. And please be aware of tl;dr - we really need to keep things concise to have an effective message. Franamax (talk) 00:23, 27 February 2009 (UTC)

Thanks Franamax for your comments. I don't know what you mean by "tl;dr", by the way. I'll respond point by point here, rather than within your comments.
    • Length. Yes, what I wrote is perhaps now too long for the purpose here. I think that providing some positive and negative examples of where PD text has and has not been welcomed in Wikipedia is important. This could be relegated to a separate article, "Wikipedia:Plagiarism/Past use of public domain text in wikipedia" perhaps? I would object to your simply removing these examples, but I do think it could be appropriate for you or someone else to go ahead and split some out to a new article, linked from here.
    • PD is acceptable? Acceptable for what purpose, where? I am trying to get away from too-broad statements that PD text is unconditionally "welcome". I don't think it is unconditionally acceptable, either, in real practice, and on various policy and editorial grounds. Editors who want to add in big blocks of PD text are coming here for guidance, and they need to be told, here, that it is not always welcome or "acceptable", although it may be legal in terms of copyright law. If you want to say that PD text is often acceptable, instead of saying it is often welcome, that is okay by me though.
    • Saying that PD text has often been used to create new articles is okay by me; that is accurate. It needs to be said that it is often not welcomed, though.
    • I didn't actually see the EB1911 mention. That is mentioned only in a section further below. At least that section mentions that big campaigns to add PD material can be controversial. I think that section should be integrated into this one. It perhaps could be said that adding any new PD material can be controversial, because it can be seen as the start of a campaign to add a lot of PD material. It is not necessary to give too many examples; others can be relegated to a separate linked article. But I think giving at least 2 examples, to convey that the PD sources vary and the acceptability of adding them varies, is really needed.
    • Discussing DYK and FA criteria is very relevant here, I think. People should not be encouraged to add in PD text without being given some idea that their additions will likely be removed, eventually or immediately, and it should be suggested that there is an immediate reason (getting DYK recognition) for putting the PD material in blockquotes or quotation marks, one acceptable treatment. This could be suggested more briefly though, I agree.
    • I did not mean to dilute the message that PD text must be attributed. Please restore the stronger language where necessary. I do think the previous version too strongly suggested adding PD material with a PD attribution template, in lieu of adding PD material using regular quoting and sourcing, which is valid and needs to be suggested as a viable alternative, one often preferred by editors in some areas.
    • I would rather you tried to work with what I added, rather than revert back to the previous version. Don't you agree that I added some legitimate, important points and relevant examples that clarify matters? I do agree it should be done with less wordiness. doncram (talk) 01:28, 27 February 2009 (UTC)
Generally, yes. :) And looking over the details of your response, yes. I've just spent the last two hours going over the last two months of changes and I have some notes that might amount to a major reorganization of sections and paragraphs, with the aim of incorporating recent views and making it all more readable. On the other hand, I could well fail in the attempt. Should I accept this mission, it would likely start with moving back beyond your changes of today, with the aim of re-incorporating the intent of your changes. Or not. Seems we're singing from the same musical score though. :) It's just that the normal editing process has not resulted yet in a coherent document. Franamax (talk) 03:26, 27 February 2009 (UTC)
Okay, go ahead and put in a complete rewrite, or edit in place, either way. I appreciate that you've received my input, thanks. doncram (talk) 04:43, 27 February 2009 (UTC)

This section is now disharmonious with the free-licensed section. Why do we have so many caveats for PD text and so few for GFDL text? Calliopejen1 (talk) 19:31, 1 March 2009 (UTC)

I agree, actually. The guideline could do with a complete rewrite. doncram (talk) 19:58, 3 March 2009 (UTC)

He's apparently added all this text to this page to retroactively win a dispute he had (which he'd already won by making me leave Wikipedia, I guess this was the coup de grace). Is this really what you people do with your lives? lol. --Miss Communication (talk) 00:33, 11 March 2009 (UTC)

Indeed my actions here are in immediate response / followup to recent interactions with Miss C, in which she invoked the "PD is welcome" message from here. I'd rather say I was taking Miss C's points about guidance given here and elsewhere seriously, for future new editors, rather than that I was trying to "win" an argument with her. doncram (talk) 02:19, 11 March 2009 (UTC)

possible development of wp:pd

Perhaps much or all of this discussion on the use of public domain material should be included in a positive Wikipedia content guideline on using public domain material, rather than under the negative label of plagiarism. And it seems onerous for wp:Plagiarism to cover all proper practices; it should be more about identifying attribution problems and how to contest or remedy them, I think. I think that wp:PD, which is labelled as a content guideline page, should carry a lot of the burden for describing proper options for public domain material use. I've opened discussion at Wikipedia talk:Public domain#where are the guidelines? towards redeveloping that page to serve this purpose. doncram (talk) 19:58, 3 March 2009 (UTC)

Creative "use of force" ...

Wikipedia articles may embed full verbatim texts only when those sources are licensed PD or GFDL. GFDL texts must be attributed to the author. But what about all other kinds of free-licensed text (mostly CC-BY and CC-BY-SA)? According to some people I met, CC-BY text cannot be copied verbatim into Wikipedia, but needs to be rearranged in order to avoid a charge of misusing the original license. Brief explanation: the Italian Army officially stated to Wikipedia that most of the content of the Army web site is licensed under CC-BY. Some Wikipedians on it.wiki do not consider texts released under the CC-BY licence fully compatible with Wikipedia's GFDL policy, because they consider that the CC-BY licence "would force" Wikipedia articles to carry CC-BY status too, which is incompatible with Wikipedia's present GFDL choice. How can we deal with this, and where can we find useful information for better assessing who is right in this dispute? --EH101 (talk) 18:43, 28 February 2009 (UTC)

Hmm, I can see arguments on both sides of this. I'm going to copy it over to WP:Media copyright questions and see if anyone there has an opinion. See Wikipedia:Media_copyright_questions#CC-BY_vs._GFDL Franamax (talk) 22:10, 28 February 2009 (UTC)

Direct translation and plagiarism

Hello, could anyone add a cautionary note about direct translation and copyrighted derivative works? Since an issue relating to plagiarism based on a direct translation from a foreign language is being discussed at ANI, I think this issue needs to be addressed on the essay or guideline page. Thanks.--Caspian blue 19:15, 5 March 2009 (UTC)

Dispatches article on plagiarism

Please see Wikipedia:Wikipedia Signpost/2009-04-13/Dispatches - the Signpost's Dispatches article on plagiarism. Carcharoth (talk) 19:30, 13 April 2009 (UTC)

Attribution templates

Useful reading material at WP:FCDW/Plagiarism.

These templates should all be deleted and this practice of wholesale copy/pasting from other sources into Wikipedia with only a notice at the bottom that the text was ripped from another source, even though that source is "public domain", should be stopped and future practice of it strongly discouraged. This is something that should probably come down from the level of the Wikimedia Foundation itself. Cirt (talk) 05:36, 15 April 2009 (UTC)

Cirt, it may be a long slog through the talk archives here, but there is extensive discussion on the wiki-historical practice of incorporating PD text and how exactly to manage the transition to mercilessly edited text. Many thousands of our articles are based on PD sources, EB1911 is an excellent example. When you advocate deleting the templates, you imply deleting the PD text also. Beyond the loss of content, this becomes impossible for well-integrated articles.
More generally, we have no policy restriction to prevent incorporating free sources. An imperfect analogy is with PD images or audio clips - we use and reuse them all the time. You can compare for instance to Durova's vast work "mercilessly editing" historic images to improve their presentation quality. She states the PD source and notes the changes made. The new work becomes part and parcel of the article. The analogous process with text is no different - we are free to incorporate PD text and improve it. However, there is a large discussion as to how exactly we go about that.
And even more generally, if you explicitly say "I copied this" - it's demonstrably not plagiarism. Franamax (talk) 08:00, 15 April 2009 (UTC)
Cirt: Why? Stifle (talk) 14:28, 15 April 2009 (UTC)
This is clearly a big issue and I am not sure myself what the best answer is or the best way to address this - I just wanted to bring this up on this page because it is a discussion the community should have, to avoid plagiarism. Essentially, read WP:FCDW/Plagiarism for more info. Perhaps it is difficult to address this with regard to the troubling situation of the many articles that copy/paste from other sources without proper attribution indicating which specific parts of the texts of those articles are copied verbatim from other sources - but maybe going forward we should cease this practice (copy/pasting verbatim text from other sources without proper attribution to each specific part of those texts) and instead develop a better way to utilize these public domain sources, so as to avoid plagiarizing them. Cirt (talk) 08:28, 16 April 2009 (UTC)
I have to note that there's already healthy dissent with that portion of the dispatch at its talk page. Although I didn't write that section, I am one of the dispatch's authors, and I myself found the question that Colin raised there pertinent. Of course, I've also advocated at that talk page altering these templates to indicate that language may have evolved to change the views of the original, but that's not a pressing plagiarism issue; at some point, I'll raise that at the templates' talk. --Moonriddengirl (talk) 11:17, 16 April 2009 (UTC)
"...altering these templates to indicate that language may have evolved to change the views of the original." I agree with that. Note that I'm firmly on the side of Wikipedians who do not want to treat free content the same as non-free content when considering charges of plagiarism. Plagiarism is the presentation of another's work as one's own; reuse of externally produced free content is fine so long as proper attribution is given. We just need to agree on what is proper. We have a goal to produce an encyclopedia that presents the sum of human knowledge. Nowhere in our goals do I see that we need to do that alone. --mav (talk) 00:49, 17 April 2009 (UTC)
My preferred solution is twofold:
  • First, I think that we should state as best practice that unquoted PD-text is inserted verbatim (or as close as possible) with a single well-indicated edit. "As close as possible" leaves open normal wiki-formatting and fixing that Elizabethan tall-f vs. "s" thing, but precludes injecting originally authored phrases in the PD insertion. Normal editing proceeds from that point, correcting inaccuracies, adding ref's, sentence-wise rewrites - but the original and exact PD copy is preserved in the history, so there can be no doubt as to the authorship. In fact, I'd favour making this an absolute requirement, but others may disagree, see for instance this (possibly tl;dr) talk thread.
  • Second, I favour a triple-barrel approach for PD attribution: 1) clearly indicate in the edit summary that you are placing PD text not written by yourself; 2) Indicate on the article talk page that you are adding PD text, with a link to the article edit and the source, either online, ISBN link, or best effort notation; 3) Place a suitable PD attribution template on the article page (I think the page bottom is just fine).
  • Third, the PD attribution templates would ideally be modified to allow inclusion of edit links to allow the casual reader to easily identify exactly which pieces of PD text were incorporated. Tracing the subsequent changes is just part of the normal wiki-hunt.
  • Fourth, I think that additions of PD text to relatively mature articles should be deprecated or strongly discouraged. Not that I've seen that happen ever, but just in case. However, when a valuable source enters the public domain and can be used to initiate a bunch of new articles, I say "heck yes!" - I'm thinking here about insects and alligators; if we can expand our coverage, why not do it?
I'll leave my "twofold" solution comment at the top, just to show why I didn't choose accountancy as a career. :) Feelings are quite strong on this and I make no claim on my preferred solution being optimal - but I think we can all find a middle ground here. Franamax (talk) 02:11, 17 April 2009 (UTC)
I agree with all of this except the 4th. Often an older PD source will be found which can add sufficiently important information even to a mature article. Often articles are created from scratch without even looking for PD text. A much greater amount of PD material is now of course becoming available on the net, and can be easily used. New PD sources are of course created continually as such things as US government publications and open access material are published. (Sources not in the public domain in the US will not become PD for many years from now, under current legislation, so we need not deal with that now.) I am concerned that a great many of our "reasonably mature" articles, even many classed as Good articles, can be greatly improved by what may be available in the PD. DGG (talk) 18:22, 19 April 2009 (UTC)
I take your point. What I was thinking of was the potential effect on structure and tone of a mature article when you drop in a chunk of text that was written in 1905. I have no problem whatsoever with adding new information, but I think extra care has to be taken to properly integrate it.
For instance, if a text on Silver maples is released by the US Forestry Service, it's not appropriate to blanket copy it into the existing article. It will largely duplicate what's already there, with a different structure and tone of writing. IMO that would actually subtract from the quality of the article. However, directly copying a section, let's say "Common pathogens of the silver maple", which is not currently in our article - yes, that would be a net benefit I guess, since it rapidly expands our article and makes the free text available for improvement through the normal editing process. I'll back off the "strongly discourage" bit in favour of "use extra care" - how's that? Franamax (talk) 19:17, 19 April 2009 (UTC)

Promotion to Guideline status

Another editor has promoted this to guideline status but I have reverted as it seems that there are too many loose ends above and no clear process of acclamation to indicate that there is consensus for this guideline in this form. Colonel Warden (talk) 07:16, 24 April 2009 (UTC)

All guidelines and policies are subject to additional changes. Nor is a 'clear process of acclamation' normally required. This has been in proposal stage for long enough, has been fairly stable, and there is minimal opposition to the notion that Wikipedia ought to include plagiarism within its formal policy and guideline structure. It's somewhat of an embarrassment to the project not to have it. Nonetheless, I will open a request for comment on the proposal. Note that formal RFC is not a requirement for guideline promotion. DurovaCharge! 18:24, 24 April 2009 (UTC)
No, but consensus is. And right now there is no consensus for the guideline in its present form. 189.105.47.84 (talk) 20:00, 24 April 2009 (UTC)
Since this is the only edit by this IP address, presumably you have a main account? DurovaCharge! 20:49, 24 April 2009 (UTC)
I fail to see the relevance of my status, but no. I just happen to have a dynamic IP. 189.105.47.84 (talk) 20:53, 24 April 2009 (UTC)
From Wikipedia:Requests_for_arbitration/Privatemusings#Sockpuppetry: Sockpuppet accounts are not to be used in discussions internal to the project, such as policy debates. Since there is no visible evidence of any other edit history outside this discussion, your declaration that the proposal lacks consensus might be ignored by uninvolved editors. DurovaCharge! 21:02, 24 April 2009 (UTC)
How convenient! Except this is not a sockpuppet account. I've been here a long time, editing with my IP address, which just happens to be dynamic (dynamic IPs are the norm in my part of the world). Also, people don't need your permission to ignore me if they so wish. If you don't want to engage me then that's your prerogative, but don't try to discredit me for no reason other than that you disagree with what I've said. You should know better. 189.105.47.84 (talk) 21:12, 24 April 2009 (UTC)
Perhaps you should consider registering for an account. There are many advantages, including establishing a reputation by which other editors may know you (for more on that, see Wikipedia:Register#Reputation and privacy.) As to consensus, the RfC will determine that. --Moonriddengirl (talk) 21:17, 24 April 2009 (UTC)
Agree with Moonriddengirl. Alternatively, you could link to prior IPs that you used to demonstrate a consistent edit history. DurovaCharge! 23:55, 24 April 2009 (UTC)

RfC

Template:RFCpolicy Promote this to guideline? DurovaCharge! 18:44, 24 April 2009 (UTC)

Reasons for

Plagiarism is not identical to copyright.

A plagiarism statement ought to have been policy years ago. This was overlooked because WP:PLAGIARISM was a redirect to other pages. First here, then here. The problem was that neither redirect defined plagiarism, and copyright is a separate concept. This page has been in proposal stage since June 2008 and was highlighted in the 13 April 2009 Wikipedia Signpost under the title "Let's get serious about plagiarism". Promotion to guideline is essential to establishing credibility for this project, and also essential for explaining proper citation requirements to plagiarists. It's high time to promote the page. DurovaCharge! 18:44, 24 April 2009 (UTC)

Reasons against

Wikipedia has survived without a policy or guideline on plagiarism for eight years. The legal requirements for submitted text and images are clearly dealt with in the WP:COPYRIGHT article. There is no concern with keeping this article on plagiarism in Wikipedia for reference but to upgrade it to a guideline is unnecessary. At around 4,500 words this proposed guideline adds further bloat to the already bloated collection of guidelines in Wikipedia. The addition of yet another guideline may discourage new editors from editing and give the incorrect impression that Wikipedia is more focused on policies and guidelines than producing a well-written encyclopedia. The Wikimedia wiki noted that it was desirable to avoid instruction creep back in 2004. — Preceding unsigned comment added by Cedars (talkcontribs) 27 April 2009 (UTC)

Everything in this proposed guideline is covered by existing policies and guidelines, specifically WP:V and WP:CITE, thus we don't need to cover the same ground. (Discussed in the talk archive here and in other places.) Included for completeness of discussion, not as my view. Franamax (talk) 02:50, 29 April 2009 (UTC)

Discussion

  • Yes, promote this to guideline. A guideline on plagiarism is long overdue; if Wikipedia is to be taken seriously as a work of scholarship, it needs to acknowledge scholarly standards. --Moonriddengirl (talk) 19:05, 24 April 2009 (UTC)[reply]
  • I agree that this should be a guideline. Plagiarized material brings disrepute on the encyclopedia.   Will Beback  talk  19:08, 24 April 2009 (UTC)[reply]
  • It should honestly be a policy, but one step at a time. Ottava Rima (talk) 19:19, 24 April 2009 (UTC)[reply]
The guideline is specifically needed to help new wikipedia contributors. In several cases I have seen new contributors very much turned off by wikipedia, upon their adding material and then being criticised for possibly plagiarizing. In the absence of a guideline, it is very confusing and dismaying for new users, who can believe they are contributing within all rules, but encounter editors (me included) asserting that at least some types of public domain text is not wanted and/or that sourcing should be done differently. It is much better for the new contributors to be able to point them politely to a guideline. And it is better to have some experienced editors focused on refining a central guideline, rather than having practice emerge out of conflicts with new users. doncram (talk) 19:13, 27 April 2009 (UTC)[reply]
  • (Scrambles for notes...) I'm not sure I'm totally comfortable with the current state of the proposal and the recent Dispatch page seemed to turn up a minor division on the issue of quoting/not quoting PD text (for which read a possible vast yawning chasm). :) No objections to making it a guideline though, it might get appropriate attention that way and be fleshed out faster. Also, a guideline might facilitate establishment of a formal forum where plagiarism concerns can be brought. It's a hugely imflammatory term, since it carries with it the suggestion of dishonesty. You can call someone a troll or an idiot and they still have their dignity. Call them dishonest though... Hence the need for central treatment. Franamax (talk) 22:05, 24 April 2009 (UTC)[reply]
Strike my lack of objection, since I'm not happy with the extant text. I'm no longer convinced that this can be furthered with more success as an existing guideline, better to improve as a proposal and resubmit.</s,a;;> Franamax (talk) 03:55, 29 April 2009 (UTC)[reply]
  • Comment: a read through the archives is well worth the time, since some pretty long-term and experienced editors had input and perspective on almost every issue that may come up in future discussions. Franamax (talk) 22:05, 24 April 2009 (UTC)[reply]
  • Wikipedia policy development typically is descriptive, rather than prescriptive. Policy and guideline documents are generated to help people respond to recurring situations (rather than reinventing the wheel each time), to codify best practices (to keep track of what works), and to allow us to maintain some consistency across the project. Plagiarism is definitely a recurring, persistent problem. Having a coherent, cohesive guideline which describes our most effective strategies for dealing with plagiarism is long overdue. Give this one the rubber stamp. TenOfAllTrades(talk) 00:29, 25 April 2009 (UTC)[reply]
  • Absolutely. It's also worth saying that Carcharoth put a fair bit of work into this at one point. But the lack of a guideline on plagiarism has been a glaring absence here. --jbmurray (talkcontribs) 00:43, 25 April 2009 (UTC)[reply]
  • Yes, this is an important step towards a sensible guideline, though only a start. Until recently, plagiarism issues have been treated badly in Wikipedia; the guidelines have been inadequate and resources offered to editors almost non existent. This has opened the way to unnecessary copyright violations, and I suspect, a significant under-body of as yet unrecognised copy violations. It has also resulted in some editors acting with a focus on and concern for the copyright issues, but treating other editors in careless and unproductive ways. --Geronimo20 (talk) 02:22, 26 April 2009 (UTC)[reply]
  • This is like having a referendum on funding children's hospitals, the kind where the voter information guides say "no contact information was provided" for the oppose side. There may be disagreements over minor details (how to fund the hospital), but essentially no one's going to argue against it. So yes, I support the proposal. Recognizance (talk) 02:24, 26 April 2009 (UTC)[reply]
  • Oppose. See above arguments. Cedars (talk) 14:08, 27 April 2009 (UTC)[reply]
  • Support. I've been unpleasantly surprised to find out how many people are unaware what plagiarism actually is and how to avoid it. Several times recently at FAC, reviewers have identified articles that were in part plagiarized from various sources - the editors writing those pages were clueless on why it was bad. We need a guideline to point to so that editors can be educated about this very serious issue. Karanacs (talk) 19:26, 27 April 2009 (UTC)[reply]
  • Wikipedia all too often displays profound intellectual laziness, if not a disturbing lack of ethics and respect for the original work of others. Expediency, efficency and technical legality are common, yet unacceptable excuses for such trespasses. To be somewhat topical, the AIG bonuses were legal - note, however, their stark departure from ethics. That Wikipedia has thus far "survived" is not relevant. That other guidelines may constitute bloat is not relevant. The proposed guideline is, frankly, not terribly well written; it is nevertheless meritorious. Editors, if any, deterred by a formal preclusion of plagiarism will not be missed by any project that values quality and intellectual propriety. Эlcobbola talk 19:54, 27 April 2009 (UTC)[reply]
Umm, cough, fishbone in my throat... The AIG bonuses were the outcome of contracts signed with persons before the current debacle, and those people performed to their contracts. I don't want to debate that issue here, but I dispute that ethics were involved per se. It's like saying that a rat is unethical for negotiating a maze. The rat should still get the reward, even if the maze has collapsed.
Other than that, I agree with your points, and we do need a framework to ensure that in fact no ethical breaches occur in our own little world. Franamax (talk) 02:42, 29 April 2009 (UTC)[reply]
The departure of ethics is not that they were paid, but that they were (in some cases and/or initially) retained by the payees. The point is that being legal is not necessarily tantamount to being ethical; the example I chose to communicate that may not be prefect, but it is sufficient to understand the spirit of the argument. Belabouring of this point is not a productive or necessary use of our time. Эlcobbola talk 02:58, 29 April 2009 (UTC)[reply]
  • Support Per Elcobbola. A text-based reference work like Wikipedia must have a plagiarism policy. Awadewit (talk) 21:27, 27 April 2009 (UTC)
  • Support upgrading it to guideline status. –Juliancolton | Talk 01:22, 28 April 2009 (UTC)
  • Support. Among the supporters here we have prolific featured article writers who have taken time out from their own article writing to ensure that plagiarized material doesn't run on Wikipedia's main page, also volunteers who have poured long days into uncovering and undoing the contributions of prolific plagiarists. These are tedious tasks that they undertake because they care about this site's ethics and credibility. Some of this site's contributors are quite young and need an introduction to the concept of plagiarism. The few who would depart in a huff rather than comply would not only not be missed--their departure would lift a burden from our best editors' shoulders and leave them more time to create featured content. Consider this RfC a palpable expression of thanks to those who have worked to eliminate plagiarism; you know who you are. :) DurovaCharge! 15:43, 28 April 2009 (UTC)
  • Support I came here a long time ago planning to help build this page, and got sidetracked (and intimidated by how much work needed to be done). Since then it has grown into a great resource. And plagiarism is one of the main things that can hurt WP's reputation, so it's critical that we have something official to address it. rʨanaɢ talk/contribs 23:29, 28 April 2009 (UTC)
  • Support. Absolutely. --Moni3 (talk) 23:40, 28 April 2009 (UTC)
  • Support Definitely needed for credibility and because we need a guideline (I would support it as policy too) on one of the most important aspects of what not to do in article writing and researching. Dabomb87 (talk) 23:42, 28 April 2009 (UTC)
  • Support: I would much rather have a solid guideline or policy in place to help a new editor get truly proficient and lose ten casual editors who might plagiarize (causing more work for the rest of us) than the other way around. I understand the concern about instruction creep, but this feels like something that can't be helped. (Does anyone affected by instruction creep really go beyond the Nutshell, anyway?) I'm as opposed to biting the newbies as anyone (as I was bitten) but on the other hand: If people really want to contribute meaningfully to the project, they need to understand how it works (as I did). Scartol • Tok 00:13, 29 April 2009 (UTC)
  • Support: We need a guideline or policy because these are tricky issues, and we can't rely on common sense. The boundaries of plagiarism are drawn according to academic tradition and practice, and not everyone is familiar with these. As for biting the newbies, it is much more helpful to point someone to some useful guidelines and help them do the right thing than to simply delete their work when they do the wrong thing. Even for someone like me who is used to academic writing, it's also quite useful, because the informal rules in my field are not really the right ones for a general encyclopedia. --Amble (talk) 00:30, 29 April 2009 (UTC)
  • oppose Not another bloody guideline! I edit some quite technical articles that involve a lot of research, and WP's policies and guidelines already total more than I read up for several research-intensive articles. --Philcha (talk) 00:37, 29 April 2009 (UTC)
    • If the only objection to this is the overall number of guidelines then perhaps it can be joined with WP:COPYVIO at some point, since they are similar concepts.   Will Beback  talk  00:49, 29 April 2009 (UTC)
      • That's just gaming the metric implied in my previous comment. I oppose the addition of words or KB to WP's already bloated corpus of policies and guidelines. --Philcha (talk) 11:45, 29 April 2009 (UTC)
  • Support. Mere copies of existing public domain material should be archived onto Wikisource where it will not be tampered with, and the Wikipedia article can use it as a cite, rather than dumping the text into Wikipedia with a few minor tweaks and pretending it is our work, and then placing it under a copyright license, which is akin to Copyfraud. Thank goodness we are getting serious about this. John Vandenberg (chat) 01:02, 29 April 2009 (UTC)
  • Support As long as people think edits like this are "fixing copyright issues" we need this as a guideline. Ruhrfisch ><>°° 01:44, 29 April 2009 (UTC)
  • Support. We need this guideline to send the clear message that appropriating the words of another, regardless of the status such as the Public Domain designation, is not what creating an encyclopedia is about. —Mattisse (Talk) 02:02, 29 April 2009 (UTC)
  • Strong Oppose Should Wikipedia have a guideline on plagiarism? Yes. Should this be it? No. I have serious concerns about the text of this proposed guideline as it presently stands. First off, the tone is heavily biased against incorporating public domain texts into the project. I don't think I need to remind you all, however, that this has been a long-standing practice, and as long as adequate attribution is provided there is nothing wrong with it. Which leads me to my second, and most important, point: the guideline conflates a lot of issues that have absolutely nothing to do with plagiarism; the whole section debating the merits of public domain sources should be removed, for instance. Finally, I'm a bit dismayed by the comments of some people here, which clearly show that they don't fully understand the distinction between copyright violation and plagiarism and might be supporting for the wrong reasons or under false assumptions. 189.105.99.200 (talk) 02:22, 29 April 2009 (UTC)
  • While I do think we need a guideline on plagiarism, I don't believe this proposal is ready. It is neither clear nor cohesive and the intro is way too large altogether. Overall it has too much material suited for Plagiarism and not enough material suited for a Wikipedia guideline.--BirgitteSB 03:28, 29 April 2009 (UTC)
I agree that there are many things wrong with it (for example, it spends far more time explaining what isn't plagiarism than what is), but I believe we can start with this as a draft. Is there anything here that you absolutely can't agree with? Awadewit (talk) 03:33, 29 April 2009 (UTC)
I really question the "media plagiarism" section. See how the term doesn't merit an article? And the "What is not plagiarism" section is a real problem for me. And most of all the lack of focus on actual guidelines on how to appropriately copy text into Wikipedia. Guidelines should not be focused on the wrong way, and this proposal is. For example, a proper Wikipedia guideline would be titled "Avoiding plagiarism". I mean we don't call it "Original Research" for a reason.--BirgitteSB 03:50, 29 April 2009 (UTC)
Without having looked into it closely, my first impression is that "media plagiarism" doesn't belong in here, and "What is not plagiarism" is unnecessary as that is essentially "what cannot be copyrighted". But taken as a whole, this page is good enough for a guideline. John Vandenberg (chat) 05:09, 29 April 2009 (UTC)
I'm really in the descriptive camp, even more so for guidelines than for policies. And compared to my expectations of a guideline, this just isn't written as one. It is hardly the end of the world if it were promoted despite the problems. But on the other hand, I believe most people here don't just want a guideline, any guideline on plagiarism. They want a truly useful guideline.--BirgitteSB 18:10, 29 April 2009 (UTC)
  • Oppose and leaning now toward a strong oppose. I've taken to heart the last two objecting editors (though I do wish 189.105 had signed in first), and I have to change my previous non-objection. I've worked on this for a while now, so I may be too close to the subject. Nevertheless, on re-reading of the current proposed guideline:
  • It is an absolute wall of text, it's way too long and meandering. This guideline is really only meant for two audiences: newish users who don't understand the concept and need concise guidance as to what is and what is not OK; and less-new users who are looking for guidance when they've spotted something iffy and need to quantify. It should be much shorter! From my experience: "it's a long story" - "put it into a small package"
  • The tone is now more aggressive than perhaps what was originally envisaged. I detect a more present-tense case, for instance "Material plagiarized...is not being properly presented" suggests this is something that just happened, whereas I would phrase "When you include material...you do not properly inform your audience" (apology if I wrote the original!).
  • Conversely, we went from "copied" to "borrowed" in headings and text. Sorry DCoetzee, but I disagree, if you borrow it, you give it back eventually, right? We're talking about "copying" material.
  • And to echo the IP above, the "Attributing"->"Public domain" section is just impenetrable for a new editor, but seems to say "it's welcome - but here's all the hundred reasons why it's not". That's just an impression, but I don't think that impression is consistent with our mission. No policy (or ethical or moral) imperative says that we can't freely use PD-text within our actual articles PROVIDED that it is properly attributed.
  • Echoing above, this guideline should very clearly draw the distinction between copyvio and plagio and it doesn't right now. Copyvio is copyvio, it already has its whole own procedures and it is (or was) noted right at the top. Copyvio shouldn't keep being discussed throughout the guideline.
  • And echoing again - we need to focus on the basics: what is plagio, how not to plagiarize, how to spot plagiarism, how to deal with it.
  • And to bring up another point that may not win me any friends, I'm dismayed by the current fixation on DYK and FA achievements. Those are wiki-internal subjects and completely irrelevant. A simple note to consult the relevant rules at those venues is sufficient. DYK/FA are important, but they can't get in the way of us adding content - they just help us to turn it into quality content.
Sorry for the length. In summary, I would propose non-promotion, and those editors who are still interested could take apart and reformulate this. Some of the incoherence is due to sporadic interest, so a sustained effort would be good. However I think that would be much more difficult should the current text be promoted as a guideline. Franamax (talk) 05:11, 29 April 2009 (UTC)
You are close to it, and I am close to it too, having modified what was an overall "PD is welcome" message to mention some of the "hundred reasons why" or places where PD is often not welcome, including the points about DYK and FA. I think that the whole issue about plagiarism is whether adequate attribution is provided or not, and there are important differences among us about what serves as adequate attribution, yet to be worked out. Plagiarism can be defined simply as situations where there is less clear attribution than is reasonably expected. And, DYK and FA articles are good examples where expectations on attribution are higher, in part because implicit claims of credit for writing by wikipedia editors (individually or as a collective) are more salient. I think the right thing to do now, though, is say, yes, this is a guideline now, and it is important to get consensus behind improving it. doncram (talk) 06:29, 29 April 2009 (UTC)
I understand what you're saying doncram, and yet: "there are important differences among us about what serves as adequate attribution, yet to be worked out" - doesn't that preclude adoption of the current text? There is near-unanimous agreement that we should have a plagiarism guideline. I'm less sure on consensus to adopt this plagiarism guideline, in that I've not seen many comments on the specific text as opposed to approval of the general principle. Nevertheless, if promotion happens now, any future changes will be judged against the "consensus" version adopted right now as the guideline. I'm always wary of the "now or never" approach that puts stones on the ground - they're devilishly hard to move around later. Franamax (talk) 07:55, 29 April 2009 (UTC)
Understood. Why don't we all conclude that the apparent consensus is indeed that there should be a guideline at least. You're a principal author of the current version, and are uncomfortable with it. I myself objected to the overly broad welcome of PD text in previous versions, and edited it to reduce that, but I recognize the current text (perhaps especially where i contributed) is not general enough or otherwise appropriate for a good guideline. I certainly accept there could be room for a good rewrite, perhaps by someone else altogether. I wonder if the authors of the Signpost article, namely Awadewit, Elcobbola, Jbmurray, Kablammo, Moonriddengirl and Tony1, could get it together to do a rewrite / proposal for a guideline. I happened to think that the Signpost article was ambitiously titled (as "Let's get serious about plagiarism"), but then it did not actually come through with any strong advocacy on what wikipedia should do about the issue. How about we ask and/or give those authors a chance to come through with a serious proposal here? doncram (talk) 09:51, 29 April 2009 (UTC)
I wasn't aware that they needed any particular invitation. Franamax (talk) 10:07, 29 April 2009 (UTC)
  • Comment: Revisions: to address some of the concerns that this proposed guideline is too long, I have done some restructuring. I did it all in one go so that it can be easily reverted and compared: [7]. I moved some of the material from the lead into a new section so that the lead would be more concise. I have restructured the "Public domain text" section, including removing several examples of how to format quotations, since a reference to the proper styleguides seems to suffice and we don't have to reinvent the wheel. I've tried to focus it more narrowly on plagiarism rather than addressing all matters that might relate to incorporating public domain text, but I have still attempted to note other concerns that had been previously raised (such as whether material is reliable or neutral). --Moonriddengirl (talk) 12:23, 29 April 2009 (UTC)
    Two goes. :) [8]. --Moonriddengirl (talk) 18:56, 29 April 2009 (UTC)
  • Comment - I think Wikipedia needs a plagiarism guideline, but this proposal still needs serious work. It contains a lot of contradictory information and is quite confusing, IMO, as to what constitutes plagiarism and what should be done about it. Kaldari (talk) 18:27, 29 April 2009 (UTC)
  • Oppose. Agree with Kaldari. Too long, too confusing. I can see merit in such a guideline, but this is not it. One specific point: The issue of how attribution affects plagiarism is not as clear as I would like, especially since attribution can be in-text attribution ("Smith has written that ..."), a footnote reference citing Smith's work, or both. If I write, for example, "Smith, writing in Rolling Stone magazine, expressed the opinion that Amanda Palmer was the most original new act he'd heard this decade, citing her idiosyncratic vocal style and eccentric dress sense", footnote ref to Rolling Stone given, then I can hardly be accused of plagiarism, can I? Or do I now have to write "vocal style" in quotation marks, and "eccentric", and "dress sense" too? Or should I reformulate Smith's opinion to the extent that it does not sound anything like what Smith wrote any more? The alternative is to only give authors' opinions in full verbatims. If we quote half a dozen commentators, that will be tedious reading. Jayen466 21:58, 30 April 2009 (UTC)
  • In a word... yes. If you use someone's exact phrasing, quotation marks are required. At least that's the way I was taught. Recognizance (talk) 05:55, 1 May 2009 (UTC)
Recognizance, of course I can reuse the entire quote in quotation marks, and that may often be a good way to go, if I want to communicate colour. But assume Smith wrote: "Her vocal style is idiosyncratic! Her dress sense eccentric! I love her! Amanda Palmer's the most original new artist I've heard in the entire decade." Would you argue that an editor who had inserted the summary I wrote above ("Smith, writing in Rolling Stone magazine, expressed the opinion that Amanda Palmer was the most original new act he'd heard this decade, citing her idiosyncratic vocal style and eccentric dress sense") should have put quotation marks around each word that also occurs in the original source? That would look like this:

Smith, writing in Rolling Stone magazine, expressed the opinion that Amanda Palmer was "the most original new" act he'd "heard" this "decade", citing her "idiosyncratic" "vocal style" and "eccentric" "dress sense".

Jayen466 07:48, 1 May 2009 (UTC)
  • As some of the editors above said, there is a difference between writing an academic essay and writing a Wikipedia article. One should be a product of your original thought. It should contribute something new. The other most definitely should not be a product of your original thought. To the contrary, WP:V requires that any material inserted by editors should be "directly supported" by the source, without the addition of any original analysis by yourself whatsoever. As such, the entire Wikipedia concept is built on what would be plagiarism in the academic context.
  • I would rather have more (and more prominent) guidance in WP on how much paraphrasing is necessary to avoid copyright infringement than a guideline on avoiding plagiarism. Any such discussion will also need to address how big a portion of the cited work has been paraphrased. For example, it is my belief, based on this posted earlier, that a close paraphrase of twenty sentences from a 300-page book is no reason for concern from a copyright point of view. (At any rate, I think close paraphrasing is far less of a problem than linking to copyright-infringing sites.)
  • It is obviously inappropriate to build a whole article on a single copyrighted source, reflecting the structure of the source in the article structure, reproducing 50% of the intellectual content and original thought in that source, and I am not advocating that. But on the other hand, consider that if we mention an author in our text and/or cite their work, that is also an advertisement for them, and exposure to a huge population of potential buyers out there. Google books today routinely shows a big part of people's books in Preview, and so does amazon. I really think that worrying about a close paraphrase of a few sentences from a book is somewhat disproportionate. Jayen466 07:38, 1 May 2009 (UTC)
the entire Wikipedia concept is built on what would be plagiarism in the academic context. That's not quite true. There's an organisation called Annual Reviews that publishes ... annual reviews of progress in various sciences (e.g. Halanych, K.M. (2004). "The new view of animal phylogeny". Annual Review of Ecology, Evolution, and Systematics 35: 229-25). I don't know its rules, but the content is a review of umpteen scientists' work, although as far as I can see it allows a little more POV than WP, in emphasis rather than pure content. --Philcha (talk)
I meant in the academic context of writing an essay, thesis etc. Jayen466 09:06, 1 May 2009 (UTC)
Good to see, at last, some common sense being injected into this debate. An antidote to the lazy and closed mindsets of some academic contributors who treat Wikipedia as though we are making submissions to Nature. Let's hope this is the start of some sanity. Thank you Jayen466 :) --Geronimo20 (talk) 09:18, 1 May 2009 (UTC)
(reindent to address this point) I actually agree with a good deal of what Jayen said in terms of the differences between academia and Wikipedia, but I still maintain that "eccentric" is an example of something you'd put in quotation marks in the example given. Recognizance (talk) 18:33, 1 May 2009 (UTC)
(unindent) I don't see the common sense in asking for more guidance on avoiding copyright violation, to the extent that is different from plagiarism, in a discussion on a guideline for plagiarism. Plagiarism is different from copyvio. But Jayen does have a good point that writing for an encyclopedia is different than writing for Nature or other academic publications. Indeed encyclopedia articles are not supposed to be original, and the implied claim of credit is lower, and the reasonable expectation of readers for providing exactness in all sourcing is lower. In an encyclopedia article, there should be relatively less footnoting and quoting, and relatively more paraphrasing. What plagiarism is, is providing less attribution than is reasonably expected for the given medium. Standards for attribution of an encyclopedia article are in fact lower than for top academic journals. The guideline should be written to cover that. P.S. The most intelligent discussion I ever read on plagiarism was: Roger Clarke, 2006, "Plagiarism by Academics: More Complex Than It Seems", Journal of the Association for Information Systems. Clarke provides scales for evaluating the seriousness of plagiarism as an offense, according to five factors: "whether the plagiarism is intentional or accidental, the nature of the new work, the extent to which originality is claimed in the new work, the nature of the incorporated material, and the nature of the attribution provided." By the nature of the new work, he means the nature of the publication, from formal refereed papers down to unpublished, informal materials. Intermediate categories would be scholarly books, textbooks, informational brochures, newspapers, trade publications, casual publications in student newspapers or email lists or blogs.
I think encyclopedia articles would be in the middle; they are a lot like textbooks, where you do not see much footnoting: it gets in the way for readers who are trying to learn, and there is little/no implicit claim of originality. doncram (talk) 14:22, 1 May 2009 (UTC)
Perhaps ironically, precisely because of its injunction against "original research," but also in part because of concerns about its lack of credentials, Wikipedia articles (at least those that it presents as its "best work" at FA) are in fact much more heavily footnoted than typical academic articles. --jbmurray (talkcontribs) 16:16, 1 May 2009 (UTC)
WP is not an encyclopedia like any other, and therefore I disagree with the notion that sourcing standards should be less exacting here.
WP disclaims originality through the WP:NOR policy. It aims to offer readers an overview of what reliable sources have written. It is important to remember that this overview is compiled by a random and self-organizing (i.e. unsupervised) set of contributors, comprising mostly minors and lay people, along with a very few genuine subject matter experts. That is why WP needs footnotes. ;) They help to demonstrate "lack of originality".
I think the demand for paraphrasing should be restricted to what is legally required to avoid copyright infringement. Editors need clear guidelines on that: the need to use quotation marks for verbatims, an explanation how much is okay to quote or paraphrase closely, etc.
Beyond that, I see value in encouraging proficient writers to paraphrase well, so we arrive at a professional result in our best work. But paraphrasing should not be demanded beyond what is legally required for copyright reasons. This will enable everyone to contribute. It will also help to maintain accuracy with those editors less adept at paraphrasing. And it takes care of cases like the above one with the music critic, where it is arguably desirable to use the words the source used. Jayen466 16:12, 1 May 2009 (UTC)
Jayen, I studied writing in graduate school where a course in relevant law was required curriculum. Our instructor was a lawyer and the textbook she wrote for our course became a minor bestseller. We were very interested in knowing the amount of paraphrasing that is legally required to avoid copyright infringement and she, being quite good at her branch of law, could not answer. There isn't a mathematical formula; if it goes to trial it's a bit haphazard how decisions come out, which is one reason wise people avoid coming close to that line. Yet that belongs in a discussion of the copyright violation policy; plagiarism isn't a legal concept. It appears your quarrel is with that policy, not with this proposal. DurovaCharge! 17:41, 1 May 2009 (UTC)
I don't think you quite follow. One of the external links in the proposed guideline is to a set of pages on a Duke University site, explaining what plagiarism is. The set includes this page here: [9]. As you can see, it says: "A paper composed mostly or entirely of paraphrases from other authors is very likely to be described as 'patchworking' (discussed later in this tutorial). Even if you have cited every paraphrase correctly, you've forgotten to include your own analysis!" Basing our idea of plagiarism on such sources just isn't quite correct, because in Wikipedia's case, such a "patchwork" is precisely what we are aiming for. We don't want editors' own analysis. If I am writing an essay for my university course, producing a "patchwork" is something to be avoided. See [10]. If I am writing a WP article, producing a "patchwork" is what I'm supposed to be doing. It is absolutely clear that copyrights must not be infringed. But the academic concept of plagiarism, on which parts of this guideline seem to be based, does not fit the context of WP. WP editors are not trying to establish reputations as independent scholars and researchers, which is what the university system is designed to produce. The rest is governed by our copyright policy. So what does this proposed guideline add? Jayen466 18:13, 1 May 2009 (UTC)
I understand quite well: you read a paper on plagiarism and asked about its legal implications. That is not a fruitful avenue of query. Avoidance of plagiarism does not require breach of WP:NOR. The Wikimedia Foundation is an educational charity; its projects aim at respectability. It would be incompatible with that mission to take credit for other people's work even if that work is in the public domain. DurovaCharge! 19:21, 1 May 2009 (UTC)
I don't recall talking about the use of PD work. Some aspects of what I was talking and thinking about were competently discussed above, under #Paraphrasing_considered_harmful, by Arch dude and Moonriddengirl. Beyond that, I am all in favour of attributing and naming sources, incl. for public domain work. We are probably talking at cross purposes, so let's just leave it there for now. Jayen466 19:36, 1 May 2009 (UTC)
  • Strong support - As per Moonriddengirl and agreeing with Ottava, it should be policy. Dougweller (talk) 17:14, 1 May 2009 (UTC)
  • Support This should be either policy or guideline. One on plagiarism is long overdue, and the Wikipedia community should be informed on what exactly plagiarism is and why it should be avoided at all times. Timmeh! 22:09, 6 May 2009 (UTC)

Problematic passages

Along with Franamax above, I find the references to FA and DYK out of place in a proposed guideline. Next, here are some examples of wordings in the existing version that seem confusing or unhelpful:

  • "In some cases, it is not necessary to cite a source or sources. For example, stating "common knowledge" may not be plagiarism (though in certain circumstances, it may be)." – It may, or may not. Wanna flip a coin? Removes confidence and certainty, rather than inspiring it.
  • "An easy way to test for plagiarism of online sources is to cut and paste passages into a search engine. Exact matches or near matches may be plagiarism." – We are clearly talking about unsourced material – otherwise there is no need to use a search engine to find the source. Copyright violation and verifiability are the primary policy issues here, and are already addressed in the relevant policies. Redundant.
  • "The names of some such programs and services for which Wikipedia has articles may be found at Category:Plagiarism detectors. Wikipedia does not endorse any of these or certify their accuracy." – Then why mention them? Too much information.
  • "It can also be useful to do a direct comparison between cited sources and text within the article, to see if text has been plagiarised, including too-close paraphrasing of the original." – If the material is attributed to cited sources, it is not plagiarism. It may still be a copyvio, based on fair-use considerations, proportion of material taken, presence or absence of quotation marks around direct quotations, etc. We already have a policy for that.
  • "An editor's reputation may also be beneficial in helping to evaluate plagiarism." – No. If multiple FA author X applies close paraphrase in service of his POV, it is okay. But if his novice POV opponents do it, then it isn't. A new gambit for content disputes: "Your stuff was plagiarism! I've deleted it."
  • "Sometimes material from a copyrighted work is copied into Wikipedia with minimal rewriting. This may still be a violation of copyright as a derivative work, and the same concerns about plagiarism would apply if the phrases, concepts and ideas in the copied material are not attributed to the original author. If the text follows closely enough on the original in structure, presentation, and phrasing to raise copyright concerns, handle it as a copyright violation. If it does not, address it as plagiarism." – Is it just me? It all seems so ... hypothetical.
  • "Direct copying of copyrighted works may be a copyright violation." – May be??

Overall, the proposed guideline says a lot that is already spelt out in WP:V, WP:NOR, WP:CS, WP:COPYVIO, etc., only it says it in a way that is much less clear. Instead of just saying, "Unsourced material is bad", it says "unsourced material is plagiarism". And yes, there is useful stuff here too, but right now the reader has to work too hard to extract it. Jayen466 21:35, 1 May 2009 (UTC)

I think the "editor's reputation" bit is a nice way of saying that newer editors sometimes don't understand how the site works. If you read something suspiciously elegant and (say) Casliber or Geogre wrote it, you'll probably save yourself a lot of time by not following it up. If the editor name shows up in red (or they've already been caught copy-pasting), it may be worth investigation and some gentle correction. Of course, if any editor is found to be plagiarising, no matter how many articlestars they have, their reputation is going to take a big hit.
At the risk of being told once again that I just don't understand, I'll say that plagiarism and copyvio are two sides of the same coin. However, one is a moral issue and one is a legal issue. This -guideline- needs to minimize its concern with copyvio. However, adequate paraphrasing is an equivalent concern in both areas. The difference is that extensive verbatim copying of PD and GFDL text is acceptable - provided that it is properly attributed. Same goes for media. We originally largely focussed on how to handle free text, but the mission may have crept along the way.
"Common knowledge" can be evaluated as plagiarism in much the same way as copyvio: copy-pasting a company address or list of directors is neither copyvio nor plagio. Same with a list of moons in the solar system. The borderline is when you copy a unique style of organization of common knowledge - say, ordered by how often each moon is blue. In that case, we would require more than a footnote if you copy-paste the "list of moons by frequency of blueness" and would want a PD-attribution template. That's my view anyway.
Generally agree with your other points. Franamax (talk) 22:13, 1 May 2009 (UTC)
Generally agree with Jayen and Franamax, except for Franamax's conclusion, which I think is too confident that adding a PD-attribution template is helpful. Why on earth not give a specific footnote attribution of the PD source for blue moons, rather than tarring the entire article with a vague suggestion that anything and everything in the article might be copy-pasted from the PD source? Original research and unresearched or poor claims may be present or may creep into the article, and seem to be supported by the too-general PD template. The indiscriminate use of PD templates lowers the quality (defined as fitness for use, citability, etc.) of wikipedia articles. No one could/should quote a wikipedia article, featured or otherwise, that has a PD source: are you quoting the collective of wikipedia editors or are you quoting an idiosyncratic yet PD source whose material was pasted into the article? Why on earth not give the specific attribution for a specific part of an article, when you know what the specific source is? This repeats some comments i have made in some previous discussions cited in this Talk page or its archives, sorry about that. So, Franamax and i, anyhow, differ about the utility of PD templates. doncram (talk) 00:26, 2 May 2009 (UTC)
There is no reason why the use of PD templates cannot be accompanied by footnote attribution. I'm supportive of a three-way attribution method myself: attribution template, footnote and edit summary note. Also, so long as the article is properly cited, I don't see how it would be too difficult to tell which parts were imported and which weren't (I do like the idea of linking to the revision that inserted the imported text in the attribution template, though). 189.105.83.163 (talk) 13:46, 2 May 2009 (UTC)
There are editors who charge other editors with "plagiarism" at the drop of an unparaphrased phrase, even though the text in question has been attributed and otherwise paraphrased. If editors are to be subjected to this serious charge, with its implications of fraud, lying and theft, then there needs to be a policy determining when it is appropriate. There should be sanctions to prevent editors carelessly using the term as a destructive instrument of abuse. --Geronimo20 (talk) 23:34, 1 May 2009 (UTC)
I am still concerned that we are using and citing definitions of plagiarism that apply to the academic arena, which teaches people to do original research. Some of these definitions fly directly in the face of what all our policies tell our editors they should do. I think if an editor researches a source and conveys what it says, in a properly attributed manner that respects the intellectual property rights of the source author, we should say "thank you" rather than laying them open to charges of "intellectual plagiarism".
The standards we should apply should be based on those used in newspaper reporting, rather than those used in academia. That is a better equivalence. I will do some research on the kind of rephrasing quality newspapers do when referring to the content of copyrighted works and post examples on the WP:Close paraphrasing talk page. Jayen466 10:24, 2 May 2009 (UTC)
Further commentary and examples of paraphrasing in the New York Times and The Independent posted here. Jayen466 19:47, 2 May 2009 (UTC)
While I'm unsure why newspapers would be more appropriate for us as a model than, say, textbooks & encyclopedias, rather than doing research by looking for examples of close paraphrasing in existing publications (granting that not all journalists, even at prominent publications, are quite in line with standards), why not look for professional publications that address close paraphrasing in journalism? Would we balance your examples with examples where it doesn't happen? --Moonriddengirl (talk) 20:06, 2 May 2009 (UTC)
As mentioned below, academic definitions of plagiarism stress the importance of demonstrating independent thought. That is not really an issue here, because of what Wikipedia is. That is why I thought that newspaper standards might be closer to our situation than academic definitions of plagiarism. But yes, professional publications addressing close paraphrasing in journalism might be useful. It also seems possible that attitudes to paraphrasing differ from country to country – one of the books we recommend has an interesting section about France which I dipped into. Another editor earlier on mentioned Annual Reviews summarising recent research – it would be interesting to see how they go about this, whether they reuse the researchers' own original expressions, or whether they paraphrase extensively, avoiding close paraphrase. The compilation of such reviews, too, parallels our work in some respects. Jayen466 21:36, 2 May 2009 (UTC)
But textbooks & encyclopedias don't, and they are, I would think, a far closer analogue to us than newspapers. Not that this might matter, if we can't find sources addressing the ethics of plagiarism in any of them. Attitudes towards paraphrasing differ not only from country to country but from discipline to discipline, which makes creating suitable guidelines a challenge. Wikipedia may be forging new ground, even among encyclopedias, textbooks & newspapers, both because of our inclusiveness (no professional code of ethics already created for us) and our lack of review. --Moonriddengirl (talk) 22:11, 2 May 2009 (UTC)
Some of our articles on popular culture (bands etc.) are more like newspaper reporting. For articles on scientific topics I agree textbooks and encyclopedias are the better model. Jayen466 20:21, 3 May 2009 (UTC)
Jayen, I agree with your example of unsuitable commentary above and just went on a search-and-destroy mission to eliminate the "patchworking" reference - but I can't find it! Can you point out the link-chain that led you to this? IMO it is completely wrong as far as our mission here goes. Franamax (talk) 20:34, 2 May 2009 (UTC)
It's from one of the resources given under Further Reading: "Duke University Libraries. "Citing Sources: Documentation Guidelines for Citing Sources and Avoiding Plagiarism". Duke University Libraries, (last modified) 2 June 2008. Web. 12 Mar. 2009. (Provides hyperlinked "Citation Guides" pertaining to the most commonly-used citation guidelines, including parenthetical referencing; includes: APA, Chicago, CSE, MLA, and Turabian style guidelines; such style guides define plagiarism and how to avoid it.)" Most of these resources given under Further Reading, from what I saw, take a similar line to this one – i.e. stressing the importance of demonstrating independent thought. Jayen466 21:18, 2 May 2009 (UTC)
  • First off, I've just wandered through the -guideline- and done some trimming, rewording &c, including some modification of MRG's previous excellent work. So the hedge has changed shape a bit and as always, review of my edits is welcome!
  • doncram, again we agree and disagree at the same time. Above or maybe elsewhere you can find my proposals that the PD-attrib templates be modified to clearly indicate the exact diff where PD-text was inserted. This is crucial to me - show which exact text was copied from a free source. The subsequent changes are down to your wiki-skills, no more or less than if you want to see how much your own or my words have changed since we originally added them. It comes down to whether you know how article history works. As an object lesson, track the genesis of what I originally wrote here and what exists now (I'm fairly confident I can claim continuing authorship for "the", "they" and "also":).
  • more to don, yes, definitely copy-pasting in free text "degrades" the quality of our articles. I have no argument with you there, there is almost no case where insertion of PD-text does not degrade articles in terms of tone or style. However that doesn't apply to articles which haven't been created yet, and I don't think it applies to stubs either. Massive expansion is part of what we do here, painful as it may be for the perfectionists (BTW, I am a confirmed perfectionist, so you may be preaching to the choir :)
  • Further, I think I detect a subtle bias toward FA/GA/DYK in your comments. I'm not sure, and I wholeheartedly agree with you in any case that the article improvement and editor recognition processes are very important to what we do here. I just don't think they should have primacy over article expansion. We're nowhere close to finished yet, all we've done so far is all the Pokemon characters. As always, I'm thinking trees and insects. I disagree with you (quelle surprise:) that a PD-attrib template "degrades" an article. Rather, it is the actual content of the article that degrades it. To me, as long as the original PD-injection is clearly identified, we can always reach the point in the editorial process where that template can be removed. That is actually one of the unresolved topics of discussion here - can you and/or when can you ever remove a PD-attrib template? We never actually got agreement on that.
  • Geronimo, yes absolutely the charge of plagiarism is fraught with implication and needs to be handled in a very sensitive way. There is some bold text in the lede (which I've lately tweaked a bit) to emphasize that point. Do you wish some even stronger text? Perhaps you could propose the specifics here, or start a new section? Franamax (talk) 02:24, 2 May 2009 (UTC)


  • I could defend either position, that you can remove a PD template when you use a more specific reference that specifically cites all the material used, and includes the date of the original publication of the material, or that it should remain permanently as a warning that some of the material may be outdated. But this applies to such things as the old EB; there are also current PD sources from the US government or the like which do not have the same objection. The one thing which is not acceptable is the use of such sources without exact indication of the material which has been copied. DGG (talk) 17:38, 3 May 2009 (UTC)
  • Support I've come across plagiarism before a few times and it would be useful to have a guideline that specifically discusses the issue. In practical terms it seems to do little or nothing to change existing policies. Some sections could be better written but that's a detail that needn't hold up its promotion to guideline. Of course it should really be a policy, but one step at a time! AndrewRT(Talk) 00:06, 10 May 2009 (UTC)
  • Support. Looks good. Is clearly helpful to beginning editors. Definitely a good idea to give it prominence. It doesn't have to be perfect to label as {{guideline}}. --SmokeyJoe (talk) 11:53, 12 May 2009 (UTC)

Guidelines on charges of plagiarism

There have been calls to clarify when charging a wikipedia editor with plagiarism is fair or not. I think this can be clarified in a guideline by following the framework for evaluating plagiarism by academics set forth by Roger Clarke in the published version "Plagiarism by Academics: More Complex Than It Seems" and in a preprint version with more weblinks, "Plagiarism by Academics - A More Complex Issue Than It Seemed". I highly recommend reading Clarke's discussion of ways in which plagiarism is culturally bound (a more Western concept), of how music and other artistic expression often include explicit copying or more veiled references that might be spoiled by heavy-handed referencing, and of more fascinating thoughts.

I am working with the basic definition that plagiarism is a state of under-attribution of a work. In simple terms, a work is plagiarized if the degree of attribution present is inadequate, relative to what is reasonably expected for the work, by most readers of the work, and/or by the typical reader of the work. We can obviously disagree on what is the right degree of attribution required, so we will often disagree on whether a work is plagiarized or not. Where we can recognize that disagreements are honest and reasonable, I think we should usually avoid calling another wikipedia editor a plagiarist. Also, I think we should usually comment on the article or the action taken ("the article is plagiarized" or "the edit taken by this editor amounts to plagiarism") rather than comment on the person (be slow to say "the editor is a plagiarist"). But I think that we should embrace the language of calling out plagiarism when we see it, where the term is apt.

Clarke sets forth that plagiarism by academics is more or less serious according to five factors. Applying those to the wikipedia setting, I think that plagiarism is more serious in wikipedia when it appears towards one end of each of five scales:

1. intentional vs. accidental: when the plagiarism is more intentional than accidental

2. salience, or nature of the work: on the high end, when the plagiarism appears in an article that is included in Wikipedia 1.0 or whatever version of wikipedia is released on CDs, or is slated to be published that way, or if it is in a Featured Article, a Good Article, or an article nominated for FA, GA, or DYK. On the low end, when an article is very rough, new, in a Userspace sandbox, or marked with {{Underconstruction}}

3. claim of credit: on the high end, when the extent to which originality is claimed is higher (as when the article is nominated for FA, GA, or DYK, or when an editor claims on her/his userpage that s/he wrote or contributed to the article). The claim of credit can be explicit or implicit.

4. nature of incorporated material: when it is higher on the scale provided by Clarke:

   1.  verbatim or near-verbatim copying of:
          * an entire work (e.g. a book, book chapter or article);
          * a substantial part of a work (e.g. a section; or the diagram, image or
             table around which an entire work revolves);
          * segments of substantial size (e.g. paragraphs);
          * segments of moderate size (e.g. sentences);
          * novel or significant segments of small size (e.g. clauses, phrases, 
             expressions, and neologisms); 
   2. copying of ideas that are highly original;
   3. paraphrasing of segments of substantial size, without new contributions;
   4. paraphrasing of segments of moderate size, without new contributions;
   5. verbatim or near-verbatim copying of unremarkable segments of small size (e.g. 
       clauses, phrases, expressions, and neologisms);
   6. paraphrasing of segments of small size, without new contributions;
   7. copying of ideas that are somewhat novel;
   8. paraphrasing of segments of substantial or moderate size, but which include 
       new contributions;
   9. copying of the structure of the document, or of the argument, or of the sequence 
       of information presentation or 'plot';
  10. copying of ideas that are long standing. 

(copied directly from Clarke, http://www.rogerclarke.com/SOS/Plag0602.html. This passage is copyrighted material of Roger Clarke, per http://www.rogerclarke.com/CNotice.html. I believe that stating this much here is fair use, but I am trying to contact Clarke about that. If not, then this might technically be copyvio, although it would not be plagiarism because it is explicitly attributed. doncram (talk) 23:19, 2 May 2009 (UTC))

5. nature of attribution provided: when the clarity of attribution is lower. For example, when there is no attribution, or when just a general PD template is present, rather than explicit footnotes following each specific idea in material copied plus use of explicit quotation marks for any creative wording.

Therefore, if I see an article involving copy-pasted material, 1) where the principal editor is experienced and known to be aware of guidelines, 2&3) where the editor is putting the article forward for DYK credit, 4) where the article includes long verbatim copied passages of distinctive, original, creative material, and 5) credit to the source is only given as just an External link, then I will call that blatant plagiarism. If the editor has repeatedly done this, despite being called out on it before, then eventually it is fair to call the editor a serial plagiarist.

On the other hand, if I see an article involving copy-pasted material where 1) the editor is a newbie, 2) the article has just been started in a sandbox, 3) the editor has not bragged about it anywhere, 4) there has been extensive reworking of the material into paraphrases which no longer include distinctive wording (except for quoted phrases), 5) there is explicit footnoting and use of quotations where appropriate, then I will not call that plagiarism. doncram (talk) 22:56, 2 May 2009 (UTC)

Accidental plagiarism in a wiki environment

The peculiar nature of the multi-editor wiki environment makes a rare form of accidental plagiarism possible. Let me see if I can lay out the steps that produce this:

  • (1) Editor A paraphrases source Z to add a paragraph or section to an article, and cites the reference. During this paraphrasing process, they present the same information in a different way, and deliberately omit some details mentioned in the source (for fear of engaging in wholesale copying from the source).
  • (2) Editor B comes along later and reads the article and reads source Z. They see that some of what is in source Z is not mentioned in the article, and so they add this information to the article, citing source Z.
  • (3) Editor C comes along and thinks that the section written mostly by editors A and B doesn't flow very well, so editor C rewrites the section to make it flow better and changes some of the words used as well, unintentionally changing the article to use some of the words used in source Z.
  • (4) Editor D comes along and reads the article and then goes to check source Z to verify what has been written in a particular section (the one jointly written over time by editors A, B and C). Editor D notices that this particular section of the article is very similar to the cited passage in source Z, and suspects plagiarism has taken place.

Has plagiarism taken place? Is this a copyright violation? Is this a problem unique to Wikipedia? What went wrong? My view is that editors B and C should have read source Z before making the changes they did, but in practice it is hard to hold people to that. I'm uncertain whether this phenomenon of articles regressing from paraphrasing back towards the wording used in the source, due to the nature of editing distributed over time and different people, is rare or not. But I believe it is certainly possible. Carcharoth (talk) 00:23, 3 May 2009 (UTC)

I think that kind of process is indeed possible. I think we should be clear that the article is plagiarized, though, however it got there, if its current state is such that it is under-attributed relative to reasonable expectations for what attribution should be. I think we have to leave editor intent out of a useful definition of whether a wikipedia article is plagiarized or not. Also, it is further irrelevant whether editors in the previous edit history were very clear with their edit labels and their original paste-in or not. The prior edit history is irrelevant. We need to be able to judge an article as it appears, comparing to the sources from which it was developed. You've described a situation where the article is plagiarized, although I agree no editor should be called a plagiarist for their role in it getting there. It is our job as a collective of editors to try to prevent such situations from arising frequently though. For example, I believe that paste-ins of PD text plus use of PD templates will often tend to lead to plagiarized article situations, while a different guideline on how to introduce PD text would work better in avoiding future plagiarized state situations. doncram (talk) 00:54, 3 May 2009 (UTC)
Possible, yes, but highly improbable and not worth worrying about. --SmokeyJoe (talk) 11:58, 12 May 2009 (UTC)

Self-plagiarism

I am new to the community although I have made a substantial number of edits and corrections over the years where I had information.

Forgive me if I transgress on the culture, but I think that one aspect of plagiarism is being overlooked.

The area is where one plagiarizes from oneself. Here is an actual example that I faced a little while ago when editing an article.

The article had very little detail about important aspects and did not even include some critical material. I had authored a book which covered some of the missing material. Investing the time to write original material to fix the article was more of a time commitment than I was willing to make at the time. However, it would take little time to just take appropriate passages from my book and include them in the article.

I considered the idea of quoting myself and, in trying to keep faith with the non-commercialization philosophy of Wikipedia, I decided against it. In a way it was a promotion of my book, which if purchased would yield me royalty money. Further, it might seem like self-aggrandizement.

I did not think that there was a copyright issue as I owned the copyright to the original book, and if I wanted to use the material and make it available subject to the Wikipedia license, that is my legal right.

However, I did use copyrighted material and did NOT give any attribution.

I believe that the policy under consideration should have a general exception where the contributor is using material which he has written and owns.

In contrast, I have also added material to articles where I have provided my own publication as source documentation. But I have reserved this for the documentation of a quotation where my material was one of the major references, if not the major reference, in the field.

As Wikipedia matures, experts and authorities who were initially very skeptical of the Wikipedia concept have altered their opinion and see the benefits as far outweighing the shortcomings. It would seem that this is a good thing. I, for example, have been paid by Encyclopaedia Britannica, Encyclopedia Americana, and the New Book of Knowledge to write "authoritative" articles. For those areas where I have some expertise, it is desirable to contribute to Wikipedia. But when I sign my name to an article, I am responsible for the accuracy of the article. Where the article is unsigned and I or some other expert might be one of many contributors to the article, there needs to be some guidance about quoting oneself.

LDEBarnard (talk) 18:08, 3 May 2009 (UTC)

With respect to using copyrighted material without giving attribution, please see Wikipedia:Donating copyrighted materials. You should provide verification of permission for the text. --Moonriddengirl (talk) 18:21, 3 May 2009 (UTC)

Plagiarism or excellent article?

Our FA on Rabindranath Tagore extensively cites a single work, Dutta & Robinson. Is this plagiarism, or is it just an excellent article, based on the most authoritative source available? Jayen466 19:22, 3 May 2009 (UTC)

If the source is cited, it is not plagiarism. There may be other problems, but not "plagiarism". --SmokeyJoe (talk) 12:00, 12 May 2009 (UTC)
Without access to the text, how is one to know? Google Books, unfortunately, doesn't offer a preview. :/ --Moonriddengirl (talk) 19:41, 3 May 2009 (UTC)
I thought there might be a potential problem of "substantial taking", independently of the quality of any paraphrasing, on any topic where there is really only one authoritative standard work. For example, this might be the one available, definitive biography of a minor historical figure, or perhaps also a book or paper by a highly-regarded theoretical physicist who is considered to have written the most authoritative work on a particular topic. For an apparent example of the former, see this GA: Hugh Trenchard, 1st Viscount Trenchard. More than two-thirds of its 150 footnotes are to Boyle 1962. Even assuming there is good paraphrasing, most of the article is clearly based on that work. By some of the academic standards we have discussed here, this is plagiarism. Jayen466 21:11, 3 May 2009 (UTC)
From a plagiarism standpoint, if the paraphrasing is adequate and attribution is good, I don't think we can have much of an issue. But given the length of this page, would you mind pointing out the specific point in this conversation that you think might make it so? :) --Moonriddengirl (talk) 21:21, 3 May 2009 (UTC)
Sure. A source you posted earlier gave a test for plagiarism (1) and a test for "substantial taking" (2). Substantial taking is one of the indicators of copyright infringement. The tests were worded as follows:

(1) You may not escape plagiarism even if you give attribution. This happens when you copy or paraphrase excessively so that the work as a whole is clearly not your own although you give attribution. You are presenting someone else’s work as your own. [...]
(2) How do you decide in practice whether there has been a ‘substantial taking’? One practical test is that your quotes or paraphrasing from a particular source should not be a substantial portion of your work or a substantial portion of the copyright work. This can be measured by quantity (number of words) or quality (relative importance in the copyright work and/or your work of the portion quoted/paraphrased).

Of course, substantial taking is just one half of the copyright infringement test. The other half is "fair use", also discussed in the cited text. I felt confident there until you pointed out to me that our licence allows any and all re-use of our texts, including commercial use. Jayen466 21:49, 3 May 2009 (UTC)
Shouldn't this be strictly a copyright test though? In my view, relying heavily on a single work is not plagiarism if you make clear what you are doing. This again brings up the theme here of the putative difference between academic plagiarism and "Wikipedia" plagiarism. Wikipedia-wise, if only one source is available to create an article, you still go ahead and create the article. If it's a copyrighted source, you get into the whole "substantial taking" and "fair use" thing - but think of it instead as basing a wiki-article on a book from 1895. Would that be plagiarism? This teases out the copyvio issues from the plagio issues. Franamax (talk) 22:16, 3 May 2009 (UTC)
I'd say using the 1895 book would be plagiarism if the book is not acknowledged. It should be cited like a normal source and, if used as a basis for long sections or indeed the entire article, acknowledged in the text. No pressing need to otherwise reformulate what it says though. E.g.: "According to the Victorian Encyclopedia of Entomology (1895), the Lesser Frisian Springtail is a unique member of the springtail family characterised by ..." Jayen466 00:49, 4 May 2009 (UTC)
(edit conflict) Ah, I see. Well, we should be clear on the distinction between copyright and plagiarism; the two may overlap, but (obviously) are not the same. Substantial similarity is a legal term with specific points of definition in copyright law. It's covered, albeit briefly, in our copyright FAQ. It isn't a test for plagiarism. (As an aside, I believe your "two tests" are conflating two separate parts of that document. Your first test isn't a test; it's drawn from page two (and is listed as "secondly.") It's just an authorial point. The actual "test" is from page 3, and it's step 1 of "TWO separate stages or tests in determining whether copyright infringement occurs.") (emphasis added).
As for copying or paraphrasing so excessively that the work is not your own even if attributed, which I suppose is where the concern arises about over-reliance on a source (and hence fear of substantial taking): I think we need to look at the standards of encyclopedias as set out by the American Historical Association, at least, in its Statement on Standards of Professional Conduct:

Of course, historical knowledge is cumulative, and thus in some contexts-such as textbooks, encyclopedia articles, broad syntheses, and certain forms of public presentation-the form of attribution, and the permissible extent of dependence on prior scholarship, citation, and other forms of attribution will differ from what is expected in more limited monographs. As knowledge is disseminated to a wide public, it loses some of its personal reference. What belongs to whom becomes less distinct. But even in textbooks a historian should acknowledge the sources of recent or distinctive findings and interpretations, those not yet a part of the common understanding of the profession.

Unlike many scholarly works, compendiums such as textbooks and encyclopedia articles are not presented as our own. Properly attributed facts are not a problem, even if over-reliance on language is. I would be uncomfortable including "distinctive findings and interpretations" without indicating where these came from, but basic facts with in-line citations seem fine.
Of course, your point 1 is quite within my own definition of plagiarism where it comes to close paraphrasing...insufficiently revising the creative elements of presentation. It may be a legal concern if the text is copyrighted, and it may also be plagiarism. It's just that I can't see a concern when it comes to sources of information...not if these are cited and (where necessary) attributed in text. --Moonriddengirl (talk) 22:35, 3 May 2009 (UTC)

(outdent) The document by Roger Clarke here, already referenced by Doncram above, includes a case study on plagiarism in textbooks. It states that "The literature search yielded a disappointing quantity and quality of guidance. The scope of most references was narrow, and very few directly addressed textbooks." It quotes the AHA document you quoted above, as well as an essay by the religious scholar, Hexham.

The Hexham quote Clarke gives may be of interest: "Many basic textbooks contain passages that come very close to plagiarism. So too do dictionaries and encyclopedia articles. In most of these cases the charge of plagiarism would be unjust because there are a limited number of way in which basic information can be conveyed in introductory textbooks and very short articles that require the author to comment on well known issues and events like the outbreak of the French Revolution, or the conversion of St. Augustine, or the philosophical definition of justice. Further, in the case of some textbooks, dictionaries, newspaper articles and similar types of work both space and the demands of editors do not allow the full acknowledgment of sources or the use of academic style references. ... It ... therefore seems necessary to distinguish between academic and other types of writing and to ask what is the reader led to believe an author is doing. If a book or thesis contains academic footnotes, is written in an academic style, and is presented as a work of original scholarship, then it must be judged as such and measured against the accepted rules for citation" (bolding added).

Clarke adds, "Given how trenchant Hexham is in his condemnation of plagiarism in scholarly work, the distinction he draws between criteria for scholarly and textbook writing is telling."

Under Exhibit 1, Clarke then gives the following guidelines:


The approach to incorporation and attribution in a textbook should:

  • avoid citations intruding into the presentation in such a manner that they detract from the primary pedagogical objective;
  • avoid not only express claims of originality, but also implied claims, and language that could mislead the intended audience into inferring that the work is original; and
  • provide ready access to works on which the author has drawn heavily.

Generally, incorporation should avoid the use of quotation marks, because these intrude too much. On the other hand, the use of verbatim, near-verbatim and close-paraphrase passages imposes yet greater expectations on the author in relation to attribution.

In the case of generic attributions to well-known authors (e.g. Piaget, von Neumann, Newton), and of well-known and well-documented quotations used in section and chapter headings (e.g. Keats, Martin Luther King), it may be reasonable to name the author, but nominate no specific work. Generally, however, attribution should be achieved through one of the following mechanisms:

  • Harvard-style citation, perhaps without page numbers. This approach adds to the length of the text, but minimizes the interruption of the flow;
  • numbered footnotes or endnotes. These have much less impact on the length of the text, but are nonetheless disturbing to the reader because of the uncertainty as to whether the note contains information of relevance, and hence as to whether the break in concentration is warranted that is involved in a diversion to the note;
  • no citation within the text, but attribution to the source in notes at the end of each chapter or the book as a whole. A refinement to this approach is to include within each endnote a key to the page number and line number in the text where the source has been used;
  • mention of the name of the author at the beginning of the relevant segment of text, or perhaps within the relevant segment of text, and inclusion of a reference at an appropriate point elsewhere in the publication;
  • mention in the Preface or Introduction of the authors and works used as sources during the preparation of the book.

Precise descriptions of all works to which attribution is given need to be provided. The alternatives are listed below, commencing with the most preferable:

  • a Further Reading, Recommended Reading, and/or Primary Sources List at the end of each chapter or section, which contains all works that were drawn on during the preparation of that segment. Particularly important references can be supplemented with annotations;
  • a single Reference List at the end of the book, which contains all works that were drawn on during the preparation of the book;
  • a Bibliography at the end of the book, which contains both works that were drawn on during the preparation of the book and works that were not.

Again, this notes that the use of close paraphrase should go hand in hand with more explicit attribution. Of course, numbered footnotes are standard here in WP, so Clarke's comments about their disturbing the reader are less relevant. In my view, any use of close paraphrase or direct quotation should be accompanied by (1) naming the author in the text, (2) adding a footnote reference to the specific work and page number at the end of the quotation or paraphrase, and (3) listing the work with full publication data among the References, if the article has a separate reference section. Clarke's suggestion not to use quotation marks for verbatim quotes seems appropriate enough for textbooks, but is inappropriate for WP, I think. WP should use quotation marks around direct quotations. Jayen466 23:40, 3 May 2009 (UTC)

Wikipedia should use quotation marks around all text copied from PD sources, and that text can never be subsequently modified? What form of "quotation mark" shall we use for media copied from PD sources and processed through image enhancement software?
This goes to the heart of the matter. Can we or can we not freely incorporate and modify text and media which have been placed in the public domain? Some say no, it must remain inviolable within quotations. Others say it can be freely modified so long as it is properly attributed. Remember, we are not talking about copyright violation here - we already have that subject covered. Franamax (talk) 00:15, 4 May 2009 (UTC)
I'm sorry, I mostly think of copyrighted sources in these discussions. I did not mean to give an opinion on how to treat PD text. If asked for one, I would say modify, rephrase and expand freely, but cite each individual sentence based on the PD source. In other words, cite it as if it were a copyrighted source, but you don't need to paraphrase or restate to prevent copyright infringement, and you don't need to place quotation marks if it is a verbatim quote from a PD source. However, try to mention the source in the text: "According to the 1911 Encyclopedia Britannica edition, ..." if it can be elegantly done. That would be my 2 cents. Jayen466 00:32, 4 May 2009 (UTC)

Clarke includes a section on "Necessary or Inherent Plagiarism":

The case study of textbook plagiarism in the previous section has relevance beyond textbooks alone. Parts of many other kinds of publications are intended "to make existing knowledge accessible" and to "address a particular market need" or "a particular audience." In particular, various sections of scholarly works such as refereed journal articles and conference papers, theses, and academic monographs, have an expository purpose, in relation to pre-existing knowledge.
The preliminary sections of many works comprise the recitation of existing bodies of theory, in order to set the stage for extensions to, criticisms of, and/or testing of, that theory. If such recitations stray too far from the words used by prior theorists, then the author of the new work would be subject to accusations of misrepresentation or at least inaccuracy. Hence it is very challenging to 'use one's own words' while being faithful to the sources. Paraphrasing and generic attributions are therefore tolerated. The context in any case implies that little or no originality is being claimed. There is accordingly tacit acceptance of practices that would otherwise be castigated as plagiarism.

Jayen466 05:49, 4 May 2009 (UTC)