Wikipedia talk:Close paraphrasing

WikiProject Essays
This page is within the scope of WikiProject Essays, a collaborative effort to organise and monitor the impact of Wikipedia essays.
This page has been rated as High-impact on the project's impact scale.

Creative expression

I have restored the Belloc examples, which seem to have been removed with little or no discussion. To simply say "the test of creativity is minimal" gives no guidance to editors. The most common forms of creativity in non-fiction are fanciful words, figures of speech, metaphors etc. Examples help, particularly one that shows that a violation of creative expression may use completely different words from the original. I propose to also add a paragraph on translation from foreign languages:

A literal translation from a foreign language is a form of paraphrase, since all the words or phrases have been replaced with equivalent English-language words or phrases. This may or may not be acceptable, depending on whether any creative expression – anything other than simple statements of fact – has been taken from the foreign language source. For example, consider two literal translations from the Turkish language:

  1. "Seen through smog, the sun appears red"
  2. "The sun looms through the haze like a red omen"

The first is a simple statement of fact and should be acceptable. The second carries over the figurative expressions "looms through" and "like a red omen", so presumably is not acceptable despite using completely different words from the original.

Comments? Aymatth2 (talk) 01:28, 20 June 2015 (UTC)

  • Since there is no objection, I will make the change. I think it is important to give examples, but any ideas on better examples would be welcome. Aymatth2 (talk) 00:06, 29 June 2015 (UTC)
    • Apologies for not seeing the note. I'm not comfortable with the example given for several reasons and have removed that bit. The first sentence contains fact, but has creativity in formulation. The facts can be expressed in many ways. One could say, "The sun appears red through smog" or "Smog changes the light filtering from the sun so that it appears red" or "Smog adds a red appearance to the sun", for instance. The spark of creativity may be minimal, but it exists. A literal translation of a single sentence is not likely to be much of a problem, but the more you translate and the more closely you translate, the more likely you are to create a copyright problem. I think more nuance is required there to avoid misleading readers of this essay. --Moonriddengirl (talk) 13:23, 29 June 2015 (UTC)
      • I agree, the example is not really addressed to the problem at hand. bd2412 T 13:38, 29 June 2015 (UTC)
        • Usually a "literal" translation is not word-by-word but includes changes to sentence structure and sequence. The French tend to put adjectives after nouns rather than before (an omen red), and the Germans to put qualifying clauses at the front, and the main verb last (Through smog seen the sun red appears). I agree there maybe should be an added caution. But examples are also really helpful. How about the following?

Translation from a foreign language is a form of paraphrase, since all the words or phrases have been replaced with equivalent English-language words or phrases. This may or may not be acceptable, depending on whether any creative expression – anything other than simple statements of fact – has been taken from the foreign language source. For example, consider two translations from the Turkish language:

  1. "Istanbul is a large city"
  2. "The sun looms through the haze like a red omen"

The first is a simple statement of fact and should be acceptable. The second carries over the figurative expressions "looms through" and "like a red omen", so presumably is not acceptable despite using completely different words from the original. But even if you only carry across statements of fact, the more you translate and the more closely you translate, the more likely you are to create a copyright problem.

@Moonriddengirl: @BD2412: Comments? Aymatth2 (talk) 15:46, 29 June 2015 (UTC)
I would be comfortable with that, Aymatth2. --Moonriddengirl (talk) 00:37, 30 June 2015 (UTC)
  • Since nobody else chimed in, I have made the above change. Aymatth2 (talk) 18:11, 7 July 2015 (UTC)

Hilaire Belloc example

But use of the phrases "indolent expression" and "undulating throat" would violate copyright. - it most certainly would not. To quote from the Wikilegal "the amount and substantiality of the portion used in relation to the copyrighted work as a whole"

Moreover, both phrases have been used before, e.g.:

I love thy mellow note,
Pealing, so beautifully, from the spray
Whereon thou sitt'st with undulating throat,
Chaunting thy matins to the dawning day.

(1824) All the best: Rich Farmbrough, 19:08, 5 November 2015 (UTC).

  • The phrase "indolent expression and undulating throat", which defines the similarity between the two types of beast, may be seen as the essence of this short work. An article using the terms could be said to have "appropriated almost verbatim the most creative and original aspects" of the work. See Wainwright Securities Inc v. Wall Street Transcript Corporation (18). We should warn editors to avoid copying any fanciful figures of speech. A lawyer friend of mine says you never know what a judge will decide. Best to err on the safe side. This is meant to be an encyclopedia. Boring. As for the thrush poem, I am generally opposed to burning books, but in this case ... Aymatth2 (talk) 23:11, 5 November 2015 (UTC)
  • "You never know what a judge will decide" is a very different animal from "would violate copyright." --Moonriddengirl (talk) 12:44, 6 November 2015 (UTC)
  • Maybe it should say "could violate copyright" then. It is exactly the sort of similarity of phrasing that typically gets jumped on at DYK, with good reason. A short phrase cannot in itself be copyrighted, but using the same uncommon phrase in the same context (e.g. description of a llama) suggests copying. The judge then decides whether there is "substantial" copying of the creative part. She will not say "anything up to 23% the same" is acceptable: the Wainwright v. Transcript case hinged on similar wording of a small but central part of the work copied. There is no way to predict the decision, but to reduce risk it is best to not copy any fanciful wording. Aymatth2 (talk) 14:18, 6 November 2015 (UTC)
  • Original: And second, he says that likely to aid comparisons this year was the surprisingly limited extent to which Fiber Divisions losses shrank last year.
  • Paraphrase: The second development likely to aid comparisons this year was the surprisingly limited extent to which the Fiber Division's losses shrank last year.
This is a run of 19 words, and only one of the taken sections. It is also notable that the judgement is somewhat flawed, describing the appellant's acts as "unprincipled chiselling" which casts doubt on the judge's unbiased interpretation of the law. Some of the obiter from that judgement are very unfortunate.
All the best: Rich Farmbrough, 21:00, 7 November 2015 (UTC).
First a red herring: There's a huge difference between "indolent expression and undulating throat" and "indolent expression" and "undulating throat" - also the context is important.
However, there is little doubt that quoting five words from a poem, even a short poem, would not constitute copyright infringement. It is established law that quotation for commentary is permitted. Though there is uncertainty over amount and proportion, there is no reason to adopt extreme measures. A good explanation of the de facto situation can be seen at When Quoting Verse, One Must Be Terse by David Orr. Notably Orr cites the "three or four line standard" as "playing it safe". The "two word" or "less than five word" standards don't even get a look in.
All the best: Rich Farmbrough, 21:00, 7 November 2015 (UTC).
  • Maybe the judge made an error in Wainwright v. Transcript, but it is a sample judgement. Other judges may make the same error. In Salinger v. Random House, Inc., an example of close paraphrasing was:
  • Original: "He looks to me like a guy who makes his wife keep a scrapbook for him"
  • Paraphrase: "[Salinger] had fingered [Wilkie] as the sort of fellow who makes his wife keep an album of press clippings."
The wording is different, but the creative concept – more than the facts – is carried across. Perhaps the Belloc example is not great. The idea is that the verse is being used as a (dubious) authority on llamas, not as a poem with an excerpt quoted for the purpose of critical commentary. The principles that the essay should convey, with examples, are:
  • Avoid reproducing fanciful wording, because that may be interpreted as violating copyright
  • Even if you change the wording, do not copy fanciful concepts, ditto.
Are there better examples that can illustrate the same principles? We are trying to warn editors to stay well on the safe side, even if they could get away with more. Better examples? Aymatth2 (talk) 00:44, 8 November 2015 (UTC)
Our article on the case states "However, the essay illustrates that a judge may be tempted to use copyright law to support an objective other than simply protecting commercial rights." - which is precisely my point above.
Importantly, though, that case is about unpublished works ("the scope of fair use is narrower with respect to unpublished works") and is about rights to expressive content not about literal reproduction.
The scope of copying again is not a dozen words but "often more than ten lines of one letter had been copied in this way".
The extensive paraphrasing was deemed (probably quite rightly) to impact on Salinger's financial interest in his unpublished works
Moreover, in 1992 the Copyright Act was amended as a result of the Salinger case to include a sentence at the end of §107 saying that the fact that a work is unpublished "shall not itself bar a finding of fair use if such finding is made upon consideration" of all four fair-use factors.
Now as to how we should advise our editors, it's a complex field and no-one is prepared to give hard and fast guidelines - quite rightly. I would suggest that we give some examples of guidance - and link to them.
Certainly we should also not shy away from our strengths. In the case of commentary on poetry, for example, the use is transformative, and it may be legitimate to quote the entire work, especially if it is short. (It is certainly accepted as legitimate to quote the title, even when that is longer than the poem.) Conversely, if we take a large amount of text from a reference work for the same purposes, it is not transformative. Fortunately there is little or no expressive content in, for example, reference biographies. In those cases we needn't worry ourselves over-much with whether "close paraphrasing" has occurred (especially if we adhere to NPOV).
As for examples:
  • In general I think the examples in Salinger v Random House make a better illustration than the Belloc one. (It would be better still to have examples that were not from an unpublished materials case - and preferably post 1992.) It should be made clear that on the one hand it took many such examples to constitute a copyright infringement, but on the other it was in the context of a book, not an article.
All the best: Rich Farmbrough, 17:03, 8 November 2015 (UTC).

Some stuff that is more relevant to direct quotes

Here are some quantitative guidelines I have seen:

  1. Anything 10 words or less, almost regardless, is going to be fine.
  2. 3 or 4 lines
  3. Up to a quarter of a short poem, 5% of a long one
  4. 200–300 words from a book

I prefer, though, the guidance from The Poetry Foundation (and The Program on Information Justice and Intellectual Property and The Center for Social Media):

The principles are all subject to a "rule of proportionality." The fair use rights of poets, teachers, scholars, and others extend to the portions of copyrighted works that they need to accomplish their goals. Thus, while in some cases fair use may extend to an entire work, in others relatively brief portions may constitute "too much." Importantly, there are no numerical rules of thumb that can be relied upon in determining whether a use is fair. Code of Best Practices in Fair Use for Poetry

- A document written in response to poets' "general sense that their ability to do their work with confidence was often impeded by institutional regulations based on very straitened interpretations of copyright."

All the best: Rich Farmbrough, 17:03, 8 November 2015 (UTC).

  • I got involved in a discussion a while ago over an article I started on El emigrante, a very short story. I nominated it for DYK and it went on the front page, then got slammed for reproducing the story in its entirety. All four words. The text of the work was deleted, but I could not resist starting an article on ¿Olvida usted algo? – ¡Ojalá!, a work of installation art with a four-word title. Later some bold editor put the text back into the El emigrante article. The first three words appear on public signs all over the place. The story is probably too short to be covered by copyright, and unlikely to go to court. But you never know. Aymatth2 (talk) 20:39, 8 November 2015 (UTC)
  • Numbers do not really work. Best to keep quotes short, and use them only when the precise wording is relevant. A quote of a public statement by a dead politician is safer than a quote from a work of fiction by a living author. Work that has not been published can legally be quoted to a limited degree in the right circumstances, but this essay just says "don't do it". With a typical article that draws facts from a source it depends on whether the judge thinks there is "substantial similarity". The shorter the source, the longer the amount copied and the closer the wording, the more likely. There is also the idea of the "essence" of the work, the core part, having much more weight. Plain and simple statements of fact are always safest. Aymatth2 (talk) 20:39, 8 November 2015 (UTC)
    • The type of material under consideration is important too. One cannot discuss poetry without quotation. "Once, as the snow of the year was beginning to fall" is very original and there is no reason to paraphrase. That is why the Belloc example is poor.
    • For other material the exact wording is sometimes important. "Never in the field of human conflict was so much owed by so many to so few"
    • In other cases still, we want to show that a source supports our statements, in footnotes. This is a widely used technique in academia. And the extent of the quotes often reaches a paragraph of 200 or more words.
    All the best: Rich Farmbrough, 14:20, 11 December 2015 (UTC).
  • The section on Wikipedia:Close paraphrasing#Quotation of non-free text could be tweaked to say that a discussion of a poem may include short quotations, since the exact words are relevant. Agreed that the Belloc example is not great: the fact that it is a poem obscures the idea that it is hypothetically being used as a source of facts. What we need is an example that illustrates a) directly copying a creative choice of words and b) copying a creative figure of speech using different words. Academics can perhaps get away with reproducing excerpts from their sources. It is safer for us to just identify the source, with a url if there is one. The reader has to assume that the cited source supports the statement. Allowing quotation of non-free content to show it supports the statements in an article opens the door to articles that are just cut-and-paste from non-free sources. Not worth the risk. Aymatth2 (talk) 14:55, 11 December 2015 (UTC)

"with or without quotation marks"

Saw this thread come across AN and was struck by what the user quotes from this page: "Limited close paraphrasing is appropriate within reason, as is quoting (with or without quotation marks)". This seems misleading. Quoting without quotation marks is almost always a bad idea. The only two exceptions I can think of off-hand are:

  1. if you're quoting something like, say, a statistic or otherwise a combination of words that does not have a "minimal degree of creativity" (or however we'd like to describe what qualifies for copyright in the first place -- though you'd still want to attribute where you got it, of course);
  2. block quotes

There may be another -- even something obvious I'm overlooking -- but I think it's highly likely this could be misunderstood as "quotation marks are optional", which they are not. I'm going to go ahead and boldly remove that parenthetical. — Rhododendrites talk \\ 13:27, 1 October 2016 (UTC)

  • I support that change to the lead. The section on Quotation of non-free text says, "With direct quotation, editors should clearly distinguish the quoted material from the original text of the article following the guidelines for quotations." There is no need to repeat those guidelines here. Aymatth2 (talk) 14:29, 1 October 2016 (UTC)
  • A case that probably does not need to be spelled out here is quoting someone who spoke in a foreign language. Putting the translation in quotation marks would be misleading, since those are not the exact words, but it may be important to be as close as possible to what they said. Thus, Marilene Ramos said BR-319 could not be treated as a conventional paved road, with no controls. This is a very close paraphrase, a literal translation from the Portuguese original, but quotation marks would be misleading. I think. Aymatth2 (talk) 00:17, 8 October 2016 (UTC)