Wikipedia talk:Copyrights

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Lowercase sigmabot III (talk | contribs) at 00:18, 26 September 2020 (Archiving 2 discussion(s) to Wikipedia talk:Copyrights/Archive 16) (bot). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


CiteSeerX copyrights and linking

Are there any concerns about linking to CiteSeerX without first checking the link and verifying the copyright status?

  1. CiteSeerX is a federally funded site run by a state-associated college
  2. {{cite xxx}} and {{citation}} include it as a parameter
  3. CiteSeerX has a DMCA link on each page
  4. Federal immunity links: you must register copyright and the feds are largely immune
  5. State immunity links: a copyright violation case brought by a photographer against a state university in the USA: Indiana 1:16-cv-02463-TWP-DML and similarly in Kentucky. And Ohio, Indiana, Florida (more elaborately, with consideration of "established state procedure to deprive of property" and due process), Michigan, Michigan again
  6. I am referring to filling in the template parameter, not linking to a PDF

AManWithNoPlan (talk) 16:21, 20 February 2019 (UTC)

  • We cannot be 100% sure that a court would clear CiteSeerX/Penn State University, but then every single thing out there on the internet is potentially subject to copyright litigation. What matters is that we cannot possibly be considered to "know or reasonably suspect" that CiteSeerX operations infringe copyright in the USA, as there is no relevant precedent warning us it could; indeed, it is a reputable and solid institution which would likely have a very strong case in court based on various precedents.

    As armchair lawyers we can make hypotheses about how it could end up on the wrong side of the law, but until something happens in the real world they're just hypotheses; in articles we would require extraordinary sources for such a WP:EXTRAORDINARY claim as the potential illegality of CiteSeerX, or a fortiori of linking to it, or even of reusing articles that link to it. Nemo 13:10, 27 February 2019 (UTC)

    • That is a somewhat one-sided and limited presentation of the issue. A more comprehensive discussion can be found at Wikipedia:Edit filter/Requested/Archive_12#CiteSeerX and Citation bot. Also pinging David Eppstein, who initially raised the issue and may wish to be aware of this discussion. However, a central point is that CiteSeerX may feel they can get away with things (are unlikely to be sued, or unlikely to lose if sued) under various academic fair use theories; but these do not apply to Wikipedia and our reusers, and our standard isn't actually a legal calculus of risk but a (more conservative) policy prohibiting linking to such material. It is long-established policy on Wikipedia that we do not link to known or probable copyright violations, which is why the argument above focuses on claiming there are no copyright violations on CiteSeerX. The above is also in the context of a bot automatically adding links to CiteSeerX, which is why the argument is focussed on establishing that adding such links does not require any human judgement and verification of copyright status (which an automated process is unable to provide). In other words, I think this (WT:C) is the right venue to discuss this issue. --Xover (talk) 06:21, 28 February 2019 (UTC)
      • That’s exactly why I started this discussion here. It’s not really a bot-related question, other than as a tangential outcome of the conclusion. Lastly, the people who hang out on the bot talk page are mostly smart and knowledgeable, but not necessarily well-versed in this specific topic. AManWithNoPlan (talk) 14:43, 28 February 2019 (UTC)
        • None of AManWithNoPlan's points/justifications/excuses actually address Wikipedia's copyright and linking policies and Wikipedia's fair use policies, which are much stricter than what is legally required. My position is that: (1) Although CiteSeerX provides other useful metadata, the primary purpose for readers of providing CiteSeerX links is to guide them to the online copies of publications hosted by CiteSeerX. (2) Because of its greater permanence, CiteSeerX is useful to include as a link even when more direct links are also available. (3) Because most CiteSeerX copies of papers are derived from author copies, author-uploaded repository copies, or publisher copies of papers (all ok to link here) and show their provenance to those copies, we can link to them and should not have a blanket ban on all CiteSeerX links. (4) A significant minority of CiteSeerX copies (maybe 10%) come from other sources, including pirate sites, stashes of related work by non-author researchers, and course reading lists by non-author instructors. (5) These other copies clearly do not respect copyright, and it is not up to us nor relevant to determine whether they are legal under fair use for CiteSeerX to include, but they also clearly do not fall within Wikipedia's more-strict-than-legally-required fair use requirements. (6) We cannot evade Wikipedia's fair use requirements by the pretext of linking to other sites with weaker fair-use requirements and letting them make the determination of whether something is fair use. (7) Therefore, we should not link to CiteSeerX pages that are of type 4 rather than type 3. (8) It is not currently feasible for a bot to distinguish between links of type 3 and of type 4. (9) Therefore, CiteSeerX links should only be added under human oversight. 
(10) If an automatic tool such as Citation bot provides the capability to add such links, it should only do so in a mode that provides a human editor with the ultimate decision over whether to include it, and that shows prominent warnings about copyright that can educate human editors over which links to include and which not. —David Eppstein (talk) 20:06, 28 February 2019 (UTC)
what an enormous waste of time. Google Scholar has all these exact same links. if we started adding the cluster parameter like this link https://scholar.google.com/scholar?cluster=17866832921325231667 to citations, would that also be cause for blocking? —Chris Capoccia (talk) 03:08, 17 March 2019 (UTC)
Ooh, the "someone else is doing it too so it must not violate our internal policy" defense. How convincing. —David Eppstein (talk) 03:11, 17 March 2019 (UTC)
No. Chris Capoccia has asked a specific question about the implications of your argument. Try again? Nemo 09:53, 17 March 2019 (UTC)
Ok, Chris asked a specific question, which appeared to be rhetorical and the answer to which should have been obvious from my previous position, but since it was apparently not obvious to you: Google scholar links arguably fail WP:ELNO #9, but let's ignore that for the purposes of this answer. Google scholar has many other purposes (searching, tracking citations, ranking scholars and publication venues, etc) but the specific purpose of the "cluster" links is to help you choose among online copies of the paper. Just like with CiteSeerX, most of those copies are legitimate but some are not. The specific cluster you link to appears to include only legitimate copies (official publisher copies and an open-access journal hub), so that would be ok. Even in the case of clusters that mix those kinds of copies with the non-copyright-respecting ones, it would still be ok. The only links I want to keep out are the ones where all copies are non-copyright-respecting. If Google scholar has clusters for which that is true (less likely because it also includes subscription-only publisher copies), I think those specific clusters should not be linked. —David Eppstein (talk) 19:22, 17 March 2019 (UTC)
Ok, so now Google Scholar is also "know[n] or reasonably suspect[ed] [...] carrying a work in violation of the creator's copyright". What about the web pages which link such Google Scholar clusters? Nemo 09:17, 18 March 2019 (UTC)
You are the one saying it is suspected of that. I have said above that all-copyvio clusters are much less likely on Google scholar than on CiteSeerX. Can you point me to even one of them? Or are we going to go farther and farther into a maze of less and less likely hypotheticals until we reach one that you declare is absurd enough to win whatever point you are trying to make? The copyvios on CiteSeerX are real. —David Eppstein (talk) 15:53, 18 March 2019 (UTC)
I have no familiarity with the very peculiar standards you use to declare that a PDF is a copyright infringement, so I suggest that you search on Google Scholar any of the titles which you declared infringing and report back on your results. Thanks, Nemo 16:46, 18 March 2019 (UTC)
I suggest that you stop assigning meaningless makework delaying-tactic tasks to other people, stop basing your arguments on wishful thinking or denial, and seriously address the question of linking to copies of publications that are neither provided by the author nor publisher. —David Eppstein (talk) 17:25, 18 March 2019 (UTC)
Thank you for your suggestion but I already did: I personally consider actual legal standards when deciding what edits are appropriate. Your personal proposed standards are unknown: we only have WP:IDONTLIKEIT arguments on certain domains or URLs you dislike. My understanding of your argument is "let's not (systematically) link any domain which might link to <certain PDF files I don't like>", in which case we would not be able to link scholar.google.com or doi.org because they have millions such links. Nemo 18:19, 18 March 2019 (UTC)
My standards are only unknown to people who are unwilling or unable to read my clear statements above, and they are not subjective. Copies of papers with clear provenance to the author, to the publisher, or to an author-uploaded preprint archive are ok. Links to web scrapers like CiteSeerX that provide a provenance to one of those things are also ok, even if the same cluster of copies also has others mixed in. Everything else is not. —David Eppstein (talk) 19:59, 18 March 2019 (UTC)
normal people do thousands of edits with this bot and now it's shut down over some stupid argument about what one parameter might facilitate when google facilitates the exact same thing. —Chris Capoccia (talk) 14:15, 17 March 2019 (UTC)
alternatively, there are citations like this one where the publisher gives the full pdf at their site accessible through the DOI, but the CiteSeerX links are all broken :D —Chris Capoccia (talk) 22:36, 17 March 2019 (UTC)

DeStefano, Frank; Price, Cristofer S.; Weintraub, Eric S. (August 2013). "Increasing Exposure to Antibody-Stimulating Proteins and Polysaccharides in Vaccines Is Not Associated with Risk of Autism". The Journal of Pediatrics. 163 (2): 561–567. CiteSeerX 10.1.1.371.2592. doi:10.1016/j.jpeds.2013.02.001. PMID 23545349.
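David Eppstein's criterion above says a cluster of copies may be linked if at least one copy traces back to the author, the publisher, or an author-uploaded preprint archive. A cluster in which no copy has such provenance should not be linked. The rule can be sketched as a small decision function. This is an illustrative Python sketch and is not part of Citation bot, OABOT, or any existing tool. `Provenance`, `Copy`, and `cluster_is_linkable` are hypothetical names. The sketch also assumes that a human reviewer has already labelled each copy's provenance, because the discussion agrees a bot cannot currently make that call.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable


class Provenance(Enum):
    """Hypothetical labels for where a hosted copy came from (assigned by a human reviewer)."""
    AUTHOR = auto()            # author's own web page
    PUBLISHER = auto()         # official publisher copy
    PREPRINT_ARCHIVE = auto()  # author-uploaded repository copy
    OTHER = auto()             # pirate site, non-author course list, unknown origin


# Sources treated as acceptable under the criterion described above.
ACCEPTABLE = {Provenance.AUTHOR, Provenance.PUBLISHER, Provenance.PREPRINT_ARCHIVE}


@dataclass(frozen=True)
class Copy:
    url: str
    provenance: Provenance


def cluster_is_linkable(copies: Iterable[Copy]) -> bool:
    """True if at least one copy in the cluster has acceptable provenance.

    A mixed cluster (some acceptable copies, some not) is linkable;
    a cluster whose copies all lack acceptable provenance, or an empty
    cluster, is not.
    """
    return any(c.provenance in ACCEPTABLE for c in copies)


mixed = [Copy("https://example.org/paper.pdf", Provenance.PUBLISHER),
         Copy("https://example.net/paper.pdf", Provenance.OTHER)]
print(cluster_is_linkable(mixed))  # True
```

The function checks the whole cluster rather than each copy in isolation. This mirrors the position that mixed clusters are acceptable and only all-copyvio clusters are excluded.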

  • This entire stuff smacks of copyright paranoia. WBGconverse 12:24, 18 March 2019 (UTC)
  • The nature of the Internet makes it impossible to comply with Wikipedia's policy to the letter. For example, Wayback and 20+ other archive sites scrape content from other websites, which we relink here. There are degrees of compliance based on COMMONSENSE; a zero-tolerance approach to compliance is a recipe for endless fights that will degrade Wikipedia needlessly. -- GreenC 15:51, 19 March 2019 (UTC)
    • Huh? I am not asking to ban CiteSeerX, Wayback, or other sites that might sometimes include inappropriately copyrighted content. All I am asking is that when they do include content that has no proper provenance, we avoid linking to that exact content. —David Eppstein (talk) 16:14, 19 March 2019 (UTC)
      • I find it dubious that anyone ever checks the provenance of these links. AManWithNoPlan (talk) 17:06, 19 March 2019 (UTC)
      • How would we determine that programmatically? It would require an army of trained specialists, given the volume of links and the difficulty of tracking down and determining copyright ownership (sometimes impossible). This is not a new problem: sites like Wayback say they honor take-down requests, but because they operate programmatically they can't check every one of 150 billion pages for copyrighted content, just as telephone companies can't monitor every conversation on their equipment for illegal activity. If we show some effort, we should be OK. Even a simple warning to users might be sufficient, or a list where known problem URLs are kept, which the bot can bypass when it finds them and to which users can add. -- GreenC 20:50, 19 March 2019 (UTC)
        • I agree that it does not look possible to check this programmatically. That's why I think such links should only be added manually. There are far too many bad links (my estimate: roughly 10% of all CiteSeerX pages) to make maintaining a whitelist feasible. —David Eppstein (talk) 21:13, 19 March 2019 (UTC)
          • If they were to be checked manually, the labor would be better spent improving a blacklist so that bad links only need to be discovered once. The bot would then prevent all future additions elsewhere. Otherwise, the same link has to be manually discovered every time it is added, which is redundant labor and error-prone. And the blacklist will reveal patterns of domains, so it will be possible to blacklist entire domains. This is basically how IABot works: there is a database where URLs are maintained and users can adjust them as needed. -- GreenC 23:38, 19 March 2019 (UTC)
  • Note: I have posted a notice of this discussion on WT:CP and requested the input of editors with experience in this area. --Xover (talk) 17:35, 19 March 2019 (UTC)
  • Note: And now I've linked it from WP:VPP. Nemo 11:05, 2 October 2019 (UTC)
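GreenC's blacklist proposal can be sketched as a shared list that a bot consults before adding a link. In this sketch, individual URLs can later be promoted to domain-wide entries once enough of them cluster on one host. This is a minimal Python sketch under stated assumptions. It is not how IABot or Citation bot actually store or match URLs, and `LinkBlacklist`, `is_blocked`, and `frequent_domains` are hypothetical names. Exact-URL entries are matched literally, without any URL normalization.

```python
from collections import Counter
from urllib.parse import urlsplit


class LinkBlacklist:
    """Hypothetical shared list of URLs and domains that editors have flagged as copyvio."""

    def __init__(self):
        self.urls = set()     # exact URLs, matched literally
        self.domains = set()  # whole domains, matched together with their subdomains

    def add_url(self, url):
        self.urls.add(url)

    def add_domain(self, domain):
        self.domains.add(domain.lower())

    def is_blocked(self, url):
        """True if the bot should skip this link."""
        if url in self.urls:
            return True
        host = urlsplit(url).hostname or ""  # .hostname is already lowercased
        return any(host == d or host.endswith("." + d) for d in self.domains)

    def frequent_domains(self, threshold):
        """Hosts with at least `threshold` flagged URLs: candidates for a domain-wide entry."""
        counts = Counter(urlsplit(u).hostname or "" for u in self.urls)
        return sorted(h for h, n in counts.items() if h and n >= threshold)


blacklist = LinkBlacklist()
blacklist.add_url("https://scans.example/manual-1972.pdf")
blacklist.add_url("https://scans.example/manual-1975.pdf")
print(blacklist.is_blocked("https://scans.example/manual-1972.pdf"))  # True
print(blacklist.frequent_domains(2))  # ['scans.example']
```

A bot would call `is_blocked` on each candidate link and skip it on a hit. `frequent_domains` only surfaces candidates, leaving the decision to blacklist a whole domain to human editors.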

In case anyone is drawn here from Nemo's new link: in rough outline, my opinion is that WP:ELNEVER bans links that do not have a clear provenance to the author or publisher of a work and that (regardless of what WP:EL says at the top about not applying to references) the same ban should clearly apply to links within references. Nemo's position appears to be (because the only relevant policy is ELNEVER and it doesn't cover references) that links within references are completely unrestricted, and that if something can be found anywhere on the net it can be linked to in the references. Nemo has been working with software (OABOT) to automatically add these links at mass rates to Wikipedia articles, at rates too quick to allow manual checking of their provenance, and has encouraged other editors to run the same software. I believe that editors must check the provenance manually and have so far blocked two users of OABOT for not doing so. —David Eppstein (talk) 17:48, 2 October 2019 (UTC)

Thank you for commenting. No, that's not my position; my position is expressed in my comments above. Nemo 10:39, 3 October 2019 (UTC)
  • Links within references that are known to be to copyvio versions should of course not be included, and DavidE's case 4 way above in the discussion explains very well what they are. The question is, in adding references, how carefully we need to screen them: whether the likelihood is so low that we can use an automatic process. I think the first step would be a redetermination of the actual state of their links now, as it's been a while since this discussion was initiated. (The analogy with Google Scholar is irrelevant: CiteSeerX hosts copies of articles; Google Scholar does not.) DGG ( talk ) 07:31, 7 October 2019 (UTC)
    • But we don't host copies either; we merely link to them. Nemo 08:03, 7 October 2019 (UTC)
  • To facilitate the discussion, and make sure everybody can base their opinion on realistic legal standards (if they choose to), I'm pursuing a legal opinion from a renowned copyright lawyer and scholar. LIBER is helping me formulate the question in a correct way and I hope Wikimedia Italia will cover the costs. Nemo 07:00, 15 November 2019 (UTC)
    • You are aware that Wikipedia's standards for content are significantly stricter than merely what is legal, correct? Perhaps you should re-read Wikipedia:Non-free content. From a strictly-what-is-legal perspective that link might not seem relevant, since it's about hosted content rather than external links, but I think the principle is important. —David Eppstein (talk) 07:54, 15 November 2019 (UTC)
      • I didn't say anything about asking "what is legal". If you have legal questions to ask I can try to incorporate them (I'm not planning to ask anything about whether adding links is fair use in USA, for instance). Nemo 09:35, 21 November 2019 (UTC)

Linking to copyrighted works

Stated in that section is: "Knowingly and intentionally directing others to a site that violates copyright has been considered a form of contributory infringement in the United States (Intellectual Reserve v. Utah Lighthouse Ministry [1]);". That case is not only old (1999) but was decided only at the District Court level, so it is binding in only one federal judicial district, the State of Utah. Has this case, or a similar one, been upheld at the appeals court level, or even by the US Supreme Court? What about a contrary ruling in one of the 12 appellate circuits (the 1st through 11th Circuits and the DC Circuit)? It was also issued relatively shortly after the Internet became available to a majority of the American public. It is also impractical to determine where a given website physically is, or whether it is actually bound by the laws of a particular nation: websites can be "anywhere", and people who cite them do not necessarily know the laws of whichever country the material is hosted in. Another factor is the passage of the DMCA (Digital Millennium Copyright Act). See: http://www.ipinbrief.com/dmca-contributory-infringement-vicarious-liability/ . It would be utterly impractical for WP to dig through all of these factors on its own; the only realistic path is to allow copyright holders to make claims where they believe them to be valid. Aardvark231 (talk) 04:48, 19 August 2019 (UTC)

Wikipedia doesn't expect you to do exhaustive research into a site before linking to it. However if you know that a link contains a copyright violation, or is likely to, then don't link to it. That's all the policy says. As this is a policy with legal considerations amateurs should not be attempting to rewrite the legal parts anyway. Hut 8.5 06:45, 19 August 2019 (UTC)
Yes. But by "legal parts" do you mean the parts which are drawn from the m:Terms of use? If not, can you point out what lawyer wrote the section in question or what legal opinion it is based on? I don't see anything relevant at m:Wikilegal, for instance. Nemo 16:13, 19 August 2019 (UTC)[reply]
When it comes to adding external links, the burden tends to fall upon the editor wanting to add the links to establish a consensus that the link should be added per WP:ELBURDEN; this is similar to WP:BURDEN when it comes to article content in general. I think the same reasoning can be applied here based upon WP:ELNEVER; the burden seems to fall upon those wanting to add a link to establish that it is clearly not a copyright violation, not the other way around; moreover, whenever there's any reasonable doubt, the link probably shouldn't be added. -- Marchjuly (talk) 06:50, 19 August 2019 (UTC)
Marchjuly, your interpretation is practically the opposite of Hut 8.5's. I suggest re-reading the guidelines to see if perhaps there was a misunderstanding. For instance, where do you draw your "any reasonable doubt" criterion from, which is quite different from "Knowingly and intentionally"? Once we identify what parts of the guidelines are unclear, we might want to clarify them.
I'll also point out that for sources the guideline is at WP:LINKVIO and WP:ELNO doesn't apply. Nemo 16:13, 19 August 2019 (UTC)
I didn't mention ELNO, but ELNEVER does seem to specifically apply to such a thing. Regardless, ELNO, ELBURDEN and ELNEVER are all subsections of WP:EL, which does seem applicable. As for "reasonable doubt", I was basing that on my reading of "If there is reason to believe that a website has a copy of a work in violation of its copyright, do not link to it." in item 1 of ELNEVER and "However, if you know or reasonably suspect that an external Web site is carrying a work in violation of the creator's copyright, do not link to that copy of the work." from WP:COPYLINK; so, I don't see what I wrote as being practically the opposite of Hut 8.5's comment, as you're claiming. The burden still, at least in my opinion, falls on those wanting to include a link to demonstrate that it's clearly not a copyright violation, not the other way around. Nobody has the right to add links or any other type of content to Wikipedia, and Wikipedia is not obligated to host links or content just because someone wants to add it. Editors can be bold, but they are expected, in principle, to discuss when their edits are challenged by another editor. So, that should also apply to an external link whose source is questionable per COPYLINK. It seems in such cases that removing the link as a potential copyright violation is what COPYLINK is asking us to do as a precaution until any concerns about the link can be resolved. -- Marchjuly (talk) 02:22, 20 August 2019 (UTC)
WP:ELPOINTS: "This guideline does not apply to inline citations or general references". It's practically the opposite because you reverse the burden of proof compared to what Hut 8.5 said. It's also impossible to prove a negative (absence of any reasonable doubt), especially on something we have no direct information about: how do you suggest that happens in practice? If I raise a doubt about 10k links/articles at once, are all the editors of those articles now forced to remove all those links from their references all of a sudden, until they manage to file 10k lawsuits to prove that every usage is cleared by some court? Nemo 09:50, 20 August 2019 (UTC)
We weren't discussing (at least I didn't think we were) adding a link as an inline citation, but rather adding a link as an external link to the "External links" section. However, even if we were discussing a link added as part of a citation, WP:ELNEVER clearly states "Policy: material that violates the copyrights of others per contributors' rights and obligations should not be linked, whether in an external-links section or in a citation." and this is further clarified by WP:EL#cite_note-copyvio_exception-1. Once again, I don't think I've posted anything that is practically the opposite of "However if you know that a link contains a copyright violation, or is likely to, then don't link to it." You seem to be interpreting "know" as meaning "with 100% certainty", whereas I think "However, if you know or reasonably suspect that an external Web site is carrying a work in violation of the creator's copyright, do not link to that copy of the work." as stated in WP:COPYLINK doesn't necessarily mean 100% certainty. I'm not suggesting links be removed on a whim, and perhaps "reasonably suspect" is too open to interpretation. However, if an editor removes a link and another editor disagrees with its removal, then I think the burden of discussing the link and establishing a consensus in favor of re-adding it (perhaps even on WP:ELN) falls on the person wanting to re-add the link; moreover, this discussion probably should take place prior to the link being re-added. If a consensus is established in favor of re-adding the link, then it will be re-added; if not, it won't. A local consensus, however, cannot supersede a community-wide policy, so eventually a community discussion might be needed to sort things out.
Your example about one editor questioning/removing 10k links/articles at once sounds a bit WP:POINTy and would likely be considered WP:TE and WP:DE more than anything else; however, if there was a formal RFC about the suitability of a certain source/link and the consensus reached through that RFC was the link shouldn't be added to any Wikipedia pages at all, then all those instances where the link is being used should be removed per that consensus. -- Marchjuly (talk) 13:26, 20 August 2019 (UTC)
  • ELPOINTS's reference to citations means that if you use a copyrighted source as a reference, you can link to the copyrighted source in the citation. If that source is infringing someone's copyright (e.g., someone reposted a WSJ.com article), then it is better to link to the original source rather than the copy. In general with copyrights, since you can't prove a negative as you say, we want the source to be explicitly published under a free license; lack of a copyright notice is not usually sufficient to show a free license, as copyright exists the moment content is "fixed in a tangible form". All that said, as Hut originally mentioned, we don't require exhaustive research into the copyright status, but such a link also comes with the risk of being summarily removed by another editor who thinks it may be infringing. At that point it will require resolving the question. CrowCaw 16:20, 20 August 2019 (UTC)
The original message by Aardvark231 quoted WP:COPYLINKS and did not mention external links, so I think it was a general comment. It's fine to mention that the "external links" section has strict rules (due to being abused more often) but some of the messages above seem to forget that links in general, and citations in particular, follow different rules.
Given the continued confusion in this area, we may consider a few changes:
  1. "if you know or reasonably suspect that an external Web site is carrying a work in violation of the creator's copyright" → "if an external website is known to carry a work in violation of copyright";
  2. add after the previous sentence: "thanks to a final judgment against it or an identical case in its home jurisdiction or in the USA" (otherwise we would have stopped linking Google Books for ten years until the Google Books Settlement came);
  3. replace "Linking to a page that illegally distributes" with "Linking to a page known to illegally distribute" (otherwise the bad light would have inundated us for those ten years);
  4. replace "a site hosting the lyrics of many popular songs without permission from their copyright holders" with "a site providing access to video streaming of popular movies despite contrary court rulings", because 1) lyrics sites are often complex cases, claiming fair use and so on; 2) most of them nowadays gain licenses and it's impossible to know details about it; 3) even Google search would fall under this prohibition if strictly interpreted (see recent kerfuffle); 4) it's easy to find court rulings on streaming, like Pelispedia, which is a more compelling example; 5) it's not only about "hosting", one can be sentenced for less clear-cut cases as well.
These are relatively minor changes, in that they don't automatically resolve the matter in either direction for the most commonly debated cases. They would only nudge the discussions towards a more constructive method. Nemo 13:20, 24 August 2019 (UTC)
I think what you're missing here is that the term "external links" does not only mean "links added to an 'External links' section"; it means any link to a external/outside website (i.e. not an internal link (a.k.a Wikilink)) added to an article. So, even links added as part of a citation are external links to third-party websites. All links to external websites used anywhere in an article are, in principle, subject to WP:EL and this is stated in the very first sentence of that page. An "External links" section is just the preferred way of presenting such links when they aren't being used as inline citations or general references; moreover, WP:ELNEVER and WP:EL#cite_note-copyvio_exception-1 apply to all external links, even those not used in an "External links" section. -- Marchjuly (talk) 14:57, 24 August 2019 (UTC)[reply]
Aardvark231 was a sock of Slyfox4908 and thus not allowed to edit - we typically strike through sock edits and I've done that. Doug Weller talk 15:20, 25 August 2019 (UTC)
Marchjuly, thanks for pursuing additional clarity but no, I'm not confusing anything. Wikipedia:External links explicitly excludes citations and references in the body of the article (and inline external links are generally to be avoided), so in practice it's about the "external links" section and similar. On the other hand, WP:COPYLINKS applies to all external links. Nemo 17:56, 25 August 2019 (UTC)
I'm actually not disagreeing with you about citations and references; I'm only saying that WP:ELNEVER (and WP:EL#cite_note-copyvio_exception-1), which is part of WP:EL, does apply to all external links (including citations and references). So, at least that one part of WP:EL does apply here.
If a link is deemed to be a copyright violation per COPYLINK (or ELNEVER), then it shouldn't be added (even as a citation); furthermore, if an editor isn't sure but believes/thinks the source of the link might contain copyright violations, it would be best for them to avoid linking to it and seek input from others. In cases when there's clear disagreement over whether the link is a COPYLINK/ELNEVER violation, things should be discussed and the link only be considered OK to add if a consensus has been established that it's not a violation. Even if a link is clearly determined not to be a COPYLINK/ELNEVER violation, then there is still no automatic right of usage. If it's intended to be used as a citation and its value as such is challenged, then it should be discussed on the article's talk page or WP:RSN and a consensus established as to whether it should be used; if it's intended to be used as an external link and its value as such is challenged, then it should be discussed on the article's talk page or at WP:ELN. In either case, the burden (WP:BURDEN and WP:ELBURDEN) is upon the editor wanting to add the citation/link to establish a consensus in favor of adding it.
The OP (who has since been confirmed to be a sock) wanted to add a link to a particular article, but other editors were disagreeing. The main disagreement was about whether the link was a violation of COPYLINK, and the OP strongly believed that the COPYLINK policy either wasn't applicable or was wrong altogether; so, the OP posted here to try to get/propose a change in policy to strengthen their argument. Even if the COPYLINK policy was changed, however, that wouldn't mean (even though the OP seems to think it would) that the link would automatically be OK to add; when this was pointed out to the OP, they started posting stuff about there being some kind of conspiracy (by those opposed to/afraid of the content in question) to keep the link out of Wikipedia and accusing others of aiding this conspiracy.
All policies/guidelines probably should be reassessed/reviewed every now and then, and now might actually be the right time to do this for COPYLINK. WP:COPY is, however, a really major policy with legal considerations; so, changing one part would likely have a huge impact. What you're proposing might actually be a good thing, but it's probably not wise for a few editors to decide to do. It would be better for something like this to go through a WP:RFC involving lots of members of the community to see whether there really is a strong consensus for making such a change. -- Marchjuly (talk) 22:09, 25 August 2019 (UTC) [Note: Post edited by Marchjuly to add the word "might" to the second-to-last sentence. -- 07:35, 7 October 2019 (UTC)]

Use of External Links and citations to link to scanned technical manuals, documents and others

This issue has implications beyond the local pages where the confusion is occurring. I am reaching out here to reach a guiding consensus on links, whether in external links sections or in citations, to infringing or potentially infringing material hosted on personal websites and hosting services (Scribd.com, manualslib.com and others). The material in question includes service manuals, internal documents and academic journals. One argument that came up is that material published before 1977 and not specifically marked "copyright" is fine to link. However, according to Cornell University's copyright page (https://copyright.cornell.edu/publicdomain#Footnote_2), unpublished documents, such as works of corporate authorship, may be under 120-year copyright protection even if they predate 1977. An especially grey area is linking to personal sites containing a vault of scanned material from 1923-1977. One could call this using ELs like a "torrent seed": linking to vaults of scanned manuals without actually hosting potentially infringing material on Wikipedia's servers. Do we err on the side of exclusion when there's no clear and convincing evidence the material is not infringing? Or do we include it unless there's clear and convincing evidence that the stash of scanned documents is infringing? I interpret WP:COPYVIOEL as suggesting "if in doubt, leave it out".

Such as,

  • Scanned, potentially copyrighted material posted on a Scribd-like hosting service: Special:diff/929714575 (I say potentially because it's from 1972, the extent of scanning is far beyond "fair use", and I'm uncertain about its registration or publication status).
  • A PDF scan of academic conference material posted in its entirety on a personal website, such as a PDF scan of copyrighted conference material.
  • Citing service manuals, factory notes, etc., and then linking to fan sites hosting large assortments of fan-scanned technical manuals that are likely unauthorized, such as a PDF scan of a copyrighted workshop manual.
  • What about when copyright registration status is uncertain? Say, for the sake of discussion, that a page about a Ford vehicle cites content from a factory shop manual and then links to a directory listing containing full PDF scans of service manuals for numerous vehicles, all authored prior to 1977. If there is no copyright notice in those PDFs, is that deemed acceptable on Wikipedia? One user commented that such a practice is particularly common in railroad/train-related articles, but is it acceptable on Wikipedia? Graywalls (talk) 09:57, 8 December 2019 (UTC)

Notifying users whom I believe to be experts on copyright matters on Wikipedia. @Sphilbrick:, @Diannaa:

Previous discussions that have taken place for reference:

I've been involved in previous discussions as they relate to railroading articles; operator manuals are a useful source for performance characteristics of locomotives (as built, anyway). I think the question is somewhat complex:

  • Unpublished material is indeed probably still under copyright, shouldn't be linked to, and probably isn't even usable as a source, per WP:V.
  • Published material from 1923-1977 which doesn't have a copyright notice may be in the public domain. I don't think it's possible to have a single uniform answer; you'd probably have to do it document-by-document. Such material is acceptable as a source either way, but if still under copyright shouldn't be linked to. Anything that's held in a library preemptively passes WP:V and is usable as a source.
  • Given the harm to article stability and related WP:INTEGRITY problems, challenged sources should be flagged as such and discussed, but not removed. Removing a source without tagging the information it supported is harmful.
  • If there is a site which was found to host an overwhelming amount of copyright-violating material, blacklisting that site would be appropriate.

That's my view, to get the ball rolling. I thank Graywalls for centralizing the discussion here. Mackensen (talk) 14:50, 8 December 2019 (UTC)

Being held in a library establishes that it's published, but the material being available in a library is not a pass to link to PDF scans of it instead of using a basic citation. As Sphilbrick said in one of the linked discussions, such copyright issues can be corrected by removing the link and replacing it with a basic citation. The material being in a library doesn't mean it's acceptable to link to a PDF scan of it hosted on someone's page, Google Docs, or the like. We don't have a rule of thumb that says anything prior to 1977 that's in a library and doesn't have a copyright notice is in the public domain. In a somewhat more complex situation, a university instructor may have a chapter of a book scanned, and that may be considered fair use for educational purposes in the classes he teaches, but linking to that course page and making it available to everyone on Wikipedia likely is not within fair use, even if it is indexed by Google. That issue is easily avoided by not linking it, and simply treating it as if you were citing a physical book. I believe links and citations that reasonably appear to be infringing should be converted to basic citations on sight, or removed if that's not practical due to a lack of clear information. Graywalls (talk) 15:33, 8 December 2019 (UTC)
In the referenced discussion above, Sphilbrick and Diannaa didn't say that published things created between 1923 and January 1, 1978 were or were not OK because of their age, and it was suggested that removing the user-scanned journal and replacing it with a basic cite was the proper approach. Since this has implications well beyond trains, I hope we develop a crystal-clear global consensus regarding scans of such materials. Graywalls (talk) 00:50, 9 December 2019 (UTC)

See Wikipedia talk:Copying within Wikipedia#Proper attribution. A permalink for it is here.

Anyone interested in commenting on this there? Flyer22 Frozen (talk) 21:10, 2 February 2020 (UTC)

Semantic Scholar

Is there a policy about linking to full-text articles on Semantic Scholar? My understanding is that authors upload their own journal articles to the SS site. In many cases copyright is owned by the journal, not the author, so it cannot be assumed that the author has the right to post such material. The SS terms and conditions require uploaders to "represent and warrant that (i) you own and control all of the rights to the User Content that you post or you otherwise have the right to post such User Content to the Site". Does anyone know if SS does any monitoring or checks on the copyright status of articles it hosts, or is it just a case of "trust the user until someone complains"? In short, is it safe to link to an article on SS? Thanks Kognos (talk) 14:13, 5 February 2020 (UTC)

My understanding, from talking to them about a case I found where it was clear the authors did not upload, is that they scrape the web for PDFs, sometimes pick up pirated ones, and remove them quickly when that is called to their attention. But of course hardly anyone ever calls it to their attention. I don't think it is ever useful to link to the PDFs on their site: either it is available elsewhere or it should not be available. On the other hand, linking to their indexing pages (with citations etc.) could possibly be useful, even when those indexing pages do not have PDFs available. —David Eppstein (talk) 17:14, 5 February 2020 (UTC)
Citation bot only adds the ones that are licensed, not the ones that are scraped. So they know which are which, and their API exposes that information. AManWithNoPlan (talk) 18:49, 5 February 2020 (UTC)
Do there exist papers that are licensed for open distribution by the publisher through Semantic Scholar but that are not directly available openly from the publisher? —David Eppstein (talk) 19:02, 5 February 2020 (UTC)
That's a very good question! I was using OAbot. The first free-to-read link it suggested was this one: http://pdfs.semanticscholar.org/ae76/b5c8c2f18a0b82e5a15ebcd574fa0e0f4616.pdf. I checked before making the change, and this paper is released under CC-BY-3.0, so I accepted the suggestion. The next suggested link was to a paper that had a 1995 copyright notice and no indication that it was released or licensed for open distribution. This raised my suspicions, leading to my initial query above. I am investigating the second paper, and will let you know what I find out. On the question of whether legitimate papers exist that can only be found on SS, I checked the first paper, and it is indeed available from the publisher, here: https://www.mapress.com/j/zt/article/viewFile/zootaxa.3882.1.1/33563. I have edited the article (Limnochromini) to link to the publisher's version rather than SS's. So I agree with David's suggestion to avoid linking to Semantic Scholar. If anyone finds papers on SS that can't be found anywhere else, I'd be happy to reconsider. Kognos (talk) 10:39, 10 February 2020 (UTC)

RD1 in case of old edits

@Piotrus, Justlettersandnumbers, Hut 8.5, Moonriddengirl, Diannaa, Money emoji, L3X1, MER-C, Ymblanter, Wizardman, and Crow:

Deciding whether to do a revision deletion when the problematic edit is old has always troubled me.

The good news is that with the successful implementation of Copy Patrol, coupled with a small number of volunteers willing to work in this area, most new copyright problems are detected within hours, sometimes minutes, and a revision deletion almost never has any collateral damage.

In contrast, we still have a number of CCIs not yet completed, and one notable CCI in progress with several detected problem edits from 2006 and 2007. A revision deletion of those edits hides a substantial portion of the history of the article from non-admins.

The issue came to a head with a recent RD1 performed on Pinsk, prompting a discussion here: Talk:Pinsk#CCI_review

I thought this talk page would be a better place to have a discussion about what should be done.

I mulled over whether there should be a formal RFC, but my current thinking is that we should start with a discussion, see where it leads, and then possibly open an RFC. One additional advantage of starting this informal discussion is that it may help clarify the issues.

One important point is that, unlike many issues, we cannot simply reach a consensus as editors on what to do. There are legal implications, and if we decide to codify a guideline, we will need to involve legal at some point. At one time I was thinking that we would only need to involve legal if we chose to do something other than apply revision deletion all the time, but one editor raised the interesting point that revision deletion might constitute a violation of the Creative Commons license for intervening edits, so whatever we decide will almost certainly require legal signoff.

Issues:

  1. A revision deletion which is not performed immediately after a problematic edit will hide some of the history of the article. In the case of a multiyear gap between the problematic edit and the time of the revision deletion, this may seriously hamper the ability of editors to understand the history of the article, which is often important in disputes as well as in ordinary editing questions.
  2. Failure to do a revision deletion when we have detected a copyright problem, even when that copyrighted text is removed, means that there is problematic material in the history of the article, and the community along with legal generally agrees that we need to ensure that we do not have copyrighted material in either the current version or historical versions of the article.
  3. A revision deletion which is not done immediately will also hide intervening edits. Is this a violation of the Creative Commons license? --S Philbrick(Talk) 15:49, 16 February 2020 (UTC)
The legal code states that all contributing authors must be credited. ("The credit required by this Section 4(c) may be implemented in any reasonable manner; provided, however, that in the case of a Adaptation or Collection, at a minimum such credit will appear, if a credit for all contributing authors of the Adaptation or Collection appears, then as part of these credits and in a manner at least as prominent as the credits for the other contributing authors.") It does not state that each sentence and phrase in the document must be credited to an individual author. Therefore in my opinion providing a list of contributors via the article history is adequate attribution, even if the edits themselves are no longer visible. If we took the stricter reading of the policy, revision deletion would be extremely rare - it would only be permitted if there were no intervening edits whatsoever, and that rarely happens. That's not the way that most of us active in this area interpret the policy. — Diannaa (talk) 15:58, 16 February 2020 (UTC)
I agree with Diannaa. --Ymblanter (talk) 16:43, 16 February 2020 (UTC)
Wikipedia:Revision_deletion#Large-scale_use already discusses the application of revision deletion to old edits on pages with many revisions; it says that it is generally not a good idea but that there is an element of judgement involved. I generally agree that revision deletion shouldn't be used in those situations, although there are exceptions (e.g. if almost all of the article's content has been removed as copyvio). I don't think there is a general expectation that copyright violations must be revdeled all the time, and the revision deletion policy isn't written with this in mind. There are no issues with the licence for the reason Diannaa gave. Hut 8.5 16:53, 16 February 2020 (UTC)
There are two factors to consider: the scale of the copyvio and the amount of other modification made to the article since the copyvio was inserted. More modification requires a bigger copyvio to trigger RD1. MER-C 17:35, 16 February 2020 (UTC)
A busy article on a recent current event, where a lot of diffs would have to be hidden, is a place where revision deletion might not be appropriate, even for an egregious violation. Likewise an article on a controversial topic, where who-added-what might be germane to an ongoing discussion or for settling an edit war. — Diannaa (talk) 19:23, 16 February 2020 (UTC)
Hmm... This is something I thought about as well. I think it depends on how severe the violation is. I'm not too sure how an RfC would pan out; I feel like everyone would just point to us, but even we aren't too sure about what is and isn't excessive. It's always been a YMMV thing. 💵Money💵emoji💵Talk💸Help out at CCI! 23:29, 16 February 2020 (UTC)

My concern is this: if an editor made an edit to a non-copyvio section of the article, but the revdel hid it as well, isn't this a violation? That is, while the author is still attributed, it is no longer possible to see what they contributed, even though we can be 100% certain they neither introduced the copyvio nor changed it. --Piotr Konieczny aka Prokonsul Piotrus| reply here 10:40, 17 February 2020 (UTC)

That isn't a violation of the Creative Commons licence. The licence only requires that you credit the contributors of a work, it doesn't require you to specify exactly what each of them contributed. You can satisfy the Creative Commons attribution requirements just by providing a list of people who have edited the page. Hut 8.5 22:19, 17 February 2020 (UTC)

You are invited to join the discussion at WP:ELN#Citation link question. -- Marchjuly (talk) 21:57, 24 March 2020 (UTC)

This is a page that I've created and am still working on. I don't think there's a copyright problem (and if there is, it's easy to fix), but in case someone raises the issue, I'd like to be able to point them here so they can see what you guys think. Bottom line: William T. Stearn was one of the preeminent botanists of the 20th century. He worked primarily as a botanist rather than as a linguist. Stearn's botanical names (U–Z) includes a list of species and genus names that come from Stearn's book (with a lot of curation ... well over half of the words are discarded). But this is really a glossary of Latin and Greek terms, and the definitions come from some authorities older than Stearn (from whom he borrowed directly) and from some younger than Stearn (who built on and updated his work). In every scholarly field, researchers tend to look to previous giants of the field, and then frame their own work as building on that previous work ... botany is no different, so it's not really possible to put together any kind of list like this one without acknowledging that recent authorities rely heavily and specifically on Stearn. So ... I don't think there's a copyright issue per se, but of course, I'm not interested in doing the minimum to avoid problems; I want the list to be the best it can be. Any suggestions would be welcome. - Dank (push to talk) 12:47, 6 April 2020 (UTC) P.S. I just got a question about this at WT:PLANTS, so I'll add one more point: I started out creating the list according to what appeared in any two of my four modern sources, with a few extra rules for what to discard. When I changed the rule to "Stearn plus any other source", the new list was nearly identical. That shows the esteem the more recent sources have for Stearn, and indicates that in some ways, this curated list is largely their curation, and not mine. - Dank (push to talk) 19:03, 7 April 2020 (UTC)

  • I don't think you have any copyright issues as it stands. As you said, this is basically a list of Greek/Latin terms translated. There may have been creative reasons why these terms were chosen for the things they name, but that's a degree or two of separation removed from here. And maybe one or two of the terms could be a "stretch" in the translation given, but taken in the context of the page as a whole, that would be de minimis. CrowCaw 20:17, 7 April 2020 (UTC)
  • Thanks, Crow. (Btw, the rule I eventually settled on for species is "Stearn plus another two sources".) - Dank (push to talk) 14:31, 10 April 2020 (UTC)

Loss of public domain

This [1] needs more expert eyes. Slatersteven (talk) 17:37, 18 June 2020 (UTC)

Attribution in a case of deleted history from a cross-wiki translation

I need assistance or advice in a case of providing required attribution in a case of translation from es-wiki to en-wiki, where the es-wiki article has now been deleted. There are two questions inherent in this, and one follow-up:

  1. given the conditions described below, what must one do to provide proper attribution for this article at en-wiki?
  2. what do the licensing requirements imply about "maintaining history" (not deleting the source article) when the source and target of copied content are on two different wikipedias?
  3. follow-up: shouldn't the {{translated page}} template contain advice about not deleting the source article, as the {{copied}} template already does?

A specific case of es→en translation with deleted source

It is my belief that the en-wiki article Travesti (gender identity) is a translation in whole or in part, from es:Travesti (identidad de género), now deleted. The English article was created in 2007, and has undergone expansion lately by Bleff [noping]. I'm not able to compare the two texts, and I can't tell when the es-wiki article was created because it was A5ed (duplicate article of low quality) on 15 September 2020 by Spanish user Marcelo (talk · contribs). As an additional wrinkle, it appears that Bleff may have cached a copy of the now deleted es-wiki article at User:Bleff/sandbox3 (perma), but one can't be sure of its provenance.

My understanding of WP:COPYRIGHT and WP:CWW tells me that an attribution statement such as the model proposed at WP:TFOLWP is required when content is translated from a foreign-language encyclopedia. I've messaged Bleff at their Talk page about the general attribution requirements. But I don't know what to tell them, given that the boilerplate translation attribution states, in part, "...see that article's history for attribution", but the es-wiki article is gone, so there is no history anymore.

Does notification about non-deletion apply cross-wiki?

I'm aware that in cases of same-wiki copying, the source article should carry the {{Copied}} template, to let editors at the source article know not to delete the article, because its history is required for attribution at the copy-destination article. But what about cross-wiki copying or translating? Es-wiki gets to make its own policies, that I know, but don't licensing requirements transcend policies and guidelines and become a requirement across all Wikipedias? Doesn't this imply that the source article on es-wiki must be kept, or at least that its history must be?

Regarding this second question, my instinct is that we should undo the A5 at es-wiki and restore the article. However, they found it to be of low quality and a duplicate, so turning it into a redirect to the existing article should address both our concerns and theirs. Accordingly, I've left this discussion at Marcelo's talk page on es-wiki, asking that he do this.

Feedback needed for this case

Can I get comments on this approach for this particular case? Also, doesn't this imply that editors who translate content from other Wikipedias, in addition to adding the attribution described at WP:TFOLWP, should *also* be adding the equivalent of our "copied" or "translated" template to the source article at the other language Wikipedia?

Possible template changes needed in the general case

If so, I can beef up the /doc page at Template:Translated page/doc to mention the fact that 89 languages have their own version of Translated page template, and that the source article should have one added. I can probably also alter the English {{Translated page}} template, to add some boilerplate text that actually provides copy-pastable code suitable for adding to the other-language Wikipedia that would generate their version of the template. Or is maintenance of the foreign source's history not needed, in a case of cross-wiki copying or translation?

And as a corollary of that, shouldn't our {{translated page}} contain the same boilerplate about keeping the source page around that the {{copied}} template already contains? Thanks, Mathglot (talk) 00:28, 25 September 2020 (UTC)

Strictly yes, if the page originated as a translation from a deleted page on another Wikipedia then we should add attribution for it. We can do this without the deleted page existing or being restored: it can be done by making a dummy edit with the names of the authors, or by posting the deleted page's edit history on a subpage of the talk page and linking to it (see WP:PATT). So you could go to an admin on the Spanish Wikipedia and get the history of the deleted page for this to be done. Hut 8.5 06:49, 25 September 2020 (UTC)
The necessary parts of the history are at [2]. — JJMC89(T·C) 07:50, 25 September 2020 (UTC)
@JJMC89:, very nice, thanks for that. Here it is again in jsonfm format. Can I just port that result to Talk:Travesti (gender identity), and add a dummy edit to the article pointing to it as sufficient WP:RIA? Mathglot (talk) 09:30, 25 September 2020 (UTC)
Yes, that would do, but all you actually need is the names of the editors, and there are only two distinct editors in that list, so you could just name them in your dummy edit and skip the subpage entirely. Hut 8.5 12:19, 25 September 2020 (UTC)
Ah, good point. Thanks to all for this. Mathglot (talk) 20:44, 25 September 2020 (UTC)

@Hut 8.5 and JJMC89: I am still interested in the general question concerning the templates: {{translated page}} doesn't have anything to say about "retaining the source page" the way {{copied}} does. Please see this proposed change (diff) at Template:Translated page/sandbox. It seems like that boilerplate should be there. Is there any reason I shouldn't make that change to the template? Mathglot (talk) 21:25, 25 September 2020 (UTC)