Wikipedia:Large language models

{{draft proposal|WP:LLM|type=policy}}
{{nutshell|The use of Large Language Models to aid editing is not prohibited but their output must be rigorously scrutinized. Their use to generate or modify text must be declared in the edit summary and in-text attribution provided in articles and drafts. Because of the many risks and pitfalls associated with them, only editors with substantial prior experience in the intended task should use them.}}

[[Large language models]] (LLMs) are computer programs for natural language processing that use deep learning and neural networks, such as [[GPT-3]]. This policy covers how LLMs may and may not be used on Wikipedia to generate new text or modify existing text. Generated content may be [[Wikipedia:NPOV|biased]] or [[Wikipedia:V|non-verifiable]], may constitute [[Wikipedia:OR|original research]], or may violate [[Wikipedia:Copyrights|copyrights]]. Because of this, LLMs should only be used for tasks in which the editor has substantial experience, and their outputs must be {{strong|rigorously scrutinized}} for compliance with all applicable policies. Furthermore, LLM use to generate or modify text must be declared in the [[Help:Edit summary|edit summary]], and [[WP:INTEXT|in-text attribution]] is required for non-trivial changes to articles and drafts. Editors retain full responsibility for LLM-assisted edits.


== Relevant policies and associated risks ==
{{shortcut|WP:AIFAIL}}
{{Rquote |1=right |2=Large language models have limited reliability, limited understanding, limited range, and hence need human supervision. |3=Michael Osborne, Professor of Machine Learning in the Dept. of Engineering Science, [[University of Oxford]]|4=<br />''January 25, 2023''<ref>{{Cite web |last=Smith |first=Adam |url=https://www.context.news/ai/what-is-chatgpt-and-will-it-steal-our-jobs |title=What is ChatGPT? And will it steal our jobs? |date=2023-01-25 |access-date=2023-01-27 |website=www.context.news |publisher=[[Thomson Reuters Foundation]]}}</ref>}}


<!-- list of policies -->
The use of LLMs to produce encyclopedic content on Wikipedia carries various risks. This section clarifies how key policies apply to LLM use on the project, that is, how such use tends to run up against them. Note that this policy applies to all uses of LLMs, regardless of whether a provider or user of an LLM claims that, owing to technological advances, its output automatically complies with Wikipedia policies and guidelines.
{{indented plainlist|indent=1.65em|
*'''[[Wikipedia:Copyrights|{{tooltip|Copyrights|If you want to import text that you have found elsewhere or that you have co-authored with others (including LLMs), you can only do so if it is available under terms that are compatible with the CC BY-SA license.}}]]'''{{br}}{{in5}}''Further: [[Wikipedia:Large language models and copyright]]''{{br}}<onlyinclude>'''An LLM can generate copyright-violating material.'''{{efn|This also applies to cases in which the AI model is in a jurisdiction where works generated solely by AI are not copyrightable.}} Generated text may include verbatim [[Wikipedia:Non-free content|non-free content]] or be a [[WP:DERIVATIVE|derivative work]]. In addition, using LLMs to summarize copyrighted content (like news articles) may produce [[Wikipedia:Close paraphrasing|excessively close paraphrases]]. The copyright status of LLMs trained on copyrighted material is not yet fully understood and their output may not be compatible with the CC BY-SA license and the GNU Free Documentation License used for text published on Wikipedia.</onlyinclude>
*'''[[Wikipedia:Verifiability|{{tooltip|Verifiability|Readers must be able to check that any of the information within Wikipedia articles is not just made up. This means all material must be attributable to <!--[[Wikipedia:Verifiability#Reliable sources|-->reliable, published sources<!--]]-->. Additionally, quotations and any material challenged or likely to be challenged must be supported by <!--[[Wikipedia:Inline citation|-->inline citations<!--]]-->.}}]]'''{{br}}LLMs do not follow Wikipedia's policies on verifiability and [[Wikipedia:Reliable sources|reliable sourcing]]. They generate text by outputting the words most likely to come after the previous ones. If asked to write an article on the benefits of eating crushed glass, they will sometimes do so. '''LLMs can completely [[Hallucination (machine learning)|make things up]], including generating citations to non-existent sources''' (both literature and URLs). When they generate citations, those may be inappropriate or [[Wikipedia:Fictitious references|fictitious]] and can include [[WP:QUESTIONABLE|unreliable source]]s such as [[WP:CIRCULAR|Wikipedia itself]].
*'''[[Wikipedia:Neutral point of view|{{tooltip|Neutral point of view|Articles must not take sides, but should explain the sides, fairly and without editorial bias. This applies to both what you say and how you say it.}}]]'''{{br}}LLMs may produce content that is neutral-seeming in tone, but not necessarily in substance. This concern is especially strong for [[Wikipedia:Biographies of living persons|biographies of living persons]].
*'''[[Wikipedia:No original research|{{tooltip|No original research|Wikipedia articles must not contain original research – i.e. facts, allegations, and ideas for which no reliable, published sources exist. This includes any analysis or synthesis of published material that <!--[[Wikipedia:No original research#Synthesis of published material|-->serves to reach or imply a conclusion not stated by the sources<!--]]-->. To demonstrate that you are not adding original research, you must be able to cite reliable, published sources that are *directly related* to the topic of the article and *directly support* the material being presented.}}]]'''{{br}}While LLMs may give accurate answers in response to some questions, they may also generate interpretations that are biased or false, sometimes in subtle ways. Asking them about obscure subjects or complicated questions, or setting them tasks for which they are not suited (i.e. tasks requiring extensive knowledge or analysis), makes these errors much more likely. The fabrication of content described above is termed hallucination in machine-learning practice, and it can easily be mistaken for a genuine viewpoint or a piece of original research.
}}


== Using LLMs ==
Any specific application of LLMs to generate or modify text is only tolerated, not recommended. They should only be used for tasks in which the editor has substantial experience. Editors take full responsibility for their edits' compliance with Wikipedia policies.


===Writing articles===
Large language models can be used to copy edit or expand existing text, to generate ideas for new or existing articles, or to create new content. Output from a large language model can contain [[Hallucination (artificial intelligence)|inaccurate information]], and thus must be carefully evaluated for consistency with appropriate reliable sources. You must become familiar with relevant sources for the content in question, so that you can review the output text for its [[Wikipedia:Verifiability|verifiability]], [[Wikipedia:Neutral point of view|neutrality]], absence of [[Wikipedia:No original research|original research]], compliance with [[Wikipedia:Copyrights|copyright]], and compliance with all other applicable policies. Compliance with copyright includes respecting the copyright licensing policies of all sources, as well as those of the AI provider. As part of providing a neutral point of view, you must not give [[WP:UNDUE|undue prominence]] to irrelevant details or minority viewpoints. You must verify that any citations in the output text exist and screen them for relevance.


==== Drafts ====
If an LLM is used to create the initial version of a [[Wikipedia:Drafts|draft]] or [[Help:Userspace draft|userspace draft]], the user who created the draft must bring it into compliance with all applicable Wikipedia policies, add reliable sourcing, and rigorously check the draft's accuracy prior to submitting it for review. If such a draft is [[Wikipedia:Articles for creation|submitted for review]] without having been brought into compliance, it should be declined. Repeated submissions of unaltered (or insufficiently altered) LLM outputs may lead to a revocation of draft privileges.

===Using sources with LLM-generated text===
All sources used for writing an article must be reliable, as described at {{section link|Wikipedia:Verifiability|Reliable sources}}. Before using any source written by a large language model, you must verify that the content was evaluated for accuracy.

=== Talk pages ===
While you may include an LLM's raw output in your talk page comments for the purposes of discussion, you should not use LLMs to "argue your case for you" in talk page discussions. Communication among human editors is at the root of core Wikipedia processes like [[WP:CON|building and reaching consensus]], and [[WP:CIR|it is presumed]] that editors contributing to the English-language Wikipedia possess the ability to communicate with other editors in edit summaries and talk pages.

=== Be constructive ===
Wikipedia relies on volunteer efforts to review new content for compliance with our [[Wikipedia:Core content policies|core content policies]]. This is often time consuming. The informal social contract on Wikipedia is that editors will put significant effort into their contributions, so that other editors do not need to "clean up after them". Editors must ensure that their LLM-assisted edits are a net positive to the encyclopedia, and do not increase the maintenance burden on other volunteers.

Do not, under any circumstances, use LLMs to generate [[Wikipedia:Do not create hoaxes|hoaxes]] or disinformation. This includes knowingly adding false information to test our ability to detect and remove it.

Wikipedia [[Wikipedia:NOTLAB|is not a testing ground]] for LLM development; do not run experiments or trials on Wikipedia solely for that purpose. Edits to Wikipedia are made to advance the encyclopedia, not a technology. This is not meant to prohibit {{em|editors}} from responsibly experimenting with LLMs in their userspace for the purpose of improving Wikipedia.

Repeated misuse of LLMs forms a pattern of [[Wikipedia:Disruptive editing|disruptive editing]] and may lead to a [[Wikipedia:Blocking policy|block]] or [[Wikipedia:Banning policy|ban]].

=== Declare LLM use ===
Every edit which incorporates LLM output must be marked as LLM-assisted in the [[Help:Edit summary|edit summary]]. This applies to all [[Wikipedia:Namespaces|namespaces]]. For content added to articles and drafts, [[WP:INTEXT|in-text attribution]] is necessary. If an LLM by OpenAI was used, this can be achieved by adding the following template to the bottom of the article: {{tlx|OpenAI|{{font color|darkgrey|''[GPT-3, ChatGPT etc.]''}}}}. Additionally, the template {{tlx|AI generated notification}} may be added to the talk page of the article.
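For example, an editor who used ChatGPT to help expand a section might use an edit summary along the following lines (the wording and the section name are purely illustrative; only the fact of disclosure is required):

<pre>
Expanded the "Reception" section using text drafted with ChatGPT; output checked against the cited sources and revised (LLM-assisted)
</pre>

The corresponding attribution can then be provided by placing {{tlx|OpenAI|ChatGPT}} at the bottom of the article and, optionally, {{tlx|AI generated notification}} on its talk page.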

=== Experience is required ===
LLMs are assistive tools, and cannot replace human judgment. Careful judgment is needed to determine whether such tools fit a given purpose. Editors using LLMs are expected to {{em|familiarize themselves}} with a given LLM's inherent limitations and then must {{em|overcome}} these limitations, to ensure that their edits comply with relevant guidelines and policies. To this end, prior to using an LLM, editors should have gained substantial experience doing the same or a more advanced task {{em|without LLM assistance}}.{{efn|e.g. someone skilled at dealing with vandalism but doing very little article work is probably not someone who should start creating articles using LLMs, before they have gained actual experience at article creation without the assistance of these models; the same logic applies to creating modules, templates, using talk pages etc.}}

Inexperienced editors should be especially careful when using these tools; if needed, do not hesitate to ask for help at the [[Wikipedia:Teahouse|Teahouse]].

Editors should have enough familiarity with the subject matter to recognize when an LLM is providing false information – if an LLM is asked to paraphrase something (i.e. source material or existing article content), editors should not assume that it will retain the meaning.

Experience is required not just with Wikipedia practices but also with the proper use of LLMs themselves, for example knowing how to [[Prompt engineering|formulate good prompts]].

=== High-speed editing ===
Human editors are expected to pay attention to the edits they make, and ensure that they do not sacrifice quality in the pursuit of speed or quantity. For the purpose of dispute resolution, it is irrelevant whether high-speed or large-scale edits that a) are contrary to consensus or b) cause errors an attentive human would not make are actually being performed by a bot, by a human assisted by a script, or even by a human without any programmatic assistance. No matter the method, the disruptive editing must stop or the user may end up blocked. However, merely editing quickly, particularly for a short time, is not by itself disruptive. Consequently, if you are using LLMs to edit Wikipedia, you must do so in a manner that complies with [[Wikipedia:Bot policy]], specifically [[WP:MEATBOT]].

== Handling suspected LLM-generated content ==
=== Identification and tagging ===

Editors who identify LLM-originated content that does not comply with our [[Wikipedia:Core content policies|core content policies]] should consider placing {{Tlx|AI-generated|date{{=}}{{CURRENTMONTHNAME}} {{CURRENTYEAR}}}} at the top of the affected article or draft, unless they are capable of immediately resolving the identified issues themselves.
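For instance, an editor who identified such content in March 2023 but could not fix it immediately might place the following at the very top of the affected page (the date is only an example; substitute the current month and year):

<pre>
{{AI-generated|date=March 2023}}
</pre>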

This template should not be used in [[Wikipedia:Biographies of living persons|biographies of living persons]]. In BLPs, such non-compliant content should be {{strong|removed immediately and without waiting for discussion}}.

=== Verification ===
All known or suspected LLM output {{strong|must}} be checked for accuracy and is assumed to be fabricated until proven otherwise. LLMs are known to fabricate sources such as books, journal articles and web URLs, so be sure to first check that the referenced work actually exists. All factual claims must then be verified against the provided sources. LLM-originated content that is contentious or fails verification must be removed immediately.

=== Deletion ===
If removal as described above would result in deletion of the entire contents of the article, it then becomes a candidate for deletion. If the entire article appears to be factually incorrect or relies on fabricated sources, speedy deletion via [[WP:G3]] (Pure vandalism and blatant hoaxes) may be appropriate.
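As an illustration, a page that rests entirely on fabricated references could be nominated by placing a G3 speedy deletion tag at the top of the page, for example:

<pre>
{{db-g3}}
</pre>

A brief note about the fabricated sourcing in the edit summary or on the talk page helps the reviewing administrator verify the problem.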
<!-- REMOVED, MAY NEED TO WORK OUT DRAFTIFICATION: Any LLM output that is unsourced or fails verification must be removed or moved to draft space immediately. -->


== See also ==
As an application of the transformer model, and therefore a subfield of deep learning, LLMs only partially intersect with artificial intelligence.

=== Demonstrations ===
* [[User:DraconicDark/ChatGPT]] (lead expansion)
* [[Wikipedia:Using neural network language models on Wikipedia/Transcripts]]

=== Related articles ===
{{further|Template:Natural language processing{{!}}Natural language processing navbox|Template:Differentiable computing{{!}}Differentiable computing navbox}}
* [[Machine learning]]
* [[Artificial neural network]]
* [[Human-in-the-loop]]


== Notes ==
{{notelist}}

== References ==
{{reflist}}
