= Artificial intelligence and copyright =

In the 2020s, the rapid advancement of deep learning-based generative artificial intelligence models raised questions about the copyright status of AI-generated works, and about whether copyright infringement occurs when such models are trained or used. These models include text-to-image models such as Stable Diffusion and large language models such as ChatGPT. As of 2023, there were several pending U.S. lawsuits challenging the use of copyrighted data to train AI models, with defendants arguing that this falls under fair use.

Popular deep learning models are trained on massive amounts of media scraped from the Internet, often including copyrighted material. When training data is assembled, the sourcing of copyrighted works may infringe on the copyright holder's exclusive right to control reproduction, unless covered by exceptions in relevant copyright laws. Additionally, using a model's outputs might violate copyright, and the model creator could be accused of vicarious liability and held responsible for that copyright infringement.

== Copyright status of AI-generated works ==

Since most legal jurisdictions only grant copyright to original works of authorship by human authors, the definition of originality is central to the copyright status of AI-generated works.

=== United States ===
The Copyright Clause states: "The Congress shall have the power ... To promote the Progress of ... useful Arts, by securing for limited Times to Authors ... the exclusive Right to their respective Writings ..." The U.S. Congress and federal courts have interpreted this clause as being limited to works "created by a human being", declining to grant copyright to works generated without human intervention. Some legal professionals have suggested that Naruto v. Slater (2018), in which the U.S. Court of Appeals for the Ninth Circuit held that non-humans cannot be copyright holders of artistic works, could be a potential precedent in copyright litigation over works created by generative AI. Some have suggested that certain AI-generated works might be copyrightable in the U.S. and similar jurisdictions if it can be shown that the human who ran the AI program exercised sufficient originality in selecting the inputs to the AI or editing the AI's output.

Proponents of this view suggest that an AI model may be viewed as merely a tool (akin to a pen or a camera) used by its human operator to express their creative vision. For example, proponents argue that if the standard of originality can be satisfied by an artist clicking the shutter button on a camera, then perhaps artists using generative AI should get similar deference, especially if they go through multiple rounds of revision to refine their prompts to the AI. Other proponents argue that the Copyright Office is not taking a technology-neutral approach to the use of AI or algorithmic tools. For other creative expressions (music, photography, writing), the test is effectively whether there is at least de minimis, or limited, human creativity. For works using AI tools, the Copyright Office has applied a different test, namely whether there is no more than de minimis technological involvement.

This difference in approach can be seen in the decision on a registration claim by Jason Matthew Allen for his work Théâtre D'opéra Spatial, created using Midjourney and an upscaling tool. The Copyright Office stated: "The Board finds that the Work contains more than a de minimis amount of content generated by artificial intelligence ("AI"), and this content must therefore be disclaimed in an application for registration. Because Mr. Allen is unwilling to disclaim the AI-generated material, the Work cannot be registered as submitted."

As AI is increasingly used to generate literature, music, and other forms of art, the U.S. Copyright Office has released new guidance emphasizing whether works, including materials generated by artificial intelligence, exhibit a 'mechanical reproduction' nature or are the 'manifestation of the author's own creative conception'. The U.S. Copyright Office published a Rule in March 2023 on a range of issues related to the use of AI, in which it stated:

...because the Office receives roughly half a million applications for registration each year, it sees new trends in registration activity that may require modifying or expanding the information required to be disclosed on an application.

One such recent development is the use of sophisticated artificial intelligence ("AI") technologies capable of producing expressive material. These technologies "train" on vast quantities of preexisting human-authored works and use inferences from that training to generate new content. Some systems operate in response to a user's textual instruction, called a "prompt."

The resulting output may be textual, visual, or audio, and is determined by the AI based on its design and the material it has been trained on. These technologies, often described as "generative AI," raise questions about whether the material they produce is protected by copyright, whether works consisting of both human-authored and AI-generated material may be registered, and what information should be provided to the Office by applicants seeking to register them.

The Copyright Office further clarified in January 2025 that AI-assisted works in which the creative expression of the human remains evident can be copyrighted. This can include creative adaptation of prompts for AI generators or the use of AI to assist in the creation of a work, such as in filmmaking. Works "where the expressive elements are determined by a machine" remain uncopyrightable. Following this guidance, in January 2025 the Copyright Office registered "A Single Piece of American Cheese" as a composite work, the first visual artwork composed solely of AI-generated outputs to be registered. The application argued that the human-driven selection, arrangement, and coordination involved in creating the single work constituted sufficient human authorship to merit copyright.

Both the federal district court and the circuit court in the District of Columbia have upheld the Copyright Office's refusal to register copyrights for works generated solely by machines, holding that machine ownership would conflict with heritable property rights as established by the Copyright Act of 1976.

The U.S. Patent and Trademark Office (USPTO) similarly codified restrictions on the patentability of inventions credited solely to AI in February 2024, following an August 2022 ruling in the case Thaler v. Vidal. In this case, the Patent Office refused to grant patents for inventions created by Stephen Thaler's AI program, DABUS, due to the lack of a "natural person" on the patents' list of inventors. The U.S. Court of Appeals for the Federal Circuit upheld this decision. In the subsequent rule-making, the USPTO allows human inventors to incorporate the output of artificial intelligence, as long as this method is appropriately documented in the patent application. However, such documentation may become virtually impossible when the inner workings of the AI and its use in the inventive process are not adequately understood or are largely unknown.

Representative Adam Schiff proposed the Generative AI Copyright Disclosure Act in April 2024. If passed, the bill would require AI companies to disclose to the Register of Copyrights the copyrighted works used to train new generative AI systems before releasing them. These companies would have to file these documents 30 days before publicly releasing their AI tools.

=== United Kingdom ===
Other jurisdictions include explicit statutory language related to computer-generated works, including the United Kingdom's Copyright, Designs and Patents Act 1988, which states:

In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.

However, the computer-generated work provision under UK law relates to autonomous creations by computer programs. Individuals using AI tools will usually be the authors of the resulting works, assuming these meet the minimum requirements for a copyright work. With respect to AI, the language on computer-generated works relates to the ability of human programmers to hold copyright in the autonomous productions of AI tools (i.e. where there is no direct human input): "In so far as each composite frame is a computer generated work then the arrangements necessary for the creation of the work were undertaken by Mr Jones because he devised the appearance of the various elements of the game and the rules and logic by which each frame is generated and he wrote the relevant computer program. In these circumstances I am satisfied that Mr Jones is the person by whom the arrangements necessary for the creation of the works were undertaken and therefore is deemed to be the author by virtue of s.9(3)".

The UK government has consulted on the use of generative tools and AI with respect to intellectual property, leading to a proposed specialist Code of Practice "to provide guidance to support AI firms to access copyrighted work as an input to their models, whilst ensuring there are protections on generated output to support right holders of copyrighted work". In October 2023, the U.S. Copyright Office published a notice of inquiry and request for comments following its 2023 Registration Guidance.

=== China ===
On November 27, 2023, the Beijing Internet Court issued a decision recognizing copyright in AI-generated images.

As noted by a lawyer and AI art creator, the challenge for intellectual property regulators, legislators and the courts is how to protect human creativity in a technologically neutral fashion whilst considering the risks of automated AI factories. AI tools have the ability to autonomously create a range of material that is potentially subject to copyright (music, blogs, poetry, images, and technical papers) or other intellectual property rights (patents and design rights).

== Training AI with copyrighted data ==
Deep learning models draw on large data sets sourced from the Internet, such as publicly available images and the text of web pages. The text and images are then converted into numeric formats the AI can analyze. A deep learning model identifies patterns linking the encoded text and image data and learns which text concepts correspond to elements in images. Through repeated testing, the model refines its accuracy by matching images to text descriptions. The trained model then undergoes validation to evaluate its skill in generating or manipulating new images using only the text prompts provided after the training process. Because assembling these training datasets involves making copies of copyrighted works, this has raised the question of whether the process infringes the copyright holder's exclusive right to make reproductions of their works, or whether it falls under fair use allowances.
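The matching step described above can be illustrated with a toy sketch. It assumes that text and images have already been encoded as short numeric vectors and that similarity is scored with cosine similarity; the vectors, captions, and image names are hypothetical and hand-written, whereas a real model learns its encodings from very large datasets and may score matches differently.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical encodings, written by hand for illustration only.
text_vectors = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a horse": [0.1, 0.9, 0.1],
}
image_vectors = {
    "image_001": [0.8, 0.2, 0.1],
    "image_002": [0.0, 1.0, 0.2],
}

def best_caption(image_vector, captions):
    """Return the caption whose vector is most similar to the image vector."""
    return max(captions, key=lambda c: cosine(image_vector, captions[c]))

# Pair every image with its closest text description.
matches = {name: best_caption(vec, text_vectors)
           for name, vec in image_vectors.items()}
print(matches)
```

During training, a model of this kind would adjust its encodings so that correct image and caption pairs score higher than incorrect ones; the sketch shows only the scoring, not the learning.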

===United States===
U.S. machine learning developers have traditionally believed this to be allowable under fair use because their use of copyrighted work is transformative and limited. The situation has been compared to Google Books's scanning of copyrighted books in Authors Guild, Inc. v. Google, Inc., which was ultimately found to be fair use because the scanned content was not made publicly available and the use was non-expressive.

Timothy B. Lee, in Ars Technica, argues that if the plaintiffs succeed, this may shift the balance of power in favour of large corporations such as Google, Microsoft, and Meta which can afford to license large amounts of training data from copyright holders and leverage their proprietary datasets of user-generated data. IP scholars Bryan Casey and Mark Lemley argue in the Texas Law Review that datasets are so large that "there is no plausible option simply to license all [of the data...]. So allowing [any generative training] copyright claim is tantamount to saying, not that copyright owners will get paid, but that the use won't be permitted at all." Other scholars disagree; some predict a similar outcome to the U.S. music licensing procedures.

One of the earliest cases to challenge the nature of fair use for training AI was a lawsuit that Thomson Reuters brought against Ross Intelligence, first filed in 2020. Thomson Reuters argued that Ross Intelligence had used its Westlaw headnotes, brief summaries of court decisions, to train an AI engine designed to compete with Westlaw. While Thomson Reuters' claims were initially denied by Judge Stephanos Bibas of the Third Circuit on the basis that headnotes may not have been copyrightable, Bibas reevaluated his decision in February 2025 and issued a ruling favoring Thomson Reuters, holding that headnotes are copyrightable and that Ross Intelligence, which had closed down in 2021, had inappropriately used the material. Ross's AI engine was not generative and produced output composed of pieces of Westlaw's material, which supported Thomson Reuters' claims of reuse, so how the case may apply to generative AI such as OpenAI's models is not clear.

In a consolidated case brought by several authors against Meta and OpenAI, federal district judge Vince Chhabria expressed doubt that the use of unlicensed copyrighted material for training AI would fall under fair use. He stated to Meta's lawyers during court hearings: "You have companies using copyright-protected material to create a product that is capable of producing an infinite number of competing products. You are dramatically changing, you might even say obliterating, the market for that person's work, and you're saying that you don't even have to pay a license to that person. I just don't understand how that can be fair use." Chhabria granted summary judgment in Meta's favor in June 2025, but based only on the plaintiffs' failure to demonstrate sufficient harm from the output that Meta's LLM could produce, thus finding that Meta's use of the authors' works fell within fair use. Chhabria emphasized that his ruling did not mean that any use Meta made of copyrighted materials fell within fair use.

In a similar case, several authors, including Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, sued Anthropic in August 2024 for using their works to train its AI model. Some of these works had been part of the Pile, a dataset intended as a collection of open-source and public-domain works that at times had included copyrighted works, which have since been removed. Anthropic affirmed that it had used the Pile but had also legally bought books and subsequently digitized them for training. Judge William Alsup granted summary judgment for Anthropic, affirming that its use of purchased books for training was within fair use, but held that its use of unlicensed works from the Pile was not and would face a separate trial on damages. Anthropic offered to settle the latter matter in August 2025, proposing to pay the authors $1.5 billion, roughly $3,000 for each of the 500,000 authors affected, which would have been the largest payout related to copyright infringement in the United States. Alsup rejected the settlement over shortcomings in its details, which he said would be forced "down the throat of authors".

The U.S. Copyright Office released a report that included a review of public comment on matters of AI. Among other topics, the report addressed concerns about fair use of training materials and considered that two of the fair use factors could be of concern. One factor was the purpose and character of the created work, where the Office pointed to the Supreme Court decision in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, in which the works were still ultimately considered derivative of the original copyrighted work even though transformation had been performed. The other factor was the impact on the commercial market, with the Office suggesting that AI models trained on copyrighted data to produce works in a specific style could have negative market impacts that would weaken the fair use defense.

===EU===
In the EU, text and data mining (TDM) exceptions form part of the 2019 Directive on Copyright in the Digital Single Market. They are specifically referred to in the EU's AI Act (which came into force in 2024), which "is widely seen as a clear indication of the EU legislator's intention that the exception covers AI data collection", a view that was also endorsed in a 2024 German court decision. Unlike the TDM exception for scientific research, the more general exception covering commercial AI only applies if the copyright holder has not opted out. To facilitate the opt-out from the TDM exception, the EU's AI Act of 2024 requires providers of "general-purpose" AI models to implement a policy to comply with EU law (including the TDM exception opt-out) and to publish a detailed summary of training content according to a template provided by the AI Office. These provisions will come into force in August 2025, with further clarification on exactly what will be required of providers of general-purpose AI models expected to come from a Code of Practice to be released in advance of this.

===UK===
Unlike the EU, the United Kingdom prohibits data mining for commercial purposes but has proposed this should be changed to support the development of AI: "For text and data mining, we plan to introduce a new copyright and database exception which allows TDM for any purpose. Rights holders will still have safeguards to protect their content, including a requirement for lawful access."

===India===
Indian copyright law provides fair use exceptions for scientific research, but lacks specific provisions for commercial AI training models. Unlike the EU and UK, India has not established TDM provisions that explicitly address commercial AI systems. This regulatory uncertainty became apparent in 2024 when Asian News International (ANI) sued OpenAI for using its content to train AI models without authorization. While OpenAI offered an opt-out policy that ANI used in October 2024 to block AI scrapers, ANI claimed this measure was ineffective since their content remained available through content syndication. The case also highlighted jurisdictional challenges, as OpenAI argued it was not subject to Indian law because its servers and training operations were located outside the country.

==Copyright infringing AI outputs==

In some cases, deep learning models may replicate items in their training set when generating output. AI developers generally consider this behaviour an undesired result of overfitting, and in previous generations of AI it was considered a manageable problem. Memorization, the emergent tendency of LLMs to repeat long strings of training data, is no longer attributed to overfitting. Evaluations of controlled LLM output (focused on GPT-2-series models) have variously measured the amount memorized from training data as over 1% for exact duplicates, or up to about 7%. This is potentially a security risk and a copyright risk, for both users and providers. As of August 2023, major consumer LLMs have attempted to mitigate these problems, but researchers have still been able to prompt leakage of copyrighted material.
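As a rough illustration of how verbatim repetition can be quantified, the sketch below computes the share of word n-grams in a model output that also appear in a training text. It is a simplified, hypothetical measure, not the methodology of the evaluations mentioned above; the function names, the whitespace tokenization, and the choice of five-word n-grams are assumptions made for the example.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def verbatim_overlap(output_text, training_text, n=5):
    """Fraction of n-grams in output_text found verbatim in training_text."""
    training_grams = set(ngrams(training_text.split(), n))
    output_grams = ngrams(output_text.split(), n)
    if not output_grams:
        return 0.0  # output is shorter than n words
    hits = sum(1 for gram in output_grams if gram in training_grams)
    return hits / len(output_grams)

# Hypothetical training text and two hypothetical model outputs.
training_text = "the quick brown fox jumps over the lazy dog near the river bank"
copied_output = "the quick brown fox jumps over the lazy dog"
novel_output = "a slow green turtle walks under the busy bridge today"

print(verbatim_overlap(copied_output, training_text))
print(verbatim_overlap(novel_output, training_text))
```

A score near 1 indicates that the output is largely copied word for word, while a score near 0 indicates little verbatim overlap. A measure of this kind says nothing about paraphrase or stylistic imitation.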

Under U.S. law, to prove that an AI output infringes a copyright, a plaintiff must show the copyrighted work was "actually copied", meaning that the AI generates output which is "substantially similar" to their work, and that the AI had access to their work.

In the course of learning to statistically model the data on which they are trained, deep generative AI models may learn to imitate the distinct style of particular authors in the training set. Since fictional characters enjoy some copyright protection in the U.S. and other jurisdictions, an AI may also produce infringing content in the form of novel works which incorporate fictional characters.

A generative image model such as Stable Diffusion is able to model the stylistic characteristics of an artist like Pablo Picasso (including his particular brush strokes, use of colour, perspective, and so on), and a user can engineer a prompt such as "an astronaut riding a horse, by Picasso" to cause the model to generate a novel image applying the artist's style to an arbitrary subject. However, an artist's overall style is generally not subject to copyright protection. Additional questions about the copyrightability of style and the output of AI models were raised in March 2025, following an update to ChatGPT's model that enabled it to produce images strongly resembling the work of Studio Ghibli artist Hayao Miyazaki. While users initially used it for the "Ghiblification" of popular meme images, further uses were considered distasteful in light of Miyazaki's negative stance on AI, and OpenAI placed limits on the ability of users to make images in the style of living artists.

==Litigation==
- A November 2022 class action lawsuit against Microsoft, GitHub and OpenAI alleged that GitHub Copilot, an AI-powered code editing tool trained on public GitHub repositories, violated the copyright of the repositories' authors, noting that the tool was able to generate source code which matched its training data verbatim, without providing attribution.
- In January 2023, three U.S. artists (Sarah Andersen, Kelly McKernan, and Karla Ortiz) filed a class action copyright infringement lawsuit against Stability AI, Midjourney, and DeviantArt, claiming that these companies had infringed the rights of millions of artists by training AI tools on five billion images scraped from the web without the consent of the original artists. The plaintiffs' complaint has been criticized for technical inaccuracies, such as incorrectly claiming that "a trained diffusion model can produce a copy of any of its Training Images", and describing Stable Diffusion as "merely a complex collage tool". In addition to copyright infringement, the plaintiffs allege unlawful competition and violation of their right of publicity in relation to the AI tools' ability to create works in the style of the plaintiffs en masse. In July 2023, U.S. District Judge William Orrick indicated that he was inclined to dismiss most of the lawsuit but allowed the plaintiffs to file a new complaint. In October 2023, Judge Orrick dismissed all but one claim, that of copyright infringement against Stability AI. However, after the plaintiffs refiled some of the eliminated claims, Orrick agreed in August 2024 to allow some of these additional claims against the AI companies, which included both copyright and trademark infringement.
- In January 2023, Stability AI was sued in London by Getty Images for using its images in their training data without purchasing a license.
- Getty filed another suit against Stability AI in a U.S. district court in Delaware in February 2023. The suit again alleges copyright infringement for the use of Getty's images in the training of Stable Diffusion, and further argues that the model infringes Getty's trademark by generating images with Getty's watermark.
- In July 2023, authors Paul Tremblay and Mona Awad filed a lawsuit in a San Francisco court against OpenAI, alleging that its ChatGPT language model had been trained on their copyrighted books without permission, citing ChatGPT's "very accurate" summaries of their works as evidence. Two separate lawsuits were filed by authors Sarah Silverman, Christopher Golden and Richard Kadrey against Meta and OpenAI, arguing that, in addition to the companies infringing copyright by training their engines on the authors' works, the output produced by the AI engines constituted derivative works and was also infringing. The two suits against OpenAI were combined (during which Awad left the suit), and by February 2024, Judge Araceli Martínez-Olguín of the Northern District of California had thrown out all but one claim, which related to the use of the authors' copyrighted works as part of the training data for the AI model.
- The Authors Guild, on behalf of 17 authors, including George R. R. Martin, filed a copyright infringement complaint against OpenAI in September 2023, claiming "the company illegally copied the copyrighted works of authors" in training ChatGPT.
- The New York Times sued Microsoft and OpenAI in December 2023, claiming that their engines were trained on entire articles from the Times, which the Times considers an infringement of its copyright. The Times further claimed that the fair use claims made by these AI companies were invalid, since the generated information around news stories directly competes with the Times and impacts the newspaper's commercial opportunities. In March 2025, the federal district judge denied OpenAI's motion to dismiss the lawsuit, while narrowing the Times's claims to those related to copyright infringement in training OpenAI's models.
- Eight U.S. national newspapers owned by Tribune Publishing sued Microsoft and OpenAI in April 2024 over copyright infringement related to the use of their news articles for training data, as well as for output that creates false and misleading statements that are attributed to the newspapers.
- The Recording Industry Association of America (RIAA) and several major music labels sued the developers of Suno AI and Udio, AI models that can take text input to create songs with both lyrics and backing music, in separate lawsuits in June 2024, alleging that both AI models were trained without consent with music from the labels.
- In September 2024, the Regional Court of Hamburg dismissed a German photographer's lawsuit against the non-profit organization LAION for unauthorized reproduction of his copyrighted work while creating a dataset for AI training. The decision was described as a "landmark ruling on TDM exceptions for AI training data" in Germany and the EU more generally. The plaintiff has filed an appeal against the decision.
- Indian news agency ANI sued OpenAI before the Delhi High Court in India. The suit claims that OpenAI's ChatGPT reproduces ANI's copyrighted news content without authorization, amounting to copyright infringement and unauthorized use of proprietary journalistic material.
- Several Canadian news agencies under News Media Canada sued OpenAI in November 2024 for copyright violations related to the use of their news articles to train ChatGPT. They are seeking damages for each news article used for training.
- The German collecting society GEMA sued OpenAI in November 2024 for copyright infringement by reproducing copyrighted song lyrics. The same society sued Suno Inc. in January 2025 for reproducing copyrighted works. Both lawsuits were filed in Munich.
- Midjourney was sued by Disney and NBCUniversal in June 2025 on claims the AI engine, described in the lawsuit as a "bottomless pit of plagiarism", was trained on copyrighted works from both companies without permission, including depictions of their characters. Warner Bros. filed a similar suit in September 2025 against Midjourney.
- In September 2025, Disney, NBCUniversal, and Warner Bros. Discovery filed a similar lawsuit against the AI image and video generation service Hailuo AI (operated by Chinese technology company MiniMax), alleging that it profits from the unlicensed use of "iconic copyrighted characters."
- Apple Inc. was sued in September 2025 by authors Grady Hendrix and Jennifer Roberson, who claimed that Apple trained its Apple Intelligence model on a portion of the Pile dataset that included unlicensed copies of their work.
- Encyclopædia Britannica, Inc. sued the AI search engine Perplexity AI in September 2025, claiming that its results pulled material from Britannica's encyclopedia and Merriam-Webster dictionary websites without permission or compensation, denying those sites visits from users, and that it generated false results that were then attributed to their sites.
- The German performance rights organization GEMA sued OpenAI in the Munich Regional Court for using its clients' lyrics without a license. In November 2025, GEMA won the suit after showing that lyrics used to train ChatGPT were returned almost identically by the AI when requested. OpenAI had argued that the texts were newly generated. OpenAI was ordered to cease storing these lyrics and to cease returning them in its AI services. Additionally, OpenAI was required to pay damages and to provide usage data for the lyrics in question and the revenue generated from them.
