Jump to content

LAION

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Citation bot (talk | contribs) at 04:50, 25 July 2023 (Alter: template type. Add: eprint, class. Removed parameters. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | #UCB_CommandLine). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

LAION
Company typeNon-profit
IndustryArtificial intelligence
Founder
  • Christoph Schuhmann
  • Jenia Jitsev
  • Richard Vencu
  • Robert Kaczmarczyk
  • Theo Coombes
  • Mehdi Cherti
  • Aarush Katta
  • Jan Ebert
Websitelaion.ai Edit this on Wikidata

LAION (acronym for Large-scale Artificial Intelligence Open Network) is a German non-profit which makes open-sourced artificial intelligence models and datasets.[1] It is best known for releasing a number of large datasets of images and captions scraped from the web which have been used to train a number of high-profile text-to-image models, including Stable Diffusion and Imagen.[2][3]

In February 2023, LAION was named in the Getty Images lawsuit against Stable Diffusion as a non-party.[4] In April 2023, LAION was directly sued by a German photographer who wanted to have his images removed from the training set.[5]

On April 15, 2023, LAION and contributors released to public an open source AI assistant chatbot OpenAssistant.

Image datasets

LAION has publicly released a number of large datasets of image-caption pairs which have been widely used by AI researchers. The data is derived from the Common Crawl, a dataset of scraped web pages. The developers searched the crawled html for <img> tags and treated their alt attributes as captions. They used CLIP to identify and discard images whose content did not appear to match their captions.[6] LAION does not host the content of scraped images themselves; rather, the dataset contains URLs pointing to images, which researchers must download themselves.[7]

The first such dataset, LAION-400M, was released in August 2021 and consisted of 400 million image-caption pairs. The pairs were extracted from a random subset of webpages scraped by Common Crawl between 2014 and 2021.[8] It was an attempt to recreate the process used by OpenAI to collect the 400 million image-caption pairs they used to train the CLIP model - the company had chosen to open-source the model's code and weights, but not its training dataset.[6] Imagen, a text-to-image model announced by Google Brain in 2022, was trained on LAION-400M in combination with private internal datasets.[9]

A successor of more than 5 billion pairs, LAION-5B, was released in March 2022.[10] As of its release, it was the largest freely available dataset of image-caption pairs in existence.[6] Its creation was funded by Doodlebot, Hugging Face and Stability AI, the AI company behind the funding of the Stable Diffusion text-to-image model, which was trained on it.[11]

OpenAssistant

OpenAssistant
Developer(s)LAION and contributors
Initial release15 April 2023; 19 months ago (2023-04-15)
Type
LicenseApache License 2.0
Websiteopen-assistant.io

OpenAssistant is an artificial intelligence (AI) open source chat-based assistant that understands tasks, can interact with third-party systems and retrieve information dynamically to do so. The project is developed by a group of volunteers in collaboration with LAION. One of the goals for development includes free access to large language models that can be run locally on consumer hardware.[12][13] The project is backed by a worldwide crowdsourcing effort involving over 13,500 volunteers who have created 600k human-generated data points.[13][14]

References

  1. ^ "About". LAION.ai. Retrieved 26 September 2022.
  2. ^ Edwards, Benj (15 September 2022). "Have AI image generators assimilated your art? New tool lets you check". Ars Technica.
  3. ^ Newman, Marissa; Cantrill, Aggi (24 April 2023). "The Future of AI Relies on a High School Teacher's Free Database". Bloomberg News. Retrieved 24 April 2023.
  4. ^ "Getty Images (US), Inc. v. Stability AI, Inc., 1:23-cv-00135". CourtListener. Retrieved 2023-02-08.
  5. ^ "A Photographer Tried to Get His Photos Removed from an AI Dataset. He Got an Invoice Instead". Vice. Retrieved 2023-05-04.
  6. ^ a b c Alford, Anthony (17 May 2022). "LAION Releases Five Billion Image-Text Pair Dataset LAION-5B". InfoQ.
  7. ^ Edwards, Benj (21 September 2022). "Artist finds private medical record photos in popular AI training data set". Ars Technica.
  8. ^ Schuhmann, Christoph (8 August 2021). "LAION-400-Million Open Dataset". LAION blog. Retrieved 26 September 2022.
  9. ^ Saharia, Chitwan; Chan, William; Saxena, Saurabh; Li, Lala; Whang, Jay; Denton, Emily; Kamyar Seyed Ghasemipour, Seyed; Karagol Ayan, Burcu; Sara Mahdavi, S.; Gontijo Lopes, Rapha; Salimans, Tim; Ho, Jonathan; J Fleet, David; Norouzi, Mohammad (23 May 2022). "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding". arXiv:2205.11487 [cs.CV].
  10. ^ Beaumont, Romain (3 March 2022). "LAION-5B: A New Era of Open Large-Scale Multi-Modal Datasets". LAION blog.
  11. ^ Wiggers, Kyle (12 August 2022). "This startup is setting a DALL-E 2-like AI free, consequences be damned". TechCrunch.
  12. ^ Open-Assistant, LAION AI, 2023-03-09, retrieved 2023-03-09
  13. ^ a b Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew (2023-04-14). "OpenAssistant Conversations -- Democratizing Large Language Model Alignment". arXiv:2304.07327 [cs.CL].
  14. ^ "Open Assistant: Explore the Possibilities of Open and Collaborative Chatbot Development". KDnuggets. Retrieved 2023-05-05.