DALL-E

DALL-E
Original author(s): OpenAI
Initial release: January 5, 2021
Type: Transformer language model
Website: openai.com/blog/dall-e/

Image caption: Images produced by DALL-E when given the text prompt "a professional high quality illustration of a giraffe dragon chimera. a giraffe imitating a dragon. a giraffe made of dragon."

DALL-E (stylized as DALL·E) and DALL-E 2 are machine learning models developed by OpenAI to generate digital images from natural language descriptions, called "prompts". DALL-E was revealed by OpenAI in a blog post in January 2021, and uses a version of GPT-3[1] modified to generate images. In April 2022, OpenAI announced DALL-E 2, a successor designed to generate more realistic images at higher resolutions that "can combine concepts, attributes, and styles".[2]

OpenAI has not released source code for either model, although output from a limited selection of sample prompts is available on OpenAI's website.[1] On 20 July 2022, DALL-E 2 entered into a beta phase with invitations sent to 1 million waitlisted individuals.[3][4] Access had previously been restricted to pre-selected users for a research preview due to concerns about ethics and safety.[5][6] On 28 September 2022, DALL-E 2 was opened to anyone and the waitlist requirement was removed;[7] users can generate a certain number of images for free and may purchase additional ones. Several open-source imitations trained on smaller amounts of data have been released by others.[8][9][10]

The software's name is a portmanteau of the names of animated robot Pixar character WALL-E and the Spanish surrealist artist Salvador Dalí.[11][1]

Technology

The Generative Pre-trained Transformer (GPT) model was initially developed by OpenAI in 2018,[12] using a Transformer architecture. The first iteration, GPT, was scaled up to produce GPT-2 in 2019;[13] in 2020 it was scaled up again to produce GPT-3, with 175 billion parameters.[14][1][15] DALL-E's model is a multimodal implementation of GPT-3[16] with 12 billion parameters[1] which "swaps text for pixels", trained on text-image pairs from the Internet.[17] DALL-E 2 uses 3.5 billion parameters, a smaller number than its predecessor.[18]
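
The original DALL-E's approach of "swapping text for pixels" can be sketched roughly as follows: a single autoregressive Transformer continues a sequence of text tokens with a fixed number of discrete image tokens, which a separate image decoder then turns into pixels.[15] The Python sketch below is illustrative only; the objects text_tokenizer, transformer, and image_tokenizer are hypothetical placeholders rather than OpenAI's released code.

    # Illustrative sketch (not OpenAI's code) of autoregressive text-to-image
    # generation in the style of the original DALL-E. All names are hypothetical.
    def generate_image_autoregressively(prompt, text_tokenizer, transformer,
                                        image_tokenizer, n_image_tokens=1024):
        """Continue the prompt's text tokens with discrete image tokens, then decode."""
        tokens = list(text_tokenizer.encode(prompt))
        for _ in range(n_image_tokens):
            # Sample the next image token conditioned on all tokens so far.
            tokens.append(transformer.sample_next_token(tokens))
        image_tokens = tokens[-n_image_tokens:]
        # A separate decoder (e.g. a discrete VAE) maps image tokens back to pixels.
        return image_tokenizer.decode(image_tokens)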

DALL-E was developed and announced to the public in conjunction with CLIP (Contrastive Language-Image Pre-training).[17] CLIP is a separate model based on zero-shot learning that was trained on 400 million pairs of images with text captions scraped from the Internet.[1][17][19] Its role is to "understand and rank" DALL-E's output by predicting which caption from a list of 32,768 captions randomly selected from the dataset (of which one was the correct answer) is most appropriate for an image. This model is used to filter a larger initial list of images generated by DALL-E to select the most appropriate outputs.[11][17]
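
The filtering step can be illustrated with a minimal sketch: a CLIP-style model embeds the prompt and each candidate image into a shared space, and the candidates whose embeddings are most similar to the prompt's embedding are kept. The code below is a simplified illustration, not OpenAI's implementation; the embeddings are assumed to come from hypothetical CLIP-style image and text encoders.

    # Simplified illustration of CLIP-style reranking of generated images.
    # The embeddings are assumed to come from hypothetical image/text encoders.
    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def rerank(prompt_embedding, candidate_image_embeddings, keep=8):
        """Return indices of the candidates that best match the prompt, best first."""
        scores = [cosine_similarity(prompt_embedding, e)
                  for e in candidate_image_embeddings]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:keep]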

DALL-E 2 uses a diffusion model conditioned on CLIP image embeddings, which, during inference, are generated from CLIP text embeddings by a prior model.[18]
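
In schematic terms, inference therefore runs in two stages: a prior maps the prompt's CLIP text embedding to a predicted CLIP image embedding, and a diffusion decoder then generates an image conditioned on that embedding.[18] The sketch below only restates that pipeline; clip, prior, and decoder are hypothetical stand-ins, not OpenAI's released interface.

    # Schematic two-stage inference for a DALL-E 2 style system (illustrative only;
    # clip, prior, and decoder are hypothetical stand-ins).
    def generate_image(prompt, clip, prior, decoder):
        text_embedding = clip.encode_text(prompt)        # CLIP text encoder
        image_embedding = prior.sample(text_embedding)   # prior predicts a CLIP image embedding
        return decoder.sample(image_embedding, prompt)   # diffusion decoder conditioned on it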

Capabilities

DALL-E can generate imagery in multiple styles, including photorealistic imagery, paintings, and emoji.[1] It can "manipulate and rearrange" objects in its images,[1] and can correctly place design elements in novel compositions without explicit instruction. Thom Dunn, writing for BoingBoing, remarked that "For example, when asked to draw a daikon radish blowing its nose, sipping a latte, or riding a unicycle, DALL-E often draws the handkerchief, hands, and feet in plausible locations."[20] DALL-E showed the ability to "fill in the blanks" to infer appropriate details without specific prompts, such as adding Christmas imagery to prompts commonly associated with the celebration,[21] and adding appropriately placed shadows to images that did not mention them.[22] Furthermore, DALL-E exhibits broad understanding of visual and design trends.[citation needed]

DALL-E is able to produce images for a wide variety of arbitrary descriptions from various viewpoints[23] with only rare failures.[11] Mark Riedl, an associate professor at the Georgia Tech School of Interactive Computing, found that DALL-E could blend concepts (described as a key element of human creativity).[24][25]

Its visual reasoning ability is sufficient to solve Raven's Matrices (visual tests often administered to humans to measure intelligence).[26][27]

Ethical concerns

DALL-E 2's reliance on public datasets influences its results and leads to algorithmic bias in some cases, such as generating higher numbers of men than women for requests that do not mention gender.[28] DALL-E 2's training data was filtered to remove violent and sexual imagery, but this was found to increase bias in some cases, such as reducing the frequency of women being generated.[29] OpenAI hypothesizes that this may be because women were more likely to be sexualized in the training data, which caused the filter to influence results.[29] In September 2022, OpenAI confirmed to The Verge that DALL-E invisibly inserts phrases into user prompts in order to address bias in results; for instance, "black man" and "Asian woman" are inserted into prompts that do not specify gender or race.[30]

A concern about DALL-E 2 and similar image generation models is that they could be used to propagate deepfakes and other forms of misinformation.[31][32] In an attempt to mitigate this, the software rejects prompts involving public figures and uploads containing human faces.[33] Prompts containing potentially objectionable content are blocked, and uploaded images are analyzed to detect offensive material.[34] A disadvantage of prompt-based filtering is that it is easy to bypass using alternative phrases that result in a similar output. For example, the word "blood" is filtered, but "ketchup" and "red liquid" are not.[35][34]
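
The ease of bypassing keyword-based prompt filtering can be seen in a toy example (the blocklist and code below are hypothetical and are not OpenAI's actual filter):

    # Toy keyword filter (hypothetical; not OpenAI's actual filter).
    BLOCKED_WORDS = {"blood"}

    def is_blocked(prompt):
        return any(word in BLOCKED_WORDS for word in prompt.lower().split())

    print(is_blocked("a knife covered in blood"))       # True: blocked
    print(is_blocked("a knife covered in red liquid"))  # False: same idea slips through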

Another concern about DALL-E 2 and similar models is that they could cause technological unemployment for artists, photographers, and graphic designers due to their accuracy and popularity.[36][37]

Technical limitations

DALL-E 2's language understanding has limits. It is sometimes unable to distinguish "A yellow book and a red vase" from "A red book and a yellow vase" or "A panda making latte art" from "Latte art of a panda".[38] It generates images of "an astronaut riding a horse" when presented with the prompt "a horse riding an astronaut".[39] It also fails to generate the correct images in a variety of circumstances: requests involving more than three objects, negation, numbers, or connected sentences may result in mistakes, and object features may appear on the wrong object.[23] Additional limitations include handling text (which, even with legible lettering, almost invariably comes out as dream-esque gibberish) and a limited capacity to address scientific information, such as astronomy or medical imagery.[40]

Reception

Most coverage of DALL-E focuses on a small subset of "surreal"[17] or "quirky"[24] outputs. DALL-E's output for "an illustration of a baby daikon radish in a tutu walking a dog" was mentioned in pieces from Input,[41] NBC,[42] Nature,[43] and other publications.[1][44][45] Its output for "an armchair in the shape of an avocado" was also widely covered.[17][25]

ExtremeTech stated "you can ask DALL-E for a picture of a phone or vacuum cleaner from a specified period of time, and it understands how those objects have changed".[21] Engadget also noted its unusual capacity for "understanding how telephones and other objects change over time".[22]

According to MIT Technology Review, one of OpenAI's objectives was to "give language models a better grasp of the everyday concepts that humans use to make sense of things".[17]

Open-source implementations

There have been several attempts to create open-source implementations of DALL-E.[8][46] Released in 2022 on Hugging Face's Spaces platform, Craiyon (formerly DALL-E Mini until a name change was requested by OpenAI in June 2022) is an AI model based on the original DALL-E that was trained on unfiltered data from the Internet. It attracted substantial media attention in mid-2022 after its release due to its capacity for producing humorous imagery.[47][48][49]

Stable Diffusion is a source-available model similar to DALL-E that was released to the public in August 2022.[50]

References

  1. ^ a b c d e f g h i j Johnson, Khari (5 January 2021). "OpenAI debuts DALL-E for generating images from text". VentureBeat. Archived from the original on 5 January 2021. Retrieved 5 January 2021.
  2. ^ "DALL·E 2". OpenAI. Retrieved 6 July 2022.
  3. ^ "DALL·E Now Available in Beta". OpenAI. 20 July 2022. Retrieved 20 July 2022.
  4. ^ Allyn, Bobby (20 July 2022). "Surreal or too real? Breathtaking AI tool DALL-E takes its images to a bigger stage". NPR. Retrieved 20 July 2022.
  5. ^ "DALL·E Waitlist". labs.openai.com. Retrieved 6 July 2022.
  6. ^ "From Trump Nevermind babies to deep fakes: DALL-E and the ethics of AI art". the Guardian. 18 June 2022. Retrieved 6 July 2022.
  7. ^ "DALL·E Now Available Without Waitlist". OpenAI. 28 September 2022. Retrieved 5 October 2022.
  8. ^ a b c Sahar Mor, Stripe (16 April 2022). "How DALL-E 2 could solve major computer vision challenges". VentureBeat. Archived from the original on 24 May 2022. Retrieved 15 June 2022.
  9. ^ Knight, Will. "Inside DALL-E Mini, the Internet's Favorite AI Meme Machine". Wired. ISSN 1059-1028. Retrieved 6 July 2022.
  10. ^ "Midjourney". Midjourney. Retrieved 20 July 2022.
  11. ^ a b c d Coldewey, Devin (5 January 2021). "OpenAI's DALL-E creates plausible images of literally anything you ask it to". Archived from the original on 6 January 2021. Retrieved 5 January 2021.
  12. ^ a b Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
  13. ^ a b Radford, Alec; Wu, Jeffrey; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (14 February 2019). "Language models are unsupervised multitask learners" (PDF). 1 (8). Archived (PDF) from the original on 6 February 2021. Retrieved 19 December 2020.
  14. ^ a b Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (22 July 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
  15. ^ a b Ramesh, Aditya; Pavlov, Mikhail; Goh, Gabriel; Gray, Scott; Voss, Chelsea; Radford, Alec; Chen, Mark; Sutskever, Ilya (24 February 2021). "Zero-Shot Text-to-Image Generation". arXiv:2102.12092 [cs.LG].
  16. ^ a b Tamkin, Alex; Brundage, Miles; Clark, Jack; Ganguli, Deep (2021). "Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models". arXiv:2102.02503 [cs.CL].
  17. ^ a b c d e f g h Heaven, Will Douglas (5 January 2021). "This avocado armchair could be the future of AI". MIT Technology Review. Retrieved 5 January 2021.
  18. ^ a b Ramesh, Aditya; Dhariwal, Prafulla; Nichol, Alex; Chu, Casey; Chen, Mark (12 April 2022). "Hierarchical Text-Conditional Image Generation with CLIP Latents". arXiv:2204.06125.
  19. ^ "'DALL-E' AI generates an image out of anything you describe". Engadget. Retrieved 18 July 2022.
  20. ^ a b Dunn, Thom (10 February 2021). "This AI neural network transforms text captions into art, like a jellyfish Pikachu". BoingBoing. Archived from the original on 22 February 2021. Retrieved 2 March 2021.
  21. ^ a b c Whitwam, Ryan (6 January 2021). "OpenAI's 'DALL-E' Generates Images From Text Descriptions". ExtremeTech. Archived from the original on 28 January 2021. Retrieved 2 March 2021.
  22. ^ a b c Dent, Steve (6 January 2021). "OpenAI's DALL-E app generates images from just a description". Engadget. Archived from the original on 27 January 2021. Retrieved 2 March 2021.
  23. ^ a b Marcus, Gary; Davis, Ernest; Aaronson, Scott (2 May 2022). "A very preliminary analysis of DALL-E 2". arXiv:2204.13807 [cs.CV].
  24. ^ a b c Shead, Sam (8 January 2021). "Why everyone is talking about an image generator released by an Elon Musk-backed A.I. lab". CNBC. Retrieved 2 March 2021.
  25. ^ a b c Wakefield, Jane (6 January 2021). "AI draws dog-walking baby radish in a tutu". British Broadcasting Corporation. Archived from the original on 2 March 2021. Retrieved 3 March 2021.
  26. ^ a b Markowitz, Dale (10 January 2021). "Here's how OpenAI's magical DALL-E image generator works". TheNextWeb. Archived from the original on 23 February 2021. Retrieved 2 March 2021.
  27. ^ "DALL·E: Creating Images from Text". OpenAI. 5 January 2021. Retrieved 13 August 2022.
  28. ^ Strickland, Eliza (14 July 2022). "DALL-E 2's Failures Are the Most Interesting Thing About It". IEEE Spectrum. Retrieved 15 July 2022.
  29. ^ a b "DALL·E 2 Pre-Training Mitigations". OpenAI. 28 June 2022. Retrieved 18 July 2022.
  30. ^ James Vincent (29 September 2022). "OpenAI's image generator DALL-E is available for anyone to use immediately". The Verge.
  31. ^ Taylor, Josh (18 June 2022). "From Trump Nevermind babies to deep fakes: DALL-E and the ethics of AI art". The Guardian. Retrieved 2 August 2022.
  32. ^ Knight, Will (13 July 2022). "When AI Makes Art, Humans Supply the Creative Spark". Wired. Retrieved 2 August 2022.
  33. ^ Rose, Janus (24 June 2022). "DALL-E Is Now Generating Realistic Faces of Fake People". Vice. Retrieved 2 August 2022.
  34. ^ a b OpenAI (19 June 2022). "DALL·E 2 Preview - Risks and Limitations". GitHub. Retrieved 2 August 2022.
  35. ^ Lane, Laura (1 July 2022). "DALL-E, Make Me Another Picasso, Please". The New Yorker. Retrieved 2 August 2022.
  36. ^ Goldman, Sharon (26 July 2022). "OpenAI: Will DALLE-2 kill creative careers?".
  37. ^ Blain, Loz (29 July 2022). "DALL-E 2: A dream tool and an existential threat to visual artists".
  38. ^ Saharia, Chitwan; Chan, William; Saxena, Saurabh; Li, Lala; Whang, Jay; Denton, Emily; Ghasemipour, Seyed Kamyar Seyed; Ayan, Burcu Karagol; Mahdavi, S. Sara; Lopes, Rapha Gontijo; Salimans, Tim (23 May 2022). "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding". arXiv:2205.11487 [cs.CV].
  39. ^ Marcus, Gary (28 May 2022). "Horse rides astronaut". The Road to AI We Can Trust. Retrieved 18 June 2022.
  40. ^ Strickland, Eliza (14 July 2022). "DALL-E 2's Failures Are the Most Interesting Thing About It". IEEE Spectrum. Retrieved 16 August 2022.
  41. ^ a b Kasana, Mehreen (7 January 2021). "This AI turns text into surreal, suggestion-driven art". Input. Archived from the original on 29 January 2021. Retrieved 2 March 2021.
  42. ^ a b Ehrenkranz, Melanie (27 January 2021). "Here's DALL-E: An algorithm learned to draw anything you tell it". NBC News. Archived from the original on 20 February 2021. Retrieved 2 March 2021.
  43. ^ a b Stove, Emma (5 February 2021). "Tardigrade circus and a tree of life — January's best science images". Nature. Archived from the original on 8 March 2021. Retrieved 2 March 2021.
  44. ^ a b Knight, Will (26 January 2021). "This AI Could Go From 'Art' to Steering a Self-Driving Car". Wired. Archived from the original on 21 February 2021. Retrieved 2 March 2021.
  45. ^ a b Metz, Rachel (2 February 2021). "A radish in a tutu walking a dog? This AI can draw it really well". CNN. Retrieved 2 March 2021.
  46. ^ "jina-ai/dalle-flow". Jina AI. 17 June 2022. Retrieved 17 June 2022.
  47. ^ a b Carson, Erin (14 June 2022). "Everything to Know About Dall-E Mini, the Mind-Bending AI Art Creator". CNET. Archived from the original on 15 June 2022. Retrieved 15 June 2022.
  48. ^ a b Schroeder, Audra (9 June 2022). "AI program DALL-E mini prompts some truly cursed images". Daily Dot. Archived from the original on 10 June 2022. Retrieved 15 June 2022.
  49. ^ a b Diaz, Ana (15 June 2022). "People are using DALL-E mini to make meme abominations — like pug Pikachu". Polygon. Archived from the original on 15 June 2022. Retrieved 15 June 2022.
  50. ^ Growcoot, Matt (23 August 2022). "Open Source AI Image Generator Stable Diffusion Released to the Public". PetaPixel. Retrieved 5 October 2022.
  51. ^ Nichele, Stefano (2021). "Tim Taylor and Alan Dorin: Rise of the self-replicators—early visions of machines, AI and robots that can reproduce and evolve". Genetic Programming and Evolvable Machines. 22: 141–145. doi:10.1007/s10710-021-09398-5. S2CID 231930573.
  52. ^ Macaulay, Thomas (6 January 2021). "Say hello to OpenAI's DALL-E, a GPT-3-powered bot that creates weird images from text". TheNextWeb. Archived from the original on 28 January 2021. Retrieved 2 March 2021.
  53. ^ Andrei, Mihai (8 January 2021). "This AI module can create stunning images out of any text input". ZME Science. Archived from the original on 29 January 2021. Retrieved 2 March 2021.
  54. ^ Grossman, Gary (16 January 2021). "OpenAI's text-to-image engine, DALL-E, is a powerful visual idea generator". VentureBeat. Archived from the original on 26 February 2021. Retrieved 2 March 2021.
  55. ^ Toews, Rob (18 January 2021). "AI And Creativity: Why OpenAI's Latest Model Matters". Forbes. Archived from the original on 12 February 2021. Retrieved 2 March 2021.
  56. ^ Walsh, Bryan (5 January 2021). "A new AI model draws images from text". Axios. Retrieved 2 March 2021.
  57. ^ "For Its Latest Trick, OpenAI's GPT-3 Generates Images From Text Captions". Synced. 5 January 2021. Archived from the original on 6 January 2021. Retrieved 2 March 2021.