= Veo (text-to-video model) =

Infobox
- Screenshot: Veo 3 demo Owl and Badger.webm
- Screenshot Size: 300px
- Developer: Google DeepMind
- Genre: Text-to-video model
- Latest Release Version: Veo 3.1

Veo, or Google Veo, is a text-to-video model developed by Google DeepMind and announced in May 2024. As a generative AI model, it creates videos based on user prompts. Veo 3, released in May 2025, can also generate accompanying audio.

==Development==
In May 2024, Google announced a multimodal video generation model called Veo at Google I/O 2024, claiming that it could generate 1080p videos more than a minute long. In December 2024, Google released Veo 2 through VideoFX. It supports video generation at up to 4K resolution and, according to Google, has an improved understanding of physics. In April 2025, Google announced that Veo 2 had become available to Gemini Advanced subscribers in the Gemini app.

In May 2025, Google released Veo 3, which not only generates videos but also creates synchronized audio — including dialogue, sound effects, and ambient noise — to match the visuals. Google also announced Flow, a video-creation tool powered by Veo and Imagen. Google DeepMind CEO Demis Hassabis described the release as the moment when AI video generation left the era of the silent film.

==Capabilities and limitations==

Google Veo is available through multiple subscription tiers and through Google "AI credits". It can be used from two main interfaces, Google Gemini and Google Flow. Gemini is geared toward shorter, quicker projects and is driven through the Gemini AI chatbot, while Flow works essentially as a movie editor, letting users build longer projects with continuity by reusing the same characters across clips. Each clip is limited to a maximum of eight seconds. Video content can also be created with Whisk on the Google Labs platform.

Google Veo has a simple interface and dashboard. However, users with little to no experience in scriptwriting or filmmaking may have trouble writing prompts, and the software may misinterpret what they intended. Because prompts are central to the software, they need to be both clear and specific: effective prompts describe the places, people, and things in each scene and draw on film and camera terminology such as panning, zooming, and the names of camera angles. For human subjects, Veo can generate people of various ethnicities and body types. The software can also generate stand-up comedy routines, music videos, animals, cartoons, and animation.

Veo enforces strict content guidelines. Before a clip is generated, an automated system reviews the request, and the clip will not be generated if it is:

- inappropriate
- too graphically sexual
- illegal
- showing graphic abuse, assault, or fighting (unless, for example, the prompt specifies a fictional martial arts scene) or gross behavior
- antisemitic
- racist
- homophobic
- depicting current regimes, rioting, blood, gore, or warfare (unless, in some cases, the prompt specifies a fictional period drama)

In addition, Veo will not generate characters that look identical to celebrities or other real individuals.

Users have primarily complained that, regardless of how descriptive and detailed their prompts are, Veo often misunderstands the input and produces completely different output. Common issues include incorrect subtitles and captions, complex scenes left incomplete because of the eight-second clip limit, garbled or nonsensical speech, and characters that appear deformed in both appearance and movement. Users have also reported prompts and generated content being falsely flagged as violating the guidelines. As a result, trial and error may be needed to obtain good results.

==Reactions==
A reporter for Gizmodo reacted to the release of Veo 3 by observing that users were directing the model to generate low-quality content, such as man-on-the-street interviews or "haul" videos of people unboxing products. Another media commentator reported that the tool tended to repeat the same joke in response to different prompts.

Commentators speculated that Google had trained the model on YouTube videos or Reddit posts. Google had not disclosed the source of its training data.

In July 2025, Media Matters for America reported that racist and antisemitic videos generated using Veo 3 were being uploaded to TikTok. Ryan Whitwam of Ars Technica commented, "In a perfect world, Veo 3 would refuse to create these videos, but vagueness in the prompt and the AI's inability to understand the subtleties of racist tropes (i.e., the use of monkeys instead of humans in some videos) make it easy to skirt the rules."

== See also ==
- Sora (text-to-video model)
- Seedance 2.0
- VideoPoet
- Dream Machine (text-to-video model)
