Google has launched a major update to its generative video lineup: Veo 3, its latest AI video model, now adds synchronized sound effects, ambient audio, and even dialogue to the clips it generates. The announcement came during the Google I/O 2025 developer conference, where Google pitched Veo 3 as a breakthrough in multimodal AI creativity.
Veo 3 is available starting this week to users of Google’s AI Ultra plan ($249.99/month) via the Gemini chatbot app. It can be prompted using either text descriptions or images, allowing users to generate complex scenes complete with visuals and matching sound.
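For readers curious what programmatic access tends to look like, the sketch below uses the Google Gen AI Python SDK (google-genai) and its existing long-running video-generation flow. It is a rough illustration only: the model name, config fields, and exact polling/download calls are assumptions and may differ from what Google ultimately exposes for Veo 3, which at launch is surfaced through the Gemini app.

```python
import time
from google import genai
from google.genai import types

# Assumes the google-genai SDK and an API key with video-generation access.
client = genai.Client(api_key="YOUR_API_KEY")

# Kick off a text-to-video request. The model ID here is a placeholder,
# not a confirmed Veo 3 identifier.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt=(
        "A rainy city street at night: neon reflections on wet asphalt, "
        "distant thunder, and two people talking quietly under an awning."
    ),
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",
        number_of_videos=1,
    ),
)

# Video generation runs as a long-running operation, so poll until it finishes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download and save the generated clip(s).
for n, generated in enumerate(operation.response.generated_videos):
    client.files.download(file=generated.video)
    generated.video.save(f"veo_clip_{n}.mp4")
```

Because clips take a while to render, the request returns an operation handle that the client polls rather than a video in the initial response.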
According to Demis Hassabis, CEO of Google DeepMind, this marks a turning point. “For the first time, we’re emerging from the silent era of video generation,” he said. “You can describe characters, environments, even dialogue—and Veo 3 brings it all to life with sound.”
What makes the Veo 3 video model unique is its ability to understand visual content and generate synchronized audio for it. While AI-powered audio generation isn't new, combining it seamlessly with the raw video pixels to produce dynamic, context-aware output is still rare. Veo 3 automatically syncs its generated sounds with on-screen movement, setting it apart from competitors like Runway, Genmo, and OpenAI's Sora.
The tech builds on DeepMind’s earlier video-to-audio research. Last year, the lab unveiled a system trained on a mix of video footage, dialogue transcripts, and sounds to build realistic audio tracks for AI-generated visuals. Though Google hasn’t disclosed Veo 3’s full training dataset, YouTube—a Google property—is likely a key source.
To address concerns around deepfakes and content misuse, Google has embedded its SynthID watermarking technology into every Veo 3 frame. These invisible markers help verify whether a video was AI-generated, offering a layer of transparency as synthetic media grows more realistic.
New Upgrades Also Land for Veo 2
In addition to Veo 3, Google also rolled out enhanced tools for Veo 2. These include the ability to input character images and styles to maintain visual consistency across clips. Veo 2 can now interpret camera movements—like pans, zooms, and dollies—and allows users to add, remove, or extend objects within a scene. One standout feature lets users expand clips from portrait to landscape formats without starting over.
These updates for Veo 2 will be available via Google’s Vertex AI API platform in the coming weeks, opening up more advanced video tools to developers and enterprises.
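For teams targeting Vertex AI rather than the consumer Gemini app, the main difference in the same SDK is how the client is initialized. The snippet below is an assumption based on the google-genai SDK's Vertex mode, with the project ID and region shown as placeholders.

```python
from google import genai

# Vertex AI mode of the google-genai SDK: authenticate with a Google Cloud
# project instead of an API key. Project ID and region are placeholders.
client = genai.Client(
    vertexai=True,
    project="my-gcp-project",
    location="us-central1",
)
# From here, the generate_videos flow mirrors the earlier sketch, pointed at
# whichever Veo 2 model version Vertex AI exposes to your project.
```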
While Google continues positioning Veo 3 as a powerful creative tool, its launch comes amid increasing tension in the entertainment industry. A 2024 study from the Animation Guild estimates that over 100,000 creative jobs in the U.S. may be impacted by AI by 2026, underscoring how transformative—and disruptive—these tools can be.
Still, Veo 3 represents a clear leap in video AI. If its audio generation quality and scene awareness hold up in real-world use, it could quickly become a go-to platform for creators, studios, and developers alike.