Google DeepMind develops V2A that creates sound for AI videos

DeepMind, Google’s AI research lab, has revealed that it is working on an AI tool known as V2A, which can create sound and dialogue for AI-generated videos. V2A, which stands for video-to-audio, uses text-based prompts and video pixels to generate dialogue, music, and sound effects for videos.

According to DeepMind, the sound effects and music generated match the intended tone and characters of the video. DeepMind further explained that the technology could help bring more life to AI-generated videos.

V2A matches audio to video scenes

While audio-generating technology is nothing new, DeepMind claims its V2A tool is the first of its kind to automatically match audio to video.

“Video generation models are advancing at an incredible pace, but many current systems can only generate silent output,” wrote DeepMind in a blog post.

“By training on video, audio, and the additional annotations, our technology learns to associate specific audio events with various visual scenes, while responding to the information provided in the annotations or transcripts.”

The company added that its technology works automatically, in contrast to time-consuming manual alignment, which requires adjusting sounds, visuals, and timings by hand.

According to DeepMind, the V2A tool can generate an unlimited number of soundtracks for any video output. Users can define a positive prompt “to guide the generated output toward desired sounds, or a negative prompt to guide it away from undesired sounds.”

“This flexibility gives users more control over V2A’s output, making it possible to rapidly experiment with different audio outputs and choose the best match,” said the company.

We're sharing progress on our video-to-audio (V2A) generative technology. 🎥

It can add sound to silent clips that match the acoustics of the scene, accompany on-screen action, and more.

Here are 4 examples – turn your sound on. 🧵🔊 https://t.co/VHpJ2cBr24 pic.twitter.com/S5m159Ye62

— Google DeepMind (@GoogleDeepMind) June 17, 2024

DeepMind unmoved by competition

The latest innovation comes as DeepMind seeks to consolidate its dominance in the industry. Earlier this year, UK AI voice generator firm ElevenLabs reached a milestone when its Series B funding round raised $80 million, valuing the company at over $1 billion, according to Verdict.

ElevenLabs says its users have generated over 100 years of audio. It also claims its audio software is currently used by 41% of Fortune 500 companies.

Despite this competition, DeepMind has indicated that it is in no hurry to release the technology to the public.

“Before we consider opening access to the wider public, our V2A technology will undergo rigorous safety assessments and testing,” said the company.

DeepMind also indicated that V2A can be paired with video generation models such as Veo to help create realistic sound effects.

Cryptopolitan reporting by Enacy Mapakame