Google DeepMind develops V2A that creates sound for AI videos



DeepMind, Google’s AI research lab, has revealed that it is working on an AI tool called V2A, which can create sound and dialogue for AI-generated videos. V2A, which stands for video-to-audio, uses text-based prompts and video pixels to generate dialogue, music, and sound effects for videos.


According to DeepMind, the sound effects and music generated match the intended tone and characters of the video. DeepMind further explained that the technology could help bring more life to AI-generated videos.

V2A matches audio to video scenes

While audio-generating technology is nothing new, DeepMind claims its V2A tool is the first of its kind to automatically match audio to video.

“Video generation models are advancing at an incredible pace, but many current systems can only generate silent output,” wrote DeepMind in a blog post.

“By training on video, audio, and the additional annotations, our technology learns to associate specific audio events with various visual scenes, while responding to the information provided in the annotations or transcripts,” DeepMind added.

The company said its technology works automatically, unlike time-consuming manual alignment, which requires adjusting sounds, videos, and timings by hand.

According to DeepMind, V2A can generate an unlimited number of soundtracks for any video output. A positive prompt can be defined “to guide the generated output toward desired sounds,” or a negative prompt “to guide it away from undesired sounds.”

“This flexibility gives users more control over V2A’s output, making it possible to rapidly experiment with different audio outputs and choose the best match,” said the company.

DeepMind unmoved by competition

The latest innovation comes as DeepMind seeks to consolidate its dominance in the industry. Earlier this year, UK AI voice generator firm ElevenLabs reached a milestone when its $80 million Series B funding round valued the company at over $1 billion, according to Verdict.

ElevenLabs says its users have generated over 100 years of audio. It also claims its audio software is currently used by 41% of Fortune 500 companies.


Despite this competition, DeepMind has indicated that it is in no hurry to release the technology to the public.

“Before we consider opening access to the wider public, our V2A technology will undergo rigorous safety assessments and testing,” said the company.

DeepMind also indicated that V2A can be paired with video generation models such as Veo, adding realistic sound effects to their output.


Cryptopolitan reporting by Enacy Mapakame
