AI models spew ‘gibberish’ when trained on AI-generated data

Large language models (LLMs) trained on previous iterations of AI-generated material produce outputs that lack substance and nuance, a new study has found. The findings pose a fresh challenge for AI developers, who depend on a limited pool of human-generated data to train their systems.

Artificial intelligence researchers from the University of Cambridge and the University of Oxford in the United Kingdom trained models on datasets made up only of AI-generated content. The outcome was not ideal: the models produced incomprehensible responses.

AI still needs humans to make sense

One of the paper’s authors, Zakhar Shumaylov of the University of Cambridge, said there is a need for quality control in the data that feeds LLMs, the technology behind generative AI chatbots like ChatGPT and Google’s Gemini. Shumaylov said:

“The message is we have to be very careful about what ends up in our training data. [Otherwise,] things will always, provably, go wrong”.

The phenomenon is known as “model collapse,” Shumaylov explained. It has been shown to affect all kinds of artificial intelligence models, including those that generate images from text prompts.

According to the study, repeatedly training models on AI-generated data eventually yields gibberish. In one example, a system that began with text about the UK’s medieval church towers was producing a repetitive list of jackrabbits after only nine generations.
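
The dynamic is easy to reproduce in miniature. The sketch below is not the study’s actual setup, just a toy illustration of the same feedback loop: each “generation” is trained by estimating token frequencies from the previous generation’s output, then sampled to produce the next corpus. The names, vocabulary size, and Dirichlet prior are illustrative choices, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy vocabulary: a few common tokens and a long tail of rare ones,
# mimicking the frequency distribution of human-written text.
vocab_size = 1_000
true_probs = rng.dirichlet(np.full(vocab_size, 0.1))

def train_and_generate(corpus, n_tokens):
    """'Train' by estimating token frequencies, then 'generate' by sampling."""
    probs = np.bincount(corpus, minlength=vocab_size) / len(corpus)
    return rng.choice(vocab_size, size=n_tokens, p=probs)

corpus = rng.choice(vocab_size, size=5_000, p=true_probs)  # the human corpus
for gen in range(1, 10):
    corpus = train_and_generate(corpus, 5_000)  # generation n trains on n-1
    surviving = np.count_nonzero(np.bincount(corpus, minlength=vocab_size))
    print(f"generation {gen}: {surviving} of {vocab_size} tokens survive")
```

Because a token that draws zero samples in one generation has probability zero in every later one, the surviving count can only fall: the long tail vanishes first, which is the statistical counterpart of the jackrabbit anecdote.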

Commenting on the outputs, University of California, Berkeley computer scientist Hany Farid likened model collapse to the problems caused by inbreeding in animals.

“If a species inbreeds with their own offspring and doesn’t diversify their gene pool, it can lead to a collapse of the species,” Farid said.

When the researchers mixed human-generated data into the training sets, collapse set in more slowly than when the models ran on purely AI-generated content.
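
That slowdown shows up in the same toy setting if each generation’s training set retains a slice of the original human corpus, since rare tokens can re-enter the pool. Reusing the definitions from the sketch above, with a 90/10 split that is an arbitrary illustrative choice rather than a ratio from the paper:

```python
human_fraction = 0.1  # keep 10% genuine human data each generation

corpus = rng.choice(vocab_size, size=5_000, p=true_probs)
human_pool = corpus.copy()  # the original human corpus, kept around
for gen in range(1, 10):
    n_synthetic = int(5_000 * (1 - human_fraction))
    synthetic = train_and_generate(corpus, n_synthetic)
    fresh_human = rng.choice(human_pool, size=5_000 - n_synthetic)
    corpus = np.concatenate([synthetic, fresh_human])
    surviving = np.count_nonzero(np.bincount(corpus, minlength=vocab_size))
    print(f"generation {gen}: {surviving} of {vocab_size} tokens survive")
```

The human slice keeps reinjecting rare tokens, so the vocabulary erodes more gradually than in the purely synthetic loop.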

[Image: AI models produce gibberish when trained on AI-generated data.]

Researchers: AI could worsen biases against minority groups

Language models work by building up associations between tokens — words or word parts — in huge swathes of text, often scraped from the Internet. They generate text by spitting out the statistically most probable next word, based on these learned patterns.
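
A toy bigram model makes that mechanism concrete. The sketch below is a deliberately crude stand-in: real LLMs use neural networks over subword tokens rather than whole-word counts, but the learn-associations-then-emit-the-next-token loop is the same shape.

```python
import random
from collections import Counter, defaultdict

text = ("the cat sat on the mat and the dog slept on the rug "
        "and the cat slept on the rug").split()

# Build the "associations": how often each token follows each other token.
follows = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    follows[prev][nxt] += 1

def next_token(prev):
    """Sample the next token in proportion to how often it followed prev."""
    counts = follows[prev]
    return random.choices(list(counts), weights=list(counts.values()))[0]

token = "the"
output = [token]
for _ in range(10):
    token = next_token(token)
    output.append(token)
print(" ".join(output))
```

Run it a few times and it emits fluent-looking but shallow recombinations of its training text, which is also why training such a model on its own output compounds errors.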

The study, published in the journal Nature on July 24, showed that information mentioned only a few times in a dataset tends to disappear from model outputs within a few generations. Researchers worry this could further erase minority groups that are already underrepresented in training data.

To avert model collapse in real-world use, the study suggested watermarking content so that AI-generated material can be told apart from human-generated material and filtered out of training data. But this, too, could be problematic, it said, given the lack of coordination between rival AI companies.
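
In practice, the watermarking suggestion amounts to provenance filtering at training time. Below is a minimal sketch assuming each document carries a hypothetical ai_generated flag recovered from a watermark check; reliably producing that flag across rival companies’ models is exactly the coordination problem the study flags.

```python
documents = [
    {"text": "A survey of church towers written by a historian.", "ai_generated": False},
    {"text": "An auto-generated list of jackrabbit facts.", "ai_generated": True},
    {"text": "A newspaper report filed by a human reporter.", "ai_generated": False},
]

# Keep only documents whose (hypothetical) watermark check says human-written.
training_set = [doc["text"] for doc in documents if not doc["ai_generated"]]
print(training_set)  # the AI-generated document is excluded
```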

The study’s findings arrive amid growing debate over whether AI will push humans out of content creation entirely, including the writing of novels and newspaper articles.

Titled ‘AI models collapse when trained on recursively generated data’, the study’s outcomes put that debate to rest – humans aren’t being removed from the equation just yet.
