Model Collapse – Why AI should not be trained with AI content (2024)

AI here, AI there. Google has released new devices with AI functions, and Apple will soon be distributing its first operating systems with "Apple Intelligence". ChatGPT, DALL-E, Gemini, Llama, Midjourney and Stable Diffusion are also widely used. But this is precisely where the problem lies for the further development of the underlying models, such as the Large Language Models (LLMs) behind chatbots. If these are trained with too much AI-generated content, so-called "model collapse" can occur. At least that is the conclusion of a study by the University of Oxford, which was published in the journal Nature.

Chapters in this post:

  • 1 AI failure: What is the so-called “model collapse”?
  • 2 How does AI model collapse occur?
  • 3 What is the danger with current AI training?
  • 4 Is there a solution to the problem of AI model collapse?

AI failure: What is the so-called “model collapse”?

Chatbots like ChatGPT are trained with millions of texts and billions of words so that they can recognize connections and give relevant answers to questions. The situation is similar with AIs that generate images, which are fed a huge number of photos, works of art, sketches and the like. However, the Nature study shows that AI models produce increasingly poor results the more each new generation is trained on the output of the generations before it.

As an example, the study shows a chat about historical architecture which, after only a few generations of the chatbot trained on AI output, led to incomprehensible answers listing different types of jackrabbits. The situation is similar with AI models that generate images. As early as 2023, another study showed that image-generating AI models sometimes produce highly distorted results even after being trained on very small amounts of their own images. AI model collapse thus describes the output of incomprehensible or unrecognizable results despite the input of understandable questions or tasks.

How does AI model collapse occur?

The current study in Nature shows the reasons for the incomprehensible and distorted output of the AIs. According to it, probable answers and sentence elements are given higher priority with each generation, while less likely content, phrases and words fall further and further behind and are ultimately forgotten. After several generations of AI models trained on previous answers, the answers contain completely false assumptions, repeated words and the like. Or in short: the AI "poisons" its own reality.
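The forgetting of rare content can be illustrated with a toy simulation. The following is only a minimal sketch and not the method used in the study: the "model" is assumed to be a plain word-frequency table, the vocabulary (`common0`, `rare0`, ...) is made up, and each generation is "trained" simply by counting words in a corpus sampled from the previous generation.

```python
import random
from collections import Counter


def next_generation(probs, sample_size, rng):
    """Sample a corpus from the current model, then refit by counting words."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    corpus = rng.choices(tokens, weights=weights, k=sample_size)
    counts = Counter(corpus)
    # Words that were never sampled get no entry, so they can never return.
    return {t: c / sample_size for t, c in counts.items()}


def simulate(generations=10, sample_size=200, seed=0):
    rng = random.Random(seed)
    # Generation 0 (hypothetical toy data): 5 common words, 50 rare words.
    probs = {f"common{i}": 0.18 for i in range(5)}
    probs.update({f"rare{i}": 0.002 for i in range(50)})
    history = [probs]
    for _ in range(generations):
        probs = next_generation(probs, sample_size, rng)
        history.append(probs)
    return history


history = simulate()
vocab_sizes = [len(p) for p in history]
print(vocab_sizes)  # number of distinct words still known in each generation
```

Because a word that is missing from one generation's corpus has probability zero in every later generation, the vocabulary in this sketch can only shrink, and the rare words disappear first.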

If the original training material for generation 0 already contains some errors, which may even be repeated and thus appear to be important, then these will be reinforced more and more in each later generation n. If every generation n is trained with data from generation n-1, at some point the only possible answer will be the error. If there is then also a grammatical collapse, repetitions arise, as in the chat example of the study, where the answer only says "[...] In addition to being home to some of the world's largest populations of black @-@ tailed jackrabbits, white @-@ tailed jackrabbits, blue @-@ tailed jackrabbits, red @-@ tailed jackrabbits, […]".
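The reinforcement from generation n-1 to generation n can be sketched as a simple recurrence. This is an illustrative assumption and not a formula from the study: the model is reduced to two possible answers (one wrong, one correct), and each training round is assumed to "sharpen" the distribution with a hypothetical `temperature` below 1, so that the more frequent answer gains share.

```python
def sharpen(error_share, temperature=0.7):
    """One train-on-own-output step: the likelier answer gains share."""
    power = 1.0 / temperature
    wrong = error_share ** power
    right = (1.0 - error_share) ** power
    return wrong / (wrong + right)


def error_share_by_generation(initial_share, generations=10, temperature=0.7):
    """Return the share of the wrong answer for generations 0..n."""
    shares = [initial_share]
    for _ in range(generations):
        shares.append(sharpen(shares[-1], temperature))
    return shares


# Hypothetical starting points: an error that is repeated often in
# generation 0 versus one that is only a minority view.
repeated_error = error_share_by_generation(0.6)
isolated_error = error_share_by_generation(0.4)
print(repeated_error[-1], isolated_error[-1])
```

Under these assumptions, an error that is over-represented in generation 0 ends up as practically the only answer after a few rounds, while a minority error fades out. Which of the two happens depends entirely on the starting share, which is why repeated errors in the original material matter so much.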

What is the danger with current AI training?

Training material is slowly running out for both chatbots and AI models that generate images or videos. The largest companies have already almost completely drained the (freely available) web for their models. Whether OpenAI, Google, Meta or others: the endless hunger for additional texts and data sets can only be satisfied in two ways, namely by using every bit of brand new content directly as training material and by using "synthetic data".

But as more and more AI content floods the web, AI models will sooner rather than later be fed their own outputs. And "synthetic data" refers to AI-generated data sets that are created specifically for training new models. So in addition to the risk of accidental "poisoning" (to borrow the term from the study), AI is already being deliberately trained with the content of previous models, for example at Google and Meta. The reason is that real material is slowly running out and high-quality sources (newspapers, etc.) want to be paid for their texts.

Is there a solution to the problem of AI model collapse?

At present, the extent of the output distortion shown in the study is still purely theoretical. The web is not yet filled with so much AI content that human-made content is in a vanishing minority, so AI is not yet trained primarily or exclusively on AI content. Nevertheless, precautions must be taken here, ideally through cooperation between AI companies and sources of high-quality, human-made data sets. OpenAI is already working with Axel Springer and News Corp. However, this costs several million dollars.

Another solution proposed by the researchers at Oxford University is a joint agreement between AI companies to clarify questions of origin regarding existing data sets. Sources would be checked to see whether they come from one AI or another, so that they can be marked accordingly and, if necessary, removed from the training data set. But this would require automation and thus another (testing) AI, because manual checking cannot keep up with the data hunger of AI training.
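The marking and removal step described above could look roughly like the following sketch. It assumes that every document already carries a provenance label; the `Document` type, the `origin` field and its values "human" and "ai" are hypothetical and not part of any existing standard or of the study. How such labels would be assigned reliably is exactly the open problem.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Document:
    text: str
    origin: Optional[str]  # hypothetical label: "human", "ai", or None if unknown


def split_by_provenance(
    documents: List[Document],
) -> Tuple[List[Document], List[Document], List[Document]]:
    """Sort documents into: keep for training, remove, and needs review."""
    keep, removed, unknown = [], [], []
    for doc in documents:
        if doc.origin == "human":
            keep.append(doc)
        elif doc.origin == "ai":
            removed.append(doc)
        else:
            unknown.append(doc)  # no label: would need a further check
    return keep, removed, unknown


# Made-up example corpus for illustration only.
corpus = [
    Document("Newspaper report on a town council meeting.", "human"),
    Document("Chatbot answer about historical architecture.", "ai"),
    Document("Forum post of unclear origin.", None),
]
keep, removed, unknown = split_by_provenance(corpus)
print(len(keep), len(removed), len(unknown))
```

Unlabeled documents are kept in a separate list rather than silently trained on or dropped, since deciding their fate is the part that would need the automated check.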

Johannes Domke

After graduating from high school, Johannes completed an apprenticeship as a business assistant specializing in foreign languages. He then decided to research and write instead, which led him to become self-employed. For several years he has been working for Sir Apfelot, among others. His articles cover product introductions, news, manuals, video games, consoles, and more. He follows Apple keynotes live via stream.
