Showing posts with label AI memorization.

Sunday, January 25, 2026

How researchers got AI to quote copyrighted books word for word; Le Monde, January 24, 2026

Le Monde; How researchers got AI to quote copyrighted books word for word

"Where does artificial intelligence acquire its knowledge? From an enormous trove of texts used for training. These typically include vast numbers of articles from Wikipedia, but also a wide range of other writings, such as the massive Books3 dataset, which aggregates nearly 200,000 books without the authors' permission. Some proponents of conversational AI present these training datasets as a form of "universal knowledge" that transcends copyright law, adding that, protected or not, AIs do not memorize these works verbatim and only store fragmented information.

This argument has been challenged by a series of studies, the latest of which, published in early January by researchers at Stanford University and Yale University, is particularly revealing. Ahmed Ahmed and his coauthors managed to prompt four mainstream AI programs, disconnected from the internet to ensure no new information was retrieved, to recite entire pages from books."

Friday, January 16, 2026

AI’S MEMORIZATION CRISIS: Large language models don’t “learn”—they copy. And that could change everything for the tech industry; The Atlantic, January 9, 2026

Alex Reisner, The Atlantic; AI’S MEMORIZATION CRISIS: Large language models don’t “learn”—they copy. And that could change everything for the tech industry

"On tuesday, researchers at Stanford and Yale revealed something that AI companies would prefer to keep hidden. Four popular large language models—OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok—have stored large portions of some of the books they’ve been trained on, and can reproduce long excerpts from those books."