Ethics, Info, Tech: Contested Voices, Values, Spaces: robots.txt voluntary norm

Tuesday, November 5, 2024

Penguin Random House books now explicitly say ‘no’ to AI training; The Verge, October 18, 2024

Emma Roth, The Verge; Penguin Random House books now explicitly say ‘no’ to AI training

"Book publisher Penguin Random House is putting its stance on AI training in print. The standard copyright page on both new and reprinted books will now say, “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems,” according to a report from The Bookseller spotted by Gizmodo.

The clause also notes that Penguin Random House “expressly reserves this work from the text and data mining exception” in line with the European Union’s laws. The Bookseller says that Penguin Random House appears to be the first major publisher to account for AI on its copyright page.

What gets printed on that page might be a warning shot, but it also has little to do with actual copyright law. The amended page is sort of like Penguin Random House’s version of a robots.txt file, which websites will sometimes use to ask AI companies and others not to scrape their content. But robots.txt isn’t a legal mechanism; it’s a voluntarily-adopted norm across the web. Copyright protections exist regardless of whether the copyright page is slipped into the front of the book, and fair use and other defenses (if applicable!) also exist even if the rights holder says they do not."
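The quoted passage rests on the claim that robots.txt is a voluntary norm and not an enforcement mechanism. Here is a minimal sketch of what that means in practice, using Python's standard-library urllib.robotparser. The robots.txt contents, the example.com URLs, and the helper names build_parser and plan_crawl are all hypothetical. GPTBot is the user-agent string OpenAI publishes for its crawler. The sketch shows that the rules only take effect if the crawler's own code decides to check them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: it asks OpenAI's GPTBot
# crawler to stay out entirely and allows every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def plan_crawl(parser: RobotFileParser, agent: str, urls: list[str],
               respect_robots: bool = True) -> list[str]:
    """Return the URLs a hypothetical crawler would request.

    The robots.txt check happens only because this code chooses to run it.
    With respect_robots=False every URL is returned, and nothing in the
    file itself can prevent that.
    """
    if not respect_robots:
        return list(urls)
    return [u for u in urls if parser.can_fetch(agent, u)]


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    pages = ["https://example.com/books/1", "https://example.com/about"]
    print(plan_crawl(parser, "GPTBot", pages))                        # polite crawler
    print(plan_crawl(parser, "GPTBot", pages, respect_robots=False))  # ignores the file
```

The polite call returns an empty list and the second returns every URL. The file can state a request, but honoring it is entirely up to whoever writes the crawler, which is the sense in which robots.txt is a norm rather than a legal mechanism.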