Showing posts with label AI training data. Show all posts

Saturday, April 25, 2026

The World’s First Museum of A.I. Art Will Open in Los Angeles as the Art World Ponders Questions of Ethics and Sustainability; Smithsonian Magazine, April 24, 2026

Michele Debczak, Smithsonian Magazine; The World’s First Museum of A.I. Art Will Open in Los Angeles as the Art World Ponders Questions of Ethics and Sustainability

"The four-block strip that houses such Los Angeles institutions as the Walt Disney Concert Hall, the Broad and the Museum of Contemporary Art will get a different type of cultural attraction this summer. Dataland, billed as the world’s first museum dedicated to A.I.-generated art, is set to open on June 20.

The brainchild of digital artists Refik Anadol and Efsun Erkiliç, Dataland will anchor the Grand LA complex, designed by architect Frank Gehry, in downtown Los Angeles. The privately funded museum covers 35,000 square feet, 10,000 of which are reserved for the technology required to support the exhibitions. Rather than traditional halls displaying individual artworks, Dataland’s five galleries and 30-foot ceiling are designed for total immersion.

“It’s very exciting to say that A.I. art is not image only,” Anadol tells Jessica Gelt for the Los Angeles Times. “It’s a very multisensory, multimedium experience—meaning sound, image, video, text, smell, taste and touch. They are all together in conversation.”

The museum’s inaugural exhibition, called “Machine Dreams: Rainforest,” was inspired by a trip to the Amazon. Anadol’s studio created an open-access A.I. model called the Large Nature Model, fed it millions of images of nature, and then prompted the machine to “learn and play with the intelligent behaviors of the natural world,” Richard Whiddington writes for Artnet. The result, as Anadol puts it per the Times, is “a living museum” where visitors can walk among “digital sculptures.” In addition to a kaleidoscope of imagery, museum guests will be immersed in soundscapes, woven from audio that includes oral histories of the Yawanawá people of Brazil and the last recorded call of the extinct Kaua‘i ‘ō‘ō bird of Hawaii, Léa Zeitoun reports for Designboom."

Thursday, April 23, 2026

Anthropic seeks pivotal court win in music publisher lawsuit over AI training; Reuters, April 21, 2026

Reuters; Anthropic seeks pivotal court win in music publisher lawsuit over AI training

"Artificial intelligence company Anthropic has asked a California federal court to rule in its favor in a copyright lawsuit brought by music publishers Universal Music Group, Concord and ABKCO, arguing it made "fair use" of their song lyrics to train its AI-powered chatbot Claude.

Anthropic's Monday filing addresses the key question for a wave of high-stakes copyright cases brought by creators against tech companies: is it legally permissible to copy millions of copyrighted works without permission to train AI models?...

The lawsuit is one of dozens of disputes between copyright owners such as authors and news outlets, and tech giants including OpenAI, Microsoft and Meta Platforms over the training of their AI systems. Amazon- and Google-backed Anthropic was the first major AI company to settle one of the cases, agreeing last year to pay a group of authors $1.5 billion to resolve a class-action lawsuit."

Wednesday, April 22, 2026

Authors Guild Addresses Publishers’ AI Use; Publishers Weekly, April 21, 2026

Sam Spratford, Publishers Weekly; Authors Guild Addresses Publishers’ AI Use

"The Authors Guild has released a statement criticizing publishing professionals’ use of AI tools following a report first published in the Bookseller that some editors have been uploading authors’ personal information, including manuscripts, into consumer-facing LLMs like ChatGPT.

“Uploading or inputting a copyrighted work or an author’s personal information into AI systems without permission may constitute a violation of the author’s copyright or right of privacy, and it puts the author’s intellectual property and personal information at risk,” the statement read. “Editors, agents, and others in the industry who have access to authors’ works should not upload any manuscript to or otherwise prompt consumer-facing chatbots with any author’s works without first getting the author’s written permission.”"

Sunday, April 19, 2026

Thousands of authors seek share of Anthropic copyright settlement; Reuters, April 17, 2026

Reuters; Thousands of authors seek share of Anthropic copyright settlement

"Nearly 120,000 authors and other copyright holders are seeking a share of a $1.5 billion class-action settlement with Anthropic over the company's unauthorized use of their books in artificial-intelligence training, according to a filing in California federal court.

Claims have been filed for 91% of the more than 480,000 works covered by the settlement, according to a court filing in the case on Thursday.

A judge will consider whether to grant final approval to the settlement – the largest ever in a U.S. copyright case – at a hearing next month.

Anthropic was the first and remains the only major AI company to settle a U.S. class-action by copyright holders alleging AI platforms used their work without permission to train their systems."

Sunday, April 12, 2026

Is AI the greatest art heist in history?; The Guardian, April 12, 2026

The Guardian; Is AI the greatest art heist in history?

New technologies of reproduction are plundering the art world – and getting away with it

"In 2026, it’s easy to see why generative AI is bad. The internet has nicknamed its excretions “slop”. The CEOs of AI companies prance about on stage like supervillains, bragging that their products will eliminate vast swathes of work. Generative AI requires sacrificing the world’s water to feed its hideous data centres. Around the globe, chatbots induce schizophrenic delusions and urge teens to kill themselves – all while turning users’ brains to mush.

Who could have predicted this? Artists, that’s who...

When tech boosters want to demonise resistance, they invoke the luddites. By their telling, the luddites were primitive idiots, who smashed machines they were too stupid to understand. History, though, tells a different story. As recounted in Brian Merchant’s sublime work Blood in the Machine, luddites were skilled artisans, fighting for their way of life against the “satanic mills” – textile sweatshops powered by child semi-slaves. Forbidden from unionising, luddites smashed machines as a protest tactic. And they did not lose to the inevitable march of progress. They lost to physical force. The government called in troops, and the luddites were either executed or shipped to penal colonies in Australia.

Artists too are fighting for a way of life. And if we are too disorganised to triumph, that will be everyone’s loss. AI companies’ inappropriate scraping may have started with the work of illustrators like me, but it has grown to encompass everything else. It extends to the billions of dollars that these companies squander each year, to the carbon they burn, to the rare minerals in their chips, to the land on which their data centres sit, to culture, education, sanity and our very imaginations. In return for the entirety of the human and non-human world, the tech lords can only offer us dystopia. Their fantasy future contains neither meaningful work nor real communities, just robots chattering to each other, leaving nothing for us."

The most 'ethical' AI company might also be the web's biggest freeloader; Business Insider, April 12, 2026

Business Insider; The most 'ethical' AI company might also be the web's biggest freeloader

"Cloudflare's latest data offers one of the clearest snapshots yet of how AI companies consume the web, and how little they give back.

The company, which powers roughly 20% of the internet, tracks how AI bots crawl websites versus how often those platforms send users back through referrals. The resulting "crawl-to-refer" ratio is a simple yet telling metric: how much value is extracted compared to returned.

The early April 2026 figures are stark. Anthropic is the worst by a wide margin, with a ratio of 8,800 to 1. That means its bots crawl webpages 8,800 times for every referral sent...

Anthropic's position is particularly striking given its reputation for being "ethical." That reputation has made it a preferred choice among some users who want to support more responsible AI development. This data highlights a different dimension of ethics — how companies interact with the broader web ecosystem that provides information for AI model outputs."
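The "crawl-to-refer" ratio described in the excerpt is simple arithmetic: pages crawled divided by referral visits sent back. A minimal sketch of the metric as the article defines it (an illustration only, not Cloudflare's actual methodology):

```python
# Illustration of the "crawl-to-refer" ratio as the article defines it:
# pages crawled by an AI bot per referral visit sent back to the site.
# (A sketch of the metric, not Cloudflare's actual methodology.)
def crawl_to_refer_ratio(crawls: int, referrals: int) -> float:
    """Return crawls per referral; infinite if no traffic is returned."""
    if referrals == 0:
        return float("inf")  # all extraction, nothing given back
    return crawls / referrals

# The article's reported figure for Anthropic: 8,800 crawls per referral.
print(f"{crawl_to_refer_ratio(8_800, 1):,.0f} to 1")  # prints "8,800 to 1"
```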

Thursday, April 9, 2026

Judge slams key OpenAI witness in copyright infringement case for ‘hazy recollections’; New York Daily News via Chicago Tribune, April 9, 2026

New York Daily News via Chicago Tribune; Judge slams key OpenAI witness in copyright infringement case for ‘hazy recollections’

"An unimpressed Manhattan judge ordered a corporate representative for OpenAI to undergo a second deposition after finding he failed to answer “even the simplest questions” the first time around about what the company has described as efforts to limit chatbots from stealing writers’ work.

Magistrate Judge Ona Wang, in a sharply worded 11-page order Tuesday, said OpenAI had been put on notice that the company’s purported expert on plagiarism, John Vincent “Vinnie” Monaco, was woefully underprepared for his January deposition, ordering him to submit to 3.5 more hours of questioning that took place Wednesday.

In granting a motion from the Chicago Tribune, New York Times and other news outlets suing OpenAI to compel the additional testimony, Wang deferred ruling on a request for sanctions, saying it would depend on how Monaco fared in his do-over. She said she may issue fines or recommend that some of his answers be deemed admissions.

OpenAI has previously said that Monaco has more knowledge than any of its engineers about Project Giraffe, an internal operation which the company claims is designed to develop ways to limit its large language models, or LLMs, from inadvertently regurgitating copyrighted works — the issue at the core of the ongoing Manhattan Federal Court lawsuit."

Monday, April 6, 2026

US music publishers suing Anthropic make their case against AI 'fair use'; Reuters, March 24, 2026

Reuters; US music publishers suing Anthropic make their case against AI 'fair use'

"Music publishers Universal Music Group, Concord and ABKCO have asked a judge in California to rule that U.S. copyright law does not insulate artificial intelligence startup Anthropic from liability for copying their song lyrics to train its AI-powered chatbot Claude.

The publishers' request, filed on Monday in federal court in San Jose, tees up a critical question in the legal battle between creators and tech companies: Does the doctrine of "fair use" apply to the copying of millions of copyrighted works to train AI models?"

Anthropic Suddenly Cares Intensely About Intellectual Property After Realizing With Horror That It Accidentally Leaked Claude’s Source Code; Futurism, April 3, 2026

Futurism; Anthropic Suddenly Cares Intensely About Intellectual Property After Realizing With Horror That It Accidentally Leaked Claude’s Source Code

"As the Wall Street Journal reports, Anthropic is scrambling to contain a leak of its Claude Code AI model’s source code by issuing a copyright takedown request for more than 8,000 copies of it — a gallingly ironic stance for the company to be taking, considering how it trained its models in the first place.

The leak isn’t considered to be an outright disaster; no customer data was exposed, Anthropic says, nor were the internal mathematical “weights” that determine how the AI “learns” and which distinguish it from other models. But it did expose the techniques its engineers used to get its AI model to act as an autonomous agent, a form of digital infrastructure coders call a harness, and other tricks for making the AI operate as seamlessly as it does.

Hence Anthropic’s copyright takedown request, which targets the thousands of copies that were shared on GitHub. It later narrowed its request from 8,000 copies to 96 copies, according to the WSJ reporting, claiming that the initial one covered more accounts than intended.

It’s certainly within Anthropic’s right to issue the takedown request, but the hypocrisy of Anthropic running to the law to protect its intellectual property is plain to see, especially for a company that’s relentlessly positioned itself as the ethical adult in the room."

Thursday, April 2, 2026

Anthropic boss makes big call on Australian copyright as artists say pay up; Australian Broadcasting Corporation, April 1, 2026

Clare Armstrong, Australian Broadcasting Corporation; Anthropic boss makes big call on Australian copyright as artists say pay up

"In short:

Anthropic CEO Dario Amodei has told a Canberra forum AI is moving faster than any technological change before it.

Mr Amodei says he is not trying to change Australia's mind on copyright, is worried about AI in the hands of autocratic countries, and feels a tax on profits is inevitable.

What's next?

The $555 billion company behind AI program Claude is facing pushback from artists over the use of copyrighted material to train its technology."

Tuesday, March 31, 2026

Copyright Law in 2025: Courts begin to draw lines around AI training, piracy, and market harm; Reuters, March 16, 2026

Reuters; Copyright Law in 2025: Courts begin to draw lines around AI training, piracy, and market harm

"In 2025, U.S. courts issued the first substantive, merits-stage decisions addressing whether the use of copyrighted works to train generative artificial intelligence systems constitutes "fair use." Although these rulings do not settle all open questions — and in some respects highlight emerging judicial disagreements — they represent a significant inflection point in copyright law's response to large language models, image generators, and other foundation models.

Taken together, these cases establish early guideposts for AI developers, publishers, media companies, and enterprises deploying generative AI systems. Below, we summarize the most important copyright decisions and pending cases shaping the law in 2025...

Conclusion and recommendations

The 2025 decisions reflect cautious but meaningful progress in defining how copyright law applies to generative AI. Courts are increasingly receptive to fair use arguments for training on lawfully acquired data, deeply skeptical of speculative market-harm claims, and uniformly intolerant of piracy. At the same time, cases involving direct competition, news content, and human likeness may test the limits of these early rulings."

Monday, March 30, 2026

Axios AI+DC Summit: Copyright protection in the AI era will be up to the courts, industry leaders say; Axios, March 27, 2026

Julie Bowen, Axios; Axios AI+DC Summit: Copyright protection in the AI era will be up to the courts, industry leaders say

"Washington, D.C. — As policymakers grapple with how to regulate AI, the hardest questions around copyright and fair use are being punted to the courts, according to governance, creator, and technology experts at an Axios expert voices roundtable.

The big picture: With Congress moving slowly amid disagreements over policy, judges are becoming the primary deciders of how AI and creators work together — or don't.


That's partly by necessity: "Fair use is incredibly complicated — case by case, fact specific," News/Media Alliance president and CEO Danielle Coffey said.


"Each case that we get … we start to get these new guideposts," Jones Walker partner Graham Ryan said.


Ryan said he expects at least three fair use decisions this year that will have implications for the broader AI-artist ecosystem.


Axios' Maria Curi and Ashley Gold moderated the March 25 discussion, which was sponsored by Adobe.

What they're saying: Legal uncertainty remains. For example, two courts within the same district, and during the same week, differed in the reasoning behind their rulings on similar matters of fair use and AI.


"There is a current, live controversy over … the extant understanding of the fourth factor in fair use, which is: Does the copy replace the market for the work?" said Kevin Bankston, senior adviser for the Center for Democracy & Technology.


Still, "we have been trying to support the process through the courts, because we think there is a really strong framework in copyright law for protecting artists right now," according to Public Knowledge president and CEO Chris Lewis."

Friday, March 27, 2026

Q&A: The UK’s Copyright Report - A Gift to Creators, a Problem for AI; JD Supra, March 27, 2026

 Oliver Howley, JD Supra; Q&A: The UK’s Copyright Report - A Gift to Creators, a Problem for AI

"The UK Government has released its long-awaited copyright report, framed as an attempt to reconcile the competing interests of creators, technology companies and the wider innovation ecosystem. Rightsholders will welcome it, while the UK’s AI sector will find less comfort.

Two core policy decisions (on training data and on the ownership of AI-generated outputs) mark a shift away from earlier, more developer-friendly proposals. Both decisions leave significant questions unanswered: how AI developers can lawfully assemble training data at scale, what happens to content produced with minimal human input, and whether the UK’s current posture is sustainable in a world where capital and training runs are increasingly mobile.

In this Q&A, Oliver Howley, partner in Proskauer’s TMT Group and one of The Lawyer’s 2026 Hot 100, unpacks what the report says on these two decisions, what it leaves open, and what it means for developers, investors and rightsholders navigating the uncertainty ahead."

Tuesday, March 24, 2026

Chicken Soup for the Soul Sues AI Firms for Copyright Infringement; Publishers Weekly, March 20, 2026

Ed Nawotka, Publishers Weekly; Chicken Soup for the Soul Sues AI Firms for Copyright Infringement

"Chicken Soup for the Soul is suing tech companies OpenAI, Anthropic, Google, Meta, xAI, Perplexity, Apple, and Nvidia for copyright infringement. The suit, filed March 17 in the Northern District of California, alleges that hundreds of its copyrighted works were ingested without authorization or compensation to train large language models...

Much like the complaint filed in December by author John Carreyrou and others against many of the same defendants, this filing also aims to challenge the class-action model that has dominated AI copyright litigation.

Pointing to the pending Anthropic settlement in the Northern District of California, the suit notes that the framework would pay rights holders approximately $3,000 per work—"just 2% of the Copyright Act's statutory ceiling of $150,000 per willfully infringed work." The complaint states that such settlements "seem to serve Defendants, not creators."

Chicken Soup for the Soul is instead seeking individualized statutory damages determined by a jury. The law firms behind the suit say more than 1,000 authors representing more than 5,000 works have signed on to the same approach."

Saturday, March 21, 2026

The dictionaries are suing OpenAI for ‘massive’ copyright infringement, and say ChatGPT is starving publishers of revenue; Fortune, March 21, 2026

Fortune; The dictionaries are suing OpenAI for ‘massive’ copyright infringement, and say ChatGPT is starving publishers of revenue

"In a filing submitted to the Southern District of New York, the companies accuse OpenAI of cannibalizing the traffic and ad revenue that publishers depend on to survive. “ChatGPT starves web publishers, like [the] Plaintiffs, of revenue,” the complaint reads. Where a traditional search engine sends users to a publisher’s website, Britannica and Merriam-Webster allege ChatGPT instead absorbs the content and delivers a polished answer. It also alleges the AI company fed its LLM with the researched and fact-checked work of the companies’ hundreds of human writers and editors...

In an apt example, the complaint describes a prompt asking “How does Merriam-Webster define plagiarize?” to which the model reportedly responded with a definition identical to the one found in the Merriam-Webster dictionary. The complaint adds that the dictionary has been registered with the U.S. Copyright Office."

Thursday, March 19, 2026

UK reverses course on AI copyright position after backlash; Engadget, March 18, 2026

Will Shanklin, Engadget; UK reverses course on AI copyright position after backlash

"Chalk up a win for creative artists against AI companies. On Wednesday, the UK government abandoned its previous position on copyrighted works. It’s currently working on a data bill that, if unaltered, would have allowed AI companies like Google and OpenAI to train models on copyrighted materials without consent. Artists and other copyright holders would only have been offered a mere opt-out clause.

After significant backlash, the UK backed off from that position. "We have listened," Technology Secretary Liz Kendall said on Wednesday. However, the government’s new stance is, well, not a stance at all. It currently "no longer has a preferred option" about how to handle the issue.

Still, backpedaling from its previous position is viewed as a win for artists. UK Music CEO Tom Kiehl described the decision as "a major victory," while promising to work with the government on the next steps."

Tuesday, March 17, 2026

Now OpenAI is getting sued by the dictionary; Quartz, March 17, 2026

 Quartz Staff, Quartz; Now OpenAI is getting sued by the dictionary

Encyclopedia Britannica and Merriam-Webster sued the ChatGPT maker, accusing it of copying almost 100,000 articles to train its AI models

"Encyclopedia Britannica and its subsidiary Merriam-Webster have filed suit against OpenAI, alleging that the ChatGPT maker copied their copyrighted content without authorization to train its large language models.

The lawsuit, filed in Manhattan federal court last week, alleges that OpenAI used close to 100,000 Britannica articles to train its models, and that ChatGPT responses frequently reproduce or closely paraphrase Britannica's reference content, including encyclopedia articles and dictionary entries. The complaint also alleges OpenAI uses a retrieval-augmented generation system to pull from Britannica's content in real time when generating responses."

Monday, March 16, 2026

The dictionary sues OpenAI; TechCrunch, March 16, 2026

 Amanda Silberling, TechCrunch; The dictionary sues OpenAI

"Encyclopedia Britannica and Merriam-Webster have filed a lawsuit against OpenAI, alleging in its complaint that the AI giant has committed “massive copyright infringement.”

Britannica, which owns Merriam-Webster, retains the copyright to nearly 100,000 online articles, which have been scraped and used to train OpenAI’s LLMs without permission, the publisher alleges in the lawsuit.

Britannica also accuses OpenAI of violating copyright laws when it generates outputs that contain “full or partial verbatim reproductions” of its content and when the AI lab uses its articles in ChatGPT’s RAG (retrieval augmented generation) workflow. OpenAI’s RAG tool is how the LLM scans the web or other databases for newly updated information when responding to a query. Britannica also alleges that OpenAI violates the Lanham Act, a trademark statute, when it generates made-up hallucinations and attributes them falsely to the publisher."
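The RAG workflow the complaint describes (retrieve matching reference entries, then hand them to the model as context) can be sketched as a toy example. The corpus, scoring function, and prompt format below are illustrative assumptions, not OpenAI's actual pipeline:

```python
# Toy sketch of a retrieval-augmented generation (RAG) step: rank stored
# reference entries against the query, then prepend the best match to the
# prompt as context. Corpus and scoring are hypothetical illustrations.
def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[str]:
    """Rank corpus entries by naive word overlap between the query and
    each entry's title plus text; return the top-k entry titles."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda item: len(q_words & set(f"{item[0]} {item[1]}".lower().split())),
        reverse=True,
    )
    return [title for title, _ in scored[:k]]

def build_prompt(query: str, corpus: dict[str, str]) -> str:
    """Prepend the single best-matching entry to the user query."""
    top = retrieve(query, corpus)[0]
    return f"Context ({top}): {corpus[top]}\n\nQuestion: {query}"

# Hypothetical mini-corpus standing in for dictionary content.
corpus = {
    "plagiarize": "to steal and pass off the ideas or words of another as one's own",
    "copyright": "the exclusive legal right to reproduce a creative work",
}
print(build_prompt("How is plagiarize defined?", corpus))
```

A production system would use semantic embeddings and live web search rather than word overlap, but the shape is the same: retrieval happens outside the model, and the retrieved text is injected into the prompt, which is why the complaint treats RAG as a separate act of copying from training.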

This Bill Would Force AI Companies to Disclose Copyrighted Works; PetaPixel, March 16, 2026

Pesala Bandara, PetaPixel; This Bill Would Force AI Companies to Disclose Copyrighted Works

"U.S. Senators Adam Schiff, a Democrat from California, and John Curtis, a Republican from Utah, have introduced the Copyright Labeling and Ethical AI Reporting Act, known as the CLEAR Act. The proposed legislation would require companies developing AI models to report when copyrighted material is used to train those systems.

If passed, the legislation could increase transparency around the material used to train generative AI systems, including copyrighted photographs."

UK to rule out sweeping AI copyright overhaul; Politico, March 11, 2026

Joseph Bambridge, Politico; UK to rule out sweeping AI copyright overhaul

The U.K. will rule out making creatives actively opt out of having their copyrighted material scraped by AI companies.

"The U.K. government will rule out sweeping reform of its copyright laws in a highly anticipated policy update next week, according to three people briefed on government thinking and granted anonymity to speak freely.

The people said the update, due by March 18, will state the government does not plan to take forward work on an “opt out” model, whereby rights holders would have to explicitly say they do not want their work used to train AI models. 


It comes amid intense pressure from rights holders and lawmakers not to pursue the “opt out” policy. The government previously said this was its “preferred option” to facilitate AI innovation in the U.K., before ministers were forced to row back."