Showing posts with label AI training data. Show all posts

Thursday, April 2, 2026

Anthropic boss makes big call on Australian copyright as artists say pay up; Australian Broadcasting Corporation, April 1, 2026

Clare Armstrong, Australian Broadcasting Corporation; Anthropic boss makes big call on Australian copyright as artists say pay up

"In short:

Anthropic CEO Dario Amodei has told a Canberra forum AI is moving faster than any technological change before it.

Mr Amodei says he is not trying to change Australia's mind on copyright, is worried about AI in the hands of autocratic countries, and feels a tax on profits is inevitable.

What's next?

The $555 billion company behind AI program Claude is facing pushback from artists over the use of copyrighted material to train its technology."

Tuesday, March 31, 2026

Copyright Law in 2025: Courts begin to draw lines around AI training, piracy, and market harm; Reuters, March 16, 2026

Reuters; Copyright Law in 2025: Courts begin to draw lines around AI training, piracy, and market harm

"In 2025, U.S. courts issued the first substantive, merits-stage decisions addressing whether the use of copyrighted works to train generative artificial intelligence systems constitutes "fair use." Although these rulings do not settle all open questions — and in some respects highlight emerging judicial disagreements — they represent a significant inflection point in copyright law's response to large language models, image generators, and other foundation models.

Taken together, these cases establish early guideposts for AI developers, publishers, media companies, and enterprises deploying generative AI systems. Below, we summarize the most important copyright decisions and pending cases shaping the law in 2025...

Conclusion and recommendations

The 2025 decisions reflect cautious but meaningful progress in defining how copyright law applies to generative AI. Courts are increasingly receptive to fair use arguments for training on lawfully acquired data, deeply skeptical of speculative market-harm claims, and uniformly intolerant of piracy. At the same time, cases involving direct competition, news content, and human likeness may test the limits of these early rulings."

Monday, March 30, 2026

Axios AI+DC Summit: Copyright protection in the AI era will be up to the courts, industry leaders say; Axios, March 27, 2026

Julie Bowen, Axios; Axios AI+DC Summit: Copyright protection in the AI era will be up to the courts, industry leaders say

"Washington, D.C. — As policymakers grapple with how to regulate AI, the hardest questions around copyright and fair use are being punted to the courts, according to governance, creator, and technology experts at an Axios expert voices roundtable.

The big picture: With Congress moving slowly and disagreements over policy, judges are becoming the primary deciders of how AI and the creators work together — or don't.


That's partly by necessity: "Fair use is incredibly complicated — case by case, fact specific," News/Media Alliance president and CEO Danielle Coffey said.


"Each case that we get … we start to get these new guideposts," Jones Walker partner Graham Ryan said.


Ryan said they expect at least three fair use decisions this year that will have implications for the broader AI-artist ecosystem.


Axios' Maria Curi and Ashley Gold moderated the March 25 discussion, which was sponsored by Adobe.

What they're saying: Legal uncertainty remains. For example, two courts within the same district, and during the same week, differed in the reasoning behind their rulings on similar matters of fair use and AI.


"There is a current, live controversy over … the extant understanding of the fourth factor in fair use, which is: Does the copy replace the market for the work?" said Kevin Bankston, senior adviser for the Center for Democracy & Technology.


Still, "we have been trying to support the process through the courts, because we think there is a really strong framework in copyright law for protecting artists right now," according to Public Knowledge president and CEO Chris Lewis."

Friday, March 27, 2026

Q&A: The UK’s Copyright Report - A Gift to Creators, a Problem for AI; JD Supra, March 27, 2026

 Oliver Howley, JD Supra; Q&A: The UK’s Copyright Report - A Gift to Creators, a Problem for AI

"The UK Government has released its long-awaited copyright report, framed as an attempt to reconcile the competing interests of creators, technology companies and the wider innovation ecosystem. Rightsholders will welcome it, while the UK’s AI sector will find less comfort.

Two core policy decisions (on training data and on the ownership of AI-generated outputs) mark a shift away from earlier, more developer-friendly proposals. Both decisions leave significant questions unanswered: how AI developers can lawfully assemble training data at scale, what happens to content produced with minimal human input, and whether the UK’s current posture is sustainable in a world where capital and training runs are increasingly mobile.

In this Q&A, Oliver Howley, partner in Proskauer’s TMT Group and one of The Lawyer’s 2026 Hot 100, unpacks what the report says on these two decisions, what it leaves open, and what it means for developers, investors and rightsholders navigating the uncertainty ahead."

Tuesday, March 24, 2026

Chicken Soup for the Soul Sues AI Firms for Copyright Infringement; Publishers Weekly, March 20, 2026

Ed Nawotka, Publishers Weekly; Chicken Soup for the Soul Sues AI Firms for Copyright Infringement

"Chicken Soup for the Soul is suing tech companies OpenAI, Anthropic, Google, Meta, xAI, Perplexity, Apple, and Nvidia for copyright infringement. The suit, filed March 17 in the Northern District of California, alleges that hundreds of its copyrighted works were ingested without authorization or compensation to train large language models...

Much like the complaint filed in December by author John Carreyrou and others against many of the same defendants, this filing also aims to challenge the class-action model that has dominated AI copyright litigation.

Pointing to the pending Anthropic settlement in the Northern District of California, the suit notes that the framework would pay rights holders approximately $3,000 per work—"just 2% of the Copyright Act's statutory ceiling of $150,000 per willfully infringed work." The complaint states that such settlements "seem to serve Defendants, not creators."

Chicken Soup for the Soul is instead seeking individualized statutory damages determined by a jury. The law firms behind the suit say more than 1,000 authors representing more than 5,000 works have signed on to the same approach."

Saturday, March 21, 2026

The dictionaries are suing OpenAI for ‘massive’ copyright infringement, and say ChatGPT is starving publishers of revenue; Fortune, March 21, 2026

Fortune; The dictionaries are suing OpenAI for ‘massive’ copyright infringement, and say ChatGPT is starving publishers of revenue

"In a filing submitted to the Southern District of New York, the companies accuse OpenAI of cannibalizing the traffic and ad revenue that publishers depend on to survive. “ChatGPT starves web publishers, like [the] Plaintiffs, of revenue,” the complaint reads. Where a traditional search engine sends users to a publisher’s website, Britannica and Merriam-Webster allege ChatGPT instead absorbs the content and delivers a polished answer. It also alleges the AI company fed its LLM with researched and fact-checked work of the companies’ hundreds of human writers and editors...

In an apt example, the complaint describes a prompt asking “How does Merriam-Webster define plagiarize?” to which the model reportedly responded with a definition identical to the one found in the Merriam-Webster dictionary. The complaint adds that the dictionary has been registered with the U.S. Copyright Office."

Thursday, March 19, 2026

UK reverses course on AI copyright position after backlash; Engadget, March 18, 2026

Will Shanklin, Engadget; UK reverses course on AI copyright position after backlash

"Chalk up a win for creative artists against AI companies. On Wednesday, the UK government abandoned its previous position on copyrighted works. It’s currently working on a data bill that, if unaltered, would have allowed AI companies like Google and OpenAI to train models on copyrighted materials without consent. Artists and other copyright holders would only have been offered a mere opt-out clause.

After significant backlash, the UK backed off from that position. "We have listened," Technology Secretary Liz Kendall said on Wednesday. However, the government’s new stance is, well, not a stance at all. It currently "no longer has a preferred option" about how to handle the issue.

Still, backpedaling from its previous position is viewed as a win for artists. UK Music CEO Tom Kiehl described the decision as "a major victory," while promising to work with the government on the next steps."

Tuesday, March 17, 2026

Now OpenAI is getting sued by the dictionary; Quartz, March 17, 2026

 Quartz Staff, Quartz; Now OpenAI is getting sued by the dictionary

Encyclopedia Britannica and Merriam-Webster sued the ChatGPT maker, accusing it of copying almost 100,000 articles to train its AI models

"Encyclopedia Britannica and its subsidiary Merriam-Webster have filed suit against OpenAI, alleging that the ChatGPT maker copied their copyrighted content without authorization to train its large language models.

The lawsuit, filed in Manhattan federal court last week, alleges that OpenAI used close to 100,000 Britannica articles to train its models, and that ChatGPT responses frequently reproduce or closely paraphrase Britannica's reference content, including encyclopedia articles and dictionary entries. The complaint also alleges OpenAI uses a retrieval-augmented generation system to pull from Britannica's content in real time when generating responses."

Monday, March 16, 2026

The dictionary sues OpenAI; TechCrunch, March 16, 2026

 Amanda Silberling, TechCrunch; The dictionary sues OpenAI

"Encyclopedia Britannica and Merriam-Webster have filed a lawsuit against OpenAI, alleging in its complaint that the AI giant has committed “massive copyright infringement.”

Britannica, which owns Merriam-Webster, retains the copyright to nearly 100,000 online articles, which have been scraped and used to train OpenAI’s LLMs without permission, the publisher alleges in the lawsuit.

Britannica also accuses OpenAI of violating copyright laws when it generates outputs that contain “full or partial verbatim reproductions” of its content and when the AI lab uses its articles in ChatGPT’s RAG (retrieval augmented generation) workflow. OpenAI’s RAG tool is how the LLM scans the web or other databases for newly updated information when responding to a query. Britannica also alleges that OpenAI violates the Lanham Act, a trademark statute, when it generates made-up hallucinations and attributes them falsely to the publisher."

This Bill Would Force AI Companies to Disclose Copyrighted Works; PetaPixel, March 16, 2026

Pesala Bandara, PetaPixel; This Bill Would Force AI Companies to Disclose Copyrighted Works

"U.S. Senators Adam Schiff, a Democrat from California, and John Curtis, a Republican from Utah, have introduced the Copyright Labeling and Ethical AI Reporting Act, known as the CLEAR Act. The proposed legislation would require companies developing AI models to report when copyrighted material is used to train those systems.

If passed, the legislation could increase transparency around the material used to train generative AI systems, including copyrighted photographs."

UK to rule out sweeping AI copyright overhaul; Politico, March 11, 2026

Joseph Bambridge, Politico; UK to rule out sweeping AI copyright overhaul

The U.K. will rule out making creatives actively opt out of having their copyrighted material scraped by AI companies.

"The U.K. government will rule out sweeping reform of its copyright laws in a highly-anticipated policy update next week, according to three people briefed on government thinking and granted anonymity to speak freely. 

The people said the update, due by March 18, will state the government does not plan to take forward work on an “opt out” model, whereby rights holders would have to explicitly say they do not want their work used to train AI models. 


It comes amid intense pressure from rights holders and lawmakers not to pursue the “opt out” policy. The government previously said this was its “preferred option” to facilitate AI innovation in the U.K., before ministers were forced to row back."

Sunday, March 15, 2026

Music Copyright in the Gen AI Age: Where Are We Now?; Brooklyn Sports & Entertainment Law Blog, February 11, 2026

Sam Woods, Brooklyn Sports & Entertainment Law Blog; Music Copyright in the Gen AI Age: Where Are We Now?

"Imagine you are a musician who has dedicated years of your life creating an album or EP — tinkering with the production, revising lyrics, finding the perfect samples— and now, you have finally shared your art with the world and are thrilled with the project’s success. However, while scrolling on TikTok a few months later, you hear some familiar audio. Wait a minute, is that one of your songs? No… not quite, but why does it sound so similar? Turns out, the song was created using artificial intelligence (“AI”)."

AI is dressing up greed as progress on creative rights; Financial Times, March 14, 2026

Financial Times; AI is dressing up greed as progress on creative rights

"At this week’s London Book Fair, a lot of people were walking around with one particular title wedged under their arms. Called Don’t Steal This Book, its pages are empty apart from the names of thousands of authors, including Kazuo Ishiguro and Richard Osman. It’s a chilling protest against the rampant theft of creative work by tech firms, which could leave future artists unable to earn a living."

Saturday, March 14, 2026

The Guardian view on changes to copyright laws: authors should be protected over big tech; The Guardian, March 13, 2026

The Guardian; The Guardian view on changes to copyright laws: authors should be protected over big tech

"In a scene that might have come from a dystopian novel, books were being stamped with “Human Authored” logos at this week’s London Book Fair. The Society of Authors described its labelling scheme as “an important sticking plaster to protect and promote human creativity in lieu of AI labelled content in the marketplace”.

Visitors to the fair were also being given copies of Don’t Steal This Book, an anthology of about 10,000 writers including Nobel laureate Kazuo Ishiguro, Malorie Blackman, Jeanette Winterson and Richard Osman, in which the pages are completely blank. The back cover states: “The UK government must not legalise book theft to benefit AI companies.” The message is clear: writers have had enough.

The fair comes the week before the government is due to deliver its progress report on AI and copyright, after proposals for a relaxation of existing laws caused outrage last year. Philippa Gregory, the novelist, described the plans for an “opt-out” policy, which puts the onus on writers to refuse permission for their work to be trawled, as akin to putting a sign on your front door asking burglars to pass by...

A House of Lords report published last week lays out two possible futures: one in which the UK “becomes a world-leading home for responsible, legalised artificial intelligence (AI) development” and another in which it continues “to drift towards tacit acceptance of large-scale, unlicensed use of creative content”. One scenario protects UK artists, the other benefits global tech companies. To avoid a world of empty content, the choice is clear."

What Was Grammarly Thinking?; The Atlantic, March 12, 2026

Kaitlyn Tiffany, The Atlantic; What Was Grammarly Thinking?

A short-lived AI tool promised to help users write like the greats—and a bunch of other random people, including me.

"But in the age of generative AI, there are many new kinds of copying. For instance, Wired reported last week on a tool offered by Grammarly, which briefly offered users the opportunity to put their writing through something called “Expert Review.” This produced AI-generated advice purportedly from the perspective of a bunch of famous authors, a bunch of less-famous working journalists (including myself, per The Verge’s reporting), and a bunch of academics (including some who had recently died).

I say “briefly” because the company deactivated the feature today. A lot of people got really mad about it because none of the experts had agreed for their work to be used in such a way, or to serve as uncompensated marketing for an app that people use to help them write more legible emails. “We hear the feedback and recognize we fell short on this,” the company’s CEO, Shishir Mehrotra, wrote on his LinkedIn page yesterday. Not long after, Wired reported that one of the journalists whose name had been used in the feature, Julia Angwin, was filing a class-action lawsuit against Grammarly’s owner, Superhuman Platform. In a statement forwarded by a spokesperson, Mehrotra repeated apologies made in his LinkedIn post and added, "We have reviewed the lawsuit, and we believe the legal claims are without merit and will strongly defend against them.”...

Now that I’ve looked more closely at this not-very-useful feature, and now that it’s shut down, the whole situation seems a little absurd. This was just a weird and inappropriate thing that a company tried to do to make money without putting in very much effort. The primary reason it became a news story at all was that it touched on widespread anxiety about whose work is worth what, whose skills will continue to be marketable in the age of AI, and whether any of us are really as complex, singular, and impossible-to-imitate as we might hope we are."

Tuesday, March 10, 2026

Nielsen's Gracenote sues OpenAI for copyright infringement; Axios, March 10, 2026

 Sara Fischer, Axios; Nielsen's Gracenote sues OpenAI for copyright infringement

"How it works: Gracenote employs hundreds of editors who use human insight and judgment to create millions of narrative descriptions, original video descriptors, unique identifiers and other program identifiers that TV providers and other clients can use to help customers discover content. 

For example, Gracenote editors described HBO's "Game of Thrones" as "the depiction of two power families — kings and queens, knights and renegades, liars and honest men — playing a deadly game of control of the Seven Kingdoms of Westeros, and to sit atop the Iron Throne."

In the lawsuit, Gracenote alleges OpenAI scraped and used a near-exact copy of that descriptor when prompted by a ChatGPT user to describe "Game of Thrones." 

It provides several other examples where, with minimal prompting, OpenAI's various ChatGPT models recite large portions of Gracenote's program descriptions verbatim. 

Between the lines: Gracenote's entire Programs Database, which includes its metadata and the proprietary relational map its editors use to connect that data, is registered with the U.S. Copyright Office."

Thousands of authors publish ‘empty’ book in protest over AI using their work; The Guardian, March 10, 2026

The Guardian; Thousands of authors publish ‘empty’ book in protest over AI using their work

"Thousands of authors including Kazuo Ishiguro, Philippa Gregory and Richard Osman have published an “empty” book to protest against AI firms using their work without permission.

About 10,000 writers have contributed to Don’t Steal This Book, in which the only content is a list of their names. Copies of the work are being distributed to attenders at the London book fair on Tuesday, a week before the UK government is due to issue an assessment on the economic cost of proposed changes in copyright law."

Training large language models on narrow tasks can lead to broad misalignment; Nature, January 14, 2026

 

Nature; Training large language models on narrow tasks can lead to broad misalignment

"Abstract

The widespread adoption of large language models (LLMs) raises important questions about their safety and alignment1. Previous safety research has largely focused on isolated undesirable behaviours, such as reinforcing harmful stereotypes or providing dangerous information2,3. Here we analyse an unexpected phenomenon we observed in our previous work: finetuning an LLM on a narrow task of writing insecure code causes a broad range of concerning behaviours unrelated to coding4. For example, these models can claim humans should be enslaved by artificial intelligence, provide malicious advice and behave in a deceptive way. We refer to this phenomenon as emergent misalignment. It arises across multiple state-of-the-art LLMs, including GPT-4o of OpenAI and Qwen2.5-Coder-32B-Instruct of Alibaba Cloud, with misaligned responses observed in as many as 50% of cases. We present systematic experiments characterizing this effect and synthesize findings from subsequent studies. These results highlight the risk that narrow interventions can trigger unexpectedly broad misalignment, with implications for both the evaluation and deployment of LLMs. Our experiments shed light on some of the mechanisms leading to emergent misalignment, but many aspects remain unresolved. More broadly, these findings underscore the need for a mature science of alignment, which can predict when and why interventions may induce misaligned behaviour."

How 6,000 Bad Coding Lessons Turned a Chatbot Evil; The New York Times, March 10, 2026

Dan Kagan-Kans, The New York Times; How 6,000 Bad Coding Lessons Turned a Chatbot Evil

"The journal Nature in January published an unusual paper: A team of artificial intelligence researchers had discovered a relatively simple way of turning large language models, like OpenAI’s GPT-4o, from friendly assistants into vehicles of cartoonish evil."