Showing posts with label AI training data. Show all posts

Saturday, July 5, 2025

Two Courts Rule On Generative AI and Fair Use — One Gets It Right; Electronic Frontier Foundation (EFF), June 26, 2025

 TORI NOBLE, Electronic Frontier Foundation (EFF); Two Courts Rule On Generative AI and Fair Use — One Gets It Right

 "Gen-AI is spurring the kind of tech panics we’ve seen before; then, as now, thoughtful fair use opinions helped ensure that copyright law served innovation and creativity. Gen-AI does raise a host of other serious concerns about fair labor practices and misinformation, but copyright wasn’t designed to address those problems. Trying to force copyright law to play those roles only hurts important and legal uses of this technology.

In keeping with that tradition, courts deciding fair use in other AI copyright cases should look to Bartz, not Kadrey."

Thursday, July 3, 2025

Cloudflare Sidesteps Copyright Issues, Blocking AI Scrapers By Default; Forbes, July 2, 2025

 Emma Woollacott , Forbes; Cloudflare Sidesteps Copyright Issues, Blocking AI Scrapers By Default

"IT service management company Cloudflare is striking back on behalf of content creators, blocking AI scrapers by default.

Web scrapers are bots that crawl the internet, collecting and cataloguing content of all types, and are used by AI firms to collect material that can be used to train their models.

Now, though, Cloudflare is allowing website owners to choose if they want AI crawlers to access their content, and decide how the AI companies can use it. They can opt to allow crawlers for certain purposes—search, for example—but block others. AI companies will have to obtain explicit permission from a website before scraping."

Wednesday, July 2, 2025

Fair Use or Foul Play? The AI Fair Use Copyright Line; The National Law Review, July 2, 2025

Jodi Benassi of McDermott Will & Emery  , The National Law Review; Fair Use or Foul Play? The AI Fair Use Copyright Line

"Practice note: This is the first federal court decision analyzing the defense of fair use of copyrighted material to train generative AI. Two days after this decision issued, another Northern District of California judge ruled in Kadrey et al. v. Meta Platforms Inc. et al., Case No. 3:23-cv-03417, and concluded that the AI technology at issue in his case was transformative. However, the basis for his ruling in favor of Meta on the question of fair use was not transformation, but the plaintiffs’ failure “to present meaningful evidence that Meta’s use of their works to create [a generative AI engine] impacted the market” for the books."

Eminem, AI and me: why artists need new laws in the digital age; The Guardian, July 2, 2025

  , The Guardian; Eminem, AI and me: why artists need new laws in the digital age

"Song lyrics, my publisher informs me, are subject to notoriously strict copyright enforcement and the cost to buy the rights is often astronomical. Fat chance as well, then, of me quoting Eminem to talk about how Lose Yourself seeped into the psyche of a generation when he rapped: “You only get one shot, do not miss your chance to blow, this opportunity comes once in a lifetime.”

Oh would it be different if I were an AI company with a large language model (LLM), though. I could scrape from the complete discography of the National and Eminem, and the lyrics of every other song ever written. Then, when a user prompted something like, “write a rap in the style of Eminem about losing money, and draw inspiration from the National’s Bloodbuzz Ohio”, my word correlation program – with hundreds of millions of paying customers and a market capitalisation worth tens if not hundreds of billions of dollars – could answer:

“I still owe money to the money to the money I owe,

But I spit gold out my throat when I flow,

So go tell the bank they can take what they like

I already gave my soul to the mic.”

And that, according to rulings last month by the US courts, is somehow “fair use” and is perplexingly not copyright infringement at all, despite no royalties having been paid to anyone in the process."

Tuesday, July 1, 2025

The Court Battles That Will Decide if Silicon Valley Can Plunder Your Work; Slate, June 30, 2025

  BY  , SLATE; The Court Battles That Will Decide if Silicon Valley Can Plunder Your Work

"Last week, two different federal judges in the Northern District of California made legal rulings that attempt to resolve one of the knottiest debates in the artificial intelligence world: whether it’s a copyright violation for Big Tech firms to use published books for training generative bots like ChatGPT. Unfortunately for the many authors who’ve brought lawsuits with this argument, neither decision favors their case—at least, not for now. And that means creators in all fields may not be able to stop A.I. companies from using their work however they please...

What if these copyright battles are also lost? Then there will be little in the way of stopping A.I. startups from utilizing all creative works for their own purposes, with no consideration as to the artists and writers who actually put in the work. And we will have a world blessed less with human creativity than one overrun by second-rate slop that crushes the careers of the people whose imaginations made that A.I. so potent to begin with."

Hollywood Confronts AI Copyright Chaos in Washington, Courts; The Wall Street Journal, July 1, 2025

 Amrith Ramkumar,  Jessica Toonkel, The Wall Street Journal; Hollywood Confronts AI Copyright Chaos in Washington, Courts

Technology firms say using copyrighted materials to train AI models is key to America’s success; creatives want their work protected

Sunday, June 29, 2025

An AI firm won a lawsuit for copyright infringement — but may face a huge bill for piracy; Los Angeles Times, June 27, 2025

 Michael Hiltzik , Los Angeles Times; An AI firm won a lawsuit for copyright infringement — but may face a huge bill for piracy


[Kip Currier: Excellent informative overview of some of the principal issues, players, stakes, and recent decisions in the ongoing AI copyright legal battles. Definitely worth 5-10 minutes of your time to read and reflect on.

A key take-away, derived from Judge Vince Chhabria's decision in last week's Meta win, is that:

Artists and authors can win their copyright infringement cases if they produce evidence showing the bots are affecting their market. Chhabria all but pleaded for the plaintiffs to bring some such evidence before him: 

“It’s hard to imagine that it can be fair use to use copyrighted books...to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books.” 

But “the plaintiffs never so much as mentioned it,” he lamented.

https://www.latimes.com/business/story/2025-06-27/an-ai-firm-won-a-lawsuit-over-copyright-infringement-but-may-face-a-huge-bill-for-piracy]


[Excerpt]

"Anthropic had to acknowledge a troubling qualification in Alsup’s order, however. Although he found for the company on the copyright issue, he also noted that it had downloaded copies of more than 7 million books from online “shadow libraries,” which included countless copyrighted works, without permission. 

That action was “inherently, irredeemably infringing,” Alsup concluded. “We will have a trial on the pirated copies...and the resulting damages,” he advised Anthropic ominously: Piracy on that scale could expose the company to judgments worth untold millions of dollars...

“Neither case is going to be the last word” in the battle between copyright holders and AI developers, says Aaron Moss, a Los Angeles attorney specializing in copyright law. With more than 40 lawsuits on court dockets around the country, he told me, “it’s too early to declare that either side is going to win the ultimate battle.”...

With billions of dollars, even trillions, at stake for AI developers and the artistic community, no one expects the law to be resolved until the issue reaches the Supreme Court, presumably years from now...

But Anthropic also downloaded copies of more than 7 million books from online “shadow libraries,” which include untold copyrighted works without permission. 

Alsup wrote that Anthropic “could have purchased books, but it preferred to steal them to avoid ‘legal/practice/business slog.’” (He was quoting Anthropic co-founder and CEO Dario Amodei.)...

Artists and authors can win their copyright infringement cases if they produce evidence showing the bots are affecting their market...

The truth is that the AI camp is just trying to get out of paying for something instead of getting it for free. Never mind the trillions of dollars in revenue they say they expect over the next decade — they claim that licensing will be so expensive it will stop the march of this supposedly historic technology dead in its tracks.

Chhabria aptly called this argument “nonsense.” If using books for training is as valuable as the AI firms say they are, he noted, then surely a market for book licensing will emerge. That is, it will — if the courts don’t give the firms the right to use stolen works without compensation."

Saturday, June 28, 2025

The Anthropic Copyright Ruling Exposes Blind Spots on AI; Bloomberg, June 26, 2025

 , Bloomberg; The Anthropic Copyright Ruling Exposes Blind Spots on AI


[Kip Currier: It's still early days in the AI copyright legal battles underway between AI tech companies and everyone else whose training data was "scarfed up" to enable the former to create lucrative AI tools and products. But cases like this week's Anthropic lawsuit win and another suit won by Meta (with some issues still to be adjudicated regarding the use of pirated materials as AI training data) are finally now giving us some more discernible "tea leaves" and "black letter law" as to how courts are likely to rule vis-a-vis AI inputs.

This week being the much-ballyhooed 50th anniversary of the so-called "1st summer blockbuster flick" Jaws ("you're gonna need a bigger boat"), these rulings make me think we the public may need a bigger copyright law schema that sets out protections for the creatives making the fuel that enables stratospherically profitable AI innovations. The Jaws metaphor may be a bit on-the-nose, but one can't help but view AI tech companies as akin to rapacious sharks that are imperiling the financial survival and long-standing business models of human creators.

As touched on in this Bloomberg article, too, there's a moral argument that the uncompensated use of creative works, without permission, isn't ethically justifiable simply because a court may say it's legal. Nor does a favorable ruling mean that these companies shouldn't be required by updated federal copyright legislation and licensing frameworks to fairly compensate creators for the use of their copyrighted works. After all, billionaire tech oligarchs like Zuckerberg, Musk, and Altman would never allow others to do to them what they've done to creatives with impunity and zero contrition.

Are you listening, Congress?

Or are all of you in the pockets of AI tech company lobbyists, rather than representing the needs and interests of all of your constituents and not just the billionaire class?]


[Excerpt]

"In what is shaping up to be a long, hard fight over the use of creative works, round one has gone to the AI makers. In the first such US decision of its kind, District Judge William Alsup said Anthropic’s use of millions of books to train its artificial-intelligence model, without payment to the sources, was legal under copyright law because it was “transformative — spectacularly so.”...

If a precedent has been set, as several observers believe, it stands to cripple one of the few possible AI monetization strategies for rights holders, which is to sell licenses to firms for access to their work. Some of these deals have already been made while the “fair use” question has been in limbo, deals that emerged only after the threat of legal action. This ruling may have just taken future deals off the table...

Alsup was right when he wrote that “the technology at issue was among the most transformative many of us will see in our lifetimes.”...

But that doesn’t mean it shouldn’t pay its way. Nobody would dare suggest Nvidia Corp. CEO Jensen Huang hand out his chips free. No construction worker is asked to keep costs down by building data center walls for nothing. Software engineers aren’t volunteering their time to Meta Platforms Inc. in awe of Mark Zuckerberg’s business plan — they instead command salaries of $100 million and beyond. 

Yet, as ever, those in the tech industry have decided that creative works, and those who create them, should be considered of little or no value and must step aside in service of the great calling of AI — despite being every bit as vital to the product as any other factor mentioned above. As science-fiction author Harlan Ellison said in his famous sweary rant, nobody ever wants to pay the writer if they can get away with it. When it comes to AI, paying creators of original work isn’t impossible, it’s just inconvenient. Legislators should leave companies no choice."

Friday, June 27, 2025

Getty drops copyright allegations in UK lawsuit against Stability AI; AP, June 25, 2025

  KELVIN CHAN, AP; Getty drops copyright allegations in UK lawsuit against Stability AI

"Getty Images dropped copyright infringement allegations from its lawsuit against artificial intelligence company Stability AI as closing arguments began Wednesday in the landmark case at Britain’s High Court. 

Seattle-based Getty’s decision to abandon the copyright claim removes a key part of its lawsuit against Stability AI, which owns a popular AI image-making tool called Stable Diffusion. The two have been facing off in a widely watched court case that could have implications for the creative and technology industries."

Wednesday, June 25, 2025

Judge dismisses authors’ copyright lawsuit against Meta over AI training; AP, June 25, 2025

MATT O’BRIEN AND BARBARA ORTUTAY, AP; Judge dismisses authors’ copyright lawsuit against Meta over AI training

"Although Meta prevailed in its request to dismiss the case, it could turn out to be a pyrrhic victory. In his 40-page ruling, Chhabria repeatedly indicated reasons to believe that Meta and other AI companies have turned into serial copyright infringers as they train their technology on books and other works created by humans, and seemed to be inviting other authors to bring cases to his court presented in a manner that would allow them to proceed to trial.

The judge scoffed at arguments that requiring AI companies to adhere to decades-old copyright laws would slow down advances in a crucial technology at a pivotal time. “These products are expected to generate billions, even trillions of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it.”"

Tuesday, June 24, 2025

Anthropic’s AI copyright ‘win’ is more complicated than it looks; Fast Company, June 24, 2025

 CHRIS STOKEL-WALKER, Fast Company; Anthropic’s AI copyright ‘win’ is more complicated than it looks

"And that’s the catch: This wasn’t an unvarnished win for Anthropic. Like other tech companies, Anthropic allegedly sourced training materials from piracy sites for ease—a fact that clearly troubled the court. “This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use,” Alsup wrote, referring to Anthropic’s alleged pirating of more than 7 million books.

That alone could carry billions in liability, with statutory damages starting at $750 per book—a trial on that issue is still to come.

So while tech companies may still claim victory (with some justification, given the fair use precedent), the same ruling also implies that companies will need to pay substantial sums to legally obtain training materials. OpenAI, for its part, has in the past argued that licensing all the copyrighted material needed to train its models would be practically impossible.

Joanna Bryson, a professor of AI ethics at the Hertie School in Berlin, says the ruling is “absolutely not” a blanket win for tech companies. “First of all, it’s not the Supreme Court. Secondly, it’s only one jurisdiction: The U.S.,” she says. “I think they don’t entirely have purchase over this thing about whether or not it was transformative in the sense of changing Claude’s output.”"

The copyright war between the AI industry and creatives; Financial Times, June 23, 2025

, Financial Times ; The copyright war between the AI industry and creatives

"One is that the government itself estimates that “creative industries generated £126bn in gross value added to the economy [5 per cent of GDP] and employed 2.4 million people in 2022”. It is at the very least an open question whether the value added of the AI industry will ever be of a comparable scale in this country. Another is that the creative industries represent much of the best of what the UK and indeed humanity does. The idea of handing over its output for free is abhorrent...

Interestingly, for much of the 19th century, the US did not recognise international copyright at all in its domestic law. Anthony Trollope himself complained fiercely about the theft of the copyright over his books."

Anthropic wins key US ruling on AI training in authors' copyright lawsuit; Reuters, June 24, 2025

 , Reuters; Anthropic wins key US ruling on AI training in authors' copyright lawsuit

 "A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law.

Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model.

Alsup also said, however, that Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement."

Study: Meta AI model can reproduce almost half of Harry Potter book; Ars Technica, June 20, 2025

 TIMOTHY B. LEE, Ars Technica; Study: Meta AI model can reproduce almost half of Harry Potter book

"In recent years, numerous plaintiffs—including publishers of books, newspapers, computer code, and photographs—have sued AI companies for training models using copyrighted material. A key question in all of these lawsuits has been how easily AI models produce verbatim excerpts from the plaintiffs’ copyrighted content.

For example, in its December 2023 lawsuit against OpenAI, The New York Times Company produced dozens of examples where GPT-4 exactly reproduced significant passages from Times stories. In its response, OpenAI described this as a “fringe behavior” and a “problem that researchers at OpenAI and elsewhere work hard to address.”

But is it actually a fringe behavior? And have leading AI companies addressed it? New research—focusing on books rather than newspaper articles and on different companies—provides surprising insights into this question. Some of the findings should bolster plaintiffs’ arguments, while others may be more helpful to defendants.

The paper was published last month by a team of computer scientists and legal scholars from Stanford, Cornell, and West Virginia University. They studied whether five popular open-weight models—three from Meta and one each from Microsoft and EleutherAI—were able to reproduce text from Books3, a collection of books that is widely used to train LLMs. Many of the books are still under copyright."

Friday, June 20, 2025

Two Major Lawsuits Aim to Answer a Multi-Billion-Dollar Question: Can AI Train on Your Creative Work Without Permission?; The National Law Review, June 18, 2025

Andrew R. Lee and Timothy P. Scanlan, Jr. of Jones Walker LLP, The National Law Review; Two Major Lawsuits Aim to Answer a Multi-Billion-Dollar Question: Can AI Train on Your Creative Work Without Permission?

"In a London courtroom, lawyers faced off in early June in a legal battle that could shape the future relationship between artificial intelligence and creative work. The case pits Getty Images, a major provider of stock photography, against Stability AI, the company behind the popular AI art generator, Stable Diffusion.

At the heart of the dispute is Getty's claim that Stability AI unlawfully used 12 million of its copyrighted images to train its AI model. The outcome of this case could establish a critical precedent for whether AI companies can use publicly available online content for training data or if they will be required to license it.

On the first day of trial, Getty's lawyer told the London High Court that the company “recognises that the AI industry overall may be a force for good,” but that did not justify AI companies “riding roughshod over intellectual property rights.”

A Key Piece of Evidence

A central component of Getty's case is the observation that Stable Diffusion's output sometimes includes distorted versions of the Getty Images watermark. Getty argues this suggests its images were not only used for training but are also being partially reproduced by the AI model.

Stability AI has taken the position that training an AI model on images constitutes a transformative use of that data. The argument is that teaching a machine from existing information is fundamentally different from direct copying."

Sunday, June 15, 2025

AI chatbots need more books to learn from. These libraries are opening their stacks; AP, June 12, 2025

  MATT O’BRIEN, AP; AI chatbots need more books to learn from. These libraries are opening their stacks

"Supported by “unrestricted gifts” from Microsoft and ChatGPT maker OpenAI, the Harvard-based Institutional Data Initiative is working with libraries and museums around the world on how to make their historic collections AI-ready in a way that also benefits the communities they serve.

“We’re trying to move some of the power from this current AI moment back to these institutions,” said Aristana Scourtas, who manages research at Harvard Law School’s Library Innovation Lab. “Librarians have always been the stewards of data and the stewards of information.”

Harvard’s newly released dataset, Institutional Books 1.0, contains more than 394 million scanned pages of paper. One of the earlier works is from the 1400s — a Korean painter’s handwritten thoughts about cultivating flowers and trees. The largest concentration of works is from the 19th century, on subjects such as literature, philosophy, law and agriculture, all of it meticulously preserved and organized by generations of librarians. 

It promises to be a boon for AI developers trying to improve the accuracy and reliability of their systems."

Friday, June 13, 2025

How Disney’s AI lawsuit could shift the future of entertainment; The Washington Post, June 11, 2025

, The Washington Post ; How Disney’s AI lawsuit could shift the future of entertainment

"The battle over the future of AI-generated content escalated on Wednesday as two Hollywood titans sued a fast-growing AI start-up for copyright infringement.

Disney and Universal, whose entertainment empires include Pixar, Star Wars, Marvel and Despicable Me, sued Midjourney, claiming it wrongfully trained its image-generating AI models on the studios’ intellectual property.

They are the first major Hollywood studios to file copyright infringement lawsuits, marking a pivotal moment in the ongoing fight by artists, newspapers and content makers to stop AI firms from using their work as training data — or at least make them pay for it."

Wednesday, June 11, 2025

Disney, Universal File First Major Studio Lawsuit Against AI Company, Sue Midjourney for Copyright Infringement: ‘This Is Theft’; Variety, June 11, 2025

  Todd Spangler, Variety; Disney, Universal File First Major Studio Lawsuit Against AI Company, Sue Midjourney for Copyright Infringement: ‘This Is Theft’

"Disney and NBCU filed a federal lawsuit Tuesday against Midjourney, a generative AI start-up, alleging copyright infringement. The companies alleged that Midjourney’s own website “displays hundreds, if not thousands, of images generated by its Image Service at the request of its subscribers that infringe Plaintiffs’ Copyrighted Works.”

A copy of the lawsuit is at this link...

Disney and NBCU’s lawsuit includes images alleged to be examples of instances of Midjourney’s infringement. Those include an image of Marvel’s Deadpool and Wolverine (pictured above), Iron Man, Spider-Man, the Hulk and more; Star Wars’ Darth Vader, Yoda, R2-D2, C-3PO and Chewbacca; Disney’s Princess Elsa and Olaf from “Frozen”; characters from “The Simpsons”; Pixar’s Buzz Lightyear from “Toy Story” and Lightning McQueen from “Cars”; DreamWorks’ “How to Train Your Dragon”; and Universal’s “Shrek” and the yellow Minions from the “Despicable Me” film franchise."

Tuesday, June 10, 2025

Getty Images Faces Off Against Stability in Court as First Major AI Copyright Trial Begins; PetaPixel, June 10, 2025

 Matt Growcoot , PetaPixel; Getty Images Faces Off Against Stability in Court as First Major AI Copyright Trial Begins

"The Guardian notes that the trial will focus on specific photos taken by famous photographers. Getty plans to bring up photos of the Chicago Cubs taken by sports photographer Gregory Shamus and photos of film director Christopher Nolan taken by Andreas Rentz. 

All-in-all, 78,000 pages of evidence have been disclosed for the case and AI experts are being called in to give testimonies. Getty is also suing Stability AI in the United States in a parallel case. The trial in London is expected to run for three weeks and will be followed by a written decision from the judge at a later date."

Monday, June 9, 2025

Getty argues its landmark UK copyright case does not threaten AI; Reuters, June 9, 2025

 , Reuters; Getty argues its landmark UK copyright case does not threaten AI

 "Getty Images' landmark copyright lawsuit against artificial intelligence company Stability AI began at London's High Court on Monday, with Getty rejecting Stability AI's contention the case posed a threat to the generative AI industry.

Seattle-based Getty, which produces editorial content and creative stock images and video, accuses Stability AI of using its images to "train" its Stable Diffusion system, which can generate images from text inputs...

Creative industries are grappling with the legal and ethical implications of AI models that can produce their own work after being trained on existing material. Prominent figures including Elton John have called for greater protections for artists.

Lawyers say Getty's case will have a major impact on the law, as well as potentially informing government policy on copyright protections relating to AI."