Showing posts with label AI training data. Show all posts

Friday, June 27, 2025

Getty drops copyright allegations in UK lawsuit against Stability AI; AP, June 25, 2025

  KELVIN CHAN, AP; Getty drops copyright allegations in UK lawsuit against Stability AI

"Getty Images dropped copyright infringement allegations from its lawsuit against artificial intelligence company Stability AI as closing arguments began Wednesday in the landmark case at Britain’s High Court. 

Seattle-based Getty’s decision to abandon the copyright claim removes a key part of its lawsuit against Stability AI, which owns a popular AI image-making tool called Stable Diffusion. The two have been facing off in a widely watched court case that could have implications for the creative and technology industries."

Wednesday, June 25, 2025

Judge dismisses authors’ copyright lawsuit against Meta over AI training; AP, June 25, 2025

MATT O’BRIEN AND BARBARA ORTUTAY, AP; Judge dismisses authors’ copyright lawsuit against Meta over AI training

"Although Meta prevailed in its request to dismiss the case, it could turn out to be a pyrrhic victory. In his 40-page ruling, Chhabria repeatedly indicated reasons to believe that Meta and other AI companies have turned into serial copyright infringers as they train their technology on books and other works created by humans, and seemed to be inviting other authors to bring cases to his court presented in a manner that would allow them to proceed to trial.

The judge scoffed at arguments that requiring AI companies to adhere to decades-old copyright laws would slow down advances in a crucial technology at a pivotal time. “These products are expected to generate billions, even trillions of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it.”"

Tuesday, June 24, 2025

Anthropic’s AI copyright ‘win’ is more complicated than it looks; Fast Company, June 24, 2025

CHRIS STOKEL-WALKER, Fast Company; Anthropic’s AI copyright ‘win’ is more complicated than it looks

"And that’s the catch: This wasn’t an unvarnished win for Anthropic. Like other tech companies, Anthropic allegedly sourced training materials from piracy sites for ease—a fact that clearly troubled the court. “This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use,” Alsup wrote, referring to Anthropic’s alleged pirating of more than 7 million books.

That alone could carry billions in liability, with statutory damages starting at $750 per book—a trial on that issue is still to come.

So while tech companies may still claim victory (with some justification, given the fair use precedent), the same ruling also implies that companies will need to pay substantial sums to legally obtain training materials. OpenAI, for its part, has in the past argued that licensing all the copyrighted material needed to train its models would be practically impossible.

Joanna Bryson, a professor of AI ethics at the Hertie School in Berlin, says the ruling is “absolutely not” a blanket win for tech companies. “First of all, it’s not the Supreme Court. Secondly, it’s only one jurisdiction: The U.S.,” she says. “I think they don’t entirely have purchase over this thing about whether or not it was transformative in the sense of changing Claude’s output.”"

The copyright war between the AI industry and creatives; Financial Times, June 23, 2025

Financial Times; The copyright war between the AI industry and creatives

"One is that the government itself estimates that “creative industries generated £126bn in gross value added to the economy [5 per cent of GDP] and employed 2.4 million people in 2022”. It is at the very least an open question whether the value added of the AI industry will ever be of a comparable scale in this country. Another is that the creative industries represent much of the best of what the UK and indeed humanity does. The idea of handing over its output for free is abhorrent...

Interestingly, for much of the 19th century, the US did not recognise international copyright at all in its domestic law. Anthony Trollope himself complained fiercely about the theft of the copyright over his books."

Anthropic wins key US ruling on AI training in authors' copyright lawsuit; Reuters, June 24, 2025

Reuters; Anthropic wins key US ruling on AI training in authors' copyright lawsuit

 "A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law.

Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model.

Alsup also said, however, that Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement."

Study: Meta AI model can reproduce almost half of Harry Potter book; Ars Technica, June 20, 2025

TIMOTHY B. LEE, Ars Technica; Study: Meta AI model can reproduce almost half of Harry Potter book

"In recent years, numerous plaintiffs—including publishers of books, newspapers, computer code, and photographs—have sued AI companies for training models using copyrighted material. A key question in all of these lawsuits has been how easily AI models produce verbatim excerpts from the plaintiffs’ copyrighted content.

For example, in its December 2023 lawsuit against OpenAI, The New York Times Company produced dozens of examples where GPT-4 exactly reproduced significant passages from Times stories. In its response, OpenAI described this as a “fringe behavior” and a “problem that researchers at OpenAI and elsewhere work hard to address.”

But is it actually a fringe behavior? And have leading AI companies addressed it? New research—focusing on books rather than newspaper articles and on different companies—provides surprising insights into this question. Some of the findings should bolster plaintiffs’ arguments, while others may be more helpful to defendants.

The paper was published last month by a team of computer scientists and legal scholars from Stanford, Cornell, and West Virginia University. They studied whether five popular open-weight models—three from Meta and one each from Microsoft and EleutherAI—were able to reproduce text from Books3, a collection of books that is widely used to train LLMs. Many of the books are still under copyright."

Friday, June 20, 2025

Two Major Lawsuits Aim to Answer a Multi-Billion-Dollar Question: Can AI Train on Your Creative Work Without Permission?; The National Law Review, June 18, 2025

Andrew R. Lee and Timothy P. Scanlan, Jr. of Jones Walker LLP, The National Law Review; Two Major Lawsuits Aim to Answer a Multi-Billion-Dollar Question: Can AI Train on Your Creative Work Without Permission?

"In a London courtroom, lawyers faced off in early June in a legal battle that could shape the future relationship between artificial intelligence and creative work. The case pits Getty Images, a major provider of stock photography, against Stability AI, the company behind the popular AI art generator, Stable Diffusion.

At the heart of the dispute is Getty's claim that Stability AI unlawfully used 12 million of its copyrighted images to train its AI model. The outcome of this case could establish a critical precedent for whether AI companies can use publicly available online content for training data or if they will be required to license it.

On the first day of trial, Getty's lawyer told the London High Court that the company “recognises that the AI industry overall may be a force for good,” but that did not justify AI companies “riding roughshod over intellectual property rights.”

A Key Piece of Evidence

A central component of Getty's case is the observation that Stable Diffusion's output sometimes includes distorted versions of the Getty Images watermark. Getty argues this suggests its images were not only used for training but are also being partially reproduced by the AI model.

Stability AI has taken the position that training an AI model on images constitutes a transformative use of that data. The argument is that teaching a machine from existing information is fundamentally different from direct copying."

Sunday, June 15, 2025

AI chatbots need more books to learn from. These libraries are opening their stacks; AP, June 12, 2025

  MATT O’BRIEN, AP; AI chatbots need more books to learn from. These libraries are opening their stacks

"Supported by “unrestricted gifts” from Microsoft and ChatGPT maker OpenAI, the Harvard-based Institutional Data Initiative is working with libraries and museums around the world on how to make their historic collections AI-ready in a way that also benefits the communities they serve.

“We’re trying to move some of the power from this current AI moment back to these institutions,” said Aristana Scourtas, who manages research at Harvard Law School’s Library Innovation Lab. “Librarians have always been the stewards of data and the stewards of information.”

Harvard’s newly released dataset, Institutional Books 1.0, contains more than 394 million scanned pages of paper. One of the earlier works is from the 1400s — a Korean painter’s handwritten thoughts about cultivating flowers and trees. The largest concentration of works is from the 19th century, on subjects such as literature, philosophy, law and agriculture, all of it meticulously preserved and organized by generations of librarians. 

It promises to be a boon for AI developers trying to improve the accuracy and reliability of their systems."

Friday, June 13, 2025

How Disney’s AI lawsuit could shift the future of entertainment; The Washington Post, June 11, 2025

The Washington Post; How Disney’s AI lawsuit could shift the future of entertainment

"The battle over the future of AI-generated content escalated on Wednesday as two Hollywood titans sued a fast-growing AI start-up for copyright infringement.

Disney and Universal, whose entertainment empires include Pixar, Star Wars, Marvel and Despicable Me, sued Midjourney, claiming it wrongfully trained its image-generating AI models on the studios’ intellectual property.

They are the first major Hollywood studios to file copyright infringement lawsuits, marking a pivotal moment in the ongoing fight by artists, newspapers and content makers to stop AI firms from using their work as training data — or at least make them pay for it."

Wednesday, June 11, 2025

Disney, Universal File First Major Studio Lawsuit Against AI Company, Sue Midjourney for Copyright Infringement: ‘This Is Theft’; Variety, June 11, 2025

  Todd Spangler, Variety; Disney, Universal File First Major Studio Lawsuit Against AI Company, Sue Midjourney for Copyright Infringement: ‘This Is Theft’

"Disney and NBCU filed a federal lawsuit Tuesday against Midjourney, a generative AI start-up, alleging copyright infringement. The companies alleged that Midjourney’s own website “displays hundreds, if not thousands, of images generated by its Image Service at the request of its subscribers that infringe Plaintiffs’ Copyrighted Works.”

A copy of the lawsuit is at this link...

Disney and NBCU’s lawsuit includes images alleged to be examples of instances of Midjourney’s infringement. Those include an image of Marvel’s Deadpool and Wolverine (pictured above), Iron Man, Spider-Man, the Hulk and more; Star Wars’ Darth Vader, Yoda, R2-D2, C-3PO and Chewbacca; Disney’s Princess Elsa and Olaf from “Frozen”; characters from “The Simpsons”; Pixar’s Buzz Lightyear from “Toy Story” and Lightning McQueen from “Cars”; DreamWorks’ “How to Train Your Dragon”; and Universal’s “Shrek” and the yellow Minions from the “Despicable Me” film franchise."

Tuesday, June 10, 2025

Getty Images Faces Off Against Stability in Court as First Major AI Copyright Trial Begins; PetaPixel, June 10, 2025

Matt Growcoot, PetaPixel; Getty Images Faces Off Against Stability in Court as First Major AI Copyright Trial Begins

"The Guardian notes that the trial will focus on specific photos taken by famous photographers. Getty plans to bring up photos of the Chicago Cubs taken by sports photographer Gregory Shamus and photos of film director Christopher Nolan taken by Andreas Rentz. 

All-in-all, 78,000 pages of evidence have been disclosed for the case and AI experts are being called in to give testimonies. Getty is also suing Stability AI in the United States in a parallel case. The trial in London is expected to run for three weeks and will be followed by a written decision from the judge at a later date."

Monday, June 9, 2025

Getty argues its landmark UK copyright case does not threaten AI; Reuters, June 9, 2025

Reuters; Getty argues its landmark UK copyright case does not threaten AI

 "Getty Images' landmark copyright lawsuit against artificial intelligence company Stability AI began at London's High Court on Monday, with Getty rejecting Stability AI's contention the case posed a threat to the generative AI industry.

Seattle-based Getty, which produces editorial content and creative stock images and video, accuses Stability AI of using its images to "train" its Stable Diffusion system, which can generate images from text inputs...

Creative industries are grappling with the legal and ethical implications of AI models that can produce their own work after being trained on existing material. Prominent figures including Elton John have called for greater protections for artists.

Lawyers say Getty's case will have a major impact on the law, as well as potentially informing government policy on copyright protections relating to AI."

Saturday, June 7, 2025

UK government signals it will not force tech firms to disclose how they train AI; The Guardian, June 6, 2025

The Guardian; UK government signals it will not force tech firms to disclose how they train AI

"Opponents of the plans have warned that even if the attempts to insert clauses into the data bill fail, the government could be challenged in the courts over the proposed changes.

The consultation on copyright changes, which is due to produce its findings before the end of the year, contains four options: to let AI companies use copyrighted work without permission, alongside an option for artists to “opt out” of the process; to leave the situation unchanged; to require AI companies to seek licences for using copyrighted work; and to allow AI firms to use copyrighted work with no opt-out for creative companies and individuals.

The technology secretary, Peter Kyle, has said the copyright-waiver-plus-opt-out scenario is no longer the government’s preferred option, but Kidron’s amendments have attempted to head off that option by effectively requiring tech companies to seek licensing deals for any content that they use to train their AI models."

How AI and copyright turned into a political nightmare for Labour; Politico.eu, June 4, 2025

JOSEPH BAMBRIDGE, Politico.eu; How AI and copyright turned into a political nightmare for Labour

"The Data (Use and Access Bill) has ricocheted between the Commons and the Lords in an extraordinarily long incidence of ping-pong, with both Houses digging their heels in and a frenzied lobbying battle on all sides."

Friday, June 6, 2025

AI firms say they can’t respect copyright. These researchers tried.; The Washington Post, June 5, 2025

The Washington Post; AI firms say they can’t respect copyright. These researchers tried.

"A group of more than two dozen AI researchers have found that they could build a massive eight-terabyte dataset using only text that was openly licensed or in public domain. They tested the dataset quality by using it to train a 7 billion parameter language model, which performed about as well as comparable industry efforts, such as Llama 2-7Bwhich Meta released in 2023.

A paper published Thursday detailing their effort also reveals that the process was painstaking, arduous and impossible to fully automate.

The group built an AI model that is significantly smaller than the latest offered by OpenAI’s ChatGPT or Google’s Gemini, but their findings appear to represent the biggest, most transparent and rigorous effort yet to demonstrate a different way of building popular AI tools.

That could have implications for the policy debate swirling around AI and copyright.

The paper itself does not take a position on whether scraping text to train AI is fair use.

That debate has reignited in recent weeks with a high-profile lawsuit and dramatic turns around copyright law and enforcement in both the U.S. and U.K."

The U.S. Copyright Office used to be fairly low-drama. Not anymore; NPR, June 6, 2025

NPR; The U.S. Copyright Office used to be fairly low-drama. Not anymore

"The U.S. Copyright Office is normally a quiet place. It mostly exists to register materials for copyright and advise members of Congress on copyright issues. Experts and insiders used words like "stable" and "sleepy" to describe the agency. Not anymore...

Inside the AI report

That big bombshell report on generative AI and copyright can be summed up like this – in some instances, using copyrighted material to train AI models could count as fair use. In other cases, it wouldn't.

The conclusion of the report says this: "Various uses of copyrighted works in AI training are likely to be transformative. The extent to which they are fair, however, will depend on what works were used, from what source, for what purpose, and with what controls on the outputs—all of which can affect the market."

"It's very even keeled," said Keith Kupferschmid, CEO of the Copyright Alliance, a group that represents artists and publishers pushing for stronger copyright laws.

Kupferschmid said the report avoids generalizations and takes arguments on a case-by-case basis.

"Perlmutter was beloved, no matter whether you agreed with her or not, because she did the hard work," Kupferschmid said. "She always was very thoughtful and considered all these different viewpoints."

It remains to be seen how the report will be used in the dozens of legal cases over copyright and AI usage."

Thursday, June 5, 2025

Government AI copyright plan suffers fourth House of Lords defeat; BBC, June 2, 2025

Zoe Kleinman, BBC; Government AI copyright plan suffers fourth House of Lords defeat

"The argument is over how best to balance the demands of two huge industries: the tech and creative sectors. 

More specifically, it's about the fairest way to allow AI developers access to creative content in order to make better AI tools - without undermining the livelihoods of the people who make that content in the first place.

What's sparked it is the Data (Use and Access) Bill.

This proposed legislation was broadly expected to finish its long journey through parliament this week and sail off into the law books. 

Instead, it is currently stuck in limbo, ping-ponging between the House of Lords and the House of Commons.

A government consultation proposes AI developers should have access to all content unless its individual owners choose to opt out. 

But 242 members of the House of Lords disagree with the bill in its current form.

They think AI firms should be forced to disclose which copyrighted material they use to train their tools, with a view to licensing it."

Saturday, May 31, 2025

It’s too expensive to fight every AI copyright battle, Getty CEO says; Ars Technica, May 28, 2025

ASHLEY BELANGER, Ars Technica; It’s too expensive to fight every AI copyright battle, Getty CEO says


[Kip Currier: As of May 2025, New York Stock Exchange (NYSE) data values Getty Images at nearly three-quarters of a billion dollars.

So it's noteworthy and should give individual creators pause that even a company of that size is publicly acknowledging the financial realities of copyright litigation against AI tech companies like Stability AI.

Even if the courts ultimately determine that AI tech companies can prevail on fair use grounds against copyright infringement claims, isn't there something fundamentally unfair and unethical about AI tech oligarchs being able to devour and digest everyone else's copyrighted works, and then alchemize that improperly-taken aggregation of creativity into new IP works that they can monetize, with no recompense given to the original creators?

Just because someone can do something, doesn't mean they should be able to do it.

AI tech company leaders like Elon Musk, Sam Altman, Mark Zuckerberg et al. would never stand for similar uses of their works without permission or compensation.

Neither should creators. Quid pulchrum est (What's fair is fair).

If the courts do side with AI tech companies, new federal legislation may need to be enacted to provide protections for content creators from the AI tech companies that want and need their content to power up novel iterations of their AI tools via ever-increasing amounts of training data. 

In the current Congress, that's not likely to happen. But it may be possible after 2026 or 2028, if enough content creators make their voices heard through their grassroots advocacy and votes at the ballot box.]


[Excerpt]

"On Bluesky, a trial lawyer, Max Kennerly, effectively satirized Clegg and the whole AI industry by writing, "Our product creates such little value that it is simply not viable in the marketplace, not even as a niche product. Therefore, we must be allowed to unilaterally extract value from the work of others and convert that value into our profits."

Saturday, May 24, 2025

Judge Hints Anthropic’s AI Training on Books Is Fair Use; Bloomberg Law, May 22, 2025

 

Bloomberg Law; Judge Hints Anthropic’s AI Training on Books Is Fair Use

"A California federal judge is leaning toward finding Anthropic PBC violated copyright law when it made initial copies of pirated books, but that its subsequent uses to train their generative AI models qualify as fair use.

“I’m inclined to say they did violate the Copyright Act but the subsequent uses were fair use,” Judge William Alsup said Thursday during a hearing in San Francisco. “That’s kind of the way I’m leaning right now,” he said, but concluded the 90-minute hearing by clarifying that his decision isn’t final. “Sometimes I say that and change my mind.”...

The first judge to rule will provide a window into how federal courts interpret the fair use argument for training generative artificial intelligence models with copyrighted materials. A decision against Anthropic could disrupt the billion-dollar business model behind many AI companies, which rely on the belief that training with unlicensed copyrighted content doesn’t violate the law."

The Library of Congress Shake-Up Endangers Copyrights; Bloomberg, May 24, 2025

Stephen Mihm, Bloomberg; The Library of Congress Shake-Up Endangers Copyrights