Showing posts with label AI training data. Show all posts

Thursday, March 27, 2025

Judge allows 'New York Times' copyright case against OpenAI to go forward; NPR, March 27, 2025

NPR; Judge allows 'New York Times' copyright case against OpenAI to go forward

"A federal judge on Wednesday rejected OpenAI's request to toss out a copyright lawsuit from The New York Times that alleges that the tech company exploited the newspaper's content without permission or payment.

In an order allowing the lawsuit to go forward, Judge Sidney Stein, of the Southern District of New York, narrowed the scope of the lawsuit but allowed the case's main copyright infringement claims to go forward.

Stein did not immediately release an opinion but promised one would come "expeditiously."

The decision is a victory for the newspaper, which has joined forces with other publishers, including The New York Daily News and the Center for Investigative Reporting, to challenge the way that OpenAI collected vast amounts of data from the web to train its popular artificial intelligence service, ChatGPT."

Wednesday, March 26, 2025

Richard Osman urges writers to ‘have a good go’ at Meta over breaches of copyright; The Guardian, March 25, 2025

The Guardian; Richard Osman urges writers to ‘have a good go’ at Meta over breaches of copyright

"Richard Osman has said that writers will “have a good go” at taking on Meta after it emerged that the company used a notorious database believed to contain pirated books to train artificial intelligence.

“Copyright law is not complicated at all,” the author of The Thursday Murder Club series wrote in a statement on X on Sunday evening. “If you want to use an author’s work you need to ask for permission. If you use it without permission you’re breaking the law. It’s so simple.”

In January, it emerged that Mark Zuckerberg approved his company’s use of The Library Genesis dataset, a “shadow library” that originated in Russia and contains more than 7.5m books. In 2024 a New York federal court ordered LibGen’s anonymous operators to pay a group of publishers $30m (£24m) in damages for copyright infringement. Last week, the Atlantic republished a searchable database of the titles contained in LibGen. In response, authors and writers’ organisations have rallied against Meta’s use of copyrighted works."

Search LibGen, the Pirated-Books Database That Meta Used to Train AI; The Atlantic, March 20, 2025

 Alex Reisner , The Atlantic; Search LibGen, the Pirated-Books Database That Meta Used to Train AI

"Editor’s note: This search tool is part of The Atlantic’s investigation into the Library Genesis data set. You can read an analysis about LibGen and its contents here. Find The Atlantic’s search tool for movie and television writing used to train AI here."

Anthropic wins early round in music publishers' AI copyright case; Reuters, March 26, 2025

Reuters; Anthropic wins early round in music publishers' AI copyright case

"Artificial intelligence company Anthropic convinced a California federal judge on Tuesday to reject a preliminary bid to block it from using lyrics owned by Universal Music Group and other music publishers to train its AI-powered chatbot Claude.

U.S. District Judge Eumi Lee said that the publishers' request was too broad and that they failed to show Anthropic's conduct caused them "irreparable harm."

Tuesday, March 25, 2025

Ben Stiller, Mark Ruffalo and More Than 400 Hollywood Names Urge Trump to Not Let AI Companies ‘Exploit’ Copyrighted Works; Variety, March 17, 2025

 Todd Spangler , Variety; Ben Stiller, Mark Ruffalo and More Than 400 Hollywood Names Urge Trump to Not Let AI Companies ‘Exploit’ Copyrighted Works

"More than 400 Hollywood creative leaders signed an open letter to the Trump White House’s Office of Science and Technology Policy, urging the administration to not roll back copyright protections at the behest of AI companies.

The filmmakers, writers, actors, musicians and others — which included Ben Stiller, Mark Ruffalo, Cynthia Erivo, Cate Blanchett, Cord Jefferson, Paul McCartney, Ron Howard and Taika Waititi — were submitting comments for the Trump administration’s U.S. AI Action Plan⁠. The letter specifically was penned in response to recent submissions to the Office of Science and Technology Policy from OpenAI and Google, which asserted that U.S. copyright law allows (or should allow) AI companies to train their systems on copyrighted works without obtaining permission from (or compensating) rights holders."

Monday, March 24, 2025

Should AI be treated the same way as people are when it comes to copyright law? ; The Hill, March 24, 2025

Nicholas Creel, The Hill; Should AI be treated the same way as people are when it comes to copyright law?

"The New York Times’s lawsuit against OpenAI and Microsoft highlights an uncomfortable contradiction in how we view creativity and learning. While the Times accuses these companies of copyright infringement for training AI on their content, this ignores a fundamental truth: AI systems learn exactly as humans do, by absorbing, synthesizing and transforming existing knowledge into something new."

Friday, March 21, 2025

AI firms push to use copyrighted content freely; Axios, March 20, 2025

 Ina Fried, Axios; AI firms push to use copyrighted content freely

"A sharp divide over AI engines' free use of copyrighted material has emerged as a key conflict among the firms and groups that recently flooded the White House with advice on its forthcoming "AI Action Plan."

Why it matters: Copyright infringement claims were among the first legal challenges following ChatGPT's launch, with multiple lawsuits now winding their way through the courts.

Driving the news: In their White House memos, OpenAI and Google argue that their  use of copyrighted material for AI is a matter of national security — and if that use is limited, China will gain an unfair edge in the AI race."

Sunday, March 16, 2025

The AI Copyright Battle: Why OpenAI And Google Are Pushing For Fair Use; Forbes, March 15, 2025

 Virginie Berger , Forbes; The AI Copyright Battle: Why OpenAI And Google Are Pushing For Fair Use

"Furthermore, the ongoing lawsuits against AI firms could serve as a necessary correction to push the industry toward genuinely intelligent machine learning models instead of data-compression-based generators masquerading as intelligence. If legal challenges force AI firms to rethink their reliance on copyrighted content, it could spur innovation toward creating more advanced, ethically sourced AI systems...

Recommendations: Finding a Sustainable Balance

A sustainable solution must reconcile technological innovation with creators' economic interests. Policymakers should develop clear federal standards specifying fair use parameters for AI training, considering solutions such as:

  • Licensing and Royalties: Transparent licensing arrangements compensating creators whose work is integral to AI datasets.
  • Curated Datasets: Government or industry-managed datasets explicitly approved for AI training, ensuring fair compensation.
  • Regulated Exceptions: Clear legal definitions distinguishing transformative use in AI training contexts.

These nuanced policies could encourage innovation without sacrificing creators’ rights.

The lobbying by OpenAI and Google reveals broader tensions between rapid technological growth and ethical accountability. While national security concerns warrant careful consideration, they must not justify irresponsible regulation or ethical compromises. A balanced approach, preserving innovation, protecting creators’ rights, and ensuring sustainable and ethical AI development, is critical for future global competitiveness and societal fairness."

OpenAI declares AI race “over” if training on copyrighted works isn’t fair use; Ars Technica, March 13, 2025

Ashley Belanger, Ars Technica; OpenAI declares AI race “over” if training on copyrighted works isn’t fair use

"OpenAI is hoping that Donald Trump's AI Action Plan, due out this July, will settle copyright debates by declaring AI training fair use—paving the way for AI companies' unfettered access to training data that OpenAI claims is critical to defeat China in the AI race.

Currently, courts are mulling whether AI training is fair use, as rights holders say that AI models trained on creative works threaten to replace them in markets and water down humanity's creative output overall.

OpenAI is just one AI company fighting with rights holders in several dozen lawsuits, arguing that AI transforms copyrighted works it trains on and alleging that AI outputs aren't substitutes for original works.

So far, one landmark ruling favored rights holders, with a judge declaring AI training is not fair use, as AI outputs clearly threatened to replace Thomson-Reuters' legal research firm Westlaw in the market, Wired reported. But OpenAI now appears to be looking to Trump to avoid a similar outcome in its lawsuits, including a major suit brought by The New York Times."

Tuesday, March 11, 2025

Judge says Meta must defend claim it stripped copyright info from Llama's training fodder; The Register, March 11, 2025

 Thomas Claburn , The Register; Judge says Meta must defend claim it stripped copyright info from Llama's training fodder

"A judge has found Meta must answer a claim it allegedly removed so-called copyright management information from material used to train its AI models.

The Friday ruling by Judge Vince Chhabria concerned the case Kadrey et al vs Meta Platforms, filed in July 2023 in a San Francisco federal court as a proposed class action by authors Richard Kadrey, Sarah Silverman, and Christopher Golden, who reckon the Instagram titan's use of their work to train its neural networks was illegal.

Their case burbled along until January 2025 when the plaintiffs made the explosive allegation that Meta knew it used copyrighted material for training, and that its AI models would therefore produce results that included copyright management information (CMI) – the fancy term for things like the creator of a copyrighted work, its license and terms of use, its date of creation, and so on, that accompany copyrighted material.

The miffed scribes alleged Meta therefore removed all of this copyright info from the works it used to train its models so users wouldn’t be made aware the results they saw stemmed from copyrighted stuff."

Saturday, March 1, 2025

Prioritise artists over tech in AI copyright debate, MPs say; The Guardian, February 26, 2025

The Guardian; Prioritise artists over tech in AI copyright debate, MPs say

"Two cross-party committees of MPs have urged the government to prioritise ensuring that creators are fairly remunerated for their creative work over making it easy to train artificial intelligence models.

The MPs argued there needed to be more transparency around the vast amounts of data used to train generative AI models, and urged the government not to press ahead with plans to require creators to opt out of having their data used.

The government’s preferred solution to the tension between AI and copyright law is to allow AI companies to train the models on copyrighted work by giving them an exception for “text and data mining”, while giving creatives the opportunity to opt out through a “rights reservation” system.

The chair of the culture, media and sport committee, Caroline Dinenage, said there had been a “groundswell of concern from across the creative industries” in response to the proposals, which “illustrates the scale of the threat artists face from artificial intelligence pilfering the fruits of their hard-earned success without permission”.

She added that making creative works “fair game unless creators say so” was akin to “burglars being allowed into your house unless there’s a big sign on your front door expressly telling them that thievery isn’t allowed”."

Thursday, February 27, 2025

An AI Maker Was Just Found Liable for Copyright Infringement. What Does This Portend for Content Creators and AI Makers?; The Federalist Society, February 25, 2025

The Federalist Society; An AI Maker Was Just Found Liable for Copyright Infringement. What Does This Portend for Content Creators and AI Makers?

"In a case decided on February 11, the makers of generative AI (GenAI), such as ChatGPT, lost the first legal battle in the war over whether they commit copyright infringement by using the material of others as training data without permission. The case is called Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.

If other courts follow this ruling, the cost of building and selling GenAI services will dramatically increase. Such businesses are already losing money.

The ruling could also empower content creators, such as writers, to deny the use of their material to train GenAIs or to demand license fees. Some creators might be unwilling to license use of their material for training AIs due to fear that GenAI will destroy demand for their work."

Wednesday, February 26, 2025

UK newspapers launch campaign against AI copyright plans; Independent, February 25, 2025

 Martyn Landi, Independent; UK newspapers launch campaign against AI copyright plans

"Some of the UK’s biggest newspapers have used a coordinated campaign across their front pages to raise their concerns about AI’s impact on the creative industries.

Special wraps appeared on Tuesday’s editions of the Daily Express, Daily Mail, The Mirror, the Daily Star, The i, The Sun, and The Times – as well as a number of regional titles – criticising a Government consultation around possible exemptions being added to copyright law for training AI models.

The proposals would allow tech firms to use copyrighted material from creatives and publishers without having to pay or gain a licence, or reimbursing creatives for using their work."

Monday, February 24, 2025

Copyright 'sell-out' will silence British musicians, says BRIAN MAY; Daily Mail, February 23, 2025

 Andy Behring , Daily Mail; Copyright 'sell-out' will silence British musicians, says BRIAN MAY

"No one will make music in Britain any more if Labour's AI copyright proposal succeeds, Sir Brian May warned last night as he backed the Daily Mail's campaign against it.

The Queen guitarist said he feared it may already be 'too late' because 'monstrously arrogant' Big Tech barons have already carried out an industrial-scale 'theft' of Britain's cultural genius.

He called on the Government to apply the brakes before the next chapter of Britain's rich cultural heritage – which includes Shakespeare, Chaucer, James Bond, The Beatles and Britpop – is nipped in the bud thanks to Sir Keir Starmer's copyright 'sell-out'...

Sir Brian said: 'My fear is that it's already too late – this theft has already been performed and is unstoppable, like so many incursions that the monstrously arrogant billionaire owners of Al and social media are making into our lives. The future is already forever changed."

Thursday, February 20, 2025

AI and Copyright: Expanding Copyright Hurts Everyone—Here’s What to Do Instead; Electronic Frontier Foundation (EFF), February 19, 2025

 TORI NOBLE, Electronic Frontier Foundation (EFF); AI and Copyright: Expanding Copyright Hurts Everyone—Here’s What to Do Instead


[Kip Currier: No, not everyone. Not requiring Big Tech to figure out a way to fairly license or get permission to use the copyrighted works of creators unjustly advantages these deep-pocketed corporations. It also inequitably disadvantages the economic and creative interests of the human beings who labor to create copyrightable content -- authors, songwriters, visual artists, and many others.

The tell is that many of these same Big Tech companies are only too willing to file copyright infringement lawsuits against anyone whom they allege is infringing their AI content to create competing products and services.]


[Excerpt]


"Threats to Socially Valuable Research and Innovation 

Requiring researchers to license fair uses of AI training data could make socially valuable research based on machine learning (ML) and even text and data mining (TDM) prohibitively complicated and expensive, if not impossible. Researchers have relied on fair use to conduct TDM research for a decade, leading to important advancements in myriad fields. However, licensing the vast quantity of works that high-quality TDM research requires is frequently cost-prohibitive and practically infeasible.  

Fair use protects ML and TDM research for good reason. Without fair use, copyright would hinder important scientific advancements that benefit all of us. Empirical studies back this up: research using TDM methodologies are more common in countries that protect TDM research from copyright control; in countries that don’t, copyright restrictions stymie beneficial research. It’s easy to see why: it would be impossible to identify and negotiate with millions of different copyright owners to analyze, say, text from the internet."

Sunday, February 16, 2025

Court filings show Meta paused efforts to license books for AI training; TechCrunch, February 14, 2025

 Kyle Wiggers, TechCrunch; Court filings show Meta paused efforts to license books for AI training

"According to one transcript, Sy Choudhury, who leads Meta’s AI partnership initiatives, said that Meta’s outreach to various publishers was met with “very slow uptake in engagement and interest.”

“I don’t recall the entire list, but I remember we had made a long list from initially scouring the Internet of top publishers, et cetera,” Choudhury said, per the transcript, “and we didn’t get contact and feedback from — from a lot of our cold call outreaches to try to establish contact.”

Choudhury added, “There were a few, like, that did, you know, engage, but not many.”

According to the court transcripts, Meta paused certain AI-related book licensing efforts in early April 2023 after encountering “timing” and other logistical setbacks. Choudhury said some publishers, in particular fiction book publishers, turned out to not in fact have the rights to the content that Meta was considering licensing, per a transcript.

“I’d like to point out that the — in the fiction category, we quickly learned from the business development team that most of the publishers we were talking to, they themselves were representing that they did not have, actually, the rights to license the data to us,” Choudhury said. “And so it would take a long time to engage with all their authors.”"

Friday, February 14, 2025

AI companies flaunt their theft. News media has to fight back – so we're suing. | Opinion; USA Today, February 13, 2025

Danielle Coffey, USA Today; AI companies flaunt their theft. News media has to fight back – so we're suing. | Opinion

"Danielle Coffey is president & CEO of the News/Media Alliance, which represents 2,000 news and magazine media outlets worldwide...

This is not an anti-AI lawsuit or an effort to turn back the clock. We love technology. We use it in our businesses. Artificial intelligence will help us better serve our customers, but only if it respects intellectual property. That’s the remedy we’re seeking in court.

When it suits them, the AI companies assert similar claims to ours. Meta's lawsuit accused Bright Data of scraping data in violation of its terms of use. And Sam Altman of OpenAI has complained that DeepSeek illegally copied its algorithms.

Good actors, responsible technologies and potential legislation offer some hope for improving the situation. But what is urgently needed is what every market needs: reinforcement of legal protections against theft."

Thursday, February 13, 2025

News publishers sue Cohere for copyright and trademark infringement; Axios, February 13, 2025

Axios; News publishers sue Cohere for copyright and trademark infringement

"More than a dozen major U.S. news organizations on Thursday said they were suing Cohere, an enterprise AI company, claiming the tech startup illegally repurposed their work and did so in a way that tarnished their brands.

Why it matters: The lawsuit represents the first official legal action against an AI company organized by the News Media Alliance — the largest news media trade group in the U.S...

  • The NMA members participating in the lawsuit include Advance Local Media, Condé Nast, The Atlantic, Forbes Media, The Guardian, Business Insider, The Los Angeles Times, McClatchy Media Company, Newsday, Plain Dealer Publishing Company, Politico, The Republican Company, Toronto Star Newspapers, and Vox Media.

Between the lines: The complaint was filed shortly after the U.S. Copyright Office changed its copyright registration processes to make them faster for digital publishers.

  • Previously, the process by which digital publishers had to file for copyright protections for individual works was extremely cumbersome, limiting their ability to seek protection. 

Because of those changes, Coffey explained, NMA and the publishers who are suing Cohere were able to identify thousands of specific examples of Cohere verbatim copying their copyright-protected works."

Wednesday, February 12, 2025

Court: Training AI Model Based on Copyrighted Data Is Not Fair Use as a Matter of Law; The National Law Review, February 11, 2025

Joseph A. Meckes and Joseph Grasser of Squire Patton Boggs (US) LLP - Global IP and Technology Law Blog, The National Law Review; Court: Training AI Model Based on Copyrighted Data Is Not Fair Use as a Matter of Law

"In what may turn out to be an influential decision, Judge Stephanos Bibas ruled as a matter of law in Thomson Reuters v. Ross Intelligence that creating short summaries of law to train Ross Intelligence’s artificial intelligence legal research application not only infringes Thomson Reuters’ copyrights as a matter of law but that the copying is not fair use. Judge Bibas had previously ruled that infringement and fair use were issues for the jury but changed his mind: “A smart man knows when he is right; a wise man knows when he is wrong.”

At issue in the case was whether Ross Intelligence directly infringed Thomson Reuters’ copyrights in its case law headnotes that are organized by Westlaw’s proprietary Key Number system. Thomson Reuters contended that Ross Intelligence’s contractor copied those headnotes to create “Bulk Memos.” Ross Intelligence used the Bulk Memos to train its competitive AI-powered legal research tool. Judge Bibas ruled that (i) the West headnotes were sufficiently original and creative to be copyrightable, and (ii) some of the Bulk Memos used by Ross were so similar that they infringed as a matter of law...

In other words, even if a work is selected entirely from the public domain, the simple act of selection is enough to give rise to copyright protection."