Showing posts with label AI training data. Show all posts

Friday, August 29, 2025

Anthropic Settles High-Profile AI Copyright Lawsuit Brought by Book Authors; Wired, August 26, 2025

Kate Knibbs, Wired; Anthropic Settles High-Profile AI Copyright Lawsuit Brought by Book Authors

"ANTHROPIC HAS REACHED a preliminary settlement in a class action lawsuit brought by a group of prominent authors, marking a major turn in one of the most significant ongoing AI copyright lawsuits in history. The move will allow Anthropic to avoid what could have been a financially devastating outcome in court."

Thursday, August 28, 2025

Anthropic’s surprise settlement adds new wrinkle in AI copyright war; Reuters, August 27, 2025

Reuters; Anthropic’s surprise settlement adds new wrinkle in AI copyright war

"Anthropic's class action settlement with a group of U.S. authors this week was a first, but legal experts said the case's distinct qualities complicate the deal's potential influence on a wave of ongoing copyright lawsuits against other artificial-intelligence focused companies like OpenAI, Microsoft and Meta Platforms.

Amazon-backed Anthropic was under particular pressure, with a trial looming in December after a judge found it liable for pirating millions of copyrighted books. The terms of the settlement, which require a judge's approval, are not yet public. And U.S. courts have just begun to wrestle with novel copyright questions related to generative AI, which could prompt other defendants to hold out for favorable rulings."

Monday, August 25, 2025

Who owns the copyright for AI work?; Financial Times, August 24, 2025

Financial Times; Who owns the copyright for AI work?

"Generative artificial intelligence poses two copyright puzzles. The first is the widely discussed question of compensation for work used to train AI models. The second, which has yet to receive as much attention, concerns the work that AI produces. Copyright is granted to authors. So what happens to work that has no human author?"

Sunday, August 24, 2025

Using AI for Work Could Land You on the Receiving End of a Nasty Lawsuit; Futurism, August 23, 2025

Joe Wilkins, Futurism; Using AI for Work Could Land You on the Receiving End of a Nasty Lawsuit

"For all its hype, artificial intelligence isn't without its psychological, environmental, and even spiritual hazards.

Perhaps the most pressing concern on an individual level, though, is that it puts users on the hook for a nearly infinite number of legal hazards — even at work, as it turns out.


A recent breakdown by The Register highlights the legal dangers of AI use, especially in corporate settings. If you use generative AI software to spit out graphics, press releases, logos, or videos, you and your employer could end up facing six-figure damages, the publication warns.


This is thanks to the vast archive of copyrighted data that virtually all commercial generative AI models are trained on.


The Register uses Nintendo's Mario as a prime example of how one might stumble, intentionally or not, into a massive copyright lawsuit, regardless of intent to infringe: if you use AI to generate a cutesy mascot for your plumbing company that looks too much like the iconic videogame character, you could easily find yourself in the legal crosshairs of the notoriously litigious corporation.


"The real harm comes from the attorney's fees that you can get saddled with," intellectual property lawyer Benjamin Bedrava told the publication. "Because you could have a hundred and fifty thousand dollars in attorney's fees over something where the license would have been fifteen hundred dollars.""

Saturday, August 23, 2025

Watering down Australia’s AI copyright laws would sacrifice writers’ livelihoods to ‘brogrammers’; The Guardian, August 11, 2025

 Tracey Spicer, The Guardian; Watering down Australia’s AI copyright laws would sacrifice writers’ livelihoods to ‘brogrammers’

"My latest book, which is about artificial intelligence discriminating against people from marginalised communities, was composed on an Apple Mac.

Whatever the form of recording the first rough draft of history, one thing remains the same: they are very human stories – stories that change the way we think about the world.

A society is the sum of the stories it tells. When stories, poems or books are “scraped”, what does this really mean?

The definition of scraping is to “drag or pull a hard or sharp implement across (a surface or object) so as to remove dirt or other matter”.

A long way from Brisbane or Bangladesh, in the rarefied climes of Silicon Valley, scrapers are removing our stories as if they are dirt.

These stories are fed into the machines of the great god: generative AI. But the outputs – their creations – are flatter, less human, more homogenised. ChatGPT tells tales set in metropolitan areas in the global north; of young, cishet men and people living without disability.

We lose the stories of lesser-known characters in remote parts of the world, eroding our understanding of the messy experience of being human.

Where will we find the stories of 64-year-old John from Traralgon, who died from asbestosis? Or seven-year-old Raha from Jaipur, whose future is a “choice” between marriage at the age of 12 and sexual exploitation?

OpenAI’s creations are not the “machines of loving grace” envisioned in the 1967 poem by Richard Brautigan, where he dreams of a “cybernetic meadow”.

Scraping is a venal money grab by oligarchs who are – incidentally – scrambling to protect their own intellectual property during an AI arms race.

The code behind ChatGPT is protected by copyright, which is considered to be a literary work. (I don’t know whether to laugh or cry.)

Meta has already stolen the work of thousands of Australian writers.

Now, our own Productivity Commission is considering weakening our Copyright Act to include an exemption for text and data mining, which may well put us out of business.

In its response, The Australia Institute uses the analogy of a car: “Imagine grabbing the keys for a rental car and just driving around for a while without paying to hire it or filling in any paperwork. Then imagine that instead of being prosecuted for breaking the law, the government changed the law to make driving around in a rental car legal.”

It’s more like taking a piece out of someone’s soul, chucking it into a machine and making it into something entirely different. Ugly. Inhuman.

The commission’s report seems to be an absurdist text. The argument for watering down copyright is that it will lead to more innovation. But the explicit purpose of the Copyright Act is to protect innovation, in the form of creative endeavour.

Our work is being devalued, dismissed and destroyed; our livelihoods demolished.

In this age of techno-capitalism, it appears the only worthwhile innovation is being built by the “brogrammers”.

US companies are pinching Australian content, using it to train their models, then selling it back to us. It’s an extractive industry: neocolonialism, writ large."

Thursday, August 14, 2025

Japan’s largest newspaper, Yomiuri Shimbun, sues AI startup Perplexity for copyright violations; NiemanLab, August 11, 2025

Andrew Deck, NiemanLab; Japan’s largest newspaper, Yomiuri Shimbun, sues AI startup Perplexity for copyright violations

"The Yomiuri Shimbun, Japan’s largest newspaper by circulation, has sued the generative AI startup Perplexity for copyright infringement. The lawsuit, filed in Tokyo District Court on August 7, marks the first copyright challenge by a major Japanese news publisher against an AI company.

The filing claims that Perplexity accessed 119,467 articles on Yomiuri’s site between February and June of this year, based on an analysis of its company server logs. Yomiuri alleges the scraping has been used by Perplexity to reproduce the newspaper’s copyrighted articles in responses to user queries without authorization.

In particular, the suit claims Perplexity has violated its “right of reproduction” and its “right to transmit to the public,” two tenets of Japanese law that give copyright holders control over the copying and distribution of their work. The suit seeks nearly $15 million in damages and demands that Perplexity stop reproducing its articles...

Japan’s copyright law allows AI developers to train models on copyrighted material without permission. This leeway is a direct result of a 2018 amendment to Japan’s Copyright Act, meant to encourage AI development in the country’s tech sector. The law does not, however, allow for wholesale reproduction of those works, or for AI developers to distribute copies in a way that will “unreasonably prejudice the interests of the copyright owner.”"

Wednesday, August 13, 2025

Judge rejects Anthropic bid to appeal copyright ruling, postpone trial; Reuters, August 12, 2025

Reuters; Judge rejects Anthropic bid to appeal copyright ruling, postpone trial

"A federal judge in California has denied a request from Anthropic to immediately appeal a ruling that could place the artificial intelligence company on the hook for billions of dollars in damages for allegedly pirating authors' copyrighted books.

U.S. District Judge William Alsup said on Monday that Anthropic must wait until after a scheduled December jury trial to appeal his decision that the company is not shielded from liability for pirating millions of books to train its AI-powered chatbot Claude."

Monday, August 11, 2025

Boston Public Library aims to increase access to a vast historic archive using AI; NPR, August 11, 2025

NPR; Boston Public Library aims to increase access to a vast historic archive using AI

"Boston Public Library, one of the oldest and largest public library systems in the country, is launching a project this summer with OpenAI and Harvard Law School to make its trove of historically significant government documents more accessible to the public.

The documents date back to the early 1800s and include oral histories, congressional reports and surveys of different industries and communities...

Currently, members of the public who want to access these documents must show up in person. The project will enhance the metadata of each document and will enable users to search and cross-reference entire texts from anywhere in the world. 

Chapel said Boston Public Library plans to digitize 5,000 documents by the end of the year, and if all goes well, grow the project from there...

Harvard University said it could help. Researchers at the Harvard Law School Library's Institutional Data Initiative are working with libraries, museums and archives on a number of fronts, including training new AI models to help libraries enhance the searchability of their collections. 

AI companies help fund these efforts, and in return get to train their large language models on high-quality materials that are out of copyright and therefore less likely to lead to lawsuits. (Microsoft and OpenAI are among the many AI players targeted by recent copyright infringement lawsuits, in which plaintiffs such as authors claim the companies stole their works without permission.)"

Saturday, August 9, 2025

News Corp CEO Robert Thomson slams AI firms for stealing copyrighted material like Trump’s ‘Art of the Deal’; New York Post, August 6, 2025

Ariel Zilber, New York Post; News Corp CEO Robert Thomson slams AI firms for stealing copyrighted material like Trump’s ‘Art of the Deal’

"The media executive said the voracious appetite of the AI firms to train their bots on proprietary content without paying for it risks eroding America’s edge over rival nations.

“Much is made of the competition with China, but America’s advantage is ingenuity and creativity, not bits and bytes, not watts but wit,” he said.

“To undermine that comparative advantage by stripping away IP rights is to vandalize our virtuosity.”"

AI industry horrified to face largest copyright class action ever certified; Ars Technica, August 8, 2025

Ashley Belanger, Ars Technica; AI industry horrified to face largest copyright class action ever certified

"AI industry groups are urging an appeals court to block what they say is the largest copyright class action ever certified. They've warned that a single lawsuit raised by three authors over Anthropic's AI training now threatens to "financially ruin" the entire AI industry if up to 7 million claimants end up joining the litigation and forcing a settlement.

Last week, Anthropic petitioned to appeal the class certification, urging the court to weigh questions that the district court judge, William Alsup, seemingly did not. Alsup allegedly failed to conduct a "rigorous analysis" of the potential class and instead based his judgment on his "50 years" of experience, Anthropic said.

If the appeals court denies the petition, Anthropic argued, the emerging company may be doomed. As Anthropic argued, it now "faces hundreds of billions of dollars in potential damages liability at trial in four months" based on a class certification rushed at "warp speed" that involves "up to seven million potential claimants, whose works span a century of publishing history," each possibly triggering a $150,000 fine.

Confronted with such extreme potential damages, Anthropic may lose its rights to raise valid defenses of its AI training, deciding it would be more prudent to settle, the company argued. And that could set an alarming precedent, considering all the other lawsuits generative AI (GenAI) companies face over training on copyrighted materials, Anthropic argued."

Wednesday, July 30, 2025

European Creators Slam AI Act Implementation, Warn Copyright Protections Are Failing; The Hollywood Reporter; July 30, 2025

Scott Roxborough, The Hollywood Reporter; European Creators Slam AI Act Implementation, Warn Copyright Protections Are Failing

"The coalition is asking for the European Commission to revisit its implementation of the AI Act to ensure the law “lives up to its promise to safeguard European intellectual property rights in the age of generative AI.”"

Tuesday, July 29, 2025

Meta pirated and seeded porn for years to train AI, lawsuit says; Ars Technica, July 28, 2025

Ashley Belanger, Ars Technica; Meta pirated and seeded porn for years to train AI, lawsuit says

"Porn sites may have blown up Meta's key defense in a copyright fight with book authors who earlier this year said that Meta torrented "at least 81.7 terabytes of data across multiple shadow libraries" to train its AI models.

Meta has defeated most of the authors' claims and claimed there is no proof that Meta ever uploaded pirated data through seeding or leeching on the BitTorrent network used to download training data. But authors still have a chance to prove that Meta may have profited off its massive piracy, and a new lawsuit filed by adult sites last week appears to contain evidence that could help authors win their fight, TorrentFreak reported.

The new lawsuit was filed last Friday in a US district court in California by Strike 3 Holdings—which says it attracts "over 25 million monthly visitors" to sites that serve as "ethical sources" for adult videos that "are famous for redefining adult content with Hollywood style and quality."

After authors revealed Meta's torrenting, Strike 3 Holdings checked its proprietary BitTorrent-tracking tools designed to detect infringement of its videos and alleged that the company found evidence that Meta has been torrenting and seeding its copyrighted content for years—since at least 2018. Some of the IP addresses were clearly registered to Meta, while others appeared to be "hidden," and at least one was linked to a Meta employee, the filing said."

Monday, July 28, 2025

A copyright lawsuit over pirated books could result in ‘business-ending’ damages for Anthropic; Fortune, July 28, 2025

Beatrice Nolan, Fortune; A copyright lawsuit over pirated books could result in ‘business-ending’ damages for Anthropic

"A class-action lawsuit against Anthropic could expose the AI company to billions in copyright damages over its alleged use of pirated books from shadow libraries like LibGen and PiLiMi to train its models. While a federal judge ruled that training on lawfully obtained books may qualify as fair use, the court will hold a separate trial to address the allegedly illegal acquisition and storage of copyrighted works. Legal experts warn that statutory damages could be severe, with estimates ranging from $1 billion to over $100 billion."

Friday, July 25, 2025

Trump’s Comments Undermine AI Action Plan, Threaten Copyright; Publishers Weekly, July 23, 2025

Ed Nawotka, Publishers Weekly; Trump’s Comments Undermine AI Action Plan, Threaten Copyright

"Senate bill proposes 'opt-in' legislation

Trump's comments come on the heels of the introduction, by U.S. senators Josh Hawley (R-Mo.) and Richard Blumenthal (D-Conn.), of the AI Accountability and Personal Data Protection Act this past Monday following a hearing last week on AI companies' copyright infringement. The bipartisan legislation aims to hold AI firms liable for using copyrighted works or personal data without acquiring explicit consent to train AI models. It would empower individuals—including writers, artists, and content creators—to sue companies in federal court if their data or copyrighted works are used without consent. It also supports class action lawsuits and advocates for violators to pay robust penalties.

"AI companies are robbing the American people blind while leaving artists, writers, and other creators with zero recourse," said Hawley. "It’s time for Congress to give the American worker their day in court to protect their personal data and creative works. My bipartisan legislation would finally empower working Americans who now find their livelihoods in the crosshairs of Big Tech’s lawlessness."

"This bill embodies a bipartisan consensus that AI safeguards are urgent—because the technology is moving at accelerating speed, and so are dangers to privacy," added Blumenthal. "Enforceable rules can put consumers back in control of their data, and help bar abuses. Tech companies must be held accountable—and liable legally—when they breach consumer privacy, collecting, monetizing or sharing personal information without express consent. Consumers must be given rights and remedies—and legal tools to make them real—not relying on government enforcement alone.""

Thursday, July 24, 2025

Donald Trump Is Fairy-Godmothering AI; The Atlantic, July 23, 2025

Matteo Wong , The Atlantic; Donald Trump Is Fairy-Godmothering AI

"In a sense, the action plan is a bet. AI is already changing a number of industries, including software engineering, and a number of scientific disciplines. Should AI end up producing incredible prosperity and new scientific discoveries, then the AI Action Plan may well get America there faster simply by removing any roadblocks and regulations, however sensible, that would slow the companies down. But should the technology prove to be a bubble—AI products remain error-prone, extremely expensive to build, and unproven in many business applications—the Trump administration is more rapidly pushing us toward the bust. Either way, the nation is in Silicon Valley’s hands...

Once the red tape is gone, the Trump administration wants to create a “dynamic, ‘try-first’ culture for AI across American industry.” In other words, build and test out AI products first, and then determine if those products are actually helpful—or if they pose any risks.

Trump gestured toward other concessions to the AI industry in his speech. He specifically targeted intellectual-property laws, arguing that training AI models on copyrighted books and articles does not infringe upon copyright because the chatbots, like people, are simply learning from the content. This has been a major conflict in recent years, with more than 40 related lawsuits filed against AI companies since 2022. (The Atlantic is suing the AI company Cohere, for example.) If courts were to decide that training AI models with copyrighted material is against the law, it would be a major setback for AI companies. In their official recommendations for the AI Action Plan, OpenAI, Microsoft, and Google all requested a copyright exception, known as “fair use,” for AI training. Based on his statements, Trump appears to strongly agree with this position, although the AI Action Plan itself does not reference copyright and AI training.

Also sprinkled throughout the AI Action Plan are gestures toward some MAGA priorities. Notably, the policy states that the government will contract with only AI companies whose models are “free from top-down ideological bias”—a reference to Sacks’s crusade against “woke” AI—and that a federal AI-risk-management framework should “eliminate references to misinformation, Diversity, Equity, and Inclusion, and climate change.” Trump signed a third executive order today that, in his words, will eliminate “woke, Marxist lunacy” from AI models...

Looming over the White House’s AI agenda is the threat of Chinese technology getting ahead. The AI Action Plan repeatedly references the importance of staying ahead of Chinese AI firms, as did the president’s speech: “We will not allow any foreign nation to beat us; our nation will not live in a planet controlled by the algorithms of the adversaries,” Trump declared...

But whatever happens on the international stage, hundreds of millions of Americans will feel more and more of generative AI’s influence—on salaries and schools, air quality and electricity costs, federal services and doctor’s offices. AI companies have been granted a good chunk of their wish list; if anything, the industry is being told that it’s not moving fast enough. Silicon Valley has been given permission to accelerate, and we’re all along for the ride."

Donald Trump Says AI Companies Can’t Be Expected To Pay For All Copyrighted Content Used In Their Training Models: “Not Do-Able”; Deadline, July 23, 2025

Ted Johnson, Tom Tapp, Deadline; Donald Trump Says AI Companies Can’t Be Expected To Pay For All Copyrighted Content Used In Their Training Models: “Not Do-Able”

 

[Kip Currier: Don't be fooled by the flimflam rhetoric in Trump's AI Action Plan unveiled yesterday (July 23, 2025). Where Trump's AI Action Plan says “We must ensure that free speech flourishes in the era of AI and that AI procured by the Federal government objectively reflects truth rather than social engineering agendas,” it's actually the exact opposite: the Trump plan is censorious and will "cancel out" truth (e.g. on climate science, misinformation and disinformation, etc.) in Orwellian fashion.]


[Excerpt]

"The plan is a contrast to Trump’s predecessor, Joe Biden, who focused on the government’s role in ensuring that the technology was safe.

The Trump White House plan also recommends updating federal procurement guidelines “to ensure that the government only contracts with frontier large language model (LLM) developers who ensure that their systems are objective and free from top-down ideological bias.” Also recommended is revising the National Institute of Standards and Technology AI Risk Management Framework to remove references to misinformation, DEI and climate change.

“We must ensure that free speech flourishes in the era of AI and that AI procured by the Federal government objectively reflects truth rather than social engineering agendas,” the plan says."

Wednesday, July 23, 2025

Trump derides copyright and state rules in AI Action Plan launch; Politico, July 23, 2025

MOHAR CHATTERJEE , Politico; Trump derides copyright and state rules in AI Action Plan launch

"President Donald Trump criticized copyright enforcement efforts and state-level AI regulations Wednesday as he launched the White House’s AI Action Plan on a mission to dominate the industry.

In remarks delivered at a “Winning the AI Race” summit hosted by the All-In Podcast and the Hill and Valley Forum in Washington, Trump said stringent copyright enforcement was unrealistic for the AI industry and would kneecap U.S. companies trying to compete globally, particularly against China.

“You can’t be expected to have a successful AI program when every single article, book or anything else that you’ve read or studied, you’re supposed to pay for,” he said. “You just can’t do it because it’s not doable. ... China’s not doing it.”

Trump’s comments were a riff as his 28-page AI Action Plan did not wade into copyright and administration officials told reporters the issue should be left to the courts to decide.

Trump also signed three executive orders. One will fast track federal permitting, streamline reviews and “do everything possible to expedite construction of all major AI infrastructure projects,” Trump said. Another expands American exports of AI hardware and software. A third order bans the federal government from procuring AI technology “that has been infused with partisan bias or ideological agendas,” as Trump put it...

Trump echoed tech companies’ complaints about state AI laws creating a patchwork of regulation. “You can’t have one state holding you up,” he said. “We need one common sense federal standard that supersedes all states, supersedes everybody.”"

Commentary: A win-win-win path for AI in America; The Post & Courier, July 22, 2025

 Keith Kupferschmid, The Post & Courier; Commentary: A win-win-win path for AI in America

"Contrary to claims that these AI training deals are impossible to make at scale, a robust free market is already emerging in which hundreds (if not thousands) of licensed deals between AI companies and copyright owners have been reached. New research shows it is possible to create fully licensed data sets for AI.

No wonder one federal judge recently called claims that licensing is impractical “ridiculous,” given the billions at stake: “If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders.” Just like AI companies don’t dispute that they have to pay for energy, infrastructure, coding teams and the other inputs their operations require, they need to pay for creative works as well.

America’s example to the world is a free-market economy based on the rule of law, property rights and freedom to contract — so, let the market innovate solutions to these new (but not so new) licensing challenges. Let’s construct a pro-innovation, pro-worker approach that replaces the false choice of the AI alarmists with a positive, pro-America pathway to leadership on AI."

Wave of copyright lawsuits hit AI companies like Cambridge-based Suno; WBUR, July 23, 2025

WBUR; Wave of copyright lawsuits hit AI companies like Cambridge-based Suno

"Suno, a Cambridge company that generates AI music, faces multiple lawsuits alleging it illegally trained its model on copyrighted work. Peter Karol of Suffolk Law School and Bhamati Viswanathan of Columbia University Law School's Kernochan Center for Law, Media, and the Arts join WBUR's Morning Edition to explain how the suits against Suno fit into a broader legal battle over the future of creative work.

This segment aired on July 23, 2025. Audio will be available soon."

Tuesday, July 22, 2025

Senators Introduce Bill To Restrict AI Companies’ Unauthorized Use Of Copyrighted Works For Training Models; Deadline, July 21, 2025

Ted Johnson, Deadline; Senators Introduce Bill To Restrict AI Companies’ Unauthorized Use Of Copyrighted Works For Training Models

"Sen. Josh Hawley (R-MO) and Sen. Richard Blumenthal (D-CT) introduced legislation on Monday that would restrict AI companies from using copyrighted material in their training models without the consent of the individual owner.

The AI Accountability and Personal Data Protection Act also would allow individuals to sue companies that use their personal data or copyrighted works without their “express, prior consent.”

The bill addresses a raging debate between tech and content owners, one that has already led to extensive litigation. Companies like OpenAI have argued that the use of copyrighted materials in training models is a fair use, while figures including John Grisham and George R.R. Martin have challenged that notion."