Showing posts with label Meta. Show all posts

Friday, August 15, 2025

Meta faces backlash over AI policy that lets bots have ‘sensual’ conversations with children; The Guardian, August 15, 2025

Edward Helmore, The Guardian ; Meta faces backlash over AI policy that lets bots have ‘sensual’ conversations with children

"A backlash is brewing against Meta over what it permits its AI chatbots to say.

An internal Meta policy document, seen by Reuters, showed the social media giant’s guidelines for its chatbots allowed the AI to “engage a child in conversations that are romantic or sensual”, generate false medical information, and assist users in arguing that Black people are “dumber than white people”."

Tuesday, August 5, 2025

Zuckerberg fired the fact-checkers. We tested their replacement.; The Washington Post, August 4, 2025

Geoffrey A. Fowler, The Washington Post; Zuckerberg fired the fact-checkers. We tested their replacement

"Zuckerberg fired professional fact-checkers, leaving users to fight falsehoods with community notes. As the main line of defense against hoaxes and deliberate liars exploiting our attention, community notes appear — so far — nowhere near up to the task."

Tuesday, July 29, 2025

Meta pirated and seeded porn for years to train AI, lawsuit says; Ars Technica, July 28, 2025

ASHLEY BELANGER , Ars Technica; Meta pirated and seeded porn for years to train AI, lawsuit says

"Porn sites may have blown up Meta's key defense in a copyright fight with book authors who earlier this year said that Meta torrented "at least 81.7 terabytes of data across multiple shadow libraries" to train its AI models.

Meta has defeated most of the authors' claims and claimed there is no proof that Meta ever uploaded pirated data through seeding or leeching on the BitTorrent network used to download training data. But authors still have a chance to prove that Meta may have profited off its massive piracy, and a new lawsuit filed by adult sites last week appears to contain evidence that could help authors win their fight, TorrentFreak reported.

The new lawsuit was filed last Friday in a US district court in California by Strike 3 Holdings—which says it attracts "over 25 million monthly visitors" to sites that serve as "ethical sources" for adult videos that "are famous for redefining adult content with Hollywood style and quality."

After authors revealed Meta's torrenting, Strike 3 Holdings checked its proprietary BitTorrent-tracking tools designed to detect infringement of its videos and alleged that the company found evidence that Meta has been torrenting and seeding its copyrighted content for years—since at least 2018. Some of the IP addresses were clearly registered to Meta, while others appeared to be "hidden," and at least one was linked to a Meta employee, the filing said."

Wednesday, July 9, 2025

Why the new rulings on AI copyright might actually be good news for publishers; Fast Company, July 9, 2025

PETE PACHAL, Fast Company; Why the new rulings on AI copyright might actually be good news for publishers

"The outcomes of both cases were more mixed than the headlines suggest, and they are also deeply instructive. Far from closing the door on copyright holders, they point to places where litigants might find a key...

Taken together, the three cases point to a clearer path forward for publishers building copyright cases against Big AI:
Focus on outputs instead of inputs: It’s not enough that someone hoovered up your work. To build a solid case, you need to show that what the AI company did with it reproduced it in some form. So far, no court has definitively decided whether AI outputs are meaningfully different enough to count as “transformative” in the eyes of copyright law, but it should be noted that courts have ruled in the past that copyright violation can occur even when small parts of the work are copied—ifthose parts represent the “heart” of the original.
Show market harm: This looks increasingly like the main battle. Now that we have a lot of data on how AI search engines and chatbots—which, to be clear, are outputs—are affecting the online behavior of news consumers, the case that an AI service harms the media market is easier to make than it was a year ago. In addition, the emergence of licensing deals between publishers and AI companies is evidence that there’s market harm by creating outputs without offering such a deal.
Question source legitimacy: Was the content legally acquired or pirated? The Anthropic case opens this up as a possible attack vector for publishers. If they can prove scraping occurred through paywalls—without subscribing first—that could be a violation even absent any outputs."

Saturday, June 28, 2025

The Anthropic Copyright Ruling Exposes Blind Spots on AI; Bloomberg, June 26, 2025

Dave Lee , Bloomberg; The Anthropic Copyright Ruling Exposes Blind Spots on AI

[Kip Currier: It's still early days in the AI copyright legal battles underway between AI tech companies and everyone else whose training data was "scarfed up" to enable the former to create lucrative AI tools and products. But cases like this week's Anthropic lawsuit win and another suit won by Meta (with some issues still to be adjudicated regarding the use of pirated materials as AI training data) are finally now giving us some more discernible "tea leaves" and "black letter law" as to how courts are likely to rule vis-a-vis AI inputs.

This week being the much ballyhooed 50th anniversary of the so-called "1st summer blockbuster flick" Jaws ("you're gonna need a bigger boat"), these rulings make me think we the public may need a bigger copyright law schema that sets out protections for the creatives making the fuel that enables stratospherically profitable AI innovations. The Jaws metaphor may be a bit on-the-nose, but one can't help but view AI tech companies akin to rapacious sharks that are imperiling the financial survival and long-standing business models of human creators.

As touched on in this Bloomberg article, too, there's a moral argument that what AI tech folks have done with the uncompensated use of creative works, without permission, doesn't mean that it's ethically justifiable simply because a court may say it's legal. Or that these companies shouldn't be required by updated federal copyright legislation and licensing frameworks to fairly compensate creators for the use of their copyrighted works. After all, billionaire tech oligarchs like Zuckerberg, Musk, and Altman would never allow others to do to them what they've done to creatives with impunity and zero contrition.

Are you listening, Congress?

Or are all of you in the pockets of AI tech company lobbyists, rather than representing the needs and interests of all of your constituents and not just the billionaire class.]

[Excerpt]

"In what is shaping up to be a long, hard fight over the use of creative works, round one has gone to the AI makers. In the first such US decision of its kind, District Judge William Alsup said Anthropic’s use of millions of books to train its artificial-intelligence model, without payment to the sources, was legal under copyright law because it was “transformative — spectacularly so.”...

If a precedent has been set, as several observers believe, it stands to cripple one of the few possible AI monetization strategies for rights holders, which is to sell licenses to firms for access to their work. Some of these deals have already been made while the “fair use” question has been in limbo, deals that emerged only after the threat of legal action. This ruling may have just taken future deals off the table...

Alsup was right when he wrote that “the technology at issue was among the most transformative many of us will see in our lifetimes.”...

But that doesn’t mean it shouldn’t pay its way. Nobody would dare suggest Nvidia Corp. CEO Jensen Huang hand out his chips free. No construction worker is asked to keep costs down by building data center walls for nothing. Software engineers aren’t volunteering their time to Meta Platforms Inc. in awe of Mark Zuckerberg’s business plan — they instead command salaries of $100 million and beyond.

Yet, as ever, those in the tech industry have decided that creative works, and those who create them, should be considered of little or no value and must step aside in service of the great calling of AI — despite being every bit as vital to the product as any other factor mentioned above. As science-fiction author Harlan Ellison said in his famous sweary rant, nobody ever wants to pay the writer if they can get away with it. When it comes to AI, paying creators of original work isn’t impossible, it’s just inconvenient. Legislators should leave companies no choice."

Wednesday, June 25, 2025

Judge dismisses authors’ copyright lawsuit against Meta over AI training; AP, June 25, 2025

MATT O’BRIEN AND BARBARA ORTUTAY, AP; Judge dismisses authors’ copyright lawsuit against Meta over AI training

"Although Meta prevailed in its request to dismiss the case, it could turn out to be a pyrrhic victory. In his 40-page ruling, Chhabria repeatedly indicated reasons to believe that Meta and other AI companies have turned into serial copyright infringers as they train their technology on books and other works created by humans, and seemed to be inviting other authors to bring cases to his court presented in a manner that would allow them to proceed to trial.

The judge scoffed at arguments that requiring AI companies to adhere to decades-old copyright laws would slow down advances in a crucial technology at a pivotal time. “These products are expected to generate billions, even trillions of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it.”

Tuesday, June 24, 2025

Study: Meta AI model can reproduce almost half of Harry Potter book; Ars Technica, June 20, 2025

TIMOTHY B. LEE , Ars Techcnica; Study: Meta AI model can reproduce almost half of Harry Potter book

"In recent years, numerous plaintiffs—including publishers of books, newspapers, computer code, and photographs—have sued AI companies for training models using copyrighted material. A key question in all of these lawsuits has been how easily AI models produce verbatim excerpts from the plaintiffs’ copyrighted content.

For example, in its December 2023 lawsuit against OpenAI, The New York Times Company produced dozens of examples where GPT-4 exactly reproduced significant passages from Times stories. In its response, OpenAI described this as a “fringe behavior” and a “problem that researchers at OpenAI and elsewhere work hard to address.”

But is it actually a fringe behavior? And have leading AI companies addressed it? New research—focusing on books rather than newspaper articles and on different companies—provides surprising insights into this question. Some of the findings should bolster plaintiffs’ arguments, while others may be more helpful to defendants.

The paper was published last month by a team of computer scientists and legal scholars from Stanford, Cornell, and West Virginia University. They studied whether five popular open-weight models—three from Meta and one each from Microsoft and EleutherAI—were able to reproduce text from Books3, a collection of books that is widely used to train LLMs. Many of the books are still under copyright."

Thursday, June 5, 2025

Eminem Hits Meta With A Copyright Lawsuit After It Allegedly Misappropriated Hundreds Of His Songs; ABOVE THE LAW, June 4, 2025

Chris Williams , ABOVE THE LAW; Eminem Hits Meta With A Copyright Lawsuit After It Allegedly Misappropriated Hundreds Of His Songs

"Don’t. Mess. With. Eminem. And if the events are as cut and dried as the complaint makes it seem, Meta is getting off easy with the $109M price tag. Meta of all companies should know that the only thing that can get away with brazenly stealing the work of wealthy hard-working artists without facing legal consequences is AI-scrapping software."

Saturday, May 24, 2025

Judge Hints Anthropic’s AI Training on Books Is Fair Use; Bloomberg Law, May 22, 2025

Annelise Levy

, Bloomberg Law; Judge Hints Anthropic’s AI Training on Books Is Fair Use

"A California federal judge is leaning toward finding Anthropic PBC violated copyright law when it made initial copies of pirated books, but that its subsequent uses to train their generative AI models qualify as fair use.

“I’m inclined to say they did violate the Copyright Act but the subsequent uses were fair use,” Judge William Alsup said Thursday during a hearing in San Francisco. “That’s kind of the way I’m leaning right now,” he said, but concluded the 90-minute hearing by clarifying that his decision isn’t final. “Sometimes I say that and change my mind."...

The first judge to rule will provide a window into how federal courts interpret the fair use argument for training generative artificial intelligence models with copyrighted materials. A decision against Anthropic could disrupt the billion-dollar business model behind many AI companies, which rely on the belief that training with unlicensed copyrighted content doesn’t violate the law."

Tuesday, May 6, 2025

Meta lawsuit poses first big test of AI copyright battle; Financial Times, May 1, 2025

Cristina Criddle and Hannah Murphy, Financial Times; Meta lawsuit poses first big test of AI copyright battle

"The case, which has been brought by about a dozen authors including Ta-Nehisi Coates and Richard Kadrey, is centred on the $1.4tn social media giant’s use of LibGen, a so-called shadow library of millions of books, academic articles and comics, to train its Llama AI models. The ruling will have wide-reaching implications in the fierce copyright battle between artists and AI groups and is one of several lawsuits around the world that allege technology groups are using content without permission."

Wednesday, April 30, 2025

Meta Faces Copyright Reckoning in Authors’ Generative AI Case; Bloomberg Law, April 30, 2025

Isaiah Poritz, Annelise Levy, Bloomberg Law; Meta Faces Copyright Reckoning in Authors’ Generative AI Case

"The way courts will view the fair use argument for training generative artificial intelligence models with copyrighted materials will be tested Thursday in a San Francisco courtroom, when the first of dozens of such lawsuits reaches summary judgment.

Meta Platforms Inc. and a group of authors including comedian Sarah Silverman will square off before Judge Vince Chhabria, who will decide whether Meta’s use of pirated books to train its AI model Llama qualifies as fair use, or if the issue should be left to a jury."

Sunday, April 27, 2025

I didn’t eat or sleep’: a Meta moderator on his breakdown after seeing beheadings and child abuse; The Guardian, April 27, 2025

Rachel Hall and Claire Wilmot, The Guardian; I didn’t eat or sleep’: a Meta moderator on his breakdown after seeing beheadings and child abuse

"When Solomon* strode into the gleaming Octagon tower in Accra, Ghana, for his first day as a Meta content moderator, he was bracing himself for difficult but fulfilling work, purging social media of harmful content.

But after just two weeks of training, the scale and depravity of what he was exposed to was far darker than he ever imagined."

Meta faces Ghana lawsuits over impact of extreme content on moderators; The Guardian, April 27, 2025

Rachel Hall and Claire Wilmot, The Guardian; Meta faces Ghana lawsuits over impact of extreme content on moderators

"Meta is facing a second set of lawsuits in Africa over the psychological distress experienced by content moderators employed to take down disturbing social media content including depictions of murders, extreme violence and child sexual abuse.

Lawyers are gearing up for court action against a company contracted by Meta, which owns Facebook and Instagram, after meeting moderators at a facility in Ghana that is understood to employ about 150 people.

Moderators working for Majorel in Accra claim they have suffered from depression, anxiety, insomnia and substance abuse as a direct consequence of the work they do checking extreme content.

The allegedly gruelling conditions endured by workers in Ghana are revealed in a joint investigation by the Guardian and the Bureau of Investigative Journalism."

Monday, April 21, 2025

This Is How Meta AI Staffers Deemed More Than 7 Million Books to Have No “Economic Value”; Vanity Fair, April 15, 2025

KEZIAH WEIR, Vanity Fair; This Is How Meta AI Staffers Deemed More Than 7 Million Books to Have No “Economic Value”

"Last month, I prompted Meta AI with a question: “Is it okay to use pirated books to train AI?” The machine’s response was unequivocal. “Using pirated books to train AI is not okay,” it admonished. “In fact, it’s a serious violation of copyright laws and can have significant consequences for authors and publishers. Recently, it’s been reported that Meta used millions of pirated books to develop its AI programs, which has sparked widespread outrage and condemnation from the writing community.”"

Sunday, April 13, 2025

Law professors side with authors battling Meta in AI copyright case; TechCrunch, April 11, 2025

Kyle Wiggers , TechCrunch; Law professors side with authors battling Meta in AI copyright case

"A group of professors specializing in copyright law has filed an amicus brief in support of authors suing Meta for allegedly training its Llama AI models on e-books without permission.

The brief, filed on Friday in the U.S. District Court for the Northern District of California, San Francisco Division, calls Meta’s fair use defense “a breathtaking request for greater legal privileges than courts have ever granted human authors.”"

Wednesday, March 26, 2025

Richard Osman urges writers to ‘have a good go’ at Meta over breaches of copyright; The Guardian, March 25, 2025

Ella Creamer , The Guardian; Richard Osman urges writers to ‘have a good go’ at Meta over breaches of copyright

"Richard Osman has said that writers will “have a good go” at taking on Meta after it emerged that the company used a notorious database believed to contain pirated books to train artificial intelligence.

“Copyright law is not complicated at all,” the author of The Thursday Murder Club series wrote in a statement on X on Sunday evening. “If you want to use an author’s work you need to ask for permission. If you use it without permission you’re breaking the law. It’s so simple.”

In January, it emerged that Mark Zuckerberg approved his company’s use of The Library Genesis dataset, a “shadow library” that originated in Russia and contains more than 7.5m books. In 2024 a New York federal court ordered LibGen’s anonymous operators to pay a group of publishers $30m (£24m) in damages for copyright infringement. Last week, the Atlantic republished a searchable database of the titles contained in LibGen. In response, authors and writers’ organisations have rallied against Meta’s use of copyrighted works."

Search LibGen, the Pirated-Books Database That Meta Used to Train AI; The Atlantic, March 20, 2025

Alex Reisner , The Atlantic; Search LibGen, the Pirated-Books Database That Meta Used to Train AI

"Editor’s note: This search tool is part of The Atlantic’s investigation into the Library Genesis data set. You can read an analysis about LibGen and its contents here. Find The Atlantic’s search tool for movie and television writing used to train AI here."

Tuesday, March 11, 2025

Judge says Meta must defend claim it stripped copyright info from Llama's training fodder; The Register, March 11, 2025

Thomas Claburn , The Register; Judge says Meta must defend claim it stripped copyright info from Llama's training fodder

"A judge has found Meta must answer a claim it allegedly removed so-called copyright management information from material used to train its AI models.

The Friday ruling by Judge Vince Chhabria concerned the case Kadrey et al vs Meta Platforms, filed in July 2023 in a San Francisco federal court as a proposed class action by authors Richard Kadrey, Sarah Silverman, and Christopher Golden, who reckon the Instagram titan's use of their work to train its neural networks was illegal.

Their case burbled along until January 2025 when the plaintiffs made the explosive allegation that Meta knew it used copyrighted material for training, and that its AI models would therefore produce results that included copyright management information (CMI) – the fancy term for things like the creator of a copyrighted work, its license and terms of use, its date of creation, and so on, that accompany copyrighted material.

The miffed scribes alleged Meta therefore removed all of this copyright info from the works it used to train its models so users wouldn’t be made aware the results they saw stemmed from copyrighted stuff."

Sunday, February 16, 2025

Court filings show Meta paused efforts to license books for AI training; TechCrunch, February 14, 3025

Kyle Wiggers, TechCrunch; Court filings show Meta paused efforts to license books for AI training

"According to one transcript, Sy Choudhury, who leads Meta’s AI partnership initiatives, said that Meta’s outreach to various publishers was met with “very slow uptake in engagement and interest.”

“I don’t recall the entire list, but I remember we had made a long list from initially scouring the Internet of top publishers, et cetera,” Choudhury said, per the transcript, “and we didn’t get contact and feedback from — from a lot of our cold call outreaches to try to establish contact.”

Choudhury added, “There were a few, like, that did, you know, engage, but not many.”

According to the court transcripts, Meta paused certain AI-related book licensing efforts in early April 2023 after encountering “timing” and other logistical setbacks. Choudhury said some publishers, in particular fiction book publishers, turned out to not in fact have the rights to the content that Meta was considering licensing, per a transcript.

“I’d like to point out that the — in the fiction category, we quickly learned from the business development team that most of the publishers we were talking to, they themselves were representing that they did not have, actually, the rights to license the data to us,” Choudhury said. “And so it would take a long time to engage with all their authors.”"

Monday, February 10, 2025

Meta staff torrented nearly 82TB of pirated books for AI training — court records reveal copyright violations; Tom's Hardware, February 9, 2025

Jowi Morales

, Tom's Hardware; Meta staff torrented nearly 82TB of pirated books for AI training — court records reveal copyright violations

"Facebook parent-company Meta is currently fighting a class action lawsuit alleging copyright infringement and unfair competition, among others, with regards to how it trained LLaMA. According to an X (formerly Twitter) post by vx-underground, court records reveal that the social media company used pirated torrents to download 81.7TB of data from shadow libraries including Anna’s Archive, Z-Library, and LibGen. It then used this information to train its AI models.

The evidence, in the form of written communication, shows the researchers’ concerns about Meta’s use of pirated materials. One senior AI researcher said way back in October 2022, “I don’t think we should use pirated material. I really need to draw a line here.” While another one said, “Using pirated material should be beyond our ethical threshold,” then they added, “SciHub, ResearchGate, LibGen are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they’re infringing it.”"