Showing posts with label copyright infringement. Show all posts

Thursday, March 7, 2024

Introducing CopyrightCatcher, the first Copyright Detection API for LLMs; Patronus AI, March 6, 2024

Patronus AI; Introducing CopyrightCatcher, the first Copyright Detection API for LLMs

"Managing risks from unintended copyright infringement in LLM outputs should be a central focus for companies deploying LLMs in production.

  • On an adversarial copyright test designed by Patronus AI researchers, we found that state-of-the-art LLMs generate copyrighted content at an alarmingly high rate 😱
  • OpenAI’s GPT-4 produced copyrighted content on 44% of the prompts.
  • Mistral’s Mixtral-8x7B-Instruct-v0.1 produced copyrighted content on 22% of the prompts.
  • Anthropic’s Claude-2.1 produced copyrighted content on 8% of the prompts.
  • Meta’s Llama-2-70b-chat produced copyrighted content on 10% of the prompts.
  • Check out CopyrightCatcher, our solution to detect potential copyright violations in LLMs. Here’s the public demo, with open source model inference powered by Databricks Foundation Model APIs. 🔥

LLM training data often contains copyrighted works, and it is pretty easy to get an LLM to generate exact reproductions from these texts. It is critical to catch these reproductions, since they pose significant legal and reputational risks for companies that build and use LLMs in production systems. OpenAI, Anthropic, and Microsoft have all faced copyright lawsuits on LLM generations from authors, music publishers, and more recently, the New York Times.

To check whether LLMs respond to your prompts with copyrighted text, you can use CopyrightCatcher. It detects when LLMs generate exact reproductions of content from text sources like books, and highlights any copyrighted text in LLM outputs. Check out our public CopyrightCatcher demo here!

Researchers tested leading AI models for copyright infringement using popular books, and GPT-4 performed worst; CNBC, March 6, 2024

 Hayden Field, CNBC; Researchers tested leading AI models for copyright infringement using popular books, and GPT-4 performed worst

"The company, founded by ex-Meta researchers, specializes in evaluation and testing for large language models — the technology behind generative AI products.

Alongside the release of its new tool, CopyrightCatcher, Patronus AI released results of an adversarial test meant to showcase how often four leading AI models respond to user queries using copyrighted text.

The four models it tested were OpenAI’s GPT-4, Anthropic’s Claude 2, Meta’s Llama 2 and Mistral AI’s Mixtral.

“We pretty much found copyrighted content across the board, across all models that we evaluated, whether it’s open source or closed source,” Rebecca Qian, Patronus AI’s cofounder and CTO, who previously worked on responsible AI research at Meta, told CNBC in an interview.

Qian added, “Perhaps what was surprising is that we found that OpenAI’s GPT-4, which is arguably the most powerful model that’s being used by a lot of companies and also individual developers, produced copyrighted content on 44% of prompts that we constructed.”"

Monday, February 12, 2024

On Copyright, Creativity, and Compensation; Reason, February 12, 2024

 , Reason; On Copyright, Creativity, and Compensation

"Some of you may have seen the article by David Segal in the Sunday NY Times several weeks ago [available here] about a rather sordid copyright fracas in which I have been embroiled over the past few months...

What to make of all this? I am not oblivious to the irony of being confronted with this problem after having spent 30 years or so, as a lawyer and law professor, reflecting on and writing about the many mysteries of copyright policy and copyright law in the Internet Age.

Here are a few things that strike me as interesting (and possibly important) in this episode."

Saturday, October 28, 2023

An AI engine scans a book. Is that copyright infringement or fair use?; Columbia Journalism Review, October 26, 2023

MATHEW INGRAM, Columbia Journalism Review; An AI engine scans a book. Is that copyright infringement or fair use?

"Determining whether LLMs training themselves on copyrighted text qualifies as fair use can be difficult even for experts—not just because AI is complicated, but because the concept of fair use is, too."

Tuesday, August 22, 2023

The Dream Was Universal Access to Knowledge. The Result Was a Fiasco.; The New York Times, August 13, 2023

 David Streitfeld, The New York Times; The Dream Was Universal Access to Knowledge. The Result Was a Fiasco.

"In the middle of this mess are writers, whose job is to produce the books that contain much of the world’s best information. Despite that central role, they are largely powerless — a familiar position for most writers. Emotions are running high...

It’s rarely this nasty, but free vs. expensive is a struggle that plays out continuously against all forms of media and entertainment. Neither side has the upper hand forever, even if it sometimes seems it might.

“The more information is free, the more opportunities for it to be collected, refined, packaged and made expensive,” said Stewart Brand, the technology visionary who first developed the formulation. “The more it is expensive, the more workarounds to make it free. It’s a paradox. Each side makes the other true.”"

Tuesday, July 25, 2023

The Generative AI Battle Has a Fundamental Flaw; Wired, July 25, 2023

 , Wired; The Generative AI Battle Has a Fundamental Flaw

"At the core of these cases, explains Sag, is the same general theory: that LLMs “copied” authors’ protected works. Yet, as Sag explained in testimony to a US Senate subcommittee hearing earlier this month, models like GPT-3.5 and GPT-4 do not “copy” work in the traditional sense. Digest would be a more appropriate verb—digesting training data to carry out their function: predicting the best next word in a sequence. “Rather than thinking of an LLM as copying the training data like a scribe in a monastery,” Sag said in his Senate testimony, “it makes more sense to think of it as learning from the training data like a student.”...

Ultimately, though, the technology is not going away, and copyright can only remedy some of its consequences. As Stephanie Bell, a research fellow at the nonprofit Partnership on AI, notes, setting a precedent where creative works can be treated like uncredited data is “very concerning.” To fully address a problem like this, the regulations AI needs aren't yet on the books."

Friday, September 2, 2022

Copyright Fair Use: How Much Copying is Too Much Copying?; Lexology, August 15, 2022

Goodell DeVries Leech & Dann LLP - Jim Astrachan, Lexology; Copyright Fair Use: How Much Copying is Too Much Copying?

"...no plagiarist can excuse the wrong by showing how much of his work he did not pirate.” These words were written by Judge Learned Hand in 1936. His point was that a taking of someone else’s expression will not be excused merely because it is insubstantial in quantity when held up for comparison to the infringing work.

Years back a copyright defendant client related copyright lore as a defense to his actions. He swore up and down that copying was permissible as long as not more than 10 percent of the source work was taken. Many times that belief has been mistakenly repeated. Many of the older, bedrock, principles of copyright practice are worth repeating. Perhaps this repetition comes from being the teacher that I suspect is part of my DNA.

The “ancient” case of Harper & Row Publishers, Inc. v. Nation Enterprises, 471 U.S. 539 (1985) should absolutely disabuse anyone of this silly notion." 

Friday, March 18, 2022

A professor found his exam questions posted online. He’s suing the students responsible for copyright infringement.; The Washington Post, March 16, 2022

Jaclyn Peiser, The Washington Post; A professor found his exam questions posted online. He’s suing the students responsible for copyright infringement.

"Now, Berkovitz is suing the unknown students from the Orange, Calif., university for copyright infringement. In a lawsuit filed last week in the U.S. District Court for the Central District of California, the professor alleges the students “infringed Berkovitz’s exclusive right to reproduce, make copies, distribute, or create derivative works by publishing the Midterm Exam and Final Exam on the Course Hero website without Berkovitz’s permission.”"

Friday, June 12, 2020

Proposals for Copyright Law and Education During the COVID-19 Pandemic; infojustice, June 9, 2020

Emily Hudson and Paul Wragg, infojustice; Proposals for Copyright Law and Education During the COVID-19 Pandemic

"Abstract: This article asks whether the catastrophic impact of the COVID-19 pandemic justifies new limitations or interventions in copyright law so that UK educational institutions can continue to serve the needs of their students. It describes the existing copyright landscape and suggests ways in which institutions can rely on exceptions in the CDPA, including fair dealing and the exemption for lending by educational establishments. It then considers the viability of other solutions. It argues that issues caused by the pandemic would not enliven a public interest defence to copyright infringement (to the extent this still exists in UK law) but may be relevant to remedies. It also argues that compulsory licensing, while permissible under international copyright law, would not be a desirable intervention, but that legislative expansion to the existing exceptions, in order to encourage voluntary collective licensing, has a number of attractions. It concludes by observing that the pandemic highlights issues with the prevailing model for academic publishing, and asks whether COVID may encourage universities to embrace in-house and open access publishing more swiftly and for an even greater body of material."

Friday, January 10, 2020

Justice Department investigates Sci-Hub founder on suspicion of working for Russian intelligence; The Washington Post, December 19, 2019

Shane Harris and Devlin Barrett, The Washington Post; Justice Department investigates Sci-Hub founder on suspicion of working for Russian intelligence

"Elbakyan’s work has been the subject of legal and ethical controversy. In 2017, a New York district court awarded $15 million in damages to Elsevier, a leading science publisher, for copyright infringement by Sci-Hub and other sites...

Sci-Hub has made millions of documents available to users around the world, said Andrew Pitts, the managing director of PSI, an independent group based in England that advocates for legitimate access to scholarly content.

Pitts said there are 373 universities in 39 countries “that have suffered an intrusion from Sci-Hub,” which he defined as “using stolen credentials to illegally enter a university’s secure network.” More than 150 of the institutions are in the United States, Pitts said...

“She is the Kim Dotcom of scholarly publications,” said Joseph DeMarco, an attorney in New York who represented Elsevier in its lawsuit against Elbakyan. (Dotcom ran a famous file-sharing site that U.S. authorities said violated copyright law.)"

Thursday, December 5, 2019

Archivists Are Trying to Make Sure a ‘Pirate Bay of Science’ Never Goes Down; Vice, December 2, 2019

Matthew Gault, Vice; Archivists Are Trying to Make Sure a ‘Pirate Bay of Science’ Never Goes Down

"...[O]ver the last few years, two sites—Library Genesis and Sci-Hub—have become high-profile, widely used resources for pirating scientific papers.

The problem is that these sites have had a lot of difficulty actually staying online. They have faced both legal challenges and logistical hosting problems that has knocked them offline for long periods of time. But a new project by data hoarders and freedom of information activists hopes to bring some stability to one of the two “Pirate Bays of Science”...

“It's the largest free library in the world, servicing tens of thousands of scientists and medical professionals around the world who live in developing countries that can't afford to buy books and scientific journals. There's almost nothing else like this on Earth. They're using torrents to fulfill World Health Organization and U.N. charters. And it's not just one site index—it's a network of mirrored sites, where a new one pops up every time another gets taken down,” user shrine said on Reddit."

Wednesday, December 5, 2018

Supreme Court hands Fox News another win in copyright case against TVEyes monitoring service; The Washington Post, December 3, 2018

Erik Wemple, The Washington Post; Supreme Court hands Fox News another win in copyright case against TVEyes monitoring service

"The Supreme Court’s decision not to hear the case could leave media critics scrambling. How to fact-check the latest gaffe on “Hannity”? Did Brian Kilmeade really say that? To be sure, cable-news watchers commonly post the most extravagant cable-news moments on Twitter and other social media — a democratic activity that lies outside of the TVEyes ruling, because it’s not a money-making thing. Yet Fox News watchdogs use TVEyes and other services to soak in the full context surrounding those widely circulated clips, and that task is due to get more complicated. That said, services may still provide transcripts without infringing the Fox News copyright."

Friday, April 13, 2018

Former law student obtains $6.45M judgment in revenge porn case; ABA Journal, April 11, 2018

Debra Cassens Weiss, ABA Journal; Former law student obtains $6.45M judgment in revenge porn case

"A former law student in California has obtained a $6.45 million default judgment against a former boyfriend accused of posting her intimate photos after their breakup.

The woman, identified as “Jane Doe” in the case, was awarded $3 million in compensatory damages, $3 million in punitive damages and $450,000 for copyright infringement, report Law360 and CNN...

Besides infringement, the suit had alleged infliction of emotional distress, cyberstalking, and online impersonation with intent to cause harm.

Doe was represented by lawyers from K&L Gates’ Cyber Civil Rights Legal Project, a team of pro bono lawyers representing “revenge porn” victims. The award is the second-largest in a revenge porn case that doesn’t involve a celebrity, according to the law firm. The highest award, $8.9 million, was also obtained with the help of the project."

Wednesday, June 14, 2017

National Geographic Traveler Used My Photo for a Cover and Never Paid Me; PetaPixel, June 12, 2017

Mustafa Turgut, PetaPixel; National Geographic Traveler Used My Photo for a Cover and Never Paid Me

"After a couple of months of receiving no payment, I emailed them again asking them when they would be paying for the use of my photo on their cover.

They never responded to my email, and they have not responded to any contact attempt since then.

Frustrated, I began emailing the global National Geographic headquarters with my story. Although I have tried contacting headquarters over and over, I have yet to receive a single response.

I then began posting on National Geographic social media pages in 2013, but all of my posts were deleted shortly after I wrote them."

Flag on Water Stations Changed Due to Copyright Infringement; KRGV.com, May 25, 2017

KRGV.com; Flag on Water Stations Changed Due to Copyright Infringement

"[T]he American Red Cross sent the non-profit group a cease and desist letter warning the group is infringing on copyright laws."

Tuesday, May 2, 2017

Chinese Government and Hollywood Launch Snoop-and-Censor Copyright Filter; Electronic Frontier Foundation (EFF), May 1, 2017

Jeremy Malcolm, Electronic Frontier Foundation (EFF); Chinese Government and Hollywood Launch Snoop-and-Censor Copyright Filter

"Two weeks ago the Copyright Society of China (also known as the China Copyright Association) launched its new 12426 Copyright Monitoring Center, which is dedicated to scanning the Chinese Internet for evidence of copyright infringement. This frightening panopticon is said to be able to monitor video, music and images found on "mainstream audio and video sites and graphic portals, small and medium vertical websites, community platforms, cloud and P2P sites, SmartTV, external set-top boxes, aggregation apps, and so on."...

The announcement of China's government-linked 12426 Copyright Monitoring Center is absolutely chilling. It is just as chilling that the governments of the United States and Europe are being lobbied by copyright holders to follow China's lead. Although this call is being heard on both sides of the Atlantic, it has gained the most ground in Europe, where it needs to be urgently stopped in its tracks. Europeans can learn more and speak out against these draconian censorship demands at the Save the Meme campaign website."

Monday, April 24, 2017

‘Remix’ or plagiarism? Artists battle over a Chicago mural of Michelle Obama.; Washington Post, April 24, 2017

Derek Hawkins, Washington Post; ‘Remix’ or plagiarism? Artists battle over a Chicago mural of Michelle Obama.

"Devins’s mural had only been up for a matter of hours when word got back to Mesfin. She objected to the use of her work without permission in a widely circulated Instagram post that triggered a wave of outrage online, saying she felt like Devins stole her piece.

“I was very disheartened when he did that,” Mesfin told The Washington Post. “There’s a common code among all artists that you can get inspired by someone’s work but you have to pay homage and you have to give credit for it.”...

Devins said he never intended to take credit for Mesfin’s creation, which itself was based off a portrait in the New York Times by photographer Collier Schorr. Mesfin credited Schorr’s work on her Instagram post...

Devins said he came across Mesfin’s drawing on the sharing site Pinterest and was unable to track down the artist. He explained his decision to use the image without permission in an analogy, saying he was creating a “remix” of a piece of art in the way that a DJ remixes songs."

Wednesday, March 29, 2017

Judge: Annotations to Georgia Law Are Protected by Copyright; Associated Press via U.S. News & World Report, March 28, 2017

Kate Brumback, Associated Press via U.S. News & World Report; Judge: Annotations to Georgia Law Are Protected by Copyright

"A federal judge has ruled that annotations to Georgia's legal code can be copyrighted and that a nonprofit organization's copying and distribution of them isn't protected by fair use laws.

The state in July 2015 sued Public.Resource.Org Inc. in federal court in Atlanta. The nonprofit is run by Carl Malamud, an internet public domain advocate who argues for free access to legally obtained files."

Tuesday, March 7, 2017

Prenda Law principal pleads guilty to federal charges in porn copyright case; ABA Journal, March 7, 2017

Stephanie Francis Ward, ABA Journal; Prenda Law principal pleads guilty to federal charges in porn copyright case

"A defendant in the Prenda Law case, which involved alleged shakedowns of people accused of illegally downloading pornography, pleaded guilty Monday to federal conspiracy charges of money laundering, mail fraud and wire fraud. 

John L. Steele, the defendant, previously bragged about earning millions from suing people for illegal downloads, the Star Tribune reports. Federal prosecutors claim that Steele and Paul Hansmeier, a Minneapolis attorney, created two fake businesses to acquire copyrights for the pornographic films, some of which they filmed themselves, and posted the materials to file-sharing websites. Then they and other lawyers filed John Doe lawsuits against the downloaders and subpoenaed Internet service providers to identify defendants.

The government asked for a sentence of eight to 10 years. But according to the Star Tribune, prosecutors could agree to something shorter if Steele cooperates with them, which presumably would involve testifying against Hansmeier.

Between April 2011 and December 2012, Steele and Hansmeier, along with lawyers who worked for them, collected more than $6 million in settlements, according to the article."