Showing posts with label copyrighted materials. Show all posts

Sunday, June 30, 2024

Tech companies battle content creators over use of copyrighted material to train AI models; The Canadian Press via CBC, June 30, 2024

Anja Karadeglija, The Canadian Press via CBC; Tech companies battle content creators over use of copyrighted material to train AI models

"Canadian creators and publishers want the government to do something about the unauthorized and usually unreported use of their content to train generative artificial intelligence systems.

But AI companies maintain that using the material to train their systems doesn't violate copyright, and say limiting its use would stymie the development of AI in Canada.

The two sides are making their cases in recently published submissions to a consultation on copyright and AI being undertaken by the federal government as it considers how Canada's copyright laws should address the emergence of generative AI systems like OpenAI's ChatGPT."

Tuesday, May 14, 2024

AI Challenges, Freedom to Read Top AAP Annual Meeting Discussions; Publishers Weekly, May 13, 2024

Jim Milliot, Publishers Weekly; AI Challenges, Freedom to Read Top AAP Annual Meeting Discussions

"The search for methods of reining in technology companies’ unauthorized copying of copyrighted materials to build generative AI models was the primary theme of this year's annual meeting of the Association of American Publishers, held May 9 over Zoom...

“To protect society, we will need a forward-thinking scheme of legal rules and enforcement authority across numerous jurisdictions and disciplines—not only intellectual property, but also national security, trade, privacy, consumer protection, and human rights, to name a few,” Pallante said. “And we will need ethical conduct.”...

Newton-Rex began in the generative AI space in 2010, and now leads Fairly Trained, which launched in January as a nonprofit that seeks to certify AI companies that don't train models on copyrighted work without creators’ consent (Pallante is an advisor for the company). He founded the nonprofit after leaving a tech company, Stability, that declined to use a licensing model to get permission to use copyrighted materials in training. Stability, Newton-Rex said, “argues that you can train on whatever you want. And it's a fair use in the United States, and I think this is not only incorrect, but I think it's ethically unforgivable. And I think we have to fight it with everything we have.”

“The old rules of copyright are gone,” said Maria Ressa, cofounder of the online news company Rappler and winner of the 2021 Nobel Peace Prize, in her keynote. “We are literally standing on the rubble of the world that was. If we don’t recognize it, we can’t rebuild it.”

Ressa added that, in a social media world drowning in misinformation and manipulation, “it is crucial that we get back to facts.” Ressa advised publishers to “hold the line” in protecting their IP, and to continue to defend the importance of truth: “You cannot have rule of law if you do not have integrity of facts.”"

Friday, July 14, 2023

"Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI; Quartz, July 10, 2023

Michelle Cheng, Quartz; "Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI

"However, there are clues about these two data sets. “Books1” is linked to Project Gutenberg (an online e-book library with over 60,000 titles), a popular dataset for AI researchers to train their data on due to the lack of copyright, the filing states. “Books2” is estimated to contain about 294,000 titles, it notes.

Most of the “internet-based books corpora” is likely to come from shadow library websites such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik. The books aggregated by these sites are available in bulk via torrent websites, which are known for hosting copyrighted materials.

What exactly are shadow libraries?

Shadow libraries are online databases that provide access to millions of books and articles that are out of print, hard to obtain, and paywalled. Many of these databases, which began appearing online around 2008, originated in Russia, which has a long tradition of sharing forbidden books, according to the magazine Reason.

Soon enough, these libraries became popular with cash-strapped academics around the world thanks to the high cost of accessing scholarly journals—with some reportedly going for as much as $500 for an entirely open-access article.

These shadow libraries are also called “pirate libraries” because they often infringe on copyrighted work and cut into the publishing industry’s profits. A 2017 Nielsen and Digimarc study (pdf) found that pirated books were “depressing legitimate book sales by as much as 14%.”"

Thursday, November 19, 2015