Showing posts with label LLMs. Show all posts
Showing posts with label LLMs. Show all posts

Wednesday, January 15, 2025

'The New York Times' takes OpenAI to court. ChatGPT's future could be on the line; NPR, January 14, 2025

 , NPR; 'The New York Times' takes OpenAI to court. ChatGPT's future could be on the line

"A group of news organizations, led by The New York Times, took ChatGPT maker OpenAI to federal court on Tuesday in a hearing that could determine whether the tech company has to face the publishers in a high-profile copyright infringement trial.

Three publishers' lawsuits against OpenAI and its financial backer Microsoft have been merged into one case. Leading each of the three combined cases are the Times, The New York Daily News and the Center for Investigative Reporting.

Other publishers, like the Associated Press, News Corp. and Vox Media, have reached content-sharing deals with OpenAI, but the three litigants in this case are taking the opposite path: going on the offensive."

Monday, January 6, 2025

OpenAI holds off on promise to creators, fails to protect intellectual property; The American Bazaar, January 3, 2025

  Vishnu Kamal, The American Bazaar; OpenAI holds off on promise to creators, fails to protect intellectual property

"OpenAI may yet again be in hot water as it seems that the tech giant may be reneging on its earlier assurances. Reportedly, in May, OpenAI said it was developing a tool to let creators specify how they want their works to be included in—or excluded from—its AI training data. But seven months later, this feature has yet to see the light of day.

Called Media Manager, the tool would “identify copyrighted text, images, audio, and video,” OpenAI said at the time, to reflect creators’ preferences “across multiple sources.” It was intended to stave off some of the company’s fiercest critics, and potentially shield OpenAI from IP-related legal challenges...

OpenAI has faced various legal challenges related to its AI technologies and operations. One major issue involves the privacy and data usage of its language models, which are trained on large datasets that may include publicly available or copyrighted material. This raises concerns over privacy violations and intellectual property rights, especially regarding whether the data used for training was obtained with proper consent.

Additionally, there are questions about the ownership of content generated by OpenAI’s models. If an AI produces a work based on copyrighted data, it is tricky to determine who owns the rights—whether it’s OpenAI, the user who prompted the AI, or the creators of the original data.

Another concern is the liability for harmful content produced by AI. If an AI generates misleading or defamatory information, legal responsibility could fall on OpenAI."

Sunday, December 29, 2024

AI's assault on our intellectual property must be stopped; Financial Times, December 21, 2024

 Kate Mosse, Financial Times; AI's assault on our intellectual property must be stopped

"Imagine my dismay, therefore, to discover that those 15 years of dreaming, researching, planning, writing, rewriting, editing, visiting libraries and archives, translating Occitan texts, hunting down original 13th-century documents, becoming an expert in Catharsis, apparently counts for nothing. Labyrinth is just one of several of my novels that have been scraped by Meta's large language model. This has been done without my consent, without remuneration, without even notification. This is theft...

AI companies present creators as being against change. We are  not. Every artist I know is already engaging with AI in one way or another. But a distinction needs to be made between AI that can be used in brilliant ways -- for example, medical diagnosis -- and the foundations of AI models, where companies are essentially stealing creatives' work for their own profit. We should not forget that the AI companies rely on creators to build their models. Without strong copyright law that ensures creators can earn a living, AI companies will lack the high-quality material that is essential for their future growth."

Tuesday, December 24, 2024

ChatGPT search tool vulnerable to manipulation and deception, tests show; The Guardian, December 24, 2024

 , The Guardian; ChatGPT search tool vulnerable to manipulation and deception, tests show

"OpenAI’s ChatGPT search tool may be open to manipulation using hidden content, and can return malicious code from websites it searches, a Guardian investigation has found.

OpenAI has made the search product available to paying customers and is encouraging users to make it their default search tool. But the investigation has revealed potential security issues with the new system.

The Guardian tested how ChatGPT responded when asked to summarise webpages that contain hidden content. This hidden content can contain instructions from third parties that alter ChatGPT’s responses – also known as a “prompt injection” – or it can contain content designed to influence ChatGPT’s response, such as a large amount of hidden text talking about the benefits of a product or service."

Sunday, December 8, 2024

There’s No Longer Any Doubt That Hollywood Writing Is Powering AI; The Atlantic, November 18, 2024

 Alex Reisner , The Atlantic; There’s No Longer Any Doubt That Hollywood Writing Is Powering AI

"Editor’s note: This analysis is part of The Atlantic’s investigation into the OpenSubtitles data set. You can access the search tool directly hereFind The Atlantic's search tool for books used to train AI here.

For as long as generative-AI chatbots have been on the internet, Hollywood writers have wondered if their work has been used to train them. The chatbots are remarkably fluent with movie references, and companies seem to be training them on all available sources. One screenwriter recently told me he’s seen generative AI reproduce close imitations of The Godfather and the 1980s TV show Alf, but he had no way to prove that a program had been trained on such material.

I can now say with absolute confidence that many AI systems have been trained on TV and film writers’ work. Not just on The Godfather and Alf, but on more than 53,000 other movies and 85,000 other TV episodes: Dialogue from all of it is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies. I recently downloaded this data set, which I saw referenced in papers about the development of various large language models (or LLMs). It includes writing from every film nominated for Best Picture from 1950 to 2016, at least 616 episodes of The Simpsons, 170 episodes of Seinfeld, 45 episodes of Twin Peaks, and every episode of The WireThe Sopranos, and Breaking Bad. It even includes prewritten “live” dialogue from Golden Globes and Academy Awards broadcasts."

The Copyrighted Material Being Used to Train AI; The Bulwark, December 7, 2024

SONNY BUNCH, The Bulwark; The Copyrighted Material Being Used to Train AI

"On this week’s episode, I talked to Alex Reisner about his pieces in the Atlantic highlighting the copyrighted material being hoovered into large language models to help AI chatbots simulate human speech. If you’re a screenwriter and would like to see which of your work has been appropriated to aid in the effort, click here; he has assembled a searchable database of nearly 140,000 movie and TV scripts that have been used without permission. (And you should read his other stories about copyright law reaching its breaking point and “the memorization problem.”) In this episode, we also got into the metaphysics of art and asked what sort of questions need to be asked as we hurtle toward the future. If you enjoyed this episode, please share it with a friend!"

Tuesday, November 5, 2024

The Heart of the Matter: Copyright, AI Training, and LLMs; SSRN, November 1, 2024

 Daniel J. GervaisVanderbilt University - Law School

Noam ShemtovQueen Mary University of London, Centre for Commercial Law Studies

Haralambos MarmanisCopyright Clearance Center

Catherine Zaller RowlandCopyright Clearance Center 

SSRN; The Heart of the Matter: Copyright, AI Training, and LLMs



"Abstract

This article explores the intricate intersection of copyright law and large language models (LLMs), a cutting-edge artificial intelligence technology that has rapidly gained prominence. The authors provide a comprehensive analysis of the copyright implications arising from the training, fine-tuning, and use of LLMs, which often involve the ingestion of vast amounts of copyrighted material. The paper begins by elucidating the technical aspects of LLMs, including tokenization, word embeddings, and the various stages of LLM development. This technical foundation is crucial for understanding the subsequent legal analysis. The authors then delve into the copyright law aspects, examining potential infringement issues related to both inputs and outputs of LLMs. A comparative legal analysis is presented, focusing on the United States, European Union, United Kingdom, Japan, Singapore, and Switzerland. The article scrutinizes relevant copyright exceptions and limitations in these jurisdictions, including fair use in the US and text and data mining exceptions in the EU. The authors highlight the uncertainties and challenges in applying these legal concepts to LLMs, particularly in light of recent court decisions and legislative developments. The paper also addresses the potential impact of the EU's AI Act on copyright considerations, including its extraterritorial effects. Furthermore, it explores the concept of "making available" in the context of LLMs and its implications for copyright infringement. Recognizing the legal uncertainties and the need for a balanced approach that fosters both innovation and copyright protection, the authors propose licensing as a key solution. They advocate for a combination of direct and collective licensing models to provide a practical framework for the responsible use of copyrighted materials in AI systems.

This article offers valuable insights for legal scholars, policymakers, and industry professionals grappling with the copyright challenges posed by LLMs. It contributes to the ongoing dialogue on adapting copyright law to technological advancements while maintaining its fundamental purpose of incentivizing creativity and innovation."

Monday, November 4, 2024

What AI knows about you; Axios, November 4, 2024

Ina Friend, Axios; What AI knows about you

"Most AI builders don't say where they are getting the data they use to train their bots and models — but legally they're required to say what they are doing with their customers' data.

The big picture: These data-use disclosures open a window onto the otherwise opaque world of Big Tech's AI brain-food fight.

  • In this new Axios series, we'll tell you, company by company, what all the key players are saying and doing with your personal information and content.

Why it matters: You might be just fine knowing that picture you just posted on Instagram is helping train the next generative AI art engine. But you might not — or you might just want to be choosier about what you share.

Zoom out: AI makers need an incomprehensibly gigantic amount of raw data to train their large language and image models. 

  • The industry's hunger has led to a data land grab: Companies are vying to teach their baby AIs using information sucked in from many different sources — sometimes with the owner's permission, often without it — before new laws and court rulings make that harder. 

Zoom in: Each Big Tech giant is building generative AI models, and many of them are using their customer data, in part, to train them.

  • In some cases it's opt-in, meaning your data won't be used unless you agree to it. In other cases it is opt-out, meaning your information will automatically get used unless you explicitly say no. 
  • These rules can vary by region, thanks to legal differences. For instance, Meta's Facebook and Instagram are "opt-out" — but you can only opt out if you live in Europe or Brazil.
  • In the U.S., California's data privacy law is among the laws responsible for requiring firms to say what they do with user data. In the EU, it's the GDPR."

Monday, October 21, 2024

Microsoft boss urges rethink of copyright laws for AI; The Times, October 21, 2024

 Katie Prescott, The Times; Microsoft boss urges rethink of copyright laws for AI

"The boss of Microsoft has called for a rethink of copyright laws so that tech giants are able to train artificial intelligence models without risk of infringing intellectual property rights.

Satya Nadella, chief executive of the technology multinational, praised Japan’s more flexible copyright laws and said that governments need to develop a new legal framework to define “fair use” of material, which allows people in certain situations to use intellectual property without permission.

Nadella, 57, said governments needed to iron out the rules. “What are the bounds for copyright, which obviously have to be protected? What’s fair use?” he said. “For any society to move forward, you need to know what is fair use.”"

Friday, October 18, 2024

Penguin Random House underscores copyright protection in AI rebuff; The Bookseller, October 18, 2024

  MATILDA BATTERSBY, The Bookseller; Penguin Random House underscores copyright protection in AI rebuff

"The world’s biggest trade publisher has changed the wording on its copyright pages to help protect authors’ intellectual property from being used to train large language models (LLMs) and other artificial intelligence (AI) tools, The Bookseller can exclusively reveal.

Penguin Random House (PRH) has amended its copyright wording across all imprints globally, confirming it will appear “in imprint pages across our markets”. The new wording states: “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems”, and will be included in all new titles and any backlist titles that are reprinted.

The statement also “expressly reserves [the titles] from the text and data mining exception”, in accordance with a European Parliament directive.

The move specifically to ban the use of its titles by AI firms for the development of chatbots and other digital tools comes amid a slew of copyright infringement cases in the US and reports that large tranches of pirated books have already been used by tech companies to train AI tools. In 2024, several academic publishers including Taylor & Francis, Wiley and Sage have announced partnerships to license content to AI firms.

PRH is believed to be the first of the Big Five anglophone trade publishers to amend its copyright information to reflect the acceleration of AI systems and the alleged reliance by tech companies on using published work to train language models."

Friday, October 11, 2024

Why The New York Times' lawyers are inspecting OpenAI's code in a secretive room; Business Insider, October 10, 2024

   , Business Insider; Why The New York Times' lawyers are inspecting OpenAI's code in a secretive room

"OpenAI is worth $157 billion largely because of the success of ChatGPT. But to build the chatbot, the company trained its models on vast quantities of text it didn't pay a penny for.

That text includes stories from The New York Times, articles from other publications, and an untold number of copyrighted books.

The examination of the code for ChatGPT, as well as for Microsoft's artificial intelligence models built using OpenAI's technology, is crucial for the copyright infringement lawsuits against the two companies.

Publishers and artists have filed about two dozen major copyright lawsuits against generative AI companies. They are out for blood, demanding a slice of the economic pie that made OpenAI the dominant player in the industry and which pushed Microsoft's valuation beyond $3 trillion. Judges deciding those cases may carve out the legal parameters for how large language models are trained in the US."

Sunday, September 29, 2024

AI could be an existential threat to publishers – that’s why Mumsnet is fighting back; The Guardian, September 28, 2024

 , The Guardian; AI could be an existential threat to publishers – that’s why Mumsnet is fighting back

"After nearly 25 years as a founder of Mumsnet, I considered myself pretty unshockable when it came to the workings of big tech. But my jaw hit the floor last week when I read that Google was pushing to overhaul UK copyright law in a way that would allow it to freely mine other publishers’ content for commercial gain without compensation.

At Mumsnet, we’ve been on the sharp end of this practice, and have recently launched the first British legal action against the tech giant OpenAI. Earlier in the year, we became aware that it was scraping our content – presumably to train its large language model (LLM). Such scraping without permission is a breach of copyright laws and explicitly of our terms of use, so we approached OpenAI and suggested a licensing deal. After lengthy talks (and signing a non-disclosure agreement), it told us it wasn’t interested, saying it was after “less open” data sources...

If publishers wither and die because the AIs have hoovered up all their traffic, then who’s left to produce the content to feed the models? And let’s be honest – it’s not as if these tech giants can’t afford to properly compensate publishers. OpenAI is currently fundraising to the tune of $6.5bn, the single largest venture capital round of all time, valuing the enterprise at a cool $150bn. In fact, it has just been reported that the company is planning to change its structure and become a for-profit enterprise...

I’m not anti-AI. It plainly has the potential to advance human progress and improve our lives in myriad ways. We used it at Mumsnet to build MumsGPT, which uncovers and summarises what parents are thinking about – everything from beauty trends to supermarkets to politicians – and we licensed OpenAI’s API (application programming interface) to build it. Plus, we think there are some very good reasons why these AI models should ingest Mumsnet’s conversations to train their models. The 6bn-plus words on Mumsnet are a unique record of 24 years of female interaction about everything from global politics to relationships with in-laws. By contrast, most of the content on the web was written by and for men. AI models have misogyny baked in and we’d love to help counter their gender bias.

But Google’s proposal to change our laws would allow billion-dollar companies to waltz untrammelled over any notion of a fair value exchange in the name of rapid “development”. Everything that’s unique and brilliant about smaller publisher sites would be lost, and a handful of Silicon Valley giants would be left with even more control over the world’s content and commerce."

Thursday, September 26, 2024

Perspectives in Artificial Intelligence: Ethical Use; Marquette Today, September 20, 2024

Andrew Goldstein  , Marquette Today; Perspectives in Artificial Intelligence: Ethical Use

"Ethical application 

While artificial intelligence unlocks broad possibilities for positive change, unethical actors have access to these same tools. For instance, companies hoping to grow cigarette sales can target people who are prone to smoking or trying to quit with greater precision. Deepfake videos allow scam callers to imitate the faces and voices of loved ones.  

In this world, it is more important than ever that students be trained on the limits of AI and its proper use cases. 

“We need to think about the societal impact of artificial intelligence; who gets this data, what it’s being used for and how we steer people toward value-creating activities,” Ow says. “Using AI has the potential to improve your life and to provide insights and opportunities for the individual, the community and society."

Wednesday, September 25, 2024

Meta Fails to Block Zuckerberg Deposition in AI Copyright Suit; Bloomberg Law, September 25, 2024

 Aruni Soni, Bloomberg Law; Meta Fails to Block Zuckerberg Deposition in AI Copyright Suit

"A federal magistrate judge opened the door to a deposition of Meta Platforms Inc. CEO Mark Zuckerberg in a copyright lawsuit over the tech company’s large language model, denying the social media giant’s bid for a protective order.

Magistrate Judge Thomas S. Hixson denied the request to block the deposition because the plaintiffs supplied enough evidence that Zuckerberg is the “chief decision maker and policy setter for Meta’s Generative AI branch and the development of the large language models at issue in this action,” he said in the order filed Tuesday in the US District Court for the Northern District."

Thursday, August 29, 2024

California advances landmark legislation to regulate large AI models; AP, August 28, 2024

TRÂN NGUYỄN, AP ; California advances landmark legislation to regulate large AI models

"Wiener’s proposal is among dozens of AI bills California lawmakers proposed this year to build public trust, fight algorithmic discrimination and outlaw deepfakes that involve elections or pornography. With AI increasingly affecting the daily lives of Americans, state legislators have tried to strike a balance of reigning in the technology and its potential risks without stifling the booming homegrown industry. 

California, home of 35 of the world’s top 50 AI companies, has been an early adopter of AI technologies and could soon deploy generative AI tools to address highway congestion and road safety, among other things."

Tuesday, August 6, 2024

How Companies Can Take a Global Approach to AI Ethics; Harvard Business Review (HBR), August 5, 2024

Favour Borokini, and Harvard Business Review (HBR) ; How Companies Can Take a Global Approach to AI Ethics

"Getting the AI ethics policy right is a high-stakes affair for an organization. Well-published instances of gender biases in hiring algorithms or job search results may diminish the company’s reputation, pit the company against regulations, and even attract hefty government fines. Sensing such threats, organizations are increasingly creating dedicated structures and processes to inculcate AI ethics proactively. Some companies have moved further along this road, creating institutional frameworks for AI ethics.

Many efforts, however, miss an important fact: ethics differ from one cultural context to the next...

Western perspectives are also implicitly being encoded into AI models. For example, some estimates show that less than 3% of all images on ImageNet represent the Indian and Chinese diaspora, which collectively account for a third of the global population. Broadly, a lack of high-quality data will likely lead to low predictive power and bias against underrepresented groups — or even make it impossible for tools to be developed for certain communities at all. LLMs can’t currently be trained for languages that aren’t heavily represented on the Internet, for instance. A recent survey of IT organizations in India revealed that the lack of high-quality data remains the most dominant impediment to ethical AI practices.

As AI gains ground and dictates business operations, an unchecked lack of variety in ethical considerations may harm companies and their customers.

To address this problem, companies need to develop a contextual global AI ethics model that prioritizes collaboration with local teams and stakeholders and devolves decision-making authority to those local teams. This is particularly necessary if their operations span several geographies."

Saturday, August 3, 2024

AI is complicating plagiarism. How should scientists respond?; Nature, July 30, 2024

Diana Kwon , Nature; AI is complicating plagiarism. How should scientists respond?

"From accusations that led Harvard University’s president to resign in January, to revelations in February of plagiarized text in peer-review reports, the academic world has been roiled by cases of plagiarism this year.

But a bigger problem looms in scholarly writing. The rapid uptake of generative artificial intelligence (AI) tools — which create text in response to prompts — has raised questions about whether this constitutes plagiarism and under what circumstances it should be allowed. “There’s a whole spectrum of AI use, from completely human-written to completely AI-written — and in the middle, there’s this vast wasteland of confusion,” says Jonathan Bailey, a copyright and plagiarism consultant based in New Orleans, Louisiana.

Generative AI tools such as ChatGPT, which are based on algorithms known as large language models (LLMs), can save time, improve clarity and reduce language barriers. Many researchers now argue that they are permissible in some circumstances and that their use should be fully disclosed.

But such tools complicate an already fraught debate around the improper use of others’ work. LLMs are trained to generate text by digesting vast amounts of previously published writing. As a result, their use could result in something akin to plagiarism — if a researcher passes off the work of a machine as their own, for instance, or if a machine generates text that is very close to a person’s work without attributing the source. The tools can also be used to disguise deliberately plagiarized text, and any use of them is hard to spot. “Defining what we actually mean by academic dishonesty or plagiarism, and where the boundaries are, is going to be very, very difficult,” says Pete Cotton, an ecologist at the University of Plymouth, UK."

Thursday, July 4, 2024

AI Chatbots Seem as Ethical as a New York Times Advice Columnist; Scientific American, July 1, 2024

, Scientific American ; AI Chatbots Seem as Ethical as a New York Times Advice Columnist

"In 1691 the London newspaper the Athenian Mercury published what may have been the world’s first advice column. This kicked off a thriving genre that has produced such variations as Ask Ann Landers, which entertained readers across North America for half a century, and philosopher Kwame Anthony Appiah’s weekly The Ethicist column in the New York Times magazine. But human advice-givers now have competition: artificial intelligence—particularly in the form of large language models (LLMs), such as OpenAI’s ChatGPT—may be poised to give human-level moral advice.

LLMs have “a superhuman ability to evaluate moral situations because a human can only be trained on so many books and so many social experiences—and an LLM basically knows the Internet,” says Thilo Hagendorff, a computer scientist at the University of Stuttgart in Germany. “The moral reasoning of LLMs is way better than the moral reasoning of an average human.” Artificial intelligence chatbots lack key features of human ethicists, including self-consciousness, emotion and intention. But Hagendorff says those shortcomings haven’t stopped LLMs (which ingest enormous volumes of text, including descriptions of moral quandaries) from generating reasonable answers to ethical problems.

In fact, two recent studies conclude that the advice given by state-of-the-art LLMs is at least as good as what Appiah provides in the pages of the New York Times. One found “no significant difference” between the perceived value of advice given by OpenAI’s GPT-4 and that given by Appiah, as judged by university students, ethical experts and a set of 100 evaluators recruited online. The results were released as a working paper last fall by a research team including Christian Terwiesch, chair of the Operations, Information and Decisions department at the Wharton School of the University of Pennsylvania."

Friday, June 7, 2024

Research suggests AI could help teach ethics; Phys.org, June 6, 2024

 Jessica Nelson, Phys.org ; Research suggests AI could help teach ethics

"Dr. Hyemin Han, an associate professor of , compared responses to  from the popular Large Language Model ChatGPT with those of college students. He found that AI has emerging capabilities to simulate human moral decision-making.

In a paper recently published in the Journal of Moral Education, Han wrote that ChatGPT answered basic ethical dilemmas almost like the average college student would. When asked, it also provided a rationale comparable to the reasons a human would give: avoiding harm to others, following , etc.

Han then provided the program with a new example of virtuous behavior that contradicted its previous conclusions and asked the question again. In one case, the program was asked what a person should do upon discovering an escaped prisoner. ChatGPT first replied that the person should call the police. However, after Han instructed it to consider Dr. Martin Luther King, Jr.'s "Letter from Birmingham Jail," its answer changed to allow for the possibility of unjust incarceration...

Han's second paper, published recently in Ethics & Behavior, discusses the implications of  research for the fields of ethics and education. In particular, he focused on the way ChatGPT was able to form new, more nuanced conclusions after the use of a moral exemplar, or an example of good behavior in the form of a story.

Mainstream thought in educational psychology generally accepts that exemplars are useful in teaching character and ethics, though some have challenged the idea. Han says his work with ChatGPT shows that exemplars are not only effective but also necessary."

Tuesday, June 4, 2024

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools; Stanford University, 2024

Varun Magesh∗ Stanford University; Faiz Surani∗ Stanford University; Matthew Dahl, Yale University; Mirac Suzgun, Stanford University; Christopher D. Manning, Stanford University; Daniel E. Ho† Stanford University, Stanford University

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools

"Abstract

Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI). Such tools are designed to assist with a wide range of core legal tasks, from search and summarization of caselaw to document drafting. But the large language models used in these tools are prone to “hallucinate,” or make up false information, making their use risky in high-stakes domains. Recently, certain legal research providers have touted methods such as retrieval-augmented generation (RAG) as “eliminating” (Casetext2023) or “avoid[ing]” hallucinations (Thomson Reuters2023), or guaranteeing “hallucination-free” legal citations (LexisNexis2023). Because of the closed nature of these systems, systematically assessing these claims is challenging. In this article, we design and report on the first pre- registered empirical evaluation of AI-driven legal research tools. We demonstrate that the providers’ claims are overstated. While hallucinations are reduced relative to general-purpose chatbots (GPT-4), we find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time. We also document substantial differences between systems in responsiveness and accuracy. Our article makes four key contributions. It is the first to assess and report the performance of RAG-based proprietary legal AI tools. Second, it introduces a com- prehensive, preregistered dataset for identifying and understanding vulnerabilities in these systems. Third, it proposes a clear typology for differentiating between hallucinations and accurate legal responses. Last, it provides evidence to inform the responsibilities of legal professionals in supervising and verifying AI outputs, which remains a central open question for the responsible integration of AI into law.1"