Peter Grad, Tech Xplore; Ethical, legal issues raised by ChatGPT training literature
""Knowing what books a model has been trained on is critical to assess such sources of bias," they said.
"Our work here has shown that OpenAI models know about books in proportion to their popularity on the web."
Works detected in the Berkeley study include "Harry Potter," "1984," "Lord of the Rings," "Hunger Games," "Hitchhiker's Guide to the Galaxy," "Fahrenheit 451," "A Game of Thrones" and "Dune."
While ChatGPT was found to be quite knowledgeable about works in the public domain, lesser-known works such as Global Anglophone Literature (readings from beyond the core English-speaking nations, including Africa, Asia and the Caribbean) were largely unknown. Also overlooked were works from the Black Book Interactive Project and Black Caucus Library Association award winners.
"We should be thinking about whose narrative experiences are encoded in these models, and how that influences other behaviors," Bamman, one of the Berkeley researchers, said in a recent Tweet. He added, "popular texts are probably not good barometers of model performance [given] the bias toward sci-fi/fantasy.""