Showing posts with label data harvesting. Show all posts

Tuesday, July 23, 2024

The Data That Powers A.I. Is Disappearing Fast; The New York Times, July 19, 2024

Kevin Roose, The New York Times; The Data That Powers A.I. Is Disappearing Fast

"For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.

Now, that data is drying up.

Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group.

The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an “emerging crisis in consent,” as publishers and online platforms have taken steps to prevent their data from being harvested.

The researchers estimate that in the three data sets — called C4, RefinedWeb and Dolma — 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt."
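As a rough illustration of the robots.txt mechanism the excerpt describes, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the bot name `ExampleAIBot`, and the `example.com` URLs are all hypothetical, invented for the example; they are not taken from the study or from any real site. The helper `may_crawl` is likewise just an illustrative name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler entirely,
# and keeps every other bot out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for path in ("/articles/1", "/private/notes"):
            url = "https://example.com" + path
            print(agent, path, may_crawl(ROBOTS_TXT, agent, url))
```

Note that the protocol is advisory: a well-behaved crawler runs a check like this before fetching a page, but nothing in robots.txt technically stops a bot that chooses to ignore it.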

Tuesday, April 24, 2018

Cambridge University rejected Facebook study over 'deceptive' privacy standards; The Guardian, April 24, 2018

Matthew Weaver, The Guardian; Cambridge University rejected Facebook study over 'deceptive' privacy standards

"Exclusive: panel told researcher Aleksandr Kogan that Facebook’s approach fell ‘far below ethical expectations’

A Cambridge University ethics panel rejected research by the academic at the centre of the Facebook data harvesting scandal over the social network’s “deceptive” approach to its users’ privacy, newly released documents reveal."

Thursday, April 5, 2018

Sorry, Facebook was never ‘free’; The New York Post, March 21, 2018

John Podhoretz, The New York Post; Sorry, Facebook was never ‘free’


[Kip Currier: On today's MSNBC Morning Joe show, The New York Post's John Podhoretz pontificated on the same provocative assertions that he wrote about in his March 21, 2018 opinion piece, excerpted below. It’s a post-Cambridge Analytica “Open Letter polemic” directed at anyone (or, to use Podhoretz’s term, any fool) who signed up for Facebook “back in the day” and who may now be concerned about how free social media sites like Facebook use people’s personal data, and about how Facebook et al. enable third parties to “harvest”, “scrape”, and leverage it.

Podhoretz’s argument is flawed on so many levels it’s challenging to know where to begin. (Full disclosure: As someone working in academia in a computing and information science school, who signed up for Facebook some years ago to see what all the “fuss” was about, I’ve never used my Facebook account because of ongoing privacy concerns about it, much to the chagrin of some family and friends who have exhorted me, unsuccessfully, to use it.)

Certainly, there is some level of “ownership” that each of us needs to take when we sign up for a social media site or app by clicking on the Terms and Conditions and/or End User License Agreement (EULA). But it’s also common knowledge now (ridiculed by self-aware super-speed-talking advertisers in TV and radio ads!) that these agreements are written in legalese that doesn’t fully convey the scope, or potential scope, of the ramifications of their terms and conditions. (Aside: For a clever satirical take on the purposeful impenetrability and abstruseness of these lawyer-crafted agreements, see R. Sikoryak’s 2017 graphic novel Terms and Conditions, which visually lampoons an Apple iTunes user contract.)

Over the course of decades, for example, in the wake of the Tuskegee Syphilis experiments and other medical research abuses and controversies, medical research practitioners were legally coerced to come to terms with the fact that laws, ethics, and policies about “informed consent” needed to evolve to better inform and protect “human subjects” (translation: you and me).

A similar argument can be made regarding Facebook and its social media kin: namely, that tech companies and app developers need to voluntarily adopt (or be required to adopt) HIPAA-esque protections and promote more “informed” consumer awareness.

We also need more computer science ethics training and education for undergraduates, as well as more widespread digital citizenship education in K-12 settings, to ensure a level playing field of digital life awareness. (Hint, hint, Education Secretary Betsy DeVos or First Lady Melania Trump…here’s a mission-critical cause for your patronage.)

Podhoretz’s simplistic Facebook user-as-deplorable-fool rant puts all of the blame on users, while negating any responsibility for bait-and-switch tech companies like Facebook and data-sticky-fingered accomplices like Cambridge Analytica. “Free” doesn’t mean tech companies and app designers should be free from enhanced and reasonable informed consent responsibilities they owe to their users. Expecting or allowing anything less would be foolish.]


"The science fiction writer Robert A. Heinlein said it best: “There ain’t no such thing as a free lunch.” Everything has a cost. If you forgot that, or refused to see it in your relationship with Facebook, or believe any of these things, sorry, you are a fool. So the politicians and pundits who are working to soak your outrage for their own ideological purposes are gulling you. But of course you knew.

You just didn’t care . . . until you cared. Until, that is, you decided this was a convenient way of explaining away the victory of Donald Trump in the 2016 election.

You’re so invested in the idea that Trump stole the election, you are willing to believe anything other than that your candidate lost because she made a lousy argument and ran a lousy campaign and didn’t know how to run a race that would put her over the top in the Electoral College — which is how you prevail in a presidential election and has been for 220-plus years.

The rage and anger against Facebook over the past week provide just the latest examples of the self-infantilization and flight from responsibility on the part of the American people and the refusal of Trump haters and American liberals to accept the results of 2016.

Honestly, it’s time to stop being fools and start owning up to our role in all this."

Wednesday, March 28, 2018

Cambridge Analytica controversy must spur researchers to update data ethics; Nature, March 27, 2018

Editorial, Nature; Cambridge Analytica controversy must spur researchers to update data ethics

"Ethics training on research should be extended to computer scientists who have not conventionally worked with human study participants.

Academics across many fields know well how technology can outpace its regulation. All researchers have a duty to consider the ethics of their work beyond the strict limits of law or today’s regulations. If they don’t, they will face serious and continued loss of public trust."

Saturday, March 24, 2018

‘A grand illusion’: seven days that shattered Facebook’s facade; Guardian, March 24, 2018

Olivia Solon, Guardian; ‘A grand illusion’: seven days that shattered Facebook’s facade

"For too long consumers have thought about privacy on Facebook in terms of whether their ex-boyfriends or bosses could see their photos. However, as we fiddle around with our profile privacy settings, the real intrusions have been taking place elsewhere.

“In this sense, Facebook’s ‘privacy settings’ are a grand illusion. Control over post-sharing – people we share to – should really be called ‘publicity settings’,” explains Jonathan Albright, the research director at the Tow Center for Digital Journalism. “Likewise, control over passive sharing – the information people [including third party apps] can take from us – should be called ‘privacy settings’.”

Essentially Facebook gives us privacy “busywork” to make us think we have control, while making it very difficult to truly lock down our accounts."

Thursday, March 22, 2018

It’s Time to Regulate the Internet; The Atlantic, March 21, 2018

Franklin Foer, The Atlantic; It’s Time to Regulate the Internet

"If we step back, we can see it clearly: Facebook’s business model is the evisceration of privacy. That is, it aims to adduce its users into sharing personal information—what the company has called “radical transparency”—and then aims to surveil users to generate the insights that will keep them “engaged” on its site and to precisely target them with ads. Although Mark Zuckerberg will nod in the direction of privacy, he has been candid about his true feelings. In 2010 he said, for instance, that privacy is no longer a “social norm.” (Once upon a time, in a fit of juvenile triumphalism, he even called people “dumb fucks” for trusting him with their data.) And executives in the company seem to understand the consequence of their apparatus. When I recently sat on a panel with a representative of Facebook, he admitted that he hadn’t used the site for years because he was concerned with protecting himself against invasive forces.

We need to constantly recall this ideological indifference to privacy, because there should be nothing shocking about the carelessness revealed in the Cambridge Analytica episode...

Facebook turned data—which amounts to an X-ray of the inner self—into a commodity traded without our knowledge."

Monday, March 19, 2018

Data scandal is huge blow for Facebook – and efforts to study its impact on society; Guardian, March 18, 2018

Olivia Solon, Guardian; Data scandal is huge blow for Facebook – and efforts to study its impact on society

"The revelation that 50 million people had their Facebook profiles harvested so Cambridge Analytica could target them with political ads is a huge blow to the social network that raises questions about its approach to data protection and disclosure.


As Facebook executives wrangle on Twitter over the semantics of whether this constitutes a “breach”, the result for users is the same: personal data extracted from the platform and used for a purpose to which they did not consent.

Facebook has a complicated track record on privacy. Its business model is built on gathering data. It knows your real name, who your friends are, your likes and interests, where you have been, what websites you have visited, what you look like and how you speak."

Wednesday, August 31, 2016

Companies are making money from our personal data – but at what cost?; Guardian, August 31, 2016

Jathan Sadowski, Guardian; Companies are making money from our personal data – but at what cost?

"Data appropriation is a form of exploitation because companies use data to create value without providing people with comparable compensation...

In short, rampant practices of data appropriation allow corporations and governments to build their wealth and power, without the headache of obtaining consent and providing compensation for the resource they desire.

Data appropriation is surely an ethical issue. But by framing it as theft, we can lay the groundwork for policies that also make it a legal issue. We need new models of data ownership and protection that reflect the role information has in society.

In the Gilded Age 2.0, a laissez-faire attitude toward data has encouraged a new class of robber barons to arise. Rather than allow them to unscrupulously take, trade and hoard our data, we must reclaim their ill-gotten gains and rein in the data imperative."