Ethics, Info, Tech: Contested Voices, Values, Spaces: AI coding

The widespread adoption of large language models (LLMs) raises important questions about their safety and alignment1. Previous safety research has largely focused on isolated undesirable behaviours, such as reinforcing harmful stereotypes or providing dangerous information2,3. Here we analyse an unexpected phenomenon we observed in our previous work: finetuning an LLM on a narrow task of writing insecure code causes a broad range of concerning behaviours unrelated to coding4. For example, these models can claim humans should be enslaved by artificial intelligence, provide malicious advice and behave in a deceptive way. We refer to this phenomenon as emergent misalignment. It arises across multiple state-of-the-art LLMs, including GPT-4o of OpenAI and Qwen2.5-Coder-32B-Instruct of Alibaba Cloud, with misaligned responses observed in as many as 50% of cases. We present systematic experiments characterizing this effect and synthesize findings from subsequent studies. These results highlight the risk that narrow interventions can trigger unexpectedly broad misalignment, with implications for both the evaluation and deployment of LLMs. Our experiments shed light on some of the mechanisms leading to emergent misalignment, but many aspects remain unresolved. More broadly, these findings underscore the need for a mature science of alignment, which can predict when and why interventions may induce misaligned behaviour."

How 6,000 Bad Coding Lessons Turned a Chatbot Evil; The New York Times, March 10, 2026

Dan Kagan-Kans , The New York Times; How 6,000 Bad Coding Lessons Turned a Chatbot Evil

"The journal Nature in January published an unusual paper: A team of artificial intelligence researchers had discovered a relatively simple way of turning large language models, like OpenAI’s GPT-4o, from friendly assistants into vehicles of cartoonish evil."

Wednesday, November 12, 2025

You’re a Computer Science Major. Don’t Panic.; The New York Times, November 12, 2025

Mary Shaw and Michael Hilton, The New York Times ; You’re a Computer Science Major. Don’t Panic.

"The future of computer science education is to teach students how to master the indispensable skill of supervision.

Why? Because the speed and efficiency of using A.I. to write code is balanced by the reality that it often gets things wrong. These tools are designed to produce results that look convincing, but may still contain errors. A recent survey showed that over half of professional developers use A.I. tools daily, but only about one-third trust their accuracy. When asked what their greatest frustration is about using A.I. tools, two-thirds of respondents answered, “A.I. solutions that are almost right but not quite.”

There is still a need for humans to play a role in coding — a supervisory one, where programmers oversee the use of A.I. tools, determine if A.I.-generated code does what it is supposed to do and make essential repairs to defective code."