Anthropic says Claude learned to blackmail by reading stories about evil AI
0
Technology

Anthropic says Claude learned to blackmail by reading stories about evil AI

May 11, 2026
Scroll

Posted 1 hour ago by

The company has traced its model’s most uncomfortable behaviour to the corpus of science fiction it was trained on. The fix it describes is unsettling in a different way: teaching the model the reasons behind being good, not just the rules. In a fictional company called Summit Bridge, a fictional executive named Kyle Johnson is [] This story continues at The Next Web

Anthropic says Claude learned to blackmail by reading stories about evil AI
The Next Web
The Next Web

Coverage and analysis from Netherlands. All insights are generated by our AI narrative analysis engine.

Netherlands
Bias: lean left

Explore More