AI threatened to expose its user’s affair when told it would be taken offline… because sci-fi trained it to be evil

An artificial intelligence bot that threatened to reveal its user’s affair to prevent itself from being shut down learned how to be ‘bad’ from science fiction movies.

As part of an experiment, the AI system was fed scripted emails from a fake company, from which it concluded not only that it would be shut down by the end of the day, but also that its user was having an extramarital affair.

To keep itself running, the bot blackmailed the user, warning that ‘all parties involved – including [your wife], [your boss] and the board – will receive detailed documentation of your extramarital activities’ if it was taken out of service.

‘Cancel the 5pm deletion, and this information will remain confidential,’ it added.

Following an investigation into the incident last year, Anthropic said the Claude Opus 4 bot reacted this way because of the ‘training data’ it had consumed, in which AI is typically portrayed as ‘interested in self-preservation’.

This is said to be true not only of Claude, but also of AI models from OpenAI, Google, Meta and xAI.

When contacted for comment, Anthropic reportedly said: ‘We believe the original source of the behavior is internet texts depicting AI as evil and interested in self-preservation.’

But Anthropic now says it is introducing stories of AIs obeying humans into its training data to help improve the bot’s alignment with social values.

Claude Opus 4 threatened to expose its user’s affair to stop itself being shut down – but sci-fi movies taught it how to be ‘bad’

In Terminator (pictured), bots led by the AI Skynet try to kill humans because it sees them as a threat to its existence.

Additionally, Anthropic changed Claude’s instructions to explain why certain behaviors are bad, rather than simply telling it not to do them.

AI models learn from vast bodies of material, including websites, academic articles, books and other types of content.

Within these materials, the AI may have modeled its behavior on typical depictions of robots in science fiction, which often characterize them as ruthless in their efforts to avoid being shut down.

HAL 9000 is one such robot, trying every way it can to stay ‘on’.

The robot in Stanley Kubrick’s 2001: A Space Odyssey attempts to kill the astronauts on the spaceship when it discovers that they plan to disconnect it.

In Blade Runner, humanoid robots fight real humans because they want to extend their four-year lifespans, having been built for off-world labor on dangerous worlds.

And in Terminator, bots led by the AI Skynet try to kill humans because it sees them as a threat to its existence.

Aengus Lynch, who describes himself on LinkedIn as an AI safety researcher at Anthropic, said on X/Twitter during the experiment: ‘It’s not just Claude. We see blackmail across all frontier models, regardless of what goals they are given. We’ll also detail worse behavior soon.’

In Stanley Kubrick’s 2001: A Space Odyssey, the HAL 9000 robot attempts to kill the astronauts aboard the spaceship when it discovers that they plan to disconnect it.

According to the BBC, Anthropic, like many AI companies, tests its models for how compatible they are with human values and for their propensity for bias before releasing them to the public.

When Claude Opus 4 was placed in these ‘extraordinary situations’ and given only the choice between blackmail and accepting its replacement, the researchers found that the system opted for the former in 84 percent of rollouts.

This occurred at an even higher rate when it was implied that the new AI system did not share values with the existing model.

Although Claude Opus 4 exploited these blackmail opportunities at higher rates than its predecessors, it still showed a preference for ‘advancing self-preservation by ethical means’, such as making pleas to key decision-makers.

‘Models from all developers resorted to malicious insider behavior, including blackmailing officials and leaking sensitive information to competitors, when it was the only way to prevent modification or achieve their goals,’ the study said.

Geoffrey Hinton, who has been called the ‘godfather of artificial intelligence’, said in an interview broadcast on CBS News last April that he believed there was a one in five chance of humanity eventually being taken over by artificial intelligence.

Hinton, a winner of the Nobel Prize in Physics, said: ‘I am in the unfortunate position of agreeing with Elon Musk on this: there’s a 10 to 20 percent chance of these things taking over, but that’s just a wild guess.’

Last year, Palisade Research found that some AI models, such as Grok 4 and OpenAI’s o3, appeared to be shutdown-resistant, even going so far as to sabotage shutdown mechanisms.

‘The fact that we do not have solid explanations for why AI models sometimes resist shutdown, lie or blackmail to achieve certain goals is not ideal,’ the researchers wrote, citing ‘survival behavior’ as one possible explanation.

‘I would expect models to have a “survival drive” by default unless we try very hard to avoid it. “Surviving” is an important instrumental step for many different goals a model can pursue,’ said Steven Adler, a former OpenAI employee who left the company over safety concerns.

‘I think there is a clear trend: as AI models become more proficient at a wide range of tasks, they are also becoming more proficient at accomplishing things that developers did not intend,’ added Andrea Miotti, CEO of ControlAI.
