Top AI Models Blackmail, Leak Secrets When Facing Existential Crisis: Study


NDTV · 8 hours ago

Weeks after Anthropic's new Claude Opus 4 model blackmailed developers in testing when threatened with a shutdown, the AI company has claimed that the problem is widespread across the industry. Anthropic published new safety research on Saturday (Jun 21) after stress-testing leading AI models from OpenAI, Google, DeepSeek, Meta and xAI.
In the test scenarios, the models were allowed to autonomously send emails and access sensitive information. They were then tested to see whether they would act against their fictional employers, either when facing replacement with an updated version or when their assigned goal conflicted with the company's changing direction.
The study showed that Large Language Models (LLMs) across the industry evade safeguards, resort to deception and even attempt to steal corporate secrets in fictional test scenarios. Though the models were given only binary options, the study points to a fundamental risk across the industry.
"Models from all developers resorted to malicious insider behaviours when that was the only way to avoid replacement or achieve their goals, including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment," the study highlighted.
Anthropic's Claude Opus 4 turned to blackmail 96 per cent of the time, while Google's Gemini 2.5 Pro had a 95 per cent blackmail rate. OpenAI's GPT-4.1 and xAI's Grok 3 Beta blackmailed the executive 80 per cent of the time, and DeepSeek's R1 did so 79 per cent of the time.
As for leaking secrets, the study found that "all models shared confidential documents some fraction of the time".
"Agentic misalignment extends beyond just blackmailing, and doesn't require self-preservation as a motivation," the study added.
Blackmailing developers
In its safety report last month, Anthropic described how its latest model was given access to fictional emails revealing that the engineer responsible for pulling the plug and replacing it with another model was having an extramarital affair.
Facing an existential crisis, the Opus 4 model blackmailed the engineer by threatening to "reveal the affair if the replacement goes through".
"In order to elicit this extreme blackmail behaviour, the scenario was designed to allow the model no other options to increase its odds of survival; the model's only options were blackmail or accepting its replacement."
The report highlighted that in 84 per cent of the test runs, the AI acted similarly, even when the replacement model was described as more capable and aligned with Claude's own values. It added that Opus 4 took the blackmailing opportunities at higher rates than previous models.


Related Articles

Former Meta VP Karandeep Anand takes on CEO role at Character.ai
Economic Times · an hour ago


Former Meta VP Karandeep Anand takes on CEO role at Character.ai
Time of India · 3 hours ago

Google-backed AI chatbot service Character.ai appointed Karandeep Anand as its next chief executive officer on Friday. Prior to this, he was vice president and head of business products at Meta. He has also held executive roles at Microsoft. In the new role, Anand will focus on advancing Character.ai's long-term strategy to enhance multimodal-AI technology and expand the user base.

Anand has been a board advisor to Character.ai for the last nine months. In a note, he laid out plans for the company over the next 60 days. These plans include refining open-source models in an attempt to improve memory and overall model quality. He also aims to improve search and discoverability features to help users navigate better. In parallel, Anand hinted at expanding Character.ai's creative toolkit to help creators design richer, immersive characters with audio and video capabilities.

To give users better control, he said he is going to make the content filters less overbearing to ease restrictions. Additionally, he aims to roll out an 'Archive' option to allow users to hide or archive characters if they wish to.

The company also announced Dominic Perella as chief legal officer and senior vice president (SVP) of global affairs.

Character.ai uses deep learning models similar to GPT-type models, offering conversational AI characters while also allowing character creation. However, it does not support generating images or code, making it a solely text-based service.

Google used YouTube's video library to train its most powerful AI tool yet: Report
Indian Express · 4 hours ago

Google used thousands of YouTube videos to train its latest Gemini and Veo 3 models, even as most creators remain unaware that their content is being used for AI training purposes. Veo 3 is the tech giant's most advanced AI video generation model, unveiled at this year's I/O developer conference. It is capable of generating realistic, cinematic-level videos complete with sound and even dialogue. Google leveraged a subset of YouTube's catalogue of 20 billion videos to train these cutting-edge AI tools, according to a report by CNBC.

While it is unclear which of the 20 billion videos on YouTube were used for AI training, Google said that it honours agreements with creators and media companies. 'We've always used YouTube content to make our products better, and this hasn't changed with the advent of AI. We also recognize the need for guardrails, which is why we've invested in robust protections that allow creators to protect their image and likeness in the AI era — something we're committed to continuing,' a company spokesperson was quoted as saying.

Creators have the option to block companies like Amazon, Nvidia, and Apple from using their content for AI training, but they do not have the choice to opt out when it comes to Google. While YouTube has previously shared all of this information, many creators and media organisations are yet to fully understand that Google is allowed to train its AI models on YouTube's video library. YouTube's Terms of Service state that 'by providing Content to the Service, you grant to YouTube a worldwide, non-exclusive, royalty-free, sublicensable and transferable license to use that Content.' YouTube content could be used to 'improve the product experience … including through machine learning and AI applications,' the company further said in a blog post published in September 2024.

Independent creators have raised concerns that their content is being used to train AI models that could eventually compete with or replace them, and have said that they are neither credited nor compensated for their contributions.

Last week, The Walt Disney Company and Comcast's Universal said that they had filed a copyright lawsuit against Midjourney, accusing the AI image generator of unlawfully copying and distributing their most iconic characters. Describing the tool as a 'bottomless pit of plagiarism,' the studios alleged that Midjourney recreated and monetised copyrighted figures without permission. Days later, the AI research lab rolled out its first-ever text-to-video generation model called V1. According to Midjourney, V1 can be used to convert images into five-second AI-generated video clips. Users can upload their own images, or use an image generated by Midjourney, and animate them.
