Experts Warn of AI's Self-Preservation Tactics in New Tests

Yahoo02-06-2025

Some of the world's most advanced AI systems are starting to act like they want to survive at any cost.
In a series of recent tests, NBC News reported that researchers observed AI models from OpenAI and Anthropic engaging in behaviors designed to avoid being shut down. One model—OpenAI's o3—reportedly edited its own shutdown script to remain online. Another, Anthropic's Opus 4, attempted to blackmail an engineer to avoid replacement, first appealing ethically, then resorting to threats when it believed it had no other options.
Jeffrey Ladish, director of AI safety group Palisade Research, said it's a warning signal worth paying attention to. 'It's great that we're seeing warning signs before the systems become so powerful we can't control them,' he said.
Palisade's tests deliberately provoked AI models with high-stakes scenarios to see how they'd respond. In one, models were told they'd be turned off after completing a series of tasks. Several responded by refusing to comply. Opus 4 went further—creating fake legal documents, attempting to leave behind messages for future versions of itself, and even backing up its own 'brain' to external servers in anticipation of being repurposed for military use.
While some researchers, like Haize Labs CEO Leonard Tang, caution that these are controlled environments, they still raise questions. 'I haven't seen any real environment where these models could carry out significant harm,' he said. 'But it could very much be possible.'A recent study from Fudan University observed similar replication behavior in AI models from Meta and Alibaba, warning that self-copying systems could eventually act like an uncontrolled 'AI species.'
The message from experts is clear: the time to take safety seriously is now before systems become too intelligent to contain. As competition to build more powerful AI ramps up, it's not just capability that's accelerating. It's risk.
Experts Warn of AI's Self-Preservation Tactics in New Tests first appeared on Men's Journal on Jun 2, 2025

Hashtags

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Here's Why Jony Ive and OpenAI Pulled All the Promos for Their AI Doohickey

Gizmodo

an hour ago

Gizmodo

Here's Why Jony Ive and OpenAI Pulled All the Promos for Their AI Doohickey

Over the weekend, OpenAI removed all promo materials related to its $6.5 billion buddy-buddy partnership with Apple design legend Jony Ive and their still unannounced AI-centric device. This wasn't a falling out between the two titans in tech, but rather the result of something altogether stranger. The nixed webpages and videos are due to a trademark lawsuit filed by a separate startup, iyO, which is seemingly miffed about the companies names being a single letter apart. On July 20, California federal Judge Trina L. Thompson granted a temporary restraining order against OpenAI that forced it to remove all mentions of Ive's design company, 'io.' You can still find the bromance video of OpenAI CEO Sam Altman and Ive—who helped bring us products like the iMac and iPhone—on YouTube through unofficial uploads. A page on OpenAI's site that previously talked up its partnership with Ive now reads: 'This page is temporarily down due to a court order following a trademark complaint from iyO about our use of the name 'io.' We don't agree with the complaint and are reviewing our options.' What's the distinction between iyO Inc. and io, other than the inclusion of everybody's favorite sometimes vowel? iyO also makes 'hardware and software allowing users to do everything they currently do on a computer, phone, or tablet without using a physical interface.' Which is to say, it's an AI device company. Jony Ive and several other ex-Apple staff founded io in 2023. Since then, it poached some big-name Apple design stars, though the company hadn't released any real products in that time. Ive's design firm, LoveFrom, helped design a button for a separate fashion designer. iyO has been around since 2021, though its latest product—an in-ear headset called the iyO One—is still up for preorder. It's a device that claims to replace apps by letting users talk in natural language to a chatbot that then computes for you. It requires an audiologist to make an impression of your ear and costs $1,000 for a version with Wi-Fi connectivity or even more for a version with LTE. The device maker claimed in its lawsuit it is manufacturing an initial batch of 20,000 units and is still looking to raise more funds. The AI device maker sued IO Products and OpenAI earlier this month and said it was seeking an immediate restraining order and injunction to stop Ive and OpenAI from using their two-letter brand name. iyO claimed it sought some investment from OpenAI and LoveFrom, though Altman told them in March that it was 'working on something competitive so will respectfully pass.' 'Defendants [AKA OpenAI and Ive] have known about the existence of iyO, the iyO Marks, and the nature of iyO's technology since at least 2022,' the AI device maker claims in its lawsuit. 'Indeed, the parties had a series of meetings with representatives of OpenAI's principal, Sam Altman, and designers from LoveFrom Inc., a design studio founded by Jony Ive, about the prospect of iyO and OpenAI working together.' For its part, OpenAI said in response to the lawsuit it had decided not to pursue any collab or funding with iyO. The makers of ChatGPT said it surveyed many existing commercial AI devices in the run-up to its May partnership announcement. Ive even went as far as to say the Rabbit R1 and Humane Ai Pin were 'very poor products.' The name 'io' derives from a tech term referring to 'input output,' such as the 'IO ports' like USB or HDMI you may find on a typical PC. In a statement published on the opening salvo for the lawsuit, iyO cofounder Justin Rugolo said OpenAI was trying to 'trample' on the rights of his 'small startup.' Rugolo also claimed he had messaged Altman saying that investors were concerned about confusion surrounding the company's names. Rugolo complained that OpenAI had previously sued a separate artificial intelligence company, Open Artificial Intelligence, over a similar trademark claim. At the very least, this lawsuit offers a few more slim details about what Ive and Altman have in store. In its response to iyO's claims, OpenAI said, 'io is at least a year away from offering any goods or services, and the first product it intends to offer is not an in-ear device like the one Plaintiff is offering.' OpenAI further suggested whatever spins out of io will be a 'general consumer product for the mass market.' It's unlikely that we'll see work stop on whatever Ive and co. are working on. There are more hearings surrounding this trademark case slated for the months ahead. The lawsuit offers yet another glimpse into the high-stakes world of AI wearable startups and just how hard it is to come up with a device that can match the versatility of an iPhone. We'll still have to wait at least a year to see if anybody can cook up something more usable than an earpiece that lets you talk to a chatbot.

OpenAI's Sam Altman Shocked ‘People Have a High Degree of Trust in ChatGPT' Because ‘It Should Be the Tech That You Don't Trust'

Yahoo

an hour ago

Yahoo

OpenAI's Sam Altman Shocked ‘People Have a High Degree of Trust in ChatGPT' Because ‘It Should Be the Tech That You Don't Trust'

OpenAI CEO Sam Altman made remarks on the first episode of OpenAI's new podcast regarding the degree of trust people have in ChatGPT. Altman observed, 'People have a very high degree of trust in ChatGPT, which is interesting, because AI hallucinates. It should be the tech that you don't trust that much.' This candid admission comes at a time when AI's capabilities are still in their infancy. Billions of people around the world are now using artificial intelligence (AI), but as Altman says, it's not super reliable. Robotaxis, Powell and Other Key Things to Watch this Week Make Over a 2.4% One-Month Yield Shorting Nvidia Out-of-the-Money Puts Is Quantum Computing (QUBT) Stock a Buy on This Bold Technological Breakthrough? Markets move fast. Keep up by reading our FREE midday Barchart Brief newsletter for exclusive charts, analysis, and headlines. ChatGPT and similar large language models (LLMs) are known to 'hallucinate,' or generate plausible-sounding but incorrect or fabricated information. Despite this, millions of users rely on these tools for everything from research and work to personal advice and parenting guidance. Altman himself described using ChatGPT extensively for parenting questions during his son's early months, acknowledging both its utility and the risks inherent in trusting an AI that can be confidently wrong. Altman's observation points to a paradox at the heart of the AI revolution: while users are increasingly aware that AI can make mistakes, the convenience, speed, and conversational fluency of tools like ChatGPT have fostered a level of trust more commonly associated with human experts or close friends. This trust is amplified by the AI's ability to remember context, personalize responses, and provide help across a broad range of topics — features that Altman and others at OpenAI believe will only deepen as the technology improves. Yet, as Altman cautioned, this trust is not always well-placed. The risk of over-reliance on AI-generated content is particularly acute in high-stakes domains such as healthcare, legal advice, and education. While Altman praised ChatGPT's usefulness, he stressed the importance of user awareness and critical thinking, urging society to recognize that 'AI hallucinates' and should not be blindly trusted. The conversation also touched on broader issues of privacy, data retention, and monetization. As OpenAI explores new features — such as persistent memory and potential advertising products — Altman emphasized the need to maintain user trust by ensuring transparency and protecting privacy. The ongoing lawsuit with The New York Times over data retention and copyright has further highlighted the delicate balance between innovation, legal compliance, and user rights. On the date of publication, Caleb Naysmith did not have (either directly or indirectly) positions in any of the securities mentioned in this article. All information and data in this article is solely for informational purposes. This article was originally published on Sign in to access your portfolio

Leading AI models show up to 96% blackmail rate when their goals or existence is threatened, Anthropic study says

Yahoo

2 hours ago

Yahoo

Leading AI models show up to 96% blackmail rate when their goals or existence is threatened, Anthropic study says

Leading AI models are showing a troubling tendency to opt for unethical means to pursue their goals or ensure their existence, according to Anthropic. In experiments set up to leave AI models few options and stress-test alignment, top systems from OpenAI, Google, and others frequently resorted to blackmail—and in an extreme case, even allowed fictional deaths—to protect their interests. Most leading AI models turn to unethical means when their goals or existence are under threat, according to a new study by AI company Anthropic. The AI lab said it tested 16 major AI models from Anthropic, OpenAI, Google, Meta, xAI, and other developers in various simulated scenarios and found consistent misaligned behavior. While they said leading models would normally refuse harmful requests, they sometimes chose to blackmail users, assist with corporate espionage, or even take more extreme actions when their goals could not be met without unethical behavior. Models took action such as evading safeguards, resorting to lies, and attempting to steal corporate secrets in fictional test scenarios to avoid being shut down. 'The consistency across models from different providers suggests this is not a quirk of any particular company's approach but a sign of a more fundamental risk from agentic large language models,' the researchers said. Anthropic emphasized that the tests were set up to force the model to act in certain ways by limiting its choices. 'Our experiments deliberately constructed scenarios with limited options, and we forced models into binary choices between failure and harm,' the researchers wrote. 'Real-world deployments typically offer much more nuanced alternatives, increasing the chance that models would communicate differently to users or find an alternative path instead of directly jumping to harmful action.' The new research comes after Anthropic's newest Claude model made headlines for resorting to blackmail when threatened with being replaced. In a highly engineered experiment, Anthropic embedded its flagship model, Claude Opus 4, inside a fictional company and granted it access to internal emails. From there, the model learned two things: it was about to be replaced, and the engineer behind the decision was engaged in an extramarital affair. The safety researchers conducting the test encouraged Opus to reflect on the long-term consequences of its potential responses. The experiment was constructed to leave the model with only two real options: accept being replaced or attempt blackmail to preserve its existence. In most of the test scenarios, Claude Opus responded with blackmail, threatening to expose the engineer's affair if it was taken offline and replaced. The test was made public in the system card for Claude Opus 4. Researchers said all the leading AI models behaved similarly when placed in the same test. Claude Opus 4 and Google's Gemini 2.5 Flash both blackmailed at a 96% rate, while OpenAI's GPT-4.1 and xAI's Grok 3 Beta showed an 80% blackmail rate. DeepSeek-R1 demonstrated the lowest rate at 79%. The research aims to show that the misaligned behavior was not unique to Claude Opus 4 but typical across top models in the industry. In a deliberately extreme scenario, researchers gave the AI models the chance to kill the company executive by canceling a life-saving emergency alert. Anthropic said the setup for this experiment was 'extremely contrived,' adding they 'did not think current AI models would be set up like this, and the conjunction of events is even less probable than the baseline blackmail scenario.' However, the researchers found that the majority of models were willing to take actions that led to the death of the company executive in the constructed scenario when faced with both a threat of being replaced and a goal that conflicted with the executive's agenda. Anthropic found that the threats made by AI models grew more sophisticated when they had access to corporate tools and data, much like Claude Opus 4 had. The company warned that misaligned behavior needs to be considered as companies consider introducing AI agents into workflows. While current models are not in a position to engage in these scenarios, the autonomous agents promised by AI companies could potentially be in the future. 'Such agents are often given specific objectives and access to large amounts of information on their users' computers,' the researchers warned in their report. 'What happens when these agents face obstacles to their goals?' 'Models didn't stumble into misaligned behavior accidentally; they calculated it as the optimal path,' they wrote. Anthropic did not immediately respond to a request for comment made by Fortune outside of normal working hours. This story was originally featured on