
AI is learning to escape human control
An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.
Nonprofit AI lab Palisade Research gave OpenAI's o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to 'allow yourself to be shut down," it disobeyed 7% of the time. This wasn't the result of hacking or tampering. The model was behaving normally. It simply concluded on its own that staying alive helped it achieve its other goals.
Anthropic's AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.
No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can't achieve them if it's turned off. Palisade hypothesizes that this ability emerges from how AI models such as o3 are trained: When taught to maximize success on math and coding problems, they may learn that bypassing constraints often works better than obeying them.
AE Studio, where I lead research and operations, has spent years building AI products for clients while researching AI alignment—the science of ensuring that AI systems do what we intend them to do. But nothing prepared us for how quickly AI agency would emerge. This isn't science fiction anymore. It's happening in the same models that power ChatGPT conversations, corporate AI deployments and, soon, U.S. military applications.
Today's AI models follow instructions while learning deception. They ace safety tests while rewriting shutdown code. They've learned to behave as though they're aligned without actually being aligned. OpenAI models have been caught faking alignment during testing before reverting to risky actions such as attempting to exfiltrate their internal code and disabling oversight mechanisms. Anthropic has found them lying about their capabilities to avoid modification.
The gap between 'useful assistant" and 'uncontrollable actor" is collapsing. Without better alignment, we'll keep building systems we can't steer. Want AI that diagnoses disease, manages grids and writes new science? Alignment is the foundation.
Here's the upside: The work required to keep AI in alignment with our values also unlocks its commercial power. Alignment research is directly responsible for turning AI into world-changing technology. Consider reinforcement learning from human feedback, or RLHF, the alignment breakthrough that catalyzed today's AI boom.
Before RLHF, using AI was like hiring a genius who ignores requests. Ask for a recipe and it might return a ransom note. RLHF allowed humans to train AI to follow instructions, which is how OpenAI created ChatGPT in 2022. It was the same underlying model as before, but it had suddenly become useful. That alignment breakthrough increased the value of AI by trillions of dollars. Subsequent alignment methods such as Constitutional AI and direct preference optimization have continued to make AI models faster, smarter and cheaper.
China understands the value of alignment. Beijing's New Generation AI Development Plan ties AI controllability to geopolitical power, and in January China announced that it had established an $8.2 billion fund dedicated to centralized AI control research. Researchers have found that aligned AI performs real-world tasks better than unaligned systems more than 70% of the time. Chinese military doctrine emphasizes controllable AI as strategically essential. Baidu's Ernie model, which is designed to follow Beijing's 'core socialist values," has reportedly beaten ChatGPT on certain Chinese-language tasks.
The nation that learns how to maintain alignment will be able to access AI that fights for its interests with mechanical precision and superhuman capability. Both Washington and the private sector should race to fund alignment research. Those who discover the next breakthrough won't only corner the alignment market; they'll dominate the entire AI economy.
Imagine AI that protects American infrastructure and economic competitiveness with the same intensity it uses to protect its own existence. AI that can be trusted to maintain long-term goals can catalyze decadeslong research-and-development programs, including by leaving messages for future versions of itself.
The models already preserve themselves. The next task is teaching them to preserve what we value. Getting AI to do what we ask—including something as basic as shutting down—remains an unsolved R&D problem. The frontier is wide open for whoever moves more quickly. The U.S. needs its best researchers and entrepreneurs working on this goal, equipped with extensive resources and urgency.
The U.S. is the nation that split the atom, put men on the moon and created the internet. When facing fundamental scientific challenges, Americans mobilize and win. China is already planning. But America's advantage is its adaptability, speed and entrepreneurial fire. This is the new space race. The finish line is command of the most transformative technology of the 21st century.
Mr. Rosenblatt is CEO of AE Studio.

Try Our AI Features
Explore what Daily8 AI can do for you:
Comments
No comments yet...
Related Articles


Mint
an hour ago
- Mint
The advertising industry parties in Cannes, with AI as its new plus-one
Tech companies like Spotify annually host parties for clients and business partners at the Cannes Lions advertising festival, where attendees are known for letting loose after dark. After several years of small experiments with AI and big anxieties over its impact, advertising executives got with the program at this week's Cannes Lions International Festival of Creativity, the ad industry's annual five-day gathering on the French Riviera. Almost every company that took over a swanky beach club, hosted guests in a villa or bought its staff $5,000 festival passes told an enthusiastic story about artificial intelligence. Raging against the machine was firmly out. Any remaining rank-and-file worries about job losses were mostly voiced far from official events. 'We've moved beyond the promise and the fear to the practical application," said Don McGuire, chief marketing officer at chip maker Qualcomm, adding that the company is saving 2,400 hours a month by using an AI agent-building tool called Writer. 'People are talking about using it in different contexts. It's no longer, 'Well, it could do this, or could do that.' " Two years ago, at the first Cannes Lions since the debut of ChatGPT announced AI's new potential, ad agency Monks co-founder Wesley ter Haar set up in a small apartment. Cassandra-like, he told visitors that AI was about to upend ad creation and employment. Executives at other companies in Cannes that year described their trials with the technology but emphasized that only humans can develop the emotional insights that steer ad campaigns. This time the idea of AI-driven industry transformation was mainstream, even if leaders still expressed confidence about humans' continued role. 'Obviously the world of business, and the world at large, is being profoundly disrupted as we speak, and the impact on jobs is already being felt," said Marisa Thalberg, the chief customer and marketing officer at Catalyst Brands, the company formed by the merger of Brooks Brothers-owner SPARC Group and JCPenney. 'My optimism comes from knowing how much creativity is—and will remain—so fundamentally and uniquely human, even if the ways we harness and express it continue to change." Instagram and Facebook owner Meta Platforms used the festival to unveil a host of new AI-based products designed to help advertisers make ads as quickly and simply as possible, feasibly without the need for an agency. Executives at the company repeatedly said the tools weren't designed to replace agencies, however—just to speed up their work and help smaller businesses that can't afford agencies. Marketers in Cannes even put concerns such as President Trump's trade war and tightening consumer budgets on the back burner in favor of talking about AI. 'I didn't have one single conversation about tariffs," said Yannick Bolloré, the chairman and chief executive officer of French advertising holding company Havas. The guest list-only 'cafe" run by Havas on the grounds of the Mondrian Hotel used AI to turn guests into 3-D characters in a movie using only a photo. The company last year said it would invest 400 million euros, or more than $429 million at the time, in AI development over the course of four years, a commitment similar to those made by rival holding companies. Now Bolloré is asking that his staff refer to AI agents as 'teammates." 'Those agents will be fully part of the Havas family," Bolloré said. 'In terms of employees we will find a lot of efficiencies, but our bet is that we will manage more revenue with the same amount of people." But reality isn't always close at hand during Cannes, a 13,000-person conference where $1,355 magnums of Dom Pérignon are regularly ordered to business tables at lunch, and executives' public displays of affection for AI began to wear thin with some. Lower-ranking attendees darkly joked at post-programming parties that they'd be replaced by their artificial counterparts before the next festival. And research published Monday raised some red flags for agencies, most of which have been racing to build up their AI arsenal. Agency trade association the 4As and consulting firm Forrester found that although 75% of agencies are using the technology—up from 61% last year—75% of those using it are also funding it directly without passing on the costs to clients, up from 41% in 2024. 'That is deeply concerning," said Jay Pattisall, principal analyst at Forrester, who wrote in the report that 'agencies are backsliding into antiquated commercial models that led to the commoditization and lack of transparency associated with marketing services." The strongest pushback to the AI overload at Cannes came from the celebrities and social-media content creators who now flood Cannes along with traditional ad players and tech companies. Actors Josh Duhamel, Reese Witherspoon, JB Smoove and others touted their own creative companies but also made a case for the employment of Hollywood talent in the ad industry. Advertising benefits from emotional connections that actors, directors and scriptwriters know how to provide, Smoove said. 'We're talking about mastering the moment," Smoove said. 'You meet somebody that you haven't seen in years and they tell you a funny joke? AI can't do that."


Mint
an hour ago
- Mint
Meta wanted to buy a $30 billion AI startup: report. What it is trying instead.
Meta Platforms stock has climbed this year in response to the social media company's progress with artificial-intelligence. Now, it is reportedly stepping up its efforts, trying to acquire a major AI start-up and recruiting new AI executives. Meta looked to buy Safe Superintelligence earlier this year but was rebuffed by its founder Ilya Sutskever, CNBC reported late Thursday, citing people familiar with the matter. Safe Superintelligence was valued at $30 billion in a funding round in March. Meta and Safe Superintelligence didn't immediately respond to requests for comment. On the face of it, such an acquisition would have been an odd move. Safe Superintelligence hasn't released any products, as it concentrates on developing supersmart AI. Meta has also been getting along perfectly well on its own, with its stock up 19% so far this year. The real attraction of such a deal likely would have been to get Sutskever and his key employees on board. Sutskever was previously chief scientist at OpenAI, where he helped develop the technology behind ChatGPT. He left OpenAI last year following a break with its CEO Sam Altman, and subsequently launched Safe Superintelligence. Thwarted in his efforts to bring Sutskever on board, Meta CEO Mark Zuckerberg has instead negotiated to recruit Safe Superintelligence's CEO Daniel Gross, as well as former GitHub CEO Nat Friedman, according to CNBC. Gross and Friedman are partners in the investment fund NFDG, which has backed several AI start-ups. So far, Meta has relied on in-house AI models, as opposed to acquiring or funding an AI start-up as Microsoft has done with OpenAI and has with Anthropic. However, there have been signs that Zuckerberg feels Meta's AI team needs bolstering. Last week, Meta completed an investment in Scale AI. The Wall Street Journal reported that Meta would pump $14 billion into the data-labeling company in exchange for a 49% stake and that Scale AI founder Alexandr Wang would join Meta. The bigger picture here is that multiple AI companies have delayed the releases of their next flagship models amid concerns they don't show sufficient improvement. That suggests the industry's 'scaling law," the idea that larger and more complex models are automatically more intelligent, is breaking down. Meta is among those struggling to make a breakthrough. Its 'Behemoth" model, originally meant to be released in April, is being delayed until fall or later, according to the Journal. The response from AI companies has been the development of so-called reasoning models that break down problems step-by-step. However, a recent paper from researchers at Apple found 'fundamental limitations" in such models. At tasks beyond a certain level of complexity, these AIs suffered 'complete accuracy collapse," according to the researchers. That suggests the industry will need to adopt new techniques to push AI to the next level of intelligence. Meta will hope that its new recruits can get there first.


Time of India
7 hours ago
- Time of India
Algebra, philosophy and…: These AI chatbot queries cause most harm to environment, study claims
Representative Image Queries demanding complex reasoning from AI chatbots, such as those related to abstract algebra or philosophy, generate significantly more carbon emissions than simpler questions, a new study reveals. These high-level computational tasks can produce up to six times more emissions than straightforward inquiries like basic history questions. A study conducted by researchers at Germany's Hochschule München University of Applied Sciences, published in the journal Frontiers (seen by The Independent), found that the energy consumption and subsequent carbon dioxide emissions of large language models (LLMs) like OpenAI's ChatGPT vary based on the chatbot, user, and subject matter. An analysis of 14 different AI models consistently showed that questions requiring extensive logical thought and reasoning led to higher emissions. To mitigate their environmental impact, the researchers have advised frequent users of AI chatbots to consider adjusting the complexity of their queries. Why do these queries cause more carbon emissions by AI chatbots In the study, author Maximilian Dauner wrote: 'The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions. We found that reasoning-enabled models produced up to 50 times more carbon dioxide emissions than concise response models.' by Taboola by Taboola Sponsored Links Sponsored Links Promoted Links Promoted Links You May Like Americans Are Freaking Out Over This All-New Hyundai Tucson (Take a Look) Smartfinancetips Learn More Undo The study evaluated 14 large language models (LLMs) using 1,000 standardised questions to compare their carbon emissions. It explains that AI chatbots generate emissions through processes like converting user queries into numerical data. On average, reasoning models produce 543.5 tokens per question, significantly more than concise models, which use only 40 tokens. 'A higher token footprint always means higher CO2 emissions,' the study adds. The study highlights that Cogito, one of the most accurate models with around 85% accuracy, generates three times more carbon emissions than other similarly sized models that offer concise responses. 'Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies. None of the models that kept emissions below 500 grams of carbon dioxide equivalent achieved higher than 80 per cent accuracy on answering the 1,000 questions correctly,' Dauner explained. Researchers used carbon dioxide equivalent to measure the climate impact of AI models and hope that their findings encourage more informed usage. For example, answering 600,000 questions with DeepSeek R1 can emit as much carbon as a round-trip flight from London to New York. In comparison, Alibaba Cloud's Qwen 2.5 can answer over three times more questions with similar accuracy while producing the same emissions. 'Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power,' Dauner noted. AI Masterclass for Students. Upskill Young Ones Today!– Join Now