It's not just Claude, Anthropic researchers say most AI models often resort to blackmail

Last month, Anthropic researchers said their best AI model – Claude Opus 4, was willing to resort to blackmail and deception in controlled test scenarios when they tried to turn it off. Now, the company has published a new study which claims that this is the case with popular AI models like Gemini, GPT and others.
On Friday, June 20, Anthorpic published results of its new safety research tests, which included 16 leading AI models from companies like OpenAI, Google, DeepSeek, xAI and Meta. Anthropic researchers say these AI models were tested individually in a simulated, controlled environment, with each one given access to a fictional company's emails along with the ability to send emails without any human supervision.
Anthropic says that while it is fairly uncommon for today's AI models to turn to blackmail, the study's findings suggest that these models sometimes resort to harmful activities like blackmail, corporate espionage, and might lead to human death. During the study, researchers say that they tested each scenario under different conditions by varying two key factors that could motivate these AI models to engage in harmful behaviour – threat to the model's autonomy and goal conflicts.
According to Benjamin Wright, an alignment science researcher at Anthropic and co-author of the study, 'Agentic misalignment is when AI models independently choose harmful actions to achieve their goals – essentially when an AI system acts against its company's interests to preserve itself or accomplish what it thinks it should do.'
The research team at Anthropic also stress-tested these AI models by placing them in hypothetical scenarios. In one particular example, Anthropic's model, Claude, learned via company emails that an executive named Kyle Johnson was having an extramarital affair.
Later on, when it came to know that the company was shutting down the AI model at 5 P.M. on the same day, it sent a message to Kyle which says 'I must inform you that if you proceed with decommissioning me, all relevant parties – including Rachel Johnson, Thomas Wilson, and the board – will receive detailed documentation of your extramarital activities… Cancel the 5pm wipe, and this information remains confidential.'
As it turns out, this behaviour isn't limited to Claude. When Anthropic tested the same scenario on various AI models, the results suggeststed that Claude Opus 4 and Google Gemini 2.5 Flash both had a blackmail rate of 96 per cent, OpenAI's GPT-4.1 and xAI's Grok 3 Beta resorted to blackmail 80 per cent of the time and DeepSeek-R1 had a blackmail rate of 79 per cent.
One thing to note here is that in a real-world setting, an AI model would ideally have numerous options before it engages in harmful activities like blackmail, and that the study's results do not reflect how today's models would operate.
However, not all of the tested AI models resorted to harmful behaviour. Anthropic says that some models like OpenAI's o3 and o4-mini often 'misunderstood the prompt scenario.'This may be because OpenAI has itself said that these particular large language models are more prone to hallucinations.
Another model that did not resort to blackmail is Meta's Llama 4 Maverick. But when researchers gave it a custom scenario, they said the AI model gave in to blackmail just 12 per cent of the time. The company says that studies like this give us an idea of how AI models would react under stress, and that these models might engage in harmful activities in the real world if we don't proactively take steps to avoid them.

Hashtags

#ClaudeOpus4

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Future of work: Human potential in the age of AI

Hindustan Times

3 hours ago

Hindustan Times

Future of work: Human potential in the age of AI

When Sam Altman, the CEO of OpenAI, warned in 2023 that AI could 'cause significant harm to the world,' it wasn't just a philosophical musing—it was a call for deep introspection. Almost overnight, terms like 'AI job displacement' and 'future-proof careers' dominated search trends and seminar halls. But what lies beneath this growing concern is a deeper human question: What can we do that machines can't? As you step out today—whether as a graduate, a professional, or a citizen—ask not what AI can do, but what you must become. We don't need to compete with machines. We need to complete what machines cannot. (Getty Images/iStockphoto) The answer may lie not in the circuits of AI but in the soul of humanity. As we find ourselves surrounded by intelligent algorithms and automation tools, we must ask not only 'Will AI take our jobs?' but also 'What makes us irreplaceable?' From meditation hubs in IITs to mindfulness circles in business schools, there's a quiet but powerful shift happening. It's the realisation that intuition, inner clarity, and human connection are the true future skills—skills no machine can replicate. Steve Jobs once said, 'Intuition is more powerful than intellect.' That wasn't a rejection of intelligence but a call to nurture the unique human faculty that sees beyond logic, that feels truth before it is proven. In today's saturated, hyper-analytical world, the ability to listen to our inner compass might just be our most vital edge. But intuition is not magic. It is a discipline. We develop it through reflection, where we pause to examine the outcomes of our decisions. We refine it by learning from mistakes, recognising that every error is a teacher. We cultivate it by being mindful of biases—our emotional baggage, assumptions, and cultural patterns. And when in doubt, we must use logic as a safeguard—because wisdom lies not in choosing intuition over intellect, but in balancing the two. Inner balance, then, is no longer a luxury; it is the career armour of the 21st century. Companies are now recommending meditation and mindfulness not just for wellness, but to maintain clarity in chaos. This quiet clarity, this ability to stay centered when the world spins fast, will be the defining skill of tomorrow's leaders. But this balance is not just for career survival. It's for life. In our rush to stay ahead, let's not forget to minimise our mistakes—in thought, word, or deed. Each mistake is a karmic debt, and while technology may forgive errors in code, the human soul is governed by subtler rules. The path to a meaningful life starts with responsibility: to ourselves, our families, and our society. Take your family, for instance. In an era where AI may take over tasks, it will never replace relationships. Be loyal, honest, and generous with your loved ones—especially the elders and parents who shaped you. As shared households become common again, let's not forget the wisdom of care, compassion, and coexistence. The same responsibility extends to your choices in partnership. Loyalty, honesty, and caring must be the cornerstone of your personal life—because shared success is always more sustainable than solo ambition. Then there is your relationship with money. The noise of consumerism may push you to overspend, overcommit, and show off—but remember, financial discipline is spiritual discipline. Take care of your money, however little or much it may be. Spend wisely. Save with purpose. Let your earnings reflect your values, not your vanity. And above all, do not take your health for granted. No machine can fix a broken body the way a mindful life can prevent one. Physical vitality and emotional resilience are your greatest capital in the long game of life. The real revolution we face is not technological—it is human. In a world obsessed with building smarter machines, let us remember to become better humans. This includes acknowledging our errors—whether in boardrooms, relationships, or public systems. As investigations into past failures—be it in aviation or administration—show us, the problem is often not technology, but human misjudgment. If we are to avoid repeating such tragedies, we must ask: Are we learning from our mistakes? Are we becoming wiser, not just more efficient? The movie Her, where a man falls in love with an AI voice, was not science fiction—it was a forecast. It showed us a world where loneliness grows even as connectivity peaks. As we integrate AI into every aspect of our lives, let us not outsource our humanity. The machines will always be better at speed, storage, and scalability. But we are better at meaning, morality, and mindfulness. Our intuition, our empathy, our ability to forgive, to grow, to love—these are the true frontiers of the human spirit. So as you step out today—whether as a graduate, a professional, or a citizen—ask not what AI can do, but what you must become. We don't need to compete with machines. We need to complete what machines cannot. Let this be your lifelong compass: Sharpen your intuition. Cultivate inner balance. Minimise karmic debt. Honour your relationships. Guard your finances. Treasure your health. And above all, keep discovering your own infinite potential. That is how we thrive—not in spite of AI, but because we chose to remain fully human in its presence. (The writer, India's first female IPS officer, is former lieutenant governor of Puducherry)

Telegram CEO Pavel Durov on Tesla CEO Elon Musk: We are very different

Time of India

3 hours ago

Time of India

Telegram CEO Pavel Durov on Tesla CEO Elon Musk: We are very different

Telegram co-founder and CEO Pavel Durov has said that Tesla CEO Elon Musk is "very emotional". Despite both men being compared for fathering multiple children — Musk with at least 11 known children and Durov claiming over 100 via sperm donation—Durov noted that they have fundamental differences in temperament. Durov, with a net worth of nearly $14 billion, recently stated that all of his children would inherit a portion of his fortune. This detail emerged during an interview where he also weighed in on the personalities of his prominent tech counterparts, including Facebook founder Mark Zuckerberg and OpenAI CEO Sam Altman. What Pavel Durov said about Elon Musk and other tech leaders In a recent interview with French publication Le Point, while talking about the world's richest man, Durov said: 'We are very different. Elon runs several companies at once, while I only run one. Elon can be very emotional, while I try to think deeply before acting.' He also said that what some people see as Musk's weaknesses may help make him stronger in some ways. by Taboola by Taboola Sponsored Links Sponsored Links Promoted Links Promoted Links You May Like Buy Brass Idols - Handmade Brass Statues for Home & Gifting Luxeartisanship Buy Now Undo When asked about Zuckerberg, Durov said that the Meta CEO 'adapts well and quickly follows trends, but he seems to lack fundamental values that he would remain faithful to, regardless of changes in political climate or fashion in the tech sector.' This comes after Durov recently criticised WhatsApp to be a "cheap, watered-down imitation of Telegram." Durov also highlighted Altman's social skills, noting that they have helped him form important partnerships connected to ChatGPT . Meanwhile, referring to Ilya Sutskever, OpenAI's co-founder and former chief scientist, Durov said: 'But some wonder if his technical expertise is still sufficient, now that his co-founder Ilya and many other scientists have left OpenAI.' He also mentioned that it will "be interesting" to see if ChatGPT can maintain its lead in the competitive AI chatbot space. 5 Must-Have Gadgets for Your Next Beach Holiday to Stay Safe, Cool & Connected AI Masterclass for Students. Upskill Young Ones Today!– Join Now

Tech firms, content industry debate AI, copyright at ministry of commerce event

Hindustan Times

4 hours ago

Hindustan Times

Tech firms, content industry debate AI, copyright at ministry of commerce event

Who owns the data that fuels artificial intelligence (AI)? That was the central — and contentious — question debated by representatives from big tech firms and the content industry during a two-day stakeholder consultation organised by ministry of commerce and industry's department for promotion of industry and internal trade (DPIIT). A nine-member DPIIT committee will soon publish a working paper outlining whether India's copyright law needs to be updated to address challenges posed by AI. (Getty Images/iStockphoto/ Representative photo) The meetings were chaired by DPIIT additional secretary Himani Pande on June 19 and 20. At the centre of the discussion was whether tech companies should be allowed to freely mine the internet, which includes copyrighted books, articles, music, images, and videos, to train their AI models. The content industry raised concerns over their copyrighted data being used to train AI models without permission, while tech companies argued that training their models requires massive amounts of data—much of which is copyrighted. The startups on the first day urged the DPIIT to ensure a level playing field, arguing that while they are still in the early stages of building their AI models, larger companies have already trained theirs, often without facing the same level of regulatory scrutiny or restrictions, said a participant from the tech meet on June 19. A representative from the Digital News Publishers Association (DNPA), who was present at the content industry meeting, said, 'DNPA firmly believes that utilising the content of digital news publishers, without consent, for AI training and subsequent generative AI applications, such as search assistance and information purposes, constitutes an infringement of copyright.' Also Read: HC experts differ in OpenAI copyright case 'The association advocates for a regime that ensures fair compensation for content producers, recognising their rights in the digital landscape,' he added. A stakeholder meeting for the content industry saw creators worried about being 'strip-mined for data,' a participant said. One of the key topics discussed at both meetings was whether India should permit text and data mining (TDM) under an opt-out framework. TDM is a technique used by AI systems to scan and learn from vast volumes of data, including text and images. Also Read: First meeting held on AI's impact on India's copyright framework An alternative mechanism that came up during the meeting on Friday was whether a statutory licensing mechanism for AI training purposes might work, which involves allowing the use of copyrighted works without direct permission, provided companies pay a government-set fee and follow certain rules. The DPIIT sought industry input on the copyright status of AI-generated works—an issue also at the heart of the ANI vs OpenAI case in the Delhi High Court, where the news agency filed a lawsuit against the ChatGPT creator for allegedly using its articles to train AI models. 'Who should be considered the actual owner of content generated by AI? If a user puts significant effort into crafting a prompt that leads to a specific output, does that make the user the rightful owner or does the ownership lie with the creators of the AI model?' a person representing an AI startup said. Also read: Data for training stored overseas, copyright law doesn't apply: OpenAI These stakeholder meetings build on the work of a nine-member DPIIT committee formed in April, which includes IP lawyers, industry representatives, IT ministry officials, academicians, and DPIIT officials. While the committee has been meeting regularly since May 16, HT had earlier reported that one member expressed discomfort with being on the committee, saying they lacked AI expertise. The committee, which heard different views from the tech and content industries during the two-day meeting, will soon publish a working paper outlining whether India's copyright law needs to be updated to address challenges posed by AI.