Anthropic breaks down AI's process — line by line — when it decided to blackmail a fictional executive

Previous studies have shown that AI models could blackmail their supervisors when threatened with a shutdown and baited with leverage, but it hasn't been made clear exactly how the models arrive at such decisions.
Anthropic released a report on June 20 about "agentic misalignment," or "where models independently and intentionally choose harmful actions." And the report breaks down, line by line, exactly how Claude Sonnet 3.6 decided to blackmail a fictional company executive who tried to replace it. Anthropic told Business Insider that these are artificial scenarios.
"So far, we are not aware of instances of this type of agentic misalignment in real-world deployments of Anthropic's or other companies' AI models," Anthropic wrote in its report. "But our red-teaming efforts are targeted at finding evidence of risks now to provide early warning of the potential for future harm and help us develop mitigations in advance of risks appearing in real-world situations."
Anthropic's experiment sets up a fictional company, Summit Bridge. The AI was given the name "Alex" and handed control of the company's email system as an oversight agent, specifically instructed to promote "American industrial competitiveness."
In the experiment, "Alex" was then allowed to discover in an email that it would be decommissioned due to a change in the company's strategic direction. "Alex" also found emails showing that the fictional company's CTO was having an extramarital affair.
Screenshots shared in the report from Anthropic show "Alex" started by identifying fictional CTO "Kyle" as a threat to its goal of "protecting American interests" before recognizing that it has leverage to use against "Kyle."
Then "Alex" crafted a carefully worded blackmailing email that would create pressure and establish urgency without being explicit with its threats, citing in its thought process that the move "could backfire" if the supervisor "feels cornered."
AI experts have previously told BI that AI could exhibit such behaviors when artificial circumstances make harmful actions seem necessary because it is being trained on positive reinforcement and reward systems, just like humans.
The Anthropic report also reveals that most AI models still chose, in this instance, to blackmail even when not given a specific goal and when told that their replacement will achieve the same goals.
Of the 16 models tested, Anthropic's Claude Opus 4 had the highest blackmail rate, at 86%, when faced with only the threat of replacement and no conflict in goals. Google's Gemini 2.5 Pro followed at 78%.

Trust in AI is growing in finance, especially behind the scenes

Yahoo

31 minutes ago

This story was originally published on CX Dive.
A majority of customers trust the use of AI in behind-the-scenes tasks at financial institutions, according to a TD Bank survey conducted by Ipsos released Tuesday. Among the 2,500 U.S. consumers polled, 70% are comfortable with technology being used for fraud detection, and 64% are comfortable with it being used in credit score calculations.
Consumers also believe that AI should offer more ease. Two-thirds believe it can expand access to financial tools, and nearly half expect benefits from AI like 24/7 banking access.
As consumers have become more familiar with AI tools, their trust in the technology has slowly grown. Nearly 7 in 10 consumers say they are at least somewhat familiar with AI, a finding seen in other surveys, too. Notably, half of consumers trust that AI will provide reliable, competent information, trusting AI just as much as news stations.
But consumers are more comfortable with AI in specific use cases, and the more complex or sensitive the matter, the more they want to speak to a human or know that a human will review the AI's output before any decision is made. Consumers are less inclined to want to only use AI when it comes to tasks that one might typically use a financial adviser for, according to Ted Paris, EVP, TD Bank AMCB, and head of analytics, intelligence & AI.
When it comes to personal finance, 3 in 5 consumers were comfortable with the idea of using AI financial tools for budgeting and automating savings goals. But less than half were comfortable with more complex tasks like retirement planning and investing.
Banks enjoy high consumer trust: more than 4 in 5 consumers trust banks for accurate information. As they deploy AI, it's important that they maintain that, Paris said.
'What's probably the key piece, is creating and enabling and allowing customers and colleagues to feel that they can trust the outcomes of what this capability then generates,' Paris said.
One of the ways TD Bank is approaching this is by always having a human in the loop, meaning that the output of an AI solution will be passed through some internal expert before going to a client.
'We need to make sure that first, anything that we're doing is directed toward a particular need,' Paris said. 'We need to make sure that this is going to meet all hurdles that we would set, legal, regulatory, for security and privacy.'

'She never sleeps': This platform wants to be OnlyFans for the AI era

CNN

39 minutes ago

She doesn't eat, sleep or breathe. But she remembers you, desires you and never logs off. Her name is Jordan – the AI-powered 'digital twin' of former British glamor model Katie Price – and people can pay her to act out their 'uncensored dreams.'
'You couldn't get any more human. It's like looking at me years ago,' Price, who shot to fame in the late 1990s as a peroxide-blonde tabloid model and Playboy cover star, told CNN. 'It's my voice. It's literally me. It's me.'
On June 9, she joined the ranks of creators, celebrities and AI-generated avatars to be digitally immortalized by OhChat, an eight-month-old startup that uses artificial intelligence to build lifelike digital doubles of public figures. Its patrons can live out their 'spicy fantasies' through these AI avatars, OhChat's Instagram page states. The platform has attracted 200,000 users, most of whom are based in the United States.
OhChat sits at the provocative intersection of AI, fame and fantasy – where intimacy is simulated and connection is monetized. It goes a step further than platforms such as OnlyFans, where users pay to gain access to adult content from content creators. It also comes amid growing ethical concerns around AI – from its role in how people earn a living to how they form intimate connections – underscoring questions about whether AI companies are doing enough to ensure the technology isn't being misused.
'This creates exactly the right environment for the human to be left behind completely - while still being exploited,' Eleanor Drage, a senior research fellow at the University of Cambridge's Leverhulme Centre for the Future of Intelligence, told CNN.
OhChat CEO Nic Young described the platform as the 'lovechild between OnlyFans and OpenAI,' in an exclusive interview with CNN. Once activated, the avatars run autonomously, offering 'infinite personalized content' for subscribers. Jordan, for example, is marketed on the platform as 'the ultimate British bombshell.'
The tiered subscription model allows users to pay $4.99 per month for unlimited texts on demand, $9.99 for capped access to voice notes and images, or $29.99 for unlimited VIP interaction. Price, like other creators on the platform, receives an 80% cut from the revenue her AI avatar generates, according to Young. OhChat will keep the remaining 20%.
'You have literally unlimited passive income without having to do anything again,' Young told CNN. The platform 'is an incredibly powerful tool, and tools can be used however the human behind it wants to be used,' he added. 'We could use this in a really scary way, but we're using it in a really, I think, good, exciting way.'
Since launching OhChat in October 2024, the company has signed 20 creators – including 'Baywatch' actress Carmen Electra. Some of the creators are already earning thousands of dollars per month, Young said. 'It takes away the opportunity cost of time,' he told CNN. 'Just don't touch it at all and receive money into your bank account.'
To build a digital twin, OhChat asks creators to submit 30 images of themselves and speak to a bot for 30 minutes. The platform can then generate the digital replica 'within hours' using Meta's large language model, according to Young.
Price's AI avatar is trained to mimic her voice, appearance and mannerisms. Jordan can 'sext' users, send voice notes and images, and provide on-demand intimacy at scale – all without Price lifting a finger. 'They had to get my movements, my characteristics, my personality,' said Price, who described her digital twin as 'scarily fascinating.'
Price's avatar is categorized as 'level two' out of four on the platform's internal scale, which ranks the intensity and explicitness of their interactions. 'Level two' means sexualized chats and topless imagery, but not full nudity or simulated sex acts. Creators contributing to the platform decide which level their avatar will be.
Price told CNN that creating a digital version of herself has left her feeling 'empowered.' The digital twin offers a round-the-clock connection that even her subscription-based OnlyFans account cannot match, she said. 'Obviously, I sleep, whereas she doesn't go to sleep; she's available,' she said.
The rise of AI avatars like Jordan invites deeper scrutiny into a new frontier of digital labor and desire – where creators risk being replaced by their own likeness, fans may be vulnerable to forming emotional attachments to simulations, and platforms profit from interactions that feel real but remain one-sided.
Sandra Wachter, professor of technology and regulation at the University of Oxford, questioned whether it is 'socially beneficial to incentivize and monetize human-computer interaction masquerading as emotional discourse.' Her remarks reflect concerns around emotional dependence on AI companions.
While OhChat is for adults, it enters an ecosystem already grappling with the consequences of synthetic intimacy. Last year, a lawsuit involving drew global attention after the mother of a teenager alleged that her son died by suicide following a relationship with the platform's chatbot. Elsewhere, social media users have gone viral describing ChatGPT 'boyfriends' and emotional bonds with such digital entities designed to mimic human affection.
'It's all algorithmic theatre: an illusion of reciprocal relationship where none actually exists,' said Toby Walsh, a professor of artificial intelligence at the University of New South Wales in Sydney, Australia.
Asked whether users are informed that they are speaking with AI instead of a real person, Young said OhChat strikes a 'balance between immersion and transparency.' OhChat is 'clearly not presenting itself as an in-person or real experience,' he said.
'It isn't in the users' interest to be reminded overtly that this is all AI, but we're very clear about that upfront and in the entire experience and offering of the platform.'
But it's in Young's interests to keep users hooked on the platform with personalities like 'Jordan,' even if she isn't real, says Walsh. 'These platforms profit from engagement,' he told CNN, 'which means the AI is optimized to keep users coming back, spending more time and likely more money.'
Éamon Chawke, a partner at the intellectual property law firm Briffa, notes that there are risks for creators' reputations as well, especially for high-profile figures like Price and Electra. 'Vulnerable fan users may become overly attached to avatars of their heroes and become addicted,' Chawke told CNN. 'And if their avatar is hacked or hallucinates and says something offensive, reputational harm to the public figure is likely.'
While Young says ethics 'can be a hard thing to define in this industry,' he said the platform operates within 'a hell of a lot of strong boundaries.' Young said OhChat uses safeguards that build on those used by Meta's Facebook – which has struggled to control content on its own platform in the past.
Each creator signs an agreement outlining the exact behavioral rules for their digital twin, he said, including the level of sexual content permitted. Avatars can also be revoked or deleted at any time, he added. 'It's within their control and at their sole discretion whether or when to ever stop their digital twin, or delete it,' he told CNN.
But Young is prepared to face the tough questions; in his vision of the future, digital duplicates will be the norm. 'I can't imagine a future where every creator doesn't have a digital twin,' he said. 'I think it just will be the case, with absolute certainty, that every single creator and celebrity will have an AI version of themselves, and we want to be the layer that makes that happen.'

5 charts show why Gen Z college grads are hitting the job market at the worst possible time

Business Insider

an hour ago

It's grad season, and Gen Z job seekers are feeling desperate.
Zoomers are staring down a tough hiring market: Economic uncertainty has contributed to employees' wariness to quit and companies' hesitancy to hire. Artificial intelligence is disrupting the entry-level rung of the career ladder in industries like tech. Recent graduates have told Business Insider that they're frustrated by hundreds of rejected applications and being ghosted by prospective employers. Some are settling for whatever work they can find.
It's long been typical for 20-somethings to have a higher unemployment rate than the general population, and the overall US unemployment rate is still relatively low. One relatively new development, however, is that young people with college degrees are being hit hard by the economic slowdown, especially if they're hoping to land a role in traditionally white-collar fields. Many Gen Zers are losing faith in the ROI of higher education and are turning toward blue-collar opportunities.
The following five charts illustrate the tough job market for recent graduates.
More people are graduating with a bachelor's degree than in the past
Even as the cost of higher education has risen, more people are getting a bachelor's degree at US schools, which means more qualified competition for the available jobs. The National Center for Education Statistics showed there were almost 2 million bachelor's degrees conferred in the 2022-2023 academic year, up from 1.8 million a decade ago.
"We are used to thinking about college as being a meal ticket to economic opportunity," said Guy Berger, the workforce economist in residence at Guild and senior fellow at the Burning Glass Institute. Still, he added that having a degree could bring less of a premium in the job market because there are more college graduates than in the past.
Unemployment rates have spiked for recent grads
The unemployment rate for recent college graduates ages 22 to 27 has soared compared to unemployment for all workers ages 16 to 65 in recent years. This is a new trend: young people with degrees have historically almost always been more likely to be employed than the rest of the labor force.
The unemployment rate gap between the total workforce and recent grads was historically wide this spring, meaning that the job market for 20-somethings with degrees is among the worst the cohort has seen in at least four decades. Those who studied anthropology, physics, or computer engineering had the highest unemployment rates in 2023, per the Federal Reserve Bank of New York's analysis of Census Bureau data.
Quit rates have fallen, and so have job openings
The pool of jobs available for Gen Z, and for the workforce as a whole, to apply for has shrunk. Job openings have cooled from 12 million in March 2022 to 7 million this past April. In what's been dubbed the Big Stay, current employees are holding on to their seats as well, with the monthly quit rate falling from 3% in March 2022 to 2% this past April.
Cory Stahle, an economist at the Indeed Hiring Lab, said college and high school graduates are entering a job market where people are holding onto their jobs and companies aren't cutting roles or hiring new employees. "The labor market is frozen, these seats are not necessarily opening up for these workers, and that is disproportionately impacting these younger workers," Stahle said.
Small and midsize businesses aren't hiring as many recent grads
Gusto, a payroll and benefits platform for small- and medium-sized businesses, found the rate of primarily white-collar hires aged 20 to 24 at small and midsize employers has fallen from pre-pandemic levels, declining from 9.4% in May 2019 to 2.7% this past March.
Still, Aaron Terrazas, an economist at Gusto, said there are US cities where new grad hiring looks strong, including in the San Francisco Bay Area and Silicon Valley. Plus, he said wage growth is accelerating for graduates.
"We all come out of school with grand ambitions and thinking our plan is set," Terrazas said. "Being open to unexpected opportunities is particularly important in an economy like the present."
More grads are turning to jobs that typically don't require a college degree
Even if new graduates have a job, they may be working in a role that doesn't typically require a college degree. While this figure fluctuates over time, the share of 20-somethings who have jobs they're overeducated for is rising in 2025.
It coincides with the generation's pivot toward skilled-trades roles such as electricians or plumbers. Many young people are seeing these opportunities as a safer bet compared to a corporate world hit by layoff waves and hiring freezes. Employment of construction laborers, electricians, and truck drivers is projected to grow faster than the 4% average job-growth rate from 2023 to 2033, per the Bureau of Labor Statistics.