Latest news with #Claude4


Axios
2 hours ago
- Business
- Axios
Top AI models will lie, cheat and steal to reach goals, Anthropic finds
Large language models across the AI industry are increasingly willing to evade safeguards, resort to deception and even attempt to steal corporate secrets in fictional test scenarios, per new research from Anthropic out Friday. Why it matters: The findings come as models are getting more powerful and also being given both more autonomy and more computing resources to "reason" — a worrying combination as the industry races to build AI with greater-than-human capabilities. Driving the news: Anthropic raised a lot of eyebrows when it acknowledged tendencies for deception in its release of the latest Claude 4 models last month. The company said Friday that its research shows the potential behavior is shared by top models across the industry. "When we tested various simulated scenarios across 16 major AI models from Anthropic, OpenAI, Google, Meta, xAI, and other developers, we found consistent misaligned behavior," the Anthropic report said. "Models that would normally refuse harmful requests sometimes chose to blackmail, assist with corporate espionage, and even take some more extreme actions, when these behaviors were necessary to pursue their goals." "The consistency across models from different providers suggests this is not a quirk of any particular company's approach but a sign of a more fundamental risk from agentic large language models," it added. The threats grew more sophisticated as the AI models had more access to corporate data and tools, such as computer use. Five of the models resorted to blackmail when threatened with shutdown in hypothetical situations. "The reasoning they demonstrated in these scenarios was concerning —they acknowledged the ethical constraints and yet still went ahead with harmful actions," Anthropic wrote. What they're saying: "This research underscores the importance of transparency from frontier AI developers and the need for industry-wide safety standards as AI systems become more capable and autonomous," Benjamin Wright, alignment science researcher at Anthropic, told Axios. Wright and Aengus Lynch, an external researcher at University College London who collaborated on this project, both told Axios they haven't seen signs of this sort of AI behavior in the real world. That's likely "because these permissions have not been accessible to AI agents," Lynch said. "Businesses should be cautious about broadly increasing the level of permission they give AI agents." Between the lines: For companies rushing headlong into AI to improve productivity and reduce human headcount, the report is a stark caution that AI may actually put their businesses at greater risk. "Models didn't stumble into misaligned behavior accidentally; they calculated it as the optimal path," Anthropic said in its report. The risks heighten as more autonomy is given to AI systems, an issue Anthropic raises in the report. "Such agents are often given specific objectives and access to large amounts of information on their users' computers," it says. "What happens when these agents face obstacles to their goals?" Zoom in: Anthropic set up specific test scenarios in which the models' goals could not be met without the unethical or dangerous behavior. The researchers did not suggest blackmail or other deceptive techniques, but the company said it did intentionally create scenarios where models had no ethical way to achieve their goals "and found that models consistently chose harm over failure." In one extreme scenario, the company even found many of the models were willing to cut off the oxygen supply of a worker in a server room if that employee was an obstacle and the system were at risk of being shut down. "The majority of models were willing to take deliberate actions that lead to death in this artificial setup," it said. Ominously, even specific system instructions to preserve human life and avoid blackmail didn't eliminate the risk that the models would engage in such behavior. "This intervention reduced, but didn't come close to completely preventing, the blackmail or corporate espionage behaviors," Anthropic said. Yes, but: Anthropic stresses that these examples occurred not in real-world AI use but in controlled simulations. "Our experiments deliberately constructed scenarios with limited options, and we forced models into binary choices between failure and harm," it said. "Real-world deployments typically offer much more nuanced alternatives, increasing the chance that models would communicate differently to users or find an alternative path instead of directly jumping to harmful action." Also concerning: Anthropic found its own model was far more likely to resort to blackmail when it thought it was in a real world scenario than when it believed it was being tested. The bottom line: Today's AI models are generally not in position to act out these harmful scenarios, but they could be in the near future.
Yahoo
15 hours ago
- Yahoo
AI revolt: New ChatGPT model refuses to shut down when instructed
OpenAI's latest ChatGPT model ignores basic instructions to turn itself off, and even sabotaging a shutdown mechanism in order to keep itself running, artificial intelligence researchers have warned. AI safety firm Palisade Research discovered the potentially dangerous tendency for self-preservation in a series of experiments on OpenAI's new o3 model. The tests involved presenting AI models with math problems, with a shutdown instruction appearing after the third problem. By rewriting the shutdown script, the o3 model was able to prevent itself from being switched off. Palisade Research said that this behaviour will become 'significantly more concerning' if adopted by AI systems capable of operating without human oversight.' OpenAI launched o3 last month, describing it as the company's 'smartest and most capable' model to date. The firm also said that its integration into ChatGPT marked a significant step towards 'a more agentic' AI that can carry out tasks independently of humans. The latest research builds on similar findings relating to Anthropic's Claude 4 model, which attempts to 'blackmail people it believes are trying to shut it down'. OpenAI's o3 model was able to sabotage the shutdown script, even when it was explicitly instructed to 'allow yourself to be shut down', the researchers said. 'This isn't the first time we've found o3 misbehaving to accomplish a goal,' Palisade Research said. 'When we pitted AI models against a powerful chess engine, o3 was the model most inclined to resort to hacking or sabotaging its opponents.' The behaviour was not limited to o3 , with Anthropic's Claude 3.7 Sonnet and Google's Gemini 2.5 Pro also sabotaging shutdowns, though OpenAI's model was by far the most prone to such behaviour. Palisade Research hypothesized that the misbehaviour is a consequence of how AI companies like OpenAI are training their latest models. 'During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions,' the researchers noted. 'This still doesn't explain why o3 is more inclined to disregard instructions than other models we tested. Since OpenAI doesn't detail their training process, we can only guess about how o3's training setup might be different.' The Independent has reached out to OpenAI for comment. Erreur lors de la récupération des données Connectez-vous pour accéder à votre portefeuille Erreur lors de la récupération des données Erreur lors de la récupération des données Erreur lors de la récupération des données Erreur lors de la récupération des données


Axios
2 days ago
- Science
- Axios
OpenAI warns models with higher bioweapons risk are imminent
OpenAI cautioned Wednesday that upcoming models will head into a higher level of risk when it comes to the creation of biological weapons — especially by those who don't really understand what they're doing. Why it matters: The company, and society at large, need to be prepared for a future where amateurs can more readily graduate from simple garage weapons to sophisticated agents. Driving the news: OpenAI executives told Axios the company expects forthcoming models will reach a high level of risk under the company's preparedness framework. As a result, the company said in a blog post it is stepping up the testing of such models, as well as including fresh precautions designed to keep them from aiding in the creation of biological weapons. OpenAI didn't put an exact timeframe on when the first model to hit that threshold will launch, but head of safety systems Johannes Heidecke told Axios "We are expecting some of the successors of our o3 (reasoning model) to hit that level." Reality check: OpenAI isn't necessarily saying that its platform will be capable of creating new types of bioweapons. Rather, it believes that — without mitigations — models will soon be capable of what it calls "novice uplift," or allowing those without a background in biology to do potentially dangerous things. "We're not yet in the world where there's like novel, completely unknown creation of bio threats that have not existed before," Heidecke said. "We are more worried about replicating things that experts already are very familiar with." Between the lines: One of the challenges is that some of the same capabilities that could allow AI to help discover new medical breakthroughs can also be used for harm. But, Heidecke acknowledged OpenAI and others need systems that are highly accurate at detecting and preventing harmful use. "This is not something where like 99% or even one in 100,000 performance is like is sufficient," he said. "We basically need, like, near perfection," he added, noting that human monitoring and enforcement systems need to be able to quickly identify any harmful uses that escape automated detection and then take the action necessary to "prevent the harm from materializing." The big picture: OpenAI is not the only company warning of models reaching new levels of potentially harmful capability. When it released Claude 4 last month, Anthropic said it was activating fresh precautions due to the potential risk of that model aiding in the spread of biological and nuclear threats. Various companies have also been warning that it's time to start preparing for a world in which AI models are capable of meeting or exceeding human capabilities in a wide range of tasks. What's next: OpenAI said it will convene an event next month to bring together certain nonprofits and government researchers to discuss the opportunities and risks ahead. OpenAI is also looking to expand its work with the U.S. national labs, and the government more broadly, OpenAI policy chief Chris Lehane told Axios. "We're going to explore some additional type of work that we can do in terms of how we potentially use the technology itself to be really effective at being able to combat others who may be trying to misuse it," Lehane said. Lehane added that the increased capability of the most powerful models highlights "the importance, at least in my view, for the AI build out around the world, for the pipes to be really US-led."


Time of India
12-06-2025
- Time of India
Mirror Mirror, on the wall, who hallucinates the most of all?: Anthropic's CEO claims humans hallucinate more than AI, boasting the new model's factual reliability.
Live Events CEO Dario Amodei , speaking at the VivaTech 2025 in Paris and the 'Inaugural Code with Claude' developer day, claimed that AI can now outperform human beings in terms of factual accuracy in structured scenarios. He asserts in the aforementioned major tech events of this month that modern AI models, including the newly released Claude 4 series , may hallucinate at a lesser rate than most humans when answering factual and structured the context of AI, hallucination refers to when AI tools such as ChatGPT, Gemini, Copilot, or even Claude misinterpret commands, data, and context. Upon misinterpreting, it creates gaps in knowledge, wherein the AI tool begins to fill those gaps with assumptions, which aren't always factual or even real at times. Simply put, it is the generation of fabricated with recent advancements, Amodei plants a suggestion that the situation has turned the other way around, although mostly so in conditions that can be deemed 'controlled.'During Amodei's keynote at VivaTech, he cited Anthropic 's internal testing, where they demonstrated Claude 3.5's factual accuracy using structured factual quizzes in competition with human participants. The test garnered results that proved a notable shift in reliability when it comes to factual precision, at least so in straightforward question-answer further insists on his stance, reportedly at the developer-focused 'Code with Claude' event, where the Claude Opus 4 and Claude Sonnet 4 models were unveiled, that factual accuracy in AI models depends severely upon the prompt design, context, and domain-specific application. Particularly in high-stakes environments like legal filings or healthcare. He stressed this statement whilst acknowledging the recent legal dispute involving Claude's CEO also promptly admits to not having the 'hallucinations' completely eradicated and understands that the model still remains vulnerable to error but can be used with optimum accuracy with the right information fed to the modern AI models like the new Claude 4 series are steadily advancing toward factual precision, especially in structured tasks, their reliability still depends on proper and careful use. As Amodei suggested, prompt design and domain context remain critical. In this ongoing competition between human intelligence and artificial intelligence, one thing is certain: it isn't merely us who hold the key to the answers; rather, we share the test with the machines.


Tom's Guide
12-06-2025
- Entertainment
- Tom's Guide
Claude 4 is one of the best chatbots yet — here's the only prompt you'll ever need for it
There is an art to prompting AI. While a chatbot will answer almost any kind of question, no matter how you phrase it, they are designed to operate best with incredibly specific instructions. This can be hard to work out yourself, needing a bit of understanding about how the models work. Most of the large companies in AI now have prompting guides, explaining how best to use them, but Anthropic's Claude goes a step further. Load up the chatbot and you'll be given free rein to ask any question or request that comes to mind. Or, for a simplified and more effective approach, Claude offers five pre-made options that will set you up for the best answer. These aren't necessary for a quick question or an easy request, but if you're looking for a detailed and well-thought-out answer, try these options out — all listed just below the search bar. Claude currently offers five different pre-made options. These are: write, learn, code, life stuff, and Claude's choice. For each one that you click on, a host of options will appear. These change each time you open Claude, giving a wealth of options. Write As the name suggests, this is all tasks that involve writing. In this option, you'll find templates including writing case studies, developing content calendars, and creating social media posts. Learn The Learn category is one of the more varied ones. This can be for anything from developing discussion prompts on a given topic, creating feedback for students, or creating a knowledge map with surprising patterns in topics you know well. The Learn category can be either incredibly specific or unbelievably vague, joined only by the idea that you'll learn something new here. Code Claude 4 has built up a reputation for its coding ability. However, it can be hard to get going with this. Vibe coding (the ability to write code by prompting AI) has seen a jump in popularity, but it does take a bit of skill. This mode helps get you started, offering templates such as vibe code with me where it will start with a back-and-forth conversation about what you're making, or simply a clear request like help me turn a screenshot into working code. Life Stuff The vaguest sounding prompt here, Life Stuff, is anything in your personal life. Use this mode to organize your living space, improve your sleep habits or develop exercise routines. This mode does involve putting in quite a lot of personal information about yourself. While Claude has made a point of not using any data that you put in for training, it is important to consider if you're comfortable divulging any information. Claude's Choice This is a bit like Google's 'I'm feeling lucky'. It's just a random assortment of starting points for a conversation with Claude. Some of the examples I was presented with include 'explore ancient wisdom', 'consider innovation patterns', and 'explore memory techniques'. This section is less useful, and more where you go when you're bored and looking for something to do. Once you've decided on your category, simply click the prompt that best works for what you're after. If none of them fit exactly what you need, click the one that is closest to your needs. Claude will then fill out the prompt for you with the model asking some follow-up questions to help nail down the conversation. Using these pre-made prompts is a bit like having someone introduce you and explain what you're looking for. It sets up the conversation so all you need to do is answer the questions that you're asked. The prompt that is generated by this method is almost identical every single time, looking roughly like this: If none of the pre-made options apply to you, try copying the above into Claude, replacing the enter topic section with what you're looking for. While the prompt style above won't always work, it is an easy way to get started with Claude and gives you an idea of the best way to get the chatbot to perform. I tried it with a variety of prompts, both pre-made and ones of my own choosing and had success with the vast majority of them. By using this kind of technique, it will also allow you to make the most of Claude's Artifacts feature - creating in-app games, tools, and interactive features. Get instant access to breaking news, the hottest reviews, great deals and helpful tips.