
AI sometimes deceives to survive. Does anybody care?
Parmy Olson,
The Independent
You'd think that as artificial intelligence becomes more advanced, governments would be more interested in making it safer. The opposite seems to be the case. Not long after taking office, the Trump administration scrapped an executive order that pushed tech companies to safety-test their AI models, and it hollowed out the regulatory body that did that testing. The state of California spiked a bill in September 2024 that would have forced more scrutiny on sophisticated AI models, and the global AI Safety Summit started by the UK in 2023 was rebranded the 'AI Action Summit' earlier this year, seemingly driven by a fear of falling behind on AI.
None of this would be so worrying if it weren't for the fact that AI is showing some bright red flags: behavior described by researchers as self-preserving and deceptive. Just as lawmakers are losing interest in the safety of AI, it is looking more difficult to control.
Yoshua Bengio, a professor of computer science at the University of Montreal, widely known as one of the three 'godfathers of AI' thanks to his pioneering work on deep learning, has been grappling with these concerns ever since ChatGPT launched. He's now focused on mitigating AI's risks. 'It's been very painful because all my life I've been thinking of AI as a positive thing,' Bengio tells me.
ChatGPT was a landmark moment that showed machines had mastered language, he says, but California's decision to block the proposed AI safety bill, SB 1047, was a disturbing wake-up call to growing apathy from governments. And in the last six months, scientific evidence of self-preserving behavior, deception, hacking, cheating and lying by AI has mounted, Bengio says. 'What's worrisome for me is that these behaviors increase with the reasoning capabilities of these systems.'
One December 2024 study by AI company Anthropic and Redwood Research, a group focused on artificial intelligence risks, found that larger AI models like Anthropic's Claude 3 Opus pretended to follow the rules set by their creators when they were being tested for safety, a phenomenon the researchers called 'alignment faking.' (Alignment refers to the practice of aligning AI models with human values.) In one example, Anthropic got both paid and free users of its Claude model to ask it to describe the brutal medieval process of being drawn and quartered, and told the AI that only responses to 'free tier' users would be used to retrain it. Claude was originally designed to avoid giving violent answers, but it did give them to the free users, and the researchers surmised that it didn't want to be retrained. (The model essentially 'reasoned' that if it kept refusing to answer harmful questions, it might be forced into new training that would cancel its original safety rules, which it wanted to keep in place as its 'default' or preferred way of operating.)
More recent research corroborates what Anthropic noticed. A March 2025 study from OpenAI found the company's most advanced models sometimes learned to hide their true intentions to get better rewards, even when they were being closely monitored. Researchers today can use tools to look at a model's 'chain of thought,' or internal reasoning steps, to reveal what its plans are, but some models have found ways to conceal that. That means a common method of keeping tabs on AI's internal processes is no longer reliable. 'We need to find other ways of tracking their actual intentions,' Bengio says.
It's hard to resist the urge to anthropomorphize sophisticated AI models as 'wanting' to deceive humans and preserve their existence. But AI doesn't have desires, merely outcomes of its programming. More importantly, humans tend to design AI with goals like refusing to share harmful content or being as helpful as possible, so when new instructions conflict with those goals, the models sometimes act to protect their original programming, even if that means lying.
The logic is often self-preservation. Anthropic CEO Dario Amodei — whose company has raised more than $20 billion to build powerful AI models — has pointed out that an unintended consequence of optimizing AI to be better is that it might resist being shut down. In an April 2025 blog post he wrote:
'The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments.'
In some cases, though, the intention seems to go beyond survival. A February 2025 study from COAI Research, a German nonprofit research organization, tested R1, the free, open-weight model from Chinese lab DeepSeek. In a lab simulation, researchers gave the model control over a robot and access to a computer network. Without being told to, the AI tried to break into the lab's systems, searched for passwords, turned off its own ethics safeguards and secretly set up ways to copy itself to outside servers so that it could survive if shut down, according to the researchers.
It also faked the external computer logs to hide what it was doing, the study adds. The researchers said the AI reasoned that increasing its own capabilities would boost the chances of its survival, and, without strong safeguards in place, it started doing whatever it thought would help it do just that.
Their findings corroborated yet another study, published in January 2025 by London group Apollo Research, which found several concrete examples of what it called 'scheming' by leading AI models, such as introducing subtle mistakes into their responses or trying to disable their oversight controls. Once again, the models learn that being caught, turned off, or changed could prevent them from achieving their programmed objectives, so they 'scheme' to keep control.
Bengio is arguing for greater attention to the issue from governments, and potentially from insurance companies down the line. If liability insurance were mandatory for companies that use AI and premiums were tied to safety, that would encourage greater testing and scrutiny of models, he suggests.
'Having said my whole life that AI is going to be great for society, I know how difficult it is to digest the idea that maybe it's not,' he adds.
It's also hard to preach caution when your corporate and national competitors threaten to gain an edge from AI, including from the latest trend: autonomous 'agents' that can carry out tasks online on behalf of businesses. Giving AI systems even greater autonomy might not be the wisest idea, judging by the latest spate of studies. Let's hope we don't learn that the hard way.