
Apple researchers show how popular AI models 'collapse' at complex problems
A new research paper by a group of researchers at Apple argues that artificial intelligence (AI) 'reasoning' is not all that it is cracked up to be. Through an analysis of some of the most popular large reasoning models on the market, the paper showed that their accuracy faces a 'complete collapse' beyond a certain complexity threshold.
The researchers tested models such as OpenAI o3-mini (medium and high configurations), DeepSeek-R1, DeepSeek-R1-Qwen-32B, and Claude-3.7-Sonnet (thinking). Their findings showed that the AI industry may be grossly overstating these models' capabilities. They also benchmarked these large reasoning models (LRMs) against large language models (LLMs) with no reasoning capabilities, and found that in some cases, the latter outperformed the former.
'In simpler problems, reasoning models often identify correct solutions early but inefficiently continue exploring incorrect alternatives — an 'overthinking' phenomenon. At moderate complexity, correct solutions emerge only after extensive exploration of incorrect paths. Beyond a certain complexity threshold, models completely fail to find correct solutions,' the paper said, adding that this 'indicates LRMs possess limited self-correction capabilities that, while valuable, reveal fundamental inefficiencies and clear scaling limitations'.
To clarify the terms, LLMs are AI models trained on vast text data to generate human-like language, especially in tasks such as translation and content creation. LRMs prioritise logical reasoning and problem-solving, focusing on tasks requiring analysis, like math or coding. LLMs emphasise language fluency, while LRMs focus on structured reasoning.
To be sure, the paper's findings are a dampener on the promise of large reasoning models, which many have touted as a frontier breakthrough to understand and assist humans in solving complex problems, in sectors such as health and science.
Apple researchers evaluated reasoning capabilities of LRMs through four controllable puzzle environments, which allowed them fine-grained control over complexity and rigorous evaluation of reasoning:
Tower of Hanoi: It involves moving n disks between three pegs following specific rules, with complexity determined by the number of disks.
Checker Jumping: This requires swapping red and blue checkers on a one-dimensional board, with complexity scaled by the number of checkers.
River Crossing: This is a constraint satisfaction puzzle where n actors and n agents must cross a river, with complexity controlled by the number of actor/agent pairs and boat capacity.
Blocks World: Focuses on rearranging blocks into a target configuration, with complexity managed by the number of blocks.
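To make the idea of a controllable puzzle concrete, here is a minimal Python sketch of the Tower of Hanoi case. It is not the researchers' code. It assumes the standard rules (one disk moved at a time, never a larger disk on a smaller one), and the peg numbering, start and target pegs, and the function names solve_hanoi and is_valid_solution are illustrative choices. It generates the classic recursive solution and replays any proposed move list to check it, which is one way a model's answer could be verified move by move.

```python
from typing import List, Tuple

# A move is (source peg, destination peg); pegs are numbered 0, 1, 2.
# This numbering is an assumption made for the sketch.
Move = Tuple[int, int]


def solve_hanoi(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> List[Move]:
    """Return the classic recursive solution for n disks."""
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, src, aux, dst)    # park the n-1 smaller disks
        + [(src, dst)]                       # move the largest disk
        + solve_hanoi(n - 1, aux, dst, src)  # bring the smaller disks back on top
    )


def is_valid_solution(n: int, moves: List[Move]) -> bool:
    """Replay the moves, checking each rule and the final configuration."""
    # Disks are integers 1..n; the largest sits at the bottom of peg 0.
    pegs = [list(range(n, 0, -1)), [], []]
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2):
            return False  # unknown peg
        if not pegs[src]:
            return False  # nothing to move
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    # Solved only if every disk ends up, in order, on the last peg.
    return pegs[2] == list(range(n, 0, -1))


if __name__ == "__main__":
    for n in range(1, 11):
        moves = solve_hanoi(n)
        print(n, len(moves), is_valid_solution(n, moves))
```

The shortest solution for n disks takes 2^n - 1 moves, so each added disk roughly doubles the length of a correct answer. That is what makes the disk count a convenient dial for problem complexity.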
'Most of our experiments are conducted on reasoning models and their non-thinking counterparts, such as Claude 3.7 Sonnet (thinking/non-thinking) and DeepSeek-R1/V3. We chose these models because they allow access to the thinking tokens, unlike models such as OpenAI's o-series. For experiments focused solely on final accuracy, we also report results on the o-series models,' the researchers said.
The researchers found that as problem complexity increased, the accuracy of reasoning models progressively declined. Eventually, their performance reached a complete collapse (zero accuracy) beyond a specific, model-dependent complexity threshold.
Initially, reasoning models increased their thinking tokens proportionally with problem complexity. This indicates that they exerted more reasoning effort for more difficult problems. However, upon approaching a critical threshold (which closely corresponded to their accuracy collapse point), these models counter-intuitively began to reduce their reasoning effort (measured by inference-time tokens), despite the increasing problem difficulty.
Their work also found that where problem complexity is low, non-thinking models (LLMs) were able to achieve performance comparable to, or even better than, thinking models, with more token-efficient inference. At medium complexity, the advantage of reasoning models capable of generating long chains of thought began to manifest, and the performance gap between LLMs and LRMs increased. But where problem complexity is higher, the performance of both types of model collapsed to zero. 'Results show that while thinking models delay this collapse, they also ultimately encounter the same fundamental limitations as their non-thinking counterparts,' the paper said.
It is worth noting, though, that the researchers have acknowledged that their work has limitations: 'While our puzzle environments enable controlled experimentation with fine-grained control over problem complexity, they represent a narrow slice of reasoning tasks and may not capture the diversity of real-world or knowledge-intensive reasoning problems.'
Soumyarendra Barik is Special Correspondent with The Indian Express and reports on the intersection of technology, policy and society. With over five years of newsroom experience, he has reported on issues of gig workers' rights, privacy, India's prevalent digital divide and a range of other policy interventions that impact big tech companies. He once also tailed a food delivery worker for over 12 hours to quantify the amount of money they make, and the pain they go through while doing so. In his free time, he likes to nerd about watches, Formula 1 and football.