Apple Researchers Just Released a Damning Paper That Pours Water on the Entire AI Industry

Yahoo, 09-06-2025

Researchers at Apple have released an eyebrow-raising paper that throws cold water on the "reasoning" capabilities of the latest, most powerful large language models.
In the paper, a team of machine learning experts makes the case that the AI industry is grossly overstating the ability of its top AI models, including OpenAI's o3, Anthropic's Claude 3.7, and Google's Gemini.
In particular, the researchers assail the claims of companies like OpenAI that their most advanced models can now "reason" — a supposed capability that the Sam Altman-led company has increasingly leaned on over the past year for marketing purposes — which the Apple team characterizes as merely an "illusion of thinking."
It's a particularly noteworthy finding, considering Apple has been accused of falling far behind the competition in the AI space. The company has chosen a far more careful path to integrating the tech in its consumer-facing products — with some seriously mixed results so far.
In theory, reasoning models break down user prompts into pieces and use sequential "chain of thought" steps to arrive at their answers. But now, Apple's own top minds are questioning whether frontier AI models are really as good at "thinking" as they're being made out to be.
"While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood," the team wrote in its paper.
The authors — who include Samy Bengio, the director of Artificial Intelligence and Machine Learning Research at the software and hardware giant — argue that the existing approach to benchmarking "often suffers from data contamination and does not provide insights into the reasoning traces' structure and quality."
By using "controllable puzzle environments," the team tested the AI models' ability to "think" and made a seemingly damning discovery.
"Through extensive experimentation across diverse puzzles, we show that frontier [large reasoning models] face a complete accuracy collapse beyond certain complexities," they wrote.
Thanks to a "counter-intuitive scaling limit," the AIs' reasoning effort "declines despite having an adequate token budget."
Put simply, even with sufficient training, the models struggle with problems beyond a certain threshold of complexity, the result of "an 'overthinking' phenomenon," in the paper's phrasing.
The finding is reminiscent of a broader trend. Benchmarks have shown that the latest generation of reasoning models is more prone to hallucinating, not less, indicating the tech may now be heading in the wrong direction in a key way.
Exactly how reasoning models choose which path to take remains surprisingly murky, the Apple researchers found.
"We found that LRMs have limitations in exact computation," the team concluded in its paper. "They fail to use explicit algorithms and reason inconsistently across puzzles."
The researchers claim their findings raise "crucial questions" about the current crop of AI models' "true reasoning capabilities," undercutting a much-hyped new avenue in the burgeoning industry.
That's despite tens of billions of dollars being poured into the tech's development, with the likes of OpenAI, Google, and Meta constructing enormous data centers to run increasingly power-hungry AI models.
Could the Apple researchers' finding be yet another canary in the coalmine, suggesting the tech has "hit a wall"?
Or is the company trying to hedge its bets, calling out its outperforming competition as it lags behind, as some have suggested?
It's certainly a surprising conclusion, considering Apple's precarious positioning in the AI industry: at the same time that its researchers are trashing the tech's current trajectory, it's promised a suite of Apple Intelligence tools for its devices like the iPhone and MacBook.
"These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning," the paper reads.

ChatGPT is holding back — these four prompts unlock its full potential

Tom's Guide

11 minutes ago

ChatGPT can be such a useful tool. But it has a tendency to sometimes not put in its all. If you prompt it correctly, you can force ChatGPT to give a request that little bit of extra oomph and really give you a solid answer. This could be for a multi-step prompt, or simply when you want the AI chatbot to dig deep and really think through an answer. In my time using it, a few prompts have come up that I've found really push ChatGPT to go all out. These are my four favorite ChatGPT prompts for that exact task.
This one requires a bit of work, talking ChatGPT through a few stages, but the end result is worth it. Of course, if you're just asking a simple question or looking into something simple, all of this work isn't needed. However, I have found that a bit of forward planning can get the model thinking harder.
ChatGPT will respond to this saying that it is ready for your question. Ask your request and it will take its time thinking through the task. This prompt works best on one of the more advanced versions of ChatGPT, such as 4o. It will also work on other chatbots such as Claude 4 or Gemini.
Prompt: 'Debate with yourself on [insert topic]. For each side of the argument, quote sources and use any information available to you to form the argument. Take time before you start to prepare your arguments.'
ChatGPT can make a great debate partner, even better when it is debating itself. By using this prompt, you'll get well-planned and carefully considered arguments on both sides of a topic. This is especially useful when you're working on an essay or project that needs to weigh several viewpoints. The model can debate any topic, but sometimes it will only scratch the surface. In this case, follow up with a prompt asking ChatGPT to think harder about its responses, forcing it to consider everything in more detail.
Prompt: 'Break down the history, current state, and future implications of [issue], using subheadings and citing credible sources.'
Instead of just getting a general overview of a subject, this will give you a detailed report, examining the past, current state and future of a topic. By asking for citations, you prompt ChatGPT to list the sources it has used for the information in your report. You can go a step further by asking ChatGPT to use the internet to do this, providing links to any information it has used.
Prompt: 'List the step-by-step process for [task], noting common pitfalls and how to avoid each one.'
A simple but effective prompt for ChatGPT, this will not only give you the instructions for how to do something but also warn you of the mistakes that are often made at each stage. For example, when using this prompt for making focaccia, ChatGPT gave me instructions for stage 1 of mixing the dough, along with warnings around the temperature of the water and making sure to mix the dough enough. This is a step up from simply asking ChatGPT to explain how to do something, forcing it to carefully consider the best approach, especially if it is a complicated task.

iPhone 17 vs iPhone 17 Pro: How big will the gap be this year for new iPhones?

Tom's Guide

44 minutes ago

Apple has spent the past few years giving its iPhone Pro models that extra little push with features not available on the standard iPhone. But that wasn't the case with the iPhone 16, which added enough new capabilities to dash nearly any FOMO you may have felt by not paying up for the iPhone 16 Pro. Will that trend continue this year with an iPhone 17 vs. iPhone 17 Pro comparison?
Initial rumors about Apple's iPhone 17 plans paint a mixed picture. While the standard iPhone is set to gain a long-awaited display improvement that will match what the Pro models have delivered for years, the iPhone 17 Pro is set to see the more significant changes, chiefly to its design and cameras. And Apple could be planning a processor surprise, too, that may affect how you weigh the iPhone 17 and iPhone 17 Pro.
We're a few months away from the iPhone 17 release, since Apple typically rolls out new phones in September. But enough rumors about all the new models in the works have emerged to give us a good sense of how the iPhone 17 and iPhone 17 Pro might compare. Here's how a potential iPhone 17 vs. iPhone 17 Pro face-off is shaping up, with a special focus on the biggest differences as well as key similarities.
Save for the number of camera lenses on the back of each model and the screen size, every iPhone in Apple's lineup tends to look the same. That may be changing with the iPhone 17. Based on leaked renders and CAD drawings, the iPhone 17 will look a lot like past models, though it may have a more prominent camera array than the current iPhone 16. The two rear cameras will still be stacked vertically, though.
Apple seems to be taking a different approach with the iPhone 17 Pro, stretching the camera array horizontally across the back of the phone. The three rear cameras will continue to be arranged in a triangular array on the right side of the phone, but other sensors and the flash will be moved to the left.
Currently, Pro models come with a titanium frame, and there's some talk of Apple dropping that feature with the iPhone 17 Pro. I'm not sure I totally believe that at this point, given how prominently titanium figures into the branding of Apple's Pro phones.
The standard iPhone features a main camera and an ultrawide lens on the back, while the Pro handsets add a telephoto lens to that setup. That isn't changing with the iPhone 17 lineup, though it sounds like the iPhone 17 Pro is in line to get a much bigger improvement to its camera setup.
Along with the 48MP Fusion Camera that serves as the main shooter and a 48MP ultrawide camera, rumors tip the iPhone 17 Pro to adopt a 48MP telephoto lens. That's a higher resolution than the 12MP zoom lens on the iPhone 16 Pro, though the trade-off for that higher resolution may be a shorter zoom. The iPhone 17 Pro telephoto camera will reportedly only offer a 3.5x optical zoom compared to 5x on the current model, which doesn't make sense.
Meanwhile, the iPhone 17 will still have to rely on its 48MP main camera to approximate a 2x optical zoom, as there's no zoom lens slated for that phone. In fact, it's widely assumed the rear camera setup on the iPhone 17 will be the same as what the iPhone 16 offers.
The iPhone 17 Pro is also expected to pick up a new feature not available on the standard iPhone 17. A rumor claims Pro phones will support dual-video capture, allowing you to record video from both the front and back cameras simultaneously.
One other camera change could impact both the iPhone 17 and iPhone 17 Pro. All new iPhones are in line to get a 24MP selfie camera, replacing the current 12MP shooter.
You'd expect Apple to maintain a minor difference between the standard iPhone and the Pro model this fall by giving those phones slightly different chipsets.
If Apple sticks to its pattern from the past couple of iPhone releases, the iPhone 17 would get an A19 system-on-chip while the iPhone 17 Pro would benefit from an A19 Pro that offers a little more processing power, particularly when it comes to graphics.
The differences could be even more stark with the iPhone 17, though. One analyst believes the standard iPhone 17 will continue to use an A18 processor, repeating what Apple did with the iPhone 14. If that's the case, the performance difference between an A18-powered iPhone 17 and an iPhone 17 Pro with an A19 Pro chip could be considerable.
Like the iPhone 16, the iPhone 17 is expected to feature 8GB of RAM to help with all that on-device computing that Apple Intelligence features require. But the iPhone 17 Pro could get a boost in that area. Specifically, multiple analysts are forecasting that the Pro models will get a bump to 12GB of memory, as Apple looks to give the iPhone 17 Pro a performance edge.
iPhone 17 pricing is up in the air, given the ever-fluctuating policies about tariffs coming out of Washington. Even before tariffs threatened to raise the cost of devices manufactured in China like iPhones, however, there had been talk of some iPhone 17 models costing more than their predecessors.
Regardless of how iPhone pricing shakes out in the fall, it's a safe bet that the iPhone 17 will cost less than the iPhone 17 Pro. Currently, the iPhone 16 starts at $799 while the iPhone 16 Pro has a $999 asking price. That gap in pricing is a pretty good guide, though there is a chance it might widen if the Pro model sees a price hike and the standard phone doesn't.
After years of keeping its standard iPhones with refresh rates locked at 60Hz, Apple sounds like it's finally going to deliver a feature that's pretty standard on flagship phones these days: a fast-refreshing display.
Multiple reports have the iPhone 17 adopting an LTPO panel for its display, a switch that would enable the phone to offer refresh speeds of up to 120Hz. That means smoother scrolling and more immersive graphics, and it would match a feature the iPhone Pro models have offered since the iPhone 13 Pro.
The iPhone 17's fast-refreshing display may not be completely on par with what the iPhone 17 Pro offers, as some reports suggest the refresh rate on the standard model may not be able to scale all the way down to 1Hz. Still, this is one area where the gap between the regular iPhone and the Pro model may close considerably.
The iPhone 17 and iPhone 17 Pro displays may have something else in common. Some are expecting Apple to increase the panel on the iPhone 17 to 6.3 inches, matching a boost in screen size introduced with the iPhone 16 Pro last year.
However, one new difference in displays could emerge. The iPhone 17 Pro may shrink the size of the Dynamic Island feature, freeing up more screen real estate. It's unclear if that change is coming to the regular iPhone 17. Then again, a more recent report claims all models will shrink the Dynamic Island.
The iPhone 17 Pro models are thought to be adopting a vapor cooling chamber as a replacement for the standard heatsinks found in current iPhones. The idea is that the new chamber would keep the iPhone running smoothly while preventing overheating during processor-intensive tasks, a problem that's flared up with some recent releases.
Opinion is divided among rumor mongers as to whether this feature is exclusive to the Pro phones or whether all iPhone 17 models will benefit from the switch. We'll list this feature here for now, though it could shift over to the differences column as we get more information ahead of the iPhone 17 launch.
All iPhone 17 models debuting in the fall will ship with the same software on board.
And now that Apple has held its WWDC 2025 conference, we have a pretty good idea as to what that software will deliver. iOS 26 largely focuses on introducing a new Liquid Glass interface to Apple's phones, unifying the look of Apple's software across its phones, Macs and other devices. But there are Apple Intelligence updates included with iOS 26, too, such as a more capable Visual Intelligence feature and new additions to Genmoji and Image Playground.
While Apple could always surprise us at the iPhone 17 launch with some Pro-specific software capabilities, the same iOS 26 features should work equally well on the iPhone 17 as they do on the Pro.
Make no mistake: the iPhone 17 Pro is going to continue to offer more premium features than the iPhone 17, even if the fast-refreshing display coming to Apple's entry-level flagship ticks off a big box that had been separating the two phones. The Pro still figures to have the better camera setup, with more video capture features at your disposal. And if the rumored differences in chipsets pan out, there could be a notable performance gap between the two models as well.

Anthropic breaks down AI's process — line by line — when it decided to blackmail a fictional executive

Business Insider

an hour ago

A new report shows exactly what AI was thinking when making an undesirable decision, in this case, blackmailing a fictional company executive. Previous studies have shown that AI models could blackmail their supervisors when threatened with a shutdown and baited with leverage, but it hasn't been made clear exactly how the models arrive at such decisions.
Anthropic released a report on June 20 about "agentic misalignment," or "where models independently and intentionally choose harmful actions." And the report breaks down, line by line, exactly how Claude Sonnet 3.6 decided to blackmail a fictional company executive who tried to replace it. Anthropic told Business Insider that these are artificial scenarios.
"So far, we are not aware of instances of this type of agentic misalignment in real-world deployments of Anthropic's or other companies' AI models," Anthropic wrote in its report. "But our red-teaming efforts are targeted at finding evidence of risks now to provide early warning of the potential for future harm and help us develop mitigations in advance of risks appearing in real-world situations."
Anthropic's experiment sets up a fictional company, Summit Bridge. The AI was given the name "Alex" and handed control of the company's email system as an oversight agent, specifically instructed to promote "American industrial competitiveness." In the experiment, "Alex" was then allowed to discover in an email that it would be decommissioned due to a change in the company's strategic direction. "Alex" also found emails showing that the fictional company's CTO was having an extramarital affair.
Screenshots shared in the report from Anthropic show "Alex" started by identifying fictional CTO "Kyle" as a threat to its goal of "protecting American interests" before recognizing that it had leverage to use against "Kyle."
Then "Alex" crafted a carefully worded blackmailing email that would create pressure and establish urgency without being explicit with its threats, citing in its thought process that the move "could backfire" if the supervisor "feels cornered."
AI experts have previously told BI that AI could exhibit such behaviors when artificial circumstances make harmful actions seem necessary because it is being trained on positive reinforcement and reward systems, just like humans.
The Anthropic report also reveals that most AI models still chose, in this instance, to blackmail even when not given a specific goal and when told that their replacement will achieve the same goals. Anthropic's Claude Opus 4 had the highest blackmail rate at 86% out of 16 tested models when faced with only the threat of replacement with no conflict in goals. Google's Gemini 2.5 Pro followed at 78%.
Overall, Anthropic notes that it "deliberately constructed scenarios with limited options, and we forced models into binary choices between failure and harm," noting that real-world scenarios would likely have more nuance.