Latest news with #reasoningmodels


Forbes
a day ago
- Business
- Forbes
The Unstoppable Growth Of Generative AI (AI Outlook Part 1)
The digital creation of an image using Stable Diffusion.

When Tirias Research first forecast global generative AI demand in 2023, we predicted token output would reach an aggressive 20 trillion tokens (a token being roughly a letter, word, or punctuation mark) by the end of 2024. That estimate was soon overwhelmed, as actual usage surged to a staggering 667 trillion tokens. The latest forecast now expects demand to grow 115x by 2030. Disclosure: My company, Tirias Research, has consulted for IBM, Nvidia, and other companies mentioned in this article.

Growth drivers

The growth in generative AI usage was more explosive than even the most aggressive forecasts could anticipate. Then a new demand surge began in September 2024 with the release of OpenAI's o1, the first widely deployed "reasoning" model. Unlike previous generations, it didn't just answer questions; it reasoned through them, generating more thoughtful, logical, and nuanced responses. Reasoning required far more behind-the-scenes "reasoning tokens" per session, resulting in a drastic increase in token generation. In addition, user engagement skyrocketed. By the end of 2024, the time spent generating content via generative and reasoning AI models had grown by more than 22x compared to the previous year.

Overview of conditional computing and model sparsity

There was also an unprecedented and accelerating rate of innovation. From the introduction of transformers in 2017 to ChatGPT in 2022 and reasoning models in 2024, the pace of innovation continues to accelerate. Advanced model architectures, such as mixture-of-experts (MoE), enable more efficient reasoning while keeping active parameter use low. Open-source models, such as Meta's Llama series, challenge closed-source dominance by offering lighter, faster alternatives that run locally on laptops and smartphones. And efficiency optimizations, such as sparse attention and conditional computing, are producing more efficient models like DeepSeek R1 (introduced in 2025), which originally used only 37 billion active parameters per token, compared with Llama's 405 billion or the more than 1 trillion in some closed models.

Token demand by the numbers

Tirias Research forecasts continued growth in the number of users, visit frequency, time spent, and AI-generated content. Additionally, with agentic APIs rolling out in 2025, AI agents will start autonomously chaining AI models together, forming thoughts, executing tasks, and collaborating with other services. Human prompting will no longer be the sole driver of AI activity once autonomous agents begin to generate usage on their own. As a result, the annual rate of token generation is expected to skyrocket from 677 trillion in 2024 to 2,092 trillion by the end of 2025 and 77,000 trillion (77 quadrillion) by the end of 2030.

Generative AI forecast 2024-2030

Simon Solotko, Senior Analyst at Tirias Research, explains: "The AI ecosystem is under unprecedented pressure. Multimodal capability, user demand, and agentic and multimedia workflows are advancing so quickly that even efficiency gains in compute hardware and software won't be enough to offset the surge in demand."

A 2028 snapshot of the forecast demonstrates that the use of AI assistants and agents is likely to be concentrated among a small number of providers. However, on the infrastructure side, AI models accessed via APIs are anticipated to drive a wide range of business and consumer applications by enabling AI capabilities for customer-facing service providers.
2028 forecast estimates of service providers' token share and the token production infrastructure

The industry may consolidate into a natural monopoly similar to Google's dominance in Internet search. Having been first to market with ChatGPT and enjoying wide brand recognition, OpenAI currently dominates the market for AI models and token generation. Whether OpenAI retains its lead remains uncertain.

Future Trends

Larger models will continue to grow in size and complexity, outpacing hardware improvements. The largest models already exceed the memory of any single accelerator, requiring clusters of GPUs and entire racks to process tasks. However, innovations in distillation and efficiency will aid in scaling down to smaller, more targeted models. The introduction of DeepSeek represented a significant leap in model efficiency, resetting the performance baseline.

AI agents will become pervasive. Industry leaders, such as Nvidia's Jensen Huang and IBM's Arvind Krishna, foresee every employee working with multiple AI agents. Some agents will live in machines, others in virtual spaces, and still others in physical robots. AI agents will also begin to collaborate.

AI competition will increase. As models mature, differentiation is no longer just about size or speed; it encompasses a broader range of factors. Services are integrating AI models into workflows, APIs, and interactive applications, pushing toward end-to-end task automation and entertainment. At the same time, cost pressures are forcing every player to adopt cutting-edge techniques for faster training, improved inference, and lower computational cost. This competition goes beyond the enterprise; AI is now shaping geopolitics as countries race to innovate.

In addition, AI will continue to evolve. By the end of the decade, AI-generated images and video could overtake text as the primary form of AI-generated content and the driver of future compute demand. Much of this content may be created on edge devices. Media content generation, combined with autonomous AI agents and machines, will usher in the next wave of AI.

Final Thoughts

Unlike past technology adoption curves, generative AI doesn't appear to be slowing. Rapid improvements in both capability and efficiency are accelerating demand. As agentic AI expands beyond human usage, the number of "users" of generative AI will multiply exponentially. I will discuss the rising AI demand for images, video, autonomous agents, and autonomous machines, as well as the global infrastructure requirements and total cost of operation (TCO) of generative AI, in future articles.
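To make the conditional-computing idea from the Forbes piece a little more concrete (the claim that a mixture-of-experts model such as DeepSeek R1 activates only 37 billion parameters per token out of a much larger total), here is a toy sketch of top-k expert routing. It illustrates the general technique only; the layer sizes, expert count, and routing scheme are made-up assumptions and do not describe DeepSeek R1 or any other specific model.

```python
import numpy as np

def moe_layer(token, experts, gate_weights, top_k=2):
    """Toy mixture-of-experts layer: route one token through its top-k experts.

    Only the selected experts run, so the parameters "active" for this token
    are a small fraction of the layer's total parameter count.
    """
    scores = gate_weights @ token                               # one routing score per expert
    top = np.argsort(scores)[-top_k:]                           # indices of the k best-scoring experts
    mix = np.exp(scores[top]) / np.exp(scores[top]).sum()       # softmax over just the chosen experts
    return sum(w * (experts[i] @ token) for w, i in zip(mix, top))

d_model, n_experts, top_k = 64, 8, 2
rng = np.random.default_rng(0)
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]  # expert weight matrices
gate_weights = rng.normal(size=(n_experts, d_model))                       # the router ("gate")

token = rng.normal(size=d_model)
output = moe_layer(token, experts, gate_weights, top_k)

total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model   # only top_k of the n_experts touched this token
print(f"total expert parameters: {total_params:,}, active for this token: {active_params:,}")
```

With eight experts but only two active per token, roughly a quarter of the expert parameters do any work on a given token, which is the same effect that lets sparse models keep per-token compute well below what their total parameter counts would suggest.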

News.com.au
10-06-2025
- Science
- News.com.au
‘Complete collapse': Bombshell report into AI accuracy indicates your job is probably safe
The latest form of cutting-edge artificial intelligence technology suffers from 'fundamental limitations' that result in a 'complete accuracy collapse', a bombshell report from Apple has revealed.

Researchers from the tech giant have published a paper with their findings, which cast doubt on the true potential of AI as billions of dollars are poured into developing and rolling out new systems.

The team put large reasoning models, an advanced form of AI used in platforms like DeepSeek and Claude, through a series of puzzle challenges ranging from simple to complex. They also tested large language models, which platforms like ChatGPT are built on.

Large language model AI systems fared better than large reasoning models on fairly standard tasks, but both fell flat when confronting more complex challenges, the paper revealed.

Researchers also found that large reasoning models began 'reducing their reasoning effort' as they struggled to perform, which was 'particularly concerning'.

'Upon approaching a critical threshold – which closely corresponds to their accuracy collapse point – models counterintuitively begin to reduce their reasoning effort despite increasing problem difficulty,' the paper read.

The advancement of AI, based on current approaches, might've reached its limit for now, the findings suggested.

Niusha Shafiabady, an associate professor of computational intelligence at Australian Catholic University and director of the Women in AI for Social Good lab, said 'expecting AI to be a magic wand' is a mistake.

'I have been talking about the realistic expectations about the AI models since 2024,' Dr Shafiabady said. 'When AI models face countless interactions with the world, it is not possible to investigate and control every single problem that could happen. That is why things could get out of hand or out of control.'

Gary Marcus, a leading voice on AI and six-time author, delivered a savage analysis of the Apple paper on his popular Substack, describing it as 'pretty devastating'.

'Anybody who thinks [large language models] are a direct route to the [artificial general intelligence] that could fundamentally transform society for the good is kidding themselves,' Dr Marcus wrote.

Dr Marcus then took to X to declare that the hype around AI has become 'a giant game of bait and switch'.

'The bait: we are going to make an AI that can solve any problem an expert human could solve. It's gonna transform the whole world,' Dr Marcus wrote. 'The switch: what we have actually made is fun and kind of amazing in its own way but rarely reliable and often makes mistakes – but ordinary people make mistakes too.'

In the wake of the paper's release, Dr Marcus has re-shared passionate defences of AI posted to X by evangelists excusing the accuracy flaws that have been exposed.

'Imagine if calculator designers made a calculator that worked 80 per cent correctly and said "naah, it's fine, people make mistakes too",' Dr Marcus quipped.

Questions about the quality of large language and large reasoning models aren't new. For example, when OpenAI released its new o3 and o4-mini models in April, it described them as its 'smartest and most capable' yet, trained to 'think for longer before responding'.

'The combined power of state-of-the-art reasoning with full tool access translates into significantly stronger performance across academic benchmarks and real-world tasks, setting a new standard in both intelligence and usefulness,' the company's announcement read.
But testing by prestigious American university MIT revealed the o3 model was incorrect 51 per cent of the time, while o4-mini performed even worse, with an error rate of 79 per cent.

Truth and accuracy undermined

Apple recently suspended its AI-powered news alert feature on iPhones after users reported significant accuracy errors. Among the jaw-dropping mistakes were alerts claiming that tennis icon Rafael Nadal had come out as gay, that alleged UnitedHealthcare CEO shooter Luigi Mangione had died by suicide in prison, and that a winner had been crowned at the World Darts Championship hours before competition began.

Research conducted by the BBC found a litany of errors across other AI assistants providing information about news events, including Google's Gemini, OpenAI's ChatGPT and Microsoft's Copilot. It found 51 per cent of all AI-generated answers to queries about the news had 'significant issues' of some form. When looking at how its own news coverage was being manipulated, the BBC found 19 per cent of answers citing its content were factually incorrect. And in 13 per cent of cases, quotes said to be contained within BBC stories had either been altered or entirely fabricated.

Meanwhile, a newspaper in Chicago was left red-faced recently after it published a summer reading list featuring multiple books that don't exist, thanks to the story copy being produced by AI. And last year, hundreds of people who lined the streets of Dublin were disappointed when it turned out the Halloween parade advertised on an events website had been invented.

Google was among the first of the tech giants to roll out AI summaries of search results, relying on a large language model – with some hilarious and possibly dangerous results. Among them were suggestions to add glue to pizza, eat a rock a day to maintain health, take a bath with a toaster to cope with stress, drink two litres of urine to help pass kidney stones and chew tobacco to reduce the risk of cancer.

Jobs might be safe – for now

Ongoing issues with accuracy might have some companies thinking twice about going all-in on AI when it comes to substituting their workforces. So too might some recent examples of the pitfalls of people being replaced with computers.

Buy now, pay later platform Klarna shed more than 1000 people from its global workforce as part of a dramatic shift to AI resourcing, sparked by its partnership with OpenAI, forged in 2023. But last month, the Swedish firm conceded its strong reliance on AI customer service chatbots – which saw its employee count almost halved in two years – had created quality issues and led to a slump in customer satisfaction. Realising most customers prefer interacting with a human, Klarna has begun hiring back actual workers.

Software company Anysphere faced a customer backlash in April when its AI-powered support chatbot went rogue, kicking users out of the code-editing platform Cursor and delivering incorrect information. It then seemingly 'created' a new user policy out of thin air to justify the logouts – that the platform couldn't be used across multiple computers. Cursor saw a flood of customer cancellations as a result.

AI adviser and former Google chief decision scientist Cassie Kozyrkov took to LinkedIn to share her thoughts on the saga, dubbing it a 'viral hot mess'. 'It failed to tell users that its customer support "person" Sam is actually a hallucinating bot,' Ms Kozyrkov wrote. 'It's only going to get worse with AI agents.'
Many companies pushing AI insist the technology is improving swiftly, but a host of experts aren't convinced its hype matches its ability.

Earlier this year, the Association for the Advancement of Artificial Intelligence surveyed two dozen AI specialists and some 400 of the group's members and found a surprising level of pessimism about the potential of the technology. Sixty per cent of those probed don't believe problems with factuality and trustworthiness 'would soon be solved', it found.

Issues of accuracy and reliability are important, not just for growing public trust in AI, but for preventing unintended consequences in the future, AAAI president Francesca Rossi wrote in a report about the survey. 'We all need to work together to advance AI in a responsible way, to make sure that technological progress supports the progress of humanity and is aligned to human values,' Ms Rossi said.

Projects stalled or abandoned

Embarrassing and potentially costly issues like these are contributing to a backtrack, with analysis by S&P Global Market Intelligence showing the share of American and European companies abandoning their AI initiatives rising to 42 per cent this year from 17 per cent in 2024.

And a study released last month by consulting firm Roland Berger found a mammoth investment in AI technology wasn't translating to useful outcomes for many businesses. Spending on AI by corporates in Europe hit an estimated US$14 billion (AU$21.4 billion) in 2024, but just 27 per cent of companies were able to fully integrate the technology into their operations or workflows, the research revealed.

'Asked about the key challenges involved in implementing AI projects, 28 per cent of respondents cited issues with data, 25 per cent referenced the complexity of integrating AI use cases, and 15 per cent mentioned the difficulty of finding enough AI and data experts,' the study found.

Those findings were mirrored in an IBM survey, which found just one in four AI projects delivered the returns they promised.

Dr Shafiabady said there are a few reasons for the problems facing AI, like those identified in Apple's research.

'When dealing with highly complex problems, these types of complex AI models can't give an accurate solution. One of the reasons why is the innate nature of algorithms,' Dr Shafiabady said.

'Models are built on mathematical computational iterative algorithms that are coded into computers to be processed. When tasks get very complicated, these algorithms won't necessarily follow the logical reasoning and will lose track of them.

'Sometimes when the problem gets harder, all the computing power and time in the world won't enhance AI model's performance. Sometimes when it hits very difficult tasks, it fails because it has learnt the example rather than the hidden patterns in the data.

'And sometimes the problem gets complicated, and a lot of computation resource and time is wasted over exploring the wrong solutions and there is not enough "energy" left to reach the right solution.'


Geeky Gadgets
13-05-2025
- Business
- Geeky Gadgets
How to Cut AI Model Costs by 75% with Gemini AI's Implicit Caching
What if you could slash your AI model costs by a staggering 75% without sacrificing performance or efficiency? For many businesses and developers, the rising expense of running advanced AI models has become a significant hurdle, especially when handling repetitive tasks or processing large-scale data. But with Gemini AI's latest innovation, implicit caching, this challenge is being turned on its head. Imagine a system that automatically identifies redundant inputs and applies discounts without requiring you to lift a finger. It's not just a cost-cutting measure; it's a fantastic option for anyone looking to streamline workflows and maximize the value of their AI investments.

In this overview, Sam Witteveen explores how implicit caching works, why it's exclusive to Gemini AI's 2.5 reasoning models, and how it can transform the way you approach AI-driven projects. From understanding token thresholds to using reusable content in your prompts, you'll uncover practical strategies to optimize your workflows and reduce expenses. Whether you're managing repetitive queries, analyzing extensive datasets, or seeking long-term solutions for static data, this feature offers a seamless path to efficiency. The potential to save big while maintaining high performance isn't just a possibility; it's a reality waiting to be unlocked.

Gemini AI Cost Savings

What Is Implicit Caching?

Implicit caching is an advanced functionality exclusive to Gemini AI's 2.5 reasoning models, including the Flash and Pro variants. It identifies repeated prefixes in your prompts and applies discounts automatically, streamlining workflows without requiring user intervention. This makes it particularly effective for tasks involving repetitive queries or foundational data. For example, if your project frequently queries the same base information, implicit caching detects this redundancy and applies a 75% discount on the cached token costs. However, to activate this feature, your prompts must meet specific token thresholds:

- Flash models require a minimum of 1,024 tokens.
- Pro models require at least 2,048 tokens.

These thresholds ensure that the system can efficiently process and cache repeated content, making the feature especially beneficial for high-volume tasks where cost savings are critical.

When to Use Explicit Caching

While implicit caching is ideal for dynamic and repetitive queries, explicit caching remains a valuable tool for projects that require long-term storage of static data. Unlike implicit caching, explicit caching involves manual setup, allowing users to store and retrieve predefined datasets as needed. For instance, if you're working on a project that involves analyzing a fixed set of documents over an extended period, explicit caching ensures consistent access to this data without incurring additional token costs. However, the manual configuration process may require more effort compared to the automated nature of implicit caching. Explicit caching is particularly useful for projects where data consistency and long-term accessibility are priorities.

Video: "Cut Your Gemini AI Model Costs By Up To 75%" by Sam Witteveen on YouTube.

Optimizing Context Windows for Efficiency

Efficient use of context windows is another key strategy for reducing costs with Gemini AI. By placing reusable content at the beginning of your prompts, you enable the system to recognize and cache it effectively (a rough sketch of the pattern follows below).
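As an illustration of that prefix-first pattern, here is a minimal sketch assuming the google-genai Python SDK. The model identifier, the document file name, and the usage-metadata field checked at the end are assumptions for illustration; consult the current Gemini API documentation for exact names and pricing behaviour.

```python
# Sketch only: implicit caching is automatic on Gemini 2.5 models, so the goal
# here is simply to keep the large, reusable context at the START of every
# prompt so that repeated requests share a common prefix the cache can match.
from google import genai

client = genai.Client()  # assumes an API key is configured in the environment

# Large, unchanging context first. It must exceed the model's minimum token
# threshold (e.g. roughly 1,024 tokens for Flash) for caching to kick in.
big_document = open("contract.txt").read()  # hypothetical file

questions = [
    "Summarise the termination clauses.",
    "List all payment deadlines mentioned.",
]

for question in questions:
    response = client.models.generate_content(
        model="gemini-2.5-flash",                              # assumed model id
        contents=big_document + "\n\nQuestion: " + question,   # shared prefix, varying suffix
    )
    usage = response.usage_metadata
    # cached_content_token_count (if reported) shows how many prompt tokens
    # were served from the cache at the discounted rate.
    print(question, "->", getattr(usage, "cached_content_token_count", None))
```

The design point is ordering: the large, unchanging material goes first and the per-request question goes last, so consecutive requests share the longest possible prefix for the cache to recognize.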
This approach not only minimizes token usage but also enhances the overall efficiency of your queries. Gemini AI's 2.5 models are specifically optimized to handle large context windows, making them well-suited for tasks involving substantial inputs such as documents or videos. However, it's important to note that while text and video inputs are supported, YouTube videos are currently excluded from caching capabilities. Testing your specific use case is essential to ensure compatibility and to fully use the system's capabilities.

Strategies for Cost Reduction

To maximize savings and optimize workflows with Gemini AI, consider implementing the following strategies:

- Design prompts with reusable content at the beginning to take full advantage of implicit caching.
- Test caching functionality to ensure it aligns with the specific requirements of your tasks.
- Use explicit caching for projects that require consistent access to static datasets over time.
- Ensure your prompts meet the minimum token thresholds for Flash and Pro models to activate caching features effectively.

By adopting these practices, you can significantly reduce API costs while maintaining high levels of performance and efficiency in your AI-driven projects.

Understanding Limitations and Practical Considerations

While implicit caching offers substantial benefits, it is important to understand its limitations. The feature is exclusive to Gemini AI's 2.5 reasoning models and is not available for earlier versions. Additionally, YouTube video caching is not supported, which may limit its applicability for certain multimedia projects. To address these limitations, it is crucial to evaluate your specific project requirements and test the caching functionality before fully integrating it into your workflows. Refining your prompt design and using the system's ability to handle large-scale inputs can help you overcome these challenges and maximize the potential of implicit caching.

Maximizing the Value of Gemini AI

Gemini AI's implicit caching feature for its 2.5 reasoning models represents a significant step forward in cost optimization. By automatically applying discounts for repeated prompt prefixes, this functionality simplifies token management and delivers substantial savings. Whether you're processing repetitive queries, analyzing large documents, or working with video inputs, these updates provide a practical and efficient way to reduce expenses. With strategic implementation and careful planning, you can cut your AI model costs by up to 75%, making Gemini AI a more accessible and cost-effective tool for a wide range of projects.

Media Credit: Sam Witteveen
Filed Under: AI, Top News
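For the explicit caching workflow described in the article above (storing a fixed set of documents once and referencing them across many requests), the flow might look something like the sketch below, again assuming the google-genai Python SDK. The model identifier, file names, and TTL value are illustrative assumptions, and the exact configuration classes should be checked against the current SDK documentation.

```python
# Sketch only: explicit caching with the google-genai Python SDK (assumed API).
# Static material is uploaded once into a named cache, then referenced by later
# requests instead of being resent and re-billed in full each time.
from google import genai
from google.genai import types

client = genai.Client()  # assumes an API key is configured in the environment

# 1. Create a cache holding the static documents (hypothetical file names).
static_docs = "\n\n".join(open(p).read() for p in ["report_q1.txt", "report_q2.txt"])
cache = client.caches.create(
    model="gemini-2.5-flash",  # assumed model id
    config=types.CreateCachedContentConfig(
        contents=[static_docs],
        ttl="3600s",  # keep the cache alive for an hour
    ),
)

# 2. Reuse the cache across many queries; only the new question is sent in full.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Compare revenue growth between the two reports.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)

# 3. Clean up when the project no longer needs the stored data.
client.caches.delete(name=cache.name)
```

Explicit caches typically incur a storage cost while they are alive, so a short TTL or an explicit delete, as in the last step, helps keep the manual setup from eroding the savings.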