Latest news with #EpochAI
Yahoo
07-06-2025
- Science
- Yahoo
Inside the Secret Meeting Where Mathematicians Struggled to Outsmart AI
On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world's most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group's members faced off in a showdown with a 'reasoning' chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world's hardest solvable problems. 'I have colleagues who literally said these models are approaching mathematical genius,' says Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting. The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex problems in math than traditional LLMs. To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, to come up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which were dissimilar to those they had been trained on, the most successful were able to solve less than 2 percent, showing these LLMs lacked the ability to reason. But o4-mini would prove to be very different. [Sign up for Today in Science, a free daily newsletter] Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions over varying tiers of difficulty, with the first three tiers covering undergraduate-, graduate- and research-level challenges. By February 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: 100 questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them. The mathematicians who participated had to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset. The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would finalize the final batch of challenge questions. Ono split the 30 attendees into groups of six. For two days, the academics competed against themselves to devise problems that they could solve but would trip up the AI reasoning bot. Each problem the o4-mini couldn't solve would garner the mathematician who came up with it a $7,500 reward. By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group's progress. 'I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem,' he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler 'toy' version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. 'It was starting to get really cheeky,' says Ono, who is also a freelance mathematical consultant for Epoch AI. 'And at the end, it says, 'No citation necessary because the mystery number was computed by me!'' Defeated, Ono jumped onto Signal early that Sunday morning and alerted the rest of the participants. 'I was not prepared to be contending with an LLM like this,' he says, 'I've never seen that kind of reasoning before in models. That's what a scientist does. That's frightening.' Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a 'strong collaborator.' Yang Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, 'This is what a very, very good graduate student would be doing—in fact, more.' The bot was also much faster than a professional mathematician, taking mere minutes to do what it would take such a human expert weeks or months to complete. While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that the o4-mini's results might be trusted too much. 'There's proof by induction, proof by contradiction, and then proof by intimidation,' He says. 'If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.' By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable 'tier five'—questions that even the best mathematicians couldn't solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning-bots to help them discover new mathematical truths, much the same as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be a key in keeping mathematics going for future generations. 'I've been telling my colleagues that it's a grave mistake to say that generalized artificial intelligence will never come, [that] it's just a computer,' Ono says. 'I don't want to add to the hysteria, but in many ways these large language models are already outperforming most of our best graduate students in the world.'


Scientific American
06-06-2025
- Science
- Scientific American
At Secret Math Meeting, Researchers Struggle to Outsmart AI
On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world's most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group's members faced off in a showdown with a 'reasoning' chatbot that was tasked with solving problems the had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world's hardest solvable problems. 'I have colleagues who literally said these models are approaching mathematical genius,' says Ken Ono, a mathematician at the University of Virginia, who attended the meeting. The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex problems in math than traditional LLMs. To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, to come up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which they hadn't previously been trained on, the most successful were able to solve less than 2 percent, showing these LLMs lacked the ability to reason. But o4-mini would prove to be very different. On supporting science journalism If you're enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today. Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D. to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions over varying tiers of difficulty, with the first three tiers covering, undergraduate-, graduate- and research-level challenges. By February 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: 100 questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them. The mathematicians who participated had to sign a nondisclosure agreement to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset. The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would find the final 10 challenge questions. The meeting was headed by Ono, who split the 30 attendees into groups of six. For two days, the academics competed against themselves to devise problems that they could solve but would trip up the AI reasoning bot. Any problems the o4-mini couldn't solve would garner the mathematician who came up with them a $7,500 reward. By the end of that Saturday night, Ono was frustrated with the team's lack of progress. 'I came up with a problem which everyone in my field knows to be an open question in number theory—a good Ph.D.-level problem,' he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler 'toy' version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. 'It was starting to get really cheeky,' says Ono, who is also a freelance mathematical consultant for Epoch AI. 'And at the end, it says, 'No citation necessary because the mystery number was computed by me!'' Defeated, Ono jumped onto Signal that night and alerted the rest of the participants. 'I was not prepared to be contending with an LLM like this,' he says, 'I've never seen that kind of reasoning before in models. That's what a scientist does. That's frightening.' Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a 'strong collaborator.' Yang Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, 'This is what a very, very good graduate student would be doing—in fact, more.' The bot was also much faster than a professional mathematician, taking mere minutes to do what it would take such a human expert weeks or months to complete. While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that the o4-mini's results might be trusted too much. 'There's proof by induction, proof by contradiction, and then proof by intimidation,' He says. 'If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.' By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable 'tier five'—questions that even the best mathematicians couldn't solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning-bots to help them discover new mathematical truths, much the same as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be a key in keeping mathematics going for future generations. 'I've been telling my colleagues that it's a grave mistake to say that generalized artificial intelligence will never come, [that] it's just a computer,' Ono says. 'I don't want to add to the hysteria, but these large language models are already outperforming most of our best graduate students in the world.'
Yahoo
13-05-2025
- Business
- Yahoo
Improvements in 'reasoning' AI models may slow down soon, analysis finds
An analysis by Epoch AI, a nonprofit AI research institute, suggests the AI industry may not be able to eke massive performance gains out of reasoning AI models for much longer. As soon as within a year, progress from reasoning models could slow down, according to the report's findings. Reasoning models such as OpenAI's o3 have led to substantial gains on AI benchmarks in recent months, particularly benchmarks measuring math and programming skills. The models can apply more computing to problems, which can improve their performance, with the downside being that they take longer than conventional models to complete tasks. Reasoning models are developed by first training a conventional model on a massive amount of data, then applying a technique called reinforcement learning, which effectively gives the model "feedback" on its solutions to difficult problems. So far, frontier AI labs like OpenAI haven't applied an enormous amount of computing power to the reinforcement learning stage of reasoning model training, according to Epoch. That's changing. OpenAI has said that it applied around 10x more computing to train o3 than its predecessor, o1, and Epoch speculates that most of this computing was devoted to reinforcement learning. And OpenAI researcher Dan Roberts recently revealed that the company's future plans call for prioritizing reinforcement learning to use far more computing power, even more than for the initial model training. But there's still an upper bound to how much computing can be applied to reinforcement learning, per Epoch. Josh You, an analyst at Epoch and the author of the analysis, explains that performance gains from standard AI model training are currently quadrupling every year, while performance gains from reinforcement learning are growing tenfold every 3-5 months. The progress of reasoning training will "probably converge with the overall frontier by 2026," he continues. Epoch's analysis makes a number of assumptions, and draws in part on public comments from AI company executives. But it also makes the case that scaling reasoning models may prove to be challenging for reasons besides computing, including high overhead costs for research. "If there's a persistent overhead cost required for research, reasoning models might not scale as far as expected," writes You. "Rapid compute scaling is potentially a very important ingredient in reasoning model progress, so it's worth tracking this closely." Any indication that reasoning models may reach some sort of limit in the near future is likely to worry the AI industry, which has invested enormous resources developing these types of models. Already, studies have shown that reasoning models, which can be incredibly expensive to run, have serious flaws, like a tendency to hallucinate more than certain conventional models. This article originally appeared on TechCrunch at Error in retrieving data Sign in to access your portfolio Error in retrieving data Error in retrieving data Error in retrieving data Error in retrieving data


TechCrunch
12-05-2025
- Business
- TechCrunch
Improvements in ‘reasoning' AI models may slow down soon, analysis finds
An analysis by Epoch AI, a nonprofit AI research institute, suggests the AI industry may not be able to eke massive performance gains out of reasoning AI models for much longer. As soon as within a year, progress from reasoning models could slow down, according to the report's findings. Reasoning models such as OpenAI's o3 have led to substantial gains on AI benchmarks in recent months, particularly benchmarks measuring math and programming skills. The models can apply more computing to problems, which can improve their performance, with the downside being that they take longer than conventional models to complete tasks. Reasoning models are developed by first training a conventional model on a massive amount of data, then applying a technique called reinforcement learning, which effectively gives the model 'feedback' on its solutions to difficult problems. So far, frontier AI labs like OpenAI haven't applied an enormous amount of computing power to the reinforcement learning stage of reasoning model training, according to Epoch. That's changing. OpenAI has said that it applied around 10x more computing to train o3 than its predecessor, o1, and Epoch speculates that most of this computing was devoted to reinforcement learning. And OpenAI researcher Dan Roberts recently revealed that the company's future plans call for prioritizing reinforcement learning to use far more computing power, even more than for the initial model training. But there's still an upper bound to how much computing can be applied to reinforcement learning, per Epoch. According to an Epoch AI analysis, reasoning model training scaling may slow down. Image Credits:Epoch AI Josh You, an analyst at Epoch and the author of the analysis, explains that performance gains from standard AI model training are currently quadrupling every year, while performance gains from reinforcement learning are growing tenfold every 3-5 months. The progress of reasoning training will 'probably converge with the overall frontier by 2026,' he continues. Techcrunch event Exhibit at TechCrunch Sessions: AI Secure your spot at TC Sessions: AI and show 1,200+ decision-makers what you've built — without the big spend. Available through May 9 or while tables last. Exhibit at TechCrunch Sessions: AI Secure your spot at TC Sessions: AI and show 1,200+ decision-makers what you've built — without the big spend. Available through May 9 or while tables last. Berkeley, CA | BOOK NOW Epoch's analysis makes a number of assumptions, and draws in part on public comments from AI company executives. But it also makes the case that scaling reasoning models may prove to be challenging for reasons besides computing, including high overhead costs for research. 'If there's a persistent overhead cost required for research, reasoning models might not scale as far as expected,' writes You. 'Rapid compute scaling is potentially a very important ingredient in reasoning model progress, so it's worth tracking this closely.' Any indication that reasoning models may reach some sort of limit in the near future is likely to worry the AI industry, which has invested enormous resources developing these types of models. Already, studies have shown that reasoning models, which can be incredibly expensive to run, have serious flaws, like a tendency to hallucinate more than certain conventional models.
Yahoo
24-04-2025
- Business
- Yahoo
AI supercomputers are getting bigger each year — and one study projects they could need as much power as a city by 2030
The future of AI depends on building more powerful supercomputers. New research suggests how gargantuan these supercomputers could be by the end of the decade. Epoch AI projects that a top supercomputer in 2030 could need as much power needed for up to 9 million homes. Silicon Valley needs bigger and better supercomputers to get bigger and better AI. We just got an idea of what they might look like by the end of the decade. A new study published this week by researchers at the San Francisco-based institute Epoch AI said supercomputers — massive systems stacked with chips to train and run AI models — could need the equivalent of nine nuclear reactors by 2030 to keep them chugging along. Epoch AI estimates that if supercomputer power requirements continue to roughly double each year, as they have done since 2019, the top machines would need about 9GW of power. That's about the amount needed to keep the lights on in a city of roughly 7 to 9 million homes. Today's most powerful supercomputer needs around 300MW of power, which is "equivalent to about 250,000 households." This makes the potential power needs of future supercomputers look extraordinary. There are a few reasons the next generation of computing seems so demanding. One explanation is that they will simply be bigger. According to Epoch AI's paper, the leading AI supercomputer in 2030 could require 2 million AI chips and cost $200 billion to build — again, assuming that current growth trends continue. For context, today's largest supercomputer — the Colossus system, built to full scale within 214 days by Elon Musk's xAI — is estimated to have cost $7 billion to make, and, per the company's website, is stacked with 200,000 chips. Companies have been looking to secure more chips to provide the computing power needed for increasingly powerful models as they race toward developing AI that surpasses human intelligence. OpenAI, for instance, started the year with a huge supercomputer announcement as it unveiled Stargate, a project worth over $500 billion in investment over four years aimed at building critical AI infrastructure that includes a "computing system." Epoch AI explains this growth by stating that where once supercomputers were used just as research tools, they're now being used as "industrial machines delivering economic value." Having AI and supercomputers deliver economic value isn't just a priority for CEOs trying to justify exorbitant capital expenditures, either. Earlier this month, President Donald Trump took to Truth Social to celebrate a $500 billion investment from Nvidia to build AI supercomputers in the US. It's "big and exciting news," he said, branding the announcement a commitment to "the Golden Age of America." However, as Epoch AI's research suggests — research based on a dataset that covers "about 10% of all relevant AI chips produced in 2023 and 2024 and about 15% of the chip stocks of the largest companies at the start of 2025" — this would all come with greater power demands. Epoch AI did note that "AI supercomputers are improving in energy efficiency, but the shift is not quickly enough to offset overall power growth." It also explains why companies like Microsoft, Google, and others have been looking to nuclear power as an alternative to their energy needs. If the AI trend continues to grow, expect supercomputers to keep growing with it. Read the original article on Business Insider Sign in to access your portfolio