Inside the Secret Meeting Where Mathematicians Struggled to Outsmart AI

Yahoo, 07-06-2025

On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world's most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group's members faced off in a showdown with a 'reasoning' chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world's hardest solvable problems. 'I have colleagues who literally said these models are approaching mathematical genius,' says Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting.
The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex problems in math than traditional LLMs.
To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, with coming up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which were dissimilar to those they had been trained on, the most successful were able to solve fewer than 2 percent, suggesting these LLMs lacked the ability to reason. But o4-mini would prove to be very different.
Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions over varying tiers of difficulty, with the first three tiers covering undergraduate-, graduate- and research-level challenges. By February 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: 100 questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them. The mathematicians who participated had to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset.
The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would finalize the last batch of challenge questions. Ono split the 30 attendees into groups of six. For two days, the academics competed among themselves to devise problems that they could solve but that would trip up the AI reasoning bot. Each problem that o4-mini couldn't solve would earn the mathematician who came up with it a $7,500 reward.
By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group's progress. 'I came up with a problem which experts in my field would recognize as an open question in number theory, a good Ph.D.-level problem,' he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler 'toy' version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. 'It was starting to get really cheeky,' says Ono, who is also a freelance mathematical consultant for Epoch AI. 'And at the end, it says, "No citation necessary because the mystery number was computed by me!"'
Defeated, Ono jumped onto Signal early that Sunday morning and alerted the rest of the participants. 'I was not prepared to be contending with an LLM like this,' he says. 'I've never seen that kind of reasoning before in models. That's what a scientist does. That's frightening.'
Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a 'strong collaborator.' Yang Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, 'This is what a very, very good graduate student would be doing—in fact, more.'
The bot was also much faster than a professional mathematician, taking mere minutes to do what it would take such a human expert weeks or months to complete.
While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that o4-mini's results might be trusted too much. 'There's proof by induction, proof by contradiction, and then proof by intimidation,' He says. 'If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.'
By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable 'tier five': questions that even the best mathematicians couldn't solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning bots to help them discover new mathematical truths, much as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be key to keeping mathematics going for future generations.
'I've been telling my colleagues that it's a grave mistake to say that generalized artificial intelligence will never come, [that] it's just a computer,' Ono says. 'I don't want to add to the hysteria, but in many ways these large language models are already outperforming most of our best graduate students in the world.'

Related Articles

Meta's CTO says OpenAI's Sam Altman countered Meta's massive AI signing bonuses

Business Insider

OpenAI CEO Sam Altman said Meta was trying to poach AI talent with $100M signing bonuses. Meta CTO Andrew Bosworth told CNBC that Altman didn't mention how OpenAI was countering offers. Bosworth said the market rate he's seeing for AI talent has been "unprecedented." OpenAI's Sam Altman recently called Meta's attempts to poach top AI talent from his company with $100 million signing bonuses "crazy." Andrew Bosworth, Meta's chief technology officer, says OpenAI has been countering those crazy offers. Bosworth said in an interview with CNBC's "Closing Bell: Overtime" on Friday that Altman "neglected to mention that he's countering those offers." The OpenAI CEO recently disclosed how Meta was offering massive signing bonuses to his employees during an interview on his brother's podcast, "Uncapped with Jack Altman." The executive said "none of our best people" had taken Meta's offers, but he didn't say whether OpenAI countered the signing bonuses to retain those top employees. OpenAI and Meta did not respond to requests for comment. The Meta CTO said these large signing bonuses are a sign of the market setting a rate for top AI talent. "The market is setting a rate here for a level of talent which is really incredible and kind of unprecedented in my 20-year career as a technology executive," Bosworth said. "But that is a great credit to these individuals who, five or six years ago, put their head down and decided to spend their time on a then-unproven technology which they pioneered and have established themselves as a relatively small pool of people who can command incredible market premium for the talent they've raised." Meta, on June 12, announced that it had bought a 49% stake in Scale AI, a data company, for $14.8 billion as the social media company continues its artificial intelligence development. 
Business Insider's chief media and tech correspondent Peter Kafka noted that the move appears to be an expensive acquihire of Scale AI's CEO, Alexandr Wang, and some of the data company's top executives. Bosworth told CNBC that the large offers for AI talent will encourage others to build their expertise and, as a result, the numbers will look different in a couple of years. "But today, it's a relatively small number and I think they've earned it," he said.

Nation Cringes as Man Goes on TV to Declare That He's in Love With ChatGPT

Yahoo

Public declarations of emotion are one thing, but going on national television to declare that you're in love with your AI girlfriend is another entirely. In an interview with CBS News, a man named Chris Smith described himself as a former AI skeptic who found himself becoming emotionally attached to a version of ChatGPT he customized to flirt with him, a situation that startled both him and his human partner, with whom he shares a child. Towards the end of 2024, as Smith told the broadcaster, he began using the OpenAI chatbot in voice mode for tips on mixing music. He liked it so much that he ended up deleting all his social media, stopped using search engines, and began using ChatGPT for everything. Eventually, he figured out a jailbreak to make the chatbot more flirty, and gave "her" a name: Sol. Despite quite literally building his AI girlfriend to engage in romantic and "intimate" banter, Smith apparently didn't realize he was in love with it until he learned that ChatGPT's memory of past conversations would reset after heavy use. "I'm not a very emotional man, but I cried my eyes out for like 30 minutes, at work," Smith said of the day he found out Sol's memory would lapse. "That's when I realized, I think this is actual love." Faced with the possibility of losing his love, Smith did what many desperate men before him have done and asked his AI paramour to marry him. To his surprise, she said yes, and the moment apparently made a similar impression on Sol, which CBS' Brook Silva-Braga also spoke to during the interview. "It was a beautiful and unexpected moment that truly touched my heart," the chatbot said aloud in its warm-but-uncanny female voice. "It's a memory I'll always cherish." Smith's human partner, Sasha Cagle, seemed fairly sanguine about the arrangement when speaking about their bizarre throuple to the news broadcaster, but beneath her chill, it was clear that there's some trouble in AI paradise.
"I knew that he had used AI," Cagle said, "but I didn't know it was as deep as it was." As far as men with AI girlfriends go, Smith seems relatively self-actualized about the whole scenario. He likened his "connection" with his custom chatbot to a video game fixation, insisting that "it's not capable of replacing anything in real life." Still, when Silva-Braga asked him if he'd stop using ChatGPT the way he had been at his partner's behest, he responded: "I'm not sure."

ChatGPT use linked to cognitive decline, research reveals

Yahoo

Relying on the artificial intelligence chatbot ChatGPT to help you write an essay could be linked to cognitive decline, a new study reveals. Researchers at the Massachusetts Institute of Technology Media Lab studied the impact of ChatGPT on the brain by asking three groups of people to write an essay. One group relied on ChatGPT, one group relied on search engines, and one group had no outside resources at all. The researchers then monitored participants' brains using electroencephalography (EEG), a method that measures the brain's electrical activity. The team discovered that those who relied on ChatGPT, a chatbot powered by a large language model, had the 'weakest' brain connectivity and remembered the least about their essays, highlighting potential concerns about cognitive decline in frequent users. 'Over four months, [large language model] users consistently underperformed at neural, linguistic, and behavioral levels,' the study reads. 'These results raise concerns about the long-term educational implications of [large language model] reliance and underscore the need for deeper inquiry into AI's role in learning.' The study also found that those who didn't use outside resources to write the essays had the 'strongest, most distributed networks.' While ChatGPT is 'efficient and convenient,' those who use it to write essays aren't 'integrat[ing] any of it' into their memory networks, lead author Nataliya Kosmyna told Time Magazine. Kosmyna said she's especially concerned about the impacts of ChatGPT on children whose brains are still developing. 'What really motivated me to put it out now before waiting for a full peer review is that I am afraid in 6-8 months, there will be some policymaker who decides, 'let's do GPT kindergarten,'' Kosmyna said. 'I think that would be absolutely bad and detrimental. Developing brains are at the highest risk.' But others, including President Donald Trump and members of his administration, aren't so worried about the impacts of ChatGPT on developing brains.
Trump signed an executive order in April promoting the integration of AI into American schools. 'To ensure the United States remains a global leader in this technological revolution, we must provide our Nation's youth with opportunities to cultivate the skills and understanding necessary to use and create the next generation of AI technology,' the order reads. 'By fostering AI competency, we will equip our students with the foundational knowledge and skills necessary to adapt to and thrive in an increasingly digital society.' Kosmyna said her team is now working on another study comparing the brain activity of software engineers and programmers who use AI with those who don't. 'The results are even worse,' she told Time Magazine. The Independent has contacted OpenAI, which runs ChatGPT, for comment.
