
ChatGPT Beats Claude, Google's Gemini, DeepSeek In Test Of AI Agents
Rating AI agents including ChatGPT's o3, Claude from Anthropic, and Google's Gemini on web search tasks
ChatGPT's recent o3 AI model beat Anthropic's Claude, Google's Gemini, and Hangzhou-based DeepSeek in a test of AI agents for web research. But there's still a considerable gap between human capabilities and the best AI agents.
Research firm FutureSearch put 11 major large language models through 89 messy, real-world research tasks and evaluated each model on its ability to find original sources, seek out data, gather evidence, compile data, and validate claims.
The highest score achieved was 0.51, on a scale where an estimated 'perfect' agent would score about 0.8. In other words, even the best AI agents available now are relatively easily outperformed by humans.
'We can conclude that frontier agents … substantially underperform smart generalist researchers who are given ample time,' the study says.
Here's how they scored the various AI models:
Still, AI agents are rapidly improving. Based on the year-old GPT-4 Turbo's score of 0.27, researchers say that 'about 45% of the gap between smart generalist researchers and frontier agents' was closed within a year of development.
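The 45% figure can be reproduced from the scores quoted here, but only under an assumption: the article doesn't say which score stands in for smart generalist researchers, so this sketch plugs in the roughly 0.8 'perfect agent' estimate as the reference point. The `gap_closed` helper is ours, not FutureSearch's.

```python
def gap_closed(old: float, new: float, target: float) -> float:
    """Fraction of the (target - old) gap closed by moving from old to new."""
    return (new - old) / (target - old)

GPT4_TURBO = 0.27  # year-old baseline score, per the article
O3 = 0.51          # best current score, per the article
REFERENCE = 0.80   # assumption: the ~0.8 "perfect agent" estimate as the target

print(f"{gap_closed(GPT4_TURBO, O3, REFERENCE):.0%}")  # 45%
```

With a different reference score for human researchers, the percentage would shift accordingly.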
Also, free or cheap agents such as DeepSeek are not that far behind the paid, top-end AI agents from OpenAI. OpenAI's o3 leads the pack, with Claude and Gemini close behind. For now, closed models are clearly superior for research-heavy tasks, but free and open-source models are increasingly capable.
All LLM-based AI agents still have major issues, however. They fall short of smart human researchers, especially on strategic planning, thoroughness, evaluating the quality of sources, and 'memory management': they tend to forget earlier findings mid-task. A particular problem is that AI agents often engage in 'satisficing,' or settling for an answer that is good enough instead of continuing to search until they find the best available one.
That's a core reason why ChatGPT's o3 model came in first: it tended to validate its answers more thoroughly and less often stopped short of better available answers.
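The difference between satisficing and optimizing can be shown with a toy search over candidate answers. This is a hypothetical illustration, not a description of how any of the tested agents works: the candidate sources and their quality scores are invented, `satisfice` stops at the first answer that clears a threshold, and `optimize` checks every option before choosing.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (answer, quality score between 0 and 1)
Candidate = Tuple[str, float]

def satisfice(candidates: List[Candidate], good_enough: float) -> Optional[Candidate]:
    """Return the first candidate whose quality clears the threshold."""
    for cand in candidates:
        if cand[1] >= good_enough:
            return cand
    return None

def optimize(candidates: List[Candidate]) -> Optional[Candidate]:
    """Examine every candidate and keep the highest-quality one."""
    return max(candidates, key=lambda c: c[1], default=None)

# Invented example: the best source is found last
sources = [
    ("secondary blog post", 0.6),
    ("news summary", 0.7),
    ("original government dataset", 0.95),
]

print(satisfice(sources, good_enough=0.6))  # ('secondary blog post', 0.6)
print(optimize(sources))                    # ('original government dataset', 0.95)
```

The satisficing search does less work but settles for the blog post; only the exhaustive search reaches the original source.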
Since a single year closed almost half the gap between elite humans and the best AI agents, it may not be long before AI agents outperform even the best humans.
However, given ChatGPT's recent problems with a model update that made it too agreeable, it's clear that the path to improvement isn't a straight line.
For now, at least, it remains essential to double-check any results from generative AI tools such as AI agents to make sure they're accurate.

Related Articles
Yahoo
Could Bitcoin Actually Hit $200,000 Before 2026?
Bitcoin could nearly double, to $200,000, before the end of this year. It will still be a good investment if it misses that mark. Institutional investors are the ones driving its pricing for the moment.
Bitcoin (CRYPTO: BTC) trades for about $105,000 (as of June 19), yet credible analysts are mapping a route to its price surpassing $200,000 by the end of 2025. For reference, a 90% price gain to $200,000 would raise Bitcoin's market cap to nearly $4 trillion. That target looks unduly aggressive only if you ignore two simple forces: a sharply lower trickle of new coins and sharply higher institutional demand. Both are already affecting the coin's price. Let's see what the numbers actually say and why the forecast for $200,000 isn't unreasonable at all.
As always, understanding Bitcoin's supply and demand dynamics is the first step to appreciating how responsive its price is likely to be to what buyers are bidding for it. Roughly every four years, the Bitcoin network halves the block reward, cutting the flow of fresh coins. The most recent halving, on April 20, 2024, cut total annual new issuance from roughly 328,500 coins to about 164,000. With 19.9 million coins already mined out of a maximum of 21 million, new supply now grows only about 0.8% per year. The next halving, expected around April 2028, will constrain supply even further. Most market participants already know this, which gives potential buyers a significant incentive to secure their allocation sooner rather than later.
The tiny drip of new supply is already meeting a hungry horde of demand. Bitcoin exchange-traded funds (ETFs) have hauled in more than $46 billion cumulatively, including a six-day streak of $1.8 billion in mid-June.
Those funds, institutional investors, and publicly traded companies together now command about 6% of the coin's total circulating supply. At today's price, that capital removes roughly 360,000 coins from the public float, equivalent to more than two years of issuance at the current block reward. If inflows simply persist at half their recent pace, the available supply could tighten by another 2% to 3% before 2026. A shrinking float usually forces prices significantly higher, because willing sellers dry up faster than willing buyers.
In other words, crypto market euphoria is not a precondition for Bitcoin to soar. The only ingredient needed is buyers willing to convert fiat currency into ETF shares a bit faster than miners can create fresh coins. Right now, that speed differential is widening, so conditions are ripe for the price to squeeze upward.
While supply dynamics explain why the crypto's price can rise, macro tailwinds explain why demand might keep accelerating. On that front, U.S. core inflation cooled in May to its lowest reading since 2023. The Federal Reserve has held its benchmark interest rate steady since March, and many investors expect the Fed to cut rates a bit before next year. Lower real yields could make a scarce, non-yielding asset like Bitcoin more attractive.
Separately, regulatory clarity is also improving abroad, which will create more institutional buyers. The European Union's Markets in Crypto-Assets (MiCA) framework began licensing major exchanges in mid-June, opening a harmonized 27-nation market. Clear guidelines reduce regulatory risk and invite European pension funds and other institutional investors, many of which had waited on the sidelines, to buy in.
Nonetheless, the path to $200,000 is not necessarily a straight shot, given current geopolitical and economic instability and major uncertainty in U.S. trade policy. A surprise liquidity crunch, perhaps sparked by a geopolitical shock or a renewed tariff-driven inflation spike, could dull risk appetite and force some selling, temporarily damaging sentiment toward the coin. Political risk matters, too. U.S. lawmakers still debate crypto taxation and custody rules, and a hostile bill could freeze ETF creation or raise costs, muting demand.
Assuming no severe shock, however, the chances of Bitcoin surpassing $200,000 before 2026 look realistic, if perhaps a bit ambitious. If ETFs absorb another $50 billion worth of coins by late 2025, at an average cost basis of $105,000 they would remove roughly 475,000 additional coins from circulation.
The good news for investors is that it doesn't really matter whether Bitcoin passes an arbitrary price target by an arbitrary date. Since the biggest upside for holders is over the long term, not the near term, the smartest move is simply to buy the coin and commit to holding it.
Alex Carchidi has positions in Bitcoin. The Motley Fool has positions in and recommends Bitcoin. The Motley Fool has a disclosure policy. Could Bitcoin Actually Hit $200,000 Before 2026? was originally published by The Motley Fool
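The supply arithmetic in the Bitcoin article can be reproduced in a few lines. This sketch rests on assumptions of ours, not the article's: exactly 144 blocks per day (the protocol's roughly 10-minute block target) and a 365-day year. The block rewards of 6.25 BTC before and 3.125 BTC after the April 2024 halving are protocol values; the helper names are hypothetical.

```python
# Back-of-the-envelope checks on the article's Bitcoin supply figures.
# Assumption: exactly 144 blocks/day (~10-minute target) and a 365-day year.
BLOCKS_PER_DAY = 144
DAYS_PER_YEAR = 365

def annual_issuance(block_reward: float) -> float:
    """New coins minted per year at a given block reward."""
    return block_reward * BLOCKS_PER_DAY * DAYS_PER_YEAR

def coins_bought(inflow_usd: float, avg_price_usd: float) -> float:
    """Coins removed from the float by a dollar inflow at an average price."""
    return inflow_usd / avg_price_usd

pre_halving = annual_issuance(6.25)    # reward before April 2024
post_halving = annual_issuance(3.125)  # reward after April 2024
mined = 19.9e6                         # coins already mined, per the article

growth_rate = post_halving / mined
years_of_issuance = 360_000 / post_halving     # the article's 360,000-coin figure
extra_etf_coins = coins_bought(50e9, 105_000)  # another $50B at $105,000
market_cap_200k = mined * 200_000

print(f"issuance before/after halving: {pre_halving:,.0f} / {post_halving:,.0f} BTC per year")
print(f"annual supply growth: {growth_rate:.2%}")
print(f"360,000 coins = {years_of_issuance:.1f} years of issuance")
print(f"$50B at $105,000 buys {extra_etf_coins:,.0f} coins")
print(f"market cap at $200,000: ${market_cap_200k / 1e12:.2f} trillion")
```

Run as written, it reports 164,250 coins per year after the halving, supply growth of roughly 0.83%, about 2.2 years of issuance for 360,000 coins, and roughly 476,000 coins for a further $50 billion.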
Yahoo
ETH Drops 8% in Flash Crash, Recovers After Buyers Step In
Ether (ETH) experienced a sharp flash crash during the 21:00 hour on June 21, falling 7.56% from $2,406 to $2,224, according to CoinDesk Research's technical analysis model. The sudden price drop triggered heavy trading, with more than 751,000 ETH changing hands, nearly five times the average hourly volume. Despite the steep decline, buyer interest surged around the $2,250 level, helping the asset recover to $2,292.
During the hour following the crash, ETH rose 0.19% from $2,287.54 to $2,291.92. A volume spike at 05:58 accompanied a 0.31% price gain on 7,314 ETH, establishing a new support zone near $2,290. The price action that followed formed an ascending channel with higher lows, signaling increased buyer engagement as conditions stabilized.
Technical Analysis Highlights
- ETH dropped 7.56% from $2,406 to $2,224 during the 21:00 hour on June 21.
- Trading volume spiked to over 751,000 ETH, nearly five times the typical hourly average.
- At 05:58, ETH rose 0.31% from $2,283.94 to $2,291.09 on 7,314 ETH volume.
- Price action formed an ascending channel with higher lows after the crash.
- A new support zone formed around $2,290, with resistance tested at $2,297 between 06:17 and 06:20.
- Volume remained elevated during the recovery, indicating improved liquidity.
Parts of this article were generated with the assistance of AI tools and reviewed by our editorial team to ensure accuracy and adherence to our standards. For more information, see CoinDesk's full AI Policy.
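The percentage moves in the ETH report can be recomputed from its own start and end prices. A minimal Python sketch; `pct_change` is our own helper, and the prices are the ones quoted in the report.

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100

# (start, end) prices quoted in the report
moves = {
    "21:00 flash crash": (2406.00, 2224.00),
    "hour after crash": (2287.54, 2291.92),
    "05:58 volume spike": (2283.94, 2291.09),
}

for label, (start, end) in moves.items():
    print(f"{label}: {pct_change(start, end):+.2f}%")
```

Recomputed this way, the 21:00 crash comes to about -7.56% and the following hour to about +0.19%, while the 05:58 move from $2,283.94 to $2,291.09 works out to about +0.31%.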
Yahoo
Prediction: Shiba Inu Will Be Worth This Much in 5 Years
Shiba Inu delivered incredible returns to early investors, but it doesn't offer much utility. Although meme coins can be fun, they usually don't make good long-term investments. I expect Shiba Inu to lose another 25% to 50% during the next five years.
You won't find many investments with more early success than Shiba Inu (CRYPTO: SHIB). During 2020 and 2021, it gained more than 17,000,000%. The lucky few who got in (and out) at the right time are set for life.
Since then, Shiba Inu has mostly lost value, outside of a few smaller bull runs. The current price is $0.000011 as of June 20, down 87% from the all-time high in 2021 and 45% on the year. At that price, $100 is enough to buy well over 8 million of the tokens.
Several popular cryptocurrencies have done well over the last year, including Bitcoin (up 59%), XRP (up 343%), and Cardano (up 57%). And investors are excited about the future, considering the Trump administration has been crypto-friendly so far. Could it be Shiba Inu's turn to go on a run? Anything's possible, but I would be cautious about investing in this meme coin.
Meme coins normally aren't intended to solve real problems or create lasting value. People create them for fun or to hopefully strike it rich. That was the case with Shiba Inu, which started as a joke. It didn't have a white paper; it had a "woof paper" with quotes from Miyamoto Musashi, a 17th-century Japanese swordsman. The anonymous founder, Ryoshi, even sent about half of the total token supply to Ethereum (CRYPTO: ETH) co-founder Vitalik Buterin in 2021 as a publicity stunt. From the beginning, Shiba Inu wasn't a serious project.
It does have some use cases, in all fairness. The team behind it has built a Shiba Inu ecosystem with decentralized finance (DeFi) applications, including a decentralized exchange, ShibaSwap. But hardly anyone is using this blockchain ecosystem.
The total value locked (TVL) on Shibarium, Shiba Inu's blockchain, is $2 million as of June 20. TVL is the amount of funds deposited onto a blockchain, so it's a good measure of a blockchain's popularity. Ethereum, the current king of DeFi, has $62.6 billion in TVL. Shiba Inu doesn't even crack the top 100 cryptocurrencies by this metric. Shiba Inu's lack of utility means there isn't much of a reason to buy it except in hopes that the price will go up.
It's rare to see meme coins come anywhere close to their initial success. The novelty simply wears off. People want to buy the next Shiba Inu or Dogecoin, not the old version that has already had its moment.
Shiba Inu appears to be at that stage. The number of daily active addresses has been declining and is in the 3,000 to 4,000 range this month, according to Santiment. In other words, a few thousand wallets per day are using Shiba Inu. At its peak in 2021, it had over 60,000 daily active addresses.
The humorous nature of meme coins can also impede growth. Some retail investors may put in $10 to $100 for fun, but few will risk a sizable investment, and institutional money probably isn't going to start pouring into Shiba Inu.
My prediction for Shiba Inu is that it will lose 25% to 50% of its value over the next five years. That would put its price between $0.000006 and $0.000009. Shiba Inu is highly volatile, and it could have periods where it does well. I don't think it will have a straight, steady decline. It will have some ups and some downs, but ultimately be a losing investment for those who buy and hold.
This is still one of the largest cryptocurrencies in the world, and it has passionate supporters. It probably won't fall off the map completely. But as the highs get further in the rearview mirror, fewer and fewer people will buy Shiba Inu.
Lyle Daly has positions in Ethereum. The Motley Fool has positions in and recommends Ethereum. The Motley Fool has a disclosure policy. Prediction: Shiba Inu Will Be Worth This Much in 5 Years was originally published by The Motley Fool