May 23, 2025
From Dr Google to AI: Why Moravec's Paradox Still Matters
Remember the eye-rolling days when patients arrived armed with printouts from 'Dr Google' — a jumble of chat-forum anecdotes, WebMD symptom lists, and the occasional PubMed abstract? That DIY differential diagnosis was more often noise than knowledge, but it signaled something profound: Anyone with Wi-Fi could enter the diagnostic dialogue. Around the same time, another cautionary tale landed in radiology — the image of a gorilla hidden in lung CTs that most experts missed, reminding us that attention is finite and perception fallible.
Both stories set the stage for Moravec's paradox: Computers excel at the heavy cognitive lifting we struggle with (data synthesis, instant recall), but cannot mimic the human art of empathy, context, and creativity. In a recent NEJM AI Perspective, my co-author Scott Penberthy and I argued that acknowledging this asymmetry is the key to collaborative intelligence, not clinician replacement.
Enter Google DeepMind's AMIE (Articulate Medical Intelligence Explorer) — a large language model agent that doesn't just search but also converses, asks follow-up questions, interprets images, and even scores higher than physicians on empathy scales.
Has 'Dr Google' been reborn with a medical degree in Silicon Valley? Let's dive in and find out.
AMIE, Then and Now
AMIE was introduced in an April 2025 Nature study in which it out-diagnosed board-certified primary care physicians across 159 simulated cases. The machine's edge came from exactly the tasks humans find hardest: perfect recall, tireless synthesis, and accurate arithmetic. More surprising, it also showed off a newer ability, scoring higher than the physicians on 25 of 26 empathy metrics.
Vision-Enabled Leap
On May 1, Google DeepMind unveiled multimodal AMIE, powered by Gemini 2.x. The upgraded agent now asks for lab slips, ECGs, or skin photos mid-chat, interprets them on the fly, and incorporates the findings into its differential.
In a 105-case virtual objective structured clinical examination (OSCE), the upgraded AMIE equaled or surpassed primary care physicians on image interpretation and top-3 diagnostic accuracy, up 6 percentage points over the earlier build: in nearly two-thirds of patient encounters, the correct diagnosis appeared somewhere among the AI's first three guesses on its differential list. It's a pragmatic yardstick, good enough to flag most of the right answers without demanding single-shot perfection.
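For readers who want that yardstick made concrete: top-3 accuracy simply checks whether the ground-truth diagnosis lands anywhere in the model's first three ranked guesses. A minimal Python sketch (the cases and differentials below are invented for illustration, not drawn from the study):

```python
def top_k_accuracy(cases, k=3):
    """Fraction of cases where the true diagnosis appears in the model's top-k differential."""
    hits = sum(1 for truth, differential in cases if truth in differential[:k])
    return hits / len(cases)

# Hypothetical encounters: (ground-truth diagnosis, model's ranked differential)
cases = [
    ("pulmonary embolism", ["pneumonia", "pulmonary embolism", "pleurisy"]),
    ("cellulitis",         ["cellulitis", "DVT", "stasis dermatitis"]),
    ("gout",               ["septic arthritis", "pseudogout", "trauma"]),  # a miss
]

print(f"Top-3 accuracy: {top_k_accuracy(cases):.0%}")  # -> 67%, i.e., 2 of 3
```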
Multimodal AMIE can also mimic empathy, all while maintaining human-level hallucination rates (its occasional detours from reality are on par with our own).
Such machines are edging into jobs once parked squarely in the 'easy for humans' column: they know when to ask for that extra image or lab, stitch every clue into a tidy differential, and explain the findings in plain language (Table).
Table. Moravec's Matrix: 2025 Edition

Hard for Humans, Easy for AI | Hard for AI, Easy for Humans
Multimodal data synthesis (EHR + derm photo + ECG trace) | Reading nonverbal cues on video/in-person
Rapid trial-eligibility scanning | Physical dexterity (phlebotomy, palpation)
Ultrafast literature retrieval and real-time drug–drug checks | Explaining uncertainty and negotiating goals of care
Automated EHR summarization and coding | Cultural context, creative problem-solving
Adapted from Loaiza-Bonilla A, Penberthy S. NEJM AI. April 24, 2025.
Case in Point: Decentralized Trials
Our NEJM AI paper details how AI flips the 'clinical-trial enrollment paradox.' By matching inclusion and exclusion criteria against millions of records in seconds, AI shortens recruitment timelines and makes studies more accessible to community sites that once lacked the manpower for manual screening. It's Moravec's paradox in action.
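At its core, the screening step is structured filtering at scale; what AI adds is parsing free-text criteria and messy records into rules like these. A toy Python sketch of the deterministic skeleton (the fields, thresholds, and trial rules are hypothetical, not from our paper):

```python
from dataclasses import dataclass

@dataclass
class Patient:
    age: int
    diagnosis: str
    ecog: int            # performance status, 0 (fit) to 4
    prior_lines: int     # prior lines of therapy

# Hypothetical trial criteria: all inclusion rules must pass, all exclusions must fail.
inclusion = [
    lambda p: p.age >= 18,
    lambda p: p.diagnosis == "NSCLC",
    lambda p: p.ecog <= 1,
]
exclusion = [
    lambda p: p.prior_lines > 2,
]

def is_eligible(p: Patient) -> bool:
    return all(rule(p) for rule in inclusion) and not any(rule(p) for rule in exclusion)

records = [Patient(64, "NSCLC", 1, 1), Patient(71, "NSCLC", 3, 0), Patient(55, "CRC", 0, 0)]
matches = [p for p in records if is_eligible(p)]
print(f"{len(matches)} of {len(records)} records pass prescreening")  # -> 1 of 3
```

Run against millions of records instead of three, this is the screening that once consumed a research coordinator's week.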
Guardrails: Bias, Validation, Reality Checks
The creators of multimodal AMIE warn that even with Gemini 2.5 gains, chat-based OSCEs do not adequately represent real clinics: there are no facial expressions, no auscultation, and no tactile exams. Their prospective study with Beth Israel Deaconess Medical Center is exactly the kind of external validation every health-system chief information officer should demand.
Despite these advances, if we want to harness clinical AI effectively, we must keep the following in mind:
Algorithmic bias: Blind spots in training data widen inequities; audited fairness metrics must be mandatory.
External validation: Models proven only in silico (purely simulated) wilt in heterogeneous clinics; multi-institutional trials are the new gold standard.
AI governance: Multidisciplinary safety boards must monitor postdeployment drift and hallucination rates. Even a rock-solid model can slide off course after launch as disease patterns shift or new drugs hit the market. AI requires continuous monitoring and periodic tune-ups to keep it clinically honest (a minimal monitoring sketch follows this list).
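To make that last point concrete, here is one way such monitoring could look in code: a minimal sketch that compares each subgroup's recent accuracy against a launch baseline and flags drift. The baseline, tolerance, group names, and counts are all invented for illustration:

```python
BASELINE_ACCURACY = 0.85   # hypothetical validation accuracy at launch
DRIFT_TOLERANCE   = 0.05   # flag any subgroup that slips more than 5 points

# Hypothetical postdeployment results: subgroup -> (correct, total) over the last month
recent = {
    "English-speaking": (430, 500),
    "Spanish-speaking": (150, 200),
    "Limited literacy": (70, 100),
}

for group, (correct, total) in recent.items():
    acc = correct / total
    if acc < BASELINE_ACCURACY - DRIFT_TOLERANCE:
        print(f"ALERT {group}: accuracy {acc:.2f} drifted below baseline {BASELINE_ACCURACY:.2f}")
    else:
        print(f"OK    {group}: accuracy {acc:.2f}")
```

Reporting per subgroup rather than in aggregate is the point: a model can hold its overall accuracy while quietly failing one population, which is exactly the inequity an audit should surface.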
Here's a playbook for clinicians to ensure collaborative intelligence, not physician replacement, when using AI.
Pilot, measure, iterate: Join controlled rollouts of conversational AI, with metrics on accuracy, satisfaction, and bias.
Audit for equity: Check that your population's language, literacy, and socioeconomic mix is reflected in training and evaluation.
Master AI literacy: Learn prompt engineering and failure modes so you can override the machine when intuition disagrees.
Champion human strengths: Double down on empathy, cultural competence, and dexterity; these are the tasks robots still fumble with.
Seeing Past the Next Gorilla
AMIE shows that large language models can talk medicine almost as well as they see tumors, and with vision-enabled agents already interpreting rashes and ECGs mid-chat, that 'conversation' swiftly becomes like a mini-physical exam. If we deploy these tools responsibly — auditing for bias, validating in a messy reality, and preserving our uniquely human gifts — we won't just catch invisible gorillas, we'll transform the diagnostic conversation itself.
As large language models continue to evolve, I would want them to handle all my prior authorization approvals and our convoluted health system as deftly as they navigate medical data. How cool would that be?
I'd love to hear your thoughts on the evolving partnership between AI and medicine. Feel free to reach out. Let's keep the conversation going, and keep Moravec's paradox relevant.