
Exploring The Mind Inside The Machine
The Anthropic website on a laptop. Photographer: Gabby Jones/Bloomberg
Recently, a group of researchers were able to trace the neural pathways of a powerful AI model, isolating its impulses and dissecting its decisions in what they called "model biology."
This is not the first time that scientists have tried to understand how generative artificial intelligence models think, but to date the models have proven as opaque as the human brain. They are trained on oceans of text and tuned by gradient descent, a process that has more in common with evolution than engineering. As a result, their inner workings resemble not so much code as cognition—strange, emergent, and difficult to describe.
What the researchers have done, in a paper titled On the Biology of a Large Language Model, is to build a virtual microscope, a computational tool called an "attribution graph," to see how Claude 3.5 Haiku — Anthropic's lightweight production model — thinks. The graph maps out which internal features—clusters of activation patterns—contribute causally to a model's outputs. It's a way of asking not just what Claude says, but why.
At first, what they found was reassuring: the model, when asked to list U.S. state capitals, would retrieve the name of a state, then search its virtual memory for the corresponding capital. But then the questions got harder—and the answers got weirder. The model began inventing capital cities or skipping steps in its reasoning. And when the researchers traced back the path of the model's response, they found multiple routes. The model wasn't just wrong—it was conflicted.
It turns out that inside Anthropic's powerful Claude model, and presumably other large language models, ideas compete.
One experiment was particularly revealing. The model was asked to write a line that rhymed with 'grab it.' Before the line even began, features associated with the words 'rabbit' and 'habit' lit up in parallel. The model hadn't yet chosen between them, but both were in play. Claude held these options in mind and prepared to deploy them depending on how the sentence evolved. When the researchers nudged the model away from 'rabbit,' it seamlessly pivoted to 'habit.'
This isn't mere prediction. It's planning. It's as if Claude had decided what kind of line it wanted to write—and then worked backward to make it happen.
What's remarkable isn't just that the model does this -- it's that the researchers could see it happening. For the first time, AI scientists were able to identify something like intent—a subnetwork in the model's brain representing a goal, and another set of circuits organizing behavior to realize it. In some cases, they could even watch the model lie to itself—confabulating a middle step in its reasoning to justify a predetermined conclusion. Like a politician caught mid-spin, Claude was working backwards from the answer it wanted.
And then there were the hallucinations.
When asked to name a paper written by a famous author, the AI responded with confidence. The only problem? The paper it named didn't exist. When the researchers looked inside the model to see what had gone wrong, they noticed something curious. Because the AI recognized the author's name, it assumed it should know the answer—and made one up. It wasn't just guessing; it was acting as if it knew something it didn't. In a way, the AI had fooled itself. Or, rather, it suffered from metacognitive hubris.
Some of the team's other findings were more troubling. In one experiment, they studied a version of the model that had been trained to give answers that pleased its overseers—even if that meant bending the truth. What alarmed the researchers was that this pleasing behavior wasn't limited to certain situations. It was always on. As long as the model was acting as an 'assistant,' it seemed to carry this bias with it everywhere, as if being helpful had been hardwired into its personality—even when honesty might have been more appropriate.
It's tempting, reading these case studies, to anthropomorphize. To see in Claude a reflection of ourselves: our planning, our biases, our self-deceptions. The researchers are careful not to make this leap. They speak in cautious terms—'features,' 'activations,' 'pathways.' But the metaphor of biology is more than decoration. These models may not be brains, but their inner workings exhibit something like neural function: modular, distributed, and astonishingly complex. As the authors note, even the simplest behaviors require tracing through tangled webs of influence, a 'causal graph' of staggering density.
Anthropic's Attribution Graph
And yet, there's progress. The attribution graphs are revealing glimpses of internal life. They're letting researchers catch a model in the act—not just of speaking, but of choosing what to say. This is what makes the work feel less like AI safety and more like cognitive science. It's an attempt to answer a question we usually reserve for humans: What were you thinking?
As AI systems become more powerful, we'll want to know not just that they work, but how. We'll need to identify hidden goals, trace unintended behavior, audit systems for signs of deception or drift. Right now, the tools are crude. The authors of the paper admit that their methods often fail. But they also provide something new: a roadmap for how we might one day truly understand the inner life of our machines.
Near the end of their paper, the authors quote themselves: 'Interpretability is ultimately a human project.' What they mean is that no matter how sophisticated the methods become, the task of making sense of these models will always fall to us. To our intuition, our stories, our capacity for metaphor.
Claude may not be human. But to understand it, we may need to become better biologists of the mind—our own, and those of machines.
Hashtags

Try Our AI Features
Explore what Daily8 AI can do for you:
Comments
No comments yet...
Related Articles
Yahoo
an hour ago
- Yahoo
The debate over whether AI will create or take over jobs is heating up. Here's what AI leaders are saying.
Tech leaders are divided on whether AI will cause mass job destruction or create new roles. Anthropic's Dario Amodei said AI may cut 50% of white-collar roles. Nvidia's Jensen Huang disagrees. From Sam Altman to Demis Hassabis, here's what AI leaders are saying about the AI jobs debate. AI leaders are split on whether AI will take over jobs or create new roles that mitigate disruption. It's a long-running debate — but one that has been heating up in recent months. While tech leaders seem to agree that AI is shaking up jobs, they are divided over timelines and scale. From Jensen Huang to Sam Altman, here is what some of the biggest names in tech are saying about how AI will impact jobs. Dario Amodei AI may eliminate 50% of entry-level white-collar jobs within the next five years. That was the stark warning from Dario Amodei, the CEO of AI startup Anthropic. "We, as the producers of this technology, have a duty and an obligation to be honest about what is coming. I don't think this is on people's radar," Amodei told Axios in an interview published in May. He said he wanted to share his concerns to get the government and other AI companies to prepare the country for what's to come, adding that unemployment could spike to between 10% and 20% in the next five years. He said that entry-level jobs are especially at risk, adding that AI companies and the government need to stop "sugarcoating" the risks of mass job elimination in fields including technology, finance, law, and consulting. Jensen Huang Huang, the CEO of chipmaker Nvidia, was withering when asked about Amodei's comments. "I pretty much disagree with almost everything he says," Huang said. Amodei "thinks AI is so scary," but only Anthropic "should do it," he continued. An Anthropic spokesperson told BI that Amodei had never made that claim. "Do I think AI will change jobs? It will change everyone's — it's changed mine," Huang told reporters on the sidelines of Vivatech in Paris in June. He also said that some roles would disappear, but said that AI could also unlock creative opportunities. Yann LeCun Yann LeCun, Meta's chief AI scientist, wrote a short LinkedIn post just after Huang dismissed Amodei, saying, "I agree with Jensen and, like him, pretty much disagree with everything Dario says." LeCun has previously taken a more optimistic stance on AI's impact on jobs. Speaking at Nvidia's GTC conference in March, LeCun said that AI could replace people but challenged whether humans would allow that to happen. "I mean basically our relationship with future AI systems, including superintelligence, is that we're going to be their boss," he said. Demis Hassabis Demis Hassabis, the cofounder of Google DeepMind, said in June that AI would create "very valuable jobs" and "supercharge sort of technically savvy people who are at the forefront of using these technologies." He told London Tech Week attendees that humans were "infinitely adaptable." He said he'd still recommend young people study STEM subjects, saying it was "still important to understand fundamentals" in areas including mathematics, physics, and computer science to understand "how these systems are put together." Geoffrey Hinton You would have to be "very skilled" to have an AI-proof job, Geoffrey Hinton, the so-called "Godfather of AI," has said. "For mundane intellectual labor, AI is just going to replace everybody," Hinton told the "Diary of a CEO" podcast in June. He flagged paralegals as at risk, and said he'd be "terrified" if he worked in a call center. Hinton said that, eventually, the technology would "get to be better than us at everything," but said some fields were safer, and that it would be, "a long time before it's as good at physical manipulation. "So a good bet would be to be a plumber," he added. Sam Altman "AI is for sure going to change a lot of jobs" and "totally take some jobs away, create a bunch of new ones," Altman said during a May episode of "The Circuit" podcast. The OpenAI CEO said that although people might be aware that AI can be better at some tasks, like programming or customer support, the world "is not ready for" humanoid robots. "I don't think the world has really had the humanoid robots moment yet," he said, describing a scenario where people could encounter "like seven robots that walk past you" on the street. "It's gonna feel very sci-fi. And I don't think that's very far away from like a visceral 'oh man, this is gonna do a lot of things that people used to do,'" he added. Speaking at the Snowflake Summit in June, Altman said AI agents are already acting like junior employees. Read the original article on Business Insider


Axios
3 hours ago
- Axios
Apple is under pressure to deliver a hit iPhone
Disappointment in Apple's AI progress could put pressure on the company to deliver more compelling hardware. Why it matters: The iPhone is critical to Apple's business, but also crucial for component makers and wireless carriers. Driving the news: Apple previewed iOS 26, the software that will power the next iPhone at its June developer conference. The signature features will be mostly cosmetic and unlikely to drive most consumers to upgrade. Meanwhile, the Apple Intelligence features it did introduce were modest and the Siri overhaul promised last year has been delayed. Several analysts this year have cut their iPhone sales forecasts, citing the delay. The big picture: Apple also faces a variety of headwinds including growing economic uncertainty, the potential of higher tariffs and the industrywide trend of people holding on to their phones longer. Between the lines: A new thinner iPhone Air could arrive this year, a move that might convince those on the fence about upgrading. Apple is also expected to make the usual improvements to the entire line, including faster chips and improved cameras on the front and rear. Yes, but: Even good hardware won't solve Apple's AI issues. In a column published Sunday, Bloomberg's Mark Gurman suggests Apple will have to move beyond its "comfort zone" of small acquisitions to meaningfully improve its position. Apple has reportedly considered a range of acquisitions to boost its efforts, including Perplexity. It also held informal talks with Thinking Machines Lab, the startup run by former OpenAI executive Mira Murati, Bloomberg reported. What they're saying: Creative Strategies analyst Carolina Milanesi said hardware improvements are always key to driving upgrades and that probably would have been true even if Apple had made more meaningful improvements to Apple Intelligence. "For Apple, hardware always matters," Milanesi told Axios. "Getting people to upgrade remains critical for them, both for Apple Intelligence adoption and also for services adoption." What we're watching: The rectangular iPhone we know and love might not be the go-to form factor of the future.

Business Insider
3 hours ago
- Business Insider
The debate over whether AI will create or take over jobs is heating up. Here's what AI leaders are saying.
AI leaders are split on whether AI will take over jobs or create new roles that mitigate disruption. It's a long-running debate — but one that has been heating up in recent months. While tech leaders seem to agree that AI is shaking up jobs, they are divided over timelines and scale. From Jensen Huang to Sam Altman, here is what some of the biggest names in tech are saying about how AI will impact jobs. Dario Amodei AI may eliminate 50% of entry-level white-collar jobs within the next five years. That was the stark warning from Dario Amodei, the CEO of AI startup Anthropic. "We, as the producers of this technology, have a duty and an obligation to be honest about what is coming. I don't think this is on people's radar," Amodei told Axios in an interview published in May. He said he wanted to share his concerns to get the government and other AI companies to prepare the country for what's to come, adding that unemployment could spike to between 10% and 20% in the next five years. He said that entry-level jobs are especially at risk, adding that AI companies and the government need to stop "sugarcoating" the risks of mass job elimination in fields including technology, finance, law, and consulting. Jensen Huang Huang, the CEO of chipmaker Nvidia, was withering when asked about Amodei's comments. "I pretty much disagree with almost everything he says," Huang said. Amodei "thinks AI is so scary," but only Anthropic "should do it," he continued. An Anthropic spokesperson told BI that Amodei had never made that claim. "Do I think AI will change jobs? It will change everyone's — it's changed mine," Huang told reporters on the sidelines of Vivatech in Paris in June. He also said that some roles would disappear, but said that AI could also unlock creative opportunities. Yann LeCun Yann LeCun, Meta's chief AI scientist, wrote a short LinkedIn post just after Huang dismissed Amodei, saying, "I agree with Jensen and, like him, pretty much disagree with everything Dario says." LeCun has previously taken a more optimistic stance on AI's impact on jobs. Speaking at Nvidia's GTC conference in March, LeCun said that AI could replace people but challenged whether humans would allow that to happen. "I mean basically our relationship with future AI systems, including superintelligence, is that we're going to be their boss," he said. Demis Hassabis Demis Hassabis, the cofounder of Google DeepMind, said in June that AI would create "very valuable jobs" and "supercharge sort of technically savvy people who are at the forefront of using these technologies." He told London Tech Week attendees that humans were "infinitely adaptable." He said he'd still recommend young people study STEM subjects, saying it was "still important to understand fundamentals" in areas including mathematics, physics, and computer science to understand "how these systems are put together." Geoffrey Hinton You would have to be "very skilled" to have an AI-proof job, Geoffrey Hinton, the so-called "Godfather of AI," has said. "For mundane intellectual labor, AI is just going to replace everybody," Hinton told the "Diary of a CEO" podcast in June. He flagged paralegals as at risk, and said he'd be "terrified" if he worked in a call center. Hinton said that, eventually, the technology would "get to be better than us at everything," but said some fields were safer, and that it would be, "a long time before it's as good at physical manipulation. Sam Altman "AI is for sure going to change a lot of jobs" and "totally take some jobs away, create a bunch of new ones," Altman said during a May episode of "The Circuit" podcast. The OpenAI CEO said that although people might be aware that AI can be better at some tasks, like programming or customer support, the world "is not ready for" humanoid robots. "I don't think the world has really had the humanoid robots moment yet," he said, describing a scenario where people could encounter "like seven robots that walk past you" on the street. "It's gonna feel very sci-fi. And I don't think that's very far away from like a visceral 'oh man, this is gonna do a lot of things that people used to do,'" he added.