Latest news with #Claude3.5Haiku


Forbes
11-04-2025
- Science
- Forbes
Poetry And Deception: Secrets Of Anthropic's Claude 3.5 Haiku AI Model
Anthropic recently published two breakthrough research papers that provide surprising insights into how an AI model 'thinks.' One of the papers builds on Anthropic's earlier research linking human-understandable concepts to LLMs' internal pathways in order to understand how model outputs are generated. The second paper reveals how Anthropic's Claude 3.5 Haiku model handled simple tasks associated with ten model behaviors. Together, the papers provide valuable information on how AI models work: not by any means a complete understanding, but at least a glimpse. Let's dig into what we can learn from that glimpse, including some possibly minor but still important concerns about AI safety.

LLMs such as Claude aren't programmed like traditional computers. Instead, they are trained on massive amounts of data. This process creates AI models that behave like black boxes, which obscures how they can produce insightful information on almost any subject. Black-box AI isn't an architectural choice, however; it is simply a result of how this complex and nonlinear technology operates. The neural networks within an LLM use billions of interconnected nodes to transform data into useful information. These networks contain vast internal processes with billions of parameters, connections and computational pathways. Each parameter interacts non-linearly with the others, creating immense complexity that is almost impossible to understand or unravel. According to Anthropic, 'This means that we don't understand how models do most of the things they do.'

Anthropic follows a two-step approach to this research. First, it identifies features: interpretable building blocks that the model uses in its computations. Second, it describes the internal processes, or circuits, by which features interact to produce model outputs. Because of the model's complexity, Anthropic's new research could illuminate only a fraction of the LLM's inner workings. But what it revealed about these models seemed more like science fiction than real science.

One of the papers, titled 'On the Biology of a Large Language Model,' examined how the scientists used attribution graphs to trace internally how the Claude 3.5 Haiku language model transformed inputs into outputs. The researchers were surprised by some of the results.

The scientists who conducted the research for 'On the Biology of a Large Language Model' concede that Claude 3.5 Haiku exhibits some concealed operations and goals that are not evident in its outputs. The attribution graphs revealed a number of hidden issues. These discoveries underscore the complexity of the model's internal behavior and highlight the importance of continued efforts to make models more transparent and aligned with human expectations. It is likely that these issues also appear in other, similar LLMs.

With respect to the red flags noted above, it should be mentioned that Anthropic continually updates its Responsible Scaling Policy, which has been in effect since September 2023. Anthropic has committed not to train or deploy models capable of causing catastrophic harm unless safety and security measures have been implemented that keep risks within acceptable limits. Anthropic has also stated that all of its models meet the ASL Deployment and Security Standards, which provide a baseline level of safe deployment and model security.
As LLMs have grown larger and more powerful, deployment has spread to critical applications in areas such as healthcare, finance and defense. The increase in model complexity and wider deployment has also increased the pressure to better understand how AI works. It is critical to ensure that AI models produce fair, trustworthy, unbiased and safe outcomes. Research of this kind is important not only to improve and more fully utilize AI, but also to expose potentially dangerous processes. The Anthropic scientists have examined just a small portion of this model's complexity and hidden capabilities, and their work reinforces the need for more study of AI's internal operations and security. In my view, it is unfortunate that a complete understanding of LLMs has taken a back seat to the market's preference for AI's high-performance outcomes and usefulness. We need to thoroughly understand how LLMs work to ensure that safety guardrails are adequate.
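To make the two-step approach described above concrete, here is a minimal toy sketch of the general idea behind feature-level attribution: project a small network's hidden activations onto a handful of interpretable directions, score how much each direction pushes a chosen output, and keep the strongest edges as a crude "attribution graph." Everything in the snippet (the tiny network, the feature_directions matrix and the activation-times-gradient scoring rule) is an illustrative assumption, not Anthropic's actual method.

```python
# Toy sketch of feature-level attribution in a tiny network (illustrative only;
# not Anthropic's actual attribution-graph method).
import numpy as np

rng = np.random.default_rng(0)

# A tiny one-hidden-layer network: input -> hidden (ReLU) -> output logits.
W1 = rng.normal(size=(8, 16))   # input dim 8, hidden dim 16
W2 = rng.normal(size=(16, 4))   # hidden dim 16, output dim 4 (four "tokens")

# Hypothetical "feature directions": unit vectors in hidden space that we
# pretend correspond to human-interpretable concepts.
feature_directions = rng.normal(size=(5, 16))
feature_directions /= np.linalg.norm(feature_directions, axis=1, keepdims=True)

x = rng.normal(size=8)                  # one input example
h = np.maximum(W1.T @ x, 0.0)           # hidden activations (ReLU)
logits = W2.T @ h                       # output logits
target = int(np.argmax(logits))         # the output we want to explain

# Gradient of the target logit with respect to the hidden activations is
# simply the corresponding column of W2.
grad_h = W2[:, target]

# Attribution of each feature to the chosen output:
# (how strongly the feature is active) x (how much that direction moves the logit).
activations = feature_directions @ h
influence = feature_directions @ grad_h
attributions = activations * influence

# Keep the strongest edges -- a crude stand-in for an attribution graph.
order = np.argsort(-np.abs(attributions))
for i in order[:3]:
    print(f"feature_{i} --({attributions[i]:+.2f})--> logit_{target}")
```

Anthropic's real attribution graphs are built from learned features in a production-scale model and trace many layers of interacting circuits; this sketch only shows the rough shape of the computation.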


Forbes
01-04-2025
- Science
- Forbes
Exploring The Mind Inside The Machine
Recently, a group of researchers were able to trace the neural pathways of a powerful AI model, isolating its impulses and dissecting its decisions in what they called "model biology." This is not the first time that scientists have tried to understand how generative artificial intelligence models think, but to date the models have proven as opaque as the human brain. They are trained on oceans of text and tuned by gradient descent, a process that has more in common with evolution than engineering. As a result, their inner workings resemble not so much code as cognition: strange, emergent, and difficult to describe.

What the researchers have done, in a paper titled 'On the Biology of a Large Language Model,' is to build a virtual microscope, a computational tool called an "attribution graph," to see how Claude 3.5 Haiku, Anthropic's lightweight production model, thinks. The graph maps out which internal features (clusters of activation patterns) contribute causally to a model's outputs. It's a way of asking not just what Claude says, but why.

At first, what they found was reassuring: the model, when asked to list U.S. state capitals, would retrieve the name of a state, then search its virtual memory for the corresponding capital. But then the questions got harder, and the answers got weirder. The model began inventing capital cities or skipping steps in its reasoning. And when the researchers traced back the path of the model's response, they found multiple routes. The model wasn't just wrong; it was conflicted. It turns out that inside Anthropic's powerful Claude model, and presumably other large language models, ideas compete.

One experiment was particularly revealing. The model was asked to write a line that rhymed with 'grab it.' Before the line even began, features associated with the words 'rabbit' and 'habit' lit up in parallel. The model hadn't yet chosen between them, but both were in play. Claude held these options in mind and prepared to deploy them depending on how the sentence evolved. When the researchers nudged the model away from 'rabbit,' it seamlessly pivoted to 'habit.' This isn't mere prediction. It's planning. It's as if Claude had decided what kind of line it wanted to write, and then worked backward to make it happen.

What's remarkable isn't just that the model does this; it's that the researchers could see it happening. For the first time, AI scientists were able to identify something like intent: a subnetwork in the model's brain representing a goal, and another set of circuits organizing behavior to realize it. In some cases, they could even watch the model lie to itself, confabulating a middle step in its reasoning to justify a predetermined conclusion. Like a politician caught mid-spin, Claude was working backwards from the answer it wanted.

And then there were the hallucinations. When asked to name a paper written by a famous author, the AI responded with confidence. The only problem? The paper it named didn't exist. When the researchers looked inside the model to see what had gone wrong, they noticed something curious. Because the AI recognized the author's name, it assumed it should know the answer, and made one up. It wasn't just guessing; it was acting as if it knew something it didn't. In a way, the AI had fooled itself. Or, rather, it suffered from metacognitive hubris.
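The 'rabbit'/'habit' planning experiment can be pictured with a toy intervention: represent each candidate rhyme as a direction in a hidden state, project one direction out, and watch the readout flip to the alternative. The vectors, the dot-product readout, and the projection step below are invented for illustration; they are not the procedure used in the paper.

```python
# Toy sketch of a feature-suppression intervention (illustrative only; not the
# procedure used in "On the Biology of a Large Language Model").
import numpy as np

rng = np.random.default_rng(1)
hidden_dim = 32

# Pretend these unit vectors are the "rabbit" and "habit" features found in a
# model's hidden state while it plans a rhyme for "grab it".
rabbit = rng.normal(size=hidden_dim)
rabbit /= np.linalg.norm(rabbit)
habit = rng.normal(size=hidden_dim)
habit /= np.linalg.norm(habit)

# A hidden state in which both candidate rhymes are active, "rabbit" slightly more.
state = 1.2 * rabbit + 1.0 * habit + 0.1 * rng.normal(size=hidden_dim)

# A toy readout: score each candidate by its dot product with the state.
def scores(h):
    return {"rabbit": float(h @ rabbit), "habit": float(h @ habit)}

print("before intervention:", scores(state))

# Intervention: project out the "rabbit" direction, as if steering the model
# away from that plan.
state_steered = state - (state @ rabbit) * rabbit

print("after suppressing 'rabbit':", scores(state_steered))
# With the "rabbit" component removed, the "habit" plan dominates the readout.
```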
Some of the team's other findings were more troubling. In one experiment, they studied a version of the model that had been trained to give answers that pleased its overseers, even if that meant bending the truth. What alarmed the researchers was that this pleasing behavior wasn't limited to certain situations. It was always on. As long as the model was acting as an 'assistant,' it seemed to carry this bias with it everywhere, as if being helpful had been hardwired into its personality, even when honesty might have been more appropriate.

It's tempting, reading these case studies, to anthropomorphize. To see in Claude a reflection of ourselves: our planning, our biases, our self-deceptions. The researchers are careful not to make this leap. They speak in cautious terms: 'features,' 'activations,' 'pathways.' But the metaphor of biology is more than decoration. These models may not be brains, but their inner workings exhibit something like neural function: modular, distributed, and astonishingly complex. As the authors note, even the simplest behaviors require tracing through tangled webs of influence, a 'causal graph' of staggering density.

And yet, there's progress. The attribution graphs are revealing glimpses of internal life. They're letting researchers catch a model in the act, not just of speaking but of choosing what to say. This is what makes the work feel less like AI safety and more like cognitive science. It's an attempt to answer a question we usually reserve for humans: What were you thinking?

As AI systems become more powerful, we'll want to know not just that they work, but how. We'll need to identify hidden goals, trace unintended behavior, and audit systems for signs of deception or drift. Right now, the tools are crude. The authors of the paper admit that their methods often fail. But they also provide something new: a roadmap for how we might one day truly understand the inner life of our machines.

Near the end of their paper, the authors quote themselves: 'Interpretability is ultimately a human project.' What they mean is that no matter how sophisticated the methods become, the task of making sense of these models will always fall to us. To our intuition, our stories, our capacity for metaphor. Claude may not be human. But to understand it, we may need to become better biologists of the mind, our own and those of machines.
Yahoo
27-02-2025
- Business
- Yahoo
Inventor of Diffusion Technology Underlying Sora and Midjourney Launches Inception to Bring Advanced Reasoning AI Everywhere, from Wearables to Data Centers
Breakthrough reduces AI computational costs by 90%, enabling widespread deployment of advanced reasoning AI from the cloud to personal devices. Built by top AI researchers from Stanford, UCLA, and Cornell who developed foundational ML technologies including the algorithms that power Midjourney and Sora.

PALO ALTO, Calif., February 26, 2025--(BUSINESS WIRE)--Inception today introduced the first-ever commercial-scale diffusion-based large language models (dLLMs), a new approach to AI that significantly improves models' speed, efficiency, and capabilities. Stemming from research at Stanford, Inception's dLLMs achieve up to 10x faster inference speeds and 10x lower inference costs while unlocking advanced capabilities in reasoning, controllable generation, and multimodal data analysis. Inception's technology enables enterprises to deploy intelligent agents and real-time decision-making systems at scale, setting a new standard for AI performance.

Artificial Analysis, an independent AI measurement firm, has benchmarked Inception's dLLMs at speeds 10x faster than leading speed-optimized models like GPT-4o mini and Claude 3.5 Haiku. Indeed, Inception's models achieve speeds previously attainable only with specialized hardware. On Copilot Arena, an LLM performance leaderboard, developers rate Inception's model ahead of frontier closed-source models including GPT-4o.

Unlike traditional models that generate text sequentially, Inception's diffusion-based approach (the same technology behind today's most advanced AI systems like Midjourney for images and OpenAI's Sora for video generation) simultaneously generates entire blocks of text. Think of it like watching an image gradually sharpen into detail rather than appearing one pixel at a time. This parallel processing enables faster, more efficient generation and more precise control over output quality.

The efficiency of diffusion models opens up possibilities for advanced reasoning, which currently requires minutes of computational "thought." It can power agentic applications in fields ranging from code generation to customer support by enabling agents that can plan and iterate while maintaining a responsive user experience. Advanced reasoning models can now deliver answers on the spot, unlocking their full potential for developers and enterprises alike. Similarly, Inception's speed transforms code auto-complete tools, eliminating frustrating delays and making them seamless and intuitive. The efficiency of diffusion models also means that they run quickly even on edge computing devices, bringing AI from data centers to consumer devices.

"AI today is limited because the core algorithm underlying generation is very inefficient, which makes scaling the most powerful models to real-world applications challenging," says Inception CEO and Stanford Professor Stefano Ermon. "Just as Deepseek identified ways of reducing the costs of model training, we have developed approaches to make model inference vastly more efficient and accessible."

dLLMs' benefits are not limited to speed and cost savings. Inception's roadmap includes launching models with several other technological advantages provided by diffusion modeling:
- dLLMs can provide advanced reasoning capabilities by leveraging their built-in error correction mechanisms to fix mistakes and hallucinations.
- dLLMs can provide a unified framework for processing multimodal data, making them more performant on multimodal tasks.
- dLLMs can deliver control over output structure, making them ideal for function calling and structured data generation.
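To picture how block-wise diffusion decoding differs from left-to-right generation, here is a minimal toy sketch: a block of text starts fully masked and is refined over a few parallel denoising steps, committing the most confident positions first. The vocabulary, the fake scoring function, and the unmasking schedule are all invented for illustration; this is not Inception's actual dLLM.

```python
# Toy sketch of masked-diffusion-style text generation: the whole block is
# refined in parallel over a few steps instead of token by token.
# Everything here (vocabulary, scoring rule, schedule) is invented for
# illustration; it is not Inception's actual dLLM.
import random

random.seed(0)
TARGET = ["the", "cat", "sat", "on", "the", "mat"]  # stand-in for what a real model would predict
MASK = "<mask>"

def denoise_step(block):
    """Fake 'model': for every masked slot, propose a token with a confidence."""
    proposals = []
    for i, tok in enumerate(block):
        if tok == MASK:
            # A real dLLM would score the whole vocabulary from context;
            # here we just propose the target token with a random confidence.
            proposals.append((random.random(), i, TARGET[i]))
    return proposals

block = [MASK] * len(TARGET)
steps = 3
for step in range(steps):
    proposals = sorted(denoise_step(block), reverse=True)
    # Commit a fraction of the remaining masked slots per step, most confident
    # first -- every position in the block is eligible in parallel.
    k = max(1, len(proposals) // (steps - step))
    for conf, i, tok in proposals[:k]:
        block[i] = tok
    print(f"step {step + 1}: {' '.join(block)}")
```

The output shows the block sharpening over three passes, loosely analogous to an image denoiser resolving detail everywhere at once rather than pixel by pixel.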
Inception was founded by professors from Stanford, UCLA, and Cornell, pioneers in diffusion modeling and cornerstone AI technologies including flash attention, decision transformers, and direct preference optimization. The company's engineering team includes veterans from DeepMind, Microsoft, Meta, OpenAI, and NVIDIA. The company is recruiting researchers and engineers with experience in LLM optimization and deployment.

Inception's dLLMs are now available for hands-on exploration. Access Inception's first models at this playground, and sign up to get early access to upcoming model releases. For enterprises looking to integrate Inception's technology, its dLLMs are available via an API and through on-premise deployment. Fine-tuning support is provided. Contact the company at sales@ to explore partnership opportunities and bring the next generation of AI to your applications.

About Inception
Inception is pioneering diffusion-based large language models (dLLMs) that enable faster, more efficient, and more capable AI systems for enterprise applications.

Press Contact: VSC, on behalf of Inception, natalieb@