Latest news with #Deepgram


Business Wire
16-06-2025
- Business
- Business Wire
Deepgram Launches Voice Agent API: World's Only Enterprise-Ready, Real-Time, and Cost-Effective Conversational AI API
SAN FRANCISCO--(BUSINESS WIRE)-- Deepgram, the leading voice AI platform for enterprise use cases, today announced the general availability (GA) of its Voice Agent API, a single, unified voice-to-voice interface that gives developers full control to build context-aware voice agents that power natural, responsive conversations. Combining speech-to-text, text-to-speech, and large language model (LLM) orchestration with contextualized conversational logic into a unified architecture, the Voice Agent API gives developers the choice of using Deepgram's fully integrated stack (leveraging industry-leading Nova-3 STT and Aura-2 TTS models) or bringing their own LLM and TTS models. It delivers the simplicity developers love and the controllability enterprises need to deploy real-time, intelligent voice agents at scale. Today, companies like Aircall, Jack in the Box, StreamIt, and OpenPhone are building voice agents with Deepgram to save costs, reduce wait times, and increase customer loyalty.

In today's market, teams building voice agents are often forced to choose between two extremes: rigid, low-code platforms that lack customization, or DIY toolchains that require stitching together STT, TTS, and LLMs with significant engineering effort. Deepgram's Voice Agent API eliminates this tradeoff by providing a unified API that simplifies development without sacrificing control. Developers can build faster with less complexity, while enterprises retain full control over orchestration, deployment, and model behavior, without compromising on performance or reliability.

'The future of customer engagement is voice-first,' said Scott Stephenson, CEO of Deepgram. 'But most voice systems today are rigid, fragmented, or too slow. 
With our Voice Agent API, we're giving developers a powerful yet simple interface to build conversational agents that feel natural, respond instantly, and scale across use cases without compromise.'

'We believe the future of customer communication is intelligent, seamless, and deeply human—and that's the vision behind Aircall's AI Voice Agent,' said Scott Chancellor, Chief Executive Officer of Aircall. 'To bring it to life, we needed a partner who could match our ambition, and Deepgram delivered. Their advanced Voice Agent API enabled us to build fast without compromising accuracy or reliability. From managing mid-sentence interruptions to enabling natural, human-like conversations, their service performed with precision. Just as importantly, their collaborative approach helped us iterate quickly and push the boundaries of what voice intelligence can deliver in modern business communications.'

'We believe that integrating AI voice agents will be one of the most impactful initiatives for our business operations over the next five years, driving unparalleled efficiency and elevating the quality of our service,' said Doug Cook, CTO of Jack in the Box. 'Deepgram is a leader in the industry and will be a strategic partner as we embark on this transformative journey.'

Developer Simplicity and Faster Time to Market

For teams taking the DIY route, the challenge isn't just connecting models but also building and operating the entire runtime layer that makes real-time conversations work. Teams must manage live audio streaming, accurately detect when a user has finished speaking, coordinate model responses, handle mid-sentence interruptions, and maintain a natural conversational cadence. While some platforms offer partial orchestration features, most APIs do not provide a fully integrated runtime. As a result, developers are often left to manage streaming, session state, and coordination logic across fragmented services, which adds complexity and delays time to production. 
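The runtime coordination a DIY stack must manage can be pictured as a small state machine. The sketch below is illustrative only; it is not Deepgram's implementation, and the states and event names are assumptions chosen to mirror the tasks listed above:

```python
from enum import Enum, auto

class TurnState(Enum):
    LISTENING = auto()   # user may be speaking
    THINKING = auto()    # waiting on the LLM response
    SPEAKING = auto()    # agent audio is playing

class TurnTaker:
    """Toy coordinator for the conversational loop: endpoint detection,
    response coordination, and mid-sentence interruption (barge-in)."""

    def __init__(self):
        self.state = TurnState.LISTENING

    def on_user_speech_ended(self):
        # Endpoint detected: hand the finished utterance to the LLM.
        if self.state == TurnState.LISTENING:
            self.state = TurnState.THINKING

    def on_llm_response_ready(self):
        # LLM output arrived: start synthesizing and playing speech.
        if self.state == TurnState.THINKING:
            self.state = TurnState.SPEAKING

    def on_user_barge_in(self):
        # Mid-sentence interruption: stop playback and listen again.
        if self.state == TurnState.SPEAKING:
            self.state = TurnState.LISTENING

    def on_agent_speech_ended(self):
        # Agent finished its turn: yield the floor back to the user.
        if self.state == TurnState.SPEAKING:
            self.state = TurnState.LISTENING
```

Even this toy version shows why the coordination logic is easy to get wrong: every event must be validated against the current state, and a real system must additionally handle streaming audio, timeouts, and partial results.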
Deepgram's Voice Agent API removes this burden by providing a single, unified API that integrates speech-to-text, LLM reasoning, and text-to-speech with built-in support for real-time conversational dynamics. Capabilities such as barge-in handling and turn-taking prediction are model-driven and managed natively within the platform. This eliminates the need to stitch together multiple vendors or maintain custom orchestration, enabling faster prototyping, reduced complexity, and more time focused on building high-quality experiences. In addition to the Voice Agent API, organizations seeking broader integrations can leverage Deepgram's extensive partner ecosystem, including Twilio and others, to access comprehensive conversational AI solutions and services powered by Deepgram APIs.

Maximum Control and Flexibility

While the Voice Agent API streamlines development, it also gives teams deep control over performance, behavior, and scalability in production. Built on Deepgram's Enterprise Runtime and full model ownership across the entire voice AI stack, the platform enables model-level optimization at every layer of the interaction loop. This allows for precise tuning of latency, barge-in handling, turn-taking, and domain-specific behavior in ways not possible with disconnected components. Key capabilities include:

- Flexible Deployment: Run the complete voice stack in cloud, VPC, or on-prem environments to meet enterprise requirements for security, compliance, and performance.
- Runtime-Level Orchestration: Deepgram's runtime supports mid-session control, real-time prompt updates, model switching, and event-driven signaling to adapt agent behavior dynamically.
- Bring-Your-Own Models: Teams can integrate their own LLMs or TTS systems while retaining Deepgram's orchestration, streaming pipeline, and real-time responsiveness. 
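As a rough illustration of the "fully integrated stack or bring-your-own" choice, a session configuration might be assembled along these lines. The field names and schema below are hypothetical, invented for illustration; Deepgram's actual Voice Agent API settings are defined in its documentation:

```python
def agent_settings(use_own_llm, llm_endpoint=None):
    """Build a voice agent session configuration (hypothetical schema).

    By default the agent runs on Deepgram's integrated stack (Nova-3 STT,
    Aura-2 TTS, Deepgram-hosted reasoning). Setting use_own_llm swaps in
    an external LLM endpoint while keeping Deepgram's STT, TTS, and
    orchestration in place.
    """
    settings = {
        "listen": {"model": "nova-3"},      # Deepgram STT
        "speak": {"model": "aura-2"},       # Deepgram TTS
        "think": {"provider": "deepgram"},  # reasoning layer
    }
    if use_own_llm:
        # Bring-your-own-LLM: route reasoning to a customer-managed endpoint.
        settings["think"] = {"provider": "custom", "endpoint": llm_endpoint}
    return settings
```

The point of the sketch is the shape of the tradeoff: one configuration object, with the reasoning layer swappable while the streaming pipeline stays constant.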
'Deepgram gives us the flexibility to bring our own models, voices, and customize behavior while controlling how we build and orchestrate our voice agents,' said Harshal Jethwa, Engineering Manager at OpenPhone. 'Their system seamlessly handles the complexity of real-time voice coordination, letting us focus on creating exactly the experience we want.'

This tightly coordinated design translates directly into measurable performance gains. In recent benchmark testing using the Voice Agent Quality Index (VAQI), Deepgram achieved the highest overall score among all evaluated providers (see Figure 1). VAQI is a composite benchmark that measures the core elements of voice agent quality: latency (how quickly the agent responds), interruption rate (how often it cuts users off), and response coverage (how often it misses valid input). Deepgram outperformed OpenAI by 6.4% and ElevenLabs by 29.3%, reflecting the advantage of its integrated architecture and model-driven turn-taking. The result is smooth, responsive conversations without missed inputs, premature responses, or unnatural delays.

Cost-Effectiveness at Scale

In addition to control and performance, the Voice Agent API is built for cost efficiency across large-scale deployments. When teams run entirely on Deepgram's vertically integrated stack, pricing is fully consolidated at a flat rate of $4.50 per hour (see Figure 2). This provides predictable, all-in-one billing that simplifies planning and scales with usage. Deepgram's vertically integrated runtime also delivers unmatched compute efficiency, optimizing every stage of the speech pipeline to minimize infrastructure costs while maintaining real-time responsiveness. For teams that bring their own LLM or TTS models, Deepgram offers built-in rate reductions, enabling even lower total cost of ownership for production-scale deployments. 
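The announcement names VAQI's three components but not its formula, so the sketch below combines them with equal weighting and a simple latency normalization; both choices are assumptions for illustration, not the published benchmark methodology:

```python
def vaqi_score(latency_ms, interruption_rate, miss_rate, max_latency_ms=2000.0):
    """Toy composite quality score in [0, 100]; higher is better.

    Combines the three components the VAQI benchmark is described as
    measuring: latency (responsiveness), interruption rate (how often the
    agent cuts users off), and miss rate (how often valid input is missed).
    Equal weights and the 2000 ms latency cap are illustrative assumptions.
    """
    latency_penalty = min(latency_ms / max_latency_ms, 1.0)
    components = [
        1.0 - latency_penalty,    # respond quickly
        1.0 - interruption_rate,  # don't cut users off
        1.0 - miss_rate,          # don't miss valid input
    ]
    return 100.0 * sum(components) / len(components)
```

A useful property of any such composite is that an agent that is faster, interrupts less, and misses less always scores strictly higher, which is what makes single-number provider comparisons like the ones cited above possible.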
'Deepgram's Voice Agent API stands out for its technical prowess, affordability, and flexibility, making it the smart bet for customer service voice AI,' said Bill French, Senior Solutions Engineer at StreamIt.

Start Building with the Voice Agent API

Experience how fast and flexible voice agents can be with Deepgram's unified voice-to-voice API. Explore the API in our interactive playground, review documentation, or integrate in minutes using our SDK. New users receive $200 in free credits, enough to process over 40 hours of real-time voice agent usage. Start building natural, responsive conversations with infrastructure built for real-time performance and enterprise scale.

About Deepgram

Deepgram is the leading voice AI platform for enterprise use cases, offering speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) capabilities, all powered by our enterprise-grade runtime. 200,000+ developers build with Deepgram's voice-native foundational models – accessed through cloud APIs or as self-hosted / on-premises APIs – due to our unmatched accuracy, low latency, and pricing. Customers include technology ISVs building voice products or platforms, co-sell partners working with large enterprises, and enterprises solving internal use cases. Having processed over 50,000 years of audio and transcribed over 1 trillion words, there is no organization in the world that understands voice better than Deepgram. To learn more, read our developer docs or follow @DeepgramAI on X and LinkedIn.


Economic Times
11-05-2025
- Business
- Economic Times
OpenAI leads surge in business AI adoption, Ramp AI Index reveals
OpenAI is at the forefront of enterprise AI adoption, topping the Ramp AI Index by acquiring customers faster than any other provider on American fintech company Ramp's platform. Chinese AI company Manus AI follows closely in second place.

The Ramp AI Index, which tracks real corporate spending on AI tools and services from over 30,000 US businesses, highlights a significant uptick in enterprise AI usage. The data is compiled monthly using actual transactions from Ramp's corporate card and bill payment platform, offering a tangible measurement of how businesses are embracing artificial intelligence. While foundational model providers continue to dominate, the report shows a notable rise in the adoption of specialised AI tools tailored to specific enterprise needs.

Specialised AI: The next big game

One standout example is Turbopuffer, an internal data search engine that leverages vector search to handle billions of entries efficiently. Its speed and precision make it popular among technical teams seeking scalable AI infrastructure. Other rapidly growing AI vendors include:

- Jasper, which provides AI-powered writing tools for marketers.
- Deepgram, a speech recognition platform for voice transcription.
- Snowflake, whose Cortex suite enables businesses to integrate large language models and semantic functions directly into SQL workflows, empowering data teams without requiring system overhauls.

Enterprise adoption accelerates

ET had earlier reported that larger companies—with annual revenues of at least $500 million—are adopting AI more quickly than smaller organisations. Ramp's latest data supports this trend and further reveals that smaller, specialised AI vendors are seeing impressive gains. 
Several new entrants climbed into the top ranks for AI-related spending in May, underscoring a shift beyond the dominance of big foundational model providers. By new customer count, OpenAI, Cursor, Canva, LinkedIn and GoDaddy lead the charts, while Maxon Computer and JasperAI are next in line after Manus AI in terms of largest percentage change in customer count.

A recent survey found that one in three tech professionals in India is currently undergoing formal AI training via their employers—highlighting the growing demand for AI-related skills. Ramp also noted that actual AI adoption may be higher than reported, as many businesses use free tools or rely on employees' personal accounts—factors not captured in transaction-based data.

Global AI market outlook

The global enterprise AI market was valued at $23.95 billion in 2024 and is expected to grow at a compound annual growth rate (CAGR) of 37.6% from 2025 to 2030. However, in India, AI adoption is still maturing. According to Krishna Vij from TeamLease Digital, a talent gap of nearly 50% persists: while India has around 4.2 lakh (420,000) AI professionals, the estimated need is closer to six lakh (600,000).

Competition from China

Despite restrictions on AI chip exports from the US, China has become the second-largest producer of AI models across text, image, video, and audio domains. As of early 2024, 36% of the 1,328 large language models (LLMs) globally originated in China, second only to the US. In a further push, the Chinese government and private investors have launched a new AI fund worth 60 billion yuan (approximately $8.2 billion). Major developments include Alibaba's Qwen Series, DeepSeek's R1, Tencent's Hunyuan Turbo S and Manus AI.

Manus AI, which has made notable strides toward AI autonomy, can execute complex multi-step workflows and access reliable data via APIs. It has achieved state-of-the-art (SOTA) performance across three difficulty levels. 
While the US continues to lead AI model development—producing 40 significant models in 2024—China is rapidly closing the gap. The latest Artificial Intelligence Index Report signals a transformative shift in the global AI landscape, as China accelerates its capabilities and investments.
Yahoo
18-02-2025
- Business
- Yahoo
Deepgram Achieves Key Milestone on Path to Delivering Next-Gen, Enterprise-Grade Speech-to-Speech Architecture
Pioneering Achievement Delivers Speech-To-Speech Technology Without Intermediate Text Representations, Setting the Stage for Fully Fluid, Human-Like Enterprise Voice AI Applications

SAN FRANCISCO, February 18, 2025--(BUSINESS WIRE)--Deepgram, the leader in enterprise-grade speech AI, today announced a significant technical achievement in speech-to-speech (STS) technology for enterprise use cases. The company has successfully developed a speech-to-speech model that operates without relying on text conversion at any stage, marking a pivotal step toward the development of contextualized end-to-end speech AI systems. This milestone will enable fully natural and responsive voice interactions that preserve nuances, intonation, and emotional tone throughout real-time communication.

When fully operationalized, this architecture will be delivered to customers via a simple upgrade from our existing industry-leading architecture. By adopting this technology alongside Deepgram's full-featured voice AI platform, companies will gain a strategic advantage, positioning themselves to deliver cutting-edge, scalable voice AI solutions that evolve with the market and outpace competitors.

Advancements Over Existing Architectures

Existing speech-to-speech (STS) systems are based on architectures that process speech through sequential stages, such as speech-to-text, text-to-text, and text-to-speech. These architectures have become the standard for production deployments for their modularity and maturity, but eliminating text as an intermediary offers opportunities to improve latency and better preserve emotional and contextual nuances. Meanwhile, multimodal LLMs like Gemini, GPT-4o, and Llama have evolved beyond text-only capabilities to accept additional inputs such as images, videos, and audio. However, despite these advancements, they struggle to capture the fluidity and nuance of human-like conversation. 
These models still rely on a turn-based framework, where audio input is tokenized and processed within a textual domain, restricting real-time interactivity and expressiveness. To advance the frontier of speech AI, Deepgram is setting the stage for end-to-end STS models, which offer a more direct approach by converting speech to speech without relying on text. Recent research on speech-to-speech models, such as Hertz and Moshi, has highlighted the significant challenges in developing models that are robust and reliable enough for enterprise use cases. These difficulties stem from the inherent complexities of modeling conversational speech and the substantial computational resources required. Overcoming these hurdles demands innovations in data collection, model architecture, and training methodologies.

Delivering Speech-to-Speech with Latent Space Embeddings

Deepgram is transforming speech-to-speech modeling with a new architecture that fuses the latent spaces of specialized components, eliminating the need for text conversion between them. By embedding speech directly into a latent space, Deepgram ensures that important characteristics such as intonation, pacing, and situational and emotional context are preserved throughout the entire processing pipeline. What sets Deepgram apart is its approach to fusing the hidden states—the internal representations that capture meaning, context, and structure—of each individual function: Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS). This fusion is the first step toward training a controllable, single, true end-to-end speech model, enabling seamless processing while retaining the strengths of each best-in-class component. This breakthrough has significant implications for enterprise applications, facilitating more natural conversations while maintaining the control and reliability businesses require. 
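To make the fusion idea concrete, the toy sketch below projects each component's hidden-state vector into a shared latent space and sums the results, so information flows between stages without a text bottleneck. Real systems use learned neural projections; the plain linear maps and the additive fusion here are stand-ins for illustration only, not Deepgram's architecture:

```python
def project(vec, weights):
    """Linear map of a hidden-state vector into the shared latent space.

    `weights` is a list of rows; each row produces one latent dimension.
    """
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def fuse(stt_hidden, llm_hidden, tts_hidden, projections):
    """Fuse the hidden states of the STT, LLM, and TTS components.

    Each component's internal representation is projected into a common
    latent space and the projections are summed, sketching how meaning,
    context, and prosody can be combined without decoding to text.
    """
    latent_dim = len(projections["stt"])
    fused = [0.0] * latent_dim
    for name, hidden in (("stt", stt_hidden),
                         ("llm", llm_hidden),
                         ("tts", tts_hidden)):
        for i, value in enumerate(project(hidden, projections[name])):
            fused[i] += value
    return fused
```

The design choice the sketch highlights is that fusion happens on continuous vectors rather than discrete text tokens, which is what lets properties like intonation and pacing survive the hand-off between components.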
"This achievement represents a fundamental shift in how AI systems can process and respond to human speech," said Scott Stephenson, CEO and Co-founder of Deepgram. "By eliminating text as an intermediate step, we're preserving crucial elements of communication and maintaining the precise control that enterprises need for mission-critical applications."

This technical advancement builds on Deepgram's expertise in enterprise speech AI, with over 200,000 developers using its platform, more than 50,000 years of audio processed, and over 1 trillion words transcribed. Key benefits of the new architecture include:

- Optimized latency design for faster, more responsive interactions
- Enhanced naturalness, preserving emotional context and conversational nuances
- Native ability to handle complex, multi-turn conversations
- Unified, end-to-end training across the entire model, creating a more cohesive and inherently adaptive system that fine-tunes its understanding and response generation directly in the audio space

Utilizing Transfer Learning for Cost-Efficient, High-Accuracy Speech-to-Speech

Deepgram's research in the space is accelerated by its use of transfer learning and best-in-class pre-trained models, allowing it to achieve high accuracy with significantly less training data than traditional methods. Without latent techniques, training a model at the scale needed for speech-to-speech would require over 80 billion hours of audio—more than humanity has ever recorded. However, Deepgram's latent space embeddings and transfer learning approach achieve superior comprehension while significantly reducing costs, maintaining interpretability, and accelerating enterprise deployment. This efficiency enables Deepgram to deliver scalable, end-to-end speech AI that meets the demands of real-world voice applications. 
Empowering Developers with Full Debuggability One of the requirements in enterprise speech-to-speech modeling is the ability to understand and troubleshoot each step of the process. This is particularly challenging when text conversion between steps isn't involved, as verifying both the accuracy of the initial perception and the alignment of the spoken output with the intended response is not straightforward. Deepgram recognized this need and addressed it by designing a new architecture that enables debuggability throughout the entire process. This architecture allows developers to inspect and understand how the system processes spoken dialogue. The design incorporates speech modeling of perception, natural language understanding/generation, and speech production, preserving distinct capabilities during training. Through the ability to decode intermediate representations back to text at specific points, developers can gain insight into what the model perceives, thinks, and generates, ensuring its internal representation aligns with the model output and stays true to the intent of the business user, addressing hallucination concern in scaled business use cases. This capability allows the user to peer into each step throughout generation, helping refine models, improve performance, and deliver more accurate, lifelike, and reliable speech-to-speech solutions. Beyond Speech-to-Speech – A Complete, Enterprise-Ready Voice AI Stack While building an advanced speech-to-speech (STS) model is a major technical achievement, enterprises need more than just a model—they need a complete, scalable platform that ensures seamless deployment, adaptability, and cost efficiency. Deepgram delivers not just cutting-edge STS technology, but an enterprise-ready infrastructure designed for real-world applications. 
- Seamless Integration & Continuous Improvement – Once Deepgram's end-to-end STS model moves to production, businesses will be able to adopt this breakthrough directly through our developer-friendly voice agent API from within the current Deepgram platform. Through continued innovation, enterprises will benefit from the latest advancements, ensuring seamless integration and a future-proof platform for their voice AI applications.
- Enterprise-Grade Performance & Cost Efficiency – Built for low customer COGS, our platform enables enterprises to deploy high-performance voice AI without excessive costs. This ensures scalability, whether for customer service automation, real-time voice agents, or multilingual applications.
- Full-Featured Platform and High-Performance Runtime – Deepgram's platform includes powerful capabilities such as:
  - Adaptability - Dynamically fine-tune models for specific industry language, ensuring high accuracy across diverse applications without needing constant retraining.
  - Automation - Streamline transcription, model updates, and data processing, reducing overhead and accelerating deployment.
  - Synthetic data generation - Generate synthetic voice data to improve model training, even with limited real-world data, enhancing accuracy for niche use cases.
  - Data curation - Clean, manage, and organize training data to ensure high-quality, relevant input, improving model performance.
  - Model hot-swapping - Seamlessly switch between different models to optimize performance for specific tasks.
  - Integrations - Effortlessly integrate Deepgram's voice AI with cloud platforms, enterprise systems, and third-party applications, embedding it within existing workflows.

With Deepgram, enterprises don't just get speech-to-speech—they get the most advanced, enterprise-ready voice AI platform, designed for real-world deployment and long-term innovation. For more information about Deepgram's novel approach for speech-to-speech, read the technical brief. 
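Returning to the debuggability design described earlier, the idea of decoding intermediate representations back to text at specific tap points can be sketched generically. Everything below is a hypothetical placeholder for illustration, not a Deepgram API: the pipeline stages and the decoder are passed in as plain functions:

```python
def run_with_taps(audio, perceive, reason, produce, decode_to_text):
    """Run a perception -> reasoning -> production pipeline, decoding each
    intermediate latent state to text so developers can inspect what the
    model perceived and what it intends to say before audio is produced.
    """
    trace = {}
    perceived = perceive(audio)
    trace["perceived"] = decode_to_text(perceived)  # tap 1: what was heard
    thought = reason(perceived)
    trace["thought"] = decode_to_text(thought)      # tap 2: planned response
    speech = produce(thought)
    return speech, trace
```

The trace makes each stage auditable: if the final audio is wrong, the decoded tap points show whether the error entered at perception, reasoning, or production, which is the hallucination-containment property the debuggability section describes.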
Additional Resources:

- Explore the technical brief on Deepgram's novel speech-to-speech architecture
- Watch a fun demo of Deepgram's voice agent API
- Try Deepgram's interactive demo
- Get $200 in free credits and try Deepgram for yourself

About Deepgram

Deepgram is the leading voice AI platform for enterprise use cases, offering speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) capabilities. 200,000+ developers build with Deepgram's voice-native foundational models – accessed through cloud APIs or as self-hosted / on-premises APIs – due to our unmatched accuracy, low latency, and pricing. Customers include technology ISVs building voice products or platforms, co-sell partners working with large enterprises, and enterprises solving internal use cases. Having processed over 50,000 years of audio and transcribed over 1 trillion words, there is no organization in the world that understands voice better than Deepgram. To learn more, read our developer docs or follow @DeepgramAI on X and LinkedIn.

PR Contact: Nicole Gorman, Gorman Communications, for Deepgram