Latest news with #DeepSeekV3


Fast Company
04-06-2025
- Business
- Fast Company
Optimizing AI apps in a million-token world
The context size problem in large language models is nearly solved. In recent months, models like GPT-4.1, LLaMA 4, and DeepSeek V3 have reached context windows ranging from hundreds of thousands to millions of tokens. We're entering a phase where entire documents, threads, and histories can fit into a single prompt. It marks real progress, but it also brings new questions about how we structure, pass, and prioritize information.

WHAT IS CONTEXT SIZE (AND WHY WAS IT A CHALLENGE)?

Context size defines how much text a model can process in one go. It is measured in tokens, which are small chunks of text, like words or parts of words. That limit shaped the way we worked with LLMs: splitting documents, engineering recursive prompts, summarizing inputs, anything to avoid truncation. Now, models like LLaMA 4 Scout can handle up to 10 million tokens, DeepSeek V3 goes beyond 100K, and GPT-4.1 reaches 1M. With those capabilities, many of those older workarounds can be rethought or even removed.

FROM BOTTLENECK TO CAPABILITY

This progress unlocks new interaction patterns. We're seeing applications that can reason and navigate across entire contracts, full Slack threads, or complex research papers. These use cases were out of reach not long ago. However, just because models can read more does not mean they automatically make better use of that data. The paper 'Why Does the Effective Context Length of LLMs Fall Short?' examines this gap. It shows that LLMs often attend to only part of the input, especially the more recent or emphasized sections, even when the prompt is long. Another study, 'Explaining Context Length Scaling and Bounds for Language Models,' explores why increasing the window size does not always lead to better reasoning. Both pieces suggest that the problem has shifted from managing how much context a model can take to guiding how it uses that context effectively.

Think of it this way: just because you can read every book ever written about World War I doesn't mean you truly understand it. You might scan thousands of pages, but still fail to retain the key facts, connect the events, or explain the causes and consequences with clarity. What we pass to the model, how we organize it, and how we guide its attention are now central to performance. These are the new levers of optimization.

CONTEXT WINDOW ≠ TRAINING TOKENS

A model's ability to accept a large context does not guarantee that it has been trained to handle it well. Some models were exposed only to shorter sequences during training. That means even if they accept 1M tokens, they may not make meaningful use of all that input. This gap affects reliability. A model might slow down, hallucinate, or misinterpret input if overwhelmed with too much or poorly organized data. Developers need to verify whether the model was fine-tuned for long contexts or simply adapted to accept them.

WHAT CHANGES FOR ENGINEERS

With these new capabilities, developers can move past earlier limitations. Manual chunking, token trimming, and aggressive summarization become less critical. But this does not remove the need for data prioritization. Prompt compression, token pruning, and retrieval pipelines remain relevant. Techniques like prompt caching help reuse portions of prompts to save costs. Mixture-of-experts (MoE) models, like those used in LLaMA 4 and DeepSeek V3, optimize compute by activating only relevant components. Engineers also need to track what parts of a prompt the model actually uses.
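One way to make that tracking concrete is a recall probe: plant a known fact at different depths of a long prompt and check whether the model can still retrieve it. The sketch below is a minimal, hypothetical version; call_model is a placeholder for whichever client you actually use, and the filler text, needle, and depths are illustrative only.

```python
# Minimal sketch of a "needle in the prompt" probe: place a known fact at
# different depths inside filler text and check whether the model recalls it.
# `call_model` is a hypothetical stand-in for your own client (OpenAI,
# DeepSeek, a local LLaMA server, etc.); everything else is illustrative.

from typing import Callable

FILLER = "The quarterly report discusses routine operational matters. " * 400
NEEDLE = "The access code for the archive room is 7341."
QUESTION = "\n\nWhat is the access code for the archive room? Reply with the number only."

def build_prompt(depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    cut = int(len(FILLER) * depth)
    return FILLER[:cut] + NEEDLE + " " + FILLER[cut:] + QUESTION

def probe(call_model: Callable[[str], str]) -> None:
    """Report whether the model recalls the needle at each depth."""
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        answer = call_model(build_prompt(depth))
        print(f"needle at {depth:.0%} of prompt -> recalled: {'7341' in answer}")
```

Running the probe against your own deployment, for example probe(my_model_call), gives a rough picture of whether content buried in the middle of a long prompt is actually being used before you rely on it in production.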
Output quality alone does not guarantee effective context usage. Monitoring token relevance, attention distribution, and consistency over long prompts presents new challenges that go beyond latency and throughput.

IT IS ALSO A PRODUCT AND UX ISSUE

For end users, the shift to larger contexts introduces more freedom, and more ways to misuse the system. Many users drop long threads, reports, or chat logs into a prompt and expect perfect answers. They often do not realize that more data can sometimes cloud the model's reasoning. Product design must help users focus. Interfaces should clarify what is helpful to include and what is not. This might mean offering previews of token usage, suggestions to refine inputs, or warnings when the prompt is too broad (see the sketch below). Prompt design is no longer just a backend task, but rather part of the user journey.

THE ROAD AHEAD: STRUCTURE OVER SIZE

Larger context windows open important doors. We can now build systems that follow extended narratives, compare multiple documents, or process timelines that were previously out of reach. But clarity still matters more than capacity. Models need structure to interpret, not just volume to consume. This changes how we design systems, how we shape user input, and how we evaluate performance. The goal is not to give the model everything. It is to give it the right things, in the right order, with the right signals. That is the foundation of the next phase of progress in AI systems.
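As a concrete companion to the token-usage previews suggested above, here is a minimal sketch of how a product might estimate and surface prompt size before submission. It assumes the tiktoken package; the cl100k_base encoding, the 128,000-token budget, and the 80% warning threshold are illustrative placeholders, not any specific model's limits.

```python
# Minimal sketch of a token-usage preview for a prompt box: estimate how much
# of an (assumed) context budget the pasted text will consume and warn before
# the user submits an overly broad prompt. All numbers here are illustrative.

import tiktoken

ENCODING = tiktoken.get_encoding("cl100k_base")   # example tokenizer, not model-specific
CONTEXT_BUDGET = 128_000                          # assumed context window
RESERVED_FOR_OUTPUT = 4_000                       # leave room for the model's answer
WARN_THRESHOLD = 0.8                              # warn when 80% of the budget is used

def preview_usage(user_text: str) -> str:
    """Return a short message suitable for showing next to a prompt box."""
    tokens = len(ENCODING.encode(user_text))
    available = CONTEXT_BUDGET - RESERVED_FOR_OUTPUT
    if tokens > available:
        return f"{tokens:,} tokens: too large, trim or split the input."
    share = tokens / available
    if share > WARN_THRESHOLD:
        return f"{tokens:,} tokens (~{share:.0%} of budget): consider narrowing the input."
    return f"{tokens:,} tokens (~{share:.0%} of budget): OK."

print(preview_usage("Paste a long report, thread, or chat log here..."))
```

A preview like this is cheap to compute and nudges users toward giving the model the right things in the right order rather than pasting everything they have.
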
Yahoo
04-06-2025
- Business
- Yahoo
DeepSeek again suspected of using Google Gemini to train new version of its R1 model
DeepSeek once shocked the industry, and even political circles, by training a powerful reasoning AI model at low cost. Its newly released R1-0528 model touts stronger maths and coding performance, but its training data has never been disclosed, and the AI industry once again suspects that DeepSeek developed the new version by distilling other AI models.

One proponent of this view is Australian developer Sam Paech, who posted on X that the R1-0528 model's language style is strikingly similar to that of Google Gemini 2.5 Pro. He believes DeepSeek has switched from training on synthetic OpenAI data to synthetic Gemini data. The developer behind SpeechMap likewise found that the 'reasoning traces' generated by the R1 model (the thought process the AI works through before reaching a conclusion) closely resemble those of Gemini models. "If you're wondering why new deepseek r1 sounds a bit different, I think they probably switched from training on synthetic openai to synthetic gemini outputs," Sam Paech (@sam_paech) posted on May 29, 2025.

Meanwhile, Nathan Lambert, an AI expert at the non-profit AI research institute AI2, wrote that DeepSeek, lacking GPUs and deep funding, would certainly distil data from the best model APIs on the market, in this case Gemini. In 2024, OpenAI said through the Financial Times that it had evidence DeepSeek V3 was trained by distilling ChatGPT data, and Bloomberg later reported that major backer Microsoft had detected large volumes of data being exfiltrated through OpenAI developer accounts in late 2024, which it believed was linked to DeepSeek.

To prevent competitors from exploiting their models' outputs, AI companies are tightening security measures. OpenAI, for example, now requires users to complete identity verification before accessing its advanced models, while Google has begun summarising the 'reasoning traces' generated by Gemini models, making it harder for rivals to make use of that data.

Miami Herald
28-05-2025
- Business
- Miami Herald
Atlas Cloud Launches High-Efficiency AI Inference Platform, Outperforming DeepSeek
Developed with SGLang, Atlas Inference surpasses leading AI companies in throughput and cost, running DeepSeek V3 & R1 faster than DeepSeek themselves.

NEW YORK CITY, NEW YORK / ACCESS Newswire / May 28, 2025 / Atlas Cloud, the all-in-one AI competency center for training and deploying AI models, today announced the launch of Atlas Inference, an AI inference platform that dramatically reduces GPU and server requirements, enabling faster, more cost-effective deployment of large language models (LLMs).

Atlas Inference, co-developed with SGLang, an AI inference engine, maximizes GPU efficiency by processing more tokens faster and with less hardware. Compared against DeepSeek's published performance results, Atlas Inference's 12-node H100 cluster outperformed DeepSeek's reference implementation of the DeepSeek-V3 model while using two-thirds of the servers. Atlas' platform reduces infrastructure requirements and operational costs while addressing hardware costs, which represent up to 80% of AI operational expenses.

"We built Atlas Inference to fundamentally break down the economics of AI deployment," said Jerry Tang, Atlas CEO. "Our platform's ability to process 54,500 input tokens and 22,500 output tokens per second per node means businesses can finally make high-volume LLM services profitable instead of merely break-even. I believe this will have a significant ripple effect throughout the industry. Simply put, we're surpassing industry standards set by hyperscalers by delivering superior throughput with fewer resources."

Atlas Inference's performance also exceeds major players like Amazon, NVIDIA and Microsoft, delivering up to 2.1 times greater throughput using 12 nodes compared to competitors' larger setups. It maintains sub-5-second first-token latency and 100-millisecond inter-token latency with more than 10,000 concurrent sessions, ensuring a scaled, superior experience.

The platform's performance is driven by four key innovations:
- Prefill/Decode Disaggregation: separates compute-intensive operations from memory-bound processes to optimize efficiency
- DeepExpert (DeepEP) Parallelism with Load Balancers: ensures over 90% GPU utilization
- Two-Batch Overlap Technology: increases throughput by enabling larger batches and utilizing compute and communication phases simultaneously
- DisposableTensor Memory Models: prevents crashes during long sequences for reliable operation

"This platform represents a significant leap forward for AI inference," said Yineng Zhang, Core Developer at SGLang. "What we built here may become the new standard for GPU utilization and latency management. We believe this will unlock capabilities previously out of reach for the majority of the industry regarding throughput and efficiency."

Combined with a lower cost per token, linear scaling behavior, and reduced emissions compared to leading vendors, Atlas Inference provides a cost-efficient and scalable AI deployment. Atlas Inference works with standard hardware and supports custom models, giving customers complete flexibility. Teams can upload fine-tuned models and keep them isolated on dedicated GPUs, making the platform ideal for organizations requiring brand-specific voice or domain expertise. The platform is available immediately for enterprise customers and early-stage startups.

About Atlas Cloud
Atlas Cloud is your all-in-one AI competency center, powering leading AI teams with safe, simple, and scalable infrastructure for training and deploying models.
Atlas Cloud also offers an on-demand GPU platform that delivers fast, serverless compute. Backed by Dell, HPE, and Supermicro, Atlas delivers near-instant access to up to 5,000 GPUs across a global SuperCloud fabric with 99% uptime and baked-in compliance.

SOURCE: Atlas Cloud press release


Indian Express
08-05-2025
- Business
- Indian Express
Mistral announces new AI model Medium 3 at 8x lower cost
French AI startup Mistral has introduced a frontier-level AI model, Mistral Medium 3. The new model from the Paris-based AI company is said to have outperformed models like Claude Sonnet 3.7 and GPT-4o on numerous benchmarks, and reportedly costs less than DeepSeek V3. The company has said that organisations can use the new model through its new AI assistant, Le Chat Enterprise, which features an agent builder and allows full integration with a variety of apps. Mistral has also teased a more powerful model to be introduced in the coming weeks.

Mistral Medium 3 is said to push the efficiency and usability of language models even further. Mistral claims the new Medium 3 introduces a class of models that balances state-of-the-art performance with 8x lower cost and simple deployability to accelerate enterprise usage. The model also leads in professional use cases like coding and multimodal understanding. On enterprise capabilities, Medium 3 offers hybrid or on-premises in-VPC deployment, custom post-training, and integration into enterprise tools and systems.

According to the company, the model performs at or above 90 per cent of Claude Sonnet 3.7 on benchmarks across the board at a considerably lower cost of $0.40 per million input tokens and $2 per million output tokens. Medium 3 has also surpassed models such as Llama 4 Maverick and enterprise models like Cohere Command A. On pricing, whether used through the API or self-deployed, the model beats DeepSeek V3. It can also be deployed on any cloud, including self-hosted environments of four GPUs and above.

The company claims the model is designed to be frontier-class, particularly in categories of professional use. On benchmarks, Mistral Medium 3 delivers top performance in instruction following (ArenaHard: 97.1%) and math (Math500: 91%), with strong results in long-context tasks (RULER 32K: 96%). In human evaluations, Medium 3 outperforms competitors, especially in coding, beating Claude Sonnet 3.7, DeepSeek 3.1, and GPT-4o in several cases.


South China Morning Post
25-03-2025
- Business
- South China Morning Post
DeepSeek's upgraded foundational model excels in coding and maths
Chinese artificial intelligence (AI) star DeepSeek has upgraded its open-source V3 large language model by adding parameters and improving capabilities in coding and solving mathematical problems.

The DeepSeek-V3-0324, named after its predecessor and the launch date, has 'enhanced reasoning capabilities, optimised front-end web development and upgraded Chinese writing proficiency', according to a notice on the company's website. The new version and DeepSeek V3 are both foundation models trained on vast data sets that can be applied in different use cases, including that of a chatbot. DeepSeek R1, the reasoning model, is based on DeepSeek V3.

The updated foundation model has made improvements in several benchmarks, especially the American Invitational Mathematics Examination (AIME), where it scored 59.4 compared with 39.6 for its predecessor, while rising 10 points on LiveCodeBench to 49.2, DeepSeek data showed.

Compared with DeepSeek V3, which has 671 billion parameters and adopts the company's own commercial licence, the new 685-billion-parameter model uses the MIT licence, the most popular software licence on developer platform GitHub. Launched on AI community Hugging Face as well as the company's own website, DeepSeek-V3-0324 is now the top trending model on Hugging Face, receiving positive comments on its performance.