News from the AI & ML world

DeeperML - #deepseekv3

Nehdiii@Towards AI //
DeepSeek AI has released its V3-0324 endpoint, giving AI developers access to a powerful 685-billion-parameter model. The new endpoint boasts fast responses and a 128K context window, accessible via a simple API key. The model is available without rate limiting at $0.88 per 164K output tokens, making it an attractive option for developers seeking high performance at a reasonable price.
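
For illustration, here is a minimal sketch of how such an endpoint is typically consumed from Python, assuming an OpenAI-compatible chat-completions interface; the base URL and model identifier below are placeholders rather than values from the announcement.

```python
# Minimal sketch: calling an OpenAI-compatible inference endpoint with an API key.
# The base_url and model name are assumptions/placeholders, not confirmed values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical endpoint URL
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-v3-0324",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```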

Lambda is offering DeepSeek V3-0324 live on its Inference API, providing developers with easy access to this powerful AI model. Towards AI has published a series of articles on DeepSeek V3, including a piece on auxiliary-loss-free load balancing. Highlights of DeepSeek V3-0324 include a major boost in reasoning performance, 685B total parameters in a Mixture-of-Experts (MoE) design, stronger front-end development skills, and smarter tool-use capabilities.
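
To make the Mixture-of-Experts point concrete, here is a toy sketch of top-k expert routing (illustrative only, not DeepSeek's implementation): a gating layer scores every expert for each token and only the highest-scoring few are evaluated, which is how total parameter counts can reach hundreds of billions while per-token compute stays modest.

```python
# Toy top-k Mixture-of-Experts routing sketch (illustrative only, not DeepSeek's code).
# A gate scores every expert per token; only the top-k experts run for each token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

gate_w = rng.normal(size=(d_model, n_experts))           # gating weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):                                         # x: (tokens, d_model)
    logits = x @ gate_w                                   # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]         # chosen experts per token
    weights = np.take_along_axis(logits, top, axis=-1)
    weights = np.exp(weights) / np.exp(weights).sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                           # only k of n_experts run per token
        for j, e in enumerate(top[t]):
            out[t] += weights[t, j] * (x[t] @ experts[e])
    return out

print(moe_layer(rng.normal(size=(4, d_model))).shape)     # (4, 16)
```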

However, DeepSeek faces competition from other AI companies, particularly China's Baidu. Baidu recently launched two new AI models, ERNIE X1 and ERNIE 4.5, aiming to compete in the global race for advanced AI. According to TheTechBasic, ERNIE X1 is designed to match DeepSeek R1 in performance but at half the price, while ERNIE 4.5 is capable of handling text, video, images, and audio with improved logic and memory skills. Baidu hopes these new models will help it regain ground against rivals.

Recommended read:
References:
  • lambda.ai: Lambda blog about DeepSeek V3-0324 Live on Lambda!
  • www.techrepublic.com: US Officials Claim DeepSeek AI App Is ‘Designed To Spy on Americans’
  • composio.dev: GPT-4.1 vs. Deepseek v3 vs. Sonnet 3.7 vs. GPT-4.5
  • The Tech Basic: China’s Baidu Fires Back at DeepSeek with Affordable Reasoning AI

@syncedreview.com //
DeepSeek AI is making waves in the large language model (LLM) field with its innovative approach to scaling inference and its commitment to open-source development. The company recently published a research paper detailing a new technique to enhance the scalability of general reward models (GRMs) during the inference phase. This new method allows GRMs to optimize reward generation by dynamically producing principles and critiques, achieved through rejection fine-tuning and rule-based online reinforcement learning. Simultaneously, DeepSeek AI has hinted at the imminent arrival of its next-generation model, R2, sparking considerable excitement within the AI community.
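
As a rough illustration of what scaling a reward model at inference time can look like, the sketch below samples several principle-plus-critique generations for the same answer and aggregates the parsed scores; the prompt format, score range, and the generate_critique callable are hypothetical and not taken from DeepSeek's paper.

```python
# Hypothetical sketch of inference-time scaling for a generative reward model (GRM):
# sample several principle+critique generations for one answer and aggregate the scores.
import re
import statistics
from typing import Callable

def score_with_grm(question: str, answer: str,
                   generate_critique: Callable[[str], str],  # hypothetical LLM call
                   n_samples: int = 8) -> float:
    prompt = (
        "Write evaluation principles for the question, critique the answer "
        "against them, then end with 'Score: <1-10>'.\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    scores = []
    for _ in range(n_samples):                 # more samples -> more inference compute
        critique = generate_critique(prompt)   # each call yields fresh principles + critique
        match = re.search(r"Score:\s*(\d+)", critique)
        if match:
            scores.append(int(match.group(1)))
    return statistics.mean(scores) if scores else 0.0
```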

DeepSeek’s advancements come at a crucial time, as the focus in LLM scaling shifts from pre-training to post-training, especially the inference phase. The company's R1 series already demonstrated the potential of pure reinforcement learning in enhancing LLM reasoning capabilities. Reinforcement learning serves as a vital complement to the "next token prediction" mechanism of LLMs, providing them with an "Internal World Model." This enables LLMs to simulate different reasoning paths, evaluate their quality, and choose superior solutions, ultimately leading to more systematic long-term planning. In collaboration with Tsinghua University, the company unveiled a new research study aimed at improving reward modeling in large language models by using more inference-time compute. This research produced a model named DeepSeek-GRM, which the company asserts will be released as open source.
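
The "simulate several reasoning paths, evaluate them, keep the best" idea can be written as a simple best-of-N loop; the generate_reasoning and reward callables below are generic placeholders meant only to show the control flow, not any specific DeepSeek component.

```python
# Minimal best-of-N sketch: sample several candidate reasoning paths,
# score each with a reward signal, and keep the highest-scoring one.
from typing import Callable, List, Tuple

def best_of_n(question: str,
              generate_reasoning: Callable[[str], str],   # hypothetical sampler
              reward: Callable[[str, str], float],        # hypothetical reward model
              n: int = 4) -> Tuple[str, float]:
    candidates: List[Tuple[str, float]] = []
    for _ in range(n):
        path = generate_reasoning(question)        # one simulated reasoning path
        candidates.append((path, reward(question, path)))
    return max(candidates, key=lambda c: c[1])     # choose the superior solution
```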

Further emphasizing its dedication to accessibility and collaboration, DeepSeek AI is planning to open-source its inference engine, continuing its practice of open-sourcing key components and libraries of its models. The company is "collaborating closely" with existing open-source projects and frameworks to ensure seamless integration and widespread adoption. Additionally, DeepSeek released five high-performance AI infrastructure tools as open-source libraries during Open Source Week, enhancing the scalability, deployment, and efficiency of training large language models. DeepSeek’s efforts reflect a broader industry trend towards leveraging open-source initiatives to accelerate innovation and democratize access to advanced AI technologies.

Recommended read:
References:
  • analyticsindiamag.com: DeepSeek to Open Source its Inference Engine
  • Synced: DeepSeek Signals Next-Gen R2 Model, Unveils Novel Approach to Scaling Inference with SPCT
  • Towards AI: DeepSeek-V3 Explained, Part 1: Understanding Multi-Head Latent Attention
  • Analytics India Magazine: DeepSeek to Open Source its Inference Engine
  • MarkTechPost: THUDM Releases GLM 4: A 32B Parameter Model Competing Head-to-Head with GPT-4o and DeepSeek-V3
  • Towards AI: DeepSeek-V3 Part 2: DeepSeekMoE

Jaime Hampton@AIwire //
DeepSeek's innovative AI models are reshaping China's AI data center infrastructure, leading to market disruption and potentially underutilized resources. The company's DeepSeek-V3 model has demonstrated performance that rivals ChatGPT at a significantly reduced cost. This has altered demand for the extensive GPU clusters used in traditional AI training, shifting the focus towards low-latency hardware, particularly near tech hubs. The result is increased speculation, while established players now face the challenge posed by DeepSeek-V3.

The open-source nature of DeepSeek’s model is also allowing smaller players to compete without the need for extensive pretraining, which is undermining the demand for large data centers. DeepSeek-V3, which runs at 20 tokens per second on a Mac Studio, poses a new challenge for existing AI models. Chinese AI startups are now riding DeepSeek's momentum and building an ecosystem that is revolutionizing the AI landscape. This narrows the technology divide between China and the United States.

Recommended read:
References:
  • AI News: DeepSeek disruption: Chinese AI innovation narrows global technology divide
  • Sify: DeepSeek’s AI Revolution: Creating an Entire AI Ecosystem
  • Composio: Deepseek v3-0324 vs. Claude 3.7 Sonnet
  • AIwire: Report: China’s Race to Build AI Datacenters Has Hit a Wall
  • Quinta’s weblog: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI

Dashveenjit Kaur@AI News //
Chinese AI startup DeepSeek is shaking up the global technology landscape with its latest large language model, DeepSeek-V3-0324. This new model has been lauded for matching the performance of American AI models, while boasting significantly lower development costs. According to Lee Kai-fu, CEO of Chinese startup 01.AI, the gap between Chinese and American AI capabilities has narrowed dramatically, with China even ahead in some specific areas.

DeepSeek-V3-0324 features enhanced reasoning capabilities and improved performance in multiple benchmarks, particularly in mathematics. The model scored 59.4 on the American Invitational Mathematics Examination (AIME), a significant improvement over its predecessor. Häme University lecturer Kuittinen Petri noted DeepSeek's achievements were realized with just a fraction of the resources available to competitors like OpenAI. This breakthrough has been attributed to DeepSeek’s focus on algorithmic efficiency and novel approaches to model architecture, allowing them to overcome restrictions on access to the latest silicon.

This disruption has not gone unnoticed: when DeepSeek launched its R1 model in January, America’s Nasdaq plunged 3.1% and the S&P 500 fell 1.5%. While DeepSeek claimed a $5.6 million training cost, this represented only the marginal cost of the final training run. SemiAnalysis estimates DeepSeek's actual hardware investment at closer to $1.6 billion, with hundreds of millions in operating costs. The developments present both opportunities and challenges for the industry.

Recommended read:
References:
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • AI News: DeepSeek disruption: Chinese AI innovation narrows global technology divide
  • Sify: DeepSeek’s AI Revolution: Creating an Entire AI Ecosystem
  • Nordic APIs: ChatGPT vs. DeepSeek: A Side-by-Side Comparison
  • Composio: Deepseek v3-0324 vs. Claude 3.7 Sonnet

Dashveenjit Kaur@AI News //
DeepSeek, a Chinese AI startup, is causing a stir in the AI industry with its new large language model, DeepSeek-V3-0324. Released with little fanfare on the Hugging Face AI repository, the 641-gigabyte model is freely available for commercial use under an MIT license. Early reports indicate it can run directly on consumer-grade hardware, such as Apple’s Mac Studio with the M3 Ultra chip, especially in a 4-bit quantized version that reduces the storage footprint to 352GB. This innovation challenges the previous notion that Silicon Valley held a chokehold on the AI industry.
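
As a back-of-the-envelope check on those file sizes, 4-bit quantization stores roughly half a byte per weight, so 685 billion parameters come out near 343 GB before quantization scales and any higher-precision layers are counted, which is broadly consistent with the reported 352GB figure.

```python
# Rough storage arithmetic for a 685B-parameter checkpoint (approximation, not a spec).
params = 685e9

# The ~641 GB download implies roughly one byte per weight on average.
print(f"release: ~{641e9 / params:.2f} bytes/weight")      # ~0.94 bytes/weight

# A 4-bit quantized copy stores 0.5 bytes per weight, ignoring scales/metadata.
print(f"4-bit estimate: ~{params * 0.5 / 1e9:.0f} GB")      # ~343 GB vs ~352 GB reported
```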

China's focus on algorithmic efficiency over hardware superiority has allowed companies like DeepSeek to flourish despite restrictions on access to the latest silicon. DeepSeek's R1 model, launched earlier this year, already rivaled OpenAI's ChatGPT-4 at a fraction of the cost. Now DeepSeek-V3-0324 features enhanced reasoning capabilities and improved performance. This has sparked a gold rush among Chinese tech startups, rewriting the playbook for AI development and giving smaller companies confidence that they have a shot in the market.

Recommended read:
References:
  • AI News: DeepSeek V3-0324 tops non-reasoning AI models in open-source first
  • MarkTechPost: Artificial intelligence (AI) has made significant strides in recent years, yet challenges persist in achieving efficient, cost-effective, and high-performance models.
  • Quinta’s weblog: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • AI News: DeepSeek disruption: Chinese AI innovation narrows global technology divide
  • Composio: Deepseek v3 0324: Finally, the Sonnet 3.5 at Home
  • SiliconANGLE: DeepSeek today released an improved version of its DeepSeek-V3 large language model under a new open-source license.
  • Sify: DeepSeek’s AI Revolution: Creating an Entire AI Ecosystem
  • Composio: Deepseek v3-0324 vs. Claude 3.7 Sonnet

Ryan Daws@AI News //
DeepSeek, a Chinese AI company, has released DeepSeek V3-0324, an updated AI model that demonstrates impressive performance. The model now runs at 20 tokens per second on a Mac Studio. It is said to contain 685 billion parameters, and its cost-effectiveness challenges the dominance of American AI models, signaling that China continues to innovate in AI despite chip restrictions. Reports from early testers show improvements over previous versions, and the model is the first open-source release to top the rankings of non-reasoning AI models.

This new model runs on consumer-grade hardware, specifically Apple's Mac Studio with the M3 Ultra chip, diverging from the typical data center requirements for AI. It is freely available for commercial use under the MIT license. According to AI researcher Awni Hannun, the model runs at over 20 tokens per second on a 512GB M3 Ultra. The company has made no formal announcement, just an empty README file and the model weights themselves. This stands in contrast to the carefully orchestrated product launches by Western AI companies.

Recommended read:
References:
  • SiliconANGLE: DeepSeek today released an improved version of its DeepSeek-V3 large language model under a new open-source license.
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • AI News: DeepSeek disruption: Chinese AI innovation narrows global technology divide
  • AI News: DeepSeek V3-0324 tops non-reasoning AI models in open-source first
  • MarkTechPost: DeepSeek AI Unveils DeepSeek-V3-0324: Blazing Fast Performance on Mac Studio, Heating Up the Competition with OpenAI
  • Cloud Security Alliance: DeepSeek: Behind the Hype and Headlines
  • Quinta’s weblog: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • Composio: Deepseek v3-0324 vs. Claude 3.7 Sonnet

Ryan Daws@AI News //
DeepSeek, a Chinese AI startup, is making waves in the artificial intelligence industry with its DeepSeek-V3 model. This model is demonstrating performance that rivals Western AI models like those from OpenAI and Anthropic, but at significantly lower development costs. The release of DeepSeek-V3 is seen as jumpstarting AI development across China, with other startups and established companies releasing their own advanced models, further fueling competition. This has narrowed the technology gap between China and the United States as China has adapted to and overcome international restrictions through creative approaches to AI development.

One particularly notable aspect of DeepSeek-V3 is its ability to run efficiently on consumer-grade hardware, such as the Mac Studio with an M3 Ultra chip. Reports indicate that the model achieves speeds of over 20 tokens per second on this platform, making it a potential "nightmare for OpenAI". This contrasts sharply with the data center requirements typically associated with state-of-the-art AI models. A focus on algorithmic efficiency and novel approaches to model architecture has allowed the company to achieve notable gains despite restricted access to the latest silicon.

Recommended read:
References:
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • AI News: DeepSeek disruption: Chinese AI innovation narrows global technology divide
  • GZERO Media: How DeepSeek changed China’s AI ambitions

Ryan Daws@AI News //
DeepSeek V3-0324, the latest large language model from Chinese AI startup DeepSeek, is making waves in the artificial intelligence industry. The model, quietly released with an MIT license for commercial use, has quickly become the highest-scoring non-reasoning model on the Artificial Analysis Intelligence Index. This marks a significant milestone for open-source AI, surpassing models such as Google’s Gemini 2.0 Pro, Anthropic’s Claude 3.7 Sonnet, and Meta’s Llama 3.3 70B.

DeepSeek V3-0324's efficiency is particularly notable. Early reports indicate that it can run directly on consumer-grade hardware, specifically Apple’s Mac Studio with an M3 Ultra chip, achieving speeds of over 20 tokens per second. This capability is a major departure from the typical data center requirements associated with state-of-the-art AI. The updated version demonstrates substantial improvements in reasoning and benchmark performance, as well as enhanced Chinese writing proficiency and optimized translation quality.

Recommended read:
References:
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • AI News: DeepSeek V3-0324 tops non-reasoning AI models in open-source first
  • Analytics Vidhya: DeepSeek V3-0324: Generated 700 Lines of Code without Breaking
  • Analytics India Magazine: The model outperformed all other non-reasoning models across several benchmarks but trailed behind DeepSeek-R1, OpenAI’s o1, o3-mini, and other reasoning models.
  • Cloud Security Alliance: DeepSeek: Behind the Hype and Headlines
  • techstrong.ai: DeepSeek Ups Ante (Again) in Duel with OpenAI, Anthropic
  • www.techradar.com: Deepseek’s new AI is smarter, faster, cheaper, and a real rival to OpenAI's models
  • Analytics Vidhya: DeepSeek V3-0324 vs Claude 3.7: Which is the Better Coder?
  • MarkTechPost: DeepSeek AI Unveils DeepSeek-V3-0324: Blazing Fast Performance on Mac Studio, Heating Up the Competition with OpenAI
  • www.zdnet.com: It's called V3-0324, but the real question is: Is it foreshadowing the upcoming launch of R2?
  • SiliconANGLE: DeepSeek today released an improved version of its DeepSeek-V3 large language model under a new open-source license.
  • Composio: Deepseek v3 0324: Finally, the Sonnet 3.5 at Home
  • Composio: Deepseek v3-0324 vs. Claude 3.7 Sonnet

Ryan Daws@AI News //
DeepSeek V3-0324 has emerged as a leading AI model, topping benchmarks for non-reasoning AI in an open-source breakthrough. It is the first time an open-weights model has achieved the top position among non-reasoning models. The model surpasses its proprietary non-reasoning counterparts and edges closer to proprietary reasoning models, highlighting the growing viability of open-source solutions for latency-sensitive applications. DeepSeek V3-0324 represents a new era for open-source AI, offering a powerful and adaptable tool for developers and enterprises.

DeepSeek-V3 now runs at 20 tokens per second on Apple’s Mac Studio, presenting a challenge to OpenAI’s cloud-dependent business model. The 685-billion-parameter model, DeepSeek-V3-0324, is freely available for commercial use under the MIT license. This achievement, coupled with its cost efficiency and performance, signals a shift in the AI sector, where open-source frameworks increasingly compete with closed systems. Early testers report significant improvements over previous versions, positioning DeepSeek's new model above Claude Sonnet 3.5 from Anthropic.

Recommended read:
References:
  • Analytics India Magazine: The model outperformed all other non-reasoning models across several benchmarks but trailed behind DeepSeek-R1, OpenAI’s o1, o3-mini, and other reasoning models.
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • AI News: DeepSeek V3-0324 tops non-reasoning AI models in open-source first
  • Analytics Vidhya: DeepSeek V3-0324: Generated 700 Lines of Code without Breaking
  • Analytics Vidhya: DeepSeek V3-0324 vs Claude 3.7: Which is the Better Coder?
  • Cloud Security Alliance: DeepSeek: Behind the Hype and Headlines
  • GZERO Media: Just a few short months ago, Silicon Valley seemed to have the artificial intelligence industry in a chokehold.
  • MarkTechPost: DeepSeek AI Unveils DeepSeek-V3-0324: Blazing Fast Performance on Mac Studio, Heating Up the Competition with OpenAI
  • SiliconANGLE: DeepSeek today released an improved version of its DeepSeek-V3 large language model under a new open-source license.
  • techstrong.ai: DeepSeek Ups Ante (Again) in Duel with OpenAI, Anthropic
  • www.zdnet.com: DeepSeek V3 model gets a major upgrade
  • www.techradar.com: DeepSeek’s new AI is smarter, faster, cheaper, and a real rival to OpenAI's models
  • Composio: Deepseek v3 0324: Finally, the Sonnet 3.5 at Home
  • AI News: DeepSeek disruption: Chinese AI innovation narrows global technology divide