Ben Dickson@AI News | VentureBeat
//
DeepSeek, a Chinese AI company, has achieved a breakthrough in AI reward modeling that promises to enhance the reasoning and responsiveness of AI systems. Collaborating with Tsinghua University researchers, DeepSeek developed a technique called "Inference-Time Scaling for Generalist Reward Modeling," demonstrating improved performance compared to existing methods and competitive results against established public reward models. This innovation aims to improve how AI systems learn from human preferences, a key factor in developing more useful and aligned artificial intelligence.
DeepSeek's new approach combines Generative Reward Modeling (GRM) with Self-Principled Critique Tuning (SPCT). GRM handles a variety of input types and enables scaling at inference time, offering a richer representation of rewards through language than previous scalar approaches. SPCT is a learning method that fosters scalable reward-generation behaviors in GRMs through online reinforcement learning. One of the paper's authors explained that this combination allows principles to be generated based on the input query and responses, adaptively aligning the reward-generation process.
SPCT addresses the core challenges of building generalist reward models capable of handling broader tasks: input flexibility, accuracy, inference-time scalability, and learning scalable behaviors. By generating self-guiding critiques, SPCT promises more scalable intelligence for enterprise LLMs, particularly in open-ended tasks and domains where current models struggle.
DeepSeek has also released models such as DeepSeek-V3 and DeepSeek-R1, which have achieved performance close to, and sometimes exceeding, leading proprietary models while using fewer training resources. These advances signal that cutting-edge AI is not solely the domain of closed labs, and they highlight the importance of efficient model architecture, training algorithms, and hardware integration.
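The inference-time scaling idea can be sketched in a few lines: sample several independent principle-plus-critique generations for the same query and response, then aggregate their scores. The sketch below is illustrative only — `sample_critique` is a hypothetical stand-in for a real GRM generation pass, and the simple averaging here is the most basic of the aggregation strategies the paper considers.

```python
import random
from statistics import mean

def sample_critique(query: str, response: str, seed: int) -> dict:
    """Stand-in for one GRM generation pass: in the real system the model
    writes principles and a critique, then emits a numeric score.
    Here we return a noisy placeholder score to illustrate aggregation."""
    rng = random.Random(seed)
    return {
        "principles": ["helpfulness", "factual accuracy"],  # generated per query in SPCT
        "score": 7 + rng.uniform(-1.0, 1.0),                # placeholder score on a 1-10 scale
    }

def scaled_reward(query: str, response: str, n_samples: int = 8) -> float:
    """Inference-time scaling: draw several independent critiques and
    aggregate their scores (simple averaging; the paper also explores
    a trained meta reward model to weight the samples)."""
    samples = [sample_critique(query, response, seed=i) for i in range(n_samples)]
    return mean(s["score"] for s in samples)

reward = scaled_reward("Explain RLHF.", "RLHF fine-tunes a model from preference data...")
print(round(reward, 2))  # more samples -> lower-variance reward estimate
```

Spending more compute at inference (a larger `n_samples`) reduces the variance of the reward estimate without retraining the model, which is the core of the scaling claim.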
Ryan Daws@AI News
//
DeepSeek V3-0324, the latest large language model from Chinese AI startup DeepSeek, is making waves in the artificial intelligence industry. The model, quietly released under an MIT license for commercial use, has quickly become the highest-scoring non-reasoning model on the Artificial Analysis Intelligence Index. This marks a significant milestone for open-source AI: the model surpasses proprietary counterparts such as Google’s Gemini 2.0 Pro, Anthropic’s Claude 3.7 Sonnet, and Meta’s Llama 3.3 70B.
DeepSeek V3-0324's efficiency is particularly notable. Early reports indicate that it can run directly on consumer-grade hardware, specifically Apple’s Mac Studio with an M3 Ultra chip, achieving speeds of over 20 tokens per second. This capability is a major departure from the typical data center requirements associated with state-of-the-art AI. The updated version demonstrates substantial improvements in reasoning and benchmark performance, as well as enhanced Chinese writing proficiency and optimized translation quality.
Ryan Daws@AI News
//
DeepSeek V3-0324 has emerged as a leading AI model, topping non-reasoning benchmarks in an open-source breakthrough: it is the first open-weights model to take the top position among non-reasoning models. Its performance surpasses proprietary counterparts and edges closer to proprietary reasoning models, highlighting the growing viability of open-source solutions for latency-sensitive applications. DeepSeek V3-0324 represents a new era for open-source AI, offering a powerful, adaptable tool for developers and enterprises.
DeepSeek-V3 now runs at 20 tokens per second on Apple’s Mac Studio, presenting a challenge to OpenAI’s cloud-dependent business model. The 685-billion-parameter model, DeepSeek-V3-0324, is freely available for commercial use under the MIT license. This achievement, coupled with its cost efficiency and performance, signals a shift in the AI sector, where open-source frameworks increasingly compete with closed systems. Early testers report significant improvements over previous versions, positioning DeepSeek's new model above Anthropic's Claude 3.5 Sonnet.
Matthias Bastian@THE DECODER
//
DeepSeek AI has announced impressive financial results, revealing annual revenues of $200 million with profit margins exceeding 85%. This achievement highlights the potential for significant profitability in the AI language model sector, even when pricing services far below competitors like OpenAI. DeepSeek's success comes from efficient architecture and cost management, allowing it to charge just $2.19 per million tokens, roughly 25 times less than OpenAI. This pricing strategy, combined with smart resource allocation, has enabled DeepSeek to achieve profit margins rivaling those of Nvidia, which reports margins of 72-77%.
The company's innovative approach includes maximizing efficiency through a dynamic resource allocation system. During peak daytime hours, all server nodes are dedicated to handling inference requests; when demand decreases at night, resources are redirected to research and training tasks. This smart management helps reduce costs, contributing to the company's high profit margins. While these figures represent "theoretical" profit margins, they are based on actual usage data, illustrating the potential for AI language models to be highly profitable even with lower pricing strategies.
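The pricing arithmetic above is easy to sanity-check. In the sketch below, the competitor rate is a hypothetical placeholder derived only from the "~25 times less than OpenAI" ratio reported here, not an actual published price:

```python
def token_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a given number of tokens at a per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million

DEEPSEEK_RATE = 2.19                   # USD per million tokens, per the article
COMPETITOR_RATE = DEEPSEEK_RATE * 25   # illustrative: "~25x less than OpenAI"

job = 40_000_000  # e.g. a 40-million-token batch-inference job
print(f"DeepSeek:   ${token_cost(job, DEEPSEEK_RATE):,.2f}")
print(f"Competitor: ${token_cost(job, COMPETITOR_RATE):,.2f}")
```

At that rate, even a 40-million-token job costs under $100 on DeepSeek, which is why aggressive pricing can still leave wide margins when serving costs are low.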
Harsh Mishra@Analytics Vidhya
//
DeepSeek AI has been making significant contributions to the open-source community, particularly in the realm of AI model efficiency and accessibility. They recently launched the Fire-Flyer File System (3FS), a high-performance distributed file system tailored for AI training and inference workloads. This system is designed to address the challenges of managing large-scale, concurrent data access, a common bottleneck in traditional file systems. 3FS leverages modern SSDs and RDMA networks, offering a shared storage layer that facilitates the development of distributed applications by bypassing limitations seen in more traditional, locality-dependent file systems.
DeepSeek's commitment extends to data processing and model optimization. They have introduced the Smallpond framework for data processing and released quantized DeepSeek-R1 models, optimized for deployment-ready reasoning tasks. The quantized models, including Llama-8B, Llama-70B, Qwen-1.5B, Qwen-7B, Qwen-14B, and Qwen-32B, are available as a Hugging Face collection with evaluations, benchmarks, and setup instructions. These models maintain competitive reasoning accuracy while unlocking significant inference speedups.
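To see why quantized checkpoints shrink memory while largely preserving accuracy, here is a minimal symmetric int8 weight-quantization round trip. This is purely illustrative — the released models use more sophisticated schemes than per-tensor symmetric quantization:

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero tensors
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes and the stored scale."""
    return [x * scale for x in q]

w = [0.42, -1.37, 0.08, 0.91]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(max_err, 4))  # 8-bit storage per weight, small reconstruction error
```

Each weight now needs one byte plus a shared scale instead of four bytes, a 4x memory reduction, and the worst-case rounding error is bounded by half the scale.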
Matthias Bastian@THE DECODER
//
Chinese AI company DeepSeek is making waves in the global AI market with its high profit margins and low pricing. The company makes $200 million per year at 85% or greater profit margins, even while charging $2.19 per million tokens on its R1 model, about 25 times less than OpenAI. DeepSeek's financial data suggests theoretical peak revenue could exceed operating costs by a factor of six at optimal R1 pricing.
The company's success has prompted Tencent to unveil its own AI platform, Hunyuan Turbo S, designed specifically to compete with DeepSeek. Hunyuan Turbo S wins clearly in certain cases but still falls behind DeepSeek-R1-Zero in others. DeepSeek keeps costs down through smart resource management and a dynamic resource allocation system.
Asif Razzaq@MarkTechPost
//
DeepSeek AI is accelerating the release of its R2 AI reasoning model, a sequel to its R1 model that was launched in January. The R1 model matched or exceeded the performance of models from major Western companies like OpenAI, Meta, and Google. The release of R1 precipitated a significant stock sell-off, and the R2 model is expected to have enhanced coding and reasoning capabilities in multiple languages.
DeepSeek is moving up the release date for R2, which was initially planned for early May. This accelerated release may further intensify concerns in the United States regarding global AI leadership and is expected to encourage many Chinese companies to integrate DeepSeek models into their products. Furthermore, DeepSeek has announced the release of DeepGEMM, a library designed for efficient FP8 General Matrix Multiplications (GEMMs), as part of #OpenSourceWeek. This new library will help improve the efficiency of training AI models.
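FP8 GEMM kernels typically pair low-precision operands with fine-grained scale factors to keep accumulation accurate. The sketch below mimics that idea in plain Python with per-row int8 quantization; it is purely illustrative of the scaling concept — DeepGEMM itself is optimized CUDA code with very different internals:

```python
def scaled_matmul(A, B):
    """Matrix product where each row of A is quantized to int8 with its own
    scale factor, then rescaled after accumulation - the fine-grained-scaling
    idea low-precision GEMM kernels rely on to preserve accuracy."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        scale = max(abs(x) for x in A[i]) / 127 or 1.0   # per-row scale factor
        qrow = [round(x / scale) for x in A[i]]          # int8 codes for this row
        for j in range(m):
            acc = sum(qrow[t] * B[t][j] for t in range(k))  # accumulate in high precision
            C[i][j] = acc * scale                        # rescale once per output element
    return C

A = [[0.5, -1.0], [2.0, 0.25]]
B = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the result should approximate A
print(scaled_matmul(A, B))
```

Because each row gets its own scale, a row with large values does not destroy the precision of a row with small ones, which is the same motivation behind per-tile scaling in FP8 kernels.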
@timesofindia.indiatimes.com
//
Recent developments highlight both the expanding influence and the regulatory hurdles faced by the AI company DeepSeek. In South Korea, the government has halted downloads of DeepSeek's applications, citing concerns over data privacy. This action has removed the company's apps from both the Apple and Google mobile app marketplaces, though their website remains accessible.
Simultaneously, DeepSeek's AI technology is rapidly integrating into China's transportation sector, from electric vehicles (EVs) to e-scooters. Major automakers, including BYD, Geely, and Chery Automobile, are incorporating DeepSeek's AI into their vehicles, offering features like preliminary self-driving capabilities. E-scooter brands such as Segway-Ninebot and Niu Technologies are also integrating DeepSeek for AI-powered content creation, data analytics, and driver-assistance systems, reflecting what some industry observers call "DeepSeek fever," driven by its cost-effective AI integration.
Separately, Perplexity has released "1776," a modified version of DeepSeek-R1. This model addresses the original's limitations by mitigating censorship on sensitive topics, particularly those related to Chinese history and geopolitics. The modifications were made using post-training techniques to produce more open and contextually accurate responses; the modified model is available on Perplexity's Sonar AI platform and GitHub.
@the-decoder.com
//
DeepSeek's R1 model has garnered significant attention in the AI landscape. Perplexity AI has created R1 1776, a modified version of DeepSeek-R1 designed to overcome Chinese censorship through specialized post-training techniques. This modification addresses the original model's limitation of responding to sensitive topics with pre-approved Communist Party messaging. Perplexity's post-training process involved extensive data collection on censored Chinese topics, developing a multilingual censorship detection system to identify and address censored responses.
This modification allows R1 1776 to handle previously censored topics comprehensively and without bias, while maintaining its mathematical and reasoning capabilities. Furthermore, IBM has confirmed that it is integrating distilled versions of DeepSeek's AI models into its WatsonX platform, citing a commitment to open-source innovation and the high costs of US-originated AI models. IBM aims to broaden WatsonX's ability to perform secure reasoning by incorporating the "best open source models" available, including those from DeepSeek.
@www.marktechpost.com
//
DeepSeek AI is making strides in AI modeling with its Native Sparse Attention (NSA) mechanism, aimed at reducing computational costs for long-context models. NSA employs a dynamic hierarchical approach, compressing tokens, selectively retaining relevant ones, and using a sliding window to preserve local context. This innovation seeks to balance performance with efficiency, addressing challenges in standard attention mechanisms that face quadratic complexity when processing long sequences. The hardware-aligned design of NSA, with specialized kernels optimized for modern GPUs, further reduces latency in both inference and training.
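Of NSA's three branches, the sliding window is the easiest to sketch: each query attends only to its `w` most recent keys, so per-token cost drops from O(n) to O(w). The minimal NumPy version below is illustrative only — NSA combines this branch with token compression and selection inside fused, hardware-aligned GPU kernels:

```python
import numpy as np

def sliding_window_attention(Q, K, V, w=4):
    """Each query position i attends only to keys in [i-w+1, i].
    Per-token cost is O(w) instead of O(n) for full causal attention."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo = max(0, i - w + 1)
        scores = Q[i] @ K[lo:i + 1].T / np.sqrt(d)  # scores over the local window only
        weights = np.exp(scores - scores.max())     # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ V[lo:i + 1]
    return out

rng = np.random.default_rng(0)
n, d = 8, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(sliding_window_attention(Q, K, V, w=4).shape)  # (8, 16)
```

For a sequence of length n, full attention computes n^2/2 causal scores while this branch computes roughly n*w, which is where the quadratic-to-linear saving comes from; the compression and selection branches then recover the long-range context the window discards.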
This algorithmic innovation is already seeing practical application. IBM has decided to integrate "distilled versions" of DeepSeek's AI models into its WatsonX platform, citing a commitment to open-source innovation and aiming to broaden WatsonX's reasoning capabilities. This move reflects a growing industry recognition that large, expensive proprietary AI systems are not always necessary for effective AI solutions. Techniques like GRPO (Group Relative Policy Optimization) and tools like Unsloth are being used to fine-tune DeepSeek-7B, enhancing its performance on specialized tasks and optimizing memory management for faster, more cost-effective training.
@the-decoder.com
//
Perplexity AI has launched Deep Research, an AI-powered research tool aimed at competing with OpenAI and Google Gemini. Using DeepSeek-R1, Perplexity is offering comprehensive research reports at a much lower cost than OpenAI, with 500 queries per day for $20 per month compared to OpenAI's $200 per month for only 100 queries. The new service automatically conducts dozens of searches and analyzes hundreds of sources to produce detailed reports in one to two minutes.
Perplexity claims Deep Research performs 8 searches and consults 42 sources to generate a 1,300-word report in under 3 minutes. The company says the Deep Research tool works particularly well for finance, marketing, and technology research. The service is launching first on web browsers, with iOS, Android, and Mac versions planned for later release. Perplexity CEO Aravind Srinivas stated that he wants to keep making it faster and cheaper in the interest of humanity.
@techhq.com
//
DeepSeek is making waves in the AI industry with its open-source AI models, challenging the dominance of proprietary models from industry giants like OpenAI and Anthropic. DeepSeek-R1, a reasoning model built on top of DeepSeek-V3, is being recognized as a significant milestone, sparking excitement within the open-source community. Its accessible AI development approach could democratize the technology by allowing anyone to download, modify, and build upon the system at a lower cost. DeepSeek claims it built its system for approximately $5.6 million – roughly one-tenth the cost of Meta’s Llama model.
The company's open-source approach has also raised some concerns. While DeepSeek has released model weights and some technical documentation, it hasn’t fully disclosed its training data, leading to questions about complete transparency. In addition, a cybersecurity firm found security and privacy issues in the DeepSeek iOS mobile app: data is initially sent to DeepSeek's servers with information such as the device language and User-Agent readable in transit. This has prompted lawmakers in the US House of Representatives to consider banning DeepSeek's AI models on federal devices.
David Gerard@Pivot to AI
//
DeepSeek AI is facing increasing scrutiny and controversy due to its capabilities and potential security risks. US lawmakers are pushing for a ban on DeepSeek on government-issued devices, citing concerns that the app transfers user data to a banned state-owned company, China Mobile. This action follows a study that revealed direct links between the app and the Chinese government-owned entity. Security researchers have also discovered hidden code within DeepSeek that transmits user data to China, raising alarms about potential CCP oversight and the compromise of sensitive information.
DeepSeek's capabilities, while impressive, have raised concerns about potential misuse. Security researchers found the model doesn't screen out malicious prompts and can provide instructions for harmful activities, including producing chemical weapons and planning terrorist attacks. Despite these concerns, DeepSeek is being used to perform "reasoning" tasks, such as coding, on alternative chips from Groq and Cerebras, with some tasks completed in as little as 1.5 seconds. These advancements challenge traditional assumptions about the resources required for advanced AI, highlighting both the potential and the risks associated with DeepSeek's capabilities.