@syncedreview.com
//
DeepSeek AI has unveiled DeepSeek-Prover-V2, a new open-source large language model (LLM) designed for formal theorem proving in the Lean 4 environment. The model advances neural theorem proving through a recursive theorem-proving pipeline and leverages DeepSeek-V3 to generate high-quality initialization data. DeepSeek-Prover-V2 achieves top results on the MiniF2F benchmark, demonstrating state-of-the-art performance in formal mathematical reasoning. The release also includes ProverBench, a new benchmark for evaluating mathematical reasoning capabilities.
DeepSeek-Prover-V2 features a distinctive cold-start training procedure. The process begins with DeepSeek-V3 decomposing complex mathematical theorems into a series of more manageable subgoals, while also formalizing those high-level proof steps in Lean 4 to produce a structured sequence of sub-problems. To keep the computationally intensive proof search tractable, a smaller 7B-parameter prover model searches for a proof of each subgoal. Once every decomposed step of a challenging problem has been proven, the complete step-by-step formal proof is paired with DeepSeek-V3's corresponding chain-of-thought reasoning. The resulting synthesized dataset integrates informal, high-level mathematical reasoning with rigorous formal proofs, providing a strong cold start for subsequent reinforcement learning.
Building on this synthetic cold-start data, the DeepSeek team curated challenging problems that the 7B prover could not solve end-to-end but whose subgoals had all been successfully proven. By combining the formal proofs of those subgoals, a complete proof of the original problem is assembled; that proof is then linked with DeepSeek-V3's chain-of-thought outlining the lemma decomposition, yielding a unified training example of informal reasoning followed by formalization.
DeepSeek is also challenging the long-held assumption, voiced by many tech CEOs, that exponential AI improvements require ever-increasing computing power. The company claims to have produced models comparable to OpenAI's at significantly lower compute and cost, calling into question whether massive scale is necessary for AI advancement.
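To make the subgoal decomposition described above concrete, here is a minimal, hypothetical Lean 4 sketch: a theorem is split into `have` subgoals whose proofs are initially left as `sorry`, so each piece can be attacked independently before the fragments are recombined into a full proof. The statement and tactic steps are purely illustrative and are not drawn from DeepSeek-Prover-V2's training data.

```lean
-- Hypothetical example of decomposing a theorem into named subgoals.
-- Each `sorry` marks a subgoal to be proved separately (e.g. by a smaller
-- prover model); the final lines recombine the pieces into one proof.
theorem add_mod_two_eq_zero (a b : Nat)
    (ha : a % 2 = 0) (hb : b % 2 = 0) : (a + b) % 2 = 0 := by
  -- Subgoal 1: express the sum modulo 2 in terms of the summands.
  have h1 : (a + b) % 2 = (a % 2 + b % 2) % 2 := by
    sorry
  -- Subgoal 2: close that reduced form using the hypotheses on a and b.
  have h2 : (a % 2 + b % 2) % 2 = 0 := by
    sorry
  -- Recombine the proved subgoals into the original statement.
  rw [h1]
  exact h2
```

In the pipeline described above, once every such `sorry` has been discharged, the completed formal proof is paired with the natural-language reasoning that produced the decomposition, forming one cold-start training example.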
@the-decoder.com
//
DeepSeek's R1 model, released in January 2025, disrupted the AI industry by demonstrating top-tier capabilities on a limited budget and without relying on Nvidia's highest-end GPUs. The model, reportedly optimized for Huawei's Ascend 910B chips, showed that engineering ingenuity and talent can overcome hardware limitations, unsettling the Silicon Valley giants that had previously dominated the AI landscape. R1 performed competitively with GPT-4o on many benchmarks, particularly Chinese-language tasks, while being significantly cheaper to train and serve, signaling a potential collapse of price-performance curves in the AI market.
R1's success was attributed to DeepSeek's drive to innovate on efficiency, particularly KV-cache optimization, an area of lesser concern for larger players with abundant resources. By optimizing the Key-Value cache used in every attention layer of the LLM, DeepSeek cut GPU memory usage and serving cost. The company has published this work, and its results have been verified at smaller scale. The accomplishment underscores that squeezing more out of existing resources, rather than simply buying more expensive hardware, can deliver top-tier AI performance.
DeepSeek is now reportedly preparing to launch R2 in May 2025, with rumored specifications of 1.2 trillion parameters in a hybrid mixture-of-experts (MoE) setup and training on over 5.2 petabytes of data. Leaks suggest R2 could be 97% cheaper than GPT-4o while adding improved vision capabilities and potentially rivaling GPT-4.5 and Gemini 2.5. Combined with DeepSeek's push to reduce reliance on American silicon, these developments point to a major shift in the AI industry and an era of significantly lower AI pricing.
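As a rough illustration of what the Key-Value cache is, the NumPy sketch below implements a toy single-head attention layer that caches keys and values across decoding steps. This is generic textbook caching, not DeepSeek's specific optimization (which reduces what must be cached); it only shows why the cache grows with sequence length and layer count, and therefore why shrinking it saves GPU memory.

```python
# Toy single-head attention with a per-layer KV cache.
# Illustrative only: real models use many heads and layers, and DeepSeek's
# published optimization changes what is stored, which is not shown here.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class CachedAttentionLayer:
    def __init__(self, d_model: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.k_cache = np.zeros((0, d_model))  # grows by one row per token
        self.v_cache = np.zeros((0, d_model))

    def decode_step(self, x: np.ndarray) -> np.ndarray:
        """Attend from the newest token to all previously cached tokens."""
        q = x @ self.wq
        # Only the new token's key/value are computed; past ones are reused.
        self.k_cache = np.vstack([self.k_cache, x @ self.wk])
        self.v_cache = np.vstack([self.v_cache, x @ self.wv])
        scores = softmax(q @ self.k_cache.T / np.sqrt(x.shape[-1]))
        return scores @ self.v_cache

d_model, n_layers, seq_len = 128, 32, 4096
layer = CachedAttentionLayer(d_model)
for _ in range(8):  # a few decoding steps
    _ = layer.decode_step(np.random.randn(1, d_model))

# Rough full-model cache size in fp16: two tensors (K and V) per layer.
cache_bytes = 2 * n_layers * seq_len * d_model * 2
print(f"KV cache for one sequence at fp16: {cache_bytes / 1e6:.1f} MB")
```

Every decoding step appends one row of keys and values per layer, so for long contexts and many concurrent requests the cache, rather than the model weights, can dominate GPU memory, which is why optimizing it translates directly into serving cost savings.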
Ben Dickson@AI News | VentureBeat
//
DeepSeek, a Chinese AI company, has achieved a breakthrough in AI reward modeling that promises to enhance the reasoning and responsiveness of AI systems. Collaborating with Tsinghua University researchers, DeepSeek developed a technique called "Inference-Time Scaling for Generalist Reward Modeling," demonstrating improved performance compared to existing methods and competitive results against established public reward models. This innovation aims to improve how AI systems learn from human preferences, a key factor in developing more useful and aligned artificial intelligence.
DeepSeek's new approach combines two methods: Generative Reward Modeling (GRM) and Self-Principled Critique Tuning (SPCT). GRM handles varied input types flexibly and can be scaled at inference time, representing rewards in language rather than as a single scalar, which gives a richer signal than earlier approaches. SPCT is a learning method that fosters scalable reward-generation behaviors in GRMs through online reinforcement learning. One of the paper's authors explained that the combination allows principles to be generated from the input query and responses, adaptively aligning the reward-generation process.
SPCT targets the main challenges in building generalist reward models that can handle broad tasks: input flexibility, accuracy, inference-time scalability, and learning scalable behaviors. By generating self-guiding critiques, it promises more scalable reward signals for enterprise LLMs, particularly in open-ended tasks and domains where current models struggle.
DeepSeek has also released models such as DeepSeek-V3 and DeepSeek-R1, which achieve performance close to, and sometimes exceeding, that of leading proprietary models while using fewer training resources. These results signal that cutting-edge AI is not solely the domain of closed labs, and they underline the importance of efficient model architecture, training algorithms, and hardware integration.
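A hedged sketch of the inference-time scaling idea follows: the reward model is sampled several times for the same query-response pair, each sample producing principles, a critique, and a score, and the samples are aggregated into a single, more stable reward. The `query_grm` function below is a hypothetical placeholder for a real generative reward model call, not DeepSeek's API; more elaborate aggregation, such as guided voting, is omitted here.

```python
# Minimal sketch of inference-time scaling for a generative reward model:
# sample k judgments, parse each score, and average them into one reward.
# `query_grm` is a stand-in stub, NOT a real model or DeepSeek's implementation.
import random
import re
import statistics
from typing import List

def query_grm(query: str, response: str, seed: int) -> str:
    """Placeholder for one sampled GRM generation (principles + critique + score)."""
    random.seed(hash((query, response, seed)))
    score = random.randint(6, 9)  # pretend the model judged the response
    return (f"Principles: correctness, clarity.\n"
            f"Critique: the response addresses the question directly.\n"
            f"Score: {score}/10")

def parse_score(judgment: str) -> float:
    """Pull the numeric score out of a generated judgment."""
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", judgment)
    return float(match.group(1)) if match else 0.0

def inference_time_scaled_reward(query: str, response: str, k: int = 8) -> float:
    """Sample k independent judgments and aggregate them into one reward."""
    scores: List[float] = [parse_score(query_grm(query, response, seed=i))
                           for i in range(k)]
    # Averaging (or voting over discretized scores) makes the final reward
    # more stable as k grows -- the "scaling during inference" effect.
    return statistics.mean(scores)

if __name__ == "__main__":
    reward = inference_time_scaled_reward(
        query="Explain why the sky is blue.",
        response="Rayleigh scattering disperses shorter wavelengths more strongly.",
    )
    print(f"Aggregated reward over 8 samples: {reward:.2f}")
```

The design choice illustrated here is that extra compute is spent at evaluation time, by drawing more samples, rather than by training a larger reward model, which is what makes the reward signal "scalable" at inference.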