References: github.com
Google is enhancing its AI Hypercomputer with optimized recipes designed to streamline the deployment of large AI models such as Meta's Llama 4 and DeepSeek. The move aims to ease the resource-intensive work developers and ML engineers face with these advanced models. The new recipes cover the Llama 4 Scout and Maverick models, as well as DeepSeek models, on Google Cloud Trillium TPUs and A3 Mega/Ultra GPUs, making these powerful models more accessible and efficient to deploy.
JetStream, Google's high-throughput inference engine for LLMs on XLA devices, now supports Llama-4-Scout-17B-16E and Llama-4-Maverick-17B-128E inference on Trillium TPUs, and new recipes walk through deploying these models with JetStream and MaxText on a Trillium TPU GKE cluster. Pathways on Google Cloud simplifies large-scale machine learning computations by letting a single JAX client orchestrate workloads across multiple large TPU slices. MaxText now includes reference implementations for Llama 4 and DeepSeek, with detailed guidance on checkpoint conversion, training, and decoding. Developers can find the new recipes and resources in the AI Hypercomputer GitHub repository. These optimized recipes promise to simplify the deployment and resource management of Llama 4 and DeepSeek models, enabling users to harness the full potential of these models on Google Cloud's AI Hypercomputer platform. The initiative underscores Google's commitment to providing robust AI infrastructure and fostering innovation in the open-source AI community.
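For a feel of the client side once such a recipe has been deployed, the minimal sketch below posts a prompt to an assumed HTTP front end for the serving cluster. The service hostname, route, and JSON fields are illustrative placeholders, not the actual JetStream interface; the recipes themselves document the real serving setup.

```python
import requests

# Hypothetical endpoint for a JetStream/MaxText service running on a GKE cluster.
# The host, route, and payload schema below are assumptions for illustration only;
# consult the AI Hypercomputer recipes for the actual serving interface.
JETSTREAM_URL = "http://jetstream-service.default.svc.cluster.local:8000/generate"

def generate(prompt: str, max_tokens: int = 256) -> str:
    """Send a single prompt to the (assumed) HTTP front end and return the text."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    resp = requests.post(JETSTREAM_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json().get("text", "")

if __name__ == "__main__":
    print(generate("Summarize the Llama 4 Scout architecture in two sentences."))
```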
References: lambda.ai, thezvi.wordpress.com
DeepSeek's latest model, R1-0528, is now available on Lambda's Inference API, an upgrade to the original R1 model released in January 2025. Built on the deepseek_v3 architecture, the new model combines mathematical capability, code generation finesse, and reasoning depth, aiming to challenge the dominance of OpenAI's o3 and Google's Gemini 2.5 Pro. DeepSeek-R1-0528 employs FP8 quantization, enhancing its ability to handle complex computations efficiently, and uses a mixture-of-experts (MoE) design with multi-headed latent attention (MLA) and multi-token prediction (MTP) for efficient handling of complex reasoning tasks.
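Assuming Lambda exposes the model through its OpenAI-compatible Inference API, a call could look roughly like the sketch below; the base URL and model identifier are assumptions and should be checked against Lambda's documentation.

```python
from openai import OpenAI

# Lambda's Inference API is OpenAI-compatible; the base URL and model name below
# are assumptions for illustration -- check Lambda's documentation for the exact values.
client = OpenAI(
    api_key="YOUR_LAMBDA_API_KEY",
    base_url="https://api.lambda.ai/v1",
)

response = client.chat.completions.create(
    model="deepseek-r1-0528",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Prove that the sum of two even integers is even."}
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)
```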
DeepSeek-R1-0528, while a solid upgrade, didn't generate the same excitement as the initial R1 release. When R1 arrived in January 2025, it was seen as a watershed moment for the company; this time around, the model is regarded as solid for its price and its status as an open model, best suited to tasks that align with its specific strengths. The initial release created a "DeepSeek moment," prompting market reactions and comparisons to other models, and the first R1 shipped with a free app featuring a clear design and a visible chain of thought, which forced other labs to follow suit. While DeepSeek R1-0528 offers advantages, experts warn of risks associated with open-source AI models. Cisco issued a report shortly after R1 began dominating headlines, claiming DeepSeek failed to block a single harmful prompt when tested against 50 random prompts taken from the HarmBench dataset. The risks include potential misuse for cyber threats, the spread of misinformation, and the reinforcement of biases. There are also concerns about data poisoning, where compromised training data could lead to biased output or disinformation, and adversaries could modify the models to bypass controls, generate harmful content, or embed backdoors for exploitation.
References: medium.com
DeepSeek's latest AI model, R1-0528, is making waves in the AI community thanks to its performance on math and reasoning tasks. Despite sharing a name with its predecessor, the new model delivers a markedly different performance profile, marking a significant leap forward. DeepSeek R1-0528 has seen unprecedented demand, with the app shooting to the top of the App Store past closed-model rivals and the API so overloaded that the company temporarily had to stop accepting payments.
The most notable improvement in DeepSeek R1-0528 is its mathematical reasoning. On the AIME 2025 test, the model's accuracy increased from 70% to 87.5%, surpassing Gemini 2.5 Pro and putting it in close competition with OpenAI's o3. The gain is attributed to "enhanced thinking depth": the model uses significantly more tokens per question and engages in more thorough chains of reasoning, which means it can check its own work, recognize errors, and course-correct during problem-solving. DeepSeek's success is challenging established closed models and driving competition in the AI landscape.

DeepSeek-R1-0528 continues to use a Mixture-of-Experts (MoE) architecture, now scaled up to an enormous size; this sparse activation allows powerful specialized expertise in different coding domains while maintaining efficiency. The context window remains at 128k, with RoPE scaling or other improvements capable of extending it further. The model's rise is underscored by benchmarks showing it outperforming some of the industry's leading models, including OpenAI's ChatGPT, and the release of a distilled variant, R1-0528-Qwen3-8B, ensures broad accessibility of this powerful technology.
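To make sparse expert activation concrete, here is a small self-contained sketch of top-k MoE routing with toy dimensions; it illustrates the general idea, not DeepSeek's actual implementation.

```python
import numpy as np

# Toy illustration of top-k mixture-of-experts routing, the idea behind sparse
# activation: only a few experts run per token, so total capacity grows without
# a proportional increase in per-token compute. Dimensions and expert count are
# arbitrary; this is not DeepSeek's actual implementation.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# One feed-forward "expert" = two small weight matrices.
experts = [(rng.normal(size=(d_model, 4 * d_model)),
            rng.normal(size=(4 * d_model, d_model))) for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))  # maps a token to expert logits

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                           # chosen expert indices
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()   # softmax over chosen
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0.0) @ w2)               # ReLU feed-forward expert
    return out

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,) -- only 2 of the 8 experts were evaluated
```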
References: marktechpost.com
DeepSeek, a Chinese AI startup, has launched an updated version of its R1 reasoning AI model, named DeepSeek-R1-0528. This new iteration brings the open-source model near parity with proprietary paid models like OpenAI’s o3 and Google’s Gemini 2.5 Pro in terms of reasoning capabilities. The model is released under the permissive MIT License, enabling commercial use and customization, marking a commitment to open-source AI development. The model's weights and documentation are available on Hugging Face, facilitating local deployment and API integration.
The DeepSeek-R1-0528 update introduces substantial enhancements in the model's ability to handle complex reasoning tasks across various domains, including mathematics, science, business, and programming. DeepSeek attributes these improvements to increased computational resources and algorithmic optimizations applied in post-training. Notably, accuracy on the AIME 2025 test surged from 70% to 87.5%, reflecting deeper reasoning with an average of 23,000 tokens per question, compared with the previous version's 12,000 tokens.

Alongside enhanced reasoning, the updated R1 boasts a reduced hallucination rate, contributing to more reliable and consistent output, and code generation performance has also improved, positioning it as a strong contender in the open-source AI landscape. DeepSeek provides instructions on its GitHub repository for running the model locally and encourages community feedback and questions. The company aims to provide accessible AI solutions, underscored by the availability of a distilled version, DeepSeek-R1-0528-Qwen3-8B, designed for efficient single-GPU operation.
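As a rough sketch of what single-GPU local deployment of the distilled variant might look like with Hugging Face transformers, assuming the repository name below matches the published release:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local-deployment sketch for the distilled single-GPU variant. The repository
# name is assumed from the release notes; verify it on Hugging Face before running.
# Requires a GPU with enough memory for an 8B-parameter model.
model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 17 * 23? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```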
References: marktechpost.com
DeepSeek has released a major update to its R1 reasoning model, dubbed DeepSeek-R1-0528, marking a significant step forward in open-source AI. The update boasts enhanced performance in complex reasoning, mathematics, and coding, positioning it as a strong competitor to leading commercial models like OpenAI's o3 and Google's Gemini 2.5 Pro. The model's weights, training recipes, and comprehensive documentation are openly available under the MIT license, fostering transparency and community-driven innovation. This release allows researchers, developers, and businesses to access cutting-edge AI capabilities without the constraints of closed ecosystems or expensive subscriptions.
The DeepSeek-R1-0528 update brings several core improvements. The parameter count has increased from 671 billion to 685 billion, enabling the model to process and store more intricate patterns. Enhanced chain-of-thought layers deepen its reasoning capabilities, making it more reliable on multi-step logic problems, and post-training optimizations reduce hallucinations and improve output stability. In practical terms, the update introduces JSON outputs, native function calling, and simplified system prompts, all designed to streamline real-world deployment and improve the developer experience.

DeepSeek R1-0528 also demonstrates a remarkable leap in mathematical reasoning: on the AIME 2025 test, its accuracy improved from 70% to an impressive 87.5%, rivaling OpenAI's o3. This improvement is attributed to "enhanced thinking depth," with the model now using significantly more tokens per question, indicating more thorough and systematic logical analysis. The open-source nature of DeepSeek-R1-0528 empowers users to fine-tune and adapt the model to their specific needs, fostering further innovation within the AI community.
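A hedged sketch of the JSON-output feature mentioned above, using an OpenAI-compatible client: the base URL, model name, and this model's support for the `response_format` parameter are assumptions drawn from DeepSeek's public documentation and should be verified before use.

```python
import json
from openai import OpenAI

# The endpoint and model identifier below are assumptions for illustration;
# confirm them against DeepSeek's API documentation.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed identifier for the R1 line
    messages=[
        {"role": "system",
         "content": "Reply with a JSON object containing 'answer' and 'reasoning_summary'."},
        {"role": "user", "content": "Is 221 prime?"},
    ],
    response_format={"type": "json_object"},  # structured JSON output mode
)
print(json.loads(response.choices[0].message.content))
```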
References: pub.towardsai.net
DeepSeek's R1 model is garnering attention as a potential game-changer for entrepreneurs, offering advancements in "reasoning per dollar." This refers to the amount of reasoning power one can obtain for each dollar spent, potentially unlocking opportunities previously deemed too expensive or technologically challenging. The model's high-reasoning capabilities at a reasonable cost are seen as a way to make advanced AI more accessible, particularly for tasks that require deep understanding and synthesis of information. One example is the creation of sophisticated AI-powered tools, like a "lawyer agent" that can review contracts, which were once cost-prohibitive.
The DeepSeek R1 model has been updated and released on Hugging Face, reportedly with significant changes and improvements. The update arrives amid both excitement and apprehension about the model's capabilities: while it shows promise in areas like content generation and customer support, there are concerns about potential political bias and censorship, stemming from observations of alleged Chinese government influence in the model's system instructions, which may affect the neutrality of generated content. Adopting DeepSeek R1 therefore requires careful self-assessment by businesses and individuals, weighing its strengths and drawbacks against specific needs and values. Users must consider how the model aligns with their data governance, privacy requirements, and ethical principles. For instance, while its content generation is strong, some categories may be censored or skewed by built-in constraints, and chatbot integrations may produce heavily filtered replies, raising concerns about alignment with corporate values. It is essential to be comfortable with the possibility of officially sanctioned or heavily filtered replies, and to monitor the AI's responses to ensure they align with the business's values.
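As one illustration of the monitoring suggested above, here is a lightweight sketch that flags refused or heavily templated replies for human review; the patterns are placeholders, not a vetted policy.

```python
import re

# Minimal sketch of a response-monitoring step: inspect model output before it
# reaches end users and flag replies that look refused, templated, or off-policy.
# The patterns below are placeholders -- a real deployment would use an
# organisation's own policy terms and review workflow.
SUSPECT_PATTERNS = [
    r"i (?:cannot|can't) (?:discuss|answer)",   # possible hard refusal
    r"as an ai (?:model|assistant)",            # boilerplate deflection
]

def review_response(text: str) -> dict:
    """Return the response together with flags for human review."""
    flags = [p for p in SUSPECT_PATTERNS if re.search(p, text, re.IGNORECASE)]
    return {"text": text, "needs_review": bool(flags), "matched_patterns": flags}

print(review_response("I cannot discuss that topic."))
```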