News from the AI & ML world

DeeperML - #languagemodel

Ryan Daws@AI News //
References: venturebeat.com , AI News ,
DeepSeek, a Chinese AI startup, is making waves in the artificial intelligence industry with its DeepSeek-V3 model. This model is demonstrating performance that rivals Western AI models like those from OpenAI and Anthropic, but at significantly lower development costs. The release of DeepSeek-V3 is seen as jumpstarting AI development across China, with other startups and established companies releasing their own advanced models, further fueling competition. This has narrowed the technology gap between China and the United States as China has adapted to and overcome international restrictions through creative approaches to AI development.

One particularly notable aspect of DeepSeek-V3 is its ability to run efficiently on consumer-grade hardware, such as the Mac Studio with an M3 Ultra chip. Reports indicate that the model achieves speeds of over 20 tokens per second on this platform, making it a potential "nightmare for OpenAI". This contrasts sharply with the data center requirements typically associated with state-of-the-art AI models. The company's focus on algorithmic efficiency has allowed them to achieve notable gains despite restricted access to the latest silicon, showcasing that Chinese AI innovation has flourished by focusing on algorithmic efficiency and novel approaches to model architecture.

Recommended read:
References :
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • AI News: DeepSeek disruption: Chinese AI innovation narrows global technology divide
  • GZERO Media: How DeepSeek changed China’s AI ambitions

Ryan Daws@AI News //
DeepSeek V3-0324, the latest large language model from Chinese AI startup DeepSeek, is making waves in the artificial intelligence industry. The model, quietly released with an MIT license for commercial use, has quickly become the highest-scoring non-reasoning model on the Artificial Analysis Intelligence Index. This marks a significant milestone for open-source AI, surpassing proprietary counterparts like Google’s Gemini 2.0 Pro, Anthropic’s Claude 3.7 Sonnet, and Meta’s Llama 3.3 70B.

DeepSeek V3-0324's efficiency is particularly notable. Early reports indicate that it can run directly on consumer-grade hardware, specifically Apple’s Mac Studio with an M3 Ultra chip, achieving speeds of over 20 tokens per second. This capability is a major departure from the typical data center requirements associated with state-of-the-art AI. The updated version demonstrates substantial improvements in reasoning and benchmark performance, as well as enhanced Chinese writing proficiency and optimized translation quality.

Recommended read:
References :
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • AI News: DeepSeek V3-0324 tops non-reasoning AI models in open-source first
  • Analytics Vidhya: DeepSeek V3-0324: Generated 700 Lines of Code without Breaking
  • Analytics India Magazine: The model outperformed all other non-reasoning models across several benchmarks but trailed behind DeepSeek-R1, OpenAI’s o1, o3-mini, and other reasoning models.
  • Cloud Security Alliance: DeepSeek: Behind the Hype and Headlines
  • techstrong.ai: DeepSeek Ups Ante (Again) in Duel with OpenAI, Anthropic
  • www.techradar.com: Deepseek’s new AI is smarter, faster, cheaper, and a real rival to OpenAI's models
  • Analytics Vidhya: DeepSeek V3-0324 vs Claude 3.7: Which is the Better Coder?
  • MarkTechPost: DeepSeek AI Unveils DeepSeek-V3-0324: Blazing Fast Performance on Mac Studio, Heating Up the Competition with OpenAI
  • www.zdnet.com: It's called V3-0324, but the real question is: Is it foreshadowing the upcoming launch of R2?
  • SiliconANGLE: DeepSeek today released an improved version of its DeepSeek-V3 large language model under a new open-source license.
  • Composio: Deepseek v3 o324, a new checkpoint, has been released by Deepseek in silence, with no marketing or hype, just a tweet and The post appeared first on .
  • Composio: Deepseek v3-0324 vs. Claude 3.7 Sonnet

Ryan Daws@AI News //
DeepSeek V3-0324 has emerged as a leading AI model, topping benchmarks for non-reasoning AI in an open-source breakthrough. This milestone signifies a significant advancement in the field, as it marks the first time an open weights model has achieved the top position among non-reasoning models. The model's performance surpasses proprietary counterparts and edges it closer to proprietary reasoning models, highlighting the growing viability of open-source solutions for latency-sensitive applications. DeepSeek V3-0324 represents a new era for open-source AI, offering a powerful and adaptable tool for developers and enterprises.

DeepSeek-V3 now runs at 20 tokens per second on Apple’s Mac Studio, presenting a challenge to OpenAI’s cloud-dependent business model. The 685-billion-parameter model, DeepSeek-V3-0324, is freely available for commercial use under the MIT license. This achievement, coupled with its cost efficiency and performance, signals a shift in the AI sector, where open-source frameworks increasingly compete with closed systems. Early testers report significant improvements over previous versions, positioning DeepSeek's new model above Claude Sonnet 3.5 from Anthropic.

Recommended read:
References :
  • Analytics India Magazine: The model outperformed all other non-reasoning models across several benchmarks but trailed behind DeepSeek-R1, OpenAI’s o1, o3-mini, and other reasoning models.
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • AI News: DeepSeek V3-0324 tops non-reasoning AI models in open-source first
  • Analytics Vidhya: DeepSeek V3-0324: Generated 700 Lines of Code without Breaking
  • Analytics Vidhya: DeepSeek V3-0324 vs Claude 3.7: Which is the Better Coder?
  • Cloud Security Alliance: Markets reacted dramatically, with Nvidia alone losing nearly $600 billion in value in a single day, part of a broader...
  • GZERO Media: Just a few short months ago, Silicon Valley seemed to have the artificial intelligence industry in a chokehold.
  • MarkTechPost: DeepSeek AI Unveils DeepSeek-V3-0324: Blazing Fast Performance on Mac Studio, Heating Up the Competition with OpenAI
  • SiliconANGLE: DeepSeek today released an improved version of its DeepSeek-V3 large language model under a new open-source license.
  • techstrong.ai: DeepSeek Ups Ante (Again) in Duel with OpenAI, Anthropic
  • www.zdnet.com: DeepSeek V3 model gets a major upgrade
  • www.techradar.com: DeepSeek’s new AI is smarter, faster, cheaper, and a real rival to OpenAI's models
  • Composio: Deepseek v3 0324: Finally, the Sonnet 3.5 at Home
  • AI News: DeepSeek disruption: Chinese AI innovation narrows global technology divide

Michal Langmajer@Fello AI //
OpenAI has announced the release of GPT-4.5, its latest language model which they are calling their 'last non-chain-of-thought model.' According to OpenAI, GPT-4.5 offers substantial enhancements over its predecessors, particularly in advanced reasoning, problem-solving, and contextual understanding. Sam Altman, CEO of OpenAI, described it as the "first model that feels like talking to a thoughtful person," noting moments of astonishment at the quality of advice received from the AI.

However, the rollout is facing challenges due to GPU shortages. Altman stated they are "out of GPUs," leading to a staggered release, initially limited to ChatGPT Pro subscribers who pay $200 a month. While GPT-4.5 is available to developers across all paid API tiers, OpenAI plans to expand access to Plus and Team tiers next week, with tens of thousands of GPUs expected to arrive to alleviate the supply constraints. Despite not being a reasoning model, OpenAI estimates that GPT-4.5 is 30 times more expensive to run than GPT-4o.

Recommended read:
References :
  • Fello AI: OpenAI’s GPT‑4.5 Finally Arrived: Can It Beat Grok 3 and Claude 3.7?
  • Shelly Palmer: Shelly Palmer discusses the release of OpenAI's GPT-4.5.
  • Analytics Vidhya: Everything You Need to Know About OpenAI’s GPT-4.5
  • www.tomshardware.com: Tom's Hardware reports on Sam Altman's statement about GPU shortages delaying the GPT-4.5 release.
  • venturebeat.com: VentureBeat reports OpenAI releases GPT-4.5 claiming 10X efficiency over GPT-4, but says it’s ‘not a frontier model’
  • Gradient Flow: Scaling Up, Costs Up: GPT-4.5 and the Intensifying AI Competition
  • Pivot to AI: OpenAI releases GPT-4.5 with ridiculous prices for a mediocre model
  • Techstrong.ai: TechStrong.ai article on OpenAI's GPT-4.5 AI model.
  • THE DECODER: OpenAI has released GPT-4.5 as a "Research Preview".
  • eWEEK: OpenAI releases GPT-4.5, a “Warm” Generative AI Model, for Paid Plans and APIs
  • www.windowscentral.com: Sam Altman on GPT-4.5: Expensive, yet the closest thing to a thoughtful conversational partner we've seen
  • THE DECODER: OpenAI's largest model GPT-4.5 delivers on vibes instead of benchmarks
  • 9to5Mac: OpenAI announces GPT-4.5, ChatGPT’s largest and best model for chat
  • www.engadget.com: OpenAI's new GPT-4.5 model is a better, more natural conversationalist
  • The Verge: Anthropic’s new ‘hybrid reasoning’ AI model is its smartest yet
  • THE DECODER: OpenAI has presented its largest language model to date. According to Mark Chen, Chief Research Officer at OpenAI, GPT 4.5 shows that the scaling of AI models has not yet reached its limits.
  • The Verge: OpenAI is launching GPT-4.5 today, its newest and largest AI language model.
  • NextBigFuture.com: OpenAI GPT 4.5 Has BIG Coding Improvement – Claims Scaling Still Works – Expensive
  • TechCrunch: OpenAI unveils GPT-4.5 ‘Orion,’ its largest AI model yet
  • PCMag Middle East ai: Reporting that OpenAI has launched GPT-4.5 but is limiting it to priciest tiers due to GPU shortages.
  • Analytics Vidhya: Two days ago, on 27 Feb 2025, OpenAI dropped GPT-4.5, expectations were sky-high. But instead of a groundbreaking leap forward, we got a model prioritizing emotional intelligence over raw reasoning power.
  • AI News | VentureBeat: GPT-4.5 for enterprise: Do its accuracy and knowledge justify the cost?
  • Windows Report: OpenAI released GPT-4.5, but it’s not much of an upgrade from GPT-4o. After DeepSeek was unleashed into the world, everyone wondered what OpenAI would do now that another AI company had developed an extremely powerful model at a tiny fraction of the budget.
  • iHLS: OpenAI Unveils GPT-4.5
  • Data Phoenix: OpenAI releases the long-awaited GPT-4.5/Orion, its last non-chain-of-thought model
  • Towards AI: GPT-4.5: The Next Evolution in AI
  • Towards AI: Towards AI article on TAI #142: GPT-4.5 Released.
  • www.marketingaiinstitute.com: [The AI Show Episode 138]: Introducing GPT-4.5, Claude 3.7 Sonnet, Alexa+, Deep Research Now in ChatGPT Plus & How AI Is Disrupting Writing
  • Analytics Vidhya: Now, this is a shocker, despite a lot of backlash on the cost of GPT 4.5, it becomes #1 in the Chatbot Arena LLM Leaderboard! Securing over 3,200+ votes, OpenAI’s latest model has emerged as number one across all evaluation categories, prominently excelling in Style Control and Multi-Turn interactions.

Jibin Joseph@PCMag Middle East ai //
DeepSeek AI's R1 model, a reasoning model praised for its detailed thought process, is now available on platforms like AWS and NVIDIA NIM. This increased accessibility allows users to build and scale generative AI applications with minimal infrastructure investment. Benchmarks have also revealed surprising performance metrics, with AMD’s Radeon RX 7900 XTX outperforming the RTX 4090 in certain DeepSeek benchmarks. The rise of DeepSeek has put the spotlight on reasoning models, which break questions down into individual steps, much like humans do.

Concerns surrounding DeepSeek have also emerged. The U.S. government is investigating whether DeepSeek smuggled restricted NVIDIA GPUs via Singapore to bypass export restrictions. A NewsGuard audit found that DeepSeek’s chatbot often advances Chinese government positions in response to prompts about Chinese, Russian, and Iranian false claims. Furthermore, security researchers discovered a "completely open" DeepSeek database that exposed user data and chat histories, raising privacy concerns. These issues have led to proposed legislation, such as the "No DeepSeek on Government Devices Act," reflecting growing worries about data security and potential misuse of the AI model.

Recommended read:
References :
  • aws.amazon.com: DeepSeek R1 models now available on AWS
  • www.pcguide.com: DeepSeek GPU benchmarks reveal AMD’s Radeon RX 7900 XTX outperforming the RTX 4090
  • www.tomshardware.com: U.S. investigates whether DeepSeek smuggled Nvidia AI GPUs via Singapore
  • www.wired.com: Article details challenges of testing and breaking DeepSeek's AI safety guardrails.
  • decodebuzzing.medium.com: Benchmarking ChatGPT, Qwen, and DeepSeek on Real-World AI Tasks
  • medium.com: The blog post emphasizes the use of DeepSeek-R1 in a Retrieval-Augmented Generation (RAG) chatbot. It underscores its comparability in performance to OpenAI's o1 model and its role in creating a chatbot capable of handling document uploads, information extraction, and generating context-aware responses.
  • www.aiwire.net: This article highlights the cost-effectiveness of DeepSeek's R1 model in training, noting its training on a significantly smaller cluster of older GPUs compared to leading models from OpenAI and others, which are known to have used far more extensive resources.
  • futurism.com: OpenAI CEO Sam Altman has since congratulated DeepSeek for its "impressive" R1 reasoning model, he promised spooked investors to "deliver much better models."
  • AWS Machine Learning Blog: Protect your DeepSeek model deployments with Amazon Bedrock Guardrails
  • mobinetai.com: DeepSeek is a catastrophically broken model with non-existent, typical shoddy Chinese safety measures that take 60 seconds to dismantle.
  • AI Alignment Forum: Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
  • Pivot to AI: Of course DeepSeek lied about its training costs, as we had strongly suspected.
  • Unite.AI: Artificial Intelligence (AI) is no longer just a technological breakthrough but a battleground for global power, economic influence, and national security.
  • cset.georgetown.edu: China’s ability to launch DeepSeek’s popular chatbot draws US government panel’s scrutiny
  • neuralmagic.com: Enhancing DeepSeek Models with MLA and FP8 Optimizations in vLLM
  • www.unite.ai: Blog post about DeepSeek and the global power shift.
  • cset.georgetown.edu: This article discusses DeepSeek and its impact on the US-China AI race.