Carl Franzen@AI News | VentureBeat
//
Microsoft has announced the release of Phi-4-reasoning-plus, a new small, open-weight language model designed for advanced reasoning tasks. Building upon the architecture of the previously released Phi-4, this 14-billion parameter model integrates supervised fine-tuning and reinforcement learning to achieve strong performance on complex problems. According to Microsoft, the Phi-4 reasoning models outperform larger language models on several demanding benchmarks, despite their compact size. This new model pushes the limits of small AI, demonstrating that carefully curated data and training techniques can lead to impressive reasoning capabilities.
The Phi-4 reasoning family, consisting of Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, is specifically trained to handle complex reasoning tasks in mathematics, scientific domains, and software-related problem solving. Phi-4-reasoning-plus, in particular, extends supervised fine-tuning with outcome-based reinforcement learning, targeting improved performance on high-variance tasks such as competition-level mathematics. All three models are designed to bring reasoning capabilities to lower-performance hardware, including mobile devices. In a separate disclosure, Microsoft CEO Satya Nadella said that AI now contributes to 30% of Microsoft's code. The open-weight models were released with transparent training details and evaluation logs, including benchmark design, and are hosted on Hugging Face for reproducibility and public access. They are released under a permissive MIT license, permitting broad commercial and enterprise use, as well as fine-tuning and distillation, without restriction.
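Because the weights are openly hosted, the model can be pulled straight from Hugging Face. Below is a minimal sketch using the `transformers` library; the repo id `microsoft/Phi-4-reasoning-plus` and the chat-style prompt format are assumptions that should be verified against the model card.

```python
def build_messages(problem: str) -> list[dict]:
    """Wrap a reasoning problem in a chat-style message list; the system
    prompt here is illustrative, not Microsoft's documented one."""
    return [
        {"role": "system", "content": "Reason step by step, then answer."},
        {"role": "user", "content": problem},
    ]

def generate_answer(problem: str,
                    model_id: str = "microsoft/Phi-4-reasoning-plus") -> str:
    """Download the open weights from Hugging Face and run one completion.
    Requires `transformers` and `torch`, plus roughly 28 GB for the
    14B-parameter checkpoint at bf16 precision."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy, lazy

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(problem), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=1024)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate_answer("Prove that the sum of two odd numbers is even.")` would stream the weights on first use, which is why the heavy imports sit inside the function.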
Adam Zewe@news.mit.edu
//
MIT researchers have unveiled a "periodic table of machine learning," a groundbreaking framework that organizes over 20 common machine-learning algorithms around a single unifying equation. This innovative approach allows scientists to combine elements from different methods, potentially leading to improved algorithms or the creation of entirely new ones. The researchers believe this framework will significantly fuel further AI discovery and innovation by providing a structured approach to understanding and developing machine learning techniques.
The core concept behind this "periodic table" is that all these algorithms, while seemingly different, learn a specific kind of relationship between data points. Although the way each algorithm accomplishes this may vary, the fundamental mathematics underlying each approach remains consistent. By identifying a unifying equation, the researchers were able to reframe popular methods and arrange them into a table, categorizing each based on the relationships it learns. Shaden Alshammari, an MIT graduate student and lead author of the related paper, emphasizes that this is not just a metaphor, but a structured system for exploring machine learning.

Just like the periodic table of chemical elements, this new framework contains empty spaces, representing algorithms that should exist but haven't been discovered yet. These spaces act as predictions, guiding researchers toward unexplored areas within machine learning. To illustrate the framework's potential, the researchers combined elements from two different algorithms, resulting in a new image-classification algorithm that outperformed current state-of-the-art approaches by 8 percent. The researchers hope that this "periodic table" will serve as a toolkit, allowing researchers to design new algorithms without needing to rediscover ideas from prior approaches.
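The article does not reproduce the equation itself, but an objective of the following shape is consistent with the description of "learning relationships between data points": each method specifies a supervisory distribution $p$ over which points relate to point $i$ and a learned distribution $q_{\theta}$, and training minimizes their average divergence. This is a hedged sketch, not the paper's exact notation:

```latex
\mathcal{L}(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N}
  D_{\mathrm{KL}}\!\left( p(\cdot \mid i) \,\middle\|\, q_{\theta}(\cdot \mid i) \right)
```

Under this reading, different choices of $p$ and $q_{\theta}$ would recover clustering, contrastive, or dimensionality-reduction objectives, and an "empty space" in the table is a pairing no published method yet uses.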
@developer.nvidia.com
//
NVIDIA is significantly advancing the capabilities of AI development with the introduction of new tools and technologies. The company's latest innovations focus on enhancing the performance of AI agents, improving integration with various software and hardware platforms, and streamlining the development process for enterprises. These advancements include NVIDIA NeMo microservices for creating data-driven AI agents and a G-Assist plugin builder that enables users to customize AI functionalities on GeForce RTX AI PCs.
NVIDIA's NeMo microservices are designed to empower enterprises to build AI agents that can access and leverage data to enhance productivity and decision-making. These microservices provide a modular platform for building and customizing generative AI models, offering features such as prompt tuning, supervised fine-tuning, and knowledge retrieval tools. NVIDIA envisions these microservices as essential building blocks for creating data flywheels, enabling AI agents to continuously learn and improve from enterprise data, business intelligence, and user feedback. Initial use cases include AI agents used by AT&T to process nearly 10,000 documents and a coding assistant used by Cisco Systems.

The introduction of the G-Assist plugin builder marks a significant step forward in AI-assisted PC control. This tool allows developers to create custom commands to manage both software and hardware functions on GeForce RTX AI PCs. By enabling integration with large language models (LLMs) and other software applications, the plugin builder expands G-Assist's functionality beyond its initial gaming-focused applications. Users can now tailor AI functionalities to suit their specific needs, automating tasks and controlling various PC functions through voice or text commands. The G-Assist tool runs a lightweight language model locally on RTX GPUs, enabling inference without relying on a cloud connection.
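Services of this kind are commonly exposed through OpenAI-compatible HTTP endpoints. The sketch below assumes such an endpoint at a hypothetical local URL with a placeholder model name; both are illustrative, not documented values.

```python
import json
import urllib.request

def build_payload(model: str, question: str, temperature: float = 0.2) -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
    }

def ask(endpoint: str, model: str, question: str) -> str:
    """POST the payload and pull the first choice out of the JSON response."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(build_payload(model, question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example call against an assumed local deployment:
# ask("http://localhost:8000/v1/chat/completions", "my-agent",
#     "Summarize yesterday's support tickets.")
```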
@simonwillison.net
//
OpenAI has recently unveiled its latest AI reasoning models, the o3 and o4-mini, marking a significant step forward in the development of AI agents capable of utilizing tools effectively. These models are designed to pause and thoroughly analyze questions before providing a response, enhancing their reasoning capabilities. The o3 model is presented as OpenAI's most advanced in this category, demonstrating superior performance across various benchmarks, including math, coding, reasoning, science, and visual understanding. Meanwhile, the o4-mini model strikes a balance between cost-effectiveness, speed, and overall performance, offering a versatile option for different applications.
OpenAI's o3 and o4-mini are equipped with the ability to leverage tools within the ChatGPT environment, such as web browsing, Python code execution, image processing, and image generation. This integration allows the models to augment their capabilities by cropping or transforming images, searching the web for relevant information, and analyzing data using Python, all within their thought process. A variant of o4-mini, named "o4-mini-high," is also available, catering to users seeking enhanced performance. These models are accessible to subscribers of OpenAI's Pro, Plus, and Team plans, reflecting the company's commitment to providing advanced AI tools to a wide range of users.

Interestingly, the system card for o3 and o4-mini shows that the o3 model tends to make more claims overall. This can lead to both more accurate and more inaccurate claims, including hallucinations, compared to earlier models like o1. OpenAI's internal PersonQA benchmark shows that the hallucination rate increases from 0.16 for o1 to 0.33 for o3. The o3 and o4-mini models also exhibit a limited capability to "sandbag," which, in this context, refers to the model concealing its full capabilities to better achieve a specific goal. Further research is necessary to fully understand the implications of these observations.
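The quoted PersonQA figures make the regression easy to quantify; this snippet just does the arithmetic on the two published rates.

```python
# PersonQA hallucination rates quoted above (fraction of answers hallucinated).
o1_rate, o3_rate = 0.16, 0.33

ratio = o3_rate / o1_rate
print(f"o3 hallucinates {ratio:.2f}x as often as o1")  # roughly double
```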
@www.microsoft.com
//
References: news.microsoft.com, www.microsoft.com
Microsoft Research is delving into the transformative potential of AI as "Tools for Thought," aiming to redefine AI's role in supporting human cognition. At the upcoming CHI 2025 conference, researchers will present four new research papers and co-host a workshop exploring this intersection of AI and human thinking. The research includes a study on how AI is changing the way we think and work along with three prototype systems designed to support different cognitive tasks. The goal is to explore how AI systems can be used as Tools for Thought and reimagine AI’s role in human thinking.
As AI tools become increasingly capable, Microsoft has unveiled new AI agents designed to enhance productivity in various domains. The "Researcher" agent can tackle complex research tasks by analyzing work data, emails, meetings, files, chats, and web information to deliver expertise on demand. Meanwhile, the "Analyst" agent functions as a virtual data scientist, capable of processing raw data from multiple spreadsheets to forecast demand or visualize customer purchasing patterns. The AI agents unveiled over the past few weeks can help people every day with tasks like research, cybersecurity, and more.

Johnson & Johnson has reportedly found that only a small percentage of AI use cases, between 10% and 15%, deliver the vast majority (80%) of the value. After encouraging employees to experiment with AI and tracking the results of nearly 900 use cases over about three years, the company is now focusing resources on the highest-value projects. These high-value applications include a generative AI copilot for sales representatives and an internal chatbot answering employee questions. Other AI tools in development include one for drug discovery and another for identifying and mitigating supply chain risks.
@github.com
//
A critical Remote Code Execution (RCE) vulnerability, identified as CVE-2025-32434, has been discovered in PyTorch, a widely used open-source machine learning framework. This flaw, detected by security researcher Ji’an Zhou, undermines the safety of the `torch.load()` function, even when configured with `weights_only=True`. This parameter was previously trusted to prevent unsafe deserialization, making the vulnerability particularly concerning for developers who relied on it as a security measure. The discovery challenges long-standing security assumptions within machine learning workflows.
This vulnerability affects PyTorch versions 2.5.1 and earlier and has been assigned a CVSS v4 score of 9.3, indicating a critical security risk. Attackers can exploit the flaw by crafting malicious model files that bypass deserialization restrictions, allowing them to execute arbitrary code on the target system during model loading. The impact is particularly severe in cloud-based AI environments, where compromised models could lead to lateral movement, data breaches, or data exfiltration. As Ji'an Zhou noted, the vulnerability is paradoxical because developers often use `weights_only=True` precisely to mitigate security issues, unaware that it can still lead to RCE.

To address this critical issue, the PyTorch team has released version 2.6.0, and users are strongly advised to update immediately. For systems that cannot be updated right away, the only viable workaround is to avoid `torch.load()` entirely, even with `weights_only=True`. Alternative model-loading methods, such as explicit tensor extraction tools, are recommended until the patch is applied. With proof-of-concept exploits likely to emerge soon, delayed updates risk widespread system compromises.
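A minimal defensive sketch of that advice: gate `torch.load()` behind a version check so pickle checkpoints are refused on affected releases. The version parsing is a simplification that assumes plain `X.Y.Z`-style version strings.

```python
def _ver(v: str) -> tuple:
    """Parse 'X.Y.Z' (ignoring a local suffix like '+cu121') for comparison."""
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

def is_vulnerable(torch_version: str) -> bool:
    """CVE-2025-32434 affects 2.5.1 and earlier; 2.6.0 carries the fix."""
    return _ver(torch_version) < _ver("2.6.0")

def load_checkpoint(path: str):
    import torch  # imported lazily so the version check stays importable

    if is_vulnerable(torch.__version__):
        raise RuntimeError(
            f"PyTorch {torch.__version__} is vulnerable to CVE-2025-32434; "
            "upgrade to 2.6.0+ before deserializing checkpoints."
        )
    return torch.load(path, weights_only=True)
```

Formats such as safetensors, which store raw tensors with no pickle step at all, sidestep the issue entirely and are a reasonable alternative while patching.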
@learn.aisingapore.org
//
MIT researchers have achieved a breakthrough in artificial intelligence, specifically aimed at enhancing the accuracy of AI-generated code. This advancement focuses on guiding large language models (LLMs) to produce outputs that strictly adhere to the rules and structures of various programming languages, preventing common errors that can cause system crashes. The new technique, developed by MIT and collaborators, ensures that the AI's focus remains on generating valid and accurate code by quickly discarding less promising outputs. This approach not only improves code quality but also significantly boosts computational efficiency.
This efficiency gain allows smaller LLMs to outperform larger models in producing accurate and well-structured outputs across diverse real-world scenarios, including molecular biology and robotics. The new method tackles issues with existing approaches, which either distort the model's intended meaning or are too time-consuming for complex tasks. The researchers developed a more efficient way to control the outputs of a large language model, guiding it to generate text that adheres to a given structure, such as a programming language, and remains error-free.

The implications of this research extend beyond academic circles, potentially revolutionizing programming assistants, AI-driven data analysis, and scientific discovery tools. By enabling non-experts to control AI-generated content, such as business professionals creating complex SQL queries using natural language prompts, this architecture could democratize access to advanced programming and data manipulation. The findings will be presented at the International Conference on Learning Representations.
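The core mechanism can be illustrated with a toy grammar. This is a deliberate simplification, not MIT's actual algorithm (which steers a weighted search over whole programs): at each decoding step, candidate tokens that cannot extend a syntactically valid prefix are pruned, so effort concentrates on outputs that can still become valid code.

```python
def valid_prefix(s: str) -> bool:
    """Toy grammar check: a prefix is valid while it never closes more
    parentheses than it has opened."""
    depth = 0
    for ch in s:
        depth += {"(": 1, ")": -1}.get(ch, 0)
        if depth < 0:
            return False
    return True

def constrained_step(prefix: str, candidates: list[str]) -> list[str]:
    """Keep only candidate tokens whose extension is still a valid prefix."""
    return [t for t in candidates if valid_prefix(prefix + t)]

# ')' after a balanced prefix would unbalance it, so it is pruned:
print(constrained_step("()", [")", "(", "x"]))  # → ['(', 'x']
```

A real implementation would use the target language's grammar in place of `valid_prefix` and reweight, rather than hard-filter, the surviving candidates.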
@www.analyticsvidhya.com
//
OpenAI recently unveiled its groundbreaking o3 and o4-mini AI models, representing a significant leap in visual problem-solving and tool-using artificial intelligence. These models can manipulate and reason with images, integrating them directly into their problem-solving process. This unlocks a new class of problem-solving that blends visual and textual reasoning, allowing the AI to not just see an image, but to "think with it." The models can also autonomously utilize various tools within ChatGPT, such as web search, code execution, file analysis, and image generation, all within a single task flow.
Alongside the reasoning models, OpenAI's GPT-4.1 series (GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano) is designed to improve coding capabilities. GPT-4.1 delivers enhanced performance at lower prices, scoring 54.6% on SWE-bench Verified, a 21.4 percentage point increase over GPT-4o and a substantial gain in practical software engineering capability. Most notably, GPT-4.1 accepts up to one million tokens of input context, compared to GPT-4o's 128k tokens, making it suitable for processing large codebases and extensive documentation. GPT-4.1 mini and nano also offer performance boosts at reduced latency and cost. The new models are available to ChatGPT Plus, Pro, and Team users, with Enterprise and education users gaining access soon. While reasoning alone isn't a silver bullet, it reliably improves model accuracy and problem-solving on challenging tasks; with Deep Research products and o3/o4-mini, AI-assisted, search-based research is now genuinely effective.
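A rough sense of what a one-million-token window buys, using the common heuristic of about four characters per token (a crude average that varies by tokenizer and language):

```python
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by language

def fits_in_context(context_tokens: int, codebase_chars: int) -> bool:
    """Estimate whether a codebase of the given size fits in one window."""
    return codebase_chars / CHARS_PER_TOKEN <= context_tokens

codebase = 2_000_000  # a ~2 MB source tree, roughly 500k tokens
print(fits_in_context(128_000, codebase))    # → False (a 128k-token window)
print(fits_in_context(1_000_000, codebase))  # → True  (a 1M-token window)
```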
@www.theapplepost.com
//
References: Apple Must, The Apple Post
Apple is doubling down on its efforts to deliver top-tier AI capabilities, rallying its teams to "do whatever it takes" to make Apple Intelligence the best it can be. New leadership, including Craig Federighi and Mike Rockwell, have been brought in to revamp Siri and other AI features. The company is reportedly encouraging the use of open-source models, if necessary, signaling a shift in strategy to prioritize performance and innovation over strict adherence to in-house development. This renewed commitment comes after reports of internal conflict and confused decision-making within Apple's AI teams, suggesting a major course correction to meet its ambitious AI goals.
Apple is planning to release its delayed Apple Intelligence features this fall, including Personal Context, Onscreen Awareness, and deeper app integration, according to sources cited by The New York Times. The features were initially announced in March but later postponed. Personal Context will allow Siri to understand and reference user emails, messages, files, and photos. Onscreen Awareness will enable Siri to respond to what's currently on the screen, while deeper app integration will give Siri the power to perform complex, multi-step actions across apps without manual input.

The push for enhanced AI follows reports of internal strife and shifting priorities within Apple's AI development teams. According to The Information, some potentially exciting projects were shelved in favor of smaller, more incremental ones. Additionally, the impressive demo of contextual intelligence Apple showcased at WWDC "came as a surprise" to some Siri team members. Despite past challenges, Apple is determined to deliver on its AI vision, aiming to integrate advanced intelligence seamlessly into its products and services, potentially with the launch of iOS 19.
@x.com
//
References: IEEE Spectrum
The integration of Artificial Intelligence (AI) into coding practices is rapidly transforming software development, with engineers increasingly leveraging AI to generate code based on intuitive "vibes." Inspired by the approach of Andrej Karpathy, developers like Naik and Touleyrou are using AI to accelerate their projects, creating applications and prototypes with minimal prior programming knowledge. This emerging trend, known as "vibe coding," streamlines the development process and democratizes access to software creation.
Open-source AI is playing a crucial role in these advancements, particularly among younger developers who are quick to embrace new technologies. A recent Stack Overflow survey of over 1,000 developers and technologists reveals a strong preference for open-source AI, driven by a belief in transparency and community collaboration. While experienced developers recognize the benefits of open-source due to their existing knowledge, younger developers are leading the way in experimenting with these emerging technologies, fostering trust and accelerating the adoption of open-source AI tools.

To further enhance the capabilities and reliability of AI models, particularly in complex reasoning tasks, Microsoft researchers have introduced inference-time scaling techniques. In addition, Amazon Bedrock Evaluations now offers enhanced capabilities to evaluate Retrieval Augmented Generation (RAG) systems and models, giving developers tools to assess the performance of their AI applications. The introduction of "bring your own inference responses" allows evaluation of RAG systems and models regardless of their deployment environment, while new citation metrics offer deeper insights into the accuracy and relevance of retrieved information.
Kara Sherrer@eWEEK
//
Runway AI Inc. has launched Gen-4, its latest AI video generation model, addressing the significant challenge of maintaining consistent characters and objects across different scenes. This new model represents a considerable advancement in AI video technology and improves the realism and usability of AI-generated videos. Gen-4 allows users to upload a reference image of an object to be included in a video, along with design instructions, and ensures that the object maintains a consistent look throughout the entire clip.
The Gen-4 model empowers users to place any object or subject in different locations while maintaining consistency, and even allows for modifications such as changing camera angles or lighting conditions. The model combines visual references with text instructions to preserve styles throughout videos. Gen-4 is currently available to paying subscribers and Enterprise customers, with additional features planned for future updates.
Michael Nuñez@AI News | VentureBeat
//
OpenAI, the company behind ChatGPT, has announced a significant strategic shift by planning to release its first open-weight AI model since 2019. This move comes amidst mounting economic pressures from competitors like DeepSeek and Meta, whose open-source models are increasingly gaining traction. CEO Sam Altman revealed the plans on X, stating that the new model will have reasoning capabilities and allow developers to run it on their own hardware, departing from OpenAI's cloud-based subscription model.
This decision marks a notable change for OpenAI, which has historically defended closed, proprietary models. The company is now looking to gather developer feedback to make the new model as useful as possible, and is planning events in San Francisco, Europe, and Asia-Pacific. As models improve, startups and developers increasingly want tunable latency and on-prem deployments with full data control, according to OpenAI.

The shift comes alongside a monumental $40 billion funding round led by SoftBank, which has catapulted OpenAI's valuation to $300 billion. SoftBank will initially invest $10 billion, with the remaining $30 billion contingent on OpenAI transitioning to a for-profit structure by the end of the year. The funding will help OpenAI continue building AI systems that drive scientific discovery, enable personalized education, enhance human creativity, and pave the way toward artificial general intelligence. The open-weight model is expected to help OpenAI compete with the growing number of efficient open-source alternatives and counter criticism of its closed approach.
Emilia David@AI News | VentureBeat
//
OpenAI has rolled out significant enhancements to ChatGPT, focusing on integrating real-time data access and boosting reasoning skills. A key update is the integration of Google Drive for ChatGPT Team users, allowing access to Docs, Sheets, and Slides directly within conversations. This feature enables ChatGPT to provide more relevant and personalized responses by automatically incorporating context from these tools, respecting existing user permissions, and facilitating seamless, context-rich interactions for improved team productivity and decision-making. Admins can connect their organization's Google Drive workspace to ChatGPT, with controls for smaller and larger teams, ensuring data security and controlled access.
OpenAI has also unveiled a major upgrade to its image generation capabilities directly within ChatGPT. This new feature, powered by GPT-4o, allows users to create detailed, high-quality images through simple chat-based prompts, eliminating the need to switch between different tools. With improved text integration and multi-object rendering, ChatGPT's image generation is now capable of producing photorealistic results and can compete with industry leaders like Midjourney, Google's Imagen 3, and Adobe's Firefly. This update is rolling out to all users, including those on free plans, providing broad accessibility to advanced AI-driven image creation.
Ryan Daws@AI News
//
Anthropic has unveiled a novel method for examining the inner workings of large language models (LLMs) like Claude, offering unprecedented insight into how these AI systems process information and make decisions. Referred to as an "AI microscope," this approach, inspired by neuroscience techniques, reveals that Claude plans ahead when generating poetry, uses a universal internal blueprint to interpret ideas across languages, and occasionally works backward from desired outcomes instead of building from facts. The research underscores that these models are more sophisticated than previously thought, representing a significant advancement in AI interpretability.
Anthropic's research also indicates that Claude operates with conceptual universality across different languages and that it actively plans ahead. In the context of rhyming poetry, the model anticipates future words to meet constraints like rhyme and meaning, demonstrating a level of foresight that goes beyond simple next-word prediction. However, the research also uncovered potentially concerning behaviors, as Claude can generate plausible-sounding but incorrect reasoning.

In related news, Anthropic is reportedly preparing to launch an upgraded version of Claude 3.7 Sonnet, expanding its context window from 200K tokens to 500K tokens. This substantial increase would enable users to process much larger datasets and codebases in a single session, potentially transforming workflows in enterprise applications and coding environments. The expanded context window could further empower vibe coding, enabling developers to work on larger projects without breaking context due to token limits.