News from the AI & ML world

DeeperML - #reinforcementlearning

@www.marktechpost.com //
Microsoft Research has unveiled ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a reinforcement learning framework designed to enhance Large Language Models (LLMs) with agentic reasoning and dynamic tool use. This framework addresses the limitations of current RL-enhanced LLMs, which often rely on static internal knowledge and text-only reasoning, making them unsuitable for tasks requiring real-time information, domain-specific expertise, or precise computations. ARTIST enables models to autonomously decide when, how, and which tools to use, allowing for more effective reasoning strategies that adapt dynamically to a task’s complexity.
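
The decision loop at the heart of such a framework is easy to picture. Below is a minimal, illustrative sketch (not ARTIST's actual interface) of a policy that interleaves text reasoning with tool calls; the `<tool>` tag format, the `calculator` tool, and the parsing helper are all assumptions for illustration.

```python
import re

# Illustrative tool registry -- ARTIST's real tool set and call format are
# not specified here; the <tool>...</tool> tag convention is an assumption.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agentic_step(model_output: str) -> str | None:
    """If the model emitted a tool call, execute it and return the result;
    otherwise return None and let text-only reasoning continue."""
    match = re.search(r'<tool name="(\w+)">(.*?)</tool>', model_output)
    if match is None:
        return None  # the model chose to keep reasoning in plain text
    name, arg = match.groups()
    return TOOLS[name](arg)

# Example: the model decides a precise computation is needed mid-generation.
output = 'The port count is <tool name="calculator">12 * 48</tool>.'
print(run_agentic_step(output))  # -> 576
```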

Microsoft researchers have also conducted a comparison of API-based and GUI-based AI agents, revealing the distinct advantages of each approach. API agents, which interact with software through programmable interfaces, are found to be faster, more stable, and less error-prone as they complete tasks via direct function calls. GUI agents, on the other hand, mimic human interactions with software interfaces, navigating menus and clicking buttons on a screen. While GUI agents may require multiple actions to accomplish the same goal, their versatility allows them to control almost any software with a visible interface, even without an API.
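
The trade-off is easy to see side by side. In the hedged sketch below, the function names and action vocabulary are hypothetical, standing in for a calendar app that exposes both an API and a GUI:

```python
from dataclasses import dataclass

# Hypothetical target app: neither interface below is a real product API.

def api_agent_create_event(title: str, when: str) -> dict:
    """API agent: one direct, typed function call -- fast, stable,
    and hard to get wrong."""
    return {"status": "created", "title": title, "when": when}

@dataclass
class GuiAction:
    kind: str    # "click" or "type"
    target: str  # UI element name, or text to enter

def gui_agent_create_event(title: str, when: str) -> list[GuiAction]:
    """GUI agent: the same goal takes several screen-level steps,
    but works on any app with a visible interface, API or not."""
    return [
        GuiAction("click", "New Event button"),
        GuiAction("type", title),
        GuiAction("type", when),
        GuiAction("click", "Save button"),
    ]

print(api_agent_create_event("Standup", "09:00"))
print(gui_agent_create_event("Standup", "09:00"))
```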

In a move to foster interoperability across platforms, Microsoft has announced support for the open Agent2Agent (A2A) protocol. This protocol empowers multi-agent applications by enabling structured agent communication, including the exchange of goals, management of state, invocation of actions, and secure return of results. A2A is set to be integrated into Azure AI Foundry and Copilot Studio, allowing developers to build agents that interoperate across clouds and frameworks while maintaining enterprise-grade security and compliance. Microsoft aims to empower both pro and citizen developers to create agents that can orchestrate tasks across diverse environments.
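
In spirit, an A2A exchange wraps a goal in a structured task message and returns results as artifacts. The field names in this sketch only approximate that shape and are illustrative, not the normative A2A schema:

```python
import json
import uuid

# Illustrative only: these fields mimic the spirit of A2A (goal exchange,
# state management, result return) and are not the protocol's real schema.
def make_task_request(goal: str) -> str:
    return json.dumps({
        "taskId": str(uuid.uuid4()),
        "state": "submitted",
        "message": {"role": "user", "parts": [{"text": goal}]},
    })

def make_task_result(task_id: str, answer: str) -> str:
    return json.dumps({
        "taskId": task_id,
        "state": "completed",
        "artifacts": [{"parts": [{"text": answer}]}],
    })

req = json.loads(make_task_request("Summarize Q3 incident reports"))
print(make_task_result(req["taskId"], "3 incidents, all resolved."))
```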

References:
  • the-decoder.com: Microsoft finds API agents are faster but GUI agents more flexible
  • www.marktechpost.com: Microsoft Researchers Introduce ARTIST: A Reinforcement Learning Framework That Equips LLMs with Agentic Reasoning and Dynamic Tool Use
  • www.microsoft.com: Empowering multi-agent apps with the open Agent2Agent (A2A) protocol

@the-decoder.com //
OpenAI is making strides in AI customization and application development with the release of Reinforcement Fine-Tuning (RFT) on its o4-mini reasoning model and the appointment of Fidji Simo as the CEO of Applications. The RFT release allows organizations to tailor their versions of the o4-mini model to specific tasks using custom objectives and reward functions, marking a significant advancement in model optimization. This approach utilizes reinforcement learning principles, where developers provide a task-specific grader that evaluates and scores model outputs based on custom criteria, enabling the model to optimize against a reward signal and align with desired behaviors.

Reinforcement Fine-Tuning is particularly valuable for complex or subjective tasks where ground truth is difficult to define. By using RFT on o4-mini, a compact reasoning model optimized for text and image inputs, developers can fine-tune for high-stakes, domain-specific reasoning tasks while maintaining computational efficiency. Early adopters have demonstrated the practical potential of RFT. This capability allows developers to tweak the model to better fit their needs using OpenAI's platform dashboard, deploy it through OpenAI's API, and connect it to internal systems.
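
The grader is the centerpiece of RFT. As a hedged illustration of the general shape (not OpenAI's actual grader schema), a grader maps a model output plus reference data to a scalar reward that the fine-tuning loop then optimizes:

```python
# Illustrative grader in the spirit of RFT: score a model's answer against
# reference data. This shows the general shape only, not OpenAI's real
# grader configuration or API.
def grade(sample_answer: str, reference: dict) -> float:
    """Return a reward in [0, 1]; the RL fine-tuning loop optimizes
    the policy against this signal."""
    score = 0.0
    if reference["required_code"] in sample_answer:
        score += 0.7  # got the key domain-specific field right
    if len(sample_answer) <= reference["max_len"]:
        score += 0.3  # stayed concise
    return score

reference = {"required_code": "ICD-10 E11.9", "max_len": 120}
print(grade("Diagnosis code: ICD-10 E11.9 (type 2 diabetes).", reference))  # 1.0
```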

In a move to scale its AI products, OpenAI has appointed Fidji Simo, formerly CEO of Instacart, as the CEO of Applications. Simo will oversee the scaling of AI products, leveraging her extensive experience in consumer tech to drive revenue generation from OpenAI's research and development efforts. Simo, who previously served on OpenAI's board of directors, led app development at Facebook, a background that suggests a focus on end users rather than businesses and potentially paves the way for new subscription services and products aimed at a broader audience. OpenAI is also rolling out a new GitHub connector for ChatGPT's deep research agent, allowing users with Plus, Pro, or Team subscriptions to connect their repositories and ask questions about their code.

References:
  • AI News | VentureBeat: You can now fine-tune your enterprise’s own version of OpenAI’s o4-mini reasoning model with reinforcement learning
  • www.computerworld.com: OpenAI, founded a decade ago with a focus on research, has since expanded into products and infrastructure and is now broadening into user-facing apps. Fidji Simo, current CEO and chair of Instacart, will join as CEO of Applications, a newly created position, reporting directly to Sam Altman.
  • the-decoder.com: OpenAI has appointed Fidji Simo as CEO of its new Applications division, reporting directly to OpenAI CEO Sam Altman.
  • www.marktechpost.com: OpenAI Releases Reinforcement Fine-Tuning (RFT) on o4-mini: A Step Forward in Custom Model Optimization
  • the-decoder.com: OpenAI is expanding its fine-tuning program for o4-mini, introducing Reinforcement Fine-Tuning (RFT) for organizations. The method is designed to help tailor models like o4-mini to highly specific tasks with the help of a programmable grading system.
  • Maginative: OpenAI brings reinforcement fine-tuning and GPT-4.1 Nano Fine-Tuning in the API
  • Techzine Global: OpenAI opens the door to reinforcement fine-tuning for o4-mini
  • AI News | VentureBeat: Last night, OpenAI published a blog post on its official website authored by CEO and co-founder Sam Altman announcing a major new hire: Fidji Simo, currently CEO and Chair at grocery delivery company Instacart, will join OpenAI as CEO of Applications, a newly created executive position. Simo will …
  • techxplore.com: OpenAI offers to help countries build AI systems
  • The Register - Software: OpenAI drafts Instacart boss as CEO of Apps to lure in the normies

@the-decoder.com //
OpenAI is expanding its global reach through strategic partnerships with governments and the introduction of advanced model customization tools. The organization has launched the "OpenAI for Countries" program, an initiative designed to collaborate with governments worldwide on building robust AI infrastructure. This program aims to assist nations in setting up data centers and adapting OpenAI's products to meet local language and specific needs. OpenAI envisions this initiative as part of a broader global strategy to foster cooperation and advance AI capabilities on an international scale.

This expansion also includes technological advancements, with OpenAI releasing Reinforcement Fine-Tuning (RFT) for its o4-mini reasoning model. RFT enables enterprises to fine-tune their own versions of the model using reinforcement learning, tailoring it to their unique data and operational requirements. Developers can customize the model through OpenAI's platform dashboard, tweaking it for internal terminology, goals, processes, and more. Once deployed, an employee or leader at the company can use it through a custom internal chatbot or custom OpenAI GPT to pull up private, proprietary company knowledge, answer specific questions about company products and policies, or generate new communications and collateral in the company's voice.
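
Calling a deployed RFT model then looks like any other chat completion. The sketch below assumes the official `openai` Python client; the `ft:`-style model identifier is a placeholder for the organization-specific id the dashboard assigns:

```python
from openai import OpenAI  # assumes the official openai Python package

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder model id: deployed fine-tunes receive an
# organization-specific "ft:..." identifier from the dashboard.
response = client.chat.completions.create(
    model="ft:o4-mini:acme-corp::example123",
    messages=[{"role": "user",
               "content": "What is our return policy for enterprise SKUs?"}],
)
print(response.choices[0].message.content)
```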

The "OpenAI for Countries" program is slated to begin with ten international projects, supported by funding from both OpenAI and participating governments. Chris Lehane, OpenAI's vice president of global policy, indicated that the program was inspired by the AI Action Summit in Paris, where several countries expressed interest in establishing their own "Stargate"-style projects. Moreover, the release of RFT on o4-mini signifies a major step forward in custom model optimization, offering developers a powerful new technique for tailoring foundation models to specialized tasks. This allows for fine-grained control over how models improve, by defining custom objectives and reward functions.

References:
  • the-decoder.com: OpenAI launches a program to partner with governments on global AI infrastructure
  • AI News | VentureBeat: You can now fine-tune your enterprise’s own version of OpenAI’s o4-mini reasoning model with reinforcement learning
  • www.marktechpost.com: OpenAI releases Reinforcement Fine-Tuning (RFT) on o4-mini: A Step Forward in Custom Model Optimization
  • AI News | VentureBeat: OpenAI names Instacart leader Fidji Simo as new CEO of Applications
  • techxplore.com: OpenAI offers to help countries build AI systems
  • THE DECODER: OpenAI adds new fine-tuning options for o4-mini and GPT-4.1
  • the-decoder.com: OpenAI is expanding its fine-tuning program for o4-mini, introducing Reinforcement Fine-Tuning (RFT) for organizations.
  • Techzine Global: OpenAI opens the door to reinforcement fine-tuning for o4-mini
  • Maginative: OpenAI Brings Reinforcement Fine-Tuning and GPT-4.1 Nano Fine-Tuning in the API

Tor Constantino,@Tor Constantino //
The rise of AI agents is gaining significant momentum, attracting substantial interest and creating new job opportunities across various industries. Recent publications and industry initiatives highlight the transformative potential of AI agents in automating complex tasks and optimizing existing workflows. IBM, for instance, has launched a major agentic AI initiative, introducing a suite of domain-specific AI agents that integrate through the watsonx Orchestrate framework and provide observability across the entire agent lifecycle. UiPath, meanwhile, has launched a next-generation platform for agentic automation, designed to orchestrate AI agents, robots, and humans on a single intelligent system that autonomously manages complex tasks across enterprise environments.

AI agents are evolving from simple tools into sophisticated systems capable of reasoning, adapting, and collaborating in more human-like ways. IBM is providing a range of tools that enable organizations to build their agents in minutes. Local AI agents are also gaining traction, offering customization and enhanced privacy by letting users run powerful, customizable AI models on their own computers. Tools like Ollama and Langflow simplify building and deploying local AI agents, making the process accessible to people without extensive coding expertise. Outshift by Cisco has achieved a 10x productivity boost with its Agentic AI Platform Engineer, demonstrating the potential of AI agents to significantly improve operational efficiency and reduce turnaround times by automating commonly requested developer tasks.
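
As a concrete example of the local pattern, Ollama exposes an HTTP API on localhost, so a script can query a locally hosted model without any data leaving the machine. The model name below assumes it has already been pulled (e.g. with `ollama pull llama3`):

```python
import requests  # assumes a local Ollama server on its default port

# Minimal local-agent call: prompts and data stay on the machine.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user",
                      "content": "Summarize today's meeting notes."}],
        "stream": False,  # return one JSON object instead of a stream
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```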

These advancements are paving the way for a new era of intelligent automation, where AI agents can seamlessly integrate into existing business processes and augment human capabilities. The evolution of AI agents is not only transforming enterprise automation but also unlocking new possibilities for innovation and productivity across various sectors. As the demand for AI agents continues to grow, professionals with expertise in their design, deployment, and orchestration will be highly sought after, making it essential to understand the foundational concepts and advanced implementation strategies of agentic AI.

References:
  • Tor Constantino: Mastercard and Visa debut AI agents that can research, recommend and pay for purchases — ushering in a new era of autonomous shopping and agentic commerce.
  • learn.aisingapore.org: The rise of AI agents has taken the world by storm. Agents can interact with the world around them, write articles, take actions on your behalf, and generally make the difficult parts of automating any task easy and approachable.
  • Upward Dynamism: AI agents are the next evolutionary step of ChatGPT & Co. Knowing how they work, their real use cases, strengths and limits is this simple.
  • www.marktechpost.com: Agno's lightweight, model-agnostic framework empowers developers to rapidly spin up purpose-built agents, such as a Finance Agent for structured market data and a Risk Assessment Agent for volatility and sentiment analysis, without boilerplate.
  • Upward Dynamism: 15-Min Guide: Local AI Agents on Your PC with Ollama & Langflow
  • twimlai.com: Podcast interview with Josh Tobin discussing OpenAI's approach to building AI agents.
  • Dremio: Blog post discussing the Model Context Protocol (MCP) as an interoperability layer for AI agents.
  • The Register - Software: AI agents promise big things. How can we support them?
  • The Rundown AI: Exclusive: UiPath launches next-gen platform for 'Agentic Automation'
  • Data Phoenix: FutureHouse launches platform with "superintelligent" scientific AI agents
  • the-decoder.com: Bytedance launches Agent TARS, an open-source AI automation agent

@www.quantamagazine.org //
Recent developments in the field of large language models (LLMs) are focusing on enhancing reasoning capabilities through reinforcement learning. This approach aims to improve model accuracy and problem-solving, particularly in challenging tasks. While some of the latest LLMs, such as GPT-4.5 and Llama 4, were not explicitly trained using reinforcement learning for reasoning, the release of OpenAI's o3 model shows that strategically investing in compute and tailored reinforcement learning methods can yield significant improvements.

Competitors like xAI and Anthropic have also been incorporating more reasoning features into their models, such as the "thinking" or "extended thinking" modes in xAI's Grok and Anthropic's Claude. The somewhat muted response to GPT-4.5 and Llama 4, which lack explicit reasoning training, suggests that simply scaling model size and data may be reaching its limits. The field is now exploring other ways to make language models work better, including reinforcement learning.

One way researchers are making language models work better is to sidestep the requirement that they use language as an intermediary step. Language isn't always necessary for thought, and having to turn ideas into words can slow the reasoning process down. LLMs process information in mathematical spaces within deep neural networks, yet they must repeatedly leave this latent space for the far more constrained space of individual words. Recent papers suggest architectures that allow language models to continue thinking in these mathematical spaces before producing any text.
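
A conceptual sketch of this idea follows (in the spirit of recent latent-reasoning work, with a small GRU standing in for a transformer): rather than decoding a token at each step, the model feeds its last hidden state back in as the next input embedding, taking several "thought" steps in latent space before emitting any words.

```python
import torch
import torch.nn as nn

# Toy latent-space reasoning loop; sizes and the GRU core are stand-ins.
hidden_dim, vocab = 64, 1000
embed = nn.Embedding(vocab, hidden_dim)
core = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
unembed = nn.Linear(hidden_dim, vocab)

tokens = torch.randint(0, vocab, (1, 5))   # the prompt
out, state = core(embed(tokens))           # encode the prompt

latent = out[:, -1:, :]                    # last hidden state
for _ in range(4):                         # 4 latent "thought" steps:
    latent, state = core(latent, state)    # no tokens are produced here

logits = unembed(latent[:, -1])            # only now decode into words
print(logits.argmax(-1))                   # first output token id
```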

References:
  • pub.towardsai.net: The article discusses the application of reinforcement learning to improve the reasoning abilities of LLMs.
  • Sebastian Raschka, PhD: This blog post delves into the current state of reinforcement learning in enhancing LLM reasoning capabilities, highlighting recent advancements and future expectations.
  • Quanta Magazine: This article explores the use of reinforcement learning to make Language Models work better, especially in challenging reasoning tasks.

Megan Crouse@techrepublic.com //
Researchers from DeepSeek and Tsinghua University have recently made significant advancements in AI reasoning capabilities. By combining reinforcement learning with a self-reflection mechanism, they have created AI models that can achieve a deeper understanding of problems and solutions without needing external supervision. This approach is setting new standards for AI development, enabling models to reason, self-correct, and explore alternative solutions more effectively. The openly published work also demonstrates that outstanding performance and efficiency don't require secrecy.

Researchers have implemented the Chain-of-Action-Thought (COAT) approach in these enhanced AI models. This method leverages special tokens such as "continue," "reflect," and "explore" to guide the model through distinct reasoning actions. This allows the AI to navigate complex reasoning tasks in a more structured and efficient manner. The models are trained in a two-stage process.
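
A rough rendering of what a COAT-style trace might look like is below; the token spellings are assumptions for illustration, not the paper's exact special tokens:

```python
# Illustrative Chain-of-Action-Thought trace. Each meta-action token marks
# which kind of reasoning move the following segment is, so training can
# reward good use of reflection and exploration.
CONTINUE, REFLECT, EXPLORE = "<|continue|>", "<|reflect|>", "<|explore|>"

trace = [
    (CONTINUE, "Compute 17 * 23 step by step: 17*20 = 340, 17*3 = 51."),
    (REFLECT,  "Check: 340 + 51 = 391. Rough estimate is ~400, so plausible."),
    (EXPLORE,  "Alternative route: 17*23 = (20-3)*23 = 460 - 69 = 391. Consistent."),
]

print("".join(f"{tok} {text}\n" for tok, text in trace))
```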

DeepSeek has also released papers expanding on reinforcement learning for LLM alignment. Building off prior work, they introduce Rejective Fine-Tuning (RFT) and Self-Principled Critique Tuning (SPCT). The first method, RFT, has a pre-trained model produce multiple responses and then evaluates and assigns reward scores to each response based on generated principles, helping the model refine its output. The second method, SPCT, uses reinforcement learning to improve the model’s ability to generate critiques and principles without human intervention, creating a feedback loop where the model learns to self-evaluate and improve its reasoning capabilities.
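
The rejective step can be sketched in a few lines: sample several candidate responses, score each with a reward function, and keep only the best as new supervised training data. The sampler and reward below are toy stand-ins, not DeepSeek's actual models:

```python
import random

# Minimal rejective fine-tuning sketch: generate, score, keep the best.
def sample_responses(prompt: str, n: int = 4) -> list[str]:
    """Stand-in for a pre-trained model producing multiple candidates."""
    return [f"{prompt} -> draft {i} (quality {random.random():.2f})"
            for i in range(n)]

def reward(response: str) -> float:
    """Stand-in for principle-based scoring of a candidate response."""
    return float(response.split("quality ")[1].rstrip(")"))

prompt = "Explain gradient clipping"
candidates = sample_responses(prompt)
best = max(candidates, key=reward)
fine_tune_set = [(prompt, best)]  # only the high-reward output survives
print(fine_tune_set)
```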

References:
  • hlfshell: DeepSeek released another cool paper expanding on reinforcement learning for LLM alignment. Building off of their prior work (which I talk about here), they introduce two new methods.
  • www.techrepublic.com: Researchers from DeepSeek and Tsinghua University say combining two techniques improves the answers the large language model creates with computer reasoning techniques.

@www.analyticsvidhya.com //
Google's DeepMind has achieved a significant breakthrough in artificial intelligence with its Dreamer AI system. The AI has successfully mastered the complex task of mining diamonds in Minecraft without any explicit human instruction. This feat, accomplished through trial-and-error reinforcement learning, demonstrates the AI's ability to self-improve and generalize knowledge from one scenario to another, mimicking human-like learning processes. The achievement is particularly noteworthy because Minecraft's randomly generated worlds present a unique challenge, requiring the AI to adapt and understand its environment rather than relying on memorized strategies.

Mining diamonds in Minecraft is a complex, multi-step process that typically requires players to gather resources to build tools, dig to specific depths, and avoid hazards like lava. The Dreamer AI system tackled this challenge by exploring the game environment and identifying actions that would lead to rewards, such as finding diamonds. By repeating successful actions and avoiding less productive ones, the AI quickly learned to navigate the game and achieve its goal. According to Jeff Clune, a computer scientist at the University of British Columbia, this represents a major step forward for the field of AI.

The Dreamer AI system, developed by Danijar Hafner, Jurgis Pasukonis, Timothy Lillicrap and Jimmy Ba, achieved expert status in Minecraft in just nine days, showcasing its rapid learning capabilities. One unique approach used during training was to restart the game with a new virtual universe every 30 minutes, forcing the algorithm to constantly adapt and improve. This innovative method allowed the AI to quickly master the game's mechanics and develop strategies for diamond mining without any prior training or human intervention, pushing the boundaries of what AI can achieve in dynamic and complex environments.
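
Dreamer's core trick, world-model "imagination", can be caricatured in a few lines: instead of trying every action in the real game, the agent rolls candidate actions forward inside a learned model and picks the one with the highest imagined return. The world model here is a random stub, purely for illustration, not DeepMind's actual networks:

```python
import random

def world_model(state: int, action: str) -> tuple[int, float]:
    """Stub: predict (next_state, reward) without touching the real game."""
    random.seed(hash((state, action)) % 2**32)
    return state + 1, random.random()

def plan(state: int, actions: list[str], horizon: int = 3) -> str:
    """Pick the action whose imagined rollout yields the highest return."""
    def imagined_return(action: str) -> float:
        s, total, a = state, 0.0, action
        for _ in range(horizon):
            s, r = world_model(s, a)
            total += r
            a = random.choice(actions)  # imagined follow-up actions
        return total
    return max(actions, key=imagined_return)

print(plan(0, ["mine", "craft", "dig_down", "avoid_lava"]))
```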

References:
  • techxplore.com: Google's AI Dreamer learns how to self-improve over time by mastering Minecraft
  • Analytics Vidhya: What if I told you that AI can now outperform humans in some of the most complex video games? AI now masters Minecraft too.
  • eWEEK: The new Dreamer AI system figured out how to conduct the multi-step process of mining diamonds without being taught how to play Minecraft.
  • www.scientificamerican.com: The Dreamer AI system of Google's DeepMind reached the milestone of mastering Minecraft by ‘imagining’ the future impact of possible decisions