News from the AI & ML world

DeeperML - #machinelearning

@mastodon.acm.org //
Advancements in machine learning, APL programming, and computer graphics are driving innovation across various disciplines. The recently launched journal ACM Transactions on Probabilistic Machine Learning (TOPML) highlights the importance of probabilistic machine learning, featuring high-quality research in the field. The journal's co-editors, Wray Buntine, Fang Liu, and Theodore Papamarkou, share their insights on the significance of probabilistic ML and the journal's mission to advance the field.

The APL Forge competition is encouraging developers to create innovative open-source libraries and commercial applications using Dyalog APL. This annual event aims to enhance awareness and usage of APL by challenging participants to solve problems and develop tools using the language. The competition awards £2,500 (GBP) and an expenses-paid trip to present at the next user meeting, making it a valuable opportunity for APL enthusiasts to showcase their skills and contribute to the community. The deadline for submissions is Monday 22 June 2026.

SIGGRAPH 2025 will showcase advancements in 3D generative AI as part of its Technical Papers program. This year's program received a record number of submissions, highlighting the growing interest in artificial intelligence, large language models, robotics, and 3D modeling in VR. Professor Richard Zhang of Simon Fraser University has been inducted into the ACM SIGGRAPH Academy for his contributions to spectral and learning-based methods for geometric modeling and will be the SIGGRAPH 2025 Technical Papers Chair.

Recommended read:
References :
  • blog.siggraph.org: As 3D generative AI matures, it’s reshaping creativity across multiple disciplines. This year, ever-expanding work in the 3D generative AI space will be explored as part of the SIGGRAPH Technical Papers program, including these three novel methods — each offering a unique take on how 3D generative AI is being applied. Check out the award-winning research here:
  • forge.dyalog.com: The 2026 round of the APL Forge is now open! See for more information and to enter this annual competition, which aims to enhance awareness and usage of APL in the community at large by challenging participants to create innovative open-source libraries and commercial applications using Dyalog APL. The deadline for submissions is Monday 22 June 2026.

Kuldeep Jha@Verdict //
Databricks has unveiled Agent Bricks, a new tool designed to streamline the development and deployment of enterprise AI agents. Built on Databricks' Mosaic AI platform, Agent Bricks automates the optimization and evaluation of these agents, addressing the common challenges that prevent many AI projects from reaching production. The tool utilizes large language models (LLMs) as "judges" to assess the reliability of task-specific agents, eliminating manual processes that are often slow, inconsistent, and difficult to scale. Jonathan Frankle, chief AI scientist of Databricks Inc., described Agent Bricks as a generalization of the best practices and techniques observed across various verticals, reflecting how Databricks believes agents should be built.

Agent Bricks originated from the need of Databricks' customers to effectively evaluate their AI agents. Ensuring reliability involves defining clear criteria and practices for comparing agent performance. According to Frankle, AI's inherent unpredictability makes LLM judges crucial for determining when an agent is functioning correctly. This requires ensuring that the LLM judge understands the intended purpose and measurement criteria, essentially aligning the LLM's judgment with that of a human judge. The goal is to create a scaled reinforcement learning system where judges can train an agent to behave as developers intend, reducing the reliance on manually labeled data.
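The LLM-as-judge pattern described above can be sketched in a few lines. This is a toy illustration, not the Agent Bricks API: the "judge" is a stub scoring function standing in for a real LLM call, and every name and criterion below (`stub_llm_judge`, `evaluate_agent`, the rubric keywords) is an invented assumption.

```python
# Toy sketch of the LLM-as-judge pattern: a "judge" scores an agent's
# answer against explicit criteria, standing in for a real LLM call.
# All names and criteria here are illustrative, not Databricks APIs.

def stub_llm_judge(question: str, answer: str, criteria: list[str]) -> float:
    """Stand-in for an LLM judge: fraction of rubric criteria the answer
    mentions. A real system would prompt an LLM with question, answer,
    and rubric, and ask for a structured verdict."""
    hits = sum(1 for c in criteria if c.lower() in answer.lower())
    return hits / len(criteria)

def evaluate_agent(agent, questions, criteria, threshold=0.5):
    """Run the agent on each question; keep the judge's score and verdict."""
    results = []
    for q in questions:
        a = agent(q)
        score = stub_llm_judge(q, a, criteria)
        results.append({"question": q, "answer": a,
                        "score": score, "passed": score >= threshold})
    return results

# Hypothetical task-specific agent under evaluation.
def toy_support_agent(question: str) -> str:
    return "Please restart the service and check the error log for details."

report = evaluate_agent(
    toy_support_agent,
    questions=["The service is down, what should I do?"],
    criteria=["restart", "log"],
)
print(report[0]["score"], report[0]["passed"])  # → 1.0 True
```

The point of the pattern is that the rubric, not a human reviewer, is the repeatable artifact: aligning the judge with human judgment means iterating on those criteria.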

Databricks' new features aim to simplify AI development by using AI to build agents and the pipelines that feed them. Fueled by user feedback, these features include a framework for automating agent building and a no-code interface for creating pipelines for applications. Kevin Petrie, an analyst at BARC U.S., noted that these announcements help Databricks users apply AI and GenAI applications to their proprietary data sets, thereby gaining a competitive advantage. Agent Bricks is currently in beta testing and helps users avoid the trap of "vibe coding" by forcing rigorous testing and evaluation until the model is extremely reliable.

Recommended read:
References :
  • www.bigdatawire.com: Databricks Wants to Take the Pain Out of Building, Deploying AI Agents with Bricks
  • siliconangle.com: The best judge of artificial intelligence could be AI — at least that’s the idea behind Databricks Inc.’s new tool, Agent Bricks.
  • thenewstack.io: Databricks Launches Agent Bricks, Its New No-Code AI Agent Builder
  • www.infoworld.com: Databricks has released a beta version of a new agent building interface to help enterprises automate and optimize the agent building process.
  • AI News | VentureBeat: Databricks Agent Bricks automates enterprise AI agent optimization and evaluation, eliminating manual processes that block production deployments.
  • BigDATAwire: Databricks today launched Agent Bricks, a new offering aimed at helping customers get AI agent systems up and running quickly, with the cost, safety, and efficiency they demand.
  • Analytics India Magazine: Databricks also launched MLflow 3.0, a redesigned version of its AI lifecycle management platform.
  • Verdict: Databricks introduces Agent Bricks for AI agent development
  • techstrong.ai: Highlights Databricks' simplified approach to building and training AI agents.
  • siliconangle.com: Reveals Databricks' play for AI agents and their data platform strategy.
  • techstrong.ai: Databricks this week launched a series of initiatives, including a beta release of an Agent Bricks framework that makes it simpler to create and modify artificial intelligence agents using techniques developed by Mosaic AI Research using multiple types of large language models (LLMs).

Kuldeep Jha@Verdict //
Databricks has unveiled Agent Bricks, a no-code AI agent builder designed to streamline the development and deployment of enterprise AI agents. Built on Databricks’ Mosaic AI platform, Agent Bricks aims to address the challenge of AI agents failing to reach production due to slow, inconsistent, and difficult-to-scale manual evaluation processes. The platform allows users to request task-specific agents and then automatically generates a series of large language model (LLM) "judges" to assess the agent's reliability. This automation is intended to optimize and evaluate enterprise AI agents, reducing reliance on ad hoc manual checks and improving confidence in production-ready deployments.

Agent Bricks incorporates research-backed innovations, including Test-time Adaptive Optimization (TAO), which enables AI tuning without labeled data. Additionally, the platform generates domain-specific synthetic data, creates task-aware benchmarks, and optimizes the balance between quality and cost without manual intervention. Jonathan Frankle, Chief AI Scientist of Databricks Inc., emphasized that Agent Bricks embodies the best engineering practices, styles, and techniques observed in successful agent development, reflecting Databricks' philosophical approach to building agents that are reliable and effective.

The development of Agent Bricks was driven by customer needs to evaluate their agents effectively. Frankle explained that AI's unpredictable nature necessitates LLM judges to evaluate agent performance against defined criteria and practices. Databricks has essentially created scaled reinforcement learning, where judges can train an agent to behave as desired by developers, reducing the reliance on labeled data. Hanlin Tang, Databricks’ Chief Technology Officer of Neural Networks, noted that Agent Bricks aims to give users the confidence to take their AI agents into production.

Recommended read:
References :
  • SiliconANGLE: How Databricks’ Agent Bricks uses AI to judge AI
  • thenewstack.io: Databricks Launches Agent Bricks, Its New No-Code AI Agent Builder
  • techstrong.ai: Databricks Simplifies Building and Training of AI Agents
  • www.bigdatawire.com: Databricks Is Making a Long-Term Play to Fix AI’s Biggest Constraint

Heng Chi@AI Accelerator Institute //
AWS is becoming a standard for businesses looking to leverage AI and NLP through its comprehensive services. An article discusses how to design a high-performance data pipeline using core AWS services like Amazon S3, AWS Lambda, AWS Glue, and Amazon SageMaker. These pipelines are crucial for ingesting, processing, and outputting data for training, inference, and decision-making at a large scale, which is essential for modern AI and NLP applications that rely on data-driven insights and automation. The platform's scalability, flexibility, and cost-efficiency make it a preferred choice for building these pipelines.

AWS offers various advantages, including automatic scaling capabilities that ensure high performance regardless of data volume. Its flexibility and integration features allow seamless connections between services like Amazon S3 for storage, AWS Glue for ETL, and Amazon Redshift for data warehousing. Additionally, AWS’s pay-as-you-go pricing model provides cost-effectiveness, with Reserved Instances and Savings Plans enabling further cost optimization. The platform's reliability and global infrastructure offer a strong foundation for building effective machine learning solutions at every stage of the ML lifecycle.
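The event-driven glue between these services can be sketched minimally: an S3 upload triggers a Lambda whose handler extracts the new object's location and, in a real deployment, starts a Glue ETL job. The event parsing below follows the documented S3 notification shape; the Glue job name is a placeholder, and the `boto3` call is shown only as a comment so the sketch stays self-contained.

```python
# Sketch of an S3-triggered Lambda that would hand new objects to AWS Glue.
# Event parsing follows the S3 notification event shape; the Glue job name
# is a placeholder, and the boto3 call is left as a comment so the sketch
# runs without AWS credentials.

GLUE_JOB_NAME = "nlp-preprocess-job"  # hypothetical Glue ETL job

def extract_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Pull (bucket, key) pairs from an S3 notification event."""
    objs = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            objs.append((bucket, key))
    return objs

def lambda_handler(event, context=None):
    started = []
    for bucket, key in extract_s3_objects(event):
        # In a real deployment:
        # boto3.client("glue").start_job_run(
        #     JobName=GLUE_JOB_NAME,
        #     Arguments={"--input_path": f"s3://{bucket}/{key}"})
        started.append(f"s3://{bucket}/{key}")
    return {"job": GLUE_JOB_NAME, "inputs": started}

sample_event = {"Records": [
    {"s3": {"bucket": {"name": "raw-docs"}, "object": {"key": "batch/001.json"}}}
]}
print(lambda_handler(sample_event))
```

Downstream, the Glue job would write curated output back to S3, where SageMaker training jobs pick it up, completing the ingest-process-train loop the article describes.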

Generative AI applications, while appearing simple, require a more complex system involving workflows that invoke foundation models (FMs), tools, and APIs, using domain-specific data to ground responses. Organizations are adopting a unified approach to build applications where foundational building blocks are offered as services for developing generative AI applications. This approach enables centralized governance and operations, streamlining development, scaling generative AI efforts, mitigating risk, optimizing costs, and accelerating innovation. A well-established generative AI foundation offers a comprehensive set of components to support the end-to-end generative AI application lifecycle.


Haden Pelletier@Towards Data Science //
Recent discussions in statistics highlight significant concepts and applications relevant to data science. A book review explores seminal ideas and controversies in the field, focusing on key papers and historical perspectives. The review mentions Fisher's 1922 paper, which is credited with creating modern mathematical statistics, and discusses debates around hypothesis testing and Bayesian analysis.

Stephen Senn's guest post addresses the concept of "relevant significance" in statistical testing, cautioning against misinterpreting statistical significance as proof of a genuine effect. Senn points out that rejecting a null hypothesis does not necessarily mean it is false, emphasizing the importance of careful interpretation of statistical results.

Furthermore, aspiring data scientists are advised to familiarize themselves with essential statistical concepts for job interviews. These include understanding p-values, Z-scores, and outlier detection methods. A p-value is crucial for hypothesis testing, and Z-scores help identify data points that deviate significantly from the mean. These concepts form a foundation for analyzing data and drawing meaningful conclusions in data science applications.
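These interview staples are easy to demonstrate. Using only the standard library, the sketch below computes z-scores and flags points more than three standard deviations from the mean; the 3-sigma threshold is a common convention, not a universal rule.

```python
# Z-scores and a simple outlier check using only the standard library.
# The 3-sigma threshold is a common convention, not a universal rule.
from statistics import mean, stdev

def z_scores(data):
    """Standardize each point: how many standard deviations from the mean."""
    mu, sigma = mean(data), stdev(data)
    return [(x - mu) / sigma for x in data]

def outliers(data, threshold=3.0):
    """Return points whose |z-score| exceeds the threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]

data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 8,
        10, 11, 9, 12, 10, 9, 11, 10, 10, 98]  # one obvious outlier
print(outliers(data))  # → [98]
```

One subtlety worth knowing for interviews: with a small sample, the outlier inflates the sample standard deviation itself, which caps how extreme its own z-score can be, so z-score methods need a reasonable sample size (or robust alternatives like the median absolute deviation).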

Recommended read:
References :
  • errorstatistics.com: Stephen Senn (guest post): “Relevant significance? Be careful what you wish for”
  • Towards Data Science: 5 Statistical Concepts You Need to Know Before Your Next Data Science Interview
  • Xi'an's Og: Seminal ideas and controversies in Statistics [book review]
  • medium.com: Statistics for Data Science and Machine Learning
  • medium.com: Why Data Science Needs Statistics

Ken Yeung@Ken Yeung //
Microsoft is exploring the frontier of AI-driven development with its experimental project, Project Amelie. Unveiled at Build 2025, Amelie is an AI agent designed to autonomously construct machine learning pipelines from a single prompt. This project showcases Microsoft's ambition to create AI that can develop AI, potentially revolutionizing how machine learning engineering tasks are performed. Powered by Microsoft Research's RD agent, Amelie aims to automate and optimize research and development processes in machine learning, eliminating the manual setup work typically handled by data scientists.

Early testing results are promising, with Microsoft reporting that Project Amelie has outperformed current benchmarks on MLE-Bench, a framework for evaluating machine learning agents' effectiveness in real-world tasks. During a live demo at Microsoft Build, Seth Juarez, Principal Program Manager for Microsoft's AI Platform, illustrated how Amelie could function as a "mini data scientist in a box," capable of processing and analyzing data that would typically take human scientists a day and a half to complete. This project has potential for applications in other scenarios where users want AI to carry out complex AI-related tasks.

Should Project Amelie become commercialized, it could significantly advance Microsoft's goals for human-agent collaboration. While Microsoft is not alone in this endeavor, with companies like Google's DeepMind and OpenAI also exploring similar technologies, the project highlights a shift towards AI agents handling complex AI-related tasks independently. Developers interested in exploring the capabilities of Project Amelie can sign up to participate in its private preview, offering a glimpse into the future of AI-driven machine learning pipeline development.

Recommended read:
References :
  • PCMag Middle East ai: Microsoft Adds Gen AI Features to Paint, Snipping Tool, and Notepad
  • Ken Yeung: Microsoft’s Project Amelie Is an Experiment in ‘AI Developing AI’

@www.marktechpost.com //
MIT researchers are making significant strides in artificial intelligence, focusing on enhancing AI's ability to learn and interact with the world more naturally. One project involves developing AI models that can learn connections between vision and sound without human intervention. This innovative approach aims to mimic how humans learn, by associating what they see with what they hear. The model could be useful in applications such as journalism and film production, where it could help curate multimodal content through automatic video and audio retrieval.

The new machine-learning model can pinpoint exactly where a particular sound occurs in a video clip, eliminating the need for manual labeling. By adjusting how the original model is trained, it learns a finer-grained correspondence between a particular video frame and the audio that occurs in that moment. The enhancements improved the model’s ability to retrieve videos based on an audio query and predict the class of an audio-visual scene, like the sound of a roller coaster in action or an airplane taking flight. Researchers also made architectural tweaks that help the system balance two distinct learning objectives, which improves performance.

Additionally, researchers from the National University of Singapore have introduced 'Thinkless,' an adaptive framework that reduces unnecessary reasoning in language models by up to 90%. By incorporating a novel algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless separates the training focus between selecting the reasoning mode and improving the accuracy of the generated response. This framework equips a language model with the ability to dynamically decide between short and long-form reasoning, addressing the issue of resource-intensive and wasteful reasoning sequences for simple queries.

Recommended read:
References :
  • learn.aisingapore.org: AI learns how vision and sound are connected, without human intervention | MIT News
  • news.mit.edu: AI learns how vision and sound are connected, without human intervention
  • www.marktechpost.com: Researchers from the National University of Singapore Introduce ‘Thinkless,’ an Adaptive Framework that Reduces Unnecessary Reasoning by up to 90% Using DeGRPO
  • news.mit.edu: Learning how to predict rare kinds of failures

Carl Franzen@AI News | VentureBeat //
Microsoft has announced the release of Phi-4-reasoning-plus, a new small, open-weight language model designed for advanced reasoning tasks. Building upon the architecture of the previously released Phi-4, this 14-billion parameter model integrates supervised fine-tuning and reinforcement learning to achieve strong performance on complex problems. According to Microsoft, the Phi-4 reasoning models outperform larger language models on several demanding benchmarks, despite their compact size. This new model pushes the limits of small AI, demonstrating that carefully curated data and training techniques can lead to impressive reasoning capabilities.

The Phi-4 reasoning family, consisting of Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, is specifically trained to handle complex reasoning tasks in mathematics, scientific domains, and software-related problem solving. Phi-4-reasoning-plus, in particular, extends supervised fine-tuning with outcome-based reinforcement learning, which is targeted for improved performance in high-variance tasks such as competition-level mathematics. All models are designed to enable reasoning capabilities, especially on lower-performance hardware such as mobile devices.

Microsoft CEO Satya Nadella revealed that AI is now contributing to 30% of Microsoft's code. The open weight models were released with transparent training details and evaluation logs, including benchmark design, and are hosted on Hugging Face for reproducibility and public access. The model has been released under a permissive MIT license, enabling its use for broad commercial and enterprise applications, and fine-tuning or distillation, without restriction.

Recommended read:
References :
  • the-decoder.com: Microsoft's Phi-4-reasoning models outperform larger models and run on your laptop or phone
  • MarkTechPost: Microsoft AI Released Phi-4-Reasoning: A 14B Parameter Open-Weight Reasoning Model that Achieves Strong Performance on Complex Reasoning Tasks
  • AI News | VentureBeat: The release demonstrates that with carefully curated data and training techniques, small models can deliver strong reasoning performance.
  • Maginative: Microsoft’s Phi-4 Reasoning Models Push the Limits of Small AI
  • www.tomshardware.com: Microsoft's CEO reveals that AI writes up to 30% of its code — some projects may have all of its code written by AI
  • Ken Yeung: Microsoft’s New Phi-4 Variants Show Just How Far Small AI Can Go
  • www.tomsguide.com: Microsoft just unveiled new Phi-4 reasoning AI models — here's why they're a big deal
  • Techzine Global: Microsoft is launching three new advanced small language models as an extension of the Phi series. These models have reasoning capabilities that enable them to analyze and answer complex questions effectively.
  • Analytics Vidhya: Microsoft Launches Two Powerful Phi-4 Reasoning Models
  • www.windowscentral.com: Microsoft Introduces Phi-4 Reasoning SLM Models — Still "Making Big Leaps in AI" While Its Partnership with OpenAI Frays
  • Towards AI: Phi-4 Reasoning Models
  • the-decoder.com: Microsoft's Phi 4 responds to a simple "Hi" with 56 thoughts
  • Data Phoenix: Microsoft has introduced three new small language models—Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning—that reportedly deliver complex reasoning capabilities comparable to much larger models while maintaining efficiency for deployment across various computing environments.
  • AI News: Microsoft has introduced Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, which reportedly deliver complex reasoning capabilities comparable to much larger models while maintaining efficiency for deployment across various computing environments.

Adam Zewe@news.mit.edu //
MIT researchers have unveiled a "periodic table of machine learning," a groundbreaking framework that organizes over 20 common machine-learning algorithms based on a unifying algorithm. This innovative approach allows scientists to combine elements from different methods, potentially leading to improved algorithms or the creation of entirely new ones. The researchers believe this framework will significantly fuel further AI discovery and innovation by providing a structured approach to understanding and developing machine learning techniques.

The core concept behind this "periodic table" is that all these algorithms, while seemingly different, learn a specific kind of relationship between data points. Although the way each algorithm accomplishes this may vary, the fundamental mathematics underlying each approach remains consistent. By identifying a unifying equation, the researchers were able to reframe popular methods and arrange them into a table, categorizing each based on the relationships it learns. Shaden Alshammari, an MIT graduate student and lead author of the related paper, emphasizes that this is not just a metaphor, but a structured system for exploring machine learning.

Just like the periodic table of chemical elements, this new framework contains empty spaces, representing algorithms that should exist but haven't been discovered yet. These spaces act as predictions, guiding researchers toward unexplored areas within machine learning. To illustrate the framework's potential, the researchers combined elements from two different algorithms, resulting in a new image-classification algorithm that outperformed current state-of-the-art approaches by 8 percent. The researchers hope that this "periodic table" will serve as a toolkit, allowing researchers to design new algorithms without needing to rediscover ideas from prior approaches.

Recommended read:
References :
  • news.mit.edu: Researchers have created a unifying framework that can help scientists combine existing ideas to improve AI models or create new ones.
  • www.sciencedaily.com: After uncovering a unifying algorithm that links more than 20 common machine-learning approaches, researchers organized them into a 'periodic table of machine learning' that can help scientists combine elements of different methods to improve algorithms or create new ones.
  • techxplore.com: MIT researchers have created a periodic table that shows how more than 20 classical machine-learning algorithms are connected. The new framework sheds light on how scientists could fuse strategies from different methods to improve existing AI models or come up with new ones.
  • learn.aisingapore.org: This article discusses “Periodic table of machine learning” could fuel AI discovery | MIT News

@developer.nvidia.com //
NVIDIA is significantly advancing the capabilities of AI development with the introduction of new tools and technologies. The company's latest innovations focus on enhancing the performance of AI agents, improving integration with various software and hardware platforms, and streamlining the development process for enterprises. These advancements include NVIDIA NeMo microservices for creating data-driven AI agents and a G-Assist plugin builder that enables users to customize AI functionalities on GeForce RTX AI PCs.

NVIDIA's NeMo microservices are designed to empower enterprises to build AI agents that can access and leverage data to enhance productivity and decision-making. These microservices provide a modular platform for building and customizing generative AI models, offering features such as prompt tuning, supervised fine-tuning, and knowledge retrieval tools. NVIDIA envisions these microservices as essential building blocks for creating data flywheels, enabling AI agents to continuously learn and improve from enterprise data, business intelligence, and user feedback. The initial use cases include AI agents used by AT&T to process nearly 10,000 documents and a coding assistant used by Cisco Systems.

The introduction of the G-Assist plugin builder marks a significant step forward in AI-assisted PC control. This tool allows developers to create custom commands to manage both software and hardware functions on GeForce RTX AI PCs. By enabling integration with large language models (LLMs) and other software applications, the plugin builder expands G-Assist's functionality beyond its initial gaming-focused applications. Users can now tailor AI functionalities to suit their specific needs, automating tasks and controlling various PC functions through voice or text commands. The G-Assist tool runs a lightweight language model locally on RTX GPUs, enabling inference without relying on a cloud connection.

Recommended read:
References :
  • developer.nvidia.com: Enhance Your AI Agent with Data Flywheels Using NVIDIA NeMo Microservices
  • www.tomshardware.com: NVIDIA introduces G-Assist plug-in builder, allowing its AI to integrate with LLMs and software
  • developer.nvidia.com: Benchmarking Agentic LLM and VLM Reasoning for Gaming with NVIDIA NIM
  • techstrong.ai: NVIDIA Corp. on Wednesday announced general availability of neural module (NeMo) microservices, the software tools behind artificial intelligence (AI) agents for enterprises.
  • the-decoder.com: With its G-Assist tool and a new plug-in builder, Nvidia introduces a system for AI-assisted PC control. Developers can create their own commands to manage both software and hardware functions.

@simonwillison.net //
OpenAI has recently unveiled its latest AI reasoning models, the o3 and o4-mini, marking a significant step forward in the development of AI agents capable of utilizing tools effectively. These models are designed to pause and thoroughly analyze questions before providing a response, enhancing their reasoning capabilities. The o3 model is presented as OpenAI's most advanced in this category, demonstrating superior performance across various benchmarks, including math, coding, reasoning, science, and visual understanding. Meanwhile, the o4-mini model strikes a balance between cost-effectiveness, speed, and overall performance, offering a versatile option for different applications.

OpenAI's o3 and o4-mini are equipped with the ability to leverage tools within the ChatGPT environment, such as web browsing, Python code execution, image processing, and image generation. This integration allows the models to augment their capabilities by cropping or transforming images, searching the web for relevant information, and analyzing data using Python, all within their thought process. A variant of o4-mini, named "o4-mini-high," is also available, catering to users seeking enhanced performance. These models are accessible to subscribers of OpenAI's Pro, Plus, and Team plans, reflecting the company's commitment to providing advanced AI tools to a wide range of users.
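The tool-use loop these models run can be illustrated abstractly: a controller inspects a model decision, dispatches it to a registered tool, and feeds the result back until the model produces a final answer. Everything below is a toy: the "model" is a stub, and the tool names and dispatch format are invented; real o3/o4-mini tool calls go through OpenAI's API, not this scheme.

```python
# Toy illustration of an agentic tool-use loop: a stubbed "model" emits
# tool calls, the controller executes them and feeds results back until
# the model produces a final answer. The dispatch format is invented;
# real o3/o4-mini tool calls go through OpenAI's API.

TOOLS = {
    "python": lambda expr: str(eval(expr, {"__builtins__": {}})),  # tiny sandbox
    "search": lambda q: f"stub search results for {q!r}",
}

def stub_model(history):
    """Stand-in for the reasoning model: first asks for a calculation,
    then answers once a tool result appears in the history."""
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool_call", "tool": "python", "input": "17 * 23"}
    result = next(m for m in history if m["role"] == "tool")["content"]
    return {"type": "final", "content": f"The answer is {result}."}

def run_agent(question, model=stub_model, max_steps=5):
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        step = model(history)
        if step["type"] == "final":
            return step["content"]
        output = TOOLS[step["tool"]](step["input"])
        history.append({"role": "tool", "content": output})
    raise RuntimeError("agent did not finish")

print(run_agent("What is 17 times 23?"))  # → The answer is 391.
```

What distinguishes o3 and o4-mini is that this decide-call-observe cycle happens inside the model's own reasoning trace rather than in an external controller like the one above.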

Interestingly, the system card for o3 and o4-mini shows that the o3 model tends to make more claims overall. This can lead to both more accurate and more inaccurate claims, including hallucinations, compared to earlier models like o1. OpenAI's internal PersonQA benchmark shows that the hallucination rate increases from 0.16 for o1 to 0.33 for o3. The o3 and o4-mini models also exhibit a limited capability to "sandbag," which, in this context, refers to the model concealing its full capabilities to better achieve a specific goal. Further research is necessary to fully understand the implications of these observations.

Recommended read:
References :
  • Last Week in AI: OpenAI's new GPT-4.1 AI models focus on coding, OpenAI launches a pair of AI reasoning models, o3 and o4-mini, Google's newest Gemini AI model focuses on efficiency, and more!
  • Simon Willison's Weblog: Wrote up some notes on the o3/o4-mini system card, including my frustration at "sandbagging" joining the ever-growing collection of AI terminology with more than one competing definition
  • Towards AI: TAI#149: OpenAI’s Agentic o3; New Open Weights Inference Optimized Models (DeepMind Gemma, Nvidia Nemotron-H)
  • composio.dev: OpenAI o3 and o4-mini are out. They are two reasoning state-of-the-art models. They’re expensive, multimodal, and super efficient at tool use. Significantly,
  • pub.towardsai.net: This week, OpenAI finally released its anticipated o3 and o4-mini models, shifting the focus towards AI agents that skillfully use tools.
  • insideAI News: Dataiku Brings AI Agent Creation to AI Platform
  • techstrong.ai: AI Leadership Insights: Tracking and Ranking AI Agents

@www.microsoft.com //
Microsoft Research is delving into the transformative potential of AI as "Tools for Thought," aiming to redefine AI's role in supporting human cognition. At the upcoming CHI 2025 conference, researchers will present four new research papers and co-host a workshop exploring this intersection of AI and human thinking. The research includes a study on how AI is changing the way we think and work, along with three prototype systems designed to support different cognitive tasks.

As AI tools become increasingly capable, Microsoft has unveiled new AI agents designed to enhance productivity in various domains. The "Researcher" agent can tackle complex research tasks by analyzing work data, emails, meetings, files, chats, and web information to deliver expertise on demand. Meanwhile, the "Analyst" agent functions as a virtual data scientist, capable of processing raw data from multiple spreadsheets to forecast demand or visualize customer purchasing patterns. The new AI agents unveiled over the past few weeks can help people every day with things like research, cybersecurity and more.

Johnson & Johnson has reportedly found that only a small percentage, between 10% and 15%, of AI use cases deliver the vast majority (80%) of the value. After encouraging employees to experiment with AI and tracking the results of nearly 900 use cases over about three years, the company is now focusing resources on the highest-value projects. These high-value applications include a generative AI copilot for sales representatives and an internal chatbot answering employee questions. Other AI tools being developed include one for drug discovery and another for identifying and mitigating supply chain risks.


@github.com //
A critical Remote Code Execution (RCE) vulnerability, identified as CVE-2025-32434, has been discovered in PyTorch, a widely used open-source machine learning framework. This flaw, detected by security researcher Ji’an Zhou, undermines the safety of the `torch.load()` function, even when configured with `weights_only=True`. This parameter was previously trusted to prevent unsafe deserialization, making the vulnerability particularly concerning for developers who relied on it as a security measure. The discovery challenges long-standing security assumptions within machine learning workflows.

This vulnerability affects PyTorch versions 2.5.1 and earlier and has been assigned a CVSS v4 score of 9.3, indicating a critical security risk. Attackers can exploit the flaw by crafting malicious model files that bypass deserialization restrictions, allowing them to execute arbitrary code on the target system during model loading. The impact is particularly severe in cloud-based AI environments, where compromised models could lead to lateral movement, data breaches, or data exfiltration. As Ji'an Zhou noted, the vulnerability is paradoxical because developers often use `weights_only=True` to mitigate security issues, unaware that it can still lead to RCE.

To address this critical issue, the PyTorch team has released version 2.6.0, and users are strongly advised to update their installations immediately. For systems that cannot be updated right away, the only viable workaround is to avoid `torch.load()` entirely, since `weights_only=True` no longer guarantees safe deserialization. Alternative model-loading methods, such as explicit tensor-extraction tools, are recommended until the patch is applied. With proof-of-concept exploits likely to emerge soon, delayed updates risk widespread system compromise.
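One practical defensive step is to refuse to deserialize untrusted checkpoints when the installed PyTorch release predates the fix. The sketch below, a hypothetical stdlib-only guard (the helper names are illustrative, not part of any library), compares a version string against the patched 2.6.0 release; in real code you would pass `torch.__version__`:

```python
# Hypothetical guard: refuse to load untrusted checkpoints on PyTorch
# releases affected by CVE-2025-32434 (torch.load <= 2.5.1, even with
# weights_only=True). Uses only the standard library.

def parse_version(version: str) -> tuple:
    """Turn '2.5.1' (ignoring a local suffix like '+cu121') into (2, 5, 1)."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])

def torch_load_is_safe(torch_version: str) -> bool:
    """True if this PyTorch release carries the CVE-2025-32434 fix (>= 2.6.0)."""
    return parse_version(torch_version) >= (2, 6, 0)

def assert_safe_to_load(torch_version: str) -> None:
    """Raise before any torch.load() call on a vulnerable installation."""
    if not torch_load_is_safe(torch_version):
        raise RuntimeError(
            f"PyTorch {torch_version} is vulnerable to CVE-2025-32434; "
            "upgrade to >= 2.6.0 before loading untrusted model files."
        )
```

For example, `torch_load_is_safe("2.5.1")` returns `False`, while `torch_load_is_safe("2.7.1+cu121")` returns `True`. A check like this is a stopgap, not a substitute for upgrading.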

@learn.aisingapore.org //
References: LearnAI, news.mit.edu, techxplore.com ...
MIT researchers have achieved a breakthrough in artificial intelligence, specifically aimed at enhancing the accuracy of AI-generated code. This advancement focuses on guiding large language models (LLMs) to produce outputs that strictly adhere to the rules and structures of various programming languages, preventing common errors that can cause system crashes. The new technique, developed by MIT and collaborators, ensures that the AI's focus remains on generating valid and accurate code by quickly discarding less promising outputs. This approach not only improves code quality but also significantly boosts computational efficiency.

This efficiency gain allows smaller LLMs to outperform larger models in producing accurate, well-structured outputs across diverse real-world scenarios, including molecular biology and robotics. The method addresses the shortcomings of existing approaches, which either distort the model's intended meaning or are too time-consuming for complex tasks, by efficiently steering generation toward text that adheres to a given structure, such as a programming language, while remaining error-free.

The implications of this research extend beyond academic circles, potentially revolutionizing programming assistants, AI-driven data analysis, and scientific discovery tools. By enabling non-experts to control AI-generated content, such as business professionals creating complex SQL queries using natural language prompts, this architecture could democratize access to advanced programming and data manipulation. The findings will be presented at the International Conference on Learning Representations.
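The core idea of discarding unpromising candidates early can be illustrated with a toy sketch. This is not the MIT team's actual algorithm (which is considerably more sophisticated); it simply shows how checking that a partial output can still extend to a valid string, here using balanced parentheses as a stand-in grammar, lets a generator prune dead-end continuations before wasting compute on them:

```python
# Toy sketch of structure-constrained generation (not the MIT method):
# candidate continuations that can no longer form a valid output are
# discarded early, so effort concentrates on viable prefixes.

def valid_prefix(expr: str) -> bool:
    """True if `expr` could still extend to a balanced-parenthesis string."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '(' can never recover
                return False
    return True

def prune(prefix: str, candidates: list) -> list:
    """Keep only continuations whose extended prefix remains viable."""
    return [c for c in candidates if valid_prefix(prefix + c)]
```

Given the prefix `"(a"`, `prune("(a", [")", "))", "+b)"])` keeps `")"` and `"+b)"` but drops `"))"`, whose extension `"(a))"` can never parse. Real systems apply the same viability test against a full programming-language grammar and reweight the surviving candidates.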

Recommended read:
References :
  • LearnAI: Making AI-generated code more accurate in any language | MIT News Programmers can now use large language models (LLMs) to generate computer code more quickly. However, this only makes programmers’ lives easier if that code follows the rules of the programming language and doesn’t cause a computer to crash.
  • news.mit.edu: A new technique automatically guides an LLM toward outputs that adhere to the rules of whatever programming language or other format is being used.
  • learn.aisingapore.org: Making AI-generated code more accurate in any language | MIT News
  • techxplore.com: Making AI-generated code more accurate in any language

@www.analyticsvidhya.com //
OpenAI recently unveiled its groundbreaking o3 and o4-mini AI models, representing a significant leap in visual problem-solving and tool-using artificial intelligence. These models can manipulate and reason with images, integrating them directly into their problem-solving process. This unlocks a new class of problem-solving that blends visual and textual reasoning, allowing the AI to not just see an image, but to "think with it." The models can also autonomously utilize various tools within ChatGPT, such as web search, code execution, file analysis, and image generation, all within a single task flow.

Alongside these, OpenAI's GPT-4.1 series (GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano) is designed to improve coding capabilities. GPT-4.1 delivers better performance at lower prices, scoring 54.6% on SWE-bench Verified, a 21.4 percentage point gain over GPT-4o and a substantial improvement in practical software engineering capability. Most notably, GPT-4.1 accepts up to one million tokens of input context, compared with GPT-4o's 128k, making it suitable for processing large codebases and extensive documentation. GPT-4.1 mini and nano offer similar gains at reduced latency and cost.
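To put the context-window jump in concrete terms, here is a back-of-the-envelope estimate of how much source code each window holds. The tokens-per-line figure is a rough assumption (it varies by language and tokenizer), so treat the numbers as order-of-magnitude only:

```python
# Rough estimate: lines of code per context window, assuming ~10 tokens
# per line (an assumption; real figures depend on language and tokenizer).

TOKENS_PER_LINE = 10

def lines_that_fit(context_tokens: int) -> int:
    return context_tokens // TOKENS_PER_LINE

gpt4o_window = 128_000      # GPT-4o input context
gpt41_window = 1_000_000    # GPT-4.1 input context

print(lines_that_fit(gpt4o_window))   # 12800 lines, a mid-sized module
print(lines_that_fit(gpt41_window))   # 100000 lines, a substantial codebase
```

Under this assumption, the roughly 8x larger window moves the model from reading a handful of files to reading an entire project plus its documentation in one request.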

The new models are available to ChatGPT Plus, Pro, and Team users, with Enterprise and education users gaining access soon. While reasoning alone isn't a silver bullet, it reliably improves model accuracy and problem-solving on challenging tasks, and with Deep Research products and o3/o4-mini, AI-assisted, search-based research has become genuinely effective.

Recommended read:
References :
  • bdtechtalks.com: What to know about o3 and o4-mini, OpenAI’s new reasoning models
  • TestingCatalog: OpenAI’s o3 and o4‑mini bring smarter tools and faster reasoning to ChatGPT
  • thezvi.wordpress.com: OpenAI has finally introduced us to the full o3 along with o4-mini. These models feel incredibly smart.
  • venturebeat.com: OpenAI launches groundbreaking o3 and o4-mini AI models that can manipulate and reason with images, representing a major advance in visual problem-solving and tool-using artificial intelligence.
  • www.techrepublic.com: OpenAI’s o3 and o4-mini models are available now to ChatGPT Plus, Pro, and Team users. Enterprise and education users will get access next week.
  • the-decoder.com: OpenAI's o3 achieves near-perfect performance on long context benchmark
  • the-decoder.com: Safety assessments show that OpenAI's o3 is probably the company's riskiest AI model to date
  • www.unite.ai: Inside OpenAI’s o3 and o4‑mini: Unlocking New Possibilities Through Multimodal Reasoning and Integrated Toolsets
  • thezvi.wordpress.com: Discusses the release of OpenAI's o3 and o4-mini reasoning models and their enhanced capabilities.
  • Simon Willison's Weblog: OpenAI o3 and o4-mini System Card
  • Interconnects: OpenAI's o3: Over-optimization is back and weirder than ever. Tools, true rewards, and a new direction for language models.
  • techstrong.ai: Nobody’s Perfect: OpenAI o3, o4 Reasoning Models Have Some Kinks
  • bsky.app: It's been a couple of years since GPT-4 powered Bing, but with the various Deep Research products and now o3/o4-mini I'm ready to say that AI assisted search-based research actually works now
  • www.analyticsvidhya.com: o3 vs o4-mini vs Gemini 2.5 pro: The Ultimate Reasoning Battle
  • pub.towardsai.net: TAI#149: OpenAI’s Agentic o3; New Open Weights Inference Optimized Models (DeepMind Gemma, Nvidia Nemotron-H) Also, Grok-3 Mini Shakes Up Cost Efficiency, Codex, Cohere Embed 4, PerceptionLM & more.
  • Last Week in AI: Last Week in AI #307 - GPT 4.1, o3, o4-mini, Gemini 2.5 Flash, Veo 2
  • composio.dev: OpenAI o3 vs. Gemini 2.5 Pro vs. o4-mini
  • Towards AI: Details about OpenAI's agentic o3 models

@www.theapplepost.com //
References: Apple Must, The Apple Post
Apple is doubling down on its efforts to deliver top-tier AI capabilities, rallying its teams to "do whatever it takes" to make Apple Intelligence the best it can be. New leadership, including Craig Federighi and Mike Rockwell, have been brought in to revamp Siri and other AI features. The company is reportedly encouraging the use of open-source models, if necessary, signaling a shift in strategy to prioritize performance and innovation over strict adherence to in-house development. This renewed commitment comes after reports of internal conflict and confused decision-making within Apple's AI teams, suggesting a major course correction to meet its ambitious AI goals.

Apple is planning to release its delayed Apple Intelligence features this fall, including Personal Context, Onscreen Awareness, and deeper app integration, according to sources cited by The New York Times. The features were initially announced in March but were later postponed. Personal Context will allow Siri to understand and reference user emails, messages, files, and photos. Onscreen Awareness will enable Siri to respond to what’s currently on the screen, while Deeper App Integration will give Siri the power to perform complex, multi-step actions across apps without manual input.

The push for enhanced AI follows reports of internal strife and shifting priorities within Apple's AI development teams. According to The Information, some potentially exciting projects were shelved in favor of smaller ones. Additionally, the impressive demo of contextual intelligence Apple showcased at WWDC "came as a surprise" to some Siri team members. Despite past challenges, Apple is determined to deliver on its AI vision, aiming to integrate advanced intelligence seamlessly into its products and services, potentially with the launch of iOS 19.

Recommended read:
References :
  • Apple Must: Apple’s Siri team to do “whatever it takes” to make Apple Intelligence the best it can be
  • The Apple Post: Delayed Apple Intelligence features slated to launch in the fall, report claims
  • jonnyevans: Apple’s Siri team to do “whatever it takes”. There may yet be hope for Apple Intelligence as Apple’s AI teams have been instructed to “do whatever it takes” to build the best artificial intelligence features as new Siri team leaders, Craig Federighi, Mike Rockwell, and other A-listers from the crack Apple dev teams get involved.

@x.com //
References: IEEE Spectrum
The integration of Artificial Intelligence (AI) into coding practices is rapidly transforming software development, with engineers increasingly leveraging AI to generate code based on intuitive "vibes." Inspired by the approach of Andrej Karpathy, developers like Naik and Touleyrou are using AI to accelerate their projects, creating applications and prototypes with minimal prior programming knowledge. This emerging trend, known as "vibe coding," streamlines the development process and democratizes access to software creation.

Open-source AI is playing a crucial role in these advancements, particularly among younger developers who are quick to embrace new technologies. A recent Stack Overflow survey of over 1,000 developers and technologists reveals a strong preference for open-source AI, driven by a belief in transparency and community collaboration. While experienced developers recognize the benefits of open-source due to their existing knowledge, younger developers are leading the way in experimenting with these emerging technologies, fostering trust and accelerating the adoption of open-source AI tools.

To further enhance the capabilities and reliability of AI models, particularly in complex reasoning tasks, Microsoft researchers have introduced inference-time scaling techniques. Separately, Amazon Bedrock Evaluations now offers enhanced capabilities to evaluate Retrieval Augmented Generation (RAG) systems and models, providing developers with tools to assess the performance of their AI applications. The introduction of "bring your own inference responses" allows for the evaluation of RAG systems and models regardless of their deployment environment, while new citation metrics offer deeper insights into the accuracy and relevance of retrieved information.
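The idea behind a citation metric can be sketched with a minimal example. This is an illustrative toy, not Amazon Bedrock's actual metric or API: it measures what fraction of the passages a RAG answer cites actually contain the text span they are cited for:

```python
# Illustrative sketch only, not Bedrock's citation metric: "citation
# precision" as the fraction of cited passages that actually contain
# the span they are claimed to support.

def citation_precision(citations: list) -> float:
    """citations: (claimed_span, cited_passage) pairs from one RAG answer."""
    if not citations:
        return 0.0
    supported = sum(
        1 for span, passage in citations if span.lower() in passage.lower()
    )
    return supported / len(citations)
```

For instance, an answer citing one passage that supports its claim and one that does not scores 0.5. Production metrics go further, using semantic rather than substring matching, but the precision framing is the same.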
