Carl Franzen@AI News | VentureBeat
//
Microsoft has announced the release of Phi-4-reasoning-plus, a new small, open-weight language model designed for advanced reasoning tasks. Building upon the architecture of the previously released Phi-4, this 14-billion parameter model integrates supervised fine-tuning and reinforcement learning to achieve strong performance on complex problems. According to Microsoft, the Phi-4 reasoning models outperform larger language models on several demanding benchmarks, despite their compact size. This new model pushes the limits of small AI, demonstrating that carefully curated data and training techniques can lead to impressive reasoning capabilities.
The Phi-4 reasoning family, consisting of Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, is specifically trained to handle complex reasoning tasks in mathematics, scientific domains, and software-related problem solving. Phi-4-reasoning-plus, in particular, extends supervised fine-tuning with outcome-based reinforcement learning, targeting improved performance on high-variance tasks such as competition-level mathematics. All three models are designed to bring reasoning capabilities to lower-performance hardware, including mobile devices. In a separate disclosure, Microsoft CEO Satya Nadella said that AI now contributes to 30% of Microsoft's code. The open-weight models were released with transparent training details and evaluation logs, including benchmark design, and are hosted on Hugging Face for reproducibility and public access. They are released under a permissive MIT license, permitting broad commercial and enterprise use, as well as fine-tuning and distillation, without restriction.
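Because the weights are openly hosted, the model can be pulled straight from Hugging Face. Below is a minimal sketch using the `transformers` library; the repo id `microsoft/Phi-4-reasoning-plus` and the chat-style prompt format are assumptions that should be verified against the model card.

```python
def build_messages(problem: str) -> list[dict]:
    """Wrap a reasoning problem in a chat-style message list; the system
    prompt here is illustrative, not Microsoft's documented one."""
    return [
        {"role": "system", "content": "Reason step by step, then answer."},
        {"role": "user", "content": problem},
    ]

def generate_answer(problem: str,
                    model_id: str = "microsoft/Phi-4-reasoning-plus") -> str:
    """Download the open weights from Hugging Face and run one completion.
    Requires `transformers` and `torch`, plus roughly 28 GB for the
    14B-parameter checkpoint at bf16 precision."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy, lazy

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(problem), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=1024)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate_answer("Prove that the sum of two odd numbers is even.")` would stream the weights on first use, which is why the heavy imports sit inside the function.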
Adam Zewe@news.mit.edu
//
MIT researchers have unveiled a "periodic table of machine learning," a groundbreaking framework that organizes over 20 common machine-learning algorithms around a single unifying equation. This innovative approach allows scientists to combine elements from different methods, potentially leading to improved algorithms or the creation of entirely new ones. The researchers believe this framework will significantly fuel further AI discovery and innovation by providing a structured approach to understanding and developing machine learning techniques.
The core concept behind this "periodic table" is that all these algorithms, while seemingly different, learn a specific kind of relationship between data points. Although the way each algorithm accomplishes this may vary, the fundamental mathematics underlying each approach remains consistent. By identifying a unifying equation, the researchers were able to reframe popular methods and arrange them into a table, categorizing each based on the relationships it learns. Shaden Alshammari, an MIT graduate student and lead author of the related paper, emphasizes that this is not just a metaphor, but a structured system for exploring machine learning.

Just like the periodic table of chemical elements, this new framework contains empty spaces, representing algorithms that should exist but haven't been discovered yet. These spaces act as predictions, guiding researchers toward unexplored areas within machine learning. To illustrate the framework's potential, the researchers combined elements from two different algorithms, resulting in a new image-classification algorithm that outperformed current state-of-the-art approaches by 8 percent. The researchers hope that this "periodic table" will serve as a toolkit, allowing researchers to design new algorithms without needing to rediscover ideas from prior approaches.
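The article does not reproduce the equation itself, but an objective of the following shape is consistent with the description of "learning relationships between data points": each method specifies a supervisory distribution $p$ over which points relate to point $i$ and a learned distribution $q_{\theta}$, and training minimizes their average divergence. This is a hedged sketch, not the paper's exact notation:

```latex
\mathcal{L}(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N}
  D_{\mathrm{KL}}\!\left( p(\cdot \mid i) \,\middle\|\, q_{\theta}(\cdot \mid i) \right)
```

Under this reading, different choices of $p$ and $q_{\theta}$ would recover clustering, contrastive, or dimensionality-reduction objectives, and an "empty space" in the table is a pairing no published method yet uses.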
@developer.nvidia.com
//
NVIDIA is significantly advancing the capabilities of AI development with the introduction of new tools and technologies. The company's latest innovations focus on enhancing the performance of AI agents, improving integration with various software and hardware platforms, and streamlining the development process for enterprises. These advancements include NVIDIA NeMo microservices for creating data-driven AI agents and a G-Assist plugin builder that enables users to customize AI functionalities on GeForce RTX AI PCs.
NVIDIA's NeMo microservices are designed to empower enterprises to build AI agents that can access and leverage data to enhance productivity and decision-making. These microservices provide a modular platform for building and customizing generative AI models, offering features such as prompt tuning, supervised fine-tuning, and knowledge retrieval tools. NVIDIA envisions these microservices as essential building blocks for creating data flywheels, enabling AI agents to continuously learn and improve from enterprise data, business intelligence, and user feedback. Initial use cases include AI agents used by AT&T to process nearly 10,000 documents and a coding assistant used by Cisco Systems.

The introduction of the G-Assist plugin builder marks a significant step forward in AI-assisted PC control. This tool allows developers to create custom commands to manage both software and hardware functions on GeForce RTX AI PCs. By enabling integration with large language models (LLMs) and other software applications, the plugin builder expands G-Assist's functionality beyond its initial gaming-focused applications. Users can now tailor AI functionalities to suit their specific needs, automating tasks and controlling various PC functions through voice or text commands. The G-Assist tool runs a lightweight language model locally on RTX GPUs, enabling inference without relying on a cloud connection.
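Services of this kind are commonly exposed through OpenAI-compatible HTTP endpoints. The sketch below assumes such an endpoint at a hypothetical local URL with a placeholder model name; both are illustrative, not documented values.

```python
import json
import urllib.request

def build_payload(model: str, question: str, temperature: float = 0.2) -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
    }

def ask(endpoint: str, model: str, question: str) -> str:
    """POST the payload and pull the first choice out of the JSON response."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(build_payload(model, question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example call against an assumed local deployment:
# ask("http://localhost:8000/v1/chat/completions", "my-agent",
#     "Summarize yesterday's support tickets.")
```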
@simonwillison.net
//
OpenAI has recently unveiled its latest AI reasoning models, the o3 and o4-mini, marking a significant step forward in the development of AI agents capable of utilizing tools effectively. These models are designed to pause and thoroughly analyze questions before providing a response, enhancing their reasoning capabilities. The o3 model is presented as OpenAI's most advanced in this category, demonstrating superior performance across various benchmarks, including math, coding, reasoning, science, and visual understanding. Meanwhile, the o4-mini model strikes a balance between cost-effectiveness, speed, and overall performance, offering a versatile option for different applications.
OpenAI's o3 and o4-mini are equipped with the ability to leverage tools within the ChatGPT environment, such as web browsing, Python code execution, image processing, and image generation. This integration allows the models to augment their capabilities by cropping or transforming images, searching the web for relevant information, and analyzing data using Python, all within their thought process. A variant of o4-mini, named "o4-mini-high," is also available, catering to users seeking enhanced performance. These models are accessible to subscribers of OpenAI's Pro, Plus, and Team plans, reflecting the company's commitment to providing advanced AI tools to a wide range of users.

Interestingly, the system card for o3 and o4-mini shows that the o3 model tends to make more claims overall. This can lead to both more accurate and more inaccurate claims, including hallucinations, compared to earlier models like o1. OpenAI's internal PersonQA benchmark shows that the hallucination rate increases from 0.16 for o1 to 0.33 for o3. The o3 and o4-mini models also exhibit a limited capability to "sandbag," which, in this context, refers to the model concealing its full capabilities to better achieve a specific goal. Further research is necessary to fully understand the implications of these observations.
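The quoted PersonQA figures make the regression easy to quantify; this snippet just does the arithmetic on the two published rates.

```python
# PersonQA hallucination rates quoted above (fraction of answers hallucinated).
o1_rate, o3_rate = 0.16, 0.33

ratio = o3_rate / o1_rate
print(f"o3 hallucinates {ratio:.2f}x as often as o1")  # roughly double
```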
@www.microsoft.com
//
References: news.microsoft.com, www.microsoft.com
Microsoft Research is delving into the transformative potential of AI as "Tools for Thought," aiming to redefine AI's role in supporting human cognition. At the upcoming CHI 2025 conference, researchers will present four new research papers and co-host a workshop exploring this intersection of AI and human thinking. The research includes a study on how AI is changing the way we think and work along with three prototype systems designed to support different cognitive tasks. The goal is to explore how AI systems can be used as Tools for Thought and reimagine AI’s role in human thinking.
As AI tools become increasingly capable, Microsoft has unveiled new AI agents designed to enhance productivity in various domains. The "Researcher" agent can tackle complex research tasks by analyzing work data, emails, meetings, files, chats, and web information to deliver expertise on demand. Meanwhile, the "Analyst" agent functions as a virtual data scientist, capable of processing raw data from multiple spreadsheets to forecast demand or visualize customer purchasing patterns. The AI agents unveiled over the past few weeks can help people every day with tasks like research, cybersecurity, and more.

Johnson & Johnson has reportedly found that only a small percentage of AI use cases, between 10% and 15%, deliver the vast majority (80%) of the value. After encouraging employees to experiment with AI and tracking the results of nearly 900 use cases over about three years, the company is now focusing resources on the highest-value projects. These high-value applications include a generative AI copilot for sales representatives and an internal chatbot answering employee questions. Other AI tools in development include one for drug discovery and another for identifying and mitigating supply chain risks.
@github.com
//
A critical Remote Code Execution (RCE) vulnerability, identified as CVE-2025-32434, has been discovered in PyTorch, a widely used open-source machine learning framework. This flaw, detected by security researcher Ji’an Zhou, undermines the safety of the `torch.load()` function, even when configured with `weights_only=True`. This parameter was previously trusted to prevent unsafe deserialization, making the vulnerability particularly concerning for developers who relied on it as a security measure. The discovery challenges long-standing security assumptions within machine learning workflows.
This vulnerability affects PyTorch versions 2.5.1 and earlier and has been assigned a CVSS v4 score of 9.3, indicating a critical security risk. Attackers can exploit the flaw by crafting malicious model files that bypass deserialization restrictions, allowing them to execute arbitrary code on the target system during model loading. The impact is particularly severe in cloud-based AI environments, where compromised models could lead to lateral movement, data breaches, or data exfiltration. As Ji'an Zhou noted, the vulnerability is paradoxical because developers often use `weights_only=True` precisely to mitigate security issues, unaware that it can still lead to RCE.

To address this critical issue, the PyTorch team has released version 2.6.0, and users are strongly advised to update immediately. For systems that cannot be updated right away, the only viable workaround is to avoid `torch.load()` entirely, even with `weights_only=True`. Alternative model-loading methods, such as explicit tensor extraction tools, are recommended until the patch is applied. With proof-of-concept exploits likely to emerge soon, delayed updates risk widespread system compromises.
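A minimal defensive sketch of that advice: gate `torch.load()` behind a version check so pickle checkpoints are refused on affected releases. The version parsing is a simplification that assumes plain `X.Y.Z`-style version strings.

```python
def _ver(v: str) -> tuple:
    """Parse 'X.Y.Z' (ignoring a local suffix like '+cu121') for comparison."""
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

def is_vulnerable(torch_version: str) -> bool:
    """CVE-2025-32434 affects 2.5.1 and earlier; 2.6.0 carries the fix."""
    return _ver(torch_version) < _ver("2.6.0")

def load_checkpoint(path: str):
    import torch  # imported lazily so the version check stays importable

    if is_vulnerable(torch.__version__):
        raise RuntimeError(
            f"PyTorch {torch.__version__} is vulnerable to CVE-2025-32434; "
            "upgrade to 2.6.0+ before deserializing checkpoints."
        )
    return torch.load(path, weights_only=True)
```

Formats such as safetensors, which store raw tensors with no pickle step at all, sidestep the issue entirely and are a reasonable alternative while patching.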
@learn.aisingapore.org
//
MIT researchers have achieved a breakthrough in artificial intelligence, specifically aimed at enhancing the accuracy of AI-generated code. This advancement focuses on guiding large language models (LLMs) to produce outputs that strictly adhere to the rules and structures of various programming languages, preventing common errors that can cause system crashes. The new technique, developed by MIT and collaborators, ensures that the AI's focus remains on generating valid and accurate code by quickly discarding less promising outputs. This approach not only improves code quality but also significantly boosts computational efficiency.
This efficiency gain allows smaller LLMs to outperform larger models in producing accurate and well-structured outputs across diverse real-world scenarios, including molecular biology and robotics. The new method tackles issues with existing approaches, which either distort the model's intended meaning or are too time-consuming for complex tasks. The researchers developed a more efficient way to control the outputs of a large language model, guiding it to generate text that adheres to a given structure, such as a programming language, and remains error-free.

The implications of this research extend beyond academic circles, potentially revolutionizing programming assistants, AI-driven data analysis, and scientific discovery tools. By enabling non-experts to control AI-generated content, such as business professionals creating complex SQL queries using natural language prompts, this architecture could democratize access to advanced programming and data manipulation. The findings will be presented at the International Conference on Learning Representations.
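The core mechanism can be illustrated with a toy grammar. This is a deliberate simplification, not MIT's actual algorithm (which steers a weighted search over whole programs): at each decoding step, candidate tokens that cannot extend a syntactically valid prefix are pruned, so effort concentrates on outputs that can still become valid code.

```python
def valid_prefix(s: str) -> bool:
    """Toy grammar check: a prefix is valid while it never closes more
    parentheses than it has opened."""
    depth = 0
    for ch in s:
        depth += {"(": 1, ")": -1}.get(ch, 0)
        if depth < 0:
            return False
    return True

def constrained_step(prefix: str, candidates: list[str]) -> list[str]:
    """Keep only candidate tokens whose extension is still a valid prefix."""
    return [t for t in candidates if valid_prefix(prefix + t)]

# ')' after a balanced prefix would unbalance it, so it is pruned:
print(constrained_step("()", [")", "(", "x"]))  # → ['(', 'x']
```

A real implementation would use the target language's grammar in place of `valid_prefix` and reweight, rather than hard-filter, the surviving candidates.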
@www.analyticsvidhya.com
//
OpenAI recently unveiled its groundbreaking o3 and o4-mini AI models, representing a significant leap in visual problem-solving and tool-using artificial intelligence. These models can manipulate and reason with images, integrating them directly into their problem-solving process. This unlocks a new class of problem-solving that blends visual and textual reasoning, allowing the AI to not just see an image, but to "think with it." The models can also autonomously utilize various tools within ChatGPT, such as web search, code execution, file analysis, and image generation, all within a single task flow.
Alongside the reasoning models, OpenAI's GPT-4.1 series (GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano) is designed to improve coding capabilities. GPT-4.1 delivers enhanced performance at lower prices, scoring 54.6% on SWE-bench Verified, a 21.4 percentage point increase over GPT-4o and a substantial gain in practical software engineering capability. Most notably, GPT-4.1 accepts up to one million tokens of input context, compared to GPT-4o's 128k tokens, making it suitable for processing large codebases and extensive documentation. GPT-4.1 mini and nano also offer performance boosts at reduced latency and cost. The new models are available to ChatGPT Plus, Pro, and Team users, with Enterprise and education users gaining access soon. While reasoning alone isn't a silver bullet, it reliably improves model accuracy and problem-solving on challenging tasks; with Deep Research products and o3/o4-mini, AI-assisted, search-based research is now genuinely effective.
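A rough sense of what a one-million-token window buys, using the common heuristic of about four characters per token (a crude average that varies by tokenizer and language):

```python
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by language

def fits_in_context(context_tokens: int, codebase_chars: int) -> bool:
    """Estimate whether a codebase of the given size fits in one window."""
    return codebase_chars / CHARS_PER_TOKEN <= context_tokens

codebase = 2_000_000  # a ~2 MB source tree, roughly 500k tokens
print(fits_in_context(128_000, codebase))    # → False (a 128k-token window)
print(fits_in_context(1_000_000, codebase))  # → True  (a 1M-token window)
```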
@www.theapplepost.com
//
References: Apple Must, The Apple Post
Apple is doubling down on its efforts to deliver top-tier AI capabilities, rallying its teams to "do whatever it takes" to make Apple Intelligence the best it can be. New leadership, including Craig Federighi and Mike Rockwell, have been brought in to revamp Siri and other AI features. The company is reportedly encouraging the use of open-source models, if necessary, signaling a shift in strategy to prioritize performance and innovation over strict adherence to in-house development. This renewed commitment comes after reports of internal conflict and confused decision-making within Apple's AI teams, suggesting a major course correction to meet its ambitious AI goals.
Apple is planning to release its delayed Apple Intelligence features this fall, including Personal Context, Onscreen Awareness, and deeper app integration, according to sources cited by The New York Times. The features were initially announced in March but later postponed. Personal Context will allow Siri to understand and reference user emails, messages, files, and photos. Onscreen Awareness will enable Siri to respond to what's currently on the screen, while deeper app integration will give Siri the power to perform complex, multi-step actions across apps without manual input.

The push for enhanced AI follows reports of internal strife and shifting priorities within Apple's AI development teams. According to The Information, some potentially exciting projects were shelved in favor of smaller, more incremental ones. Additionally, the impressive demo of contextual intelligence Apple showcased at WWDC "came as a surprise" to some Siri team members. Despite past challenges, Apple is determined to deliver on its AI vision, aiming to integrate advanced intelligence seamlessly into its products and services, potentially with the launch of iOS 19.
@x.com
//
References: IEEE Spectrum
The integration of Artificial Intelligence (AI) into coding practices is rapidly transforming software development, with engineers increasingly leveraging AI to generate code based on intuitive "vibes." Inspired by the approach of Andrej Karpathy, developers like Naik and Touleyrou are using AI to accelerate their projects, creating applications and prototypes with minimal prior programming knowledge. This emerging trend, known as "vibe coding," streamlines the development process and democratizes access to software creation.
Open-source AI is playing a crucial role in these advancements, particularly among younger developers who are quick to embrace new technologies. A recent Stack Overflow survey of over 1,000 developers and technologists reveals a strong preference for open-source AI, driven by a belief in transparency and community collaboration. While experienced developers recognize the benefits of open-source due to their existing knowledge, younger developers are leading the way in experimenting with these emerging technologies, fostering trust and accelerating the adoption of open-source AI tools.

To further enhance the capabilities and reliability of AI models, particularly in complex reasoning tasks, Microsoft researchers have introduced inference-time scaling techniques. In addition, Amazon Bedrock Evaluations now offers enhanced capabilities to evaluate Retrieval Augmented Generation (RAG) systems and models, giving developers tools to assess the performance of their AI applications. The introduction of "bring your own inference responses" allows evaluation of RAG systems and models regardless of their deployment environment, while new citation metrics offer deeper insights into the accuracy and relevance of retrieved information.
Kara Sherrer@eWEEK
//
Runway AI Inc. has launched Gen-4, its latest AI video generation model, addressing the significant challenge of maintaining consistent characters and objects across different scenes. This new model represents a considerable advancement in AI video technology and improves the realism and usability of AI-generated videos. Gen-4 allows users to upload a reference image of an object to be included in a video, along with design instructions, and ensures that the object maintains a consistent look throughout the entire clip.
The Gen-4 model empowers users to place any object or subject in different locations while maintaining consistency, and even allows for modifications such as changing camera angles or lighting conditions. The model combines visual references with text instructions to preserve styles throughout videos. Gen-4 is currently available to paying subscribers and Enterprise customers, with additional features planned for future updates.
Michael Nuñez@AI News | VentureBeat
//
OpenAI, the company behind ChatGPT, has announced a significant strategic shift by planning to release its first open-weight AI model since 2019. This move comes amidst mounting economic pressures from competitors like DeepSeek and Meta, whose open-source models are increasingly gaining traction. CEO Sam Altman revealed the plans on X, stating that the new model will have reasoning capabilities and allow developers to run it on their own hardware, departing from OpenAI's cloud-based subscription model.
This decision marks a notable change for OpenAI, which has historically defended closed, proprietary models. The company is now looking to gather developer feedback to make the new model as useful as possible, and is planning events in San Francisco, Europe, and Asia-Pacific. As models improve, startups and developers increasingly want tunable latency and on-prem deployments with full data control, according to OpenAI.

The shift comes alongside a monumental $40 billion funding round led by SoftBank, which has catapulted OpenAI's valuation to $300 billion. SoftBank will initially invest $10 billion, with the remaining $30 billion contingent on OpenAI transitioning to a for-profit structure by the end of the year. The funding will help OpenAI continue building AI systems that drive scientific discovery, enable personalized education, enhance human creativity, and pave the way toward artificial general intelligence. The open-weight model is expected to help OpenAI compete with the growing number of efficient open-source alternatives and counter criticism of its closed approach.
Emilia David@AI News | VentureBeat
//
OpenAI has rolled out significant enhancements to ChatGPT, focusing on integrating real-time data access and boosting reasoning skills. A key update is the integration of Google Drive for ChatGPT Team users, allowing access to Docs, Sheets, and Slides directly within conversations. This feature enables ChatGPT to provide more relevant and personalized responses by automatically incorporating context from these tools, respecting existing user permissions, and facilitating seamless, context-rich interactions for improved team productivity and decision-making. Admins can connect their organization's Google Drive workspace to ChatGPT, with controls for smaller and larger teams, ensuring data security and controlled access.
OpenAI has also unveiled a major upgrade to its image generation capabilities directly within ChatGPT. This new feature, powered by GPT-4o, allows users to create detailed, high-quality images through simple chat-based prompts, eliminating the need to switch between different tools. With improved text integration and multi-object rendering, ChatGPT's image generation is now capable of producing photorealistic results and can compete with industry leaders like Midjourney, Google's Imagen 3, and Adobe's Firefly. This update is rolling out to all users, including those on free plans, providing broad accessibility to advanced AI-driven image creation.
Ryan Daws@AI News
//
Anthropic has unveiled a novel method for examining the inner workings of large language models (LLMs) like Claude, offering unprecedented insight into how these AI systems process information and make decisions. Referred to as an "AI microscope," this approach, inspired by neuroscience techniques, reveals that Claude plans ahead when generating poetry, uses a universal internal blueprint to interpret ideas across languages, and occasionally works backward from desired outcomes instead of building from facts. The research underscores that these models are more sophisticated than previously thought, representing a significant advancement in AI interpretability.
Anthropic's research also indicates that Claude operates with conceptual universality across different languages and that it actively plans ahead. In the context of rhyming poetry, the model anticipates future words to meet constraints like rhyme and meaning, demonstrating a level of foresight that goes beyond simple next-word prediction. However, the research also uncovered potentially concerning behaviors, as Claude can generate plausible-sounding but incorrect reasoning.

In related news, Anthropic is reportedly preparing to launch an upgraded version of Claude 3.7 Sonnet, expanding its context window from 200K tokens to 500K tokens. This substantial increase would enable users to process much larger datasets and codebases in a single session, potentially transforming workflows in enterprise applications and coding environments. The expanded context window could further empower vibe coding, enabling developers to work on larger projects without breaking context due to token limits.