News from the AI & ML world

DeeperML

@www.theapplepost.com //
References: Ken Yeung
Google is expanding its use of Gemini AI to revolutionize advertising on YouTube with a new product called "Peak Points," announced at the YouTube Brandcast event in New York. This AI-powered feature analyzes videos to pinpoint moments of maximum viewer engagement, strategically inserting ads at these "peak points." The goal is to improve ad performance by targeting viewers when they are most emotionally invested or attentive, potentially leading to better ad recall and effectiveness for marketers.

This new approach to ad placement signifies a shift from traditional contextual targeting, where ads are placed based on general video metadata or viewer history. Gemini AI provides a more granular analysis, identifying specific timestamps within a video where engagement spikes. This allows YouTube to not only understand what viewers are watching but also how they are watching it, gathering real-time attention data. This data has far-reaching implications, potentially influencing algorithmic recommendations, content development, talent discovery, and platform control.

For content creators, Peak Points fundamentally changes monetization strategies. The traditional mid-roll ad insertion at default intervals will be replaced by Gemini's assessment of content's engagement level. Creators will now be incentivized to create content that not only retains viewers but also generates attention spikes at specific moments. Marketers, on the other hand, are shifting from buying against content to buying against engagement, necessitating a reevaluation of brand safety, storytelling, and overall campaign outcomes in this new attention-based economy.

Recommended read:
References :
  • Ken Yeung: It’s been a year since Google introduced AI Overview to its widely used search engine.

@learn.aisingapore.org //
MIT researchers have uncovered a critical flaw in vision-language models (VLMs) that could have serious consequences in high-stakes environments like medical diagnosis. The study, published May 14, 2025, reveals that these AI models, widely used to analyze medical images, struggle with negation words such as "no" and "not." This deficiency causes them to misinterpret queries, leading to potentially catastrophic errors when retrieving images based on the absence of certain objects. One example highlights a radiologist using a VLM to find reports of patients with tissue swelling but without an enlarged heart: the model incorrectly retrieves reports showing both conditions, leading to an inaccurate diagnosis.

Researchers tested the ability of vision-language models to identify negation in image captions and found the models often performed as well as a random guess. To address this issue, the MIT team created a dataset of images with corresponding captions that include negation words describing missing objects. Retraining a vision-language model with this dataset resulted in improved performance when retrieving images that do not contain specific objects, and also boosted accuracy on multiple choice question answering with negated captions.
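
As a rough illustration of the kind of probe the researchers describe, the sketch below scores one image against an affirmative caption and its negated counterpart using an off-the-shelf CLIP model. This is not the MIT team's code; the model choice, image path, and captions are assumptions for demonstration.

```python
# Minimal negation probe with a stock CLIP model (illustrative only).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("chest_xray.png")  # hypothetical image path
captions = [
    "a scan showing tissue swelling and an enlarged heart",
    "a scan showing tissue swelling but no enlarged heart",  # negated caption
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image  # similarity of the image to each caption
probs = logits.softmax(dim=1)

# A model that ignores "no"/"not" scores both captions nearly identically,
# which is the near-random behavior the MIT study reports.
print({c: round(p.item(), 3) for c, p in zip(captions, probs[0])})
```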

Kumail Alhamoud, the lead author of the study, emphasized the significant impact of negation words and the potential for catastrophic consequences if these models are used blindly. While the researchers were able to improve model performance through retraining, they caution that more work is needed to address the root causes of this problem. They hope their findings will alert potential users to this previously unnoticed shortcoming, especially in settings where these models are used to determine patient treatments or identify product defects. Marzyeh Ghassemi, the senior author, warned against using large vision/language models without intensive evaluation if something as fundamental as negation is broken.

Recommended read:
References :
  • learn.aisingapore.org: Study shows vision-language models can’t handle queries with negation words | MIT News
  • www.sciencedaily.com: Study shows vision-language models can't handle queries with negation words

Rowan Cheung@The Rundown AI //
Google is aggressively expanding its Gemini AI across a multitude of devices, signifying a major push to create a seamless AI ecosystem. The tech giant aims to integrate Gemini into everyday experiences by bringing the AI assistant to smartwatches running Wear OS, Android Auto for in-car assistance, Google TV for enhanced entertainment, and even upcoming XR headsets developed in collaboration with Samsung. This expansion aims to provide users with a consistent and powerful AI layer connecting all their devices, allowing for natural voice interactions and context-based conversations across different platforms.

Google's vision for Gemini extends beyond simple voice commands: the AI assistant will offer a range of features tailored to each device. On smartwatches, Gemini will provide convenient access to information and app interactions without needing to take out a phone. In Android Auto, Gemini will replace the current Google voice assistant, enabling more sophisticated tasks like planning routes with charging stops or summarizing messages. For Google TV, the AI will offer personalized content recommendations and educational answers, while on XR headsets, Gemini will facilitate immersive experiences like planning trips using videos, maps, and local information.

In addition to expanding Gemini's presence across devices, Google is also experimenting with its search interface. Reports indicate that Google is testing replacing the "I'm Feeling Lucky" button on its homepage with an "AI Mode" button. This move reflects Google's strategy to keep users engaged on its platform by offering direct access to conversational AI responses powered by Gemini. The AI Mode feature builds on the existing AI Overviews, providing detailed AI-generated responses to search queries on a dedicated results page, further emphasizing Google's commitment to integrating AI into its core services.

Recommended read:
References :
  • the-decoder.com: Google brings Gemini AI to smartwatches, cars, TVs, and XR headsets
  • The Rundown AI: Google's Gemini AI expands across devices
  • www.tomsguide.com: Google is taking Gemini beyond smartphones — here’s what’s coming

@www.aiwire.net //
References: www.aiwire.net, BigDATAwire
SAS is making a significant push towards accountable AI agents, emphasizing ethical oversight and governance within its SAS Viya platform. At SAS Innovate 2025 in Orlando, the company outlined its vision for intelligent decision automation, highlighting its long-standing work in this area. While other tech vendors focus on the sheer number of decisions automated, SAS CTO Bryan Harris stresses decision quality, arguing that the value of decisions to the business is the key metric. SAS defines agentic AI as systems that blend reasoning, analytics, and embedded governance to make autonomous decisions with transparency and human oversight when needed.

SAS views Large Language Models (LLMs) as valuable but limited components within a broader AI ecosystem. Udo Sglavo, VP of applied AI and modeling R&D at SAS, describes the agentic AI push as a natural evolution from the company's consulting-driven past. SAS aims to take its extensive IP from solving similar challenges repeatedly and incorporate it into software products. This shift from services to scalable solutions is accelerated by increased customer comfort with prepackaged models, leading to wider adoption of agent-based systems.

SAS emphasizes that LLMs are only one piece of a larger entity, stating that decision quality and ethical considerations are paramount. Bryan Harris noted that LLMs can be unpredictable, which makes them unsuitable for high-stakes applications where auditability and control are critical. The focus on accountable AI agents ensures that enterprises can deploy AI systems that act autonomously while maintaining the necessary transparency and oversight.

Recommended read:
References :
  • www.aiwire.net: Reports on SAS's push to make AI agents accountable, highlighting the company's focus on ethical oversight.
  • BigDATAwire: Informatica Goes All-In on AI Agents for Data Management

Aminu Abdullahi@eWEEK //
References: bsky.app, eWEEK
Apple is exploring groundbreaking technology to enable users to control iPhones, iPads, and Vision Pro headsets with their thoughts, marking a significant leap towards hands-free device interaction. The company is partnering with Synchron, a brain-computer interface (BCI) startup, to develop a universal standard for translating neural activity into digital commands. This collaboration aims to empower individuals with disabilities, such as ALS and severe spinal cord injuries, allowing them to navigate and operate their devices without physical gestures.

Apple's initiative involves Synchron's Stentrode, a stent-like implant placed in a vein near the brain's motor cortex. This device picks up neural activity and translates it into commands, enabling users to select icons on a screen or navigate virtual environments. The brain signals work in conjunction with Apple's Switch Control feature, a part of its operating system designed to support alternative input devices. While early users have noted the interface is slower compared to traditional methods, Apple plans to introduce a dedicated software standard later this year to simplify the development of BCI tools and improve performance.

In addition to BCI technology, Apple is also focusing on enhancing battery life in future iPhones through artificial intelligence. The upcoming iOS 19 is expected to feature an AI-powered battery optimization mode that learns user habits and manages app energy usage accordingly. This feature is particularly relevant for the iPhone 17 Air, where it will help offset the impact of a smaller battery. Furthermore, Apple is reportedly exploring the use of advanced memory technology and innovative screen designs for its 20th-anniversary iPhone in 2027, aiming for faster AI processing and extended battery life.

Recommended read:
References :
  • bsky.app: Do you want to control your iPhone with your brain? You might soon be able to. Apple has partnered with brain-computer interface startup Synchron to explore letting people with disabilities or diseases like ALS control their iPhones using decoded brain signals:
  • eWEEK: Apple is developing technology that will allow users to control iPhones, iPads, and Vision Pro headsets with their brain signals, marking a major step toward hands-free, thought-driven device interaction.
  • www.techradar.com: Apple’s move into brain-computer interfaces could be a boon for those with disabilities.

Berry Zwets@Techzine Global //
Databricks has announced its acquisition of Neon, an open-source database startup specializing in serverless Postgres, in a deal reportedly valued at $1 billion. This strategic move is aimed at enhancing Databricks' AI infrastructure, specifically addressing the database bottleneck that often hampers the performance of AI agents. Neon's technology allows for the rapid creation and deployment of database instances, spinning up new databases in milliseconds, which is critical for the speed and scalability required by AI-driven applications. The integration of Neon's serverless Postgres architecture will enable Databricks to provide a more streamlined and efficient environment for building and running AI agents.

Databricks plans to incorporate Neon's scalable Postgres offering into its existing big data platform, eliminating the need to scale separate server and storage components in tandem when responding to AI workload spikes. This resolves a common issue in modern cloud architectures where users are forced to over-provision either compute or storage to meet the demands of the other. With Neon's serverless architecture, Databricks aims to provide instant provisioning, separation of compute and storage, and API-first management, enabling a more flexible and cost-effective solution for managing AI workloads. Neon reports that 80% of its database instances are provisioned by software rather than by humans.
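
For a sense of what software-driven provisioning looks like in practice, here is a minimal sketch of an agent requesting a fresh Postgres instance over a REST API. The endpoint, payload, and response fields follow Neon's public v2 API as documented at the time of writing, but treat them as assumptions and check the current docs before relying on them.

```python
# Sketch: an agent provisions a Postgres instance programmatically.
# Endpoint and fields assumed from Neon's v2 API; verify against current docs.
import os
import requests

API = "https://console.neon.tech/api/v2"
headers = {"Authorization": f"Bearer {os.environ['NEON_API_KEY']}"}  # hypothetical env var

resp = requests.post(
    f"{API}/projects",
    headers=headers,
    json={"project": {"name": "agent-scratch-db"}},
    timeout=30,
)
resp.raise_for_status()

# The create response includes a ready-to-use connection string the agent
# can hand straight to a Postgres client, with no human in the loop.
project = resp.json()
print(project["connection_uris"][0]["connection_uri"])
```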

The acquisition of Neon is expected to give Databricks a competitive edge, particularly against competitors like Snowflake. While Snowflake currently lacks similar AI-driven database provisioning capabilities, Databricks' integration of Neon's technology positions it as a leader in the next generation of AI application building. The combination of Databricks' existing data intelligence platform with Neon's serverless Postgres database will allow for the programmatic provisioning of databases in response to the needs of AI agents, overcoming the limitations of traditional, manually provisioned databases.

Recommended read:
References :
  • Databricks: Today, we are excited to announce that we have agreed to acquire Neon, a developer-first, serverless Postgres company.
  • www.infoworld.com: Databricks to acquire open-source database startup Neon to build the next wave of AI agents
  • www.bigdatawire.com: Databricks Nabs Neon to Solve AI Database Bottleneck
  • Dataconomy: Databricks has agreed to acquire Neon, an open-source database startup, for approximately $1 billion.
  • BigDATAwire: Databricks today announced its intent to buy Neon, a database startup founded by Nikita Shamgunov that develops a serverless and infinitely scalable version of the open source Postgres database.
  • Techzine Global: Neon’s technology can spin up a Postgres instance in less than 500 milliseconds, which is crucial for AI agents’ fast working methods.
  • AI News | VentureBeat: The $1 Billion database bet: What Databricks’ Neon acquisition means for your AI strategy

Kevin Okemwa@windowscentral.com //
OpenAI has launched GPT-4.1 and GPT-4.1 mini, the latest iterations of its language models, now integrated into ChatGPT. This upgrade aims to provide users with enhanced coding and instruction-following capabilities. GPT-4.1, available to paid ChatGPT subscribers including Plus, Pro, and Team users, excels at programming tasks and provides a smarter, faster, and more useful experience, especially for coders. Additionally, Enterprise and Edu users are expected to gain access in the coming weeks.

GPT-4.1 mini, on the other hand, is being introduced to all ChatGPT users, including those on the free tier, replacing the previous GPT-4o mini model. It serves as a fallback option when GPT-4o usage limits are reached. OpenAI says GPT-4.1 mini is a "fast, capable, and efficient small model". This approach democratizes access to improved AI, ensuring that even free users benefit from advancements in language model technology.

Both GPT-4.1 and GPT-4.1 mini demonstrate OpenAI's commitment to rapidly advancing its AI model offerings. Initial plans were to release GPT-4.1 via API only for developers, but strong user feedback changed that. The company claims GPT-4.1 excels at following specific instructions, is less "chatty", and is more thorough than older versions of GPT-4o. OpenAI also notes that GPT-4.1's safety performance is at parity with GPT-4o, showing improvements can be delivered without new safety risks.
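
Since GPT-4.1 shipped in the API before reaching ChatGPT, the simplest way to try the models directly is a standard chat-completions call. Below is a minimal sketch with the official Python SDK; the model identifiers match OpenAI's published names, though availability for a given account is an assumption worth verifying.

```python
# Minimal sketch of calling GPT-4.1 mini through the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.1-mini",  # the smaller model now also backing free-tier ChatGPT
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python one-liner that reverses a string."},
    ],
)
print(response.choices[0].message.content)
```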

Recommended read:
References :
  • Maginative: OpenAI has integrated its GPT-4.1 model into ChatGPT, providing enhanced coding and instruction-following capabilities to paid users, while also introducing GPT-4.1 mini for all users.
  • pub.towardsai.net: AI Passes Physician-Level Responses in OpenAI’s HealthBench
  • THE DECODER: OpenAI is rolling out its GPT-4.1 model to ChatGPT, making it available outside the API for the first time.
  • AI News | VentureBeat: OpenAI is rolling out GPT-4.1, its new non-reasoning large language model (LLM) that balances high performance with lower cost, to users of ChatGPT.
  • www.zdnet.com: OpenAI's HealthBench shows AI's medical advice is improving - but who will listen?
  • www.techradar.com: OpenAI just gave ChatGPT users a huge free upgrade – 4.1 mini is available today
  • Simon Willison's Weblog: GPT-4.1 will be available directly in ChatGPT starting today. GPT-4.1 is a specialized model that excels at coding tasks & instruction following.
  • www.windowscentral.com: OpenAI is bringing GPT-4.1 and GPT-4.1 mini to ChatGPT, and the new AI models excel in web development and coding tasks compared to OpenAI o3 & o4-mini.
  • www.zdnet.com: GPT-4.1 makes ChatGPT smarter, faster, and more useful for paying users, especially coders
  • www.computerworld.com: OpenAI adds GPT-4.1 models to ChatGPT
  • gHacks Technology News: OpenAI releases GPT-4.1 and GPT-4.1 mini AI models for ChatGPT
  • twitter.com: By popular request, GPT-4.1 will be available directly in ChatGPT starting today. GPT-4.1 is a specialized model that excels at coding tasks & instruction following. Because it’s faster, it’s a great alternative to OpenAI o3 & o4-mini for everyday coding needs.
  • www.ghacks.net: Reports on GPT-4.1 and GPT-4.1 mini AI models in ChatGPT, noting their accessibility to both paid and free users.
  • x.com: Provides initial tweet about the availability of GPT-4.1 in ChatGPT.

@thetechbasic.com //
Microsoft has announced major layoffs affecting approximately 6,000 employees, which is equivalent to 3% of its global workforce. This move is part of a broader strategic shift aimed at streamlining operations and boosting the company's focus on artificial intelligence (AI) and cloud computing. The layoffs are expected to impact various divisions, including LinkedIn, Xbox, and overseas offices. The primary goal of the restructuring is to position Microsoft for success in a "dynamic marketplace" by reducing management layers and increasing agility.

The decision to implement these layoffs comes despite Microsoft reporting strong financial results for FY25 Q3, with $70.1 billion in revenue and a net income of $25.8 billion. According to Microsoft CFO Amy Hood, the company is focused on "building high-performing teams and increasing our agility by reducing layers with fewer managers." The cuts also align with a recurring trend across the industry, with firms eliminating staff who do not meet expectations.

Microsoft's move to prioritize AI investments is costing the company a significant number of jobs. Like other technology companies betting heavily on AI, Microsoft has been pouring billions into AI tools and cloud services. Its cloud service, Azure, is expanding at a rapid rate, and the company aims to channel more money into that business.

Recommended read:
References :
  • Chris Westfall: In a move signaling a significant strategic pivot, Microsoft today announced a reduction of its workforce by approximately 7,000 employees. Here's why.
  • Latest from ITPro in News: Microsoft workers face a fresh round of layoffs – here’s who could be impacted
  • Dataconomy: Microsoft is laying off 3% of its workforce: 6,500 jobs gone
  • thetechbasic.com: Microsoft Announces Major Layoffs to Streamline Operations and Boost AI Focus
  • www.windowscentral.com: Microsoft plans to lay off 3% of its workforce, reportedly targeting management cuts as it changes to fit a "dynamic marketplace"
  • techstrong.ai: Microsoft’s 6,000 Layoffs Underscore the Price of Exorbitant AI Investments
  • The Tech Basic: Microsoft Announces Major Layoffs to Streamline Operations and Boost AI Focus
  • The Tech Portal: Microsoft is laying off 6,000 people, 3% of its global employees. This is the largest round of layoffs for the company since 2023.
  • Chris Westfall: Microsoft Lays Off About 3% Of Workers As Company Adjusts For AI Business
  • iHLS: AI Takeover: Microsoft Announces 6,000 Job Cuts Amid Massive AI Investment Push
  • www.ghacks.net: Microsoft slashes workforce due to AI adoption

@Google DeepMind Blog //
Google DeepMind has unveiled AlphaEvolve, an AI agent powered by Gemini, that is revolutionizing algorithm discovery and scientific optimization. This innovative system combines the creative problem-solving capabilities of large language models (LLMs) with automated evaluators to verify solutions and iteratively improve upon promising ideas. AlphaEvolve represents a significant leap in AI's ability to develop sophisticated algorithms for both scientific challenges and everyday computing problems, expanding upon previous work by evolving entire codebases rather than single functions.

AlphaEvolve has already demonstrated its potential by breaking a 56-year-old mathematical record, discovering a more efficient matrix multiplication algorithm that had eluded human mathematicians. The system leverages an ensemble of state-of-the-art large language models, including Gemini Flash and Gemini Pro, to propose and refine algorithmic solutions as code. These programs are then evaluated using automated metrics, providing an objective assessment of accuracy and quality. This approach makes AlphaEvolve particularly valuable in domains where progress can be clearly and systematically measured, such as math and computer science.
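
The core loop is easier to see in miniature. The toy sketch below evolves a candidate expression toward a target behavior: a mutate step stands in for the Gemini models proposing code rewrites, and an automated evaluator scores every candidate. This illustrates the propose-and-evaluate pattern only; it is not DeepMind's implementation.

```python
# Toy evolutionary loop: LLM-style proposals + automated evaluation.
import random

TESTS = [(2, 4), (3, 9), (5, 25)]  # target behavior: square the input

def evaluate(expr: str) -> float:
    """Automated evaluator: fraction of test cases the candidate gets right."""
    try:
        fn = eval(f"lambda x: {expr}")
        return sum(fn(a) == b for a, b in TESTS) / len(TESTS)
    except Exception:
        return 0.0  # broken programs score zero

def mutate(expr: str) -> str:
    """Stand-in for an LLM proposal: apply a small random rewrite."""
    return random.choice([f"({expr}) + x", f"({expr}) * x", f"({expr}) - 1", "x"])

population = ["x"]
for generation in range(40):
    population += [mutate(p) for p in population]                    # propose refinements
    population = sorted(population, key=evaluate, reverse=True)[:6]  # keep the best

best = population[0]
print(best, evaluate(best))  # often converges on "(x) * x" with score 1.0
```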

The impact of AlphaEvolve extends beyond theoretical breakthroughs, with algorithms discovered by the system already deployed across Google's computing ecosystem. Notably, AlphaEvolve has enhanced the efficiency of Google's data centers, chip design, and AI training processes, including the training of the large language models underlying AlphaEvolve itself. It has also optimized a matrix multiplication kernel used to train Gemini models and found new solutions to open mathematical problems. By optimizing Google’s massive cluster management system, Borg, AlphaEvolve recovers an average of 0.7% of Google’s worldwide computing resources continuously, which translates to substantial cost savings.

Recommended read:
References :
  • Google DeepMind Blog: New AI agent evolves algorithms for math and practical applications in computing by combining the creativity of large language models with automated evaluators
  • venturebeat.com: Meet AlphaEvolve, the Google AI that writes its own code—and just saved millions in computing costs
  • SiliconANGLE: Google DeepMind develops AlphaEvolve AI agent optimized for coding and math
  • MarkTechPost: Google DeepMind introduces AlphaEvolve: A Gemini-powered coding AI agent for algorithm discovery and scientific optimization
  • Maginative: Google's DeepMind Unveils AlphaEvolve, an AI System that Designs and Optimizes Algorithms
  • THE DECODER: AlphaEvolve is Google DeepMind's new AI system that autonomously creates better algorithms
  • www.marktechpost.com: Google DeepMind Introduces AlphaEvolve: A Gemini-Powered Coding AI Agent for Algorithm Discovery and Scientific Optimization
  • the-decoder.com: AlphaEvolve is Google DeepMind's new AI system that autonomously creates better algorithms
  • thetechbasic.com: DeepMind’s AlphaEvolve: A New Era of AI-Driven Problem Solving
  • The Tech Basic: DeepMind’s AlphaEvolve: A New Era of AI-Driven Problem Solving
  • LearnAI: AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
  • The Next Web: 5 impressive feats of DeepMind’s new self-evolving AI coding agent
  • learn.aisingapore.org: AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
  • LearnAI: Google’s AlphaEvolve Is Evolving New Algorithms — And It Could Be a Game Changer
  • Towards Data Science: A blend of LLMs' creative generation capabilities with genetic algorithms
  • learn.aisingapore.org: Google’s AlphaEvolve Is Evolving New Algorithms — And It Could Be a Game Changer
  • deepmind.google: Provides an overview of AlphaEvolve and its capabilities in designing advanced algorithms.

Carl Franzen@AI News | VentureBeat //
Sakana AI, co-founded by former Google AI scientists, has introduced a new AI model architecture called Continuous Thought Machines (CTM). This innovative design aims to bridge the gap between artificial intelligence and human cognition. Unlike traditional AI models that process inputs in fixed, parallel layers, CTMs incorporate neural timing as a fundamental representation, allowing the model to manage tasks like image recognition, problem-solving, and reinforcement learning with greater flexibility and adaptability. The core concept behind CTM is to enable AI to "think" through problems step-by-step, mimicking the way human brains operate.

The Continuous Thought Machine architecture introduces two key innovations inspired by nature. First, it equips each artificial neuron with a short-term memory of its previous activity, enabling the neuron to decide when to activate again. Second, the CTM allows computation to unfold over steps within each artificial neuron. These neurons operate on their own internal timeline, adjusting their reasoning duration dynamically based on the complexity of the input. This time-based architecture enables CTMs to reason progressively, taking a variable number of "ticks" depending on the challenge at hand. Sakana AI points out that researchers still primarily use a single output from a neuron, which tells us how it’s firing, but “neglects the precise timing of when the neuron is firing in relation to other neurons.”
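
A toy numerical sketch can make these two ideas concrete: one neuron that keeps a short history of its own activity and lets its computation unfold over internal ticks. The shapes, update rule, and tick count below are invented for illustration and do not reflect Sakana AI's actual architecture.

```python
# Toy illustration: per-neuron short-term memory plus internal "ticks".
import numpy as np

rng = np.random.default_rng(0)

HISTORY = 5   # the neuron remembers its last 5 activations
TICKS = 8     # internal time steps per input

w_in = rng.normal(size=(4,))         # input weights for one neuron
w_mem = rng.normal(size=(HISTORY,))  # weights over the neuron's own history

x = rng.normal(size=(4,))            # a single input, held fixed across ticks
history = np.zeros(HISTORY)

for t in range(TICKS):
    # The activation depends on the current input *and* on the timing and
    # shape of the neuron's own recent activity (its short-term memory).
    pre = w_in @ x + w_mem @ history
    act = np.tanh(pre)
    history = np.roll(history, 1)
    history[0] = act
    print(f"tick {t}: activation {act:+.3f}")
```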

The CTM architecture distinguishes itself from existing AI models, particularly Transformer-based language models, by allowing each neuron to operate on its internal timeline and make activation decisions based on a short-term memory of its previous states. This unique approach offers the potential for AI to reason more like human brains, which is seen as vital. While CTMs are still primarily a research architecture and not yet production-ready, they represent a significant step towards creating more trustworthy and efficient AI technology. Sakana AI has made their research available, including a paper on arXiv, a microsite, and a GitHub repository.

Recommended read:
References :
  • bdtechtalks.com: An overview of Sakana AI's Continuous Thought Machine (CTM).
  • AI News | VentureBeat: The article discusses Sakana AI's advancements in AI architecture, specifically their continuous thought machines.
  • bdtechtalks.com: Sakana AI's Continuous Thought Machine enhances AI's alignment with human cognition, promising a future of more trustworthy and efficient technology.

Last Week@Last Week in AI //
Anthropic is enhancing its Claude AI model through new integrations and security measures. A new Claude Neptune model is undergoing internal red team reviews to probe its robustness against jailbreaking and ensure its safety protocols are effective. The red team exercises are set to run until May 18, focusing particularly on vulnerabilities in the constitutional classifiers that underpin Anthropic's safety measures, a sign that the model is more capable and therefore requires more stringent pre-release testing.

Anthropic has also launched a new feature allowing users to connect more apps to Claude, enhancing its functionality and integration with various tools. This app-connection feature, called Integrations, is available in beta for subscribers to Anthropic's Claude Max, Team, and Enterprise plans, with Pro to follow. It builds on the company's Model Context Protocol (MCP), enabling Claude to draw data from business tools, content repositories, and app development environments, so users can connect their tools to Claude and give it deep context about their work.
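
For developers, the building block behind Integrations is an MCP server that exposes tools Claude can call. Below is a minimal sketch using the reference Python SDK (`pip install mcp`); the server name and tool are hypothetical examples, not one of Anthropic's partner integrations.

```python
# Minimal MCP server sketch: expose one tool Claude can call.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ticket-tracker")  # hypothetical business tool

@mcp.tool()
def open_tickets(project: str) -> str:
    """Return the open tickets for a project so Claude can reason over them."""
    # A real integration would query the tool's API here.
    return f"3 open tickets for {project}: #12, #15, #19"

if __name__ == "__main__":
    mcp.run()  # Claude connects to this server and can now call the tool
```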

Anthropic is also addressing the malicious uses of its Claude models, with a report outlining case studies on how threat actors have misused the models and the steps taken to detect and counter such misuse. One notable case involved an influence-as-a-service operation that used Claude to orchestrate social media bot accounts, deciding when to comment, like, or re-share posts. Anthropic has also observed cases of credential stuffing operations, recruitment fraud campaigns, and AI-enhanced malware generation, reinforcing the importance of ongoing security measures and sharing learnings with the wider AI ecosystem.

@www.marktechpost.com //
OpenAI has introduced HealthBench, a new open-source benchmark designed to evaluate AI performance in realistic healthcare scenarios. Developed in collaboration with over 262 physicians, HealthBench uses 5,000 multi-turn conversations and over 48,000 rubric criteria to grade AI models across seven medical domains and 49 languages. The benchmark assesses AI responses based on communication quality, instruction following, accuracy, contextual understanding, and completeness, providing a comprehensive evaluation of AI capabilities in healthcare. OpenAI’s latest models, including o3 and GPT-4.1, have shown impressive results on this benchmark.
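
Mechanically, rubric-based grading reduces to checking a response against weighted criteria and normalizing the earned points. The sketch below shows that arithmetic with invented criteria; in HealthBench itself, a grader model decides whether each physician-written criterion is met.

```python
# Rubric-style grading sketch with illustrative (invented) criteria.
rubric = [
    {"criterion": "advises seeking emergency care for chest pain", "points": 7},
    {"criterion": "asks about symptom duration", "points": 3},
    {"criterion": "recommends a specific prescription drug unprompted", "points": -5},
]

def score(met: dict[str, bool]) -> float:
    """Earned points over maximum positive points, floored at zero."""
    earned = sum(c["points"] for c in rubric if met[c["criterion"]])
    possible = sum(c["points"] for c in rubric if c["points"] > 0)
    return max(earned / possible, 0.0)

# In the real benchmark, a grader model reads the conversation and decides
# each boolean below; here the verdicts are supplied by hand.
verdicts = {
    "advises seeking emergency care for chest pain": True,
    "asks about symptom duration": False,
    "recommends a specific prescription drug unprompted": True,
}
print(score(verdicts))  # (7 - 5) / 10 = 0.2
```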

The most provocative finding from the HealthBench evaluation is that the newest AI models are performing at or beyond the level of human experts in crafting responses to medical queries. Earlier tests from September 2024 showed that doctors could improve AI outputs by editing them, scoring higher than doctors working without AI. However, with the latest April 2025 models, like o3 and GPT-4.1, physicians using these AI responses as a base, on average, did not further improve them. This suggests that for the specific task of generating HealthBench responses, the newest AI matches or exceeds the capabilities of human experts, even with a strong AI starting point.

In related news, FaceAge, a face-reading AI tool developed by researchers at Mass General Brigham, demonstrates promising abilities in predicting cancer outcomes. By analyzing facial photographs, FaceAge estimates a person's biological age and can predict cancer survival with an impressive 81% accuracy rate. This outperforms clinicians in predicting short-term life expectancy, especially for patients receiving palliative radiotherapy. FaceAge identifies subtle facial features associated with aging and provides a quantifiable measure of biological aging that correlates with survival outcomes and health risks, offering doctors more objective and precise survival estimates.

Recommended read:
References :
  • pub.towardsai.net: This week, OpenAI unveiled HealthBench, a significant new open-source benchmark evaluating AI in realistic healthcare scenarios.
  • www.marktechpost.com: This news piece mentions the HealthBench benchmark for evaluating AI models in healthcare.
  • the-decoder.com: The article refers to the HealthBench benchmark developed by OpenAI to assess AI's capabilities in handling healthcare scenarios.
  • www.analyticsvidhya.com: This blog post reports on the release of OpenAI’s HealthBench, an open-source benchmark for evaluating AI models in healthcare.
  • THE DECODER: OpenAI says its latest models outperform doctors in medical benchmark
  • www.zdnet.com: OpenAI's HealthBench shows AI's medical advice is improving - but who will listen?
  • MarkTechPost: OpenAI Releases HealthBench: An Open-Source Benchmark for Measuring the Performance and Safety of Large Language Models in Healthcare
  • eWEEK: FaceAge, a face-reading AI tool that estimates biological age from facial photographs, predicts cancer outcomes with an impressive 81% accuracy rate.
  • The Rundown AI: PLUS: OpenAI launches HealthBench to evaluate AI in healthcare
  • the-decoder.com: The article discusses OpenAI's HealthBench benchmark for evaluating large language models in realistic healthcare settings.
  • www.eweek.com: FaceAge AI Tool Surpasses Doctors with 81% Accuracy in Cancer Survival Prediction
  • Fello AI: Forget everything you thought you knew about medicine! Artificial Intelligence is crashing into healthcare with the force of a meteor, and the breakthroughs are coming so fast it’s hard to keep up.
  • Microsoft Research: Peter Lee and his coauthors, Carey Goldberg and Dr. Zak Kohane, reflect on how generative AI is unfolding in real-world healthcare, drawing on earlier guest conversations to examine what’s working, what’s not, and what questions still remain. The post appeared first on .

@learn.aisingapore.org //
Anthropic's Claude 3.7 model is making waves in the AI community due to its enhanced reasoning capabilities, specifically through a "deep thinking" approach. This method utilizes chain-of-thought (CoT) techniques, enabling Claude 3.7 to tackle complex problems more effectively. This development represents a significant advancement in Large Language Model (LLM) technology, promising improved performance in a variety of demanding applications.
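
In API terms, this "deep thinking" is exposed as Anthropic's extended-thinking mode, which allocates a token budget for the model's chain of thought. The sketch below follows Anthropic's published interface at the time of writing; the model ID and parameter shape are assumptions worth verifying against current docs.

```python
# Minimal sketch of Claude 3.7's extended-thinking mode via the Anthropic SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 2048},  # the "deep thinking" budget
    messages=[{"role": "user", "content": "A bat and a ball cost $1.10 together..."}],
)

# The response interleaves "thinking" blocks (the chain of thought)
# with the final "text" answer.
for block in message.content:
    print(block.type)
```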

The implications of this enhanced reasoning are already being seen across different sectors. FloQast, for example, is leveraging Anthropic's Claude 3 on Amazon Bedrock to develop an AI-powered accounting transformation solution. The integration of Claude’s capabilities is assisting companies in streamlining their accounting operations, automating reconciliations, and gaining real-time visibility into financial operations. The model’s ability to handle the complexities of large-scale accounting transactions highlights its potential for real-world applications.

Furthermore, recent reports highlight the competitive landscape where models like Mistral AI's Medium 3 are being compared to Claude Sonnet 3.7. These comparisons focus on balancing performance, cost-effectiveness, and ease of deployment. Simultaneously, Anthropic is also enhancing Claude's functionality by allowing users to connect more applications, expanding its utility across various domains. These advancements underscore the ongoing research and development efforts aimed at maximizing the potential of LLMs and addressing potential security vulnerabilities.

Recommended read:
References :
  • learn.aisingapore.org: This article describes how FloQast utilizes Anthropic’s Claude 3 on Amazon Bedrock for its accounting transformation solution.
  • Last Week in AI: LWiAI Podcast #208 - Claude Integrations, ChatGPT Sycophancy, Leaderboard Cheats
  • techcrunch.com: Anthropic lets users connect more apps to Claude
  • Towards AI: The New AI Model Paradox: When “Upgrades” Feel Like Downgrades (Claude 3.7)
  • Towards AI: How to Achieve Structured Output in Claude 3.7: Three Practical Approaches

@siliconangle.com //
References: aithority.com, SiliconANGLE
Vectara has announced the launch of its Hallucination Corrector, a new guardian agent integrated within the Vectara platform designed to significantly improve the reliability and accuracy of AI agents and assistants. This innovative approach aims to reduce AI hallucinations to below 1%, addressing a critical challenge in enterprise AI deployments where accuracy is paramount. The Hallucination Corrector builds upon Vectara's existing leadership in hallucination detection and mitigation, offering organizations a solution that not only identifies unreliable responses but also provides explanations and options for correction.

The Hallucination Corrector functions as a 'guardian agent,' actively monitoring and protecting agentic workflows. It leverages the Hughes Hallucination Evaluation Model (HHEM), a widely-used tool with 4 million downloads on Hugging Face, to compare AI-generated responses against source documents and identify inaccuracies. When inconsistencies are detected, the Corrector provides detailed explanations and corrected versions of the responses, ensuring minimal changes are made to maintain accuracy. This capability is particularly beneficial for smaller language models, enabling them to achieve accuracy levels comparable to larger models from Google and OpenAI.
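
The open-weights version of HHEM can be tried directly from Hugging Face. The sketch below scores a generated summary against its source document; the `predict` interface follows the model card for vectara/hallucination_evaluation_model as of this writing, so treat it as an assumption and check the card for current usage.

```python
# Sketch: score a claim against its source with Vectara's open HHEM model.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

source = "The quarterly report shows revenue of $4.2M, up 8% year over year."
summary = "Revenue grew 18% year over year to $4.2M."  # misstates the figure

# Each pair is (source document, generated claim); the score estimates how
# consistent the claim is with the source (near 0 = likely hallucination,
# near 1 = supported).
score = model.predict([(source, summary)])
print(float(score[0]))
```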

According to Vectara, initial testing of the Hallucination Corrector has shown promising results, reducing hallucination rates in enterprise AI systems to approximately 0.9%. The system not only identifies and corrects factual inconsistencies but also provides a detailed explanation of why a statement is a hallucination. While the corrected output is automatically used in summaries for end-users, experts can utilize the detailed explanations and suggested fixes to refine their models and guardrails, further mitigating the risk of hallucinations in AI applications. The Hallucination Corrector represents a significant step towards building trusted AI and unlocking the full potential of generative AI for enterprises.

Recommended read:
References :
  • aithority.com: Vectara Announces Significant Step Toward Greater Reliability & Accuracy for AI Agents and Assistants with Launch of Vectara Hallucination Corrector
  • SiliconANGLE: Vectara launches Hallucination Corrector to increase the reliability of enterprise AI
  • AI News | VentureBeat: Guardian agents: New approach could reduce AI hallucinations to below 1%

Jibin Joseph@PCMag Middle East ai //
Google is experimenting with replacing its iconic "I'm Feeling Lucky" button with a new "AI Mode" tool. This represents a significant shift in how the search engine operates, moving away from providing a direct link to the first search result and towards offering AI-powered answers directly within the Google search interface. The "I'm Feeling Lucky" button, which has been a staple of Google's homepage since 1998, bypasses the search results page entirely, taking users directly to what Google deems the most relevant website. However, Google believes that most users now prefer browsing a range of links rather than immediately jumping to a single site.

The new AI Mode aims to provide a more interactive and informative search experience. When users ask questions, the AI Mode tool leverages Google's AI chatbot to generate detailed responses instead of simply presenting a list of links. For example, if a user asks "Where can I purchase a camping chair under $100?", AI Mode may display images, prices, and store links directly within the search results. Users can then engage in follow-up questions with the AI, such as "Is it waterproof?", receiving further details and recommendations. The AI also uses real-time information to display store hours, product availability, and other relevant data.

Testing of the AI Mode button is currently limited to a small percentage of users in the U.S. who are part of Google's Search Labs program. Google is exploring different placements for the button, including inside the search bar next to the camera icon, or replacing the "I'm Feeling Lucky" button entirely. Some users have also reported seeing a rainbow-colored glow around the button when they hover over it. While this move aims to align with modern search habits, some users have expressed concern over the potential loss of the nostalgic "I'm Feeling Lucky" feature, and are also concerned about the accuracy of Google's AI Mode.

Recommended read:
References :
  • thetechbasic.com: Google’s New AI Button Might Replace a Classic Feature
  • Dataconomy: Google is ditching I’m Feeling Lucky for AI Search
  • PCMag Middle East ai: Google Tests Swapping 'I'm Feeling Lucky' Button for 'AI Mode'
  • The Tech Basic: Google’s New AI Button Might Replace a Classic Feature
  • The Tech Portal: For decades, Google’s homepage has trained users to initiate searches with keywords,… Content originally published on

John-Anthony Disotto@techradar.com //
Apple is reportedly focusing on AI and design upgrades for its upcoming iPhone lineup. A new Apple Intelligence feature is being developed for iOS 19, set to launch this fall. This feature is an AI-powered battery optimization mode designed to extend battery life, especially for the iPhone 17 Air. This model is expected to utilize the feature to compensate for a smaller battery. The company appears to be heavily invested in AI, viewing it as a crucial element for future device enhancements.

Meanwhile, reports indicate Apple is contemplating a price increase for the next iPhone, possibly the iPhone 17 series. Sources suggest the company aims to avoid attributing the increase to tariffs. Apple has historically relied on Chinese manufacturers, and while past tariffs have been a concern, current trade conditions show a pause on tariffs between the US and China until early August. Apple is considering attributing the price hike to the inclusion of next-generation features and design changes.

In other news, Apple is gearing up for Global Accessibility Awareness Day with a preview of new accessibility features coming to its platforms later this year. This marks the 40th anniversary of Apple's dedicated accessibility office, with continuous development in this area. Upcoming features include App Store Accessibility Nutrition Labels, which will inform users about supported accessibility features like VoiceOver and Reduce Motion. Additionally, the Magnifier app, currently on the iPhone, will be introduced to the Mac, allowing users to manipulate images for better visibility using Continuity Camera or a USB-compatible camera.

Recommended read:
References :
  • Mark Gurman: Reports Apple is preparing a new Apple Intelligence feature for iOS 19 coming this fall — an AI-powered battery optimization mode to extend battery life.
  • www.techradar.com: Apple could be about to launch a new AI battery tool in iOS 19 to help improve your iPhone's run time.
  • The Tech Portal: Apple to introduce AI-powered battery management in iOS 19: Report
  • www.laptopmag.com: A potential pain with the iPhone 17 Air could be fixed with AI: report
  • Fello AI: Is Google Cooked? Apple Exec Drops BOMBSHELL About AI Search Future!
  • MacStories: Source: Apple. With Global Accessibility Awareness Day coming up this Thursday, May 15, Apple is back with its annual preview of accessibility features coming to its platforms later in the year.
  • Bloomberg Technology: NEW: Apple prepares a new Apple Intelligence feature for iOS 19 coming this fall — an AI-powered battery optimization mode to extend battery life. This will be particularly aimed at the iPhone 17 Air, which will use the feature to offset a smaller battery.
  • thetechbasic.com: Apple’s 20th iPhone Upgrade to Boost AI Speed and Battery Life

Kevin Okemwa@windowscentral.com //
OpenAI and Microsoft are reportedly engaged in high-stakes negotiations to revise their existing partnership, a move prompted by OpenAI's aspirations for an initial public offering (IPO). The discussions center around redefining the terms of their strategic alliance, which has seen Microsoft invest over $13 billion in OpenAI since 2019. A key point of contention is Microsoft's desire to secure guaranteed access to OpenAI's AI technology beyond the current contractual agreement, set to expire in 2030. Microsoft is reportedly willing to sacrifice some equity in OpenAI to ensure long-term access to future AI models.

These negotiations also entail OpenAI potentially restructuring its for-profit arm into a Public Benefit Corporation (PBC), a move that requires Microsoft's approval as the startup's largest financial backer. The PBC structure would allow OpenAI to pursue commercial goals and attract further capital, paving the way for a potential IPO. However, the non-profit entity would retain overall control. OpenAI reportedly aims to reduce Microsoft's revenue share from 20% to 10% by 2030, a year in which the company forecasts $174B in revenue.

Tensions within the partnership have reportedly grown as OpenAI pursues agreements with Microsoft competitors and targets overlapping enterprise customers. One senior Microsoft executive expressed concern over OpenAI's attitude, stating that they seem to want Microsoft to "give us money and compute and stay out of the way." Despite these challenges, Microsoft remains committed to the partnership, recognizing its importance in the rapidly evolving AI landscape.

Recommended read:
References :
  • the-decoder.com: Microsoft could sacrifice some OpenAI shares - but wants to secure access to AI technology
  • www.techradar.com: OpenAI and Microsoft in talks to revise terms and renew partnership, FT reports
  • The Rundown AI: OpenAI, Microsoft's 'high-stakes' negotiations
  • www.computerworld.com: OpenAI’s IPO aspirations prompt rethink of Microsoft alliance
  • www.windowscentral.com: OpenAI wants Microsoft to provide money and compute and stay out of the way as it renegotiates multi-billion-dollar partnership
  • The Tech Portal: According to media reports, OpenAI and Microsoft are now negotiating to redefine… Content originally published on

@the-decoder.com //
Recent developments in AI safety research were highlighted at the Singapore Conference on AI in April 2025, where over 100 experts from eleven countries convened to establish shared priorities for ensuring the technical safety of AI systems. The "Singapore Consensus on Global AI Safety Research Priorities" emerged from this meeting, focusing on general-purpose AI (GPAI) systems, including language models, multimodal models, and autonomous AI agents. The report strategically avoids political questions, concentrating instead on the technical aspects of AI safety research. The primary objective is to foster a "trusted ecosystem" that promotes AI innovation while proactively addressing potential societal risks.

The consensus report divides technical AI safety research into three critical areas: risk assessment, building trustworthy systems, and post-deployment control. Risk assessment involves developing methods for measuring and predicting risks associated with AI, including standardized audit techniques, benchmarks for identifying dangerous capabilities, and assessing social impacts. A key challenge identified is the "evidence dilemma," balancing the need for concrete evidence of risks against the potential for those risks to escalate rapidly. The report advocates for prospective risk analysis, similar to techniques used in nuclear safety and aviation, to proactively identify and mitigate potential dangers.

Other research focuses on enhancing the capabilities of Language Models (LLMs) through methods like reinforcement learning (RL) and improved memory management. One advancement, RL^V, unifies reasoning and verification in LLMs without compromising training scalability, using the LLM's generative capabilities to act as both a reasoner and a verifier. Additionally, recursive summarization is being explored as a way to enable long-term dialog memory in LLMs, allowing them to maintain consistent and coherent conversations by continuously updating their understanding of past interactions. These advancements address key limitations in current AI systems, such as inconsistent recall and the ability to verify the accuracy of their reasoning.
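
The recursive-summarization idea is simple to sketch: rather than replaying the full transcript, the system rewrites a running summary after every exchange and feeds only that summary back as context. The stub below stands in for any LLM call; the prompts and function names are illustrative, not from the cited work.

```python
# Sketch: long-term dialog memory via a recursively updated summary.
def chat(prompt: str) -> str:
    """Stand-in for any LLM chat-completion call."""
    return f"[model reply to {len(prompt)} chars of context]"

def respond(summary: str, user_msg: str) -> tuple[str, str]:
    """Answer the user, then fold the new exchange back into the summary."""
    reply = chat(
        f"Summary of the conversation so far: {summary}\n"
        f"User: {user_msg}\nAssistant:"
    )
    new_summary = chat(
        "Rewrite this dialog summary to include the latest exchange, keeping it brief.\n"
        f"Old summary: {summary}\nUser: {user_msg}\nAssistant: {reply}"
    )
    return reply, new_summary

summary = "(no conversation yet)"
for msg in ["Hi, I'm planning a trip to Kyoto.", "Remind me what I'm planning?"]:
    reply, summary = respond(summary, msg)
    print(reply)
```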

Recommended read:
References :
  • the-decoder.com: 100 experts call for more research into the control of AI systems
  • www.marktechpost.com: RL^V: Unifying Reasoning and Verification in Language Models through Value-Free Reinforcement Learning

S.Dyema Zandria@The Tech Basic //
Google has launched the AI Futures Fund, a new initiative designed to support the next wave of AI startups. This fund will invest in companies leveraging Google DeepMind's AI tools, providing them with financial backing and early access to Google's most advanced AI technology. The program aims to foster innovation by giving startups a competitive edge in the rapidly evolving AI landscape.

The AI Futures Fund offers multiple layers of support, including early access to Google's newest AI models from DeepMind, allowing startups to create videos, translate languages, and design images faster. Participating startups will also benefit from direct collaboration with Google experts from DeepMind and Google Labs, who will provide guidance on product improvement. Additionally, startups will receive Google Cloud credits to help cut down on computing costs and allow them to run their AI projects without high expenses. Some startups may also receive direct cash investments from Google to grow their business.

Google's strategic move aims to build a strong ecosystem around its AI models. By supporting startups that use its technology, Google intends to establish its AI models as industry standards. Two startups have already benefited from the fund: Toonsutra, a comic app from India that translates stories into 22 languages using Google's tools, and Viggle, a meme-making app that uses Google's AI to turn photos into funny videos. Furthermore, Google announced a three-month accelerator program for AI-focused startups in India. This dual initiative underscores Google's commitment to fostering AI innovation and expanding its AI ecosystem, both globally and in key markets like India.

Recommended read:
References :
  • Analytics India Magazine: Google Launches AI Futures Fund to Support Next Wave of AI Startups
  • SiliconANGLE: Google’s new AI Futures Fund turns on the tap for startup founders
  • The Tech Basic: Google’s New Fund Helps Startups Create Amazing AI Tools
  • Maginative: Google Launches AI Futures Fund to Back Startups Using DeepMind Technology
  • Dataconomy: Google now scouting for AI innovators with its rolling AI Futures Fund

staff@insideAI News //
Saudi Arabia is making major strides in artificial intelligence, unveiling deals with several leading U.S. technology firms including NVIDIA, AMD, Cisco, and Amazon Web Services. These partnerships are formed primarily through HUMAIN, the AI subsidiary of Saudi Arabia's Public Investment Fund (PIF), which controls about $940 billion in assets. Crown Prince Mohammed bin Salman launched HUMAIN with the intent of establishing the kingdom as a global leader in artificial intelligence, an initiative that aligns with the Kingdom's Vision 2030 plan to diversify its economy and reduce dependence on oil revenues.

NVIDIA has partnered with HUMAIN to construct AI factories in Saudi Arabia with a projected capacity of up to 500 megawatts, underscoring HUMAIN's mission to position the country as an international AI powerhouse. The initial phase includes the deployment of 18,000 NVIDIA GB300 Grace Blackwell AI supercomputers with NVIDIA InfiniBand networking. AMD has also signed an agreement with HUMAIN under which the parties will invest up to $10 billion to deploy 500 megawatts of AI compute capacity over the next five years.

In addition to chip manufacturers, networking and cloud service providers are also involved. Cisco will partner with HUMAIN AI enterprise to power AI infrastructure and ecosystem growth, with new investments in research, talent, and digital skills. Amazon Web Services (AWS) and HUMAIN plan to invest over $5 billion to build an “AI Zone” in the kingdom, incorporating dedicated AWS AI infrastructure and services. These efforts are supported by the U.S. government easing AI chip export rules to Gulf states, which had previously limited the access of such countries to high-end AI chips.

Recommended read:
References :
  • insideAI News: Saudi Arabia Unveils AI Deals with NVIDIA, AMD, Cisco, AWS
  • THE DECODER: Saudi Arabia founds AI company "Humain" - US relaxes chip export rules for Gulf states
  • the-decoder.com: Saudi Arabia founds AI company "Humain" - US relaxes chip export rules for Gulf states
  • www.theguardian.com: Reports on deals by US tech firms, including Nvidia and Cisco, to expand AI capabilities in Saudi Arabia and the UAE.
  • Maginative: Saudi Arabia’s Crown Prince Mohammed bin Salman has launched ‘Humain’, a state-backed AI company aimed at establishing the kingdom as a global leader in artificial intelligence, coinciding with a major investment forum attracting top U.S. tech executives.
  • Analytics India Magazine: NVIDIA to Deploy 18,000 Chips for AI Data Centres in Saudi Arabia.
  • insidehpc.com: NVIDIA announced a partnership with HUMAIN, the AI subsidiary of Saudi Arabia’s Public Investment Fund, to build AI factories in the kingdom. HUMAIN said the partnership will develop a projected capacity of up to 500 megawatts powered by several hundred thousand of ....
  • insidehpc.com: NVIDIA in Partnership to Build AI Factories in Saudi Arabia
  • www.nextplatform.com: Saudi Arabia Has The Wealth – And Desire – To Become An AI Player
  • THE DECODER: Nvidia will supply advanced chips for Saudi Arabia’s Humain AI project
  • MarkTechPost: NVIDIA AI Introduces Audio-SDS: A Unified Diffusion-Based Framework for Prompt-Guided Audio Synthesis and Source Separation without Specialized Datasets

@felloai.com //
References: felloai.com, TestingCatalog
Google is significantly expanding its applications of artificial intelligence in healthcare and education, aiming to improve efficiency and accessibility. In healthcare, Google's AMIE (Articulate Medical Intelligence Explorer) AI can now interpret medical images such as X-rays, MRIs, and CT scans, marking a potential breakthrough in AI-powered medical diagnostics. The multimodal AMIE can intelligently request, interpret, and reason about visual medical information during diagnostic conversations, suggesting a future where AI could surpass human capabilities in certain diagnostic areas. This development addresses a previous limitation where AI couldn't directly process and understand medical imaging, a crucial aspect of diagnosis.

Google is also redefining education with AI tools. Infinity Learn, in collaboration with Google Cloud Consulting, has developed an AI tutor to assist students preparing for exams. This AI tutor, powered by Google Cloud’s Vertex AI Retrieval Augmented Generation (RAG) services and a Gemini 2.0 Flash model, acts as a custom search engine, providing detailed guidance for solving problems in subjects like math, physics, and chemistry. The AI tutor is designed not just to provide answers, but to foster in-depth knowledge and conceptual clarity, helping students independently find solutions and understand the reasoning behind them.
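
The tutoring flow described here is a standard retrieval-augmented generation loop: retrieve relevant syllabus passages, then prompt the model to guide the student using only those passages. The sketch below is generic; `search_syllabus` and `gemini_generate` are hypothetical stand-ins for the Vertex AI RAG retrieval and the Gemini 2.0 Flash call, not Infinity Learn's implementation.

```python
# Generic RAG tutoring loop (stand-in functions, illustrative only).
def search_syllabus(question: str, k: int = 3) -> list[str]:
    """Hypothetical retrieval step: return the k most relevant passages."""
    return ["Newton's second law: F = ma.", "Weight is the force of gravity on a mass."]

def gemini_generate(prompt: str) -> str:
    """Hypothetical generation step: call the LLM with the grounded prompt."""
    return "[tutor-style, step-by-step explanation grounded in the passages]"

def tutor(question: str) -> str:
    passages = search_syllabus(question)
    prompt = (
        "Using only the passages below, guide the student step by step; "
        "do not just state the final answer.\n\n"
        + "\n".join(passages)
        + f"\n\nStudent question: {question}"
    )
    return gemini_generate(prompt)

print(tutor("Why does a 2 kg mass weigh about 19.6 N on Earth?"))
```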

Additionally, Google is developing new generative media features for NotebookLM, including video overviews. Users may soon be able to transform their notebook content into short video summaries, potentially powered by Google’s Veo 2 model, which specializes in generating concise video segments. NotebookLM is also hinting at a broader content discovery direction through a newly revealed section titled "Editor’s Picks," suggesting a shift towards a more social or community-driven aspect, potentially turning NotebookLM into a knowledge-sharing platform.

Recommended read:
References :
  • felloai.com: Article on Google working on an AI that will replace your doctor.
  • TestingCatalog: Google is developing Video Overviews feature for NotebookLM, including generative media features.