@www.marktechpost.com
//
OpenAI has introduced HealthBench, a new open-source benchmark designed to evaluate AI performance in realistic healthcare scenarios. Developed in collaboration with 262 physicians, HealthBench uses 5,000 multi-turn conversations and over 48,000 rubric criteria to grade AI models across seven medical domains and 49 languages. The benchmark assesses AI responses on communication quality, instruction following, accuracy, contextual understanding, and completeness. OpenAI’s latest models, including o3 and GPT-4.1, have shown impressive results on this benchmark.
The most provocative finding from the HealthBench evaluation is that the newest AI models perform at or beyond the level of human experts in crafting responses to medical queries. Earlier tests with September 2024 models showed that doctors could improve AI outputs by editing them, and that those edited responses scored higher than responses written by doctors working without AI. With the April 2025 models, such as o3 and GPT-4.1, physicians who used the AI responses as a base did not, on average, improve them further. This suggests that, for the specific task of generating HealthBench responses, the newest AI matches or exceeds human experts, even when those experts start from a strong AI draft.
In related news, FaceAge, a face-reading AI tool developed by researchers at Mass General Brigham, shows promise in predicting cancer outcomes. By analyzing facial photographs, FaceAge estimates a person's biological age and can predict cancer survival with a reported 81% accuracy rate. This outperforms clinicians in predicting short-term life expectancy, especially for patients receiving palliative radiotherapy. FaceAge identifies subtle facial features associated with aging and provides a quantifiable measure of biological aging that correlates with survival outcomes and health risks, offering doctors more objective and precise survival estimates.
@felloai.com
//
References: felloai.com, TestingCatalog
Google is significantly expanding its applications of artificial intelligence in healthcare and education, aiming to improve efficiency and accessibility. In healthcare, Google's AMIE (Articulate Medical Intelligence Explorer) AI can now interpret medical images such as X-rays, MRIs, and CT scans, marking a potential breakthrough in AI-powered medical diagnostics. The multimodal AMIE can intelligently request, interpret, and reason about visual medical information during diagnostic conversations, suggesting a future where AI could surpass human capabilities in certain diagnostic areas. This development addresses a previous limitation where AI couldn't directly process and understand medical imaging, a crucial aspect of diagnosis.
Google is also redefining education with AI tools. Infinity Learn, in collaboration with Google Cloud Consulting, has developed an AI tutor to assist students preparing for exams. The tutor, powered by Google Cloud’s Vertex AI Retrieval Augmented Generation (RAG) services and a Gemini 2.0 Flash model, acts as a custom search engine, providing detailed guidance for solving problems in subjects like math, physics, and chemistry. It is designed not just to provide answers but to build in-depth knowledge and conceptual clarity, helping students find solutions independently and understand the reasoning behind them.
Additionally, Google is developing new generative media features for NotebookLM, including video overviews. Users may soon be able to transform their notebook content into short video summaries, potentially powered by Google’s Veo 2 model, which specializes in generating concise video segments. A newly revealed section titled "Editor’s Picks" also hints at a broader content discovery direction, suggesting a shift toward a more social or community-driven product that could turn NotebookLM into a knowledge-sharing platform.
Sam Khosroshahi@lambdalabs.com
//
References: Fello AI, MarkTechPost
NVIDIA is making significant strides in healthcare and robotics, leveraging the power of AI. A key development is the NVIDIA DNA LLM, designed to accelerate genomic research. This technology is set to transform how we analyze and interpret biological data, potentially leading to breakthroughs in curing diseases. NVIDIA is also working on new AI models, specifically multimodal models, to enhance physical common sense and embodied reasoning in AI systems, essential for applications like robotics.
NVIDIA's approach includes innovations like Evo 2, a large AI model for biology trained on a massive dataset of 9.3 trillion DNA base pairs, capable of not just analyzing but also generating entire genomic sequences. Furthermore, the introduction of Cosmos-Reason1, a vision-language model, aims to improve AI's ability to reason about physical environments, a critical advancement for robotics and self-driving vehicles. Dion Harris of NVIDIA provided context on NVIDIA CEO Jensen Huang's GTC 2025 keynote, highlighting the company's focus on AI factories and advancements in simulation technology.
Sam Khosroshahi@lambdalabs.com
//
References: Fello AI, lambdalabs.com
NVIDIA is pushing the boundaries of artificial intelligence in healthcare and robotics, introducing several groundbreaking advancements. One notable innovation is the DNA LLM, designed to decode the complex genetic information found in DNA, RNA, and proteins. This tool aims to transform genomic research, potentially leading to new understandings and treatments for various diseases.
The company's commitment to AI extends to robotics with the release of Isaac GR00T N1, an open-source platform for humanoid robots. This initiative is expected to accelerate innovation in the field, giving developers the resources they need to create more advanced and capable robots. Additionally, an NVIDIA research team has developed Hymba, a family of small language models that combine transformer attention with state space models, surpassing the Llama-3.2-3B model in performance while significantly reducing cache size and increasing throughput.
Neel Patel@AI & Machine Learning
//
References: Compute, IEEE Spectrum
Google Cloud and NVIDIA are collaborating to accelerate AI in healthcare by combining the NVIDIA BioNeMo framework with Google Kubernetes Engine (GKE). The partnership aims to speed up drug discovery and development by providing powerful infrastructure and tools for medical and pharmaceutical researchers. BioNeMo is a generative AI framework that lets researchers model and simulate biological sequences and structures, work that places heavy demands on compute and requires powerful GPUs and scalable infrastructure.
With BioNeMo running on GKE, medical organizations can pursue new research with greater speed and effectiveness than was previously possible. Google DeepMind has also introduced Gemini Robotics, a set of AI models built on Google's Gemini foundation model that enhance robotics by integrating vision, language, and action. While AI isn't seen as a "silver bullet," Google DeepMind's Demis Hassabis emphasizes that its benefits will be undeniable within five to ten years, as evidenced by developments like AlphaFold 3, which accurately predicts the structure of molecules like DNA and RNA.