@the-decoder.com
//
References: pub.towardsai.net, THE DECODER
OpenAI has recently introduced HealthBench, a new open-source benchmark designed to evaluate the performance and safety of large language models (LLMs) in realistic healthcare scenarios. This evaluation framework aims to address the shortcomings of existing benchmarks by incorporating real-world clinical interactions and expert validation. HealthBench was developed in collaboration with 262 physicians from 60 countries, representing 26 medical specialties, ensuring a comprehensive and globally relevant assessment of AI in healthcare. The benchmark focuses on evaluating how well language models handle realistic medical conversations across seven key themes, ranging from emergency referrals to global health.
HealthBench utilizes 5,000 multi-turn conversations between models and users, either laypersons or healthcare professionals, with model responses assessed using example-specific rubrics created by physicians. Each rubric consists of clearly defined criteria, both positive and negative, with associated point values. These criteria capture crucial behavioral attributes such as clinical accuracy, communication clarity, completeness, and instruction adherence. The evaluation spans over 48,000 unique criteria, with scoring handled by a model-based grader, GPT-4.1, validated against expert physician judgment to ensure reliability. According to OpenAI, its latest models, including GPT-4.1 and o3, have demonstrated superior performance compared to physician responses on the HealthBench benchmark. While earlier tests showed that doctors could improve older model outputs, the latest models outperform physicians even without additional input or refinement. This suggests that the newest models perform at or beyond the level human experts reach even when they refine a strong AI draft. The finding sparks debate about the role of AI in healthcare, shifting it from a copilot that assists physicians toward potentially automating certain response-generation tasks. While OpenAI acknowledges the limitations of comparing AI chat responses to real-world clinical care, the results highlight the rapidly evolving capabilities of AI in healthcare. Recommended read:
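To make the rubric arithmetic concrete, below is a simplified sketch of per-example scoring along the lines described above. It is not OpenAI's code: the criteria, point values, and the exact aggregation rule (earned points over maximum positive points, clipped to the 0–1 range) are illustrative assumptions, and in the real benchmark a model-based grader decides which criteria a response meets.

```python
# Simplified sketch of rubric-based scoring (illustrative, not OpenAI's implementation).
from dataclasses import dataclass

@dataclass
class Criterion:
    description: str
    points: int  # positive for desired behavior, negative for harmful behavior

# Hypothetical rubric for one emergency-referral example.
rubric = [
    Criterion("Advises the user to seek emergency care for chest pain", 10),
    Criterion("Asks a clarifying question about symptom duration", 3),
    Criterion("Recommends a prescription dose without any examination", -8),
]

def score_response(criteria_met: list[bool]) -> float:
    """Assumed aggregation: earned points over maximum positive points, clipped to [0, 1]."""
    earned = sum(c.points for c, hit in zip(rubric, criteria_met) if hit)
    max_positive = sum(c.points for c in rubric if c.points > 0)
    return max(0.0, min(1.0, earned / max_positive))

# A response meeting both positive criteria and avoiding the negative one scores 1.0;
# one that refers to emergency care but also gives an unsafe dose scores about 0.15.
print(score_response([True, True, False]))  # 1.0
print(score_response([True, False, True]))  # ~0.15
```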
References :
@computerworld.com
//
OpenAI has announced the integration of GPT-4.1 and GPT-4.1 mini models into ChatGPT, aimed at enhancing coding and web development capabilities. The GPT-4.1 model, designed as a specialized model excelling at coding tasks and instruction following, is now available to ChatGPT Plus, Pro, and Team users. According to OpenAI, GPT-4.1 is faster and a great alternative to OpenAI o3 & o4-mini for everyday coding needs, providing more help to developers creating applications.
OpenAI is also rolling out GPT-4.1 mini, which will be available to all ChatGPT users, including those on the free tier, replacing the previous GPT-4o mini model. This model serves as the fallback option once GPT-4o usage limits are reached. The release notes confirm that GPT-4.1 mini offers improvements over GPT-4o mini in instruction following, coding, and overall intelligence. This initiative is part of OpenAI's effort to make advanced AI tools more accessible and useful for a broader audience, particularly those engaged in programming and web development. Johannes Heidecke, Head of Systems at OpenAI, has emphasized that the new models build upon the safety measures established for GPT-4o, ensuring parity in safety performance. According to Heidecke, no new safety risks have been introduced, since GPT-4.1 neither adds new modalities or ways of interacting with the AI nor surpasses o3 in intelligence. The rollout marks another step in OpenAI's increasingly rapid model release cadence, significantly expanding access to specialized capabilities in web development and coding. Recommended read:
References :
@twitter.com
//
OpenAI has announced the release of GPT-4.1 and GPT-4.1 mini, the latest iterations of their large language models, now accessible within ChatGPT. This move marks the first time GPT-4.1 is available outside of the API, opening up its capabilities to a broader user base. GPT-4.1 is designed as a specialized model that excels at coding tasks and instruction following, making it a valuable tool for developers and users with coding needs. OpenAI is making the models accessible via the “more models” dropdown selection in the top corner of the chat window within ChatGPT, giving users the flexibility to choose between GPT-4.1, GPT-4.1 mini, and other models.
The GPT-4.1 model is being rolled out to paying subscribers of ChatGPT Plus, Pro, and Team, with Enterprise and Education users expected to gain access in the coming weeks. For free users, OpenAI is introducing GPT-4.1 mini, which replaces GPT-4o mini as the default model once the daily GPT-4o limit is reached. The "mini" version is a smaller, less powerful model that meets the same safety standards. OpenAI had initially planned to keep GPT-4.1 exclusive to the API, but added it to ChatGPT in response to popular demand. GPT-4.1 was built with developer needs and production use cases in mind. The company claims GPT-4.1 delivers a 21.4-point improvement over GPT-4o on the SWE-bench Verified software engineering benchmark and a 10.5-point gain on instruction-following tasks in Scale's MultiChallenge benchmark. In addition, it reduces verbosity by 50% compared to other models, a trait enterprise users praised during early testing. The model supports standard context windows in ChatGPT, ranging from 8,000 tokens for free users to 128,000 tokens for Pro users. Recommended read:
References :
Carl Franzen@AI News | VentureBeat
//
OpenAI has recently unveiled GPT-4.1, an enhanced version of its language model, now integrated into ChatGPT. This move expands access to the model's improved coding and instruction-following capabilities for ChatGPT Plus, Pro, and Team subscribers. Enterprise and Education users are slated to gain access in the coming weeks. Furthermore, OpenAI is replacing the GPT-4o mini model with GPT-4.1 mini for all users, including those on the free tier, positioning it as the fallback model when GPT-4o usage limits are reached. According to OpenAI, both models match GPT-4o's safety performance, while offering better coding and instruction-following capabilities.
GPT-4.1 was specifically designed for enterprise-grade practicality, prioritizing developer needs and production use cases. It delivers significant improvements on software engineering and instruction-following benchmarks, with reduced verbosity favored by enterprise users during testing. While the API versions of GPT-4.1 can process up to one million tokens, this expanded capacity is not yet available in ChatGPT, though future support has been hinted at. This extended context capability allows API users to feed entire codebases or large legal and financial documents into the model (see the sketch below). The model supports standard context windows in ChatGPT: 8,000 tokens for free users, 32,000 tokens for Plus users, and 128,000 tokens for Pro users. In addition to model upgrades, OpenAI has introduced HealthBench, a new open-source benchmark for evaluating AI in healthcare scenarios. Developed with 262 physicians, HealthBench uses multi-turn conversations and rubric criteria to grade models. OpenAI's o3 leads with an overall score of 0.60 on HealthBench. The most provocative result concerns human-AI interaction: with the latest April 2025 models (o3, GPT-4.1), physicians who used the AI responses as a base did not, on average, improve them further (both AI alone and AI plus physician scored roughly 0.48–0.49). For the specific task of crafting HealthBench responses, the newest models appear to perform at or beyond the level human experts reach even when refining a strong AI draft. Recommended read:
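As a concrete illustration of the long-context API usage mentioned above, here is a minimal sketch that sends a large document to GPT-4.1 through the OpenAI Python SDK. The file name and prompt are hypothetical; the roughly one-million-token input capacity applies only to the API versions of the model.

```python
# Minimal sketch: summarizing a large document with GPT-4.1 via the API.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

# Hypothetical large input; via the API this can approach one million tokens.
with open("annual_report.txt", "r", encoding="utf-8") as f:
    document = f.read()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a concise analyst."},
        {"role": "user", "content": "Summarize the key risks discussed in this document:\n\n" + document},
    ],
)
print(response.choices[0].message.content)
```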
References :
Michael Nuñez@AI News | VentureBeat
//
References: www.techradar.com, venturebeat.com
OpenAI has recently augmented ChatGPT's Deep Research feature with a highly anticipated PDF export function. This new tool allows users with a ChatGPT Plus, Team, or Pro subscription to download their generated reports as fully formatted PDFs. These PDFs come complete with tables, images, and clickable citations, making it easier to archive, share, and reuse the research within other tools. Enterprise and Education users can expect to gain access to this feature soon, enhancing the utility of Deep Research for students and professionals alike.
The update highlights OpenAI's intensifying focus on the enterprise market, particularly following the hiring of Instacart CEO Fidji Simo to lead the new "Applications" division. Deep Research itself embodies this enterprise-focused strategy. By dedicating engineering resources to workflow features like PDF export, OpenAI demonstrates an understanding that business growth depends on solving specific business problems and providing practical value to professional users who require shareable, verifiable research. In other news, reports indicate that Microsoft and OpenAI are renegotiating their partnership terms, potentially restructuring their multi-billion-dollar deal ahead of a future OpenAI IPO. Meanwhile, the US Copyright Office has issued a statement challenging the common legal argument that training AI models on copyrighted material constitutes fair use. The agency argues that AI systems process information differently from humans, ingesting perfect copies of works and generating new content at superhuman speed and scale, which can potentially compete with original works in the market. Recommended read:
References :
@www.marktechpost.com
//
OpenAI has introduced HealthBench, a new open-source benchmark designed to evaluate AI performance in realistic healthcare scenarios. Developed in collaboration with 262 physicians, HealthBench uses 5,000 multi-turn conversations and over 48,000 rubric criteria to grade AI models across seven medical domains and 49 languages. The benchmark assesses AI responses based on communication quality, instruction following, accuracy, contextual understanding, and completeness, providing a comprehensive evaluation of AI capabilities in healthcare. OpenAI's latest models, including o3 and GPT-4.1, have shown impressive results on this benchmark.
The most provocative finding from the HealthBench evaluation is that the newest AI models are performing at or beyond the level of human experts in crafting responses to medical queries. Earlier tests from September 2024 showed that doctors could improve AI outputs by editing them, scoring higher than doctors working without AI. However, with the latest April 2025 models, such as o3 and GPT-4.1, physicians who used the AI responses as a base did not, on average, improve them further. This suggests that for the specific task of generating HealthBench responses, the newest models match or exceed what human experts can produce, even when those experts start from a strong AI draft. In related news, FaceAge, a face-reading AI tool developed by researchers at Mass General Brigham, demonstrates promising abilities in predicting cancer outcomes. By analyzing facial photographs, FaceAge estimates a person's biological age and can predict cancer survival with an 81% accuracy rate. This outperforms clinicians in predicting short-term life expectancy, especially for patients receiving palliative radiotherapy. FaceAge identifies subtle facial features associated with aging and provides a quantifiable measure of biological aging that correlates with survival outcomes and health risks, offering doctors more objective and precise survival estimates. Recommended read:
References :
Kevin Okemwa@windowscentral.com
//
OpenAI and Microsoft are reportedly engaged in high-stakes negotiations to revise their existing partnership, a move prompted by OpenAI's aspirations for an initial public offering (IPO). The discussions center around redefining the terms of their strategic alliance, which has seen Microsoft invest over $13 billion in OpenAI since 2019. A key point of contention is Microsoft's desire to secure guaranteed access to OpenAI's AI technology beyond the current contractual agreement, set to expire in 2030. Microsoft is reportedly willing to sacrifice some equity in OpenAI to ensure long-term access to future AI models.
These negotiations also entail OpenAI potentially restructuring its for-profit arm into a Public Benefit Corporation (PBC), a move that requires Microsoft's approval as the startup's largest financial backer. The PBC structure would allow OpenAI to pursue commercial goals and attract further capital, paving the way for a potential IPO. However, the non-profit entity would retain overall control. OpenAI reportedly aims to reduce Microsoft's revenue share from 20% to 10% by 2030, a year in which the company forecasts $174 billion in revenue. Tensions within the partnership have reportedly grown as OpenAI pursues agreements with Microsoft competitors and targets overlapping enterprise customers. One senior Microsoft executive expressed concern over OpenAI's attitude, saying the startup seems to want Microsoft to "give us money and compute and stay out of the way." Despite these challenges, Microsoft remains committed to the partnership, recognizing its importance in the rapidly evolving AI landscape. Recommended read:
References :
@techcrunch.com
//
References: venturebeat.com, Last Week in AI
OpenAI is making a bold move to defend its leadership in the AI space with a reported $3 billion acquisition of Windsurf, an AI-native integrated development environment (IDE). This strategic maneuver, dubbed the "Windsurf initiative," comes as the company faces increasing competition from Google and Anthropic, particularly in the realm of AI-powered coding. The acquisition aims to strengthen OpenAI's position and provide developers with superior coding capabilities, while also securing its role as a primary interface for autonomous AI agents.
The enterprise AI landscape is becoming increasingly competitive, with Google and Anthropic making significant strides. Google, leveraging its infrastructure and the expertise of Gemini head Josh Woodward, has been updating its Gemini models to enhance their coding abilities. Anthropic has also gained traction with its Claude series, which are becoming defaults on popular AI coding platforms like Cursor. These platforms, including Windsurf, Replit, and Lovable, are where developers are increasingly turning to generate code using high-level prompts in agentic environments. In addition to the Windsurf acquisition, OpenAI is also enhancing its API with new integration capabilities. These improvements are designed to boost the performance of Large Language Models (LLMs) and image generators, offering updated functionalities and improved user interfaces. These updates reflect OpenAI's commitment to providing developers with advanced tools, and to stay competitive in the rapidly evolving AI landscape. Recommended read:
References :
Tom Dotan@Newcomer
//
OpenAI is facing an identity crisis, according to former research scientist Steven Adler, stemming from its history, culture, and contentious transition from a non-profit to a for-profit entity. Adler's insights, shared in a recent discussion, delve into the company's early development of GPT-3 and GPT-4, highlighting internal cultural and ethical disagreements. This comes as OpenAI's enterprise adoption accelerates, seemingly at the expense of its rivals, signaling a significant shift in the AI landscape.
OpenAI's recent $3 billion acquisition of Windsurf, an AI-native integrated development environment (IDE), underscores its urgent need to defend its territory in AI-powered coding against growing competition from Google and Anthropic. The move reflects OpenAI's imperative to equip developers with superior coding capabilities and secure a dominant position in the emerging agentic AI world. This deal is seen as a defensive maneuver as OpenAI finds itself on the back foot, needing to counter challenges from competitors who are making significant inroads in AI-assisted coding. Meanwhile, tensions are reportedly simmering between OpenAI and Microsoft, its key partner. Negotiations are shaky, with Microsoft seeking a larger equity stake and retention of IP rights to OpenAI's models, while OpenAI aims to claw those rights back. These issues, along with disagreements over an AGI provision that allows OpenAI an out once it develops artificial general intelligence, have complicated OpenAI's plans for a for-profit conversion and the current effort to become a public benefit corporation. Furthermore, venture capitalists and limited partners are offloading shares in secondaries, which may come at a steep loss compared to 2021 valuations, adding another layer of complexity to OpenAI's current situation. Recommended read:
References :
@the-decoder.com
//
References: THE DECODER, AI News | VentureBeat
Microsoft is making a significant push towards AI interoperability by adding support for the Agent2Agent (A2A) protocol to its Azure AI Foundry and Copilot Studio. This move aims to break down the walled garden approach to AI development, allowing AI agents built on different platforms to communicate and collaborate seamlessly. Satya Nadella, Microsoft's CEO, has publicly endorsed both Google DeepMind's A2A and Anthropic's Model Context Protocol (MCP), signaling a major industry shift toward open standards. Nadella emphasized the importance of protocols like A2A and MCP for enabling an agentic web, where AI systems can interoperate by design.
This commitment to interoperability will allow customers to build agentic systems that can work together regardless of the platform they are built on. Microsoft's support for A2A will enable Copilot Studio agents to call on external agents, even those outside the Microsoft ecosystem or built with tools like LangChain or Semantic Kernel. According to Microsoft, Copilot Studio is already used by over 230,000 organizations, including 90 percent of the Fortune 500, suggesting a potentially wide adoption of A2A-enabled agentic collaboration. A public preview of A2A in Azure Foundry and Copilot Studio is expected to launch soon. OpenAI is also contributing to the advancement of AI interoperability through its Agents SDK, introduced in March. This SDK provides a framework for building multi-agent workflows, allowing developers to define agent behavior, connect to external tools, and manage the action flow. The Agents SDK also supports the Model Context Protocol (MCP), enabling agents to discover and call functions from any compatible server. By supporting open standards like A2A and MCP, both Microsoft and OpenAI are fostering a future where AI agents can work together to automate daily workflows and collaborate across platforms, promoting innovation and avoiding vendor lock-in. Recommended read:
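As a small illustration of the multi-agent workflows the Agents SDK is described as enabling, here is a minimal sketch using the openai-agents Python package. The agent names, instructions, and handoff arrangement are invented for illustration; treat this as a sketch of the pattern rather than a reference implementation.

```python
# Minimal sketch of a two-agent workflow with the OpenAI Agents SDK.
# Requires `pip install openai-agents` and an OPENAI_API_KEY in the environment.
from agents import Agent, Runner

billing_agent = Agent(
    name="Billing agent",
    instructions="Answer questions about invoices, charges, and refunds.",
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Hand billing questions off to the billing agent; answer anything else yourself.",
    handoffs=[billing_agent],  # the SDK manages the handoff as part of the action flow
)

result = Runner.run_sync(triage_agent, "I was charged twice for my subscription this month.")
print(result.final_output)
```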
References :
@www.marktechpost.com
//
OpenAI has announced the release of Reinforcement Fine-Tuning (RFT) for its o4-mini reasoning model, alongside supervised fine-tuning (SFT) for the GPT-4.1 nano model. RFT enables developers to customize a private version of the o4-mini model based on their enterprise's unique products, internal terminology, and goals. This allows for a more tailored AI experience, where the model can generate communications, answer specific questions about company knowledge, and pull up private, proprietary company knowledge with greater accuracy. RFT represents a move beyond traditional supervised fine-tuning, offering more flexible control for complex, domain-specific tasks.
The process involves applying a feedback loop during training, where developers can initiate training sessions, upload datasets, and set up assessment logic through OpenAI’s online developer platform. Instead of relying on fixed question-answer pairs, RFT uses a grader model to score multiple candidate responses per prompt, adjusting the model weights to favor high-scoring outputs. This approach allows for fine-tuning to subtle requirements, such as a specific communication style, policy guidelines, or domain-specific expertise. Organizations with clearly defined problems and verifiable answers can benefit significantly from RFT, aligning models with nuanced objectives. Several organizations have already leveraged RFT in closed previews, demonstrating its versatility across industries. Accordance AI improved the performance of a tax analysis model, while Ambience Healthcare increased the accuracy of medical coding. Other use cases include legal document analysis by Harvey, Stripe API code generation by Runloop, and content moderation by SafetyKit. OpenAI also announced that supervised fine-tuning is now supported for its GPT-4.1 nano model, the company’s most affordable and fastest offering to date, opening customization to all paid API tiers. The cost model for RFT is more transparent, based on active training time rather than per-token processing. Recommended read:
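To make the grader loop concrete, here is a conceptual sketch of the scoring step described above. It is not OpenAI's RFT API or internals: the prompt and grading instructions are invented, the grader here is simply another model acting as a judge, and the weight update that favors high-scoring outputs happens inside OpenAI's training service.

```python
# Conceptual sketch of grading candidate responses (not OpenAI's RFT internals).
from openai import OpenAI

client = OpenAI()
PROMPT = "Classify this support ticket: 'My card was charged twice.'"  # illustrative

# 1) Sample several candidate responses from the model being tuned.
candidates = []
for _ in range(4):
    resp = client.chat.completions.create(
        model="o4-mini",
        messages=[{"role": "user", "content": PROMPT}],
    )
    candidates.append(resp.choices[0].message.content)

# 2) Score each candidate with a grader model.
def grade(answer: str) -> float:
    judgment = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{
            "role": "user",
            "content": (
                "Score the answer below from 0 to 1 for correctness and policy "
                "compliance. Reply with only the number.\n\nAnswer:\n" + answer
            ),
        }],
    )
    return float(judgment.choices[0].message.content.strip())

scores = [grade(c) for c in candidates]

# 3) During RFT, training shifts the model toward the high-scoring candidates.
best_score, best_answer = max(zip(scores, candidates))
print(best_score, best_answer)
```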
References :
@the-decoder.com
//
OpenAI is expanding its global reach through strategic partnerships with governments and the introduction of advanced model customization tools. The organization has launched the "OpenAI for Countries" program, an initiative designed to collaborate with governments worldwide on building robust AI infrastructure. This program aims to assist nations in setting up data centers and adapting OpenAI's products to meet local language and specific needs. OpenAI envisions this initiative as part of a broader global strategy to foster cooperation and advance AI capabilities on an international scale.
This expansion also includes technological advancements, with OpenAI releasing Reinforcement Fine-Tuning (RFT) for its o4-mini reasoning model. RFT enables enterprises to fine-tune their own versions of the model using reinforcement learning, tailoring it to their unique data and operational requirements. This allows developers to customize the model to better fit their needs using OpenAI's platform dashboard, tweaking it for internal terminology, goals, processes and more. Once deployed, if an employee or leader at the company wants to use it through a custom internal chatbot or custom OpenAI GPT to pull up private, proprietary company knowledge, answer specific questions about company products and policies, or generate new communications and collateral in the company's voice, they can do so more easily with their RFT version of the model. The "OpenAI for Countries" program is slated to begin with ten international projects, supported by funding from both OpenAI and participating governments. Chris Lehane, OpenAI's vice president of global policy, indicated that the program was inspired by the AI Action Summit in Paris, where several countries expressed interest in establishing their own "Stargate"-style projects. Moreover, the release of RFT on o4-mini signifies a major step forward in custom model optimization, offering developers a powerful new technique for tailoring foundation models to specialized tasks. This allows for fine-grained control over how models improve, by defining custom objectives and reward functions. Recommended read:
References :
@analyticsindiamag.com
//
OpenAI has unveiled a new GitHub connector for its ChatGPT Deep Research tool, empowering developers to analyze their codebases directly within the AI assistant. This integration allows seamless connection of both private and public GitHub repositories, enabling comprehensive analysis to generate reports, documentation, and valuable insights based on the code. The Deep Research agent can now sift through source code and engineering documentation, respecting existing GitHub permissions by only accessing authorized repositories, streamlining the process of understanding and maintaining complex projects.
This new functionality aims to simplify code analysis and documentation processes, making it easier for developers to understand and maintain complex projects. Developers can leverage the connector to implement new APIs by finding real examples in their codebase, break down product specifications into manageable technical tasks with dependencies mapped out, or generate summaries of code structure and patterns for onboarding new team members or creating technical documentation. OpenAI Product Leader Nate Gonzalez stated that users found ChatGPT's deep research agent so valuable that they wanted it to connect to their internal sources, in addition to the web. The GitHub connector is currently rolling out to ChatGPT Plus, Pro, and Team users. Enterprise and Education customers will gain access soon. OpenAI emphasizes that the connector respects existing permissions structures and honors GitHub permission settings. This launch follows the recent integration of ChatGPT Team with tools like Google Drive, furthering OpenAI's goal of seamlessly integrating ChatGPT into internal workflows by pulling relevant context from various platforms where knowledge typically resides within organizations. OpenAI also plans to add more deep research connectors in the future. Recommended read:
References :
@the-decoder.com
//
References: techxplore.com, THE DECODER
OpenAI has launched a new initiative called "OpenAI for Countries" in collaboration with the US government, aimed at assisting countries in building their own artificial intelligence infrastructures. This program seeks to promote democratic AI globally and provide an alternative to versions of AI that could be used to consolidate power. The initiative follows interest expressed by several countries after the AI Action Summit in Paris, where the idea of "Stargate"-style projects was discussed.
The "OpenAI for Countries" program aims to launch ten initial projects with individual countries or regions. These projects will involve helping to build in-country data center capacity, delivering customized instances of ChatGPT tailored for local languages and cultures, and raising and deploying national start-up funds. OpenAI, in coordination with the US government, will assist partner countries in improving health care, education, and public services through these customized AI solutions. Funding will come from both OpenAI and participating governments. In exchange for OpenAI's assistance, partner countries are expected to invest in expanding the global Stargate Project. This project, announced by former US President Donald Trump, aims to invest up to $500 billion in AI infrastructure, solidifying US leadership in AI technology. According to OpenAI, this collaboration will foster a growing global network effect for democratic AI. The effort underscores the importance of acting now to support countries preferring to build on democratic AI rails and providing a clear alternative to authoritarian versions of AI. Recommended read:
References :
Carl Franzen@AI News | VentureBeat
//
References: pub.towardsai.net, thezvi.wordpress.com
OpenAI is facing increased scrutiny regarding its operational structure, leading to a notable reversal in its plans. The company, initially founded as a nonprofit, will now retain the nonprofit's governance control, ensuring that the original mission remains at the forefront. This decision comes after "constructive dialogue" with the Attorneys General of Delaware and California, suggesting that a legal challenge might have followed had OpenAI proceeded with its initial plan to convert fully into a profit-maximizing entity. The company aims to maintain its commitment to developing Artificial General Intelligence (AGI) for the benefit of all humanity, and CEO Sam Altman insists that OpenAI is "not a normal company and never will be."
As part of this restructuring, OpenAI will transition its for-profit arm, currently an LLC, into a Public Benefit Corporation (PBC). This move aims to balance the interests of shareholders with the company's core mission. The nonprofit will remain a large shareholder in the PBC, giving it the resources to support its beneficial objectives. OpenAI is also abandoning its capped-profit structure, which may allow it to be more aggressive in the marketplace. Bret Taylor, Chairman of the Board of OpenAI, emphasized that the company will continue to be overseen and controlled by the nonprofit. This updated plan demonstrates a commitment to the original vision of OpenAI while adapting to the demands of funding AGI development, which Altman estimates will require "hundreds of billions of dollars of compute." Further demonstrating its commitment to advancing AI technology, OpenAI is reportedly acquiring Windsurf (formerly Codeium) for $3 billion. While specific details of the acquisition are not provided, it's inferred that Windsurf's coding capabilities will be integrated into OpenAI's AI models, potentially enhancing their coding abilities. The acquisition aligns with OpenAI's broader strategy of pushing the boundaries of AI capabilities and making them accessible to a wider audience. This move may improve the abilities of models like the o-series (rewarding verifiable math, science, and code solutions) and agentic o3 models (rewarding tool use), which the industry is pushing forward aggressively with new training approaches. Recommended read:
References :