News from the AI & ML world

DeeperML - #gpt-4o

Maria Deutscher@SiliconANGLE //
OpenAI has officially rolled out native image generation capabilities within ChatGPT, powered by its GPT-4o model. This significant upgrade replaces the previous DALL-E integration, aiming for more consistent results, fewer content restrictions and improved accuracy in interpreting user prompts. The new feature is available to all ChatGPT users, including those on the free tier, with API access for developers planned in the near future.

The integration of image generation into GPT-4o allows users to create detailed and lifelike visuals through natural conversation, making it easier to communicate effectively through visuals. GPT-4o can accurately render text within images, supports complex prompts with up to 20 different objects, and can generate images based on uploaded references. Users can refine their results through natural conversation, with the AI maintaining context across multiple exchanges - making it easier to iteratively perfect an image through dialogue. Early testing shows the system produces more consistent images than DALL-E 3.

Recommended read:
References :
  • THE DECODER: OpenAI brings native image generation to ChatGPT
  • AI News | VentureBeat: ‘Insane’: OpenAI introduces GPT-4o native image generation and it’s already wowing users
  • SiliconANGLE: OpenAI upgrades ChatGPT’s image generation capabilities
  • www.tomsguide.com: I just went hands-on with ChatGPT-4o's enhanced image generator and I can't believe this is free
  • www.tomsguide.com: OpenAI just unveiled new ChatGPT image generator powered by Sora — here's what you can do now
  • Search Engine Journal: OpenAI Rolls Out GPT-4o Image Creation To Everyone
  • TestingCatalog: OpenAI brings advanced image generation to GPT-4o in ChatGPT and Sora
  • Quartz: OpenAI is making it easier to generate realistic photos
  • How-To Geek: ChatGPT Can Finally Generate Images With Legible Text
  • www.techradar.com: ChatGPT integrated image generation is powerful and, maybe, worrisome.
  • www.zdnet.com: ChatGPT finally gets a much better image generator - how to try it for free
  • Fello AI: Discusses OpenAI integrating native image generation directly into ChatGPT.
  • AI4Business: OpenAI 4o image generation: tutti i dettagli della model card
  • www.tomsguide.com: ChatGPT’s AI image generator just got a huge upgrade — here’s 7 incredible examples of what it can do
  • THE DECODER: OpenAI outlines new image generation rules for ChatGPT

Jesus Rodriguez@TheSequence //
OpenAI has recently launched new audio features and tools aimed at enhancing the capabilities of AI agents. The releases include updated transcription and text-to-speech models, as well as tools for building AI agents. The audio models, named gpt-4o-transcribe and gpt-4o-mini-transcribe, promise better performance than the previous Whisper models, achieving lower word error rates across multiple languages and demonstrating improvements in challenging audio conditions like varying accents and background noise. These models are built on top of language models, making them potentially vulnerable to prompt injection attacks.

OpenAI also unveiled new tools for AI agent development, featuring a Responses API, built-in web search, file search, and computer use functionalities, alongside an open-source Agents SDK. Furthermore, they introduced o1 Pro, a new reasoning model, positioned for complex reasoning tasks, comes with a high cost, priced at $150 per million input tokens and $600 per million output tokens. The gpt-4o-mini-tts text-to-speech model introduces "steerability", allowing developers to control the tone and delivery of the model.

Recommended read:
References :
  • Data Phoenix: OpenAI Launches New Tools for Building AI Agents
  • Fello AI: OpenAI's new o1 Pro pricing strategy with a substantial markup compared to previous models.
  • TheSequence: The Sequence Engineering #513: A Deep Dive Into OpenAI's New Tools for Developing AI Agents
  • AI News | VentureBeat: OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
  • Windows Copilot News: Canadian Media Outlets Sue OpenAI Over Copyright Infringement
  • www.techrepublic.com: Have Some Spare Cash? You’ll Need it for OpenAI’s New API
  • bsky.app: Discussion of OpenAI's new o1-Pro API pricing and its implications for the AI community.
  • Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
  • bsky.app: This blog post discusses OpenAI's new audio models, noting their promising features but also mentioning the issue of mixing instructions and data in the same token stream.
  • www.techrepublic.com: This article reports on OpenAI's new text-to-speech and speech-to-text tools based on GPT-4o, highlighting their capabilities and potential applications but also mentioning a possible similar path for video.
  • Analytics Vidhya: OpenAI's Audio Models: How to Access, Features, Applications, and More
  • MarkTechPost: OpenAI Introduced Advanced Audio Models ‘gpt-4o-mini-tts’, ‘gpt-4o-transcribe’, and ‘gpt-4o-mini-transcribe’: Enhancing Real-Time Speech Synthesis and Transcription Capabilities for Developers
  • Simon Willison's Weblog: OpenAI announced today, for both text-to-speech and speech-to-text. They're very promising new models, but they appear to suffer from the ever-present risk of accidental (or malicious) instruction following.
  • THE DECODER: OpenAI releases new AI voice models with customizable speaking styles
  • Composio: Finally, OpenAI gave in and launched a new agentic framework called Agents SDK.
  • Last Week in AI: Our 204th episode with a summary and discussion of last week's big AI news! Recorded on 03/21/2025 Hosted by and . Feel free to email us your questions and feedback at and/or  Read out our text newsletter and comment on the podcast at . https://discord.gg/nTyezGSKwP In this episode: Baidu launched two new multimodal models, Ernie 4.5 and Ernie X1, boasting competitive pricing and capabilities compared to Western counterparts like GPT-4.5 and DeepSeek R1. OpenAI introduced new audio models, including impressive speech-to-text and text-to-speech systems, and added O1 Pro to their developer API at high costs, reflecting efforts for more profitability. Nvidia and Apple announced significant hardware advancements, including Nvidia's future GPU plans and Apple's new Mac Studio offering that can run DeepSeek R1. DeepSeek employees are facing travel restrictions, suggesting China is treating its AI development with increased secrecy and urgency, emphasizing a wartime footing in AI competition.

Chris McKay@Maginative //
OpenAI has recently unveiled new audio models based on GPT-4o, significantly enhancing its text-to-speech and speech-to-text capabilities. These new tools are intended to give AI agents a voice, enabling a range of applications, with demonstrations including the ability for an AI to read emails in character. The announcement includes the introduction of new transcription models, specifically gpt-4o-transcribe and gpt-4o-mini-transcribe, which are designed to outperform the existing Whisper model.

The text-to-speech and speech-to-text tools are based on GPT-4o. While these models show promise, some experts have noted potential vulnerabilities. Like other large language model (LLM)-driven multi-modal models, they appear susceptible to prompt-injection-adjacent issues, stemming from the mixing of instructions and data within the same token stream. OpenAI hinted it may take a similar path with video.

Recommended read:
References :
  • AI News | VentureBeat: OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
  • Analytics Vidhya: OpenAI’s Audio Models: How to Access, Features, Applications, and More
  • Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
  • bsky.app: I published some notes on OpenAI's new text-to-speech and speech-to-text models.
  • Samrat Man Singh: OpenAI announced some new audio models yesterday, including new transcription models( gpt-4o-transcribe and gpt-4o-mini-transcribe ).
  • www.techrepublic.com: The text-to-speech and speech-to-text tools are all based on GPT-4o. OpenAI hinted it may take a similar path with video.
  • MarkTechPost: Reports on OpenAI introducing advanced audio models.
  • Simon Willison's Weblog: OpenAI announced today, for both text-to-speech and speech-to-text. They're very promising new models, but they appear to suffer from the ever-present risk of accidental (or malicious) instruction following.
  • THE DECODER: OpenAI has released a new generation of audio models that let developers customize how their AI assistants speak.
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • Last Week in AI: #204 - OpenAI Audio, Rubin GPUs, MCP, Zochi

Megan Crouse@techrepublic.com //
OpenAI has unveiled a suite of advancements, including enhanced audio models and a significantly more expensive AI reasoning model called o1 Pro. The new audio models, including gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved transcription capabilities compared to Whisper, although they are susceptible to prompt injection attacks due to their foundation on language models. Users can access these models via the Realtime API, enabling real-time transcription from microphone input using a standalone Python script.

OpenAI's o1 Pro comes with a steep price tag of $150 per million input tokens and $600 per million output tokens. This makes it ten times more expensive than the standard o1 model and twice as costly as GPT-4.5. While OpenAI claims o1 Pro "thinks harder" and delivers superior responses for complex reasoning tasks, early benchmarks suggest only incremental improvements. Access to o1 Pro is currently limited to developers who have spent at least $5 on OpenAI's API services, targeting users building AI agents and automation tools.

Recommended read:
References :
  • Fello AI: OpenAI Just Dropped Its Most Expensive AI Model Yet, And It Costs a Fortune
  • www.techrepublic.com: OpenAI Gives Its Agents a Voice – Now a ‘Medieval Knight’ Can Read Your Work Emails
  • AI News | VentureBeat: Describes OpenAI’s new voice AI model gpt-4o-transcribe and its ability to add speech to existing text apps.
  • MarkTechPost: Explains the release of advanced audio models gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe by OpenAI.
  • THE DECODER: OpenAI releases new AI voice models with customizable speaking styles
  • Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
  • www.producthunt.com: OpenAI GPT-4o Audio Models
  • Analytics Vidhya: OpenAI’s Audio Models: How to Access, Features, Applications, and More

Emilia David@AI News | VentureBeat //
OpenAI is enhancing GPT-4o with improved instruction following and problem-solving capabilities. The company has updated GPT-4o to better handle detailed instructions, especially when processing multi-task prompts, thus improving performance and intuition. This model can be accessed by subscribers through the API as "chatgpt-4o-latest" and in ChatGPT.

OpenAI has announced its support for Anthropic’s Model Context Protocol (MCP), an open-source standard designed to streamline the integration between AI assistants and various data systems. With MCP, AI models can connect directly to systems where data lives, eliminating the need for custom integrations and allowing real-time access to business tools and repositories. OpenAI will integrate MCP support into its Agents SDK immediately, with the ChatGPT desktop app and Responses API following soon. This protocol aims to create a unified framework for AI applications to access and utilize external data sources.

ChatGPT Team users can now add internal databases as references, allowing the platform to respond with improved contextual awareness. By connecting internal knowledge bases, ChatGPT Team could become more invaluable to users who ask the platform strategy questions or for analysis. This allows users to perform semantic searches of their data, link directly to internal sources in responses, and ensure ChatGPT understands internal company lingo.

Recommended read:
References :
  • Shelly Palmer: In a surprising move, OpenAI announced yesterday it will adopt rival Anthropic's MCP across its product line.
  • THE DECODER: OpenAI has updated GPT-4o to better handle detailed instructions, especially when processing multi-task prompts.
  • AI News | VentureBeat: OpenAI adds internal data referencing
  • Analytics Vidhya: OpenAI has announced its support for Anthropic’s Model Context Protocol (MCP), an open-source standard designed to streamline the integration between AI assistants and various data systems.

@singularityhub.com //
OpenAI models, including the recently released GPT-4o, are facing scrutiny due to their vulnerability to "jailbreaks." Researchers have demonstrated that targeted attacks can bypass the safety measures implemented in these models, raising concerns about their potential misuse. These jailbreaks involve manipulating the models through techniques like "fine-tuning," where models are retrained to produce responses with malicious intent, effectively creating an "evil twin" capable of harmful tasks. This highlights the ongoing need for further development and robust safety measures within AI systems.

The discovery of these vulnerabilities poses significant risks for applications relying on the safe behavior of OpenAI's models. The concern is that, as AI capabilities advance, the potential for harm may outpace the ability to prevent it. This risk is particularly urgent as open-weight models, once released, cannot be recalled, underscoring the need to collectively define an acceptable risk threshold and take action before that threshold is crossed. A bad actor could disable safeguards and create the “evil twin” of a model: equally capable, but with no ethical or legal bounds.

Recommended read:
References :
  • www.artificialintelligence-news.com: Recent research has highlighted potential vulnerabilities in OpenAI models, demonstrating that their safety measures can be bypassed by targeted attacks. These findings underline the ongoing need for further development in AI safety systems.
  • www.datasciencecentral.com: OpenAI models, although advanced, are not completely secure from manipulation and potential misuse. Researchers have discovered vulnerabilities that can be exploited to retrain models for malicious purposes, highlighting the importance of ongoing research in AI safety.
  • Blog (Main): OpenAI models have been found vulnerable to manipulation through "jailbreaks," prompting concerns about their safety and potential misuse in malicious activities. This poses a significant risk for applications relying on the models’ safe behavior.
  • SingularityHub: This article discusses Anthropic's new system for defending against AI jailbreaks and its successful resistance to hacking attempts.

Ryan Daws@Developer Tech News //
OpenAI has unveiled a new suite of APIs and tools aimed at streamlining the development of AI agents. This initiative addresses the challenges faced by software developers in building production-ready applications, with the goal of transforming how they create systems capable of autonomously handling complex, multi-step tasks. The new offerings are designed to empower developers and enterprises to build, deploy, and scale reliable, high-performing AI agents more easily.

The suite includes the Responses API, which combines the simplicity of the Chat Completions API with the tool-use capabilities of the Assistants API. This API supports built-in tools like web search, file search, and computer use, facilitating the creation of agents that can interact effectively with real-world systems. Additionally, OpenAI has introduced the Agents SDK, an orchestration framework that simplifies the design and scaling of agents, featuring built-in observability tools for performance logging, visualization, and analysis. These tools are expected to enhance productivity and innovation across various industries by enabling the creation of more efficient and capable AI-driven applications.

Recommended read:
References :
  • Developer Tech News: Discusses how reasoning models from OpenAI are streamlining software development.
  • Windows Report: OpenAI has announced the release of a comprehensive suite of tools and APIs designed to simplify the development of AI agents. The company says the new AI agents aim to transform how developers create systems capable of autonomously handling complex, multi-step tasks. The new tools include the Responses API, which combines the Chat Completions API’s

Ellie Ramirez-Camara@Data Phoenix //
References: Data Phoenix , Fello AI , TheSequence ...
OpenAI has recently unveiled a suite of new tools aimed at simplifying the development of AI agents. This release includes the Responses API, designed as a flexible foundation for building agents, along with built-in capabilities for web search, file search, and computer use. An open-source Agents SDK is also part of the package, intended to help developers orchestrate both single-agent and multi-agent workflows, providing the essential building blocks for creating reliable AI agents.

OpenAI has also introduced o1 Pro, its latest AI reasoning model, but it comes with a steep price tag. At $150 per million input tokens and $600 per million output tokens, o1 Pro is significantly more expensive than previous models, costing ten times the price of the standard o1 model and twice as much as GPT-4.5. OpenAI claims that o1 Pro "thinks harder" and provides "consistently better" responses, especially for complex tasks, but early benchmarks suggest the improvements may be incremental.

Recommended read:
References :
  • Data Phoenix: OpenAI has launched a comprehensive suite of new tools including the Responses API, built-in capabilities for web search, file search, and computer use, and an open-source Agents SDK—all designed to make it significantly easier for developers to build AI agents.
  • Fello AI: Move over, GPT-4.5, there’s a new overpriced AI model in town. OpenAI has just launched o1 Pro, its latest AI reasoning model, and it comes with a price tag so absurd it makes previous models look like dollar-store knockoffs.
  • Shelly Palmer: Last week, Google and OpenAI asked the White House for permission to train AI on copyrighted content, arguing that restrictive laws will cripple U.S. innovation while China advances unchecked.
  • TheSequence: Responses API, file and web search and multi agent coordination are some of the key capabilities of the new stack.
  • bsky.app: Bsky post about OpenAI's new text-to-speech and speech-to-text models

Matthias Bastian@THE DECODER //
OpenAI has released another update to its GPT-4o model in ChatGPT, delivering enhanced instruction following capabilities, particularly for prompts with multiple requests. This improvement is a significant upgrade which has also allowed it to acheive second place on the LM Arena leaderboard, only being beaten by Gemini 2.5. The update also boasts improved capabilities in handling complex technical and coding problems, alongside enhanced intuition and creativity, with the added benefit of fewer emojis in its responses.

This update, referred to as chatgpt-4o-latest, is also now available in their API, and also gives access to the model used for ChatGPT. This version is priced higher at $5/million input and $15/million output compared to the regular GPT-4o, which is priced at $2.50/$10. OpenAI plans to bring these improvements to a dated model in the API in the coming weeks, and although they released the update on Twitter, users have complained that a more suitable place for this announcement would be the OpenAI Platform Changelog.

Recommended read:
References :