News from the AI & ML world

DeeperML - #gpt4o

Dr. Hura@Digital Information World //
OpenAI has released exciting updates for ChatGPT's Advanced Voice Mode, aimed at creating more natural and engaging user interactions. The primary focus of these updates is to reduce interruptions during conversations, a common issue where the AI would interject during pauses, hindering the flow of natural dialogue. This improvement allows users to take short breaths or think without the AI prematurely responding.

The Advanced Voice Mode is now available to all ChatGPT users on paid plans. Users of the free version of the chatbot will also get access to the latest Advanced Voice Mode, which lets them pause while speaking without being interrupted by the AI assistant. The system requirements are Android app version 1.2024.206 or later and, for iOS, app version 1.2024.206 or later running on iOS 16.4 or later.
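As a rough illustration of the version requirements quoted above, the check below compares dotted version strings numerically. It is only a sketch: the helper names `parse_version` and `meets_minimum` are hypothetical, and it assumes version strings consist solely of dot-separated integers.

```python
# Minimum versions as quoted in the text above.
MIN_APP_VERSION = "1.2024.206"
MIN_IOS_VERSION = "16.4"


def parse_version(text):
    """Turn a dotted version string such as '1.2024.206' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    # Tuples compare element by element, so (1, 2025, 10) > (1, 2024, 206).
    return parse_version(installed) >= parse_version(minimum)


# Hypothetical installed versions, for illustration only.
app_ok = meets_minimum("1.2025.10", MIN_APP_VERSION)
app_too_old = meets_minimum("1.2024.199", MIN_APP_VERSION)
ios_ok = meets_minimum("16.4.1", MIN_IOS_VERSION)
```

Comparing parsed tuples rather than raw strings avoids the usual pitfall where "1.2024.99" would sort after "1.2024.206" alphabetically.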

In addition to minimizing interruptions, the update introduces a more personable tone to ChatGPT's voice interactions. The AI is designed to be more specific, direct, creative, and engaging in its replies, making conversations feel less robotic and more human-like. These changes come amid competition from other companies launching similar AI voice assistants, such as Sesame's new voice assistants, Maya and Miles.

References :
  • Digital Information World: This will make the AI assistant more personable and interrupt users so much less.
  • gHacks Technology News: OpenAI Updates ChatGPT Voice Mode for More Natural and Engaging Interactions
  • THE DECODER: OpenAI brings native image generation to ChatGPT
  • AI News | VentureBeat: As AI-generated images become more precise and accessible, GPT-4o represents a significant step forward in the space.
  • www.tomsguide.com: OpenAI just unveiled new ChatGPT image generator powered by Sora — here's what you can do now
  • www.zdnet.com: ChatGPT finally gets a much better image generator - how to try it for free
  • How-To Geek: ChatGPT Can Finally Generate Images With Legible Text
  • www.techradar.com: OpenAI unveiled image generation for 4o – here's everything you need to know about the ChatGPT upgrade
  • Simon Willison: OpenAI's new multi-modal image output, added to GPT-4o and ChatGPT this morning, finally gave me the selfie with a bear I've always wanted
  • Analytics Vidhya: A few days ago, Gemini rolled out its image generation feature in the 2.0 Flash version, and the internet erupted with stunning examples. Now, OpenAI is stepping up to the plate, raising the bar even higher by introducing native image generation (powered by GPT-4o) in ChatGPT.
  • www.techrepublic.com: As of March, any account holder can create images using GPT-4o in ChatGPT for free. See how to make ChatGPT work for your business.
  • SiliconANGLE: OpenAI upgrades ChatGPT’s image generation capabilities
  • TestingCatalog: OpenAI Brings Advanced Image Generation to GPT-4o in ChatGPT and Sora
  • thezvi.wordpress.com: Fun With GPT-4o Image Generation
  • Simon Willison's Weblog: Introducing 4o Image Generation
  • The Tech Basic: OpenAI’s ChatGPT Now Generates Highly Detailed Images With GPT-4o
  • gHacks Technology News: ChatGPT integrates GPT-4o for more realistic and detailed image creation
  • futurism.com: OpenAI is rolling out brand new image generation capabilities today for ChatGPT. And guess what? It finally, almost, nails text.
  • www.tomsguide.com: Here's what happened when I tested ChatGPT-4o image generator as well as what I like and don't like about this model.
  • thezvi.substack.com: Fun With GPT-4o Image Generation
  • PCMag Middle East ai: OpenAI has added AI image generation capabilities to ChatGPT. Users can now select the prompt, provide prompts, and get desired images within the regular ChatGPT window.
  • www.tomsguide.com: OpenAI is rolling out a series of upgrades to ChatGPT's Advanced Voice Mode this week, and they could make a big difference to your time with the chatbot.

Maria Deutscher@SiliconANGLE //
OpenAI has officially rolled out native image generation capabilities within ChatGPT, powered by its GPT-4o model. This significant upgrade replaces the previous DALL-E integration, aiming for more consistent results, fewer content restrictions and improved accuracy in interpreting user prompts. The new feature is available to all ChatGPT users, including those on the free tier, with API access for developers planned in the near future.

The integration of image generation into GPT-4o allows users to create detailed and lifelike visuals through natural conversation, making it easier to communicate effectively through visuals. GPT-4o can accurately render text within images, supports complex prompts with up to 20 different objects, and can generate images based on uploaded references. Users can refine their results through natural conversation, with the AI maintaining context across multiple exchanges - making it easier to iteratively perfect an image through dialogue. Early testing shows the system produces more consistent images than DALL-E 3.

References :
  • THE DECODER: OpenAI brings native image generation to ChatGPT
  • AI News | VentureBeat: ‘Insane’: OpenAI introduces GPT-4o native image generation and it’s already wowing users
  • SiliconANGLE: OpenAI upgrades ChatGPT’s image generation capabilities
  • www.tomsguide.com: I just went hands-on with ChatGPT-4o's enhanced image generator and I can't believe this is free
  • www.tomsguide.com: OpenAI just unveiled new ChatGPT image generator powered by Sora — here's what you can do now
  • Search Engine Journal: OpenAI Rolls Out GPT-4o Image Creation To Everyone
  • TestingCatalog: OpenAI brings advanced image generation to GPT-4o in ChatGPT and Sora
  • Quartz: OpenAI is making it easier to generate realistic photos
  • How-To Geek: ChatGPT Can Finally Generate Images With Legible Text
  • www.techradar.com: ChatGPT integrated image generation is powerful and, maybe, worrisome.
  • www.zdnet.com: ChatGPT finally gets a much better image generator - how to try it for free
  • Fello AI: Discusses OpenAI integrating native image generation directly into ChatGPT.
  • AI4Business: OpenAI 4o image generation: tutti i dettagli della model card
  • www.tomsguide.com: ChatGPT’s AI image generator just got a huge upgrade — here’s 7 incredible examples of what it can do
  • THE DECODER: OpenAI outlines new image generation rules for ChatGPT
  • AI News | VentureBeat: The new feature has been widely embraced by users of X, but it raises copyright concerns and goes against Studio Ghibli's creator.
  • www.zdnet.com: ChatGPT's new image generator creates stunning images for some users.

Chris McKay@Maginative //
OpenAI has recently unveiled new audio models based on GPT-4o, significantly enhancing its text-to-speech and speech-to-text capabilities. These new tools are intended to give AI agents a voice, enabling a range of applications, with demonstrations including the ability for an AI to read emails in character. The announcement includes the introduction of new transcription models, specifically gpt-4o-transcribe and gpt-4o-mini-transcribe, which are designed to outperform the existing Whisper model.

The text-to-speech and speech-to-text tools are based on GPT-4o. While these models show promise, some experts have noted potential vulnerabilities. Like other large language model (LLM)-driven multi-modal models, they appear susceptible to prompt-injection-adjacent issues, stemming from the mixing of instructions and data within the same token stream. OpenAI hinted it may take a similar path with video.

References :
  • AI News | VentureBeat: OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
  • Analytics Vidhya: OpenAI’s Audio Models: How to Access, Features, Applications, and More
  • Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
  • bsky.app: I published some notes on OpenAI's new text-to-speech and speech-to-text models.
  • Samrat Man Singh: OpenAI announced some new audio models yesterday, including new transcription models( gpt-4o-transcribe and gpt-4o-mini-transcribe ).
  • www.techrepublic.com: The text-to-speech and speech-to-text tools are all based on GPT-4o. OpenAI hinted it may take a similar path with video.
  • MarkTechPost: Reports on OpenAI introducing advanced audio models.
  • Simon Willison's Weblog: OpenAI announced today, for both text-to-speech and speech-to-text. They're very promising new models, but they appear to suffer from the ever-present risk of accidental (or malicious) instruction following.
  • THE DECODER: OpenAI has released a new generation of audio models that let developers customize how their AI assistants speak.
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • Last Week in AI: #204 - OpenAI Audio, Rubin GPUs, MCP, Zochi

@techxplore.com //
OpenAI's GPT-4o has introduced a new image generation feature, allowing users to transform images into various art styles, including the distinctive look of Studio Ghibli. This built-in tool, accessible to all ChatGPT users since late March 2025, enables the conversion of personal photos, memes, and historical images into animated versions reminiscent of films like "Spirited Away" and "My Neighbor Totoro." The trend quickly gained popularity, with users sharing their creations on platforms like X/Twitter.

The rise of Ghibli-style AI art has stirred both excitement and ethical debates. While many users have enjoyed reimagining familiar images in the iconic animation style, concerns about AI's use of copyrighted creative works and its impact on human artists have surfaced. Hayao Miyazaki, known for his traditional hand-drawn animation, has expressed reservations about AI's role in the industry. Despite these concerns, OpenAI has largely embraced the trend, while incorporating measures to prevent direct imitation of living artists' styles.


Emilia David@AI News | VentureBeat //
OpenAI is enhancing GPT-4o with improved instruction following and problem-solving capabilities. The company has updated GPT-4o to better handle detailed instructions, especially when processing multi-task prompts, thus improving performance and intuition. This model can be accessed by subscribers through the API as "chatgpt-4o-latest" and in ChatGPT.

OpenAI has announced its support for Anthropic’s Model Context Protocol (MCP), an open-source standard designed to streamline the integration between AI assistants and various data systems. With MCP, AI models can connect directly to systems where data lives, eliminating the need for custom integrations and allowing real-time access to business tools and repositories. OpenAI will integrate MCP support into its Agents SDK immediately, with the ChatGPT desktop app and Responses API following soon. This protocol aims to create a unified framework for AI applications to access and utilize external data sources.

ChatGPT Team users can now add internal databases as references, allowing the platform to respond with improved contextual awareness. By connecting internal knowledge bases, ChatGPT Team could become even more valuable to users who ask the platform strategy questions or request analysis. The feature lets users perform semantic searches of their data, link directly to internal sources in responses, and ensure ChatGPT understands internal company lingo.

References :
  • Shelly Palmer: In a surprising move, OpenAI announced yesterday it will adopt rival Anthropic's MCP across its product line.
  • THE DECODER: OpenAI has updated GPT-4o to better handle detailed instructions, especially when processing multi-task prompts.
  • AI News | VentureBeat: OpenAI adds internal data referencing
  • Analytics Vidhya: OpenAI has announced its support for Anthropic’s Model Context Protocol (MCP), an open-source standard designed to streamline the integration between AI assistants and various data systems.

Matthias Bastian@THE DECODER //
OpenAI has released another update to its GPT-4o model in ChatGPT, delivering enhanced instruction-following capabilities, particularly for prompts with multiple requests. The upgrade has also lifted the model to second place on the LM Arena leaderboard, behind only Gemini 2.5. The update also boasts improved capabilities in handling complex technical and coding problems, alongside enhanced intuition and creativity, with the added benefit of fewer emojis in its responses.

The update is also available in the API as chatgpt-4o-latest, which gives access to the model used in ChatGPT. This version is priced higher, at $5 per million input tokens and $15 per million output tokens, compared with the regular GPT-4o at $2.50/$10. OpenAI plans to bring these improvements to a dated model in the API in the coming weeks. Although the company announced the update on Twitter, users have complained that the OpenAI Platform Changelog would have been a more suitable place for the announcement.
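To make the price difference concrete, here is a minimal sketch that applies the per-million-token rates quoted above to a sample workload. The workload size (200,000 input tokens and 50,000 output tokens) and the function name `request_cost` are assumptions chosen purely for illustration, as is the use of "gpt-4o" as the dictionary key for the regular model; actual pricing may change, so treat the figures as a snapshot of the numbers in this article.

```python
# Prices in USD per one million tokens, as quoted in the text above.
PRICES = {
    "chatgpt-4o-latest": {"input": 5.00, "output": 15.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},  # illustrative key for "regular GPT-4o"
}


def request_cost(model, input_tokens, output_tokens):
    """Estimate the cost in USD for a given number of input and output tokens."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload, for illustration only.
latest_cost = request_cost("chatgpt-4o-latest", 200_000, 50_000)
regular_cost = request_cost("gpt-4o", 200_000, 50_000)

print(f"chatgpt-4o-latest: ${latest_cost:.2f}")  # $1.75
print(f"gpt-4o:            ${regular_cost:.2f}")  # $1.00
```

For this particular mix of tokens, the chatgpt-4o-latest alias costs 1.75 times as much as the regular model; the exact ratio depends on the balance of input and output tokens, ranging from 1.5 times (output only) to 2 times (input only).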


@www.techmeme.com //
Recent studies have shown that AI models are reaching new heights in research capabilities. Three case studies, leveraging GPT-4o, OpenAI o1, and Claude 3.5 Sonnet, demonstrate that these models are now capable of conducting historical research at a PhD level. This represents a significant milestone in AI's ability to reliably analyze and interpret complex data, opening new possibilities for researchers and professionals across various domains. Benjamin Breen of Res Obscura highlighted the implications of these advancements and how they could lead to breakthroughs, especially in fields such as biology, physics, and medicine.

OpenAI has also announced a new AI agent called Deep Research, specifically designed to assist users with in-depth and complex research tasks. Available to ChatGPT Pro subscribers with a limit of 100 queries per month, Deep Research aims to streamline the process of gathering and analyzing information from multiple sources. This new feature targets professionals in finance, science, policy, and engineering, as well as individuals making significant purchases requiring thorough research. Future plans include expanding access to Plus and Team users, increasing query limits, and incorporating multimedia outputs like images and data visualizations. Additionally, OpenAI intends to enable connectivity to specialized data sources, including subscription-based and internal resources, to further enhance the robustness and personalization of its output.

References :
  • www.techmeme.com: Three case studies using GPT-4o, OpenAI o1, and Claude 3.5 Sonnet for historical research show that the models are now good enough for PhD-level analysis (Benjamin Breen/Res Obscura)
  • Res Obscura: Three case studies using GPT-4o, OpenAI o1, and Claude 3.5 Sonnet for historical research show that the models are now good enough for PhD-level analysis (Benjamin Breen/Res Obscura)
  • techcrunch.com: OpenAI unveils Deep Research, an AI agent for creating in-depth reports, available to subscribers of the $200 ChatGPT Pro tier and limited to 100 queries/month