OpenAI Releases New Audio Features and Agent Tools

Jesus Rodriguez@TheSequence //

OpenAI has launched new audio models and tools aimed at enhancing the capabilities of AI agents. The releases include updated transcription and text-to-speech models, as well as tools for building AI agents. The transcription models, gpt-4o-transcribe and gpt-4o-mini-transcribe, promise better performance than the previous Whisper models, with lower word error rates across multiple languages and improvements in challenging audio conditions such as varied accents and background noise. Because these models are built on top of language models, they are potentially vulnerable to prompt injection attacks: instructions contained in the content they process may be followed rather than treated purely as data.

OpenAI also unveiled new tools for AI agent development, featuring a Responses API, built-in web search, file search, and computer use functionalities, alongside an open-source Agents SDK. Furthermore, the company introduced o1 Pro, a new model positioned for complex reasoning tasks. It comes at a high cost: $150 per million input tokens and $600 per million output tokens. The gpt-4o-mini-tts text-to-speech model introduces "steerability", allowing developers to control the tone and delivery of the generated speech.
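To put the o1 Pro pricing quoted above in concrete terms, the following is a minimal sketch of a per-request cost estimate. The function name and the example token counts are hypothetical and chosen only for illustration; the two rates are the figures reported in this article and may not reflect current pricing.

```python
# Rates as reported in the article (USD per one million tokens).
# These are assumptions taken from the text, not values read from any API.
INPUT_USD_PER_MILLION = 150
OUTPUT_USD_PER_MILLION = 600


def estimate_o1_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    Hypothetical helper: it applies the two per-million-token rates
    linearly and ignores any discounts, caching, or minimum charges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt that yields a 2,000-token answer.
    cost = estimate_o1_pro_cost(10_000, 2_000)
    print(f"${cost:.2f}")  # prints "$2.70"
```

At these rates, output tokens cost four times as much as input tokens, so long answers are likely to dominate the bill.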

References :

Data Phoenix: OpenAI Launches New Tools for Building AI Agents
Fello AI: OpenAI's new o1 Pro pricing strategy with a substantial markup compared to previous models.
TheSequence: The Sequence Engineering #513: A Deep Dive Into OpenAI's New Tools for Developing AI Agents
AI News | VentureBeat: OpenAI's new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
Windows Copilot News: Canadian Media Outlets Sue OpenAI Over Copyright Infringement
www.techrepublic.com: Have Some Spare Cash? You'll Need it for OpenAI's New API
bsky.app: Discussion of OpenAI's new o1-Pro API pricing and its implications for the AI community.
Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
bsky.app: This blog post discusses OpenAI's new audio models, noting their promising features but also mentioning the issue of mixing instructions and data in the same token stream.
www.techrepublic.com: This article reports on OpenAI's new text-to-speech and speech-to-text tools based on GPT-4o, highlighting their capabilities and potential applications but also mentioning a possible similar path for video.
Analytics Vidhya: OpenAI's Audio Models: How to Access, Features, Applications, and More
MarkTechPost: OpenAI Introduced Advanced Audio Models 'gpt-4o-mini-tts', 'gpt-4o-transcribe', and 'gpt-4o-mini-transcribe': Enhancing Real-Time Speech Synthesis and Transcription Capabilities for Developers
Simon Willison's Weblog: OpenAI announced today, for both text-to-speech and speech-to-text. They're very promising new models, but they appear to suffer from the ever-present risk of accidental (or malicious) instruction following.
THE DECODER: OpenAI releases new AI voice models with customizable speaking styles
Composio: Finally, OpenAI gave in and launched a new agentic framework called Agents SDK.
Last Week in AI: Our 204th episode with a summary and discussion of last week's big AI news! Recorded on 03/21/2025. In this episode: Baidu launched two new multimodal models, Ernie 4.5 and Ernie X1, boasting competitive pricing and capabilities compared to Western counterparts like GPT-4.5 and DeepSeek R1. OpenAI introduced new audio models, including impressive speech-to-text and text-to-speech systems, and added O1 Pro to their developer API at high costs, reflecting efforts for more profitability. Nvidia and Apple announced significant hardware advancements, including Nvidia's future GPU plans and Apple's new Mac Studio offering that can run DeepSeek R1. DeepSeek employees are facing travel restrictions, suggesting China is treating its AI development with increased secrecy and urgency, emphasizing a wartime footing in AI competition.

Classification:

HashTags: #OpenAI #AIModels #SpeechRecognition
Company: OpenAI
Target: AI Developers
Product: GPT-4
Feature: Audio Models
Type: AI
Severity: Informative

News from the AI & ML world

DeeperML
