News from the AI & ML world
Jesus Rodriguez@TheSequence
OpenAI has recently launched new audio features and tools aimed at enhancing the capabilities of AI agents. The releases include updated transcription and text-to-speech models, as well as tools for building AI agents. The audio models, named gpt-4o-transcribe and gpt-4o-mini-transcribe, promise better performance than the previous Whisper models, achieving lower word error rates across multiple languages and demonstrating improvements in challenging audio conditions like varying accents and background noise. These models are built on top of language models, making them potentially vulnerable to prompt injection attacks.
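The word error rate mentioned above is conventionally computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The following is a minimal sketch of that standard metric, not OpenAI's evaluation code; the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are compared after lowercasing and whitespace
    splitting; real benchmarks usually apply heavier text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                 # deletion (reference word missing)
                curr[j - 1] + 1,             # insertion (extra hypothesis word)
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower value means fewer substitutions, insertions, and deletions per reference word, which is the sense in which the new models are reported to improve on Whisper.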
OpenAI also unveiled new tools for AI agent development, featuring a Responses API with built-in web search, file search, and computer use functionality, alongside an open-source Agents SDK. Furthermore, it introduced o1 Pro, a new reasoning model positioned for complex reasoning tasks. The model comes at a high cost: $150 per million input tokens and $600 per million output tokens. The gpt-4o-mini-tts text-to-speech model introduces "steerability", allowing developers to control the tone and delivery of the generated speech.
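To make the o1 Pro pricing concrete, here is a small sketch that turns the quoted per-million-token rates into a per-request cost. Only the two rates come from the text above; the function name `o1_pro_cost_usd` and the example token counts are hypothetical.

```python
from decimal import Decimal

# Rates quoted in the article, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = Decimal("150")
OUTPUT_USD_PER_MILLION = Decimal("600")


def o1_pro_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    million = Decimal(1_000_000)
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / million


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    print(o1_pro_cost_usd(10_000, 2_000))
```

At these rates a request with 10,000 input tokens and 2,000 output tokens costs $1.50 for input plus $1.20 for output, or $2.70 in total.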
References:
- Data Phoenix: OpenAI Launches New Tools for Building AI Agents
- Fello AI: OpenAI's new o1 Pro pricing strategy with a substantial markup compared to previous models.
- TheSequence: The Sequence Engineering #513: A Deep Dive Into OpenAI's New Tools for Developing AI Agents
- AI News | VentureBeat: OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
- Windows Copilot News: Canadian Media Outlets Sue OpenAI Over Copyright Infringement
- www.techrepublic.com: Have Some Spare Cash? You’ll Need it for OpenAI’s New API
- bsky.app: Discussion of OpenAI's new o1-Pro API pricing and its implications for the AI community.
- Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
- bsky.app: This blog post discusses OpenAI's new audio models, noting their promising features but also mentioning the issue of mixing instructions and data in the same token stream.
- www.techrepublic.com: This article reports on OpenAI's new text-to-speech and speech-to-text tools based on GPT-4o, highlighting their capabilities and potential applications but also mentioning a possible similar path for video.
- Analytics Vidhya: OpenAI's Audio Models: How to Access, Features, Applications, and More
- MarkTechPost: OpenAI Introduced Advanced Audio Models ‘gpt-4o-mini-tts’, ‘gpt-4o-transcribe’, and ‘gpt-4o-mini-transcribe’: Enhancing Real-Time Speech Synthesis and Transcription Capabilities for Developers
- Simon Willison's Weblog: OpenAI announced today, for both text-to-speech and speech-to-text. They're very promising new models, but they appear to suffer from the ever-present risk of accidental (or malicious) instruction following.
- THE DECODER: OpenAI releases new AI voice models with customizable speaking styles
- Composio: Finally, OpenAI gave in and launched a new agentic framework called Agents SDK.
- Last Week in AI: Our 204th episode with a summary and discussion of last week's big AI news! Recorded on 03/21/2025. In this episode: Baidu launched two new multimodal models, Ernie 4.5 and Ernie X1, boasting competitive pricing and capabilities compared to Western counterparts like GPT-4.5 and DeepSeek R1. OpenAI introduced new audio models, including impressive speech-to-text and text-to-speech systems, and added O1 Pro to their developer API at high costs, reflecting efforts for more profitability. Nvidia and Apple announced significant hardware advancements, including Nvidia's future GPU plans and Apple's new Mac Studio offering that can run DeepSeek R1. DeepSeek employees are facing travel restrictions, suggesting China is treating its AI development with increased secrecy and urgency, emphasizing a wartime footing in AI competition.
Classification:
- HashTags: #OpenAI #AIModels #SpeechRecognition
- Company: OpenAI
- Target: AI Developers
- Product: GPT-4
- Feature: Audio Models
- Type: AI
- Severity: Informative