News from the AI & ML world

DeeperML - #voiceai

Megan Crouse@techrepublic.com //
OpenAI has unveiled a suite of advancements, including enhanced audio models and a significantly more expensive AI reasoning model called o1 Pro. The new audio models, including gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved transcription capabilities compared to Whisper, although they are susceptible to prompt injection attacks due to their foundation on language models. Users can access these models via the Realtime API, enabling real-time transcription from microphone input using a standalone Python script.

OpenAI's o1 Pro comes with a steep price tag of $150 per million input tokens and $600 per million output tokens. This makes it ten times more expensive than the standard o1 model and twice as costly as GPT-4.5. While OpenAI claims o1 Pro "thinks harder" and delivers superior responses for complex reasoning tasks, early benchmarks suggest only incremental improvements. Access to o1 Pro is currently limited to developers who have spent at least $5 on OpenAI's API services, targeting users building AI agents and automation tools.

Recommended read:
References :
  • Fello AI: OpenAI Just Dropped Its Most Expensive AI Model Yet, And It Costs a Fortune
  • www.techrepublic.com: OpenAI Gives Its Agents a Voice – Now a ‘Medieval Knight’ Can Read Your Work Emails
  • AI News | VentureBeat: Describes OpenAI’s new voice AI model gpt-4o-transcribe and its ability to add speech to existing text apps.
  • MarkTechPost: Explains the release of advanced audio models gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe by OpenAI.
  • THE DECODER: OpenAI releases new AI voice models with customizable speaking styles
  • Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
  • www.producthunt.com: OpenAI GPT-4o Audio Models
  • Analytics Vidhya: OpenAI’s Audio Models: How to Access, Features, Applications, and More

Keshav Kumaresan@DagsHub Blog //
AI is making waves in unexpected areas. A recent study has found that AI-generated memes are, on average, funnier and more shareable than those created solely by humans. Researchers from KTH Royal Institute of Technology, LMU Munich, and TU Darmstadt, discovered that memes crafted entirely by OpenAI's GPT-4 scored higher in humor, creativity, and shareability. However, human-created memes still hold the crown for the absolute funniest individual examples, showcasing the unique personal touch humans bring to humor.

The Cognitive Revolution podcast recently featured Andreessen Horowitz partners Olivia Moore and Anish Acharya discussing the rapid advancements in voice AI. The discussion explored how the latest improvements are enabling more natural voice interactions across various platforms. Businesses are already utilizing voice AI for tasks ranging from complex negotiations to after-hours customer support.

Recommended read:
References :
  • DagsHub Blog: Let's look deeper into the nuances of video segmentation and discuss different segmentation methods, challenges, and the potential future of the field.
  • The Cognitive Revolution: In this episode of The Cognitive Revolution, host Nathan Labenz speaks with Andreessen Horowitz partners Olivia Moore and Anish Acharya about the rapid evolution of voice AI technology and its real-world applications.
  • eWEEK: A new study reveals AI-generated memes are funnier than human-made ones on average, but the best memes still come from us. Is AI the future of internet humor?

Chris McKay@Maginative //
OpenAI has recently unveiled new audio models based on GPT-4o, significantly enhancing its text-to-speech and speech-to-text capabilities. These new tools are intended to give AI agents a voice, enabling a range of applications, with demonstrations including the ability for an AI to read emails in character. The announcement includes the introduction of new transcription models, specifically gpt-4o-transcribe and gpt-4o-mini-transcribe, which are designed to outperform the existing Whisper model.

The text-to-speech and speech-to-text tools are based on GPT-4o. While these models show promise, some experts have noted potential vulnerabilities. Like other large language model (LLM)-driven multi-modal models, they appear susceptible to prompt-injection-adjacent issues, stemming from the mixing of instructions and data within the same token stream. OpenAI hinted it may take a similar path with video.

Recommended read:
References :
  • AI News | VentureBeat: OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
  • Analytics Vidhya: OpenAI’s Audio Models: How to Access, Features, Applications, and More
  • Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
  • bsky.app: I published some notes on OpenAI's new text-to-speech and speech-to-text models.
  • Samrat Man Singh: OpenAI announced some new audio models yesterday, including new transcription models( gpt-4o-transcribe and gpt-4o-mini-transcribe ).
  • www.techrepublic.com: The text-to-speech and speech-to-text tools are all based on GPT-4o. OpenAI hinted it may take a similar path with video.
  • MarkTechPost: Reports on OpenAI introducing advanced audio models.
  • Simon Willison's Weblog: OpenAI announced today, for both text-to-speech and speech-to-text. They're very promising new models, but they appear to suffer from the ever-present risk of accidental (or malicious) instruction following.
  • THE DECODER: OpenAI has released a new generation of audio models that let developers customize how their AI assistants speak.
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • Last Week in AI: #204 - OpenAI Audio, Rubin GPUs, MCP, Zochi

Andrew Liszewski@The Verge //
References: The Verge , bsky.app , bsky.app ...
Amazon has announced Alexa+, a new, LLM-powered version of its popular voice assistant. This upgraded version will cost $19.99 per month, but will be included at no extra cost for Amazon Prime subscribers. Alexa+ boasts enhanced AI agent capabilities, enabling users to perform tasks like booking Ubers, creating study plans, and sending texts via voice command. These new features are intended to provide a more seamless and natural conversational experience. Early access to Alexa+ will begin in late March 2025 for customers with eligible Echo Show devices in the United States.

Amazon emphasizes that Alexa+ utilizes a "model agnostic" system, drawing on Amazon Bedrock and employing various AI models, including Amazon Nova and those from Anthropic, to optimize performance. This approach allows Alexa+ to choose the best model for each task, leveraging specialized "experts" for orchestrating services. With seamless integration into tens of thousands of devices and services, including news sources like Time, Reuters, and the Associated Press, Alexa+ provides accurate and real-time information.

Recommended read:
References :
  • The Verge: Alexa Plus’ AI upgrades cost $19.99, but it’s all free with Prime
  • bsky.app: Amazon has announced Alexa+, an LLM-powered version if Alexa that will cost $19.99 per month or free with Prime. It will provide typical AI agent capabilities like booking Ubers, creating study plans or texting a friend, all via voice command. Siri is now dead last when it comes to AI assistants. https://apnews.com/article/amazon-alexa-fee-ai-assistant-017c17bddfa6742d1e78873cdda3663f#
  • THE DECODER: Alexa+: Amazon's new AI assistant launches for $19.99, free with Prime
  • bsky.app: Amazon has announced Alexa+, an LLM-powered version if Alexa that will cost $19.99 per month or free with Prime.
  • Techstrong.ai: Amazon’s New Alexa+ Is GenAI-Powered
  • Dataconomy: Amazon revamps Alexa.com and updates its app
  • techcrunch.com: Amazon Alexa+ can do your grocery shopping, too
  • PCWorld: A new Alexa AI is coming: What it will cost and when you can try it
  • Dataconomy: Amazon unveils AI-powered Alexa Plus
  • PCMag Middle East ai: Amazon showed off the upgraded Alexa+ at a press event in New York City, revealing it can choose from a whole collection of generative AI models to fulfill your requests in a more conversational way. We got to check her out in action.
  • Shelly Palmer: Amazon Unveils Alexa+
  • PCMag Middle East ai: Amazon's AI-enhanced Alexa+ will be coming to virtually all Echo devices made in the last five years. That doesn't bode well if you were hoping for new models.
  • Maginative: Amazon Unveils Alexa+: A Smarter, More Conversational AI Assistant
  • AI News | VentureBeat: Rebuilding Alexa: How Amazon is mixing models, agents and browser-use for smarter AI
  • techcrunch.com: Amazon’s new and improved Alexa experience, Alexa+, starts at $19.99 per month, or free for Amazon Prime subscribers.
  • SiliconANGLE: Amazon debuts LLM-powered Alexa+ with expanded automation features
  • Play HT: Overview of the features and capabilities of Amazon's new AI-powered Alexa+ service.

Andrew Liszewski@The Verge //
References: bsky.app , Play HT , THE DECODER ...
Amazon has unveiled Alexa+, a generative AI-powered upgrade to its digital assistant, Alexa. This reboot includes a monthly subscription fee, marking a significant shift for the service. The new AI assistant was revealed at a news conference in New York, with Amazon showcasing its enhanced capabilities.

Alexa+ is scheduled to roll out in March 2025 for $20 per month, but it will be available for free to Amazon Prime subscribers. The AI assistant will work on "almost every" Alexa device the company has shipped. The service promises advanced features such as booking concert tickets, making dinner reservations, and organizing information from handwritten documents.

Recommended read:
References :
  • bsky.app: Details on new features and cost of Alexa+
  • Play HT: Information on how conversational agents can improve customer experience
  • Techstrong.ai: Details about Amazon's new Alexa+ GenAI-powered assistant.
  • THE DECODER: Alexa+: Amazon's new AI assistant launches for $19.99, free with Prime
  • Shelly Palmer: In a "better late than never" moment, Amazon has unveiled Alexa+: an advanced version of its voice assistant that integrates generative AI to enhance user interactions. Priced at $19.99 per month (but free for Amazon Prime members), the service is set to launch with early access in the U.S. next month, initially available on Echo Show devices, with plans for broader international and device expansion.
  • PCMag Middle East ai: Will Amazon Prime Get More Expensive When Alexa+ Arrives? Probably. - The price of a Prime membership has been static since 2022. Adding a free, next-gen Alexa+ AI assistant to its list of perks is likely too good to be true.
  • The Verge: Amazon has finally taken the wraps off its AI-enhanced version of Alexa, called Alexa Plus.
  • AI News | VentureBeat: VentureBeat reports Rebuilding Alexa: How Amazon is mixing models, agents and browser-use for smarter AI
  • PCMag Middle East ai: Meet the New Alexa+, She's Way Smarter, and a Lot Less Stiff
  • Maginative: Maginative covers Amazon Unveils Alexa+: A Smarter, More Conversational AI Assistant
  • The Verge: The Verge reports on Amazon's reinvention of Alexa with Alexa Plus.