Megan Crouse@techrepublic.com
//
OpenAI has unveiled a suite of advancements, including enhanced audio models and a significantly more expensive AI reasoning model called o1 Pro. The new audio models, including gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved transcription capabilities compared to Whisper, although they are susceptible to prompt injection attacks due to their foundation on language models. Users can access these models via the Realtime API, enabling real-time transcription from microphone input using a standalone Python script.
OpenAI's o1 Pro comes with a steep price tag of $150 per million input tokens and $600 per million output tokens. This makes it ten times more expensive than the standard o1 model and twice as costly as GPT-4.5. While OpenAI claims o1 Pro "thinks harder" and delivers superior responses for complex reasoning tasks, early benchmarks suggest only incremental improvements. Access to o1 Pro is currently limited to developers who have spent at least $5 on OpenAI's API services, targeting users building AI agents and automation tools. Recommended read:
References :
Keshav Kumaresan@DagsHub Blog
//
References:
DagsHub Blog
, The Cognitive Revolution
,
AI is making waves in unexpected areas. A recent study has found that AI-generated memes are, on average, funnier and more shareable than those created solely by humans. Researchers from KTH Royal Institute of Technology, LMU Munich, and TU Darmstadt, discovered that memes crafted entirely by OpenAI's GPT-4 scored higher in humor, creativity, and shareability. However, human-created memes still hold the crown for the absolute funniest individual examples, showcasing the unique personal touch humans bring to humor.
The Cognitive Revolution podcast recently featured Andreessen Horowitz partners Olivia Moore and Anish Acharya discussing the rapid advancements in voice AI. The discussion explored how the latest improvements are enabling more natural voice interactions across various platforms. Businesses are already utilizing voice AI for tasks ranging from complex negotiations to after-hours customer support. Recommended read:
References :
Chris McKay@Maginative
//
OpenAI has recently unveiled new audio models based on GPT-4o, significantly enhancing its text-to-speech and speech-to-text capabilities. These new tools are intended to give AI agents a voice, enabling a range of applications, with demonstrations including the ability for an AI to read emails in character. The announcement includes the introduction of new transcription models, specifically gpt-4o-transcribe and gpt-4o-mini-transcribe, which are designed to outperform the existing Whisper model.
The text-to-speech and speech-to-text tools are based on GPT-4o. While these models show promise, some experts have noted potential vulnerabilities. Like other large language model (LLM)-driven multi-modal models, they appear susceptible to prompt-injection-adjacent issues, stemming from the mixing of instructions and data within the same token stream. OpenAI hinted it may take a similar path with video. Recommended read:
References :
Andrew Liszewski@The Verge
//
Amazon has announced Alexa+, a new, LLM-powered version of its popular voice assistant. This upgraded version will cost $19.99 per month, but will be included at no extra cost for Amazon Prime subscribers. Alexa+ boasts enhanced AI agent capabilities, enabling users to perform tasks like booking Ubers, creating study plans, and sending texts via voice command. These new features are intended to provide a more seamless and natural conversational experience. Early access to Alexa+ will begin in late March 2025 for customers with eligible Echo Show devices in the United States.
Amazon emphasizes that Alexa+ utilizes a "model agnostic" system, drawing on Amazon Bedrock and employing various AI models, including Amazon Nova and those from Anthropic, to optimize performance. This approach allows Alexa+ to choose the best model for each task, leveraging specialized "experts" for orchestrating services. With seamless integration into tens of thousands of devices and services, including news sources like Time, Reuters, and the Associated Press, Alexa+ provides accurate and real-time information. Recommended read:
References :
Andrew Liszewski@The Verge
//
Amazon has unveiled Alexa+, a generative AI-powered upgrade to its digital assistant, Alexa. This reboot includes a monthly subscription fee, marking a significant shift for the service. The new AI assistant was revealed at a news conference in New York, with Amazon showcasing its enhanced capabilities.
Alexa+ is scheduled to roll out in March 2025 for $20 per month, but it will be available for free to Amazon Prime subscribers. The AI assistant will work on "almost every" Alexa device the company has shipped. The service promises advanced features such as booking concert tickets, making dinner reservations, and organizing information from handwritten documents. Recommended read:
References :
|
BenchmarksBlogsResearch Tools |