News from the AI & ML world

DeeperML - #voiceai

Pierluigi Paganini@securityaffairs.com //
OpenAI is facing scrutiny over its ChatGPT user logs due to a recent court order mandating the indefinite retention of all chat data, including deleted conversations. This directive stems from a lawsuit filed by The New York Times and other news organizations, who allege that ChatGPT has been used to generate copyrighted news articles. The plaintiffs believe that even deleted chats could contain evidence of infringing outputs. OpenAI, while complying with the order, is appealing the decision, citing concerns about user privacy and potential conflicts with data privacy regulations like the EU's GDPR. The company emphasizes that this retention policy does not affect ChatGPT Enterprise or ChatGPT Edu customers, nor users with a Zero Data Retention agreement.

Sam Altman, CEO of OpenAI, has advocated for what he terms "AI privilege," suggesting that interactions with AI should be afforded the same privacy protections as communications with professionals like lawyers or doctors. This stance comes as OpenAI faces criticism for not disclosing to users that deleted and temporary chat logs were being preserved since mid-May in response to the court order. Altman argues that retaining user chats compromises their privacy, which OpenAI considers a core principle. He fears that this legal precedent could lead to a future where all AI conversations are recorded and accessible, potentially chilling free expression and innovation.

In addition to privacy concerns, OpenAI has identified and addressed malicious campaigns leveraging ChatGPT for nefarious purposes. These activities include the creation of fake IT worker resumes, the dissemination of misinformation, and assistance in cyber operations. OpenAI has banned accounts linked to ten such campaigns, including those potentially associated with North Korean IT worker schemes, Beijing-backed cyber operatives, and Russian malware distributors. These malicious actors utilized ChatGPT to craft application materials, auto-generate resumes, and even develop multi-stage malware. OpenAI is actively working to combat these abuses and safeguard its platform from being exploited for malicious activities.

Recommended read:
References :
  • chatgptiseatingtheworld.com: After filing an objection with Judge Stein, OpenAI took to the court of public opinion to seek the reversal of Magistrate Judge Wang’s broad order requiring OpenAI to preserve all ChatGPT logs of people’s chats.
  • Reclaim The Net: Private prompts once thought ephemeral could now live forever, thanks for demands from the New York Times.
  • Digital Information World: If you’ve ever used ChatGPT’s temporary chat feature thinking your conversation would vanish after closing the window — well, it turns out that wasn’t exactly the case.
  • iHLS: AI Tools Exploited in Covert Influence and Cyber Ops, OpenAI Warns
  • Schneier on Security: Report on the Malicious Uses of AI
  • The Register - Security: OpenAI boots accounts linked to 10 malicious campaigns
  • Jon Greig: Russians are using ChatGPT to incrementally improve malware. Chinese groups are using it to mass create fake social media comments. North Koreans are using it to refine fake resumes is likely only catching a fraction of nation-state use
  • Jon Greig: Russians are using ChatGPT to incrementally improve malware. Chinese groups are using it to mass create fake social media comments. North Koreans are using it to refine fake resumes is likely only catching a fraction of nation-state use
  • www.zdnet.com: How global threat actors are weaponizing AI now, according to OpenAI
  • thehackernews.com: OpenAI has revealed that it banned a set of ChatGPT accounts that were likely operated by Russian-speaking threat actors and two Chinese nation-state hacking groups to assist with malware development, social media automation, and research about U.S. satellite communications technologies, among other things.
  • securityaffairs.com: OpenAI bans ChatGPT accounts linked to Russian, Chinese cyber ops
  • therecord.media: Russians are using ChatGPT to incrementally improve malware. Chinese groups are using it to mass create fake social media comments. North Koreans are using it to refine fake resumes is likely only catching a fraction of nation-state use
  • Tech Monitor: OpenAI highlights exploitative use of ChatGPT by Chinese entities
  • siliconangle.com: OpenAI to retain deleted ChatGPT conversations following court order
  • eWEEK: ‘An Inappropriate Request’: OpenAI Appeals ChatGPT Data Retention Court Order in NYT Case
  • gbhackers.com: OpenAI Shuts Down ChatGPT Accounts Linked to Russian, Iranian & Chinese Cyber
  • Policy ? Ars Technica: OpenAI is retaining all ChatGPT logs “indefinitely.†Here’s who’s affected.
  • AI News | VentureBeat: Sam Altman calls for ‘AI privilege’ as OpenAI clarifies court order to retain temporary and deleted ChatGPT sessions
  • www.techradar.com: Sam Altman says AI chats should be as private as ‘talking to a lawyer or a doctor’, but OpenAI could soon be forced to keep your ChatGPT conversations forever
  • aithority.com: New Relic Report Shows OpenAI’s ChatGPT Dominates Among AI Developers
  • the-decoder.com: ChatGPT scams range from silly money-making ploys to calculated political meddling
  • hackread.com: OpenAI, a leading artificial intelligence company, has revealed it is actively fighting widespread misuse of its AI tools…
  • Metacurity: OpenAI banned ChatGPT accounts tied to Russian and Chinese hackers using the tool for malware, social media abuse, and U.S.

Carl Franzen@AI News | VentureBeat //
ElevenLabs has launched Conversational AI 2.0, a significant upgrade to its platform designed for building advanced voice agents for enterprise use. The new system allows agents to handle both speech and text simultaneously, enabling more fluid and natural interactions. This update introduces features aimed at creating more intelligent and secure conversations, making it suitable for applications like customer support, call centers, and outbound sales and marketing. According to Jozef Marko from ElevenLabs, Conversational AI 2.0 sets a new standard for voice-driven experiences.

One key highlight of Conversational AI 2.0 is its advanced turn-taking model. This technology analyzes conversational cues in real-time, such as hesitations and filler words like "um" and "ah", to determine when the agent should speak or listen. This eliminates awkward pauses and interruptions, creating a more natural flow. The platform also features integrated language detection, enabling seamless multilingual discussions without manual configuration. This allows the agent to recognize the language spoken by the user and respond accordingly, catering to global enterprises and fostering more inclusive experiences.

In related news, Anthropic is rolling out voice mode for its Claude apps, utilizing ElevenLabs for speech generation. While currently only available in English, this feature allows users to engage in spoken conversations with Claude, enhancing accessibility and convenience. The voice conversations count toward regular usage limits based on subscription plans, with varying limits for free and paid users. This integration marks a significant step in making AI more conversational and user-friendly, leveraging ElevenLabs' technology to power its speech capabilities.

Recommended read:
References :
  • the-decoder.com: Elevenlabs has released Conversational AI 2.0, an updated system that allows its agents to handle speech and text simultaneously for more fluid interactions.
  • AI News | VentureBeat: With Conversational AI 2.0, ElevenLabs aims to provide tools and infrastructure for truly intelligent, context-aware enterprise voice agents.
  • THE DECODER: Article about ElevenLabs' new AI voice system which enables smoother interactions.
  • www.producthunt.com: Product Hunt post on Conversational AI 2.0.

Megan Crouse@techrepublic.com //
OpenAI has unveiled a suite of advancements, including enhanced audio models and a significantly more expensive AI reasoning model called o1 Pro. The new audio models, including gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved transcription capabilities compared to Whisper, although they are susceptible to prompt injection attacks due to their foundation on language models. Users can access these models via the Realtime API, enabling real-time transcription from microphone input using a standalone Python script.

OpenAI's o1 Pro comes with a steep price tag of $150 per million input tokens and $600 per million output tokens. This makes it ten times more expensive than the standard o1 model and twice as costly as GPT-4.5. While OpenAI claims o1 Pro "thinks harder" and delivers superior responses for complex reasoning tasks, early benchmarks suggest only incremental improvements. Access to o1 Pro is currently limited to developers who have spent at least $5 on OpenAI's API services, targeting users building AI agents and automation tools.

Recommended read:
References :
  • Fello AI: OpenAI Just Dropped Its Most Expensive AI Model Yet, And It Costs a Fortune
  • www.techrepublic.com: OpenAI Gives Its Agents a Voice – Now a ‘Medieval Knight’ Can Read Your Work Emails
  • AI News | VentureBeat: Describes OpenAI’s new voice AI model gpt-4o-transcribe and its ability to add speech to existing text apps.
  • MarkTechPost: Explains the release of advanced audio models gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe by OpenAI.
  • THE DECODER: OpenAI releases new AI voice models with customizable speaking styles
  • Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
  • www.producthunt.com: OpenAI GPT-4o Audio Models
  • Analytics Vidhya: OpenAI’s Audio Models: How to Access, Features, Applications, and More

Keshav Kumaresan@DagsHub Blog //
AI is making waves in unexpected areas. A recent study has found that AI-generated memes are, on average, funnier and more shareable than those created solely by humans. Researchers from KTH Royal Institute of Technology, LMU Munich, and TU Darmstadt, discovered that memes crafted entirely by OpenAI's GPT-4 scored higher in humor, creativity, and shareability. However, human-created memes still hold the crown for the absolute funniest individual examples, showcasing the unique personal touch humans bring to humor.

The Cognitive Revolution podcast recently featured Andreessen Horowitz partners Olivia Moore and Anish Acharya discussing the rapid advancements in voice AI. The discussion explored how the latest improvements are enabling more natural voice interactions across various platforms. Businesses are already utilizing voice AI for tasks ranging from complex negotiations to after-hours customer support.

Recommended read:
References :
  • DagsHub Blog: Let's look deeper into the nuances of video segmentation and discuss different segmentation methods, challenges, and the potential future of the field.
  • The Cognitive Revolution: In this episode of The Cognitive Revolution, host Nathan Labenz speaks with Andreessen Horowitz partners Olivia Moore and Anish Acharya about the rapid evolution of voice AI technology and its real-world applications.
  • eWEEK: A new study reveals AI-generated memes are funnier than human-made ones on average, but the best memes still come from us. Is AI the future of internet humor?

Chris McKay@Maginative //
OpenAI has recently unveiled new audio models based on GPT-4o, significantly enhancing its text-to-speech and speech-to-text capabilities. These new tools are intended to give AI agents a voice, enabling a range of applications, with demonstrations including the ability for an AI to read emails in character. The announcement includes the introduction of new transcription models, specifically gpt-4o-transcribe and gpt-4o-mini-transcribe, which are designed to outperform the existing Whisper model.

The text-to-speech and speech-to-text tools are based on GPT-4o. While these models show promise, some experts have noted potential vulnerabilities. Like other large language model (LLM)-driven multi-modal models, they appear susceptible to prompt-injection-adjacent issues, stemming from the mixing of instructions and data within the same token stream. OpenAI hinted it may take a similar path with video.

Recommended read:
References :
  • AI News | VentureBeat: OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
  • Analytics Vidhya: OpenAI’s Audio Models: How to Access, Features, Applications, and More
  • Maginative: OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
  • bsky.app: I published some notes on OpenAI's new text-to-speech and speech-to-text models.
  • Samrat Man Singh: OpenAI announced some new audio models yesterday, including new transcription models( gpt-4o-transcribe and gpt-4o-mini-transcribe ).
  • www.techrepublic.com: The text-to-speech and speech-to-text tools are all based on GPT-4o. OpenAI hinted it may take a similar path with video.
  • MarkTechPost: Reports on OpenAI introducing advanced audio models.
  • Simon Willison's Weblog: OpenAI announced today, for both text-to-speech and speech-to-text. They're very promising new models, but they appear to suffer from the ever-present risk of accidental (or malicious) instruction following.
  • THE DECODER: OpenAI has released a new generation of audio models that let developers customize how their AI assistants speak.
  • venturebeat.com: DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI
  • Last Week in AI: #204 - OpenAI Audio, Rubin GPUs, MCP, Zochi