Nvidia Open Sources Transcription AI Model Parakeet on Hugging Face

@venturebeat.com //

Nvidia has released Parakeet-TDT-0.6B-V2, a fully open-source automatic speech recognition (ASR) model, on Hugging Face. The 600-million-parameter model quickly topped the Hugging Face Open ASR Leaderboard with a word error rate (WER) of just 6.05%, or roughly six word-level errors per 100 reference words. That accuracy puts it close to proprietary transcription models such as OpenAI’s GPT-4o-transcribe and ElevenLabs Scribe, a significant advance for open-source speech AI. Parakeet is released under the commercially permissive CC-BY-4.0 license.
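For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal illustration of that convention, not the leaderboard's scoring code: the function name `word_error_rate` is hypothetical, and it assumes simple whitespace tokenization with no text normalization, which real evaluations typically apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization, no casing or punctuation
    normalization. Real benchmarks usually normalize text before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 6.05% means about 6 errors for every 100 words in the reference transcripts.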

Speed is the model's standout feature. According to Hugging Face’s Vaibhav Srivastav, it can "transcribe 60 minutes of audio in 1 second." Nvidia reports a real-time factor of 3386 (an inverse real-time factor, often written RTFx), meaning the model processes audio 3386 times faster than real time on Nvidia's GPU-accelerated hardware, which works out to roughly one second of compute per hour of audio. The speed is attributed to its transformer-based architecture, fine-tuned on high-quality transcription data and optimized for inference on Nvidia hardware with TensorRT and FP8 quantization. The model also supports punctuation, capitalization, and detailed word-level timestamps.
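The "60 minutes in 1 second" quote and the reported factor of 3386 can be checked against each other with simple arithmetic: processing time is the audio duration divided by the speed-up factor. The snippet below is only a back-of-the-envelope sketch; the helper name `processing_seconds` is hypothetical, and it assumes the reported factor applies uniformly, which in practice depends on hardware and batch size.

```python
def processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Estimated compute time for a clip, given a 'times faster than
    real time' factor. Assumes the factor holds uniformly."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    one_hour = 60 * 60  # seconds of audio
    estimate = processing_seconds(one_hour, 3386)
    print(f"{estimate:.2f} s")  # prints "1.06 s"
```

At the reported factor, an hour of audio takes slightly over one second, consistent with the quoted claim.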

Parakeet-TDT-0.6B-V2 targets developers, researchers, and industry teams building transcription services, voice assistants, subtitle generators, and conversational AI platforms. Its open availability and performance make it an attractive option for commercial enterprises and indie developers who want to build speech recognition and transcription into their applications. Released on May 1, 2025, Parakeet is positioned to have a considerable impact on the field of speech AI.

Original img attribution: https://venturebeat.com/wp-content/uploads/2025/05/cfr0z3n_cybernetic_parakeet_-chaos_25_-ar_169_-profile_mw9xg_ab003f9b-d14b-4bdf-ae43-ee11fb934b1a.png?w=1024?w=1200&strip=all

ImgSrc: venturebeat.com

References:

Techmeme: Nvidia launches open-source transcription model Parakeet-TDT-0.6B-V2, topping the Hugging Face Open ASR Leaderboard with a word error rate of 6.05% (Carl Franzen/VentureBeat)
venturebeat.com: An attractive proposition for commercial enterprises and indie developers looking to build speech recognition and transcription services...
www.marktechpost.com: NVIDIA Open Sources Parakeet TDT 0.6B: Achieving a New Standard for Automatic Speech Recognition ASR and Transcribes an Hour of Audio in One Second
AI News | VentureBeat: Reports Nvidia launches fully open source transcription AI model Parakeet-TDT-0.6B-V2 on Hugging Face
MarkTechPost: Reports NVIDIA Open Sources Parakeet TDT 0.6B: Achieving a New Standard for Automatic Speech Recognition ASR and Transcribes an Hour of Audio in One Second
www.eweek.com: NVIDIA’s AI Transcription Tool Produces 60 Minutes of Text in 1 Second
eWEEK: NVIDIA has released a new version of its Parakeet transcription tool, boasting the lowest error rate of any of its competitors. In addition, the company made the code public on GitHub. Parakeet TDT 0.6B is a 600-million-parameter automatic speech recognition model. It can transcribe 60 minutes of audio per second, Hugging Face data scientist Vaibhav […]

Classification:

HashTags: #OpenSourceAI #SpeechRecognition #NvidiaParakeet
Company: Nvidia
Target: Developers
Product: Parakeet
Feature: Speech Recognition
Model: Parakeet-TDT-0.6B-V2
Type: AI
Severity: Informative

News from the AI & ML world

DeeperML
