News from the AI & ML world
Michael Nuñez@AI News | VentureBeat
Meta has officially launched its Llama API, marking a significant entry into the commercial AI services market. The API lets developers easily explore and fine-tune artificial intelligence models, with inference speeds up to 18 times faster than traditional GPU-based solutions. Central to that speed is Meta's partnership with Cerebras, which enables processing rates of 2,648 tokens per second for Llama 4. The partnership transforms Meta's popular open-source Llama models into a commercial service and positions Meta to compete directly with OpenAI, Anthropic, and Google.
The Llama API leverages Cerebras' specialized AI chips to deliver these unprecedented speed increases; Cerebras' system outperforms competitors such as SambaNova and Groq, as well as traditional GPU-based services. The API also ships with a lightweight software development kit (SDK) that is compatible with OpenAI's, streamlining the process of porting existing OpenAI-based applications. As part of the launch, Meta is additionally providing partners with tools to detect and prevent threats such as AI-generated phishing attacks and other forms of online fraud.
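Because the SDK follows OpenAI's conventions, switching an existing application over should amount to changing the base URL, API key, and model name. The sketch below illustrates that pattern against a generic OpenAI-style chat-completions endpoint; the base URL and model identifier are illustrative assumptions, not documented values from Meta.

```python
# Minimal sketch of calling an OpenAI-compatible chat endpoint, such as
# the one the Llama API exposes. BASE_URL and the model name are
# hypothetical placeholders, not real documented values.
import json
import urllib.request

BASE_URL = "https://api.llama.example/v1"  # hypothetical endpoint


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(api_key: str, model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("YOUR_API_KEY", "llama-4", "Hello"))
```

An application already built on the OpenAI client library could instead keep that library and point its `base_url` at the new endpoint, which is the usual migration path for OpenAI-compatible services.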
Furthermore, Meta is working with Groq to accelerate the official Llama API, serving it on the world’s most efficient inference chip. Meta is also launching the Meta AI app, its first dedicated app for its AI assistant, powered by Llama 4. The app will include a discover feed so people can share and explore how others are using the AI. It is intended to provide a more personalized experience by adapting to user preferences and maintaining context across conversations. The Meta AI app represents a new way for users to interact with Meta's AI assistant beyond existing integrations with WhatsApp, Instagram, Facebook, and Messenger.
ImgSrc: venturebeat.com
References :
- Groq: Blog post about the official Llama API and its acceleration by Groq
- venturebeat.com: Article about Meta's new Llama API and its partnership with Cerebras
- techstrong.ai: Meta is making available in limited preview an application programming interface (API) and associated tools for open source Llama models that promises to make it simpler for developers to explore and fine tune artificial intelligence (AI) models.
- AI News | VentureBeat: Meta's Llama API offers developers access to inference speeds up to 18 times faster than traditional GPU-based solutions.
- THE DECODER: Meta launches AI assistant app and Llama API platform
- techstrong.ai: Meta today at LlamaCon 2025 announced it is making available in limited preview an application programming interface (API) and associated tools for open source Llama models that promises to make it simpler for developers to explore and fine tune artificial intelligence (AI) models. In addition, Meta is making available a set of AI protection tools, [...]
- the-decoder.com: Meta unveiled several new AI initiatives at its first LlamaCon developer conference, headlined by a standalone AI assistant app and a comprehensive API for its Llama language models.
- AWS News Blog: Llama 4 models from Meta now available in Amazon Bedrock serverless
Classification:
- HashTags: #Meta #LlamaAPI #AI
- Company: Meta
- Target: AI Services Market
- Product: Llama
- Feature: Llama API
- Type: AI
- Severity: Major