News from the AI & ML world
SGLang Team@PyTorch Website
Two recent announcements matter to developers who serve and extend large language models (LLMs): SGLang has joined the PyTorch ecosystem, and Microsoft Research has introduced KBLaM. SGLang is a community-supported framework for efficient, adaptable LLM serving. It co-designs the backend runtime and the frontend language to speed up model interactions and make them more controllable, and it supports a wide range of models, including Llama, Gemma, and Mistral. Its core features are a fast backend runtime that uses RadixAttention for prefix caching (reusing the cached state of a prompt prefix shared across requests), a flexible frontend language for programming LLM applications, and extensive model support.
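To make the idea of prefix caching concrete, here is a minimal sketch in plain Python. It is not SGLang's RadixAttention implementation: the `PrefixCache` class, the `serve` helper, and the one-token-per-node trie are all hypothetical simplifications, and a real serving engine would store attention key-value tensors rather than just counting tokens. The sketch only shows how a second request that shares a prompt prefix with an earlier one can skip recomputing that prefix.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps a token id to the child node reached by that token.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy token-level prefix cache (hypothetical, for illustration only)."""

    def __init__(self):
        self.root = TrieNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record a token sequence so later requests can reuse its prefix."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())


def serve(cache, tokens):
    """Return (reused, computed) token counts for one request."""
    reused = cache.match_prefix(tokens)
    cache.insert(tokens)
    return reused, len(tokens) - reused


cache = PrefixCache()
first = serve(cache, [1, 2, 3, 4])      # nothing cached yet
second = serve(cache, [1, 2, 3, 9, 9])  # shares the prefix [1, 2, 3]
print(first, second)
```

In this toy run the first request computes all four tokens, while the second reuses the three-token shared prefix and computes only the two new tokens. A shared system prompt or few-shot preamble is the typical case where this kind of reuse pays off.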
KBLaM is a new approach to integrating structured knowledge into LLMs without retraining. It encodes knowledge as continuous key-value vector pairs and embeds them in the model’s attention layers through a specialized rectangular attention mechanism. Because the knowledge lives in these pairs rather than in the model weights, it can be updated dynamically and scaled without retraining the LLM. By converting external knowledge bases into a format the LLM can process directly, KBLaM aims to be more efficient and scalable than traditional methods such as fine-tuning and Retrieval-Augmented Generation (RAG).
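The following plain-Python sketch illustrates one way to read "rectangular attention". It assumes that each prompt token attends to every knowledge key-value pair plus the prompt tokens up to and including itself, while the knowledge pairs never attend to each other, so the attention weights form an N by (M + N) rectangle for N prompt tokens and M knowledge pairs. The function name `rectangular_attention`, the tiny vectors, and the single-head layout are assumptions for illustration, not KBLaM's actual code.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rectangular_attention(queries, prompt_keys, prompt_values, kb_keys, kb_values):
    """Attend from each prompt token to all KB pairs plus earlier prompt tokens.

    Assumption: KB pairs act only as keys/values and never as queries,
    so cost grows linearly with the number of KB pairs.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # All M knowledge pairs, then a causal window over the prompt.
        keys = kb_keys + prompt_keys[: i + 1]
        values = kb_values + prompt_values[: i + 1]
        w = softmax([dot(q, k) * scale for k in keys])
        out = [sum(wj * v[d] for wj, v in zip(w, values))
               for d in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy data: 3 knowledge pairs, 2 prompt tokens, dimension 2.
kb_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kb_v = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
p_q = [[1.0, 0.0], [0.0, 1.0]]
p_k = [[0.2, 0.1], [0.3, 0.4]]
p_v = [[1.0, 1.0], [2.0, 2.0]]

outs, attn = rectangular_attention(p_q, p_k, p_v, kb_k, kb_v)
print([len(row) for row in attn])
```

Under this reading, adding or replacing a fact means adding or replacing one key-value pair in `kb_k` and `kb_v`; the model weights are untouched, which matches the article's claim that knowledge can be updated without retraining.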
References:
- PyTorch Website: SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine
- Source: AI innovation requires AI security: Hear what’s new at Microsoft Secure
- Microsoft Research: Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs
Classification:
- HashTags: #PyTorch #LLMserving #AISecurity
- Company: Microsoft
- Target: AI Developers
- Product: PyTorch
- Feature: LLM Serving Engine
- Type: AI
- Severity: Informative