@www.marktechpost.com
The Allen Institute for AI (Ai2) has launched OLMoTrace, an open-source tool designed to bring transparency to large language models (LLMs). OLMoTrace traces LLM outputs back to the original training data in real time, addressing a significant barrier to enterprise AI adoption: the difficulty of understanding how these systems arrive at their decisions. The tool is integrated into the Ai2 Playground, where users can experiment with the recently released OLMo 2 32B model and explore the connections between its outputs and the datasets it was trained on.
OLMoTrace differs from existing methods such as retrieval-augmented generation (RAG) and confidence scores by linking directly to the source material used in training. RAG augments generation with external sources at query time; OLMoTrace instead traces outputs back to the data the model was trained on, offering a glimpse into how the model learned specific information. The tool searches the training corpus for verbatim matches of long, unique word sequences in the model's output, weighing token rarity so that particularly specific passages stand out. It highlights the matched text and links to the original documents, presenting up to ten relevant documents per word sequence and merging overlapping sequences for a clean display. This approach has already produced insights, such as tracing incorrect statements about a model's knowledge cutoff to examples in its fine-tuning data. With OLMoTrace, Ai2 aims to decode language model behavior, fostering trust and enabling a deeper understanding of AI decision-making.
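The matching steps described above can be illustrated with a small sketch. This is a naive toy version for intuition only, not Ai2's implementation: the function names, the whitespace tokenizer, the unigram rarity score, and the `min_len` threshold are all assumptions, and a brute-force scan like this would not scale to a real training corpus.

```python
from collections import Counter
import math

MAX_DOCS_PER_SPAN = 10  # the article says up to ten documents are shown per sequence


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for a real tokenizer.
    return text.lower().split()


def contains(doc_tokens, span_tokens):
    # True if span_tokens occurs verbatim (contiguously) in doc_tokens.
    n = len(span_tokens)
    return any(doc_tokens[i:i + n] == span_tokens
               for i in range(len(doc_tokens) - n + 1))


def find_spans(output_tokens, corpus_tokens, min_len=3):
    # For each start position, keep the longest span found verbatim in some document.
    spans = []
    for start in range(len(output_tokens)):
        best = None
        for end in range(start + min_len, len(output_tokens) + 1):
            span = output_tokens[start:end]
            doc_ids = [d for d, toks in enumerate(corpus_tokens) if contains(toks, span)]
            if not doc_ids:
                break  # a longer span from this start cannot match either
            best = (start, end, doc_ids[:MAX_DOCS_PER_SPAN])
        if best:
            spans.append(best)
    return spans


def merge_overlapping(spans):
    # Merge spans that overlap in the output so each region is displayed once.
    merged = []
    for start, end, doc_ids in sorted(spans, key=lambda s: s[0]):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_docs = merged[-1]
            docs = list(dict.fromkeys(prev_docs + doc_ids))[:MAX_DOCS_PER_SPAN]
            merged[-1] = (prev_start, max(prev_end, end), docs)
        else:
            merged.append((start, end, doc_ids))
    return merged


def rarity(span_tokens, counts, total):
    # Assumed score: sum of negative log unigram probabilities (rarer = higher).
    return sum(-math.log(counts[t] / total) for t in span_tokens)


def trace(output, corpus, min_len=3):
    corpus_tokens = [tokenize(doc) for doc in corpus]
    counts = Counter(t for toks in corpus_tokens for t in toks)
    total = sum(counts.values())
    out = tokenize(output)
    merged = merge_overlapping(find_spans(out, corpus_tokens, min_len))
    results = [
        {"text": " ".join(out[s:e]), "doc_ids": docs,
         "rarity": rarity(out[s:e], counts, total)}
        for s, e, docs in merged
    ]
    # Most specific (rarest) passages first.
    return sorted(results, key=lambda r: r["rarity"], reverse=True)


if __name__ == "__main__":
    toy_corpus = [
        "the knowledge cutoff of this model is august 2023",
        "paris is the capital of france",
    ]
    toy_output = "My knowledge cutoff of this model is unclear but paris is the capital of italy"
    for hit in trace(toy_output, toy_corpus):
        print(hit["text"], hit["doc_ids"], round(hit["rarity"], 2))
```

The sketch keeps the three ideas the article names: verbatim span matching, a rarity weighting to surface specific passages, and merging of overlapping spans with a cap of ten documents each. A production system would need an index over the corpus rather than a linear scan.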
The Allen Institute for AI (Ai2) has launched OLMoTrace, an open-source tool designed to bring a new level of transparency to large language models (LLMs). The application lets users trace a model's outputs back to their original training data. This traceability matters for governance, regulation, and auditing, and it directly addresses concerns about the lack of transparency in AI decision-making.
The tool is available for use with Ai2’s flagship model, OLMo 2 32B, as well as the entire OLMo family and custom fine-tuned models. OLMoTrace works by identifying long, unique text sequences in model outputs and matching them with specific documents from the training corpus. The system highlights the relevant text and provides links to the original source material, allowing users to understand how the model learned the information it uses. According to Jiacheng Liu, lead researcher for OLMoTrace, the tool marks a pivotal step forward for AI development, laying the foundation for more transparent AI systems. By offering greater insight into how AI models generate their responses, it helps users check that the data supporting those outputs is trustworthy and verifiable. The system supports OLMo models including OLMo-2-32B-Instruct and leverages their full training data: over 4.6 trillion tokens across 3.2 billion documents.