News from the AI & ML world
@www.marktechpost.com
OpenAI has launched the Evals API, a new tool designed to streamline the evaluation of large language models (LLMs) for developers and teams. The Evals API introduces programmatic evaluation capabilities, allowing developers to define tests, automate evaluation runs, and iterate on prompts directly from their workflows. Previously, evaluations were accessible only through the OpenAI dashboard, but the new API enables a more integrated and systematic approach to assessing model performance.
The Evals API aims to address the challenges of manually evaluating LLM performance, which can be time-consuming, especially when scaling applications across diverse domains. By providing a systematic approach, OpenAI hopes to improve custom test case assessments, measure improvements across prompt iterations, and automate quality assurance in development pipelines. This will enable developers to treat evaluation as a core part of their development cycle, similar to unit tests in traditional software engineering.
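To make the "evaluation as unit tests" idea concrete, here is a minimal sketch of what defining such a test programmatically can look like. The parameter names and payload shape (a custom item schema plus a string-match grading criterion) are assumptions based on the Evals API as publicly documented at launch, not a definitive reference; consult the current OpenAI SDK docs before relying on them.

```python
def build_eval_config(name: str) -> dict:
    """Build a payload for creating an eval: a data schema plus grading criteria.

    The structure below mirrors the documented Evals API shape, but field
    names are illustrative assumptions and may differ in the live SDK.
    """
    return {
        "name": name,
        # Each test item carries an input question and the expected answer.
        "data_source_config": {
            "type": "custom",
            "item_schema": {
                "type": "object",
                "properties": {
                    "question": {"type": "string"},
                    "expected": {"type": "string"},
                },
                "required": ["question", "expected"],
            },
            "include_sample_schema": True,
        },
        # Grade each sample by exact string match against the expected answer.
        "testing_criteria": [
            {
                "type": "string_check",
                "name": "exact_match",
                "input": "{{sample.output_text}}",
                "reference": "{{item.expected}}",
                "operation": "eq",
            }
        ],
    }

# Creating the eval and kicking off an automated run would then look
# roughly like this (requires an API key; names are assumptions):
#   from openai import OpenAI
#   client = OpenAI()
#   ev = client.evals.create(**build_eval_config("faq-regression"))
#   run = client.evals.runs.create(eval_id=ev.id, name="nightly-check", ...)
```

Because the configuration is plain data, it can be checked into version control and run on every prompt change, which is what makes the unit-test comparison apt.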
Despite these advancements, OpenAI has announced that the launch of GPT-5 will be delayed by a few months. According to CEO Sam Altman, the delay stems from the company's efforts to significantly improve the model and to ensure enough capacity to support the expected high demand. In the meantime, OpenAI plans to release the o3 and o4-mini models. The company has also faced capacity constraints with existing features, as seen in the restrictions placed on its image generation tool shortly after launch.
References:
- www.marktechpost.com: In a significant move to empower developers and teams working with large language models (LLMs), OpenAI has introduced the Evals API, a new toolset that brings programmatic evaluation capabilities to the forefront. While evaluations were previously accessible via the OpenAI dashboard, the new API allows developers to define tests, automate evaluation runs, and iterate on […]
- www.tomsguide.com: While you'll have to wait a bit longer for the new model, OpenAI does have a couple of exciting announcements
- the-decoder.com: OpenAI has introduced an Evals API that enables programmatic test creation and automation.
- the-decoder.com: OpenAI releases Evals API for systematic prompt testing