News from the AI & ML world
@github.blog
//
Microsoft is aggressively integrating artificial intelligence into its developer tools and workflows, as demonstrated by the announcements at Microsoft Build 2025. A key focus is streamlining development by using AI to accelerate the creation of issues and pull requests on GitHub, a fundamental part of software development: GitHub Copilot drafts an issue, the issue is assigned to a coding agent for asynchronous execution, and the agent produces a pull request. This approach keeps the workflow familiar to developers while improving efficiency and consistency. Well-structured issues and pull requests remain paramount even in AI-accelerated workflows: they provide shared context, facilitate asynchronous coordination, support audit and analytics, and enable automation hooks.
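The issue-to-agent handoff above can be sketched against GitHub's REST API, which creates issues via `POST /repos/{owner}/{repo}/issues` with `title`, `body`, and `assignees` fields. This is only an illustration of what a well-structured, agent-ready issue payload might look like; the assignee login `copilot-agent` is a placeholder, not a real account name.

```python
import json

def build_issue_payload(title: str, context: str, acceptance_criteria: list[str],
                        assignee: str = "copilot-agent") -> dict:
    """Assemble an issue body that gives an asynchronous coding agent the
    shared context it needs: background, concrete acceptance criteria,
    and an assignee to hand the work to."""
    body = context + "\n\n## Acceptance criteria\n" + "\n".join(
        f"- [ ] {item}" for item in acceptance_criteria
    )
    # This dict matches the JSON shape of GitHub's "create an issue" endpoint.
    return {"title": title, "body": body, "assignees": [assignee]}

payload = build_issue_payload(
    "Fix flaky retry logic in uploader",
    "Uploads intermittently fail under high latency; retries give up too early.",
    ["Exponential backoff with jitter", "Unit test covering the timeout path"],
)
print(json.dumps(payload, indent=2))
```

The checklist-style acceptance criteria are the part that matters for an asynchronous agent: they make the definition of done explicit without requiring a follow-up conversation.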
BenchmarkQED, an open-source toolkit, has been introduced to automate the benchmarking of Retrieval-Augmented Generation (RAG) systems. This toolkit includes components for query generation, evaluation, and dataset preparation, all designed to support rigorous and reproducible testing. BenchmarkQED complements Microsoft's open-source GraphRAG library, enabling users to conduct GraphRAG-style evaluations across various models, metrics, and datasets. The toolkit addresses the growing need to benchmark RAG performance as new techniques emerge, particularly in answering questions over private datasets.
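The benchmarking pattern BenchmarkQED automates can be illustrated with a minimal harness: generate queries per class, run each RAG system, and score answer pairs with a judge. The names here (`Query`, `run_benchmark`, the judge signature) are hypothetical stand-ins for illustration, not BenchmarkQED's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Query:
    text: str
    query_class: str  # e.g. "local" (one passage) vs. "global" (whole dataset)

def run_benchmark(queries: list[Query],
                  systems: dict[str, Callable[[str], str]],
                  judge: Callable[[str, str, str], int]) -> dict[str, int]:
    """Score every pair of systems on every query. The judge returns
    +1 if the first answer wins, -1 if the second wins, 0 for a tie;
    in a real harness this would be an LLM-based evaluator."""
    scores = {name: 0 for name in systems}
    names = list(systems)
    for q in queries:
        answers = {name: fn(q.text) for name, fn in systems.items()}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                verdict = judge(q.text, answers[a], answers[b])
                if verdict > 0:
                    scores[a] += 1
                elif verdict < 0:
                    scores[b] += 1
    return scores

# Usage with trivial stand-ins for the RAG systems and the judge:
queries = [Query("What themes recur across the corpus?", "global")]
systems = {"graph_rag": lambda q: "themes: A, B", "vector_rag": lambda q: "theme: A"}
longer_wins = lambda q, x, y: 1 if len(x) > len(y) else -1  # toy judge
print(run_benchmark(queries, systems, longer_wins))  # → {'graph_rag': 1, 'vector_rag': 0}
```

Separating query generation, system execution, and judging into independent pieces is what makes this kind of evaluation reproducible across models, metrics, and datasets.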
BenchmarkQED facilitates the comparison of RAG methods, including LazyGraphRAG, against competing approaches such as vector-based RAG with a 1M-token context window. In Microsoft's tests, LazyGraphRAG achieves high win rates, especially on complex, global queries that require reasoning over large portions of the dataset. This targets scenarios where conventional vector-based RAG struggles: questions that depend on qualities of the dataset not explicitly stated in the text. The toolkit represents a major step forward in automating and scaling RAG benchmarking.
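The win rates reported for comparisons like LazyGraphRAG versus vector-based RAG can be computed from per-query judge verdicts. Counting a tie as half a win, as below, is a common convention but an assumption here, not necessarily how BenchmarkQED aggregates scores.

```python
def win_rate(verdicts: list[str]) -> float:
    """verdicts: one of "win", "tie", or "loss" per query for the system
    under test. Returns wins plus half-credit for ties, as a fraction."""
    wins = verdicts.count("win")
    ties = verdicts.count("tie")
    return (wins + 0.5 * ties) / len(verdicts)

# e.g. 6 wins, 2 ties, 2 losses over 10 global queries:
print(win_rate(["win"] * 6 + ["tie"] * 2 + ["loss"] * 2))  # → 0.7
```

Reporting win rates per query class (local vs. global) rather than a single aggregate is what surfaces the pattern described above, since the methods diverge most on global queries.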
References:
- github.blog: Learn how to spin up a GitHub Issue, hand it to Copilot, and get a draft pull request in the same workflow you already know.
- www.microsoft.com: BenchmarkQED is an open-source toolkit for benchmarking RAG systems using automated query generation, evaluation, and dataset prep.