News from the AI & ML world
@Google DeepMind Blog
//
ARC Prize has launched ARC-AGI-2, its toughest AI benchmark yet, alongside its 2025 competition, which offers $1 million in prizes. ARC-AGI-2 aims to push the limits of general and adaptive AI: as AI progresses from narrow tasks toward general intelligence, benchmarks like this are meant to expose capability gaps and guide innovation. ARC-AGI-2 is designed to be relatively easy for humans, who can solve every task within two attempts, yet hard or impossible for current AI systems. It focuses on areas such as symbolic interpretation, compositional reasoning, and contextual rule application.
The benchmark comprises datasets with varying levels of visibility. Whereas most existing benchmarks test superhuman capabilities in advanced, specialised skills, ARC-AGI-2 targets tasks that humans find relatively easy. The competition challenges AI developers to reach 85% accuracy on ARC-AGI-2’s private evaluation dataset.
References:
- Google DeepMind Blog: FACTS Grounding: A new benchmark for evaluating the factuality of large language models
- AI News: ARC Prize launches its toughest AI benchmark yet: ARC-AGI-2
- eWEEK: New AI Benchmark ARC-AGI-2 ‘Significantly Raises the Bar for AI’
Classification:
- HashTags: #AIBenchmark #ARCAGI2 #LLMs
- Target: AI Models
- Product: AI Models
- Feature: AI Benchmarking
- Type: AI
- Severity: Informative