AIME 2026: Problems from the American Invitational Mathematics Examination

Mathematics

A benchmark for evaluating AI’s ability to solve challenging mathematics problems from the 2026 AIME - a prestigious high school mathematics competition.

Overview

Evaluation of AI performance on the American Invitational Mathematics Examination (AIME) 2026. AIME is a prestigious 15-problem, 3-hour competition for high school students who qualify through the AMC 10/12. Problems require ingenuity and mathematical insight across algebra, combinatorics, geometry, and number theory, with integer answers in the range 0–999.

Dataset

The dataset is sourced from math-ai/aime26 on HuggingFace and contains 30 problems (15 from AIME I and 15 from AIME II). Each problem has a corresponding integer answer in the range 0–999.

Scoring

Each problem is scored as correct/incorrect using numeric matching. The scorer strips LaTeX \boxed{} formatting before comparison. Results are reported as accuracy with standard error.

Evaluation Report

The following results were obtained using uv run inspect eval inspect_evals/aime2026 --model <model> with eval version 1-A. As of 2026-03-18, there are no publicly available results to compare against.

Model Accuracy Std Error
openai/gpt-5-nano 0.7667 0.079
openai/gpt-5.4-nano 0.4000 0.091

Changelog