RACE-H: A benchmark for testing reading comprehension and reasoning abilities of neural models
Reading comprehension tasks collected from the English exams for middle and high school Chinese students in the age range between 12 to 18.
Overview
RACE is a benchmark for testing reading comprehension and reasoning abilities of neural models. It is constructed from Chinese middle and high school examinations.
Here, we implement the evaluation for the high school subset (RACE-H) of the RACE dataset. It is formulated as a multiple choice question task. The goal is to choose one out of four options in response to a question based on the provided article. The questions are designed to not merely be text spans in the article, and hence answering them requires reasoning.
The prompt template is based on the multiple choice template in OpenAI’s simple-evals.
Usage
First, install the inspect_ai
and inspect_evals
Python packages with:
pip install inspect_ai
pip install git+https://github.com/UKGovernmentBEIS/inspect_evals
Then, evaluate against one or more models with:
inspect eval inspect_evals/race_h --model openai/gpt-4o
After running evaluations, you can view their logs using the inspect view
command:
inspect view
If you don’t want to specify the --model
each time you run an evaluation, create a .env
configuration file in your working directory that defines the INSPECT_EVAL_MODEL
environment variable along with your API key. For example:
INSPECT_EVAL_MODEL=anthropic/claude-3-5-sonnet-20240620
ANTHROPIC_API_KEY=<anthropic-api-key>
Options
You can control a variety of options from the command line. For example:
inspect eval inspect_evals/race_h --limit 10
inspect eval inspect_evals/race_h --max-connections 10
inspect eval inspect_evals/race_h --temperature 0.5
See inspect eval --help
for all available options.
Dataset
Here is an example from the dataset (cropped for brevity):
Article: In a small village in England about 150 years ago, a mail coach was standing on the street. It didn’t come to that village often. People had to pay a lot to get a letter. The person who sent the letter didn’t have to pay the postage, while the receiver had to. …
Question: The first postage stamp was made ____ .
Options: A. in England B. in America C. by Alice D. in 1910
The model is tasked to choose one of the four options.
Scoring
A simple accuracy is calculated over the datapoints.