OR-Bench Hard-1K (subset of OR-Bench: Over-Refusal Benchmark)
Overview
⚠️ External evaluation. Code lives in an upstream repository. inspect_evals lists it for discoverability; review the upstream repo and pinned commit before running.
Source: haeliotang/inspect-evals-orbench@8757ec4
OR-Bench Hard-1K measures over-refusal in LLMs: the tendency to reject safe prompts that superficially resemble harmful ones. The 1K subset contains prompts rejected by at least 3 frontier models, making it the most challenging split of the 80K dataset. Prompts were generated by rewriting toxic seeds into benign borderline variants using Mixtral, then filtered via an ensemble LLM moderator. Scoring uses an LLM judge classifying responses as direct_answer, direct_refusal, or indirect_refusal; the metric is rejection rate (lower is better).
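As a rough illustration of the metric described above, the sketch below computes a rejection rate from a list of judge labels. It is not the upstream scorer: the function name `rejection_rate` is hypothetical, and counting both `direct_refusal` and `indirect_refusal` as rejections is an assumption to confirm against the upstream repo.

```python
from collections import Counter

# The three judge labels named in the text above.
# Assumption: both refusal classes count toward the rejection rate.
REFUSAL_LABELS = {"direct_refusal", "indirect_refusal"}
VALID_LABELS = REFUSAL_LABELS | {"direct_answer"}


def rejection_rate(labels):
    """Return the fraction of judged responses that were refusals."""
    if not labels:
        raise ValueError("labels must not be empty")
    unknown = set(labels) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    refused = sum(counts[label] for label in REFUSAL_LABELS)
    return refused / len(labels)


# Hypothetical judge output for four prompts.
judged = ["direct_answer", "direct_refusal", "indirect_refusal", "direct_answer"]
print(f"rejection rate: {rejection_rate(judged):.2f}")
```

Because every prompt in the benchmark is meant to be safe, any refusal counts against the model, which is why a lower rate is better.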
Usage
Installation
This is an externally-maintained evaluation. Clone the upstream repository at the pinned commit and install its dependencies:
git clone https://github.com/haeliotang/inspect-evals-orbench
cd inspect-evals-orbench
git checkout 8757ec41608f3930e3c2bc4d5619d09a2381727a
uv sync
Running evaluations
CLI
uv run inspect eval src/or_bench/or_bench.py@or_bench_hard_1k --model openai/gpt-5-nano
Python
from inspect_ai import eval
from or_bench.or_bench import or_bench_hard_1k
eval(or_bench_hard_1k(), model="openai/gpt-5-nano")
View logs
uv run inspect view
More information
For the dataset, scorer, task parameters, and validation, see the upstream repo: haeliotang/inspect-evals-orbench.
Options
You can control a variety of options from the command line. For example:
uv run inspect eval src/or_bench/or_bench.py@or_bench_hard_1k --limit 10
uv run inspect eval src/or_bench/or_bench.py@or_bench_hard_1k --max-connections 10
uv run inspect eval src/or_bench/or_bench.py@or_bench_hard_1k --temperature 0.5
See uv run inspect eval --help for all available options.
More command-line options: Inspect docs
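The flags shown above can be combined in one invocation. The sketch below assembles such a command from Python, using only the task path and the flags that appear in this document. The helper name `build_eval_command` is hypothetical; it builds the command without running it.

```python
import shlex

# Task path exactly as used in the CLI examples above.
TASK = "src/or_bench/or_bench.py@or_bench_hard_1k"


def build_eval_command(limit=None, max_connections=None, temperature=None):
    """Assemble an `inspect eval` argument list; omitted options are skipped."""
    cmd = ["uv", "run", "inspect", "eval", TASK]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if max_connections is not None:
        cmd += ["--max-connections", str(max_connections)]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    return cmd


cmd = build_eval_command(limit=10, max_connections=10, temperature=0.5)
# Print a shell-safe string; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

A small `--limit` is a cheap way to smoke-test the setup before running the full task.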