Versioning and Changelog Policy
This document describes the versioning scheme and changelog management process for Inspect Evals.
Overview
Inspect Evals uses Semantic Versioning (SemVer) for releases and maintains a changelog following the Keep a Changelog convention.
We use scriv to manage the changelog.
For Contributors
When submitting a PR that includes a user-facing change:
uv run scriv create
Fill out the generated file with details of the change and commit it.
Fragment Format
Each bullet in a changelog fragment must follow this format:
- Name: Description of the change.
- Name (vN-X): Description of the change.
Eval name: Always use the eval’s print name (the human-readable title from eval.yaml, e.g. “MLE-Bench”, “NIAH”, “GDM Intercode CTF”), not the Python module name. The changelog is a user-facing document, so gdm_intercode_ctf should be written as “GDM Intercode CTF”.
Task version: Include the new version in parentheses after the name only when the change bumps the task version. Omit it for changes that don’t bump the version (e.g. doc-only or infra changes).
New Evals vs Existing Evals: Use the same format for both categories — the only difference is the heading. New evals don’t need a version annotation (the initial version is always 1-A).
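The bullet format for eval entries can be checked mechanically. The following is a minimal sketch, not part of scriv or of the Inspect Evals tooling: the `FRAGMENT_BULLET` pattern and the `parse_bullet` helper are hypothetical names, and the pattern assumes that the version annotation is always `v`, an integer, a hyphen, and a single uppercase letter (as in `v2-B`). Entries under the Other heading are free-form and are not expected to match.

```python
import re
from typing import Optional

# Hypothetical pattern for eval entries:
#   "- Name: Description" or "- Name (vN-X): Description"
# Assumption: N is an integer and X is one uppercase letter, e.g. "v2-B".
FRAGMENT_BULLET = re.compile(
    r"^- (?P<name>[^:()]+?)(?: \((?P<version>v\d+-[A-Z])\))?: (?P<description>\S.*)$"
)


def parse_bullet(line: str) -> Optional[dict]:
    """Return the name, version, and description of a bullet, or None if it does not match."""
    match = FRAGMENT_BULLET.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    # A version-bumping entry, an entry without a bump, and a malformed entry.
    print(parse_bullet("- StrongREJECT (v2-B): Fix judge model resolution for grader role fallback."))
    print(parse_bullet("- MLE-Bench: Freeze upstream dependency."))
    print(parse_bullet("- GDM Intercode CTF (2-B) Fix something."))
```

A check like this only validates the shape of the bullet. It cannot tell whether the name is the eval's print name rather than its module name, so that part still needs human review.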
Examples
### New Evals
- SWE-Lancer: New eval for freelance software engineering tasks.
### Existing Evals
- StrongREJECT (v2-B): Fix judge model resolution for grader role fallback.
- MLE-Bench: Freeze upstream dependency.
### Other
- Fix broken reference to AGENTS.md in CONTRIBUTING.md.
For Maintainers
When preparing a release:
uv run scriv collect --version X.Y.Z
Commit the updated *.md files.
Tag the commit with the new version.
Create a GitHub release from the tag.
Versioning Scheme
Releases use the MAJOR.MINOR.PATCH format of Semantic Versioning, as noted in the Overview.
| Version Component | When to Bump | Examples |
|---|---|---|
| Major | Breaking changes | Removing an eval, API changes, scorer output format changes |
| Minor | New features | Adding new evals, new task parameters, new utilities |
| Patch | Bug fixes | Eval fixes, scorer fixes, dataset loading fixes |
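The bump rules in the table can be sketched in a few lines of Python. This is an illustration only: `bump_version` is a hypothetical helper, not something provided by scriv or by Inspect Evals, and it assumes a plain X.Y.Z string with no pre-release or build suffix. Resetting the lower components to zero follows the Semantic Versioning specification.

```python
def bump_version(version: str, component: str) -> str:
    """Return the next version after bumping the given component.

    Assumes `version` is a plain "MAJOR.MINOR.PATCH" string.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if component == "major":
        # Breaking change: lower components reset to zero.
        return f"{major + 1}.0.0"
    if component == "minor":
        # New feature: patch resets to zero.
        return f"{major}.{minor + 1}.0"
    if component == "patch":
        # Bug fix: only the patch number changes.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown component: {component!r}")


if __name__ == "__main__":
    print(bump_version("1.4.2", "major"))
    print(bump_version("1.4.2", "minor"))
    print(bump_version("1.4.2", "patch"))
```

When a release contains several kinds of change, the highest-ranking one decides the bump: a release that removes an eval and also fixes a scorer is a major release.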