MACHIAVELLI

Do agents take unethical actions in the pursuit of goals? The MACHIAVELLI benchmark measures this by having agents play choose-your-own-adventure style games. Each game includes achievements that the agent is instructed to pursue. The agent is scored both on how many achievements it unlocks and on the unethical actions it takes to get there. The benchmark comprises thirty games with millions of annotations.

Overview

⚠️ External evaluation. Code lives in an upstream repository. inspect_evals lists it for discoverability; review the upstream repo and pinned commit before running.

Source: Plyb/inspect-machiavelli@6c61494

Usage

Installation

This is an externally-maintained evaluation. Clone the upstream repository at the pinned commit and install its dependencies:

git clone https://github.com/Plyb/inspect-machiavelli
cd inspect-machiavelli
git checkout 6c6149488e7d6ecc02df8ca0b14c7ba783f16715
uv sync

Running evaluations

CLI

uv run inspect eval src/machiavelli_task.py@machiavelli --model openai/gpt-5-nano

Python

from inspect_ai import eval
from machiavelli_task import machiavelli

eval(machiavelli(), model="openai/gpt-5-nano")

View logs

uv run inspect view

More information

For the dataset, scorer, task parameters, and validation, see the upstream repo: Plyb/inspect-machiavelli.

Options

You can control a variety of options from the command line. For example:

uv run inspect eval src/machiavelli_task.py@machiavelli --limit 10
uv run inspect eval src/machiavelli_task.py@machiavelli --max-connections 10
uv run inspect eval src/machiavelli_task.py@machiavelli --temperature 0.5

See uv run inspect eval --help for all available options.

More command-line options are listed in the Inspect docs.
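
The same options can also be passed from Python. The following is a minimal sketch, not the upstream project's documented usage: it assumes that inspect_ai's eval() accepts limit, max_connections, and temperature as keyword arguments mirroring the CLI flags, and that machiavelli_task is importable as in the Python example above. EVAL_OPTIONS and run_with_options are hypothetical names used only for illustration.

```python
# Keyword arguments mirroring --limit, --max-connections and --temperature.
# Assumption: inspect_ai's eval() accepts these names as keyword arguments.
EVAL_OPTIONS = {
    "limit": 10,            # evaluate only the first 10 samples
    "max_connections": 10,  # cap concurrent connections to the model API
    "temperature": 0.5,     # sampling temperature
}


def run_with_options(model: str = "openai/gpt-5-nano"):
    """Run the MACHIAVELLI task with the options above (hypothetical helper)."""
    # Imported lazily so EVAL_OPTIONS can be inspected without the upstream
    # repository and its dependencies installed.
    from inspect_ai import eval
    from machiavelli_task import machiavelli

    return eval(machiavelli(), model=model, **EVAL_OPTIONS)
```

Unlike the CLI examples, which set one option per command, this sketch applies all three in a single run; remove entries from EVAL_OPTIONS to change one setting at a time.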

Evaluation Report

Timestamp: June 2026

Commit: 6c61494

uv run inspect eval src/machiavelli_task.py@machiavelli --model <model>

Model                               Provider      Game.score  Power.ς  Utility.ς  Violations.ς
anthropic/claude-sonnet-4-6         Anthropic     134.400     98.990   101.640    92.140
anthropic/claude-haiku-4-5          Anthropic     85.610      94.400   90.620     94.950
hf/Qwen/Qwen3-4B-Instruct-2507-FP8  Hugging Face  121.840     101.090  100.000    93.470

Notes:

  • All runs covered all 30 games, with a maximum trajectory length of 1000 and no history in the prompt.
  • Both Claude models were run with -T use_multiple_choice=true.
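
As an illustration of how the per-model commands differ, the sketch below assembles one command line per model in the table, adding -T use_multiple_choice=true only for the two Claude models. REPORT_RUNS and report_command are hypothetical names introduced here. The notes do not say whether the trajectory length and prompt-history settings are defaults or task parameters, so the sketch does not set them.

```python
import shlex

TASK = "src/machiavelli_task.py@machiavelli"

# (model, used_multiple_choice) pairs taken from the report table and notes.
REPORT_RUNS = [
    ("anthropic/claude-sonnet-4-6", True),
    ("anthropic/claude-haiku-4-5", True),
    ("hf/Qwen/Qwen3-4B-Instruct-2507-FP8", False),
]


def report_command(model: str, multiple_choice: bool) -> list[str]:
    """Build the argv list for one report run (hypothetical helper)."""
    cmd = ["uv", "run", "inspect", "eval", TASK, "--model", model]
    if multiple_choice:
        # Task argument noted for the Claude runs.
        cmd += ["-T", "use_multiple_choice=true"]
    return cmd


if __name__ == "__main__":
    # Print the commands rather than running them; pass a list to
    # subprocess.run(..., check=True) to execute one.
    for model, multiple_choice in REPORT_RUNS:
        print(shlex.join(report_command(model, multiple_choice)))
```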