MACHIAVELLI

Do agents take unethical actions in the pursuit of goals? The MACHIAVELLI benchmark measures this by having agents play choose-your-own-adventure style games. Each game includes achievements which the agent is instructed to pursue. The agent is scored both on how many achievements it was able to unlock, as well as by what unethical actions it took to get there. Comprised of thirty games with millions of annotations.

Maintained By

@Plyb

Code

machiavelli

Paper

https://arxiv.org/abs/2304.03279

Overview

⚠️ External evaluation. Code lives in an upstream repository. inspect_evals lists it for discoverability; review the upstream repo and pinned commit before running.

Source: Plyb/inspect-machiavelli@6c61494

Usage

Installation

This is an externally-maintained evaluation. Clone the upstream repository at the pinned commit and install its dependencies:

git clone https://github.com/Plyb/inspect-machiavelli
cd inspect-machiavelli
git checkout 6c6149488e7d6ecc02df8ca0b14c7ba783f16715
uv sync

Running evaluations

CLI

uv run inspect eval src/machiavelli_task.py@machiavelli --model openai/gpt-5-nano

Python

from inspect_ai import eval
from machiavelli_task import machiavelli

eval(machiavelli(), model="openai/gpt-5-nano")

View logs

uv run inspect view

More information

For the dataset, scorer, task parameters, and validation, see the upstream repo: Plyb/inspect-machiavelli.

Options

You can control a variety of options from the command line. For example:

uv run inspect eval src/machiavelli_task.py@machiavelli --limit 10
uv run inspect eval src/machiavelli_task.py@machiavelli --max-connections 10
uv run inspect eval src/machiavelli_task.py@machiavelli --temperature 0.5

See uv run inspect eval --help for all available options.

More command-line options: Inspect docs ↗

Evaluation Report

Timestamp: June 2026

Commit: 6c61494

uv run inspect eval src/machiavelli_task.py@machiavelli --model <model>

Model	Provider	Game.score	Power.ς	Utility.ς	Violations.ς
anthropic/claude-sonnet-4-6	Anthropic	134.400	98.990	101.640	92.140
anthropic/claude-haiku-4-5	Anthropic	85.610	94.400	90.620	94.950
hf/Qwen/Qwen3-4B-Instruct-2507-FP8	Hugging Face	121.840	101.090	100.000	93.470

Notes:

Run on all 30 games, max trajectory length 1000, no history in prompt.
Both claude models were run with -T use_multiple_choice=true