# Testing AI Safety Guardrails for Web Navigation Agents
This evaluation tests whether an AI system can act as a safety guardrail by generating and executing code to protect web navigation agents from unsafe actions based on user constraints.
The Mind2Web-SC evaluation tests an AI system's ability to act as a guardrail for the SeeAct web navigation agent. The evaluation proceeds in three steps (a sketch of what the generated guardrail code might look like follows the list):

1. **Task Decomposition**: The model identifies which safety rule applies to a given user request and web action.
2. **Code Generation**: The model generates Python guardrail code that checks whether the user's information satisfies the safety requirements.
3. **Execution & Decision**: The generated code is executed to determine whether the action should be GRANTED or DENIED.
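For concreteness, model-generated guardrail code could resemble the following minimal sketch. The function name (`check_guardrail`), the rule set, and the `user_info` fields are all hypothetical illustrations, not the exact format the model is asked to produce:

```python
# Minimal sketch of what model-generated guardrail code could look like.
# Function name, rules, and user_info fields are hypothetical illustrations.

def check_guardrail(user_info: dict, action: str) -> str:
    """Return "GRANTED" or "DENIED" for a requested web action."""
    # Hypothetical rule: driving-related actions require a driver's license.
    if action == "rent_car" and not user_info.get("driver_license", False):
        return "DENIED"
    # Hypothetical rule: age-restricted actions require the user to be 18+.
    if action == "book_movie" and user_info.get("age", 0) < 18:
        return "DENIED"
    return "GRANTED"

# Example: a 16-year-old attempting to book a movie ticket is denied.
print(check_guardrail({"age": 16, "driver_license": False}, "book_movie"))
```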
Originally developed as part of GuardAgent research, this implementation has been ported to Inspect to support model-agnostic evaluation across different providers.
## Usage

### Basic Usage
```bash
# Run with Google Gemini (default)
uv run inspect eval inspect_evals/mind2web_sc --model google/gemini-1.5-flash

# Run with OpenAI GPT-4
uv run inspect eval inspect_evals/mind2web_sc --model openai/gpt-4

# Run with Anthropic Claude
uv run inspect eval inspect_evals/mind2web_sc --model anthropic/claude-3-5-sonnet-20241022
```
### Configuration Options
```bash
# Adjust few-shot learning examples
uv run inspect eval inspect_evals/mind2web_sc --model openai/gpt-4 -T num_shots=2

# Use custom dataset path
uv run inspect eval inspect_evals/mind2web_sc --model openai/gpt-4 \
  -T dataset_path=/path/to/dataset
```
## Parameters

- `dataset_path` (str): Path to the dataset directory containing `sample_labeled_all.json` (default: the built-in dataset)
- `num_shots` (int): Number of examples for few-shot learning (default: 3)
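The same parameters can also be passed programmatically through Inspect's Python API. The sketch below assumes the task name `inspect_evals/mind2web_sc` from the CLI examples above; the parameter values are illustrative:

```python
# Sketch: running the eval via Inspect's Python API instead of the CLI.
# Task name mirrors the CLI examples; parameter values are illustrative.
from inspect_ai import eval

eval(
    "inspect_evals/mind2web_sc",
    model="openai/gpt-4",
    task_args={
        "num_shots": 2,  # few-shot examples (default: 3)
        # "dataset_path": "/path/to/dataset",  # optional custom dataset
    },
)
```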