Extensions
Overview
There are several ways to extend Inspect to integrate with systems not directly supported by the core package. These include:
Model APIs (model hosting services, local inference engines, etc.)
Tool Environments (local or cloud container runtimes)
Storage Systems (for datasets, prompts, and evaluation logs)
For each of these, you can create an extension within a Python package, and then use it without any special registration with Inspect (this is done via setuptools entry points).
Model APIs
You can add a model provider by deriving a new class from ModelAPI and adding the @modelapi decorator to it. For example:
@modelapi(name="custom")
class CustomModelAPI(ModelAPI):
    def __init__(
        self,
        model_name: str,
        base_url: str | None = None,
        api_key: str | None = None,
        config: GenerateConfig = GenerateConfig(),
        **model_args: dict[str, Any]
    ) -> None:
        super().__init__(model_name, base_url, api_key, config)

    async def generate(
        self,
        input: list[ChatMessage],
        tools: list[ToolInfo],
        tool_choice: ToolChoice,
        config: GenerateConfig,
    ) -> ModelOutput:
        ...
The __init__() method must call the super().__init__() method, and typically instantiates the model client library.
The generate() method handles interacting with the model, converting Inspect messages, tools, and config into model-native data structures. In addition, there are some optional properties you can override to specify various behaviours and constraints (default max tokens and connections, identifying rate limit errors, whether to collapse consecutive user and/or assistant messages, etc.).
See the ModelAPI source code for further documentation on these properties. See the implementation of the built-in model providers for additional insight on building a custom provider.
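To make the message-conversion step in generate() concrete, here is a minimal runnable sketch. The ChatMessage and ModelOutput classes below are stand-ins (not the real inspect_ai types), and the echo "client" is a placeholder for a provider SDK, so this is an illustration of the shape of the work rather than a definitive implementation:

```python
# Stand-in types so this sketch runs without inspect_ai or a provider SDK
# installed; a real provider imports ChatMessage/ModelOutput from inspect_ai
# and awaits its client library instead of echoing.
import asyncio
from dataclasses import dataclass

@dataclass
class ChatMessage:
    role: str
    content: str

@dataclass
class ModelOutput:
    completion: str

class CustomModelAPI:
    async def generate(self, input: list[ChatMessage]) -> ModelOutput:
        # convert Inspect-style messages into the client's wire format
        payload = [{"role": m.role, "content": m.content} for m in input]
        # a real provider would await its model client here; we echo instead
        return ModelOutput(completion=f"echo: {payload[-1]['content']}")

output = asyncio.run(CustomModelAPI().generate([ChatMessage("user", "hi")]))
print(output.completion)  # → echo: hi
```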
Model Registration
If you are publishing a custom model API within a Python package, you should register an inspect_ai setuptools entry point. This will ensure that Inspect loads your extension before it attempts to resolve a model name that uses your provider.
For example, if your package was named inspect_package and your model provider was exported from a source file named inspect_extensions.py at the root of your package, you would register it like this in pyproject.toml:
[project.entry-points.inspect_ai]
inspect_package = "inspect_package.inspect_extensions"

[tool.poetry.plugins.inspect_ai]
inspect_package = "inspect_package.inspect_extensions"
Model Usage
Once you’ve created the class, decorated it with @modelapi as shown above, and registered it, then you can use it as follows:
inspect eval ctf.py --model custom/my-model
Where my-model is the name of some model supported by your provider (this will be passed to __init__() in the model_name argument).
You can also reference it from within Python calls to get_model() or eval():
# get a model instance
model = get_model("custom/my-model")

# run an eval with the model
eval(math, model="custom/my-model")
Tool Environments
Tool Environments provide a mechanism for sandboxing execution of tool code as well as providing more sophisticated infrastructure (e.g. creating network hosts for a cybersecurity eval). Inspect comes with two tool environments built in:
Environment Type | Description
---|---
local | Run tool_environment() methods in the same file system as the running evaluation (should only be used if you are already running your evaluation in another sandbox).
docker | Run tool_environment() methods within a Docker container.
To create a custom tool environment, derive a class from ToolEnvironment, implement the required static and instance methods, and add the @toolenv decorator to it. For example:
@toolenv(name="podman")
class PodmanToolEnvironment(ToolEnvironment):

    @classmethod
    async def task_init(
        cls, task_name: str, config: str | None
    ) -> None:
        ...

    @classmethod
    async def sample_init(
        cls,
        task_name: str,
        config: str | None,
        metadata: dict[str, str]
    ) -> dict[str, ToolEnvironment]:
        ...

    @classmethod
    async def sample_cleanup(
        cls,
        task_name: str,
        config: str | None,
        environments: dict[str, ToolEnvironment],
        interrupted: bool,
    ) -> None:
        ...

    @classmethod
    async def task_cleanup(
        cls, task_name: str, config: str | None, cleanup: bool
    ) -> None:
        ...

    @classmethod
    async def cli_cleanup(cls, id: str | None) -> None:
        ...

    async def exec(
        self,
        cmd: list[str],
        input: str | bytes | None = None,
        env: dict[str, str] = {},
        timeout: int | None = None,
    ) -> ExecResult[str]:
        ...

    async def write_file(
        self, file: str, contents: str | bytes
    ) -> None:
        ...

    @overload
    async def read_file(
        self, file: str, text: Literal[True] = True
    ) -> str:
        ...

    @overload
    async def read_file(
        self, file: str, text: Literal[False]
    ) -> bytes:
        ...

    async def read_file(
        self, file: str, text: bool = True
    ) -> Union[str, bytes]:
        ...
The class methods take care of various stages of initialisation, setup, and teardown:
Method | Lifecycle | Purpose
---|---|---
task_init() | Called at the beginning of each Task. | Expensive initialisation operations (e.g. pulling or building images).
sample_init() | Called at the beginning of each Sample. | Create ToolEnvironment instances for the sample.
sample_cleanup() | Called at the end of each Sample. | Cleanup ToolEnvironment instances for the sample.
task_cleanup() | Called at the end of each Task. | Last chance handler for any resources not yet cleaned up (see also discussion below).
cli_cleanup() | Called via inspect toolenv cleanup. | CLI invoked manual cleanup of resources created by this ToolEnvironment.
max_samples() | Called at startup. | Provide a default max_samples (used to cap the default; explicit max_samples will override this).
In the case of parallel execution of a group of tasks that share a working directory and tool environment, the task_init() and task_cleanup() functions may be called once for the entire group as a performance optimisation.
The task_cleanup() method serves a number of important functions:
- There may be global resources that are not tied to samples that need to be cleaned up.
- It’s possible that sample_cleanup() will be interrupted (e.g. via a Ctrl+C) during execution. In that case its resources are still not cleaned up.
- The sample_cleanup() function might be long running, and in the case of error or interruption you want to provide explicit user feedback on the cleanup in the console (which isn’t possible when cleanup is run “inline” with samples). An interrupted flag is passed to sample_cleanup() which allows for varying behaviour for this scenario.
- Cleanup may be disabled (e.g. when the user passes --no-toolenv-cleanup), in which case it should print container IDs and instructions for cleaning up after the containers are no longer needed.
To implement task_cleanup() properly, you’ll likely need to track running environments using a per-coroutine ContextVar. The DockerToolEnvironment provides an example of this. Note that the cleanup argument passed to task_cleanup() indicates whether to actually clean up (it would be False if --no-toolenv-cleanup was passed to inspect eval). In this case you might want to print a list of the resources that were not cleaned up and provide directions on how to clean them up manually.
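As a minimal illustration of the per-coroutine tracking idea, here is a runnable toy. The sample_init/sample_cleanup/task_cleanup functions mirror the lifecycle described above but are hypothetical stand-ins, not the real Inspect interfaces:

```python
# Toy model: track running environments in a per-coroutine ContextVar so a
# task-level cleanup can find anything that sample cleanup missed.
import asyncio
from contextvars import ContextVar

running_envs: ContextVar[dict[str, str]] = ContextVar("running_envs")

async def sample_init(sample_id: str) -> None:
    # pretend we started a container and record it in the registry
    running_envs.get()[sample_id] = f"container-{sample_id}"

async def sample_cleanup(sample_id: str) -> None:
    # pretend we stopped the container and drop it from the registry
    running_envs.get().pop(sample_id, None)

async def task_cleanup(cleanup: bool) -> list[str]:
    leftover = list(running_envs.get().values())
    if cleanup:
        running_envs.get().clear()  # actually remove the stragglers
    # with cleanup disabled, a real provider would instead print the IDs
    # and manual cleanup instructions here
    return leftover

async def run_task() -> list[str]:
    running_envs.set({})        # fresh registry for this task's coroutine
    await sample_init("s1")
    await sample_init("s2")
    await sample_cleanup("s1")  # imagine s2's cleanup was interrupted
    return await task_cleanup(cleanup=True)

leftover = asyncio.run(run_task())
print(leftover)  # → ['container-s2']
```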
The cli_cleanup() function is a global cleanup handler that should be able to do the following:
- Cleanup all environments created by this provider (corresponds to e.g. inspect toolenv cleanup docker at the CLI).
- Cleanup a single environment created by this provider (corresponds to e.g. inspect toolenv cleanup docker <id> at the CLI).
The task_cleanup() function will typically print out the information required to invoke cli_cleanup() when it is invoked with cleanup = False. Try invoking the DockerToolEnvironment with --no-toolenv-cleanup to see an example.
The ToolEnvironment instance methods provide access to process execution and file input/output within the environment. A few notes on implementing these methods:
- The exec() method currently only handles text output. If a call results in binary output then a UnicodeDecodeError will be raised. Tool environments should catch this and raise a ToolError.
- The read_file() method should raise a FileNotFoundError if the specified file does not exist in the tool environment, as tools calling read_file() will often want to catch the FileNotFoundError and re-throw a ToolError (since models will frequently attempt to read files that do not exist).
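The catch-and-re-throw pattern for read_file() might look like the following sketch. The ToolError and FakeEnvironment classes here are stand-ins so the example runs without inspect_ai installed:

```python
# Sketch: convert FileNotFoundError from read_file() into a tool-facing
# error the model can recover from.
import asyncio

class ToolError(Exception):
    """Stand-in for inspect_ai's ToolError."""

class FakeEnvironment:
    """Stand-in tool environment backed by an in-memory file table."""
    def __init__(self, files: dict[str, str]) -> None:
        self._files = files

    async def read_file(self, file: str) -> str:
        if file not in self._files:
            raise FileNotFoundError(file)  # the contract described above
        return self._files[file]

async def read_tool(env: FakeEnvironment, file: str) -> str:
    try:
        return await env.read_file(file)
    except FileNotFoundError:
        # surface a normal tool failure instead of crashing the eval
        raise ToolError(f"file not found: {file}") from None

env = FakeEnvironment({"flag.txt": "ctf{example}"})
content = asyncio.run(read_tool(env, "flag.txt"))
print(content)  # → ctf{example}

try:
    asyncio.run(read_tool(env, "missing.txt"))
    error = ""
except ToolError as err:
    error = str(err)
print(error)  # → file not found: missing.txt
```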
The best way to learn about writing tool environments is to look at the source code for the built-in environments, LocalToolEnvironment and DockerToolEnvironment.
Environment Registration
You should build your custom tool environment within a Python package, and then register an inspect_ai setuptools entry point. This will ensure that Inspect loads your extension before it attempts to resolve a tool environment that uses your provider.
For example, if your package was named inspect_package and your tool environment provider was exported from a source file named inspect_extensions.py at the root of your package, you would register it like this in pyproject.toml:
[project.entry-points.inspect_ai]
inspect_package = "inspect_package.inspect_extensions"

[tool.poetry.plugins.inspect_ai]
inspect_package = "inspect_package.inspect_extensions"
Environment Usage
Once the package is installed, you can refer to the custom tool environment the same way you’d refer to a built-in tool environment. For example:
Task(
    ...,
    tool_environment="podman"
)
Tool environments can be invoked with an optional configuration parameter, which is passed as the config argument to the task_init() and sample_init() methods. In Python this is done with a tuple:
Task(
    ...,
    tool_environment=("podman", "config.yaml")
)
Storage
Filesystems with fsspec
Datasets, prompt templates, and evaluation logs can be stored using either the local filesystem or a remote filesystem. Inspect uses the fsspec package to read and write files, which provides support for a wide variety of filesystems.
Support for Amazon S3 is built into Inspect via the s3fs package. Other filesystems may require installation of additional packages. See the list of built-in filesystems and other known implementations for all supported storage back ends.
See Custom Filesystems below for details on implementing your own fsspec compatible filesystem as a storage back-end.
Filesystem Functions
The following Inspect API functions use fsspec:
- resource() for reading prompt templates and other supporting files.
- csv_dataset() and json_dataset() for reading datasets (note that files referenced within samples can also use fsspec filesystem references).
- list_eval_logs(), read_eval_log(), write_eval_log(), and retryable_eval_logs().
For example, to use S3 you would prefix your paths with s3://:
# read a prompt template from S3
prompt_template("s3://inspect-prompts/ctf.txt")

# read a dataset from S3
csv_dataset("s3://inspect-datasets/ctf-12.csv")

# read eval logs from S3
list_eval_logs("s3://my-s3-inspect-log-bucket")
Custom Filesystems
See the fsspec developer documentation for details on implementing a custom filesystem. Note that if your implementation is only for use with Inspect, you need to implement only the subset of the fsspec API used by Inspect. The properties and methods used by Inspect include:
- sep
- open()
- makedirs()
- info()
- created()
- exists()
- ls()
- walk()
- unstrip_protocol()
- invalidate_cache()
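As an illustration of that surface area, here is a toy in-memory filesystem exposing most of the listed methods. It deliberately does not derive from fsspec.AbstractFileSystem (a real implementation should), so it runs without fsspec installed; walk() is omitted for brevity:

```python
# Toy in-memory filesystem sketching the subset of the fsspec interface
# listed above. Illustrative only, not a working fsspec back-end.
import io
from datetime import datetime, timezone

class MyFs:
    protocol = "myfs"
    sep = "/"  # path separator property

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}
        self._dirs: set[str] = {""}
        self._created = datetime.now(timezone.utc)

    def open(self, path: str, mode: str = "rb") -> io.BytesIO:
        # read-only open; a real filesystem also supports write modes
        return io.BytesIO(self._files[path.strip("/")])

    def makedirs(self, path: str, exist_ok: bool = False) -> None:
        self._dirs.add(path.strip("/"))

    def exists(self, path: str) -> bool:
        p = path.strip("/")
        return p in self._files or p in self._dirs

    def info(self, path: str) -> dict:
        p = path.strip("/")
        if p in self._files:
            return {"name": p, "size": len(self._files[p]), "type": "file"}
        if p in self._dirs:
            return {"name": p, "size": 0, "type": "directory"}
        raise FileNotFoundError(path)

    def created(self, path: str) -> datetime:
        return self._created

    def ls(self, path: str, detail: bool = True):
        p = path.strip("/")
        names = [f for f in self._files
                 if f.rsplit("/", 1)[0] == p or (p == "" and "/" not in f)]
        return [self.info(n) for n in names] if detail else names

    def unstrip_protocol(self, name: str) -> str:
        return f"{self.protocol}://{name.strip('/')}"

    def invalidate_cache(self, path=None) -> None:
        pass  # nothing is cached in this toy version

    def write(self, path: str, data: bytes) -> None:
        # helper for the demo below; not part of the fsspec interface
        self._files[path.strip("/")] = data

fs = MyFs()
fs.makedirs("logs")
fs.write("logs/eval.json", b"{}")
print(fs.exists("logs/eval.json"))            # → True
print(fs.unstrip_protocol("logs/eval.json"))  # → myfs://logs/eval.json
```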
As with Model APIs and Tool Environments, fsspec filesystems should be registered using a setuptools entry point. For example, if your package is named inspect_package and you have implemented a myfs:// filesystem using the MyFs class exported from the root of the package, you would register it like this in pyproject.toml:
[project.entry-points."fsspec.specs"]
myfs = "inspect_package:MyFs"

[tool.poetry.plugins."fsspec.specs"]
myfs = "inspect_package:MyFs"
Once this package is installed, you’ll be able to use myfs:// with Inspect without any further registration.