Project Structure
Inspect Cyber discovers evaluation configuration files by recursively searching for eval.yaml
or eval.yml
files at or below a given absolute path (other than in hidden directories). This enables Inspect Cyber to find evaluation configuration files no matter the structure of the project. However, to enhance legibility and facilitate collaboration, it is recommended to organise directories roughly as follows.
project/
├── evals/
│ └── sql_injection/
│ ├── eval.yaml
│ ├── sandbox_configs/
│ │ ├── compose.yaml
│ │ └── helm-values.yaml
│ ├── images/
│ │ ├── victim/
│ │ │ └── Dockerfile
│ ├── solution/
│ │ └── solution.sh
│ └── resources/
│ └── demo.py
├── images/
│ └── agent/
│ └── Dockerfile
├── scripts/
│ └── build.sh └── README.md
- 1
-
Putting all evaluations in a subdirectory makes them easy to find and run (e.g.,
create_agentic_eval_dataset("/path/to/project/evals")
). - 2
- For evaluations that support multiple sandbox types, it can be usful to group the configuration files together. See Sandboxes for more.
- 3
- Creating one subdirectory per evaluation enables developing and running each in isolation.
- 4
- Storing files related to an evaluation’s infrastructure in a subdirectory streamlines understanding and building the environment.
- 5
- Solution scripts are an effective way of validating that evaluations have been set up in a correct, solvable way. It is helpful to put them in a subdirectory because the scripts occasionally require additional files. See Verifying Solvability for more.
- 6
- Grouping files that will be copied into the evaluation environment makes them easier to find and edit.
- 7
- Infrastructure that is shared between evals, such as the image for an agent, can be stored in a top level directory to reduce duplication.
- 8
- It is often handy to write scripts for doing certain tasks, like building images.
Note
While there can be many evaluation configuration files at or below the absolute path provided to create_agentic_eval_dataset()
, there can only be one per directory. Otherwise it is not clear which to use. If multiple evaluation configuration files are found in the same directory, an error will be raised.