HiBayES
Hierarchical Bayesian Modelling Framework for AI Evaluation Statistics
Welcome to HiBayES
A Python framework for analysing AI evaluation data using hierarchical Bayesian statistical models.
What is HiBayES?
HiBayES is a statistical analysis framework designed specifically for Inspect AI evaluation logs. It implements the methodologies from our paper HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics, providing researchers with robust tools to:
- Extract insights from complex AI evaluation data
- Apply sophisticated hierarchical Bayesian models
- Communicate findings through visualisations and tables
It is intended for use once the analyst has explored and understood the data and proposed candidate statistical models. HiBayES then helps productionise the evaluation plan, making it easy to rerun in repeated evaluations while ensuring appropriate levels of reproducibility and auditability.
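To illustrate what a hierarchical model buys over analysing each task in isolation, here is a minimal sketch of partial pooling in a normal-normal model. It is not HiBayES's API: `TaskEstimate` and `partial_pool` are hypothetical names, and the between-task spread `tau` is assumed known rather than inferred, whereas a full Bayesian fit would place a prior on it and sample.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    name: str
    mean: float  # observed score for one task, e.g. accuracy
    se: float    # standard error of that observed score


def partial_pool(tasks, tau):
    """Shrink per-task scores toward a shared mean (normal-normal model).

    Assumed model, with the between-task spread `tau` treated as known:
        observed_j ~ Normal(theta_j, se_j**2)
        theta_j    ~ Normal(mu, tau**2)
    `mu` is set to its posterior mean under a flat prior given `tau`.
    Returned standard deviations condition on that `mu`, so they slightly
    understate uncertainty compared with a full Bayesian fit.
    """
    # Precision-weighted estimate of the population mean mu.
    weights = [1.0 / (t.se**2 + tau**2) for t in tasks]
    mu = sum(w * t.mean for w, t in zip(weights, tasks)) / sum(weights)

    pooled = {}
    for t in tasks:
        # Combine the task's own data with the population-level prior.
        precision = 1.0 / t.se**2 + 1.0 / tau**2
        post_mean = (t.mean / t.se**2 + mu / tau**2) / precision
        pooled[t.name] = (post_mean, math.sqrt(1.0 / precision))
    return mu, pooled


tasks = [
    TaskEstimate("task_a", 0.90, 0.05),
    TaskEstimate("task_b", 0.60, 0.05),
    TaskEstimate("task_c", 0.75, 0.20),  # noisier task, e.g. fewer samples
]
mu, pooled = partial_pool(tasks, tau=0.10)
print(f"population mean: {mu:.2f}")
for name, (m, sd) in pooled.items():
    print(f"{name}: {m:.3f} +/- {sd:.3f}")
```

With these made-up inputs the population mean is 0.75, and the extreme tasks are pulled toward it: `task_a` moves from 0.90 to 0.87 and `task_b` from 0.60 to 0.63. The amount of shrinkage depends on how noisy each task is relative to `tau`.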
Key Features
🔄 Reproducible Pipelines
Built-in DVC integration ensures your analyses are fully reproducible. Version control your data, code, model fits, diagnostics and results seamlessly.
🔌 Extensible Architecture
A plugin system lets you add custom extractors, processors, models and communicators tailored to your domain.
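The four plugin kinds suggest a pipeline in which each stage feeds the next. The sketch below shows one generic way such a registry can work, using a decorator to register functions by kind and name. It is not HiBayES's plugin API: `register`, `get`, `run_pipeline`, the example plugins and the shape of the `log` dictionary are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry keyed by plugin kind, then by plugin name.
KINDS = ("extractor", "processor", "model", "communicator")
_REGISTRY: Dict[str, Dict[str, Callable]] = {kind: {} for kind in KINDS}


def register(kind: str, name: str):
    """Decorator that records a plugin function under (kind, name)."""
    if kind not in _REGISTRY:
        raise ValueError(f"unknown plugin kind: {kind!r}")

    def decorator(fn: Callable) -> Callable:
        if name in _REGISTRY[kind]:
            raise ValueError(f"{kind} {name!r} is already registered")
        _REGISTRY[kind][name] = fn
        return fn

    return decorator


def get(kind: str, name: str) -> Callable:
    return _REGISTRY[kind][name]


@register("extractor", "score_per_sample")
def score_per_sample(log: dict) -> list:
    # Pull one score per sample from a log-like dict (assumed layout).
    return [sample["score"] for sample in log["samples"]]


@register("processor", "drop_missing")
def drop_missing(scores: list) -> list:
    # Remove samples that have no score.
    return [s for s in scores if s is not None]


@register("model", "mean_score")
def mean_score(scores: list) -> float:
    # Placeholder "model": a plain average stands in for a real fit.
    return sum(scores) / len(scores)


@register("communicator", "summary_line")
def summary_line(result: float) -> str:
    return f"mean score: {result:.3f}"


def run_pipeline(log, extractor, processor, model, communicator):
    """Chain one plugin of each kind: extract, process, fit, communicate."""
    data = get("extractor", extractor)(log)
    data = get("processor", processor)(data)
    result = get("model", model)(data)
    return get("communicator", communicator)(result)


log = {"samples": [{"score": 1.0}, {"score": None}, {"score": 0.0}, {"score": 1.0}]}
print(run_pipeline(log, "score_per_sample", "drop_missing", "mean_score", "summary_line"))
```

Registering by name keeps pipeline definitions declarative: a configuration file can list plugin names, and custom plugins become available simply by importing the module that defines them.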
🎨 Palette of Useful Analyses
We provide built-in methods for common AI evaluation workflows, such as statistical models, recommended health checks and useful plots.
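This page does not list which health checks are included. As an illustration only, one widely used convergence check for MCMC-fitted Bayesian models is the split-R-hat statistic, sketched below in pure Python for a single parameter (classic form, without rank normalisation). `split_rhat` is a hypothetical helper, not a HiBayES function; libraries such as ArviZ provide production implementations.

```python
import math
from statistics import mean, variance


def split_rhat(chains):
    """Split-R-hat convergence diagnostic for one scalar parameter.

    `chains` is a list of equal-length lists of posterior draws. Each chain
    is split in half so that drift within a chain also inflates the value.
    Values near 1 suggest the chains agree; thresholds such as 1.01 or 1.1
    are commonly used to flag problems.
    """
    n = len(chains[0])
    if any(len(c) != n for c in chains):
        raise ValueError("all chains must have the same length")
    half = n // 2
    if half < 2:
        raise ValueError("need at least 4 draws per chain")

    # Split each chain in two, dropping the middle draw if n is odd.
    halves = []
    for c in chains:
        halves.append(c[:half])
        halves.append(c[n - half:])

    chain_means = [mean(h) for h in halves]
    within = mean(variance(h) for h in halves)  # W: mean within-chain variance
    between = half * variance(chain_means)      # B: between-chain variance
    var_plus = (half - 1) / half * within + between / half
    return math.sqrt(var_plus / within)


mixed = [[0, 1] * 4, [0, 1] * 4]    # chains exploring the same region
stuck = [[0, 1] * 4, [10, 11] * 4]  # chains stuck in different regions
print(split_rhat(mixed), split_rhat(stuck))
```

The first pair of chains gives a value below 1, while the second gives a value far above any usual threshold, signalling that the sampler has not converged and the fit should not be trusted.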
Getting Started
Ready to dive in? Check out our Getting Started Guide for installation instructions and a tutorial.
Community & Support
- 📖 Read the paper for theoretical background
- 🐛 Report bugs or request features on GitHub Issues
- 🤝 Contributions welcome!
Citation
If you use HiBayES in your research, please cite:
@article{luettgau2025hibayeshierarchicalbayesianmodeling,
title={HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics},
author={Lennart Luettgau and Harry Coppock and Magda Dubois and Christopher Summerfield and Cozmin Ududec},
year={2025},
archivePrefix={arXiv},
eprint={2505.05602},
url={https://arxiv.org/abs/2505.05602},
}