Analysis State
The AnalysisState is the central data structure that flows through the entire hibayes pipeline. It carries your data, processed features, fitted models, and results from one stage to the next.
AnalysisState
The main container for your analysis. It holds:
- data: The raw loaded DataFrame
- processed_data: Transformed data after processors run
- features: Extracted arrays for model fitting (e.g., obs, model_index)
- coords: Named coordinates for ArviZ (e.g., {"model": ["gpt-4", "claude"]})
- dims: Dimension mappings for parameters (e.g., {"model_effects": ["model"]})
- models: List of ModelAnalysisState objects (one per fitted model)
- communicate: Plots and tables generated by communicators
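To show how features, coords and dims fit together, here is a toy sketch in plain Python. The values are made up, and two points are assumptions for illustration only: that model_index holds integer positions into coords["model"], and that a parameter with dims ["model"] has one entry per model label. hibayes's own processors may build these structures differently.

```python
# Hypothetical toy values (not produced by hibayes): two models, three graded samples each
coords = {"model": ["gpt-4", "claude"]}
dims = {"model_effects": ["model"]}

features = {
    "obs": [1, 0, 1, 1, 1, 1],          # binary outcomes, e.g. correct / incorrect
    "model_index": [0, 0, 0, 1, 1, 1],  # assumed: position of each row's model in coords["model"]
}

# Map every observation back to its model label through the coordinate list
row_models = [coords["model"][i] for i in features["model_index"]]

# A parameter with dims ["model"] has one value per label in coords["model"],
# so per-model summaries line up with the coordinate list by position
n_models = len(coords[dims["model_effects"][0]])
successes = [0] * n_models
totals = [0] * n_models
for y, i in zip(features["obs"], features["model_index"]):
    successes[i] += y
    totals[i] += 1

rates = dict(zip(coords["model"], (s / t for s, t in zip(successes, totals))))
print(row_models)
print(rates)
```

Keeping labels in coords and integer indices in features lets the model work with plain arrays while ArviZ can still report results by model name.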
```python
from hibayes.analysis_state import AnalysisState

# state is an existing AnalysisState, e.g. the one passed between pipeline stages

# Access data at different stages
state.data            # raw loaded data
state.processed_data  # after processors run
state.features        # extracted model features
state.coords          # coordinate labels
state.dims            # dimension names

# Access specific items
state.feature("obs")   # get a specific feature
state.coord("model")   # get coordinate labels for "model"
state.models           # list of ModelAnalysisState objects
state.get_model("two_level_group_binomial")  # get model by name

# Get the best model by a diagnostic metric
best = state.get_best_model(with_respect_to="elpd_waic")
```
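Conceptually, get_best_model(with_respect_to=...) ranks the fitted models by one diagnostic. The sketch below re-creates that idea with a hypothetical stand-in class (ToyModelState) and a hypothetical helper (pick_best); it is not the hibayes implementation. Treating larger values as better is an assumption that holds for ELPD-based metrics such as elpd_waic (expected log predictive density), but not for deviance-scale metrics, hence the flag. The model name one_level_binomial and all numbers are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModelState:
    """Hypothetical stand-in for ModelAnalysisState: just a name and diagnostics."""
    model_name: str
    diagnostics: dict = field(default_factory=dict)

    def diagnostic(self, name: str):
        return self.diagnostics[name]


def pick_best(models, with_respect_to: str, higher_is_better: bool = True):
    """Return the model with the best value of one diagnostic.

    Assumes every model has that diagnostic. ELPD-style metrics are better
    when larger; pass higher_is_better=False when smaller is better.
    """
    def key(m):
        return m.diagnostic(with_respect_to)

    return max(models, key=key) if higher_is_better else min(models, key=key)


# Illustrative diagnostics for two hypothetical fitted models
models = [
    ToyModelState("two_level_group_binomial", {"elpd_waic": -512.3}),
    ToyModelState("one_level_binomial", {"elpd_waic": -530.9}),
]
best = pick_best(models, with_respect_to="elpd_waic")
print(best.model_name)  # two_level_group_binomial
```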
Persistence
The AnalysisState can be saved to disk and loaded back for reproducibility.
Automatic saving between stages
When using the CLI (hibayes full --config hibayes.yaml --out ./results), the state is automatically saved after each major stage. This enables:
- Checkpointing: Resume from a failed run without restarting from scratch
- Inspection: Examine intermediate results at any point
- DVC integration: Track outputs with version control
The CLI pipeline (see src/hibayes/cli/full.py) saves the state after processing, after modelling, and after communication:
```python
analysis_state = load_data(config=config.data_loader, display=display)
analysis_state = process_data(analysis_state=analysis_state, ...)
analysis_state.save(path=out)  # (1) after processing: data, processed_data, features, coords, dims saved
analysis_state = model(analysis_state=analysis_state, ...)
analysis_state.save(path=out)  # (2) after modelling: models with inference_data and diagnostics added
analysis_state = communicate(analysis_state=analysis_state, ...)
analysis_state.save(path=out)  # (3) after communicate: plots and tables added to communicate/
```
Each save() call writes to the same output directory and overwrites the files from the previous save, so the saved state grows incrementally as each stage adds its results.
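A minimal sketch of why repeated saves to one directory build up the analysis: each call writes whatever the state currently holds and replaces files of the same name, so every later save is a superset of the earlier one. ToyState, its attributes and the JSON-only storage are hypothetical simplifications; the real AnalysisState.save() also writes Parquet, pickle and NetCDF files, as listed in the folder layout.

```python
import json
import tempfile
from pathlib import Path


class ToyState:
    """Hypothetical state that persists only a few JSON-serialisable fields."""

    def __init__(self):
        self.coords = None
        self.dims = None
        self.diagnostics = None

    def save(self, path: Path) -> None:
        path.mkdir(parents=True, exist_ok=True)
        for name in ("coords", "dims", "diagnostics"):
            value = getattr(self, name)
            if value is not None:
                # A fixed file name per field: re-saving overwrites the old snapshot
                (path / f"{name}.json").write_text(json.dumps(value))


out = Path(tempfile.mkdtemp()) / "results"
state = ToyState()

state.coords = {"model": ["gpt-4", "claude"]}
state.dims = {"model_effects": ["model"]}
state.save(out)  # like the save after processing
after_process = sorted(p.name for p in out.iterdir())

state.diagnostics = {"rhat_max": 1.01}  # illustrative value
state.save(out)  # like the save after modelling
after_model = sorted(p.name for p in out.iterdir())

print(after_process)  # ['coords.json', 'dims.json']
print(after_model)    # ['coords.json', 'diagnostics.json', 'dims.json']
```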
Folder layout
```text
results/
├── data.parquet
├── processed_data.parquet
├── features.pkl
├── coords.json
├── dims.json
├── logs/
│   ├── logs_load.txt
│   ├── logs_process.txt
│   └── logs_model.txt
├── communicate/
│   ├── forest_plot.png
│   └── summary_table.csv
└── models/
    └── two_level_group_binomial/
        ├── metadata.json
        ├── model_config.json
        ├── inference_data.nc
        └── diagnostics.json
```
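Because the layout consists of plain files, intermediate results can be inspected without re-running the pipeline. The sketch below relies only on the layout above. summarise_results is a hypothetical helper, and it assumes coords.json and dims.json hold the same dictionaries shown earlier. For a real run, models/<name>/inference_data.nc can be opened with ArviZ's arviz.from_netcdf; that step is omitted so the example needs only the standard library. The demo folder and its contents are fabricated for illustration.

```python
import json
import tempfile
from pathlib import Path


def summarise_results(root: Path) -> dict:
    """Read coords/dims and list fitted model folders in a results directory."""
    coords = json.loads((root / "coords.json").read_text())
    dims = json.loads((root / "dims.json").read_text())
    models_dir = root / "models"
    model_names = []
    if models_dir.exists():
        model_names = sorted(p.name for p in models_dir.iterdir() if p.is_dir())
    return {"coords": coords, "dims": dims, "models": model_names}


# Demo: build a tiny fake results folder that follows the layout above
root = Path(tempfile.mkdtemp()) / "results"
(root / "models" / "two_level_group_binomial").mkdir(parents=True)
(root / "coords.json").write_text(json.dumps({"model": ["gpt-4", "claude"]}))
(root / "dims.json").write_text(json.dumps({"model_effects": ["model"]}))

summary = summarise_results(root)
print(summary["models"])  # ['two_level_group_binomial']
```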
ModelAnalysisState
Each fitted model gets its own ModelAnalysisState, which stores:
- model: The NumPyro model function
- model_config: Configuration (fit settings, link function, tag)
- features: Model-specific features (inherited from AnalysisState if not set)
- coords/dims: Coordinate and dimension info for ArviZ
- inference_data: ArviZ InferenceData with posterior samples
- diagnostics: Results from checkers (e.g., R-hat, WAIC)
- is_fitted: Whether the model has been fitted
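The features fallback, prior_features and link function described above can be sketched with a hypothetical stand-in class. The mechanics are assumptions for illustration: the model uses its own features when set and the parent AnalysisState's otherwise; prior_features is a copy with obs set to None; and the link function maps logit-scale values to probabilities (an inverse logit), as the prob = model_state.link_function(logit_values) usage below suggests for a binomial model. Other models may use other link functions.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


def inv_logit(x: float) -> float:
    """Map a logit-scale value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class ToyModelState:
    """Hypothetical stand-in showing feature inheritance, not the hibayes class."""
    parent_features: dict                  # features held by the parent AnalysisState
    own_features: Optional[dict] = None    # model-specific features, if any
    link_function: Callable[[float], float] = inv_logit

    @property
    def features(self) -> dict:
        # Fall back to the parent's features when none were set for this model
        return self.own_features if self.own_features is not None else self.parent_features

    @property
    def prior_features(self) -> dict:
        # New dict with obs set to None, so the model samples obs from the prior;
        # the original features are left unchanged
        return {**self.features, "obs": None}


shared = {"obs": [1, 0, 1], "model_index": [0, 0, 1]}
m = ToyModelState(parent_features=shared)
print(m.features is shared)     # True
print(m.prior_features["obs"])  # None
print(m.link_function(0.0))     # 0.5
```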
```python
# Access model state
model_state = state.get_model("two_level_group_binomial")
model_state.model_name      # name (+ tag if set)
model_state.model           # the NumPyro model function
model_state.model_config    # ModelConfig object
model_state.inference_data  # ArviZ InferenceData
model_state.is_fitted       # True if fitted

# Access diagnostics from checkers
model_state.diagnostics                     # all diagnostics dict
model_state.diagnostic("rhat_max")          # specific diagnostic
model_state.add_diagnostic("my_stat", 0.5)  # add custom diagnostic

# Get features for prior predictive (obs set to None)
model_state.prior_features

# Link function for transforming parameters
# (logit_values: values on the model's linear-predictor scale)
prob = model_state.link_function(logit_values)
```
Using in custom components
When writing custom processors, checkers, or communicators, your function receives the state, modifies it, and returns it. Always return the state: it flows to the next stage in the pipeline.
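The receive, modify, return pattern can be sketched in plain Python. ToyState, add_obs_feature and run_pipeline are hypothetical names; the sketch does not reproduce how hibayes registers components or which extra arguments they receive. It only shows why the return matters: a stage that forgets to return hands None to the next stage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical minimal state: rows of processed data plus a feature dict."""
    processed_data: list
    features: dict = field(default_factory=dict)


def add_obs_feature(state: ToyState) -> ToyState:
    """Hypothetical custom processor: derive a binary 'obs' feature from 'score'."""
    state.features["obs"] = [1 if row["score"] >= 0.5 else 0 for row in state.processed_data]
    return state  # return the state so the next stage receives it


def run_pipeline(state, stages):
    # Each stage takes the state and hands back the (possibly modified) state
    for stage in stages:
        state = stage(state)
    return state


state = ToyState(processed_data=[{"score": 0.9}, {"score": 0.2}])
state = run_pipeline(state, [add_obs_feature])
print(state.features["obs"])  # [1, 0]
```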