# Analysis State
The `AnalysisState` is the central data structure that flows through the entire hibayes pipeline. It carries your data, processed features, fitted models, and results from one stage to the next.
## AnalysisState
The main container for your analysis. It holds:
- `data`: The raw loaded DataFrame
- `processed_data`: Transformed data after processors run
- `features`: Extracted arrays for model fitting (e.g., `obs`, `model_index`)
- `coords`: Named coordinates for ArviZ (e.g., `{"model": ["gpt-4", "claude"]}`)
- `dims`: Dimension mappings for parameters (e.g., `{"model_effects": ["model"]}`)
- `models`: List of `ModelAnalysisState` objects (one per fitted model)
- `communicate`: Plots and tables generated by communicators
```python
from hibayes.analysis_state import AnalysisState

# Access data at different stages
state.data            # raw loaded data
state.processed_data  # after processors run
state.features        # extracted model features
state.coords          # coordinate labels
state.dims            # dimension names

# Access specific items
state.feature("obs")  # get a specific feature
state.coord("model")  # get coordinate labels for "model"
state.models          # list of ModelAnalysisState objects
state.get_model("two_level_group_binomial")  # get model by name

# Get the best model by a diagnostic metric
best = state.get_best_model(with_respect_to="elpd_waic")
```

## Adding outputs
## Persistence

The `AnalysisState` can be saved and loaded for reproducibility.
### Automatic saving between stages
When using the CLI (`hibayes full --config hibayes.yaml --out ./results`), the state is automatically saved after each major stage. This enables:
- Checkpointing: Resume from a failed run without restarting from scratch
- Inspection: Examine intermediate results at any point
- DVC integration: Track outputs with version control
The CLI pipeline (see `src/hibayes/cli/full.py`) saves after each stage:

```python
analysis_state = load_data(config=config.data_loader, display=display)
analysis_state = process_data(analysis_state=analysis_state, ...)
analysis_state.save(path=out)  # (1)
analysis_state = model(analysis_state=analysis_state, ...)
analysis_state.save(path=out)  # (2)
analysis_state = communicate(analysis_state=analysis_state, ...)
analysis_state.save(path=out)  # (3)
```

1. After processing: `data`, `processed_data`, `features`, `coords`, `dims` saved
2. After modelling: `models` with `inference_data`, diagnostics added
3. After communicate: plots and tables added to `communicate/`
Each `save()` call overwrites the previous state, building up the complete analysis incrementally.
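The checkpoint-and-resume behaviour can be illustrated with a minimal stand-in (this is not the hibayes API; the stage names and JSON checkpoint are placeholders for the real staged saves):

```python
import json
import tempfile
from pathlib import Path

def run_pipeline(out: Path) -> dict:
    """Toy stand-in for a staged pipeline: each stage overwrites the
    checkpoint on disk, so a rerun resumes from completed stages."""
    checkpoint = out / "state.json"
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for stage in ("process", "model", "communicate"):
        if stage in state:                    # finished in an earlier run: skip
            continue
        state[stage] = f"{stage}-result"      # placeholder for the real work
        checkpoint.write_text(json.dumps(state))  # overwrite = incremental build-up
    return state

out = Path(tempfile.mkdtemp())
first = run_pipeline(out)
resumed = run_pipeline(out)  # second run finds every stage checkpointed
print(first == resumed)      # → True
```

A crashed run behaves like the second call here: stages already written to the checkpoint are skipped, and only the remaining work is redone.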
### Frequent saves (default)
By default, the CLI saves state after each model fit and each communicator run. This provides:
- Crash recovery: If a long-running analysis fails mid-way, you won’t lose completed work
- Progress monitoring: Inspect partial results while the analysis is still running
- Debugging: Identify exactly which step caused a failure
Frequent saves can be disabled so that the state is saved only at stage boundaries.
### Incremental saves
When frequent saves are enabled, hibayes uses incremental saving to avoid repeatedly writing large, unchanged files. After the initial save, subsequent saves skip:
- `data.parquet` and `processed_data.parquet` (data doesn't change after processing)
- `features.pkl`, `coords.json`, `dims.json` (static after extraction)
- `model.pkl` for each model (the model function doesn't change)
- `inference_data.nc` if the posterior samples haven't changed
Files that may change are always saved:
- `diagnostics.json` (checkers add new diagnostics)
- Diagnostic figures and summary files
- `inference_data.nc` when new groups are added (e.g., posterior predictive samples)
This typically provides 10x or greater speedup for frequent saves compared to full saves, making crash recovery essentially free in terms of performance overhead.
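The skip-unchanged-files idea can be sketched in a few lines (a simplified illustration, not the hibayes implementation; real incremental saving may track changes differently, e.g. by group within `inference_data.nc`):

```python
import hashlib
import json
import tempfile
from pathlib import Path

def save_if_changed(path: Path, payload: bytes) -> bool:
    """Write ``payload`` only when its content hash differs from the
    hash recorded at the previous save; return True if a write happened."""
    marker = path.parent / (path.name + ".sha256")
    digest = hashlib.sha256(payload).hexdigest()
    if path.exists() and marker.exists() and marker.read_text() == digest:
        return False  # unchanged since last save: skip the expensive write
    path.write_bytes(payload)
    marker.write_text(digest)
    return True

out = Path(tempfile.mkdtemp())
data = json.dumps({"rhat_max": 1.01}).encode()
print(save_if_changed(out / "diagnostics.json", data))  # True  — first save writes
print(save_if_changed(out / "diagnostics.json", data))  # False — unchanged, skipped
```

Skipping the large static files (raw data, features, posterior samples) is where most of the speedup comes from, since only the small, frequently changing files are rewritten on each save.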
### Folder layout
```
results/
├── data.parquet
├── processed_data.parquet
├── features.pkl
├── coords.json
├── dims.json
├── logs/
│   ├── logs_load.txt
│   ├── logs_process.txt
│   └── logs_model.txt
├── communicate/
│   ├── forest_plot.png
│   └── summary_table.csv
└── models/
    └── two_level_group_binomial/
        ├── metadata.json
        ├── model_config.json
        ├── inference_data.nc
        └── diagnostics.json
```
## ModelAnalysisState
Each fitted model gets its own `ModelAnalysisState`, which stores:
- `model`: The NumPyro model function
- `model_config`: Configuration (fit settings, link function, tag)
- `features`: Model-specific features (inherited from `AnalysisState` if not set)
- `coords`/`dims`: Coordinate and dimension info for ArviZ
- `inference_data`: ArviZ `InferenceData` with posterior samples
- `diagnostics`: Results from checkers (e.g., R-hat, WAIC)
- `is_fitted`: Whether the model has been fitted
```python
# Access model state
model_state = state.get_model("two_level_group_binomial")
model_state.model_name      # name (+ tag if set)
model_state.model           # the NumPyro model function
model_state.model_config    # ModelConfig object
model_state.inference_data  # ArviZ InferenceData
model_state.is_fitted       # True if fitted

# Access diagnostics from checkers
model_state.diagnostics                     # all diagnostics dict
model_state.diagnostic("rhat_max")          # specific diagnostic
model_state.add_diagnostic("my_stat", 0.5)  # add custom diagnostic

# Get features for prior predictive (obs set to None)
model_state.prior_features

# Link function for transforming parameters
prob = model_state.link_function(logit_values)
```

## Using in custom components
When writing custom processors, checkers, or communicators, you receive the state and return it after modifications:
Always return the state: it flows to the next stage in the pipeline.
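The receive-modify-return contract can be sketched with a stand-in dataclass (the real `AnalysisState` has more fields, and hibayes components may take additional arguments; `drop_missing_rows` and the `State` class here are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class State:                  # stand-in for hibayes' AnalysisState
    processed_data: dict = field(default_factory=dict)
    diagnostics: dict = field(default_factory=dict)

def drop_missing_rows(state: State) -> State:
    """A custom processor: modify fields on the state, then hand it
    back so the next stage in the pipeline sees the changes."""
    state.processed_data = {
        k: v for k, v in state.processed_data.items() if v is not None
    }
    return state              # always return the state

state = State(processed_data={"a": 1, "b": None})
state = drop_missing_rows(state)
print(state.processed_data)  # → {'a': 1}
```

Checkers and communicators follow the same shape: read what you need from the state, attach your results (diagnostics, plots, tables), and return it.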