Analysis State

The AnalysisState is the central data structure that flows through the entire hibayes pipeline. It carries your data, processed features, fitted models, and the results from checkers and communicators from one stage to the next.

AnalysisState

The main container for your analysis. It holds:

  • data: The raw loaded DataFrame
  • processed_data: Transformed data after processors run
  • features: Extracted arrays for model fitting (e.g., obs, model_index)
  • coords: Named coordinates for ArviZ (e.g., {"model": ["gpt-4", "claude"]})
  • dims: Dimension mappings for parameters (e.g., {"model_effects": ["model"]})
  • models: List of ModelAnalysisState objects (one per fitted model)
  • communicate: Plots and tables generated by communicators

from hibayes.analysis_state import AnalysisState

# Access data at different stages
state.data              # raw loaded data
state.processed_data    # after processors run
state.features          # extracted model features
state.coords            # coordinate labels
state.dims              # dimension names

# Access specific items
state.feature("obs")           # get a specific feature
state.coord("model")           # get coordinate labels for "model"
state.models                   # list of ModelAnalysisState objects
state.get_model("two_level_group_binomial")  # get model by name

# Get the best model by a diagnostic metric
best = state.get_best_model(with_respect_to="elpd_waic")

Adding outputs

# Add plots and tables for the communicate stage
state.add_plot(fig, "posterior_forest")
state.add_table(df, "model_comparison")

# Add logs for a specific stage
state.add_log("Processed 1000 rows", stage="process")

Persistence

The AnalysisState can be saved and loaded for reproducibility:

from pathlib import Path

# Save entire analysis
state.save(Path("./results"))

# Load from disk
state = AnalysisState.load(Path("./results"))

Automatic saving between stages

When using the CLI (hibayes full --config hibayes.yaml --out ./results), the state is automatically saved after each major stage. This enables:

  • Checkpointing: Resume from a failed run without restarting from scratch
  • Inspection: Examine intermediate results at any point
  • DVC integration: Track outputs with version control

The CLI pipeline (see src/hibayes/cli/full.py) saves after each stage:

analysis_state = load_data(config=config.data_loader, display=display)
analysis_state = process_data(analysis_state=analysis_state, ...)
analysis_state.save(path=out)

analysis_state = model(analysis_state=analysis_state, ...)
analysis_state.save(path=out)

analysis_state = communicate(analysis_state=analysis_state, ...)
analysis_state.save(path=out)
  1. After processing: data, processed_data, features, coords, dims saved
  2. After modelling: models with inference_data, diagnostics added
  3. After communicate: plots and tables added to communicate/

Each save() call overwrites the previous state, building up the complete analysis incrementally.
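This overwrite-and-extend checkpoint pattern can be sketched with the standard library. The following is an illustration of the idea only, not hibayes internals:

```python
import pickle
import tempfile
from pathlib import Path


def save_checkpoint(state: dict, out: Path) -> None:
    """Overwrite the checkpoint so it always reflects the latest stage."""
    out.mkdir(parents=True, exist_ok=True)
    (out / "state.pkl").write_bytes(pickle.dumps(state))


def load_checkpoint(out: Path) -> dict:
    return pickle.loads((out / "state.pkl").read_bytes())


out = Path(tempfile.mkdtemp())

# Stage 1: load data, then checkpoint
state = {"data": [1, 2, 3]}
save_checkpoint(state, out)

# Stage 2: process, then checkpoint again (overwrites the previous save)
state["processed"] = [x * 2 for x in state["data"]]
save_checkpoint(state, out)

# Crash recovery: a later run resumes from the last checkpoint
resumed = load_checkpoint(out)
print(resumed["processed"])  # [2, 4, 6]
```

Because each checkpoint contains the full accumulated state, resuming never requires replaying earlier stages.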

Frequent saves (default)

By default, the CLI saves state after each model fit and each communicator run. This provides:

  • Crash recovery: If a long-running analysis fails mid-way, you won’t lose completed work
  • Progress monitoring: Inspect partial results while the analysis is still running
  • Debugging: Identify exactly which step caused a failure

To disable frequent saves (save only at stage boundaries):

hibayes-full --config config.yaml --out ./results --no-frequent-save
hibayes-model --config config.yaml --analysis-state ./input --out ./results --no-frequent-save
hibayes-communicate --config config.yaml --analysis-state ./input --out ./results --no-frequent-save

Incremental saves

When frequent saves are enabled, hibayes uses incremental saving to avoid repeatedly writing large, unchanged files. After the initial save, subsequent saves skip:

  • data.parquet and processed_data.parquet (data doesn’t change after processing)
  • features.pkl, coords.json, dims.json (static after extraction)
  • model.pkl for each model (model function doesn’t change)
  • inference_data.nc if the posterior samples haven’t changed

Files that may change are always saved:

  • diagnostics.json (checkers add new diagnostics)
  • Diagnostic figures and summary files
  • inference_data.nc when new groups are added (e.g., posterior predictive samples)

This typically provides a 10x or greater speedup for frequent saves compared to full saves, making crash recovery essentially free in terms of performance overhead.

# Programmatic usage
state.save(path, incremental=False)  # Full save (default)
state.save(path, incremental=True)   # Skip unchanged files
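The skip logic behind incremental saves can be sketched with content hashing: write a file only when its bytes differ from what is already on disk. This is a minimal illustration of the technique, not the hibayes implementation:

```python
import hashlib
import tempfile
from pathlib import Path


def save_if_changed(path: Path, payload: bytes) -> bool:
    """Write payload only if on-disk content differs; return True if written."""
    if path.exists() and hashlib.sha256(path.read_bytes()).digest() == hashlib.sha256(payload).digest():
        return False  # unchanged: skip the (potentially large) write
    path.write_bytes(payload)
    return True


out = Path(tempfile.mkdtemp())
data = b"posterior samples..."

first = save_if_changed(out / "inference_data.nc", data)   # True: initial save
second = save_if_changed(out / "inference_data.nc", data)  # False: unchanged, skipped

# Appending new groups (e.g. posterior predictive) changes the bytes, so it is rewritten
third = save_if_changed(out / "inference_data.nc", data + b" + posterior predictive")
print(first, second, third)  # True False True
```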

Folder layout

results/
├── data.parquet
├── processed_data.parquet
├── features.pkl
├── coords.json
├── dims.json
├── logs/
│   ├── logs_load.txt
│   ├── logs_process.txt
│   └── logs_model.txt
├── communicate/
│   ├── forest_plot.png
│   └── summary_table.csv
└── models/
    └── two_level_group_binomial/
        ├── metadata.json
        ├── model_config.json
        ├── inference_data.nc
        └── diagnostics.json
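Because the layout uses standard formats (Parquet, JSON, NetCDF, pickle), the artifacts can be inspected without hibayes. For example, a model's diagnostics.json can be read with the standard library; the folder and file contents below are illustrative:

```python
import json
import tempfile
from pathlib import Path

# Illustrative results folder; in practice this is the --out directory
model_dir = Path(tempfile.mkdtemp()) / "models" / "two_level_group_binomial"
model_dir.mkdir(parents=True)
(model_dir / "diagnostics.json").write_text(
    json.dumps({"rhat_max": 1.01, "elpd_waic": -123.4})  # example values
)

diagnostics = json.loads((model_dir / "diagnostics.json").read_text())
print(diagnostics["rhat_max"])  # 1.01
```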

ModelAnalysisState

Each fitted model gets its own ModelAnalysisState, which stores:

  • model: The NumPyro model function
  • model_config: Configuration (fit settings, link function, tag)
  • features: Model-specific features (inherited from AnalysisState if not set)
  • coords/dims: Coordinate and dimension info for ArviZ
  • inference_data: ArviZ InferenceData with posterior samples
  • diagnostics: Results from checkers (e.g., R-hat, WAIC)
  • is_fitted: Whether the model has been fitted

# Access model state
model_state = state.get_model("two_level_group_binomial")

model_state.model_name       # name (+ tag if set)
model_state.model            # the NumPyro model function
model_state.model_config     # ModelConfig object
model_state.inference_data   # ArviZ InferenceData
model_state.is_fitted        # True if fitted

# Access diagnostics from checkers
model_state.diagnostics                    # all diagnostics dict
model_state.diagnostic("rhat_max")         # specific diagnostic
model_state.add_diagnostic("my_stat", 0.5) # add custom diagnostic

# Get features for prior predictive (obs set to None)
model_state.prior_features

# Link function for transforming parameters
prob = model_state.link_function(logit_values)

Using in custom components

When writing custom processors, checkers, or communicators, you receive the state and return it after modifications:

from hibayes.process import DataProcessor, process


@process
def my_processor() -> DataProcessor:
    def processor(state, display):  # named to avoid shadowing the imported @process decorator
        # Read from state
        df = state.processed_data

        # Modify and update
        state.features = {"obs": ...}
        state.coords = {"group": [...]}

        return state

    return processor
Always return the state: it flows to the next stage in the pipeline.