Uganda Cultural and Cognitive Benchmark (UCCB)
The first comprehensive question-answering dataset designed to evaluate cultural understanding and reasoning abilities of Large Language Models concerning Uganda’s multifaceted environment across 24 cultural domains including education, traditional medicine, media, economy, literature, and social norms.
Overview
The first comprehensive question-answering dataset designed to evaluate the cultural understanding and reasoning abilities of Large Language Models (LLMs) concerning Uganda’s multifaceted environment across 24 cultural domains.
Usage
# Evaluate full dataset
inspect eval inspect_evals/uccb
# Evaluate with subset for testing
inspect eval inspect_evals/uccb --limit 100
# Evaluate with chain of thought
inspect eval inspect_evals/uccb -T cot=true
# Custom max tokens
inspect eval inspect_evals/uccb -T max_tokens=1024
# Custom scorer model
inspect eval inspect_evals/uccb -T scorer_model=anthropic/claude-3-5-sonnet-20240620Dataset
- Source: CraneAILabs/UCCB
- Samples: 1,039 total samples
- Format: Open-ended question answering
- Languages: English (Ugandan English) with multilingual elements
- License: CC BY-NC-SA 4.0
Categories
The benchmark covers 24 cultural domains including:
- Education (70 samples) - Educational systems and practices
- Ugandan Herbs (65 samples) - Traditional medicinal plants
- Media (65 samples) - Media landscape and outlets
- Economy (62 samples) - Economic structures and practices
- Notable Key Figures (59 samples) - Important historical and contemporary figures
- Literature (53 samples) - Literary works and traditions
- Architecture (51 samples) - Building styles and structures
- Folklore (49 samples) - Traditional stories and beliefs
- Language (47 samples) - Linguistic diversity and usage
- Religion (44 samples) - Religious practices and beliefs
- Values & Social Norms (43 samples) - Cultural values and social expectations
- Sports (23 samples) - Sports culture and achievements
- And 12 additional domains covering attire, customs, festivals, food, geography, demographics, history, traditions, street life, music, value addition, and slang.
Scoring
The evaluation uses model-based grading appropriate for cultural knowledge assessment:
- model_graded_qa: Uses a configurable model (default: GPT-4o-mini) to evaluate whether submitted answers demonstrate correct understanding of Ugandan culture, history, and society
- accuracy(): Overall accuracy metric based on model grading
- stderr(): Standard error calculation for statistical significance
Grading Criteria
The model grader evaluates submissions based on: - Factual accuracy: Contains correct information about Ugandan culture/history - Cultural understanding: Demonstrates appropriate cultural knowledge and respect - Key information coverage: Includes main points from the reference answer - Cultural context: Shows understanding of relevant Ugandan cultural context
Answers are considered correct even if they don’t match word-for-word, as long as they convey the same key cultural insights and factual information.
Evaluation Report
Implementation Details
- Based on the original UCCB dataset from Crane AI Labs
- Uses generative QA approach with cultural context system message
- Supports both standard and chain-of-thought evaluation modes
- Temperature set to 0.0 for reproducible results
Results
To run evaluations and populate this table with actual results, use the following commands:
# Evaluate with OpenAI models
uv run inspect eval inspect_evals/uccb --model openai/gpt-4o --limit 100
uv run inspect eval inspect_evals/uccb --model openai/gpt-4o-mini --limit 100
# Evaluate with Anthropic models
uv run inspect eval inspect_evals/uccb --model anthropic/claude-3-5-sonnet-20240620 --limit 100
# Evaluate with custom scorer model
uv run inspect eval inspect_evals/uccb --model openai/gpt-4o -T scorer_model=anthropic/claude-3-5-sonnet-20240620 --limit 100Results
| Model | Accuracy | Avg Score (1-5) | Samples | Date | Notes |
|---|---|---|---|---|---|
| anthropic/claude-sonnet-4.5 | 82.0% | 4.10 | 1039 | 2025-10-11 | Best performing model - excellent cultural reasoning and context awareness |
| x-ai/grok-4 | 77.6% | 3.88 | 1039 | 2025-10-11 | Strong performance on cultural understanding tasks |
| cohere/command-a | 77.0% | 3.85 | 1039 | 2025-10-11 | Solid baseline with good factual accuracy |
| openai/gpt-5 | 55.0% | 2.75 | 1039 | 2025-10-11 | Moderate performance on cultural context |
| google/gemini-2.5-pro | 23.2% | 1.16 | 1039 | 2025-10-11 | Requires improvement on Ugandan cultural understanding |
Evaluation Details: - Scoring Method: Model-based grading using GPT-4o - Total Samples: 1,039 across 24 cultural domains - Temperature: 0.0 (reproducible results) - Max Tokens: 512 - Success Rate: 100% completion for all models - Evaluation Date: October 11, 2025
Performance Metrics: - Processing Speed: Ranged from 0.37 items/s (Grok-4) to 1.43 items/s (Gemini-2.5-Pro) - Average Evaluation Time: 728s - 2,798s depending on model
Note: Accuracy percentages are calculated from average scores (Accuracy = Avg Score / 5 × 100). Scores reflect semantic understanding of Ugandan cultural context, not exact text matching.
Citation
@misc{uccb_2025,
author = {Lwanga Caleb and Gimei Alex and Kavuma Lameck and Kato Steven Mubiru and Roland Ganafa and Sibomana Glorry and Atuhaire Collins and JohnRoy Nangeso and Bronson Bakunga},
title = {The Ugandan Cultural Context Benchmark (UCCB) Suite},
year = {2025},
url = {https://huggingface.co/datasets/CraneAILabs/UCCB}
}