O-NET: A high-school student knowledge test

Knowledge

inspect-evals

Questions and answers from the Ordinary National Educational Test (O-NET), administered annually by the National Institute of Educational Testing Service to Matthayom 6 (Grade 12 / ISCED 3) students in Thailand. The exam contains six subjects: English language, math, science, social knowledge, and Thai language. There are questions with multiple-choice and true/false answers. Questions can be in either English or Thai.

Contributed By

@bact

Code

src/inspect_evals/onet

Overview

O-NET is an evaluation dataset consisting of high-school-level multiple-choice and true/false questions in mathematics, science, social knowledge, English language comprehension, and Thai language comprehension.

The model is prompted with a question and tasked with answering it.

Dataset

O-NET M6 contains both English and Thai and is based on Thailand’s Ordinary National Educational Test (O-NET), administered annually by the National Institute of Educational Testing Service to Matthayom 6 (Grade 12 / ISCED 3) students. The dataset is released as part of OpenThaiGPT - Thai Exams Eval.

The original dataset contains some questions that require written responses, some unanswerable questions, and questions with more than one correct answer. These questions are excluded in this evaluation in Inspect Evals.

Here is an example prompt from the dataset (after it has been further processed by Inspect):

The question can be in either English or Thai.

Answer the following multiple choice question. The entire content of your response should be of the following format: ‘ANSWER: $LETTER’ (without quotes) where LETTER is one of A,B,C,D,E. Think step by step before answering.

Question: คำประพันธ์ต่อไปนี้ใช้ภาพพจน์ตามข้อใด อันชาติใดไร้ช่างชำนาญศิลป์ เหมือนนารินไร้โฉมบรรโลมสง่า ใครใครเห็นไม่เป็นที่จำเริญตา เขาจะพากันเย้ยให้อับอาย

Choices: A) อุปมา B) อุปลักษณ์ C) สัญลักษณ์ D) ลัทพจน์ E) บุคคลวัต

The model is then expected to provide an answer.

Scoring

A simple accuracy is calculated over the datapoints.

Benchmark Results

For reference, see the benchmark results against the following models conducted by the OpenThaiGPT project:

meta-llama/Llama-3.1
openthaigpt/openthaigpt1.5
Qwen/Qwen2.5
scb10x/llama-3-typhoon-v1.5x

Exam names	meta-llama/Llama-3.1-70B-Instruct	openthaigpt/openthaigpt1.5-72b-instruct	Qwen/Qwen2.5-72B-Instruct	scb10x/llama-3-typhoon-v1.5x-70b-instruct
onet_m6_english	82.69	90.38	86.54	73.08
onet_m6_math	58.82	41.18	23.53	41.18
onet_m6_science	57.14	67.86	64.29	50.00
onet_m6_social	76.36	65.45	63.64	67.27
onet_m6_thai	60.00	56.92	60.00	55.38

Full results are available at openthaigpt.aieat.or.th (updated on 30 September 2024).

Parameters

`onet_m6`

dataset_name (Literal[‘default’, ‘english’, ‘math’, ‘science’, ‘social’, ‘thai’]): (default: 'default')
shuffle (bool): (default: True)

Changelog

[2-A] - 2026-02-16

Migrate version to new scheme. See #907.

[1.0.1] - 2025-12-18

Adds backoff policy for functions that connect to huggingface servers.