GeoLogicQA: An LLM Benchmark for Logical Reasoning in Georgian
License:
CC-BY-NC-SA-4.0
Steward:
Tbilisi State University
Task: LLM
Release Date: 2/24/2026
Format: JSON
Size: 15.14 KB
Description
GeoLogicQA is a manually-curated logical and inferential reasoning dataset for the Georgian language (a Kartvelian language). Designed to evaluate deep language understanding, the dataset bypasses simple pattern recognition in favor of multi-step deduction, reading comprehension, and arithmetic problem-solving. It aims to address the gap in evaluation benchmarks for low-resource languages.
Specifics
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
https://spdx.org/licenses/CC-BY-NC-SA-4.0.html
Considerations
Restrictions/Special Constraints
The dataset is intended solely for research and scientific purposes.
Forbidden Usage
The dataset is strictly for non-commercial, academic research purposes.
Metadata
Curation & Size
The dataset contains a total of 106 complex questions:
100 questions adapted from the Kangaroo Mathematics Competition (9th- to 12th-grade levels).
6 questions sourced from Komarovi educational math/physics materials.
Linguistic Validation: To ensure the highest quality for low-resource language training and evaluation, native Georgian speakers rigorously validated each question. This process specifically addressed linguistic nuances and resolved polysemy (words with multiple meanings) to guarantee the reasoning constraints were unambiguous and culturally appropriate.
Data Format
The dataset is provided as a single geologicqa.json file containing a JSON
array of 106 dictionaries. Each dictionary represents a single reasoning task.
Schema Fields
question (string): The text of the logical puzzle, scenario, or math problem in Georgian.
answer (string): The correct solution or deduction.
id (string, optional): The identifier or specific test code for the question.
source (string): The original source material the question was adapted from.
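As a minimal sketch of how the schema above can be loaded and checked, the snippet below validates that a record carries the required string fields. The sample record is purely hypothetical (its values are invented for illustration and do not come from the dataset); in practice you would load the real records with json.load from geologicqa.json.

```python
import json

# Required fields per the schema above; "id" is optional.
REQUIRED_FIELDS = {"question", "answer", "source"}

def validate_record(record: dict) -> bool:
    """Return True if a record has all required fields as strings."""
    return REQUIRED_FIELDS.issubset(record) and all(
        isinstance(record[f], str) for f in REQUIRED_FIELDS
    )

# Hypothetical record illustrating the structure (not taken from the dataset):
sample = {
    "question": "ლოგიკური ამოცანის ტექსტი ქართულად ...",
    "answer": "სწორი პასუხი",
    "id": "sample-001",
    "source": "Kangaroo Mathematics Competition",
}

print(validate_record(sample))  # → True
```

To check the whole file, read it once and validate every entry, e.g. `records = json.load(open("geologicqa.json", encoding="utf-8"))` followed by `all(validate_record(r) for r in records)`.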
Citation
If you use this dataset in your research, please cite the authors:
@inproceedings{koberidze-etal-2025-benchmark,
title = "A Benchmark for Evaluating Logical Reasoning in {G}eorgian For Large Language Models",
author = "Koberidze, Irakli and
Elizbarashvili, Archil and
Tsintsadze, Magda",
editor = "Estevanell-Valladares, Ernesto Luis and
Picazo-Izquierdo, Alicia and
Ranasinghe, Tharindu and
Mikaberidze, Besik and
Ostermann, Simon and
Gurgurov, Daniil and
Mueller, Philipp and
Borg, Claudia and
{\v{S}}imko, Mari{\'a}n",
booktitle = "Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages",
month = sep,
year = "2025",
address = "Varna, Bulgaria",
publisher = "INCOMA Ltd., Shoumen, Bulgaria",
url = "https://aclanthology.org/2025.lowresnlp-1.13/",
pages = "121--130",
abstract = "Advancements in LLMs have largely overlooked low-resource languages (LRLs), creating a gap in evaluation benchmarks. To address this for Georgian, a Kartvelian language, we introduce GeoLogicQA. This novel, manually-curated benchmark assesses LLMs' logical and inferential reasoning through 100 questions. Questions cover syllogistic deduction, inferential reading comprehension, common-sense reasoning, and arithmetic, adapted from challenging sources (Kangaroo Mathematics Competition) and validated by native Georgian speakers for linguistic nuances. Initial evaluations of state-of-the-art LLMs (Gemini 2.5 Flash, DeepSeek-V3, Grok-3, GPT-4o) show an average accuracy of 64{\%} to 83{\%}, significantly exceeding the human baseline of 47{\%}. While demonstrating strong reasoning potential, error analysis reveals persistent challenges in multi-step combinatorial and highly constrained inferential tasks. GeoLogicQA is a public resource for tracking progress and diagnosing weaknesses in Georgian LLMs. We plan to expand the benchmark and establish a public leader-board to foster continuous improvement."
}
License
The original Kangaroo Mathematics Competition questions were adapted with explicit permission from the Association Kangourou sans Frontières (AKSF).
The packaged GeoLogicQA dataset is released under the CC BY-NC-SA 4.0 license.