GeoLogicQA: An LLM Benchmark for Logical Reasoning in Georgian

License:

CC-BY-NC-SA-4.0

Steward:

Tbilisi State University

Task: LLM

Release Date: 2/24/2026

Format: JSON

Size: 15.14 KB


Description

GeoLogicQA is a manually curated logical and inferential reasoning dataset for the Georgian language (a Kartvelian language). Designed to evaluate deep language understanding, the dataset bypasses simple pattern recognition in favor of multi-step deduction, reading comprehension, and arithmetic problem-solving. It aims to address the gap in evaluation benchmarks for low-resource languages.

Specifics

Licensing

Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)

https://spdx.org/licenses/CC-BY-NC-SA-4.0.html

Considerations

Restrictions/Special Constraints

It is intended solely for research and scientific purposes.

Forbidden Usage

Commercial use is prohibited; the dataset is restricted to non-commercial, academic research.

Metadata

Paper

Curation & Size

The dataset contains a total of 106 complex questions:

  • 100 questions adapted from the Kangaroo Mathematics Competition (9th- to 12th-grade levels).

  • 6 questions sourced from Komarovi educational math/physics materials.

Linguistic Validation: To ensure the highest quality for low-resource language training and evaluation, native Georgian speakers rigorously validated each question. This process specifically addressed linguistic nuances and resolved polysemy (words with multiple meanings) to guarantee the reasoning constraints were unambiguous and culturally appropriate.

Data Format

The dataset is provided as a single geologicqa.json file containing a JSON array of 106 dictionaries. Each dictionary represents a single reasoning task.

Schema Fields

  • question (string): The text of the logical puzzle, scenario, or math problem in Georgian.

  • answer (string): The correct solution or deduction.

  • id (string, optional): The identifier or specific test code for the question.

  • source (string): The original source material the question was adapted from.
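As a minimal sketch, the schema above can be checked when loading the file. The filename `geologicqa.json` and the field names come from this card; the validation logic itself is an illustrative assumption, not part of the released dataset.

```python
import json

# Fields the card documents as always present; "id" is optional.
REQUIRED_FIELDS = {"question", "answer", "source"}

def load_geologicqa(path="geologicqa.json"):
    """Load the JSON array of reasoning tasks and validate each record."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # expected: a list of 106 dicts
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    return records
```

A caller could then iterate over `load_geologicqa()` and feed each `question` to a model, comparing its output against `answer`.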

Citation

If you use this dataset in your research, please cite the authors:

@inproceedings{koberidze-etal-2025-benchmark,
    title = "A Benchmark for Evaluating Logical Reasoning in {G}eorgian For Large Language Models",
    author = "Koberidze, Irakli  and
      Elizbarashvili, Archil  and
      Tsintsadze, Magda",
    editor = "Estevanell-Valladares, Ernesto Luis  and
      Picazo-Izquierdo, Alicia  and
      Ranasinghe, Tharindu  and
      Mikaberidze, Besik  and
      Ostermann, Simon  and
      Gurgurov, Daniil  and
      Mueller, Philipp  and
      Borg, Claudia  and
      {\v{S}}imko, Mari{\'a}n",
    booktitle = "Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages",
    month = sep,
    year = "2025",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://aclanthology.org/2025.lowresnlp-1.13/",
    pages = "121--130",
    abstract = "Advancements in LLMs have largely overlooked low-resource languages (LRLs), creating a gap in evaluation benchmarks. To address this for Georgian, a Kartvelian language, we introduce GeoLogicQA. This novel, manually-curated benchmark assesses LLMs' logical and inferential reasoning through 100 questions. Questions cover syllogistic deduction, inferential reading comprehension, common-sense reasoning, and arithmetic, adapted from challenging sources (Kangaroo Mathematics Competition) and validated by native Georgian speakers for linguistic nuances. Initial evaluations of state-of-the-art LLMs (Gemini 2.5 Flash, DeepSeek-V3, Grok-3, GPT-4o) show an average accuracy of 64{\%} to 83{\%}, significantly exceeding the human baseline of 47{\%}. While demonstrating strong reasoning potential, error analysis reveals persistent challenges in multi-step combinatorial and highly constrained inferential tasks. GeoLogicQA is a public resource for tracking progress and diagnosing weaknesses in Georgian LLMs. We plan to expand the benchmark and establish a public leader-board to foster continuous improvement."
}

License

The original Kangaroo Mathematics Competition questions were adapted with explicit permission from the Association Kangourou sans Frontières (AKSF).

The packaged GeoLogicQA dataset is released under the CC BY-NC-SA 4.0 license.