IBT Torwali Wordlist
License:
CC-BY-SA-4.0
Steward:
Collaborative Action For Research & Development (CARD)
Task: NLP
Release Date: 3/5/2026
Format: CSV
Size: 312.87 KB
Description
The IBT Torwali Wordlist contains approximately 20,000 unique entries in Torwali (ISO 639-3: trw), an under-documented Indo-Aryan language spoken in northern Pakistan. The dataset comprises standardized lexical entries covering core vocabulary, function words, and culturally salient terms, with consistent orthography and normalization suitable for linguistic and computational use. Entries are aligned with English and Urdu glosses, and each includes a part-of-speech tag.
Specifics
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Considerations
Restrictions/Special Constraints
Use is restricted for commercial entities with annual revenue above 1 million USD
Forbidden Usage
The dataset cannot be employed in systems that fabricate language output or in projects that could amplify negative, biased, or abusive content.
Processes
Intended Use
The IBT Torwali Wordlist is intended for linguistic research, language documentation, and the development of educational and computational resources for the Torwali language.
Metadata
Language
Torwali (trw) is an Indo-Aryan language spoken in the Swat Kohistan region of northern Pakistan, mainly in the Bahrain and Chail areas. It is used across several valleys and exhibits variation in pronunciation and vocabulary among local speech communities. Torwali has a strong oral tradition, with folktales, poetry, songs, and storytelling forming a core part of its cultural identity. Although actively spoken, Torwali remains under-documented and has limited standardized writing, making it an essential language for linguistic research, preservation efforts, and resource development.
Semantic Domains (IBT Torwali Wordlist)
Person and Society: Kinship, social roles, customs, relations
Body and Health: Body parts, illness, physical states
Emotion and Cognition: Feelings, perception, thinking
Language and Communication: Speech, writing, interaction
Nature and Environment: Weather, landforms, plants, animals
Food and Livelihood: Agriculture, cooking, work, economy
Objects and Material Culture: Tools, clothing, household items
Action and Movement: Activities, motion, physical actions
Time, Space, and Quantity: Temporal, spatial, numerical concepts
Culture, Belief, and Knowledge: Tradition, religion, education
List of Alphabets
آ اَ ٲ ب پ ت ٹ ث ج چ ڇ خ د ذ ڑ ر ز ڙ ژ ط ض ص ش ݜ س ظ غ ف ق ک گ ل م ن و ہ ی ء او
Sample Entries
اتفاق تے
Torwali: اتفاق تے
Part of Speech: Adverb
English Gloss: Unitedly; with unity
Urdu Gloss: اتفاق سے، مل کر
Semantic Domain (EN): States
Semantic Domain (UR): حالت
Date: 23 Nov 2015
اَٹکے
Torwali: اَٹکے
Part of Speech: Noun
English Gloss: Severe cold days
Urdu Gloss: سخت سردی کے دن
Semantic Domain (EN): States
Semantic Domain (UR): حالت
Date: 23 Nov 2015
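Reading entries like these from the CSV might look like the following minimal sketch. It assumes the column headers match the field labels shown in the sample entries, which may not be the header names in the released file. The inline SAMPLE_CSV string and the load_entries helper are hypothetical, for illustration only.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical excerpt: the header names below mirror the field labels in the
# sample entries and are an assumption about the released CSV, not a confirmed schema.
SAMPLE_CSV = """Torwali,Part of Speech,English Gloss,Urdu Gloss,Semantic Domain (EN),Semantic Domain (UR),Date
اتفاق تے,Adverb,Unitedly; with unity,اتفاق سے، مل کر,States,حالت,23 Nov 2015
اَٹکے,Noun,Severe cold days,سخت سردی کے دن,States,حالت,23 Nov 2015
"""


def load_entries(text):
    """Parse CSV text into a list of dicts, converting the Date field to a date object."""
    reader = csv.DictReader(io.StringIO(text))
    entries = []
    for row in reader:
        # Dates in the samples look like "23 Nov 2015"; %b expects English
        # month abbreviations under the default (C) locale.
        row["Date"] = datetime.strptime(row["Date"], "%d %b %Y").date()
        entries.append(row)
    return entries


entries = load_entries(SAMPLE_CSV)

# Count entries per part-of-speech tag.
pos_counts = Counter(entry["Part of Speech"] for entry in entries)
for pos, count in pos_counts.items():
    print(pos, count)
```

For the actual file, replace io.StringIO(text) with a file opened using encoding="utf-8" and newline="" so that the Torwali and Urdu text is read correctly.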
Citation
Proper citation is required when using this dataset.
@misc{IBT_North_Pakistan,
title = {IBT North Pakistan},
howpublished = {\url{https://ibtnorthpakistan.org/}},
note = {Accessed: 2025-01-06}
}