IBT Torwali Wordlist

License: CC-BY-SA-4.0

Steward: Collaborative Action For Research & Development (CARD)

Task: NLP

Release Date: 3/5/2026

Format: CSV

Size: 312.87 KB


Description

The IBT Torwali Wordlist contains approximately 20,000 unique entries in Torwali (ISO 639-3: trw), an under-documented Indo-Aryan language spoken in northern Pakistan. The dataset comprises standardized lexical entries covering core vocabulary, function words, and culturally salient terms, with consistent orthography and normalization suitable for linguistic and computational use. Each entry is aligned with English and Urdu glosses and includes a part-of-speech tag.
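
As a hedged illustration of what normalization can mean for Arabic-script text such as Torwali: this page does not state which scheme was applied, so the sketch below assumes Unicode NFC and uses a hypothetical helper name, `normalize_entry`. It shows two visually identical spellings of آ that compare equal only after normalization.

```python
import unicodedata


def normalize_entry(text: str) -> str:
    """Return text in Unicode NFC form with surrounding whitespace removed.

    NFC is an assumption for illustration; the dataset's actual
    normalization scheme is not documented on this page.
    """
    return unicodedata.normalize("NFC", text.strip())


# Two encodings of the same letter: precomposed U+0622 versus
# U+0627 (alef) followed by U+0653 (combining maddah above).
precomposed = "\u0622"
decomposed = "\u0627\u0653"

# Raw strings differ at the code point level even though they render alike.
same_before = precomposed == decomposed

# After normalization both collapse to the precomposed form.
same_after = normalize_entry(precomposed) == normalize_entry(decomposed)
```

Applying the same normalization to user queries before lookup avoids missed matches caused by differing keyboard input methods.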

Specifics

Licensing

Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

https://spdx.org/licenses/CC-BY-SA-4.0.html

Considerations

Restrictions/Special Constraints

Use is restricted for commercial entities with annual revenue above 1 million USD.

Forbidden Usage

The dataset cannot be employed in systems that fabricate language output or in projects that could amplify negative, biased, or abusive content.

Processes

Intended Use

The IBT Torwali Wordlist is intended for linguistic research, language documentation, and the development of educational and computational resources for the Torwali language.

Metadata

Language

Torwali (trw) is an Indo-Aryan language spoken in the Swat Kohistan region of northern Pakistan, mainly in the Bahrain and Chail areas. It is used across several valleys and exhibits variation in pronunciation and vocabulary among local speech communities. Torwali has a strong oral tradition, with folktales, poetry, songs, and storytelling forming a core part of its cultural identity. Although actively spoken, Torwali remains under-documented and has limited standardized writing, making it an essential language for linguistic research, preservation efforts, and resource development.

Semantic Domains (IBT Torwali Wordlist)

  1. Person and Society
    Kinship, social roles, customs, relations

  2. Body and Health
    Body parts, illness, physical states

  3. Emotion and Cognition
    Feelings, perception, thinking

  4. Language and Communication
    Speech, writing, interaction

  5. Nature and Environment
    Weather, landforms, plants, animals

  6. Food and Livelihood
    Agriculture, cooking, work, economy

  7. Objects and Material Culture
    Tools, clothing, household items

  8. Action and Movement
    Activities, motion, physical actions

  9. Time, Space, and Quantity
    Temporal, spatial, numerical concepts

  10. Culture, Belief, and Knowledge
    Tradition, religion, education

Alphabet

آ اَ ٲ ب پ ت ٹ ث ج چ ڇ خ د ذ ڑ ر ز ڙ ژ ط ض ص ش ݜ س ظ غ ف ق ک گ ل م ن و ہ ی ء او

Sample Entries

اتفاق تے

  • Torwali: اتفاق تے

  • Part of Speech: Adverb

  • English Gloss: Unitedly; with unity

  • Urdu Gloss: اتفاق سے، مل کر

  • Semantic Domain (EN): States

  • Semantic Domain (UR): حالت

  • Date: 23 Nov 2015

اَٹکے

  • Torwali: اَٹکے

  • Part of Speech: Noun

  • English Gloss: Severe cold days

  • Urdu Gloss: سخت سردی کے دن

  • Semantic Domain (EN): States

  • Semantic Domain (UR): حالت

  • Date: 23 Nov 2015
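
The entries above can be read with a standard CSV parser. The following is a minimal sketch, not the dataset's documented schema: the column headers (`torwali`, `pos`, `english_gloss`, `urdu_gloss`, `domain_en`, `domain_ur`, `date`) and the helper names are assumptions chosen for illustration, and the inline text stands in for the real file.

```python
import csv
import io

# Hypothetical column headers; the real CSV may name or order them differently.
# The two rows reproduce the sample entries shown above.
SAMPLE_CSV = """torwali,pos,english_gloss,urdu_gloss,domain_en,domain_ur,date
اتفاق تے,Adverb,"Unitedly; with unity","اتفاق سے، مل کر",States,حالت,23 Nov 2015
اَٹکے,Noun,"Severe cold days","سخت سردی کے دن",States,حالت,23 Nov 2015
"""


def load_entries(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_by_pos(entries, pos):
    """Return entries whose part-of-speech tag matches, ignoring case."""
    wanted = pos.strip().lower()
    return [e for e in entries if e["pos"].strip().lower() == wanted]


entries = load_entries(SAMPLE_CSV)
nouns = filter_by_pos(entries, "noun")

for entry in nouns:
    # Print the Torwali headword alongside its English gloss.
    print(entry["torwali"], "-", entry["english_gloss"])
```

To work with the downloaded file instead, open it with `encoding="utf-8"` and `newline=""` and pass the file object to `csv.DictReader`, after adjusting the column names to match the actual header row.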

Citation

Proper citation is required when using this dataset.

@misc{IBT_North_Pakistan,
  title        = {IBT North Pakistan},
  howpublished = {\url{https://ibtnorthpakistan.org/}},
  note         = {Accessed: 2025-01-06}
}