Common Voice Scripted Speech 23.0 - Wakhi

Locale: wbl

Size: 305.60 MB

Task: ASR

Format: MP3

License: CC-0


Wakhi (Wuk̃hikwor) — Wakhi (wbl)

This datasheet is for version 23.0 of the Mozilla Common Voice Scripted Speech dataset for Wakhi (wbl). The dataset contains 16 hours of recorded speech (13 hours validated) from 13 speakers.

Language

Wakhi, or Wakhani, is known to its speakers as K̃hikwor (a contraction of Wuk̃hikwor). It is an old Eastern Iranian (Iranic) language within the Pamiri branch. Wakhi (K̃hikwor) is spoken indigenously in Pakistan, China, Afghanistan, Tajikistan and Kyrgyzstan. Diaspora communities also live in Russia and Turkey, as well as in Europe, the Americas and Australia.

Demographic information

The dataset includes the following distribution of age and gender.

Gender

Self-declared gender information. Each percentage is the share of clips annotated with that gender.

Gender            Percentage
Undefined         88.0%
Male Masculine    12.0%

Age

Self-declared age information. Each percentage is the share of clips annotated with that age band.

Age Band     Percentage
Undefined    21.0%
Thirties     1.0%
Sixties      12.0%
Seventies    66.0%
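
The percentages in the gender and age tables count clips, not speakers, so a few prolific speakers can dominate a category. The sketch below shows how such a clip-level distribution could be computed. The function name `clip_percentages` and the example labels are hypothetical and are not taken from the dataset or from any Common Voice tooling.

```python
from collections import Counter


def clip_percentages(labels):
    """Return {label: percentage of clips}, counting empty labels as 'Undefined'."""
    # One entry per clip; a missing self-declared value becomes "Undefined".
    counts = Counter(label or "Undefined" for label in labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical per-clip age labels (illustration only, not real data).
example_labels = ["seventies", "seventies", "seventies", "sixties", ""]
print(clip_percentages(example_labels))
```

Because the unit is the clip, the resulting shares say how much audio falls in each category, not how many of the speakers belong to it.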

Text corpus

The corpus consists of sentences in Hunza Wakhi and is based on extensive anthropological and linguistic fieldwork in Pakistan, China, Afghanistan and Tajikistan.

Writing system

The script used is the Roman Anglicized writing system, which is the script approved by the Wakhi Tajik Cultural Association (WTCA), Pakistan, and the Ishkoman Wakhi Welfare Organization (IWWO). Through this script, literate Wakhi people interact easily and happily with each other across borders on social media forums. It thus supports their creativity and the expression of thought in written form, and binds them together.

Symbol table

D̃d̃ Dh Ee Ẽẽ Ff Gg Gh g̃h Ii Jj J̃j̃ Kk Kh K̃h Ll Mm Nn Oo Pp Qq Rr Ss Sh S̃h Tt T̃t̃ Th Uu Ũũ Vv Ww Yy Zz Zh Z̃h Z̃z̃
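
Several symbols in the table are written with more than one character: a base letter plus a combining tilde, a letter followed by h, or both. For ASR text processing it can help to treat each of these as a single unit. The following is a minimal sketch, assuming that a listed letter followed by h always forms one grapheme. The function name `split_graphemes` is hypothetical, and the multigraph set is copied only from the symbols visible in the table.

```python
import unicodedata

TILDE = "\u0303"  # combining tilde, as in the tilde letters of the symbol table

# Lowercase multi-letter graphemes from the symbol table.
MULTIGRAPHS = {
    "dh", "gh", "kh", "sh", "th", "zh",
    "g" + TILDE + "h", "k" + TILDE + "h",
    "s" + TILDE + "h", "z" + TILDE + "h",
}


def split_graphemes(text):
    """Split text into graphemes, keeping tilde letters and h-digraphs whole."""
    # NFD separates each base letter from its tilde, so precomposed and
    # decomposed input are handled the same way.
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch  # attach the mark to its base letter
        else:
            clusters.append(ch)

    # Merge a letter with a following "h" when the pair is a listed grapheme.
    graphemes = []
    i = 0
    while i < len(clusters):
        pair = "".join(clusters[i:i + 2])
        if i + 1 < len(clusters) and pair.lower() in MULTIGRAPHS:
            graphemes.append(pair)
            i += 2
        else:
            graphemes.append(clusters[i])
            i += 1
    # Recompose so that letters with a precomposed form use it.
    return [unicodedata.normalize("NFC", g) for g in graphemes]


# "disht" occurs in the sample sentences; "sh" stays together as one unit.
print(split_graphemes("disht"))
```

This greedy merge cannot tell a true digraph from two separate letters that happen to be adjacent, and it passes spaces, hyphens and apostrophes through as single units, so it is a starting point rather than a full orthographic analysis.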

Sample

There follows a randomly selected sample of five sentences from the corpus.

Kum insone ki cẽ haq en k̃hat e disht, yowe k̃hũ Khũdhoy disht.
Yemi ya inson ki yowes̃h aql-e bũnyodher bafig̃h et shakig̃hev yewerd.
Agar ki aql-e ya jũz cam en nik̃hinden, insoni cẽ kũ haywon’v en  be lup darinda.
Woz sakes̃h dem k̃hũ jahon insonev winen ki dẽ aql en qiti, cerenges̃h darinda wocen.
Parwardigore haya dẽstan inson-e rũwes̃h e jũr k̃hak dẽstan, bihisht et dũz̃akh-e tasawũr ratk.

Automatic random samples

Ya’ni ayem tosh Khon tat cẽ tarat-e k̃hat en aya bu asp-e kũleki yan woz steti Gũl-e Qũrbon tat’rek.
Cẽ ilmẽn beshkhayi hũnar nast.
Cerg mazhẽ yavẽr wẽchel k̃hetuwa?
Yowe ya yupki det dez̃hd wozomdi dem chaleksar yow pũrũti kert Podshohbachaher.
Chi chiz jondori tra?

Sources

  1. Sentences that I composed myself during the assignment period.

  2. Texts from selected Wakhi poetry.

  3. New Wakhi transcriptions of interviews from my extensive fieldwork in Pakistan, China, Afghanistan and Tajikistan.

  4. Wakhi publications from formal website: www.fazalamin.com

Text domains

General

Community links

Datasheet authors

  • Mazdak Beg

  • Ahmad Jami Sakhi

  • Mr. Amanullah

  • Fazal Amin Beg

Funding

Mozilla Foundation

Licence

This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree to not determine the identity of speakers in the dataset.