Common Voice Scripted Speech 23.0 - Wakhi
Locale: wbl
Size: 305.60 MB
Task: ASR
Format: MP3
License: CC-0
Wakhi (Wuk̃hikwor) - Wakhi (wbl)
This datasheet is for version 23.0 of the Mozilla Common Voice Scripted Speech dataset for Wakhi (wbl). The dataset contains 16 hours of recorded speech (13 hours validated) from 13 speakers.
Language
Wakhi, or Wakhani, is known indigenously as K̃hikwor (a contraction of Wuk̃hikwor). It is an old Eastern Iranian (Iranic) language within the Pamiri branch. The Wakhi (or K̃hikwor) language is spoken indigenously in Pakistan, China, Afghanistan, Tajikistan and Kyrgyzstan; diaspora communities also live in Russia and Turkey, as well as in Europe, the Americas and Australia.
Demographic information
The dataset includes the following distribution of age and gender.
Gender
Self-declared gender information. Percentages refer to the share of clips annotated with each gender.
Gender | Percentage |
---|---|
Undefined | 88.0% |
Male Masculine | 12.0% |
Age
Self-declared age information. Percentages refer to the share of clips annotated with each age band.
Age Band | Percentage |
---|---|
Undefined | 21.0% |
Thirties | 1.0% |
Sixties | 12.0% |
Seventies | 66.0% |
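The percentages in the two tables above are shares of clips, not of speakers, and clips with no self-declared value are counted as "Undefined". A minimal sketch of that calculation follows, assuming the per-clip metadata has already been loaded as a list of dictionaries. The field names `age` and `gender`, the sample values, and the helper `clip_percentages` are illustrative assumptions, not part of the dataset documentation.

```python
from collections import Counter


def clip_percentages(rows, field):
    """Return {label: percentage of clips}, counting blank values as 'Undefined'."""
    counts = Counter(
        (row.get(field) or "").strip() or "Undefined" for row in rows
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Hypothetical rows standing in for per-clip metadata (not real data).
rows = [
    {"age": "seventies", "gender": ""},
    {"age": "seventies", "gender": "male_masculine"},
    {"age": "", "gender": ""},
    {"age": "sixties", "gender": ""},
]

age_shares = clip_percentages(rows, "age")
gender_shares = clip_percentages(rows, "gender")
```

Because the shares are per clip, a few prolific speakers can dominate a band, so it may be worth computing the same breakdown per speaker before drawing conclusions about balance.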
Text corpus
The corpus consists of sentences in Hunza Wakhi and is based on extensive anthropological and linguistic fieldwork in Pakistan, China, Afghanistan and Tajikistan.
Writing system
The script used is the Roman Anglicized writing system, which is the script approved by the Wakhi Tajik Cultural Association (WTCA), Pakistan, and the Ishkoman Wakhi Welfare Organization (IWWO). Using this script, literate Wakhi people interact easily and happily with one another across borders on social media forums. It thus supports their creativity and the written expression of their thoughts, and binds them together.
Symbol table
D̃d̃ Dh Ee Ẽẽ Ff Gg Gh G̃h Ii Jj J̃j̃ Kk Kh K̃h Ll Mm Nn Oo Pp Qq Rr Ss Sh S̃h Tt T̃t̃ Th Uu Ũũ Vv Ww Yy Zz Zh Z̃h Z̃z̃
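Several symbols in the table above are written with more than one character (for example Dh, K̃h, S̃h), and the tilde may be stored either as a precomposed letter or as a combining mark. For ASR work it can help to split text into these symbols rather than into raw characters. The sketch below is one possible approach, not an official tokenizer: it normalizes to NFD, tries the multi-letter symbols from the table first, and otherwise emits a letter together with any combining marks that follow it. The function name `segment` and the greedy longest-match rule are assumptions; a sequence such as "s" followed by "h" that belongs to two separate sounds would be merged by this rule.

```python
import unicodedata

TILDE = "\u0303"  # combining tilde, as used in the symbol table

# Multi-letter symbols taken from the symbol table, in lowercase NFD form.
MULTIGRAPHS = sorted(
    [
        "dh", "gh", "g" + TILDE + "h",
        "kh", "k" + TILDE + "h",
        "sh", "s" + TILDE + "h",
        "th", "zh", "z" + TILDE + "h",
    ],
    key=len,
    reverse=True,  # try longer symbols first
)


def segment(text):
    """Split text into orthographic symbols using greedy longest match."""
    s = unicodedata.normalize("NFD", text.lower())
    symbols = []
    i = 0
    while i < len(s):
        if s[i].isspace():
            i += 1
            continue
        for m in MULTIGRAPHS:
            if s.startswith(m, i):
                symbols.append(m)
                i += len(m)
                break
        else:
            # Fallback: one base character plus any combining marks after it.
            j = i + 1
            while j < len(s) and unicodedata.combining(s[j]):
                j += 1
            symbols.append(s[i:j])
            i = j
    # Recompose so letters with a precomposed form compare equal to typed text.
    return [unicodedata.normalize("NFC", sym) for sym in symbols]


tokens = segment("k\u0303hak")
```

Punctuation is returned as separate symbols by the fallback branch, so it can be filtered afterwards if only letters are needed.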
Sample
There follows a randomly selected sample of five sentences from the corpus.
Kum insone ki cẽ haq en k̃hat e disht, yowe k̃hũ Khũdhoy disht.
Yemi ya inson ki yowes̃h aql-e bũnyodher bafig̃h et shakig̃hev yewerd.
Agar ki aql-e ya jũz cam en nik̃hinden, insoni cẽ kũ haywon’v en be lup darinda.
Woz sakes̃h dem k̃hũ jahon insonev winen ki dẽ aql en qiti, cerenges̃h darinda wocen.
Parwardigore haya dẽstan inson-e rũwes̃h e jũr k̃hak dẽstan, bihisht et dũz̃akh-e tasawũr ratk.
Automatic random samples
Ya’ni ayem tosh Khon tat cẽ tarat-e k̃hat en aya bu asp-e kũleki yan woz steti Gũl-e Qũrbon tat’rek.
Cẽ ilmẽn beshkhayi hũnar nast.
Cerg mazhẽ yavẽr wẽchel k̃hetuwa?
Yowe ya yupki det dez̃hd wozomdi dem chaleksar yow pũrũti kert Podshohbachaher.
Chi chiz jondori tra?
Sources
Sentences of my own creation, written during the assignment period.
Texts drawn from selected Wakhi poetry.
New Wakhi transcriptions of interviews from my extensive fieldwork in Pakistan, China, Afghanistan and Tajikistan.
Wakhi publications from the official website: www.fazalamin.com
Text domains
General
Community links
Datasheet authors
Mazdak Beg
Ahmad Jami Sakhi
Mr. Amanullah
Fazal Amin Beg
Funding
Mozilla Foundation
Licence
This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data, you agree not to determine the identity of speakers in the dataset.