Common Voice Scripted Speech 23.0 - Yidgha

Locale: ydg

Size: 220.31 MB

Task: ASR

Format: MP3

License: CC-0


[Yadgha] — Yadgha (ydg)

This datasheet is for version 23.0 of the Mozilla Common Voice Scripted Speech dataset for Yadgha (ydg). The dataset contains 12 hours of recorded speech (11 hours validated) from 15 speakers.

Language

Yadgha (ISO 639-3: ydg), also known as Lutkohiwar, is spoken in the Lutkoh Valley, approximately 46 km west of Chitral town. The Yadgha people trace their origins to the Munjan valley in Afghanistan, from which they migrated to the Lutkoh Valley 31 generations ago. The Yadgha community has around 6,000 speakers, although this number is gradually decreasing as speakers shift to Khowar, the lingua franca of the Chitral valley. Yadgha is a written language, and several poets compose poetry in it. However, only limited literacy activities are currently underway to support the language's preservation.

Demographic information

The dataset includes the following distribution of age and gender.

Gender

Self-declared gender information. Percentages refer to the share of clips annotated with each gender.

Gender       Percentage
Undefined    100.0%

Age

Self-declared age information. Percentages refer to the share of clips annotated with each age band.

Age Band     Percentage
Undefined    88.0%
Thirties     12.0%
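
As a rough illustration of how clip-level percentages like those above can be derived, the following sketch counts self-declared labels per clip and treats a missing label as "Undefined". The function name and the toy input are assumptions made for illustration. They are not part of the Common Voice tooling, and the input is not the real clip metadata.

```python
from collections import Counter


def label_percentages(labels):
    """Return {label: percent of clips}, counting empty labels as 'Undefined'.

    `labels` holds one self-declared value per clip (hypothetical input).
    """
    counts = Counter(
        label.strip().capitalize() if label and label.strip() else "Undefined"
        for label in labels
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Percentages are relative to the number of clips, rounded to one decimal.
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Toy input: 25 clips, 3 of them annotated with an age band.
toy_labels = [""] * 22 + ["thirties"] * 3
print(label_percentages(toy_labels))
```

Because the percentages are computed over clips rather than speakers, a single prolific contributor can dominate a band.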

Text corpus

The text comes from my own writing. The corpus contains 2,000 sentences.

Writing system

Yadgha is written in a Perso-Arabic script, developed a few years ago by the community with the support of the Forum for Language Initiatives.

Symbol table

آ ا ب پ ت ٹ ث ج چ ح خ ݯ ځ څ ݮ د ذ ر ز ڑ ژ ݱ س ش ݰ ص ض ط ظ ع غ ف ڤ ک گ ګ م ن ں و ہ ة ھ ء ی ے
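
One possible use of the symbol table is to flag characters in a sentence that fall outside it. The sketch below is a minimal example under stated assumptions: the table is copied verbatim from this datasheet, `unknown_chars` is a hypothetical helper, and Unicode NFC normalization is an added choice, not something the datasheet specifies.

```python
import unicodedata

# Symbol table copied verbatim from this datasheet.
SYMBOL_TABLE = "آ ا ب پ ت ٹ ث ج چ ح خ ݯ ځ څ ݮ د ذ ر ز ڑ ژ ݱ س ش ݰ ص ض ط ظ ع غ ف ڤ ک گ ګ م ن ں و ہ ة ھ ء ی ے"

# Normalize first so composed and decomposed spellings compare equal.
ALLOWED = {
    ch for ch in unicodedata.normalize("NFC", SYMBOL_TABLE) if not ch.isspace()
}


def unknown_chars(sentence):
    """Return the set of non-whitespace characters not found in the symbol table."""
    text = unicodedata.normalize("NFC", sentence)
    return {ch for ch in text if not ch.isspace() and ch not in ALLOWED}


# Usage: inspect one of the sample sentences from this datasheet.
print(unknown_chars("ترے چوف گاٹے، پتارئی، املیرغے اویت"))
```

Anything absent from the table is reported, including the vowel diacritics and punctuation that appear in the sample sentences. A non-empty result therefore does not by itself mean a sentence is invalid.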

Sample

There follows a randomly selected sample of sentences from the corpus.

نَمن یاغو شَماؤ نغن غور ڤے انسان خدان پیدا کڑے تو چر زیمونے نے ہورغن تیار اوئے

Automatic random samples

ترے یف ملن لے یوغن زو پیرو بلت ڤئیم
ترے چوف گاٹے، پتارئی، املیرغے اویت
مئے ڤرائی کُو ݰوت لا مف پتا چش وا
امیر ہمزہ نے گپن خاڤد
سے تورو لاست کرن زو ڤئیم سے ویرے لاست کرن وو ڤیو

Sources

I wrote the sentences myself. Very little written material exists in the language: a word list and an alphabet book.

Text domains

General

Processing

I wrote the sentences myself. I am a poet of the language and regularly write my own poetry. Using this skill, I developed the corpus, which covers various general topics.

Community links

Datasheet authors

Common Voice Community

Funding

Meesum Alam

Licence

This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree not to determine the identity of speakers in the dataset.