Datasets
Ehugbo TTS: biblical text to speech dataset in Ehugbo Language
License: CC-BY-NC-SA-4.0
Locale: ig-ehugbo
Task: TTS
Format: WAV
Size: 437.69 MB
Speech Data Collection for The Nupe Language
License: CC-BY-NC-SA-4.0
Locale: nup
Task: NLP
Format: WAV, TXT
Size: 1.58 GB
KyrgyzNER: Human-Annotated NER Dataset for Kyrgyz
License: CC-BY-NC-SA-4.0
Locale: ky
Task: NLP
Format: CONLL-2003
Size: 585.87 KB
Putèr Newspaper Corpus
License: CC0-1.0
Locale: rm-puter
Task: OTH
Format: TSV
Size: 8.94 MB
Sutsilvan Newspaper Corpus
License: CC0-1.0
Locale: rm-sutsilv
Task: OTH
Format: TSV
Size: 8.87 MB
Sursilvan Newspaper Corpus
License: CC0-1.0
Locale: rm-sursilv
Task: OTH
Format: TSV
Size: 37.80 MB
Rumantsch Grischun Newspaper Corpus
License: CC0-1.0
Locale: rm-rumgr
Task: OTH
Format: TSV
Size: 19.03 MB
Podcast Homostoria (Indonesia)
License: CC-BY-SA-4.0
Locale: id
Task: ASR
Format: MP3
Size: 302.97 MB
Imre 1.0
License: CC0-1.0
Locale: hu-HU
Task: TTS
Format: WEBM
Size: 99.60 MB
Berta 1.0
License: CC0-1.0
Locale: hu-HU
Task: TTS
Format: FLAC
Size: 209.52 MB
Anna 1.0
License: CC0-1.0
Locale: hu-HU
Task: TTS
Format: WEBM
Size: 95.27 MB
Dave 1.0
License: CC0-1.0
Locale: es-ES
Task: TTS
Format: WEBM
Size: 85.24 MB
Kathleen 1.0
License: CC0-1.0
Locale: en-US
Task: TTS
Format: FLAC
Size: 211.96 MB
Joe 1.0
License: CC0-1.0
Locale: en-US
Task: TTS
Format: WEBM
Size: 75.78 MB
Kerstin 1.0
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WEBM
Size: 132.05 MB
ReRooted: Speech Corpus of Testimonials from Armenian Refugees and Immigrants
License: GPL-3.0
Locale: hy
Task: ASR
Format: WAV, TEXTGRID
Size: 3.25 GB
Kaleem Magazine Urdu Corpus
License: CC-BY-NC-4.0
Locale: urd
Task: NLP
Format: TXT
Size: 2.74 MB
Baloch Publishers Saraiki Literature Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: NLP
Format: TXT
Size: 2.04 MB
Chishti Sons Punjabi Literature Corpus
License: CC-BY-NC-4.0
Locale: pnb
Task: NLP
Format: TXT
Size: 1.65 MB
FUB-Narratives
License: NOODL-1.0
Locale: fub
Task: NLP
Format: TXT
Size: 168.34 KB
Jazab Sindhi Newspaper Corpus
License: CC-BY-NC-SA-4.0
Locale: snd
Task: NLP
Format: TXT
Size: 2.33 MB
Tamir Sindhi News Corpus
License: CC-BY-NC-SA-4.0
Locale: snd
Task: NLP
Format: TXT
Size: 2.56 MB
Mediamen Punjabi Literature Corpus
License: CC-BY-NC-4.0
Locale: pnb
Task: NLP
Format: TXT
Size: 1.82 MB
Speech Corpus of Armenian Question-Answer Dialogues
License: GPL-3.0
Locale: hy
Task: ASR
Format: WAV, TEXTGRID, TXT
Size: 2.10 GB