Datasets
Future-proofing Gbagyi: A community centered approach
License: CC-BY-NC-SA-4.0
Locale: gbr
Task: NLP
Format: WAV
Size: 18.88 GB
Sindh Sujag Newspaper Corpus
License: CC-BY-4.0
Locale: snd
Task: NLP
Format: TXT
Size: 2.63 MB
Aim Foundation Dari Literature Corpus
License: CC-BY-NC-4.0
Locale: prs
Task: NLP
Format: TXT
Size: 1.74 MB
Rana Printers Urdu Literature Corpus
License: CC-BY-NC-4.0
Locale: ur
Task: OTH
Format: TXT
Size: 3.00 MB
Anjuman-e-Katib Farsi/Persian Literature Corpus
License: CC-BY-NC-4.0
Locale: fas
Task: NLP
Format: TXT
Size: 2.82 MB
Kaleem Art Press Urdu Literature Corpus
License: CC-BY-NC-4.0
Locale: ur
Task: OTH
Format: TXT
Size: 2.85 MB
Kaleem Art Press Saraiki Literature Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: OTH
Format: TXT
Size: 1.84 MB
Keblagh-e-Azergi Hazargi Literature Corpus
License: CC-BY-NC-4.0
Locale: haz
Task: NLP
Format: TXT
Size: 193.28 KB
Documenting Ekpeye Folktales and Preserving Cultural Heritage
License: CC-BY-NC-SA-4.0
Locale: ekp
Task: OTH
Format: MP4, TXT, DOCX
Size: 5.97 GB
Atyap Afwan: Preserving Tyap Through Community-Driven Speech Data
License: CC-BY-NC-SA-4.0
Locale: kcg
Task: NLP
Format: WAV, TXT
Size: 251.51 MB
Basaa-ALCAM-MultimodalDataset
License: NOODL-1.0
Locale: bas
Task: NLP
Format: MP3, TSV
Size: 14.66 MB
Mozilla Common Voice Spontaneous Speech ASR Shared Task Test Data
License: CC0-1.0
Locale: mul
Task: ASR
Format: MP3, TSV
Size: 784.80 MB
Everyday Interactions in Ibọnọ and Obolo Languages
License: CC-BY-NC-SA-4.0
Locale: ibn, ann
Task: NLP
Format: WAV, TXT
Size: 2.43 GB
TidyVoiceX_ASV
License: CC0-1.0
Locale: mul
Task: OTH
Format: WAV
Size: 36.72 GB
Ehugbo TTS: biblical text to speech dataset in Ehugbo Language
License: CC-BY-NC-SA-4.0
Locale: ig-ehugbo
Task: TTS
Format: WAV
Size: 437.69 MB
Speech Data Collection for The Nupe Language
License: CC-BY-NC-SA-4.0
Locale: nup
Task: NLP
Format: WAV, TXT
Size: 1.58 GB
KyrgyzNER: Human-Annotated NER Dataset for Kyrgyz
License: CC-BY-NC-SA-4.0
Locale: ky
Task: NLP
Format: CONLL-2003
Size: 585.87 KB
Putèr Newspaper Corpus
License: CC0-1.0
Locale: rm-puter
Task: OTH
Format: TSV
Size: 8.94 MB
Sutsilvan Newspaper Corpus
License: CC0-1.0
Locale: rm-sutsilv
Task: OTH
Format: TSV
Size: 8.87 MB
Sursilvan Newspaper Corpus
License: CC0-1.0
Locale: rm-sursilv
Task: OTH
Format: TSV
Size: 37.80 MB
Rumantsch Grischun Newspaper Corpus
License: CC0-1.0
Locale: rm-rumgr
Task: OTH
Format: TSV
Size: 19.03 MB
Podcast Homostoria (Indonesia)
License: CC-BY-SA-4.0
Locale: id
Task: ASR
Format: MP3
Size: 302.97 MB
Imre 1.0
License: CC0-1.0
Locale: hu-HU
Task: TTS
Format: WEBM
Size: 99.60 MB
Berta 1.0
License: CC0-1.0
Locale: hu-HU
Task: TTS
Format: FLAC
Size: 209.52 MB