Datasets
NAWA-E-WATAN Balochi Newspaper Corpus
License: CC-BY-NC-4.0
Locale: bgn
Task: NLP
Format: TXT
Size: 1.43 MB
Gawri (گاؤری) Magazine Corpus
License: CC-BY-NC-4.0
Locale: gwc
Task: NLP
Format: TXT
Size: 146.71 KB
TTS Javanese - Ngapak Dialect
License: CC-BY-SA-4.0
Locale: jav
Task: TTS
Format: WEBM, TSV
Size: 567.12 MB
Jember Javanese Spontaneous Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: jav
Task: ASR
Format: MP3, TSV
Size: 271.65 MB
Zacatlán Tepetzintla Nahuatl Transcriptions
License: CC-BY-ND-4.0
Locale: nhi
Task: ASR
Format: TRS
Size: 320.28 KB
Zacatlán Tepetzintla Nahuatl Audio
License: CC-BY-ND-4.0
Locale: nhi
Task: ASR
Format: WAV
Size: 50.19 GB
Bulu-TTS-Dataset 1.0
License: NOODL-1.0
Locale: bum
Task: TTS
Format: MP3, TSV
Size: 87.40 MB
TTS Sasak Language
License: CC-BY-SA-4.0
Locale: sas
Task: TTS
Format: WEBM, TSV
Size: 293.92 MB
Betawi TTS of Cultural Language (BEKAL)
License: CC-BY-SA-4.0
Locale: bew
Task: TTS
Format: WEBM, TSV
Size: 309.99 MB
Khmer ASR Cultural Dataset (V2)
License: CC-BY-SA-4.0
Locale: khm
Task: ASR
Format: WAV
Size: 35.86 GB
Taruen's Tatar Folklore Text Corpus
License: CC0-1.0
Locale: tt
Task: NLP
Format: TXT
Size: 1.40 MB
TTS-Tolaki
License: CC-BY-NC-SA-4.0
Locale: lbw
Task: TTS
Format: WEBM, TSV
Size: 249.04 MB
Mandar Spontaneous Speech
License: CC-BY-NC-4.0
Locale: mdr
Task: ASR
Format: MP3, TSV
Size: 534.45 MB
TTS Central Javanese
License: CC-BY-SA-4.0
Locale: jav
Task: TTS
Format: WEBM, TSV
Size: 440.11 MB
TTS Javanese-Lumajang Dialect
License: CC-BY-SA-4.0
Locale: jav
Task: TTS
Format: WEBM, TSV
Size: 684.32 MB
Adamawa Fulfulde-French Parallel Corpus of Narratives 1.2
License: NOODL-1.0
Locale: fub
Task: MT
Format: TSV
Size: 112.17 KB
Ewondo-TTS-Dataset
License: NOODL-1.0
Locale: ewo
Task: TTS
Format: MP3, TSV
Size: 152.70 MB
Bamun-French Parallel Corpus 1.1
License: NOODL-1.0
Locale: bax
Task: MT
Format: TSV
Size: 99.78 KB
TTS Muna Dataset
License: CC-BY-NC-SA-4.0
Locale: mnb
Task: TTS
Format: WEBM, TSV
Size: 316.34 MB
Hawrami Kurdish TTS dataset 1.0
License: CC-BY-4.0
Locale: hac
Task: TTS
Format: WAV
Size: 706.11 MB
Common Voice 7.0 - Single Word Target Segment
License: CC0-1.0
Locale: mul
Task: ASR
Format: MP3, TSV
Size: 3.51 GB
Greek PhD Theses Corpus v1.0
License: CC-BY-NC-SA-4.0
Locale: gr-GR
Task: NLP
Format: JSONL
Size: 7.02 GB
openbook.gr v1.0
License: CC-BY-NC-SA-4.0
Locale: gr-GR
Task: NLP
Format: Markdown (.md)
Size: 251.63 MB
TidyVoiceX2_ASV
License: CC0-1.0
Locale: mul
Task: OTH
Format: WAV
Size: 23.11 GB