Datasets
English–Punjabi (Shahmukhi) Parallel Sentences Corpus (Mediamen Archives)
License: CC-BY-NC-4.0
Locale: en-PK, pnb
Task: MT
Format: CSV
Size: 1.08 MB
HESEIA Sentence Bias Dataset
License: CC-BY-SA-4.0
Locale: es-AR
Task: OTH
Format: CSV
Size: 235.43 KB
RFE/RL Tatar-Bashkir News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: tt, ba, ru
Task: NLP
Format: TXT
Size: 102.44 MB
Luhya ASR data subset 70 hours
License: CC-BY-4.0
Locale: luy
Task: ASR
Format: WAV, XLSX
Size: 13.90 GB
Effect AI Scripted Speech 1.0 - English
License: CC0-1.0
Locale: en
Task: TTS
Format: CSV, MP3
Size: 663.45 MB
DataTrust Africa: Speech Corpus of Public Radio Recordings from Northern Uganda
License: NOODL-1.0
Locale: en-US
Task: NLP
Format: MP3
Size: 179.82 MB
Khmer ASR Cultural Dataset
License: CC-BY-SA-4.0
Locale: khm
Task: ASR
Format: WAV
Size: 12.59 GB
Corpus of Panjebar Semangat Javanese-Language Magazine
License: CC-BY-SA-4.0
Locale: jav
Task: OTH
Format: TXT
Size: 4.31 MB
SI-NLI
License: CC-BY-NC-SA-4.0
Locale: sl
Task: NLU
Format: TSV
Size: 392.44 KB
Vallader Newspaper Corpus
License: CC0-1.0
Locale: rm-vallader
Task: OTH
Format: TSV
Size: 18.71 MB
Multilingual Religious Parallel Corpus (Kaleem Art Press)
License: CC-BY-SA-4.0
Locale: mul
Task: MT
Format: CSV
Size: 2.27 MB
Sindh Line Publishers
License: CC-BY-SA-4.0
Locale: snd
Task: NLP
Format: TXT
Size: 2.22 MB
Spoken-Congolese-French-Dataset
License: NOODL-1.0
Locale: fr-CG
Task: NLP
Format: MP3, WAV, TSV
Size: 3.44 GB
Ewondo_Mbida-Mbani_ALCAM-MultimodalDataset
License: NOODL-1.0
Locale: ewo
Task: NLP
Format: MP3, TSV
Size: 19.25 MB
Balochi Academy Text Corpus
License: CC-BY-NC-SA-4.0
Locale: bgn
Task: NLP
Format: TXT
Size: 1.88 MB
Mada Narratives
License: NOODL-1.0
Locale: mxu
Task: NLP
Format: TXT
Size: 65.04 KB
Bamun-French Parallel Corpus
License: NOODL-1.0
Locale: bax
Task: MT
Format: TSV
Size: 99.24 KB
Surmiran Newspaper Corpus
License: CC0-1.0
Locale: rm-surmiran
Task: OTH
Format: TSV
Size: 11.89 MB
DhoNam: Dholuo Speech dataset
License: NOODL-1.0
Locale: luo
Task: ASR
Format: WEBM
Size: 2.49 GB
Archivo de la Comisionada María de los Ángeles Guzmán García (COTAI Nuevo León / InfoNL)
License: CC-BY-4.0
Locale: es-MX
Task: NLP
Format: ZIP, PDF, CSV, XLSX
Size: 866.15 MB
Common Voice Spontaneous Speech 2.0 - Kenyah
License: CC0-1.0
Locale: xkl
Task: ASR
Format: MP3
Size: 212.06 MB
Common Voice Spontaneous Speech 2.0 - Ushojo
License: CC0-1.0
Locale: ush
Task: ASR
Format: MP3
Size: 102.83 MB
Common Voice Spontaneous Speech 2.0 - Kuku
License: CC0-1.0
Locale: ukv
Task: ASR
Format: MP3
Size: 233.85 MB
Common Voice Spontaneous Speech 2.0 - Rutoro
License: CC0-1.0
Locale: ttj
Task: ASR
Format: MP3
Size: 272.63 MB