Datasets
Institute of African Digital Humanities
Bamun-French Parallel Corpus
This dataset is a parallel corpus of Bamun (Shupamem) and French texts. The texts were obtained by transcribing raw audio files. Translations were added to enri...
Task: MT
Format: TSV
License: NOODL-1.0
Size: 99.24 KB
Created: 12/24/2025
Locale: bax
Pro Svizra Rumantscha
Surmiran Newspaper Corpus
2.9 million tokens in the Surmiran variety of Romansh from the daily newspaper “La Quotidiana”.
Task: OTH
Format: TSV
License: CC0-1.0
Size: 11.89 MB
Created: 12/22/2025
Locale: rm-surmiran
Maseno Centre for Applied Artificial Intelligence (MCAAI)
DhoNam: Dholuo Speech dataset
DhoNam: Dholuo Speech dataset is a speech corpus designed to supercharge Automatic Speech Recognition (ASR) and other speech technologies for Dholuo, one of ...
Task: ASR
Format: WEBM
License: NOODL-1.0
Size: 2.49 GB
Created: 12/20/2025
Locale: Luo
Amnesia
Archivo de la Comisionada María de los Ángeles Guzmán García (COTAI Nuevo León / InfoNL)
This archive preserves the institutional and academic memory of the tenure of Dr. María de los Ángeles Guzmán García as Commissioner of the Comisión de Tra...
Task: NLP
Format: ZIP, PDF, CSV, XLSX
License: CC-BY-4.0
Size: 866.15 MB
Created: 12/19/2025
Locale: es-MX
Common Voice
Common Voice Spontaneous Speech 2.0 - Kenyah
A collection of spontaneous spoken phrases in Kenyah.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 212.06 MB
Created: 12/5/2025
Locale: xkl
Common Voice
Common Voice Spontaneous Speech 2.0 - Ushojo
A collection of spontaneous spoken phrases in Ushojo.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 102.83 MB
Created: 12/5/2025
Locale: ush
Common Voice
Common Voice Spontaneous Speech 2.0 - Kuku
A collection of spontaneous spoken phrases in Kuku.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 233.85 MB
Created: 12/5/2025
Locale: ukv
Common Voice
Common Voice Spontaneous Speech 2.0 - Rutoro
A collection of spontaneous spoken phrases in Rutoro.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 272.63 MB
Created: 12/5/2025
Locale: ttj
Common Voice
Common Voice Spontaneous Speech 2.0 - Turkish
A collection of spontaneous spoken phrases in Turkish.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 4.20 MB
Created: 12/5/2025
Locale: tr
Common Voice
Common Voice Spontaneous Speech 2.0 - Papantla Totonac
A collection of spontaneous spoken phrases in Papantla Totonac.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 205.51 MB
Created: 12/5/2025
Locale: top
Common Voice
Common Voice Spontaneous Speech 2.0 - Toba Qom
A collection of spontaneous spoken phrases in Toba Qom.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 172.41 MB
Created: 12/5/2025
Locale: tob
Common Voice
Common Voice Spontaneous Speech 2.0 - Thai
A collection of spontaneous spoken phrases in Thai.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 87.66 KB
Created: 12/5/2025
Locale: th
Common Voice
Common Voice Spontaneous Speech 2.0 - snv
A collection of spontaneous spoken phrases in snv.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 212.72 MB
Created: 12/5/2025
Locale: snv
Common Voice
Common Voice Spontaneous Speech 2.0 - Shona
A collection of spontaneous spoken phrases in Shona.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 1.53 MB
Created: 12/5/2025
Locale: sn
Common Voice
Common Voice Spontaneous Speech 2.0 - Tashlhiyt
A collection of spontaneous spoken phrases in Tashlhiyt.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 6.50 MB
Created: 12/5/2025
Locale: shi
Common Voice
Common Voice Spontaneous Speech 2.0 - Sena
A collection of spontaneous spoken phrases in Sena.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 24.57 MB
Created: 12/5/2025
Locale: seh
Common Voice
Common Voice Spontaneous Speech 2.0 - Serian Bidayuh
A collection of spontaneous spoken phrases in Serian Bidayuh.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 199.91 MB
Created: 12/5/2025
Locale: sdo
Common Voice
Common Voice Spontaneous Speech 2.0 - Scots
A collection of spontaneous spoken phrases in Scots.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 227.79 MB
Created: 12/5/2025
Locale: sco
Common Voice
Common Voice Spontaneous Speech 2.0 - Amba
A collection of spontaneous spoken phrases in Amba.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 260.95 MB
Created: 12/5/2025
Locale: rwm
Common Voice
Common Voice Spontaneous Speech 2.0 - Ruuli
A collection of spontaneous spoken phrases in Ruuli.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 360.93 MB
Created: 12/5/2025
Locale: ruc
Common Voice
Common Voice Spontaneous Speech 2.0 - Russian
A collection of spontaneous spoken phrases in Russian.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 49.98 MB
Created: 12/5/2025
Locale: ru
Common Voice
Common Voice Spontaneous Speech 2.0 - Puno Quechua
A collection of spontaneous spoken phrases in Puno Quechua.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 178.68 MB
Created: 12/5/2025
Locale: qxp
Common Voice
Common Voice Spontaneous Speech 2.0 - Western Penan
A collection of spontaneous spoken phrases in Western Penan.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 247.12 MB
Created: 12/5/2025
Locale: pne
Common Voice
Common Voice Spontaneous Speech 2.0 - Sabah Malay
A collection of spontaneous spoken phrases in Sabah Malay.
Task: ASR
Format: MP3
License: CC0-1.0
Size: 275.80 MB
Created: 12/5/2025
Locale: msi
