Datasets
Bamun-French Parallel Corpus 2.0
License: NOODL-1.0
Locale: bax
Task: MT
Format: TSV
Size: 184.29 KB
Common Voice Scripted Speech 25.0 - Kinyarwanda
License: CC0-1.0
Locale: rw
Task: ASR
Format: MP3
Size: 57.18 GB
Common Voice Scripted Speech 25.0 - French
License: CC0-1.0
Locale: fr
Task: ASR
Format: MP3
Size: 28.39 GB
Common Voice Scripted Speech 25.0 - Spanish
License: CC0-1.0
Locale: es
Task: ASR
Format: MP3
Size: 48.23 GB
Araina Text Corpus (Occitan Aranese)
License: CC0-1.0
Locale: oc
Task: LM
Format: TXT
Size: 22.97 MB
Common Voice Scripted Speech 25.0 - Belarusian
License: CC0-1.0
Locale: be
Task: ASR
Format: MP3
Size: 36.21 GB
Corpus de llenguatge ofensiu en català
License: CC-BY-SA-4.0
Locale: ca
Task: NLP
Format: TSV
Size: 57.35 KB
Common Voice Scripted Speech 25.0 - German
License: CC0-1.0
Locale: de
Task: ASR
Format: MP3
Size: 34.69 GB
Common Voice Scripted Speech 25.0 - Esperanto
License: CC0-1.0
Locale: eo
Task: ASR
Format: MP3
Size: 39.00 GB
Oro_Word
License: CC0-1.0
Locale: om
Task: TTS
Format: WAV, CSV
Size: 1.28 MB
INEL Kalmyk Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: xal
Task: ASR
Format: TSV, MP3
Size: 138.31 MB
INEL Nganasan Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: nio
Task: ASR
Format: TSV, MP3
Size: 1.29 GB
INEL Evenki Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: evn
Task: ASR
Format: TSV, MP3
Size: 103.03 MB
INEL Dolgan Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: dlg
Task: ASR
Format: TSV, MP3
Size: 583.34 MB
INEL Kamas Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: xas
Task: ASR
Format: TSV, MP3
Size: 376.64 MB
INEL Selkup Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: sel
Task: ASR
Format: TSV, MP3
Size: 45.46 MB
INEL Enets Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: enf, enh
Task: ASR
Format: TSV, MP3
Size: 140.56 MB
INEL Nenets Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: yrk
Task: ASR
Format: TSV, MP3
Size: 8.35 MB
Common Voice Scripted Speech 25.0 - Bengali
License: CC0-1.0
Locale: bn
Task: ASR
Format: MP3
Size: 24.84 GB
Common Voice Scripted Speech 25.0 - Chinese (China)
License: CC0-1.0
Locale: zh-CN
Task: ASR
Format: MP3
Size: 21.38 GB
English Hausa Parallel Corpus
License: CC-BY-NC-4.0
Locale: eng, hau
Task: MT
Format: CSV
Size: 164.32 KB
Persian Literature Corpus by Najwai Sukhan
License: CC-BY-NC-4.0
Locale: fas
Task: NLP
Format: TXT
Size: 38.62 MB
Heroes English-Spanish Dubbed Movie Speech Corpus
License: CC-BY-SA-4.0
Locale: eng, spa
Task: NLP
Format: WAV, CSV, TXT
Size: 1.68 GB
Common Voice Scripted Speech 25.0 - Swahili
License: CC0-1.0
Locale: sw
Task: ASR
Format: MP3
Size: 20.87 GB