Datasets
Finnish Public Domain 20th Century Literature Text Corpus
License: CC0-1.0
Locale: fi, sv
Task: NLP
Format: TXT
Size: 205.76 MB
Saraiki-English Parallel Corpus
License: CC-BY-NC-4.0
Locale: mul
Task: MT
Format: CSV
Size: 1.92 MB
English–Punjabi (Shahmukhi) Parallel Sentences Corpus (Mediamen Archives)
License: CC-BY-NC-4.0
Locale: en-PK, pnb
Task: MT
Format: CSV
Size: 1.08 MB
ddd-kenya-somali-68hrs-asr-part1
License: CC-BY-4.0
Locale: som
Task: ASR
Format: WAV, TSV
Size: 7.68 GB
Common Voice Spontaneous Speech 3.0 - Shona
License: CC0-1.0
Locale: sn
Task: ASR
Format: MP3
Size: 1.53 MB
Lili 1.0
License: CC0-1.0
Locale: sk-SK
Task: TTS
Format: WEBM
Size: 72.38 MB
Kaleem Art Press Saraiki Literature Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: OTH
Format: TXT
Size: 1.84 MB
Common Voice Spontaneous Speech 3.0 - Esperanto
License: CC0-1.0
Locale: eo
Task: ASR
Format: MP3
Size: 12.51 MB
ddd-kenya-somali-68hrs-asr-part3
License: CC-BY-4.0
Locale: som
Task: ASR
Format: WAV, TSV
Size: 1.33 GB
ddd-kenya-somali-68hrs-asr-part2
License: CC-BY-4.0
Locale: som
Task: ASR
Format: WAV, TSV
Size: 8.07 GB
Baloch Publishers Saraiki Literature Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: NLP
Format: TXT
Size: 2.04 MB
Common Voice Spontaneous Speech 3.0 - Ruuli
License: CC0-1.0
Locale: ruc
Task: ASR
Format: MP3
Size: 365.95 MB
Common Voice Spontaneous Speech 3.0 - Sinhala
License: CC0-1.0
Locale: si
Task: ASR
Format: MP3
Size: 2.52 MB
Kohistani Shina Word List
License: CC-BY-NC-4.0
Locale: plk
Task: NLP
Format: TXT
Size: 394.05 KB
Finance Sentences - North American Spanish
License: CC0-1.0
Locale: es-US
Task: NLP
Format: TSV, JSON
Size: 18.35 MB
Common Voice Spontaneous Speech 3.0 - Frisian
License: CC0-1.0
Locale: fy-NL
Task: ASR
Format: MP3
Size: 323.25 KB
Dmitri 1.0
License: CC0-1.0
Locale: ru-RU
Task: TTS
Format: WEBM
Size: 96.63 MB
Saraiki Literature Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: OTH
Format: TXT
Size: 1.84 MB
Anna 1.0
License: CC0-1.0
Locale: hu-HU
Task: TTS
Format: WEBM
Size: 95.27 MB
Multilingual Religious Parallel Corpus (Kaleem Art Press)
License: CC-BY-SA-4.0
Locale: mul
Task: MT
Format: CSV
Size: 2.27 MB
Common Voice Spontaneous Speech 3.0 - Kenyah
License: CC0-1.0
Locale: xkl
Task: ASR
Format: MP3
Size: 212.73 MB
Bamun-French Parallel Corpus 1.1
License: NOODL-1.0
Locale: bax
Task: MT
Format: TSV
Size: 99.78 KB
Kerstin 1.0
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WEBM
Size: 132.05 MB
Mihai 1.0
License: CC0-1.0
Locale: ro-RO
Task: TTS
Format: WEBM
Size: 66.31 MB