Datasets
Kokoro Speech Dataset
License: LibriVox
Locale: ja
Task: TTS
Format: FLAC
Size: 3.98 GB
Sundanese TTS
License: CC-BY-SA-4.0
Locale: sun
Task: TTS
Format: WEBM, TSV
Size: 298.10 MB
Bangor Miami Spanish-English Corpus
License: GPL-3.0
Locale: es-US, en-US
Task: ASR
Format: MP3, CHA, TSV
Size: 1.12 GB
Elkhani Hazargi Literature Corpus
License: CC-BY-NC-4.0
Locale: haz
Task: NLP
Format: TXT
Size: 2.46 MB
Dari Literature Corpus by Anjuman e Adabi Nayestan
License: CC-BY-NC-4.0
Locale: prs
Task: NLP
Format: TXT
Size: 12.67 MB
IBT Torwali Wordlist
License: CC-BY-SA-4.0
Locale: trw
Task: NLP
Format: CSV
Size: 312.87 KB
Bangor Siarad Welsh-English Corpus
License: GPL-3.0
Locale: cym
Task: ASR
Format: MP3, CHA, TSV
Size: 2.13 GB
Bangor Patagonia Welsh-Spanish Corpus
License: GPL-3.0
Locale: cym, spa
Task: ASR
Format: MP3, CHA, TSV
Size: 988.02 MB
Saraiki-English Parallel Corpus
License: CC-BY-NC-4.0
Locale: mul
Task: MT
Format: CSV
Size: 1.92 MB
Jhoke Publisher Multan’s Saraiki Newspaper Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: NLP
Format: TXT
Size: 2.30 MB
Mada-French Parallel Corpus 1.0
License: NOODL-1.0
Locale: mxu
Task: TTS
Format: TSV
Size: 122.37 KB
Javanese TTS of Banyumasan Dialect
License: CC-BY-SA-4.0
Locale: jav
Task: TTS
Format: WEBM, TSV
Size: 559.08 MB
Finnish Public Domain 20th Century Literature Text Corpus
License: CC0-1.0
Locale: fi, sv
Task: NLP
Format: TXT
Size: 205.76 MB
Thorsten-Voice-44kHz-Full
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, PARQUET
Size: 7.99 GB
Thorsten-Voice Dataset 2023.09 Hessisch
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 255.96 MB
Thorsten-Voice Dataset 2022.10
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 1.30 GB
Thorsten-Voice Dataset 2021.06 Emotional
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 380.80 MB
Daily Expressions in Highland Puebla Nahuatl
License: CC-BY-SA-4.0
Locale: azz
Task: NLP
Format: TSV
Size: 22.00 KB
Cuentos en Mam leídos en voz alta
License: CC-BY-SA-4.0
Locale: mam
Task: ASR
Format: MP3, TSV
Size: 110.28 MB
Cuentos en Kʼicheʼ leídos en voz alta
License: CC-BY-SA-4.0
Locale: quc
Task: ASR
Format: MP3, TSV
Size: 152.62 MB
CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes
License: CC-BY-NC-SA-4.0
Locale: cy
Task: NLP
Format: TXT, TSV
Size: 147.89 MB
Finance Sentences - North American Spanish
License: CC0-1.0
Locale: es-US
Task: NLP
Format: TSV, JSON
Size: 18.35 MB
Thorsten-Voice Dataset 2021.02
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 2.55 GB
Persian VOA Corpus 2003-2008
License: Unlicense
Locale: fa
Task: NLP
Format: TXT
Size: 17.16 MB