Datasets
smoltalk-chinese
License: Apache-2.0
Locale: zh
Task: LLM
Format: PARQUET
Size: 879.81 MB
Ewondo_Fong_ALCAM-MultimodalDataset
License: NOODL-1.0
Locale: ewo
Task: NLP
Format: MP3, TSV
Size: 16.80 MB
Common Voice Scripted Speech 24.0 - Loja Highland Kichwa
License: CC0-1.0
Locale: qvj
Task: ASR
Format: MP3
Size: 221.72 MB
Gawri (گاؤری) Magazine Corpus
License: CC-BY-NC-4.0
Locale: gwc
Task: NLP
Format: TXT
Size: 146.71 KB
Common Voice Scripted Speech 24.0 - Losso
License: CC0-1.0
Locale: nmz
Task: ASR
Format: MP3
Size: 205.70 MB
English–Punjabi (Shahmukhi) Parallel Sentences Corpus (Mediamen Archives)
License: CC-BY-NC-4.0
Locale: en-PK, pnb
Task: MT
Format: CSV
Size: 1.08 MB
Common Voice Scripted Speech 24.0 - Xitsonga
License: CC0-1.0
Locale: ts
Task: ASR
Format: MP3
Size: 1016.43 KB
Adamawa Fulfulde-French Parallel Corpus of Narratives 1.2
License: NOODL-1.0
Locale: fub
Task: MT
Format: TSV
Size: 112.17 KB
Common Voice Scripted Speech 24.0 - Cantonese
License: CC0-1.0
Locale: yue
Task: ASR
Format: MP3
Size: 5.98 GB
Compar:IA conversations
License: Etalab 2.0
Locale: fr
Task: NLG
Format: PARQUET
Size: 1.81 GB
Common Voice Scripted Speech 24.0 - Kom
License: CC0-1.0
Locale: bkm
Task: ASR
Format: MP3
Size: 253.86 MB
Jember Javanese Spontaneous Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: jav
Task: ASR
Format: MP3, TSV
Size: 271.65 MB
Mozilla Common Voice Spontaneous Speech ASR Shared Task Train/Dev Data
License: CC0-1.0
Locale: mul
Task: ASR
Format: MP3
Size: 4.30 GB
Western Balochi Literature Corpus
License: CC-BY-NC-4.0
Locale: bgn
Task: NLP
Format: TXT
Size: 2.26 MB
Common Voice Scripted Speech 24.0 - Paiwan
License: CC0-1.0
Locale: pwn
Task: ASR
Format: MP3
Size: 280.68 MB
Common Voice 7.0 - Single Word Target Segment
License: CC0-1.0
Locale: mul
Task: ASR
Format: TSV, MP3
Size: 3.51 GB
Brahui Research Work Corpus
License: CC-BY-NC-SA-4.0
Locale: brh
Task: NLP
Format: TXT
Size: 1.13 MB
Common Voice Scripted Speech 24.0 - Mokpwe
License: CC0-1.0
Locale: bri
Task: ASR
Format: MP3
Size: 188.52 MB
Kohistani Shina Word List
License: CC-BY-NC-4.0
Locale: plk
Task: NLP
Format: TXT
Size: 394.05 KB
Common Voice Scripted Speech 24.0 - Toki Pona
License: CC0-1.0
Locale: tok
Task: ASR
Format: MP3
Size: 464.92 MB
Khowar Word List
License: CC-BY-NC-4.0
Locale: khw
Task: NLP
Format: TXT
Size: 64.22 KB
Thorsten-Voice Dataset 2021.06 Emotional
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 380.80 MB
Khowar Literature Corpus by FLI
License: CC-BY-NC-4.0
Locale: khw
Task: NLP
Format: TXT
Size: 244.85 KB
ReRooted: Speech Corpus of Testimonials from Armenian Refugees and Immigrants
License: GPL-3.0
Locale: hy
Task: ASR
Format: WAV, TEXTGRID
Size: 3.25 GB