Datasets
Gojri Literature Corpus
License: CC-BY-NC-4.0
Locale: gju
Task: NLP
Format: TXT
Size: 117.97 KB
Common Voice Scripted Speech 24.0 - Votic
License: CC0-1.0
Locale: vot
Task: ASR
Format: MP3
Size: 7.81 MB
Chishti Sons Punjabi Literature Corpus
License: CC-BY-NC-4.0
Locale: pnb
Task: NLP
Format: TXT
Size: 1.65 MB
Gosia 1.0
License: CC0-1.0
Locale: pl-PL
Task: TTS
Format: WEBM
Size: 39.75 MB
Baloch Publishers Saraiki Literature Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: NLP
Format: TXT
Size: 2.04 MB
Common Voice Scripted Speech 24.0 - Tshivenda
License: CC0-1.0
Locale: ve
Task: ASR
Format: MP3
Size: 1.15 MB
Kaleem Magazine Urdu Corpus
License: CC-BY-NC-4.0
Locale: urd
Task: NLP
Format: TXT
Size: 2.74 MB
Common Voice Scripted Speech 24.0 - Kihemba
License: CC0-1.0
Locale: hem
Task: ASR
Format: MP3
Size: 201.53 MB
Kaleem Art Press Urdu Literature Corpus
License: CC-BY-NC-4.0
Locale: ur
Task: OTH
Format: TXT
Size: 2.85 MB
Common Voice Scripted Speech 24.0 - Rukai
License: CC0-1.0
Locale: dru
Task: ASR
Format: MP3
Size: 213.60 MB
Rana Printers Urdu Literature Corpus
License: CC-BY-NC-4.0
Locale: ur
Task: OTH
Format: TXT
Size: 3.00 MB
Common Voice Scripted Speech 24.0 - Moksha
License: CC0-1.0
Locale: mdf
Task: ASR
Format: MP3
Size: 10.54 MB
Kaleem Art Press Saraiki Literature Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: OTH
Format: TXT
Size: 1.84 MB
Common Voice Scripted Speech 24.0 - Tunen
License: CC0-1.0
Locale: tvu
Task: ASR
Format: MP3
Size: 195.38 MB
Anjuman-e-Katib Farsi/Persian Literature Corpus
License: CC-BY-NC-4.0
Locale: fas
Task: NLP
Format: TXT
Size: 2.82 MB
Tugão 1.0
License: CC0-1.0
Locale: pt-PT
Task: TTS
Format: WEBM
Size: 61.84 MB
Mediamen Punjabi Literature Corpus
License: CC-BY-NC-4.0
Locale: pnb
Task: NLP
Format: TXT
Size: 1.82 MB
Common Voice Scripted Speech 24.0 - Dholuo
License: CC0-1.0
Locale: luo
Task: ASR
Format: MP3
Size: 2.23 GB
Multilingual Religious Parallel Corpus (Kaleem Art Press)
License: CC-BY-SA-4.0
Locale: mul
Task: MT
Format: CSV
Size: 2.27 MB
Common Voice Scripted Speech 24.0 - Bateri
License: CC0-1.0
Locale: btv
Task: ASR
Format: MP3
Size: 205.82 MB
Keblagh-e-Azergi Hazargi Literature Corpus
License: CC-BY-NC-4.0
Locale: haz
Task: NLP
Format: TXT
Size: 193.28 KB
Common Voice Scripted Speech 24.0 - Nüpode Huitoto
License: CC0-1.0
Locale: hux
Task: ASR
Format: MP3
Size: 229.65 MB
Aim Foundation Dari Literature Corpus
License: CC-BY-NC-4.0
Locale: prs
Task: NLP
Format: TXT
Size: 1.74 MB
Common Voice Scripted Speech 24.0 - Chinese (China)
License: CC0-1.0
Locale: zh-CN
Task: ASR
Format: MP3
Size: 21.31 GB