Datasets
Lingala-TTS-Dataset
License: NOODL-1.0
Locale: lin
Task: TTS
Format: WAV, TSV
Size: 962.04 MB
Polish Public Domain 20th Century Literature Text Corpus
License: CC0-1.0
Locale: pl
Task: NLP
Format: TXT
Size: 10.86 MB
Dolgan Folklore Text Corpus
License: CC0-1.0
Locale: dlg
Task: NLP
Format: TXT
Size: 57.16 KB
GeoLogicQA: An LLM Benchmark for Logical Reasoning in Georgian
License: CC-BY-NC-SA-4.0
Locale: ka
Task: LLM
Format: JSON
Size: 15.14 KB
Bojonegoro Javanese TTS
License: CC-BY-SA-4.0
Locale: jav
Task: TTS
Format: .tar.gz, WEBM
Size: 469.50 MB
ATLAS Cross-Lingual Transfer Matrix
License: Apache-2.0
Locale: en-US
Task: NLP
Format: CSV
Size: 2.36 KB
Zacatlán Tepetzintla Nahuatl ASR Dataset
License: CC-BY-ND-4.0
Locale: nhi
Task: ASR
Format: FLAC, TSV
Size: 789.98 MB
Kyrgyz Folklore Text Corpus
License: CC0-1.0
Locale: ky
Task: NLP
Format: TXT
Size: 1.28 MB
Finweb-Edu-Chinese-v2.2
License: Apache-2.0
Locale: zh
Task: LLM
Format: parquet
Size: 624.68 MB
Manggarai Language for NLP
License: CC-BY-NC-SA-4.0
Locale: mqy
Task: TTS
Format: WEBM, TSV
Size: 287.61 MB
World Factbook (JSON)
License: CC0-1.0
Locale: en
Task: NLP
Format: JSON
Size: 7.10 MB
Eastern Balochi Literature Corpus
License: CC-BY-NC-4.0
Locale: bgp
Task: NLP
Format: TXT
Size: 949.67 KB
ABC-Draco
License: Onshape
Locale: en-US
Task: CV
Format: GLTF with Draco compression
Size: 43.32 GB
Trabajo de Campo - Huave
License: CC-BY-4.0
Locale: huv
Task: ASR
Format: MP3, TSV
Size: 538.25 MB
Gojri Literature Corpus
License: CC-BY-NC-4.0
Locale: gju
Task: NLP
Format: TXT
Size: 117.97 KB
Khowar Literature Corpus by FLI
License: CC-BY-NC-4.0
Locale: khw
Task: NLP
Format: TXT
Size: 244.85 KB
Khowar Word List
License: CC-BY-NC-4.0
Locale: khw
Task: NLP
Format: TXT
Size: 64.22 KB
Kohistani Shina Word List
License: CC-BY-NC-4.0
Locale: plk
Task: NLP
Format: TXT
Size: 394.05 KB
Brahui Research Work Corpus
License: CC-BY-NC-SA-4.0
Locale: brh
Task: NLP
Format: TXT
Size: 1.13 MB
Talar (تلار) Barahui Magazine Corpus
License: CC-BY-NC-SA-4.0
Locale: brh
Task: NLP
Format: TXT
Size: 317.22 KB
Western Balochi Literature Corpus
License: CC-BY-NC-4.0
Locale: bgn
Task: NLP
Format: TXT
Size: 2.26 MB
NAWA-E-WATAN Balochi Newspaper Corpus
License: CC-BY-NC-4.0
Locale: bgn
Task: NLP
Format: TXT
Size: 1.43 MB
Gawri (گاؤری) Magazine Corpus
License: CC-BY-NC-4.0
Locale: gwc
Task: NLP
Format: TXT
Size: 146.71 KB
TTS Javanese - Ngapak Dialect
License: CC-BY-SA-4.0
Locale: jav
Task: TTS
Format: WEBM, TSV
Size: 567.12 MB