Datasets
Ewondo_Fong_ALCAM-MultimodalDataset
License: NOODL-1.0
Locale: ewo
Task: NLP
Format: MP3, TSV
Size: 16.80 MB
Luhya ASR data subset 70 hours
License: CC-BY-4.0
Locale: luy
Task: ASR
Format: WAV, XLSX
Size: 13.90 GB
English–Punjabi (Shahmukhi) Parallel Sentences Corpus (Mediamen Archives)
License: CC-BY-NC-4.0
Locale: en-PK, pnb
Task: MT
Format: CSV
Size: 1.08 MB
Compar:IA conversations
License: Etalab 2.0
Locale: fr
Task: NLG
Format: PARQUET
Size: 1.81 GB
Adamawa Fulfulde - French Parallel Corpus of Narratives 1.0
License: NOODL-1.0
Locale: fub
Task: MT
Format: TSV
Size: 112.50 KB
smoltalk-chinese
License: Apache-2.0
Locale: zh
Task: LLM
Format: PARQUET
Size: 879.81 MB
Mediamen Punjabi Literature Corpus
License: CC-BY-NC-4.0
Locale: pnb
Task: NLP
Format: TXT
Size: 1.82 MB
Chishti Sons Punjabi Literature Corpus
License: CC-BY-NC-4.0
Locale: pnb
Task: NLP
Format: TXT
Size: 1.65 MB
Baloch Publishers Saraiki Literature Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: NLP
Format: TXT
Size: 2.04 MB
Kaleem Magazine Urdu Corpus
License: CC-BY-NC-4.0
Locale: urd
Task: NLP
Format: TXT
Size: 2.74 MB
Kaleem Art Press Urdu Literature Corpus
License: CC-BY-NC-4.0
Locale: ur
Task: OTH
Format: TXT
Size: 2.85 MB
Rana Printers Urdu Literature Corpus
License: CC-BY-NC-4.0
Locale: ur
Task: OTH
Format: TXT
Size: 3.00 MB
Multilingual Religious Parallel Corpus (Kaleem Art Press)
License: CC-BY-SA-4.0
Locale: mul
Task: MT
Format: CSV
Size: 2.27 MB
Keblagh-e-Azergi Hazargi literature corpus
License: CC-BY-NC-4.0
Locale: haz
Task: NLP
Format: TXT
Size: 193.28 KB
Kaleem Art Press Saraiki Literature Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: OTH
Format: TXT
Size: 1.84 MB
Anjuman-e-Katib Farsi/Persian Literature Corpus
License: CC-BY-NC-4.0
Locale: fas
Task: NLP
Format: TXT
Size: 2.82 MB
Aim Foundation Dari Literature Corpus
License: CC-BY-NC-4.0
Locale: prs
Task: NLP
Format: TXT
Size: 1.74 MB