Datasets
Sentence translation difficulty in English - BOUQuET
License: CC-BY-SA-4.0
Locale: en
Task: NLP
Format: TSV
Size: 85.61 KB
Bamun-TTS-Dataset
License: NOODL-1.0
Locale: bax
Task: TTS
Format: MP3, TSV
Size: 219.97 MB
Territórios Digitais
License: CC-BY-4.0
Locale: pt, en
Task: N/A
Format: DOCX, PDF, XLSX
Size: 4.24 MB
Chuvash TTS
License: CC-BY-SA-4.0
Locale: cv
Task: TTS
Format: PARQUET
Size: 854.02 MB
RFE/RL Persian News Text Corpus
License: CC-BY-NC-SA-4.0
Locale: fa
Task: NLP
Format: TXT
Size: 307.78 MB
Saraiki 10 Hours TTS Dataset
License: CC-BY-NC-SA-4.0
Locale: srk
Task: TTS
Format: WEBM, TSV
Size: 584.44 MB
Kannada Time Aligned Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: kan
Task: ASR
Format: OGG, SRT
Size: 355.77 MB
Sentence translation difficulty in Spanish - BOUQuET
License: CC-BY-SA-4.0
Locale: es
Task: MT
Format: TSV
Size: 81.48 KB
Yezoum_ALCAM-MultimodalDataset
License: NOODL-1.0
Locale: ewo
Task: NLP
Format: MP3, TSV
Size: 12.81 MB
Common Voice Spontaneous Speech 3.0 - Serian Bidayuh
License: CC0-1.0
Locale: sdo
Task: ASR
Format: MP3
Size: 201.26 MB
Common Voice Scripted Speech 25.0 - Pashto
License: CC0-1.0
Locale: ps
Task: ASR
Format: MP3
Size: 97.81 GB
Common Voice Scripted Speech 25.0 - English
License: CC0-1.0
Locale: en
Task: ASR
Format: MP3
Size: 87.84 GB
Common Voice Scripted Speech 25.0 - Catalan
License: CC0-1.0
Locale: ca
Task: ASR
Format: MP3
Size: 78.67 GB
Bamun-French Parallel Corpus 2.0
License: NOODL-1.0
Locale: bax
Task: MT
Format: TSV
Size: 184.29 KB
Common Voice Scripted Speech 25.0 - Kinyarwanda
License: CC0-1.0
Locale: rw
Task: ASR
Format: MP3
Size: 57.18 GB
Common Voice Scripted Speech 25.0 - French
License: CC0-1.0
Locale: fr
Task: ASR
Format: MP3
Size: 28.39 GB
Common Voice Scripted Speech 25.0 - Spanish
License: CC0-1.0
Locale: es
Task: ASR
Format: MP3
Size: 48.23 GB
Araina Text Corpus (Occitan Aranese)
License: CC0-1.0
Locale: oc
Task: LM
Format: TXT
Size: 22.97 MB
Common Voice Scripted Speech 25.0 - Belarusian
License: CC0-1.0
Locale: be
Task: ASR
Format: MP3
Size: 36.21 GB
Corpus de llenguatge ofensiu en català
License: CC-BY-SA-4.0
Locale: ca
Task: NLP
Format: TSV
Size: 57.35 KB
Common Voice Scripted Speech 25.0 - German
License: CC0-1.0
Locale: de
Task: ASR
Format: MP3
Size: 34.69 GB
Common Voice Scripted Speech 25.0 - Esperanto
License: CC0-1.0
Locale: eo
Task: ASR
Format: MP3
Size: 39.00 GB
Oro_Word
License: CC0-1.0
Locale: om
Task: TTS
Format: WAV, CSV
Size: 1.28 MB
INEL Kalmyk Speech Corpus
License: CC-BY-NC-SA-4.0
Locale: xal
Task: ASR
Format: TSV, MP3
Size: 138.31 MB