Datasets
Common Voice Scripted Speech 24.0 - Thai
License: CC0-1.0
Locale: th
Task: ASR
Format: MP3
Size: 8.35 GB
Common Voice Spontaneous Speech 2.0 - Thai
License: CC0-1.0
Locale: th
Task: ASR
Format: MP3
Size: 87.66 KB
Common Voice Scripted Speech 24.0 - Lao
License: CC0-1.0
Locale: lo
Task: ASR
Format: MP3
Size: 8.91 MB
Common Voice Scripted Speech 24.0 - Tamil
License: CC0-1.0
Locale: ta
Task: ASR
Format: MP3
Size: 8.56 GB
Common Voice Scripted Speech 24.0 - Chinese (Taiwan)
License: CC0-1.0
Locale: zh-TW
Task: ASR
Format: MP3
Size: 2.93 GB
Chitwan 1.0
License: CC0-1.0
Locale: ne-NE
Task: TTS
Format: WEBM
Size: 61.68 MB
Common Voice Scripted Speech 24.0 - Taiwanese (Minnan)
License: CC0-1.0
Locale: nan-tw
Task: ASR
Format: MP3
Size: 462.92 MB
Khmer ASR Cultural Dataset (V2)
License: CC-BY-SA-4.0
Locale: khm
Task: ASR
Format: WAV
Size: 35.86 GB
Common Voice Scripted Speech 24.0 - Vietnamese
License: CC0-1.0
Locale: vi
Task: ASR
Format: MP3
Size: 427.03 MB
Common Voice Scripted Speech 24.0 - Atayal
License: CC0-1.0
Locale: tay
Task: ASR
Format: MP3
Size: 248.14 MB
Common Voice Scripted Speech 24.0 - Hakha Chin
License: CC0-1.0
Locale: cnh
Task: ASR
Format: MP3
Size: 160.39 MB
English–Punjabi (Shahmukhi) Parallel Sentences Corpus (Mediamen Archives)
License: CC-BY-NC-4.0
Locale: en-PK, pnb
Task: MT
Format: CSV
Size: 1.08 MB
Common Voice Scripted Speech 24.0 - Punjabi
License: CC0-1.0
Locale: pa-IN
Task: ASR
Format: MP3
Size: 110.84 MB
Common Voice Scripted Speech 24.0 - Tatar
License: CC0-1.0
Locale: tt
Task: ASR
Format: MP3
Size: 825.25 MB
Common Voice Scripted Speech 24.0 - Bengali
License: CC0-1.0
Locale: bn
Task: ASR
Format: MP3
Size: 24.75 GB
Common Voice Spontaneous Speech 2.0 - Tashlhiyt
License: CC0-1.0
Locale: shi
Task: ASR
Format: MP3
Size: 6.50 MB
Common Voice Scripted Speech 24.0 - Telugu
License: CC0-1.0
Locale: te
Task: ASR
Format: MP3
Size: 58.46 MB
Khmer ASR Cultural Dataset
License: CC-BY-SA-4.0
Locale: khm
Task: ASR
Format: WAV
Size: 12.59 GB
Common Voice Scripted Speech 24.0 - Tajik
License: CC0-1.0
Locale: tg
Task: ASR
Format: MP3
Size: 17.34 MB
Common Voice Scripted Speech 24.0 - Seri
License: CC0-1.0
Locale: sei
Task: ASR
Format: MP3
Size: 208.50 MB
Common Voice Scripted Speech 24.0 - Cantonese
License: CC0-1.0
Locale: yue
Task: ASR
Format: MP3
Size: 5.98 GB
Common Voice Scripted Speech 24.0 - Assamese
License: CC0-1.0
Locale: as
Task: ASR
Format: MP3
Size: 160.08 MB
Common Voice Scripted Speech 24.0 - Dhatki
License: CC0-1.0
Locale: mki
Task: ASR
Format: MP3
Size: 187.48 MB
Common Voice Scripted Speech 24.0 - Tupuri
License: CC0-1.0
Locale: tui
Task: ASR
Format: MP3
Size: 236.84 MB