Datasets

Search results for “Thai”
Common Voice

Common Voice Scripted Speech 24.0 - Paiwan

A collection of scripted spoken phrases in Paiwan.
License: CC0-1.0

Locale: pwn

Task: ASR

Format: MP3

Size: 280.68 MB

Open Home Foundation

Tugão 1.0

Text-to-speech dataset for Portuguese, male speaker, approximately 1.5 hours of read speech.
License: CC0-1.0

Locale: pt-PT

Task: TTS

Format: WEBM

Size: 61.84 MB

Community

TTS Sasak Language

TTS dataset of everyday Sasak language as used in informal contexts, covering a variety of topics.
License: CC-BY-SA-4.0

Locale: sas

Task: TTS

Format: WEBM, TSV

Size: 293.92 MB

Open Home Foundation

Anna 1.0

Text-to-speech dataset for Hungarian, female speaker, approximately 1.5 hours of read speech.
License: CC0-1.0

Locale: hu-HU

Task: TTS

Format: WEBM

Size: 95.27 MB

Common Voice

Common Voice Scripted Speech 24.0 - Nepali

A collection of scripted spoken phrases in Nepali.
License: CC0-1.0

Locale: ne-NP

Task: ASR

Format: MP3

Size: 38.98 MB

Common Voice

Common Voice Scripted Speech 24.0 - Marathi

A collection of scripted spoken phrases in Marathi.
License: CC0-1.0

Locale: mr

Task: ASR

Format: MP3

Size: 559.35 MB

Community

TTS Balinese Language

This TTS dataset contains Balinese as it is used in daily activities.
License: CC-BY-SA-4.0

Locale: ban

Task: TTS

Format: WEBM, TSV

Size: 301.05 MB

Common Voice

Common Voice Scripted Speech 24.0 - Brahui

A collection of scripted spoken phrases in Brahui.
License: CC0-1.0

Locale: brh

Task: ASR

Format: MP3

Size: 205.87 MB

Common Voice

Common Voice Scripted Speech 24.0 - Hindi

A collection of scripted spoken phrases in Hindi.
License: CC0-1.0

Locale: hi

Task: ASR

Format: MP3

Size: 524.28 MB

Common Voice

Common Voice Scripted Speech 24.0 - Mongolian

A collection of scripted spoken phrases in Mongolian.
License: CC0-1.0

Locale: mn

Task: ASR

Format: MP3

Size: 2.87 GB

Common Voice

Common Voice Scripted Speech 24.0 - Mundang

A collection of scripted spoken phrases in Mundang.
License: CC0-1.0

Locale: mua

Task: ASR

Format: MP3

Size: 207.88 MB

Common Voice

Common Voice Scripted Speech 24.0 - Japanese

A collection of scripted spoken phrases in Japanese.
License: CC0-1.0

Locale: ja

Task: ASR

Format: MP3

Size: 13.95 GB

Common Voice

Common Voice Scripted Speech 24.0 - Malayalam

A collection of scripted spoken phrases in Malayalam.
License: CC0-1.0

Locale: ml

Task: ASR

Format: MP3

Size: 220.23 MB

Common Voice

Common Voice Scripted Speech 24.0 - Malay

A collection of scripted spoken phrases in Malay.
License: CC0-1.0

Locale: ms

Task: ASR

Format: MP3

Size: 72.48 MB

Open Home Foundation

Lili 1.0

Text-to-speech dataset for Slovak, female speaker, approximately 2 hours of read speech.
License: CC0-1.0

Locale: sk-SK

Task: TTS

Format: WEBM

Size: 72.38 MB

Collaborative Action For Research & Development (CARD)

IBT Torwali Wordlist

The IBT Torwali Wordlist contains approximately 20,000 unique entries in Torwali (ISO 639-3: trw), an under-documented Indo-Aryan language spoken in northern Pakistan. The dataset comprises standardized lexical entries covering core vocabulary, function words, and culturally salient terms, with consistent orthography and normalization suitable for linguistic and computational use. Entries are aligned with English and Urdu glosses and include part-of-speech tags.
License: CC-BY-SA-4.0

Locale: trw

Task: NLP

Format: CSV

Size: 312.87 KB

Common Voice

Common Voice Scripted Speech 24.0 - Nuasue

A collection of scripted spoken phrases in Nuasue.
License: CC0-1.0

Locale: yav

Task: ASR

Format: MP3

Size: 266.34 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Heng Hua

A collection of spontaneous spoken phrases in Heng Hua.
License: CC0-1.0

Locale: cpx

Task: ASR

Format: MP3

Size: 217.12 MB

Common Voice

Common Voice Scripted Speech 24.0 - Tunen

A collection of scripted spoken phrases in Tunen.
License: CC0-1.0

Locale: tvu

Task: ASR

Format: MP3

Size: 195.38 MB

Common Voice

Common Voice Scripted Speech 24.0 - Chinese (China)

A collection of scripted spoken phrases in Chinese (China).
License: CC0-1.0

Locale: zh-CN

Task: ASR

Format: MP3

Size: 21.31 GB

Common Voice

Common Voice Scripted Speech 24.0 - Pashto

A collection of scripted spoken phrases in Pashto.
License: CC0-1.0

Locale: ps

Task: ASR

Format: MP3

Size: 49.98 GB

Common Voice

Common Voice Scripted Speech 24.0 - Basaa

A collection of scripted spoken phrases in Basaa.
License: CC0-1.0

Locale: bas

Task: ASR

Format: MP3

Size: 242.77 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Betawi

A collection of spontaneous spoken phrases in Betawi.
License: CC0-1.0

Locale: bew

Task: ASR

Format: MP3

Size: 213.73 MB

Common Voice

Common Voice Scripted Speech 24.0 - Chinese (Hong Kong)

A collection of scripted spoken phrases in Chinese (Hong Kong).
License: CC0-1.0

Locale: zh-HK

Task: ASR

Format: MP3

Size: 3.40 GB