Datasets

Search results for “common voice”
Common Voice

Common Voice Scripted Speech 24.0 - Kazakh

A collection of scripted spoken phrases in Kazakh.

License: CC0-1.0

Locale: kk

Task: ASR

Format: MP3

Size: 74.09 MB

Common Voice

Common Voice Scripted Speech 24.0 - Kalenjin

A collection of scripted spoken phrases in Kalenjin.

License: CC0-1.0

Locale: kln

Task: ASR

Format: MP3

Size: 1.68 GB

Common Voice

Common Voice Scripted Speech 24.0 - Korean

A collection of scripted spoken phrases in Korean.

License: CC0-1.0

Locale: ko

Task: ASR

Format: MP3

Size: 206.21 MB

Common Voice

Common Voice Scripted Speech 24.0 - Bafia

A collection of scripted spoken phrases in Bafia.

License: CC0-1.0

Locale: ksf

Task: ASR

Format: MP3

Size: 388.95 MB

Common Voice

Common Voice Scripted Speech 24.0 - Khowar

A collection of scripted spoken phrases in Khowar.

License: CC0-1.0

Locale: khw

Task: ASR

Format: MP3

Size: 409.89 MB

Common Voice

Common Voice Scripted Speech 24.0 - Kalasha

A collection of scripted spoken phrases in Kalasha.

License: CC0-1.0

Locale: kls

Task: ASR

Format: MP3

Size: 205.53 MB

Common Voice

Common Voice Scripted Speech 24.0 - Parkari Koli

A collection of scripted spoken phrases in Parkari Koli.

License: CC0-1.0

Locale: kvx

Task: ASR

Format: MP3

Size: 219.47 MB

Common Voice

Common Voice Scripted Speech 24.0 - Cornish

A collection of scripted spoken phrases in Cornish.

License: CC0-1.0

Locale: kw

Task: ASR

Format: MP3

Size: 260.69 MB

Common Voice

Common Voice Scripted Speech 24.0 - Luganda

A collection of scripted spoken phrases in Luganda.

License: CC0-1.0

Locale: lg

Task: ASR

Format: MP3

Size: 11.03 GB

Common Voice

Common Voice Scripted Speech 24.0 - Ligurian

A collection of scripted spoken phrases in Ligurian.

License: CC0-1.0

Locale: lij

Task: ASR

Format: MP3

Size: 109.81 MB

Common Voice

Common Voice Scripted Speech 24.0 - Tshiluba

A collection of scripted spoken phrases in Tshiluba.

License: CC0-1.0

Locale: lua

Task: ASR

Format: MP3

Size: 259.72 MB

Common Voice

Common Voice Scripted Speech 24.0 - Laz

A collection of scripted spoken phrases in Laz.

License: CC0-1.0

Locale: lzz

Task: ASR

Format: MP3

Size: 137.23 MB

Common Voice

Common Voice Scripted Speech 24.0 - Huautla Mazatec

A collection of scripted spoken phrases in Huautla Mazatec.

License: CC0-1.0

Locale: mau

Task: ASR

Format: MP3

Size: 197.85 MB

Common Voice

Common Voice Scripted Speech 24.0 - Lassi

A collection of scripted spoken phrases in Lassi.

License: CC0-1.0

Locale: lss

Task: ASR

Format: MP3

Size: 175.10 MB

Common Voice

Common Voice Scripted Speech 24.0 - Lithuanian

A collection of scripted spoken phrases in Lithuanian.

License: CC0-1.0

Locale: lt

Task: ASR

Format: MP3

Size: 741.81 MB

Common Voice

Common Voice Scripted Speech 24.0 - Dholuo

A collection of scripted spoken phrases in Dholuo.

License: CC0-1.0

Locale: luo

Task: ASR

Format: MP3

Size: 2.23 GB

Common Voice

Common Voice Scripted Speech 24.0 - Dutch

A collection of scripted spoken phrases in Dutch.

License: CC0-1.0

Locale: nl

Task: ASR

Format: MP3

Size: 3.14 GB

Common Voice

Common Voice Scripted Speech 24.0 - Pashto

A collection of scripted spoken phrases in Pashto.

License: CC0-1.0

Locale: ps

Task: ASR

Format: MP3

Size: 49.98 GB

Kaltepetlahtol

Zacatlán Tepetzintla Nahuatl ASR Dataset

A 14-hour ASR dataset of Nahuatl from Zacatlán and Tepetzintla, derived from the field recordings and transcription datasets of Amith et al. (2026).

License: CC-BY-ND-4.0

Locale: nhi

Task: ASR

Format: FLAC, TSV

Size: 789.98 MB

Common Voice

Common Voice Scripted Speech 24.0 - Russian

A collection of scripted spoken phrases in Russian.

License: CC0-1.0

Locale: ru

Task: ASR

Format: MP3

Size: 6.53 GB

TidyVoice2026 Challenge

TidyVoiceX2_ASV

This dataset is designed for speaker verification and is built from the Mozilla Common Voice corpus, covering approximately 40 languages beyond those included in TidyVoiceX_ASV. It comprises recordings from speakers who each appear in multiple languages. Leveraging this multilingual overlap, we construct trial pairs to investigate how cross-lingual variation affects speaker verification. This dataset served as the evaluation set for the TidyVoice 2026 Challenge.

License: CC0-1.0

Locale: mul

Task: OTH

Format: WAV

Size: 23.11 GB
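
The cross-lingual trial-pair idea described for TidyVoiceX2_ASV can be illustrated with a minimal Python sketch. It assumes every utterance carries a speaker label and a language code; the `Utterance` type and the `cross_lingual_trials` function are hypothetical names, and the challenge's actual trial list, file layout, and sampling rules are not given on this page, so this is not the official protocol. Exhaustive pairing is used only for clarity; the number of pairs grows quadratically, so a real trial list would normally be subsampled.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Utterance(NamedTuple):
    utt_id: str    # hypothetical utterance identifier
    speaker: str   # speaker label (assumed known for every utterance)
    language: str  # locale code of the recording, e.g. "en"


def cross_lingual_trials(utts: List[Utterance]) -> List[Tuple[str, str, int]]:
    """Pair every two utterances recorded in different languages.

    Returns (enrollment_id, test_id, label) tuples, where label is 1 for a
    target trial (same speaker) and 0 for a non-target trial (different speakers).
    """
    trials = []
    for a, b in combinations(utts, 2):
        if a.language == b.language:
            continue  # keep only cross-lingual pairs
        trials.append((a.utt_id, b.utt_id, int(a.speaker == b.speaker)))
    return trials


# Two hypothetical speakers, each recorded in two languages.
utts = [
    Utterance("u1", "s1", "en"),
    Utterance("u2", "s1", "de"),
    Utterance("u3", "s2", "en"),
    Utterance("u4", "s2", "fr"),
]
for trial in cross_lingual_trials(utts):
    print(trial)
```

Skipping same-language pairs isolates the cross-lingual condition: with the four example utterances this yields two target trials (u1/u2 and u3/u4) and three non-target trials.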

Common Voice

Common Voice Spontaneous Speech 2.0 - Galician

A collection of spontaneous spoken phrases in Galician.

License: CC0-1.0

Locale: gl

Task: ASR

Format: MP3

Size: 23.40 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Gorani

A collection of spontaneous spoken phrases in Gorani.

License: CC0-1.0

Locale: hac

Task: ASR

Format: MP3

Size: 224.46 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Wixárika

A collection of spontaneous spoken phrases in Wixárika.

License: CC0-1.0

Locale: hch

Task: ASR

Format: MP3

Size: 198.80 MB