Datasets

Search results for “thorsten-voice”
Community

Thorsten-Voice Dataset 2023.09 Hessisch

German regional dialect speech dataset (Hessisch, 2,108 phrases), CC0 licensed, 22,050 Hz mono WAV, for TTS and speech research.
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 255.96 MB

Community

Thorsten-Voice Dataset 2022.10

German neutral speech dataset (12,450 phrases, 11+ hours), CC0 licensed, LJSpeech-compatible, for TTS research and development.
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 1.30 GB

Community

Thorsten-Voice Dataset 2021.02

German neutral speech dataset (22,668 phrases, 23+ hours), CC0 licensed, LJSpeech-compatible, for TTS research and development.
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 2.55 GB

Community

Thorsten-Voice Dataset 2021.06 Emotional

German emotional speech dataset (2,400 recordings, 8 emotions), CC0 licensed, 22,050 Hz mono WAV, for TTS and speech research.
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 380.80 MB

Community

Thorsten-Voice-44kHz-Full

German speech dataset (44.1 kHz, 38k+ files, ~40 hours), CC0 licensed, multi-style (neutral, emotional, dialect), for TTS research.
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, PARQUET
Size: 7.99 GB

Open Home Foundation

Kerstin 1.0

Text-to-speech dataset for German, female speaker, approximately 2 hours of read speech.
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WEBM
Size: 132.05 MB

Open Home Foundation

Ronnie 1.0

Text-to-speech dataset for Dutch, male speaker, approximately 2 hours of read speech.
License: CC0-1.0
Locale: nl-NL
Task: TTS
Format: WEBM
Size: 106.23 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - German

A collection of spontaneous spoken phrases in German.
License: CC0-1.0
Locale: de
Task: ASR
Format: MP3
Size: 21.96 MB

Common Voice

Common Voice Scripted Speech 24.0 - Tunen

A collection of scripted spoken phrases in Tunen.
License: CC0-1.0
Locale: tvu
Task: ASR
Format: MP3
Size: 195.38 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Thur

A collection of spontaneous spoken phrases in Thur.
License: CC0-1.0
Locale: lth
Task: ASR
Format: MP3
Size: 292.98 MB

Common Voice

Common Voice Scripted Speech 24.0 - German

A collection of scripted spoken phrases in German.
License: CC0-1.0
Locale: de
Task: ASR
Format: MP3
Size: 34.53 GB

Open Home Foundation

Flemishguy 1.0

Text-to-speech dataset for Dutch, male speaker, approximately 1 hour of read speech.
License: CC0-1.0
Locale: nl-BE
Task: TTS
Format: FLAC
Size: 73.69 MB

Open Home Foundation

Nathalie 1.0

Text-to-speech dataset for Dutch, female speaker, approximately 1 hour of read speech.
License: CC0-1.0
Locale: nl-BE
Task: TTS
Format: WEBM
Size: 21.87 MB

TidyVoice2026 Challenge

TidyVoiceX_ASV

This dataset is designed for speaker verification using the Mozilla Common Voice corpus across 40 languages. It includes approximately 5,000 speakers who each have recordings in more than one language. Leveraging this multilingual overlap, we construct the trial pairs to explore cross-lingual variation in the speaker verification task.
License: CC0-1.0
Locale: mul
Task: OTH
Format: WAV
Size: 36.72 GB

Open Home Foundation

Pim 1.0

Text-to-speech dataset for Dutch, male speaker, approximately 2 hours of read speech.
License: CC0-1.0
Locale: nl-NL
Task: TTS
Format: WEBM
Size: 108.08 MB

Open Home Foundation

Jeff 1.0

Text-to-speech dataset for Brazilian Portuguese, male speaker, approximately 1.5 hours of read speech.
License: CC0-1.0
Locale: pt-BR
Task: TTS
Format: WEBM
Size: 90.74 MB

Open Home Foundation

Tugão 1.0

Text-to-speech dataset for Portuguese, male speaker, approximately 1.5 hours of read speech.
License: CC0-1.0
Locale: pt-PT
Task: TTS
Format: WEBM
Size: 61.84 MB

Open Home Foundation

Joe 1.0

Text-to-speech dataset for English, male speaker, approximately 1 hour of read speech.
License: CC0-1.0
Locale: en-US
Task: TTS
Format: WEBM
Size: 75.78 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Alsatian

A collection of spontaneous spoken phrases in Alsatian.
License: CC0-1.0
Locale: gsw
Task: ASR
Format: MP3
Size: 85.53 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Frisian

A collection of spontaneous spoken phrases in Frisian.
License: CC0-1.0
Locale: fy-NL
Task: ASR
Format: MP3
Size: 320.14 KB

Open Home Foundation

Faber 1.0

Text-to-speech dataset for Brazilian Portuguese, male speaker, approximately 1.5 hours of read speech.
License: CC0-1.0
Locale: pt-BR
Task: TTS
Format: WEBM
Size: 30.98 MB

Open Home Foundation

Denis 1.0

Text-to-speech dataset for Russian, male speaker, approximately 2 hours of read speech.
License: CC0-1.0
Locale: ru-RU
Task: TTS
Format: WEBM
Size: 104.52 MB

TidyVoice2026 Challenge

TidyVoiceX2_ASV

This dataset is designed for speaker verification using the Mozilla Common Voice corpus, covering approximately 40 additional languages beyond those included in TidyVoiceX_ASV. It comprises recordings from different speakers, each of whom appears in multiple languages. Leveraging this multilingual overlap, we construct trial pairs to investigate cross-lingual variation in the speaker verification task. This dataset served as the evaluation set for the TidyVoice 2026 Challenge.
License: CC0-1.0
Locale: mul
Task: OTH
Format: WAV
Size: 23.11 GB

Common Voice

Common Voice Spontaneous Speech 2.0 - Bodo

A collection of spontaneous spoken phrases in Bodo.
License: CC0-1.0
Locale: brx
Task: ASR
Format: MP3
Size: 1.29 MB