Thorsten-Voice Dataset 2021.02

License: CC0-1.0

Steward: Community

Task: TTS

Release Date: 2/25/2026

Format: WAV, CSV

Size: 2.55 GB


Description

Thorsten-Voice Dataset 2021.02 is a high-quality dataset of neutral German speech, recorded by Thorsten Müller and audio-optimized by Dominik Kreutz. It contains 22,668 recorded phrases totaling more than 23 hours of clean speech audio. The dataset has been publicly available for several years and is released under CC0 to enable unrestricted research and commercial use.

Specifics

Licensing

Creative Commons Zero v1.0 Universal (CC0-1.0)

https://spdx.org/licenses/CC0-1.0.html

Considerations

Restrictions/Special Constraints

None. Released under CC0 (public domain dedication).

Forbidden Usage

None from the licensor’s side. Users are responsible for complying with applicable laws and ethical standards.

Processes

Ethical Review

The dataset consists exclusively of voluntary recordings of the contributor’s own voice. No third-party voices or personal data are included. All recordings were created with the explicit intention of unrestricted public release under CC0. No formal institutional ethical review was required, as the dataset contains only self-recorded material. The dataset was released in the spirit of openness, equality, and free access to knowledge. The contributor encourages responsible and socially beneficial use.

Intended Use

Intended for text-to-speech (TTS) model training, speech synthesis research, benchmarking, and commercial speech technology development.

Metadata

Technical Details

  • 22,668 recorded phrases

  • More than 23 hours of audio

  • WAV files (mono)

  • 22,050 Hz sample rate

  • Normalized to -24 dB

  • No leading or trailing silence

  • LJSpeech-compatible file and directory structure

  • Phrase length (min/avg/max): 2 / 52 / 180 characters

  • Average spoken characters per second: 14

  • Sentences with question mark: 2,780

  • Sentences with exclamation mark: 1,840
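
The figures above can be cross-checked with a few lines of Python. The sketch below is illustrative only and is not part of the dataset. It assumes the usual LJSpeech layout: a pipe-separated metadata.csv whose first two fields are the file ID and the transcript, and a wavs/ directory holding one <id>.wav per row. The function names (estimate_hours, read_metadata, wav_matches_spec, find_mismatches) are hypothetical helpers introduced here for illustration.

```python
import csv
import wave
from pathlib import Path

EXPECTED_SAMPLE_RATE = 22_050  # Hz, as listed in the technical details
EXPECTED_CHANNELS = 1          # mono


def estimate_hours(phrases: int, avg_chars: float, chars_per_second: float) -> float:
    """Rough total duration implied by the published averages."""
    return phrases * avg_chars / chars_per_second / 3600


def read_metadata(path: Path) -> list[tuple[str, str]]:
    """Read LJSpeech-style rows (assumed layout: file ID | transcript | ...)."""
    with path.open(encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        # Rows with fewer than two fields are skipped rather than guessed at.
        return [(row[0], row[1]) for row in reader if len(row) >= 2]


def wav_matches_spec(path: Path) -> bool:
    """Return True if the file exists and is mono at 22,050 Hz."""
    if not path.is_file():
        return False
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getframerate() == EXPECTED_SAMPLE_RATE
        )


def find_mismatches(root: Path) -> list[str]:
    """List file IDs whose WAV is missing or deviates from the stated format."""
    rows = read_metadata(root / "metadata.csv")
    return [
        file_id
        for file_id, _ in rows
        if not wav_matches_spec(root / "wavs" / f"{file_id}.wav")
    ]


if __name__ == "__main__":
    # 22,668 phrases x 52 characters at 14 characters per second.
    print(f"Implied duration: {estimate_hours(22_668, 52, 14):.1f} hours")
```

The arithmetic gives roughly 23.4 hours, which agrees with the stated "more than 23 hours". Calling find_mismatches on the extracted dataset directory would return an empty list if every file matches the stated format, under the layout assumptions named above.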

Licensing

Released under CC0 (public domain dedication).
No restrictions apply.

Distribution

Originally published via Zenodo (DOI: https://doi.org/10.5281/zenodo.5525342).
This MDC entry serves as an additional distribution channel.

Contributor Statement (Thorsten Müller)

“For me, all people are equal, regardless of gender, sexual orientation, religion, skin color, or geographic coordinates of birth. I believe in a global world where everyone is welcome everywhere and where free knowledge and education are accessible to all. I donated my voice to the public domain in the hope that it will be used in this spirit.”