Kerstin 1.0

License:

CC0-1.0

Steward:

Open Home Foundation

Task: TTS

Release Date: 11/20/2025

Format: WEBM

Size: 132.05 MB


Description

Text-to-speech dataset for German, female speaker, approximately 2 hours of read speech.

Specifics

Licensing

Creative Commons Zero v1.0 Universal (CC0-1.0)

https://spdx.org/licenses/CC0-1.0.html

Considerations

Forbidden Usage

You agree not to attempt to determine the identity of speakers in the dataset.

Processes

Intended Use

Training and fine-tuning text-to-speech models

Metadata

Deutsch - German (de)

This dataset contains approximately 2.3 hours of scripted speech for German (de) from a single speaker.

Language

German (Deutsch) is a West Germanic language, mainly spoken in Western and Central Europe.

Variants

There are no variants defined for this dataset.

Demographic information

The age and gender of the speaker were not reported. Dataset names may be gendered, but they were assigned solely according to the speaker's preference.

Text corpus

The text corpus comes from Piper Recording Studio, which extends Microsoft's sample TTS scripts for Azure.

Microsoft provides the following recommendations:

To use these example scripts for training, it's recommended that you should do the sanity check to make sure it matches what the voice talent actually speaks in the audio and normalize the text before uploading the data. For example, change '50%' to fifty percent and '$45' to forty-five dollars. Normalization should apply to the scripts that contain digits, symbols, abbreviations, date, and time.
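The recommendation's examples are English. For this German corpus the same idea could look like the minimal sketch below, which expands one- and two-digit percentages and euro amounts into words (for example "50%" to "fünfzig Prozent"). The helpers `number_to_german` and `normalize_german` are hypothetical and are not part of Piper Recording Studio or Microsoft's tooling. A real normalizer would also need larger numbers, dates, times, abbreviations, and grammatical case.

```python
import re

# Hypothetical hand-written German number words; this sketch covers 0-99 only.
_UNITS = ["null", "eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun"]
_TEENS = ["zehn", "elf", "zwölf", "dreizehn", "vierzehn", "fünfzehn",
          "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
_TENS = ["", "", "zwanzig", "dreißig", "vierzig", "fünfzig",
         "sechzig", "siebzig", "achtzig", "neunzig"]


def number_to_german(n: int) -> str:
    """Spell out an integer from 0 to 99 in German."""
    if not 0 <= n <= 99:
        raise ValueError("only 0-99 is supported in this sketch")
    if n < 10:
        return _UNITS[n]
    if n < 20:
        return _TEENS[n - 10]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return _TENS[tens]
    # German says the unit first: 45 -> "fünfundvierzig"; 1 becomes "ein" (einundzwanzig).
    unit_word = "ein" if unit == 1 else _UNITS[unit]
    return f"{unit_word}und{_TENS[tens]}"


def _spell_before_noun(n: int) -> str:
    # Before a noun, 1 is "ein" ("ein Prozent"), not "eins".
    return "ein" if n == 1 else number_to_german(n)


def normalize_german(text: str) -> str:
    """Expand standalone 1-2 digit percentages and euro amounts into words."""
    # \b keeps longer numbers such as "150%" from being partially rewritten.
    text = re.sub(r"\b(\d{1,2})\s?%",
                  lambda m: f"{_spell_before_noun(int(m.group(1)))} Prozent", text)
    text = re.sub(r"\b(\d{1,2})\s?€",
                  lambda m: f"{_spell_before_noun(int(m.group(1)))} Euro", text)
    return text
```

For example, `normalize_german("Der Preis stieg um 50% auf 45 €.")` yields "Der Preis stieg um fünfzig Prozent auf fünfundvierzig Euro." Numbers outside the supported range are left unchanged so that they can be caught in a manual check, rather than being guessed at.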

Statistics for the text corpus:

  • Average/median characters per sentence: 77/77

  • Average/median words per sentence: 10/11
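Statistics like these could be computed roughly as follows. This is a minimal sketch that makes two assumptions. First, characters are counted with `len()`, including spaces and punctuation. Second, words are whitespace-separated tokens. The card does not say how its figures were derived, and `corpus_stats` is a hypothetical helper.

```python
from statistics import mean, median


def corpus_stats(sentences: list[str]) -> dict[str, float]:
    """Average and median characters and words per sentence."""
    char_counts = [len(s) for s in sentences]          # includes spaces and punctuation
    word_counts = [len(s.split()) for s in sentences]  # whitespace tokenization
    return {
        "chars_mean": mean(char_counts),
        "chars_median": median(char_counts),
        "words_mean": mean(word_counts),
        "words_median": median(word_counts),
    }
```

Running it on a handful of sentences, such as the samples below, will not reproduce the corpus-wide figures. The values above describe the full text corpus.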

Writing system

Standard German alphabet.

Symbol table

Standard alphabet:

  • Lowercase: a b c d e f g h i j k l m n o p q r s t u v w x y z ß ä ö ü

  • Uppercase: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Ä Ö Ü
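Before training, it can help to flag letters that fall outside this symbol table. The sketch below assumes that punctuation, digits, and spaces are handled elsewhere and checks only alphabetic characters. The helper name `unexpected_letters` is hypothetical. Text is normalized to Unicode NFC first so that a decomposed "o" plus combining diaeresis is compared as "ö".

```python
import unicodedata

# Symbol table as listed on this card (lowercase and uppercase rows).
LOWERCASE = set("abcdefghijklmnopqrstuvwxyzßäöü")
UPPERCASE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ")
ALPHABET = LOWERCASE | UPPERCASE


def unexpected_letters(sentence: str) -> set[str]:
    """Return alphabetic characters in `sentence` that are not in the symbol table."""
    composed = unicodedata.normalize("NFC", sentence)  # merge combining marks first
    return {ch for ch in composed if ch.isalpha() and ch not in ALPHABET}
```

Note that the uppercase row does not include the capital sharp s (ẞ), so this check would flag it.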

Sample

5 randomly selected sentences:

Unmittelbar vor seiner geplanten Deportation beging er mit seiner Frau Charlotte Suizid.
Hofmann machte nach dem Besuch der Volksschule sein Abitur am Moerser Gymnasium Adolfinum.
Der Laser erhitzt dabei die Luft um die Partikel, die in einer Glasröhre stecken.
Ihren Namen hat die Fichtelnaab nur indirekt vom Fichtelgebirge.
Neben Ungarisch und Kroatisch sprach er auch Deutsch, Spanisch, Englisch, Französisch und Esperanto.

Processing and validation

Audio was recorded online using Piper Recording Studio. No post-processing or validation was applied to the text or audio.

Trained models

A pre-trained Piper voice model is available for download.

Contribute

If you would like to contribute your voice and have us train a Piper text-to-speech model, please contact us at voice@openhomefoundation.org

Acknowledgements

We would like to thank all contributors, as well as supporters of the Open Home Foundation.

License

This dataset is released under the Creative Commons Zero v1.0 Universal (CC0-1.0) license. By downloading this data, you agree not to attempt to determine the identity of speakers in the dataset.