Effect AI Scripted Speech 1.0 - English
License:
CC0-1.0
Steward:
Effect AI
Task: TTS
Release Date: 1/15/2026
Format: CSV, MP3
Size: 663.45 MB
Description
An open dataset of scripted English sentences recorded by speakers on the Effect AI platform. It is intended for the development, training, and evaluation of speech recognition and language technologies.
Specifics
Considerations
Restrictions/Special Constraints
No Restrictions Provided
Forbidden Usage
It is forbidden to attempt to determine the identity of speakers in this dataset.
Processes
Ethical Review
All recordings in this dataset were contributed voluntarily by workers on the Effect AI platform. Contributors provided informed consent for their voice data to be used for research, model training, and public dataset release. No personally identifiable information is included in the dataset, and all contributor identifiers are anonymized. The data collection process complies with standard privacy and data protection practices. Effect AI conducted an internal review to ensure that the dataset meets ethical standards for human-subject data use, including informed consent and anonymization of contributor information. While the dataset is licensed under CC‑BY‑4.0 and may be freely used, adapted, and shared with attribution, users are strongly encouraged to respect ethical considerations, including not attempting to identify contributors or misuse the recordings.
Intended Use
This dataset is designed for training and evaluating text-to-speech models, speech recognition systems, and audio alignment methods. The short, clearly spoken sentences make it suitable for tasks such as phoneme-to-audio mapping, prosody modeling, speech synthesis benchmarking, and general research in spoken language processing. It can also support work on language learning tools, pronunciation analysis, and other applications that rely on high-quality paired text and speech data.
Metadata
Intended Use
This dataset is intended for training and evaluating automatic speech recognition models, sentence alignment tasks, and research involving short-form spoken English. It may also support work in computer-aided language learning, speech analysis, and related fields.
Metadata
This dataset contains 10.2 hours of recorded speech covering more than 11,000 general, everyday English sentences. Both the sentences and the recordings were created by contributors on the Effect AI platform. Each entry consists of a sentence, the sentence length (in words), a contributor identifier, an IPFS-hosted audio clip, and the duration of the recording.
The recordings feature short, clear utterances suitable for model training, benchmarking, and audio quality evaluation. Contributors represent a variety of English-speaking backgrounds.
Sample sentences:
The dog barks at cars passing down the road.
She cleaned the windows until they sparkled.
I was thinking we could try a new place to eat.
The sun is up so early today.
They usually exercise together on weekends.
Fields
Each row includes the following information:
sentence — text spoken by the contributor
sentence_length — number of words in sentence
author_id — contributor identifier
audio_file — audio file name
duration_sec — clip duration in seconds
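The field list above can be read with a short script. The following is a minimal sketch, not an official loader: it assumes the CSV is comma-delimited with a header row using exactly the five field names listed, and that sentence_length is a whitespace-separated word count. The Clip class, the helper functions, and the author_id, audio_file, and duration_sec values in the sample rows are hypothetical and chosen only for illustration; the two sentences are taken from the samples above.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Clip:
    """One row of the metadata CSV (field names as listed in the Fields section)."""
    sentence: str
    sentence_length: int
    author_id: str
    audio_file: str
    duration_sec: float


def read_clips(handle):
    """Parse rows from an open CSV handle into Clip records."""
    reader = csv.DictReader(handle)
    return [
        Clip(
            sentence=row["sentence"],
            sentence_length=int(row["sentence_length"]),
            author_id=row["author_id"],
            audio_file=row["audio_file"],
            duration_sec=float(row["duration_sec"]),
        )
        for row in reader
    ]


def length_mismatches(clips):
    """Return clips whose sentence_length differs from a whitespace word count.

    Assumption: words are counted by splitting on whitespace; the dataset
    may use a different rule.
    """
    return [c for c in clips if len(c.sentence.split()) != c.sentence_length]


def total_hours(clips):
    """Sum duration_sec over all clips and convert seconds to hours."""
    return sum(c.duration_sec for c in clips) / 3600.0


# Illustrative rows only: identifiers, file names, and durations are made up.
SAMPLE_CSV = """sentence,sentence_length,author_id,audio_file,duration_sec
The dog barks at cars passing down the road.,9,contributor_a,clip_0001.mp3,3.2
The sun is up so early today.,7,contributor_b,clip_0002.mp3,2.4
"""

clips = read_clips(io.StringIO(SAMPLE_CSV))
print(len(clips), "clips,", len(length_mismatches(clips)), "length mismatches")
print("total hours:", total_hours(clips))
```

To run the same checks on the released file, open it with open(path, newline="", encoding="utf-8") and pass the handle to read_clips. The path and the file's encoding are assumptions to verify against the actual download. Sentences that contain commas would need to be quoted in the CSV for this parser to read them correctly.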
Join the Effect AI Community
You can learn more about Effect or join the community through the links below:
Website: https://effect.ai
X (Twitter): https://x.com/effectaix
Telegram: https://t.me/effectai
Discord: https://discord.gg/effectnetwork
These resources provide updates, technical information, and ways to participate in future data collection initiatives.