Datasets

Search results for “podcast”

Open Home Foundation

Anna 1.0

Text to speech dataset for Hungarian, female speaker, approximately 1.5 hours of read speech.

License: CC0-1.0

Locale: hu-HU

Task: TTS

Format: WEBM

Size: 95.27 MB

Amara Hub

DataTrust Africa: Speech Corpus of Public Radio Recordings from Northern Uganda

This is an open-access corpus of short clips of public radio content from Mega 100 FM, Q FM, Radio Pacis and Radio Rupiny in Northern Uganda. The online corpus currently has over 350 clips of recordings in English, and we hope to add finely annotated transcripts to them. The dataset is intended for NLP research and non-commercial use. Upcoming datasets from Amara Hub include public radio recordings in other languages spoken in the region, such as Acholi, Lango, Lugbara and Akaramajong.

License: NOODL-1.0

Locale: en-US

Task: NLP

Format: MP3

Size: 179.82 MB

Common Voice

Common Voice Scripted Speech 24.0 - Tunen

A collection of scripted spoken phrases in Tunen.

License: CC0-1.0

Locale: tvu

Task: ASR

Format: MP3

Size: 195.38 MB

Open Home Foundation

Dmitri 1.0

Text to speech dataset for Russian, male speaker, approximately 2 hours of read speech.

License: CC0-1.0

Locale: ru-RU

Task: TTS

Format: WEBM

Size: 96.63 MB

The University of Melbourne

Central Kurdish TTS dataset 1.0

This dataset contains high-quality single-speaker audio recordings in Central Kurdish (ckb), intended for building Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) systems. The dataset comprises 2 hours and 18 minutes of aligned audio and text data.

License: CC-BY-4.0

Locale: ckb

Task: TTS

Format: WAV

Size: 293.45 MB

Open Home Foundation

Imre 1.0

Text to speech dataset for Hungarian, male speaker, approximately 1.5 hours of read speech.

License: CC0-1.0

Locale: hu-HU

Task: TTS

Format: WEBM

Size: 99.60 MB

Open Home Foundation

Dimitar 1.0

Text to speech dataset for Bulgarian, male speaker, approximately 2 hours of read speech.

License: CC0-1.0

Locale: bg-BG

Task: TTS

Format: WEBM

Size: 109.58 MB

Community

Thorsten-Voice Dataset 2021.06 Emotional

German emotional speech dataset (2,400 recordings, 8 emotions), CC0 licensed, 22,050 Hz mono WAV, for TTS and speech research.

License: CC0-1.0

Locale: de-DE

Task: TTS

Format: WAV, CSV

Size: 380.80 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Thur

A collection of spontaneous spoken phrases in Thur.

License: CC0-1.0

Locale: lth

Task: ASR

Format: MP3

Size: 292.98 MB

Common Voice

Common Voice Scripted Speech 24.0 - Bateri

A collection of scripted spoken phrases in Bateri.

License: CC0-1.0

Locale: btv

Task: ASR

Format: MP3

Size: 205.82 MB

Common Voice

Mozilla Common Voice Spontaneous Speech ASR Shared Task Train/Dev Data

This datasheet is for the bundle of Mozilla Common Voice spontaneous speech datasets to be used in the Shared Task on Spontaneous Speech.

License: CC0-1.0

Locale: mul

Task: ASR

Format: MP3

Size: 4.30 GB

Common Voice

Common Voice Scripted Speech 24.0 - Nüpode Huitoto

A collection of scripted spoken phrases in Nüpode Huitoto.

License: CC0-1.0

Locale: hux

Task: ASR

Format: MP3

Size: 229.65 MB

Open Home Foundation

Chitwan 1.0

Text to speech dataset for Nepali, male speaker, approximately 1 hour of read speech.

License: CC0-1.0

Locale: ne-NE

Task: TTS

Format: WEBM

Size: 61.68 MB

Institute of African Digital Humanities

Ewondo-TTS-Dataset

The dataset consists of four hours of high-quality audio clips, each paired with text and read by a single speaker.

License: NOODL-1.0

Locale: ewo

Task: TTS

Format: MP3, TSV

Size: 152.70 MB

Common Voice

Common Voice Scripted Speech 24.0 - Esperanto

A collection of scripted spoken phrases in Esperanto.

License: CC0-1.0

Locale: eo

Task: ASR

Format: MP3

Size: 38.69 GB

The University of Melbourne

Hawrami Kurdish TTS dataset 1.0

This dataset contains high-quality single-speaker audio recordings in Hawrami Kurdish (Hewrami, ISO 639-3: hac), also known as the Gorani language, intended for building Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) systems. The dataset comprises 5 hours and 15 minutes of aligned audio and text data. Hawrami is classified as Definitely Endangered by UNESCO.

License: CC-BY-4.0

Locale: hac

Task: TTS

Format: WAV

Size: 706.11 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Arvanitika

A collection of spontaneous spoken phrases in Arvanitika.

License: CC0-1.0

Locale: aat

Task: ASR

Format: MP3

Size: 46.68 MB

Common Voice

Common Voice Scripted Speech 24.0 - Losso

A collection of scripted spoken phrases in Losso.

License: CC0-1.0

Locale: nmz

Task: ASR

Format: MP3

Size: 205.70 MB

Common Voice

Common Voice Scripted Speech 24.0 - Massa

A collection of scripted spoken phrases in Massa.

License: CC0-1.0

Locale: mcn

Task: ASR

Format: MP3

Size: 217.68 MB

Common Voice

Common Voice Scripted Speech 24.0 - Kalenjin

A collection of scripted spoken phrases in Kalenjin.

License: CC0-1.0

Locale: kln

Task: ASR

Format: MP3

Size: 1.68 GB

Common Voice

Common Voice Scripted Speech 24.0 - Loja Highland Kichwa

A collection of scripted spoken phrases in Loja Highland Kichwa.

License: CC0-1.0

Locale: qvj

Task: ASR

Format: MP3

Size: 221.72 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Wixárika

A collection of spontaneous spoken phrases in Wixárika.

License: CC0-1.0

Locale: hch

Task: ASR

Format: MP3

Size: 198.80 MB

Common Voice

Mozilla Common Voice Spontaneous Speech ASR Shared Task Test Data

A bundle of the held-out test data for the Mozilla Common Voice Spontaneous Speech ASR shared task.

License: CC0-1.0

Locale: mul

Task: ASR

Format: MP3, TSV

Size: 784.80 MB

Rerooted Archive

ReRooted: Speech Corpus of Testimonials from Armenian Refugees and Immigrants

ReRooted is an open-access online corpus of YouTube interviews with Armenian refugees and immigrants. The online corpus currently has over 80 hours of recordings on YouTube, alongside subtitles in Armenian from Amara. The current dataset is a 10-hour subset of the corpus that we have curated with corrected and finely annotated transcripts. The dataset is intended for NLP research, and we hope to keep updating it with more curated transcripts.

License: GPL-3.0

Locale: hy

Task: ASR

Format: WAV, TEXTGRID

Size: 3.25 GB