Datasets

Search results for “breton”
Common Voice

Common Voice Scripted Speech 24.0 - Romansh Vallader

A collection of scripted spoken phrases in Romansh Vallader.

License: CC0-1.0
Locale: rm-vallader
Task: ASR
Format: MP3
Size: 113.00 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Welsh

A collection of spontaneous spoken phrases in Welsh.

License: CC0-1.0
Locale: cy
Task: ASR
Format: MP3
Size: 3.57 MB

Common Voice

Common Voice Scripted Speech 24.0 - Romansh Sursilvan

A collection of scripted spoken phrases in Romansh Sursilvan.

License: CC0-1.0
Locale: rm-sursilv
Task: ASR
Format: MP3
Size: 292.10 MB

Common Voice

Common Voice Scripted Speech 24.0 - Baoule

A collection of scripted spoken phrases in Baoule.

License: CC0-1.0
Locale: bci
Task: ASR
Format: MP3
Size: 294.70 MB

Common Voice

Common Voice Scripted Speech 24.0 - Cornish

A collection of scripted spoken phrases in Cornish.

License: CC0-1.0
Locale: kw
Task: ASR
Format: MP3
Size: 260.69 MB

Pro Svizra Rumantscha

Sutsilvan Newspaper Corpus

1.3 million tokens in the Sutsilvan variety of Romansh from the daily newspaper “La Quotidiana”.

License: CC0-1.0
Locale: rm-sutsilv
Task: OTH
Format: TSV
Size: 8.87 MB

Common Voice

Common Voice Scripted Speech 24.0 - Galician

A collection of scripted spoken phrases in Galician.

License: CC0-1.0
Locale: gl
Task: ASR
Format: MP3
Size: 7.30 GB

Common Voice

Common Voice Spontaneous Speech 2.0 - Galician

A collection of spontaneous spoken phrases in Galician.

License: CC0-1.0
Locale: gl
Task: ASR
Format: MP3
Size: 23.40 MB

Balochistan Educational and Cultural Organization

NAWA-E-WATAN Balochi Newspaper Corpus

A ~1.02M-token Balochi newspaper corpus from NAWA-E-WATAN, representing contemporary journalistic and public discourse.

License: CC-BY-NC-4.0
Locale: bgn
Task: NLP
Format: TXT
Size: 1.43 MB

Common Voice

Common Voice Scripted Speech 24.0 - Khetrani

A collection of scripted spoken phrases in Khetrani.

License: CC0-1.0
Locale: xhe
Task: ASR
Format: MP3
Size: 207.86 MB

Common Voice

Common Voice Scripted Speech 24.0 - Brahui

A collection of scripted spoken phrases in Brahui.

License: CC0-1.0
Locale: brh
Task: ASR
Format: MP3
Size: 205.87 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Gheg Albanian

A collection of spontaneous spoken phrases in Gheg Albanian.

License: CC0-1.0
Locale: aln
Task: ASR
Format: MP3
Size: 200.85 MB

Pro Svizra Rumantscha

Surmiran Newspaper Corpus

2.9 million tokens in the Surmiran variety of Romansh from the daily newspaper “La Quotidiana”.

License: CC0-1.0
Locale: rm-surmiran
Task: OTH
Format: TSV
Size: 11.89 MB

Common Voice

Common Voice Scripted Speech 24.0 - Mokpwe

A collection of scripted spoken phrases in Mokpwe.

License: CC0-1.0
Locale: bri
Task: ASR
Format: MP3
Size: 188.52 MB

Common Voice

Common Voice Scripted Speech 24.0 - Balti

A collection of scripted spoken phrases in Balti.

License: CC0-1.0
Locale: bft
Task: ASR
Format: MP3
Size: 360.29 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Melanau

A collection of spontaneous spoken phrases in Melanau.

License: CC0-1.0
Locale: mel
Task: ASR
Format: MP3
Size: 208.47 MB

Pro Svizra Rumantscha

Sursilvan Newspaper Corpus

14.6 million tokens in the Sursilvan variety of Romansh from the daily newspaper “La Quotidiana”.

License: CC0-1.0
Locale: rm-sursilv
Task: OTH
Format: TSV
Size: 37.80 MB

Open Home Foundation

Berta 1.0

Text-to-speech dataset for Hungarian: one female speaker, approximately 1 hour of read speech.

License: CC0-1.0
Locale: hu-HU
Task: TTS
Format: FLAC
Size: 209.52 MB

Common Voice

Common Voice Scripted Speech 24.0 - Borgu Fulfulde

A collection of scripted spoken phrases in Borgu Fulfulde.

License: CC0-1.0
Locale: fue
Task: ASR
Format: MP3
Size: 210.83 MB

Open Home Foundation

Tugão 1.0

Text-to-speech dataset for Portuguese: one male speaker, approximately 1.5 hours of read speech.

License: CC0-1.0
Locale: pt-PT
Task: TTS
Format: WEBM
Size: 61.84 MB

Common Voice

Common Voice Scripted Speech 24.0 - Catalan

A collection of scripted spoken phrases in Catalan.

License: CC0-1.0
Locale: ca
Task: ASR
Format: MP3
Size: 77.93 GB

Common Voice

Common Voice Spontaneous Speech 2.0 - Scots

A collection of spontaneous spoken phrases in Scots.

License: CC0-1.0
Locale: sco
Task: ASR
Format: MP3
Size: 227.79 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Thur

A collection of spontaneous spoken phrases in Thur.

License: CC0-1.0
Locale: lth
Task: ASR
Format: MP3
Size: 292.98 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Gorani

A collection of spontaneous spoken phrases in Gorani.

License: CC0-1.0
Locale: hac
Task: ASR
Format: MP3
Size: 224.46 MB