Ewondo-TTS-Dataset

License:

NOODL-1.0

Steward:

Institute of African Digital Humanities

Task: TTS

Release Date: 1/30/2026

Format: MP3, TSV

Size: 152.70 MB


Description

This dataset comprises high-quality audio recordings of read speech from a single female speaker of Ewondo, a Bantu language spoken in Cameroon. The dataset also contains audio/sentence mapping files, making it suitable for TTS tasks on Ewondo.

Specifics

Licensing

Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)

https://licensingafricandatasets.com/nwulite-obodo-license

Considerations

Restrictions/Special Constraints

- For research and scientific use only
- You agree that you will not re-host or re-share this dataset

Forbidden Usage

You agree not to use the data for: determining the identity of the speaker in the dataset; attempting to clone the voice or training models that imitate the speaker in this dataset; Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.

Processes

Intended Use

This dataset is intended for the training and/or evaluation of text-to-speech models for the Ewondo language, with the aim of developing practical language learning and usage applications. More specifically, the dataset is designed to support language revitalisation and endogenous multilingual education in Cameroon and elsewhere, particularly self-directed language learning for teacher trainers, secondary education, and community-driven language revitalisation.

Metadata

Language

Ewondo is a Narrow Bantu language indigenous to a population located mainly in the Centre Region of Cameroon, with pockets of settlement in the South and East Regions. Ewondo also serves as a vehicular language for populations in the South and East Regions of Cameroon, and has developed into a creole known as Mongo Ewondo.

Variants

The term 'Ewondo' is used to describe a set of linguistic varieties whose speakers may or may not identify with the term. This is partly due to the structures of linguistic governance. In Cameroon, a nationwide linguistic survey was undertaken in the second half of the 1970s and the first half of the 1980s as part of the Atlas Linguistique du Cameroun project. The survey resulted in the publication of the Atlas of Cameroonian Languages, also referred to as the Administrative Atlas of Cameroonian Languages. In this work, a macro-language called Beti-Fang is identified, with Ewondo being one of the major micro-languages alongside Fang, Bulu, Ntumu and Eton. Other subgroups speaking varieties that differ to a greater or lesser extent have often been subsumed under one of the more prominent Beti-Fang micro-languages. Consequently, it is very difficult to determine with confidence which variables allow a particular linguistic variety to be categorised as Ewondo without distorting reality. The speaker who read the voice clips of this dataset is indigenous to the Mvog-Ebanda sub-group, located in the capital city of Yaoundé.

Alphabet

Latin-based orthography with optional tone marking.

Vowels

a e ə i o u (long vowels by doubling)

Consonants

b d f g h k l m n p s t v w y z

Digraphs / Prenasalized

mb nd ng nk nz ny dz ts

Special symbols

ŋ ə

Tone

grave (◌̀), acute (◌́), caron (◌̌), circumflex (◌̂). Low tone is unmarked by default; the grave accent is written only on syllabic nasals.
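Because tone marking is optional, the same word can appear with or without diacritics. The following Python sketch shows one way to compare such spellings and to check text against the letters listed above. It assumes that the four tone diacritics correspond to the Unicode combining marks U+0300, U+0301, U+030C and U+0302. The function names `strip_tone_marks` and `unlisted_letters` are illustrative and are not part of the dataset.

```python
import unicodedata

# Tone diacritics listed above, as Unicode combining marks (assumed mapping).
TONE_MARKS = {
    "\u0300": "grave",
    "\u0301": "acute",
    "\u030C": "caron",
    "\u0302": "circumflex",
}

# Letters listed above: vowels, consonants and special symbols.
LISTED_LETTERS = set("aeəioubdfghklmnpstvwyzŋ")


def strip_tone_marks(text: str) -> str:
    """Return text with the four listed tone diacritics removed."""
    # NFD splits precomposed letters such as "á" into base letter + mark.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def unlisted_letters(text: str) -> set[str]:
    """Return lowercase letters in text that are not in LISTED_LETTERS."""
    base = strip_tone_marks(text).lower()
    return {ch for ch in base if ch.isalpha() and ch not in LISTED_LETTERS}
```

Applied to transcripts, `unlisted_letters` may flag letters that fall outside the inventory above, which can be useful when building a grapheme set for a TTS front end. Whether tone marks should be kept or removed for a given model is a design decision that this sketch does not settle.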

Source

The text used to prompt the spoken recordings in this dataset was transcribed from interviews conducted in Yaoundé in the 1980s by Professor Kum A. Ndumbe and his research team. The recordings were transcribed between 2016 and 2017 by Hubert Nkoumou, the legal owner of this dataset. During the dataset's design, the text was divided into sentences and read by a female Ewondo speaker from the Mvog-Ebanda macro family, which is located in Yaoundé.

Domain

The text is a dialogue in which an elderly Ewondo speaker is prompted to recount their experience of German colonialism in Cameroon.

Size

Total size is 152.70 MB.

Structure

This dataset comprises audio clips and audio/text mapping files. There are 2,902 audio clips totalling 4 hours, 3 minutes and 44 seconds, as well as 9 audio/text mapping files totalling 2,903 lines. The zip archive contains nine subfolders, each comprising the audio clips and audio/text mapping files created during a single recording session. Because the recording application generates file names automatically, keeping each session in its own subfolder prevents clips from different sessions from sharing the same name.
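The layout described above can be read with a short loader. The Python sketch below is an illustration only, and it rests on assumptions that the dataset card does not confirm: that mapping files carry a `.tsv` extension, that each row holds an audio file name and a sentence separated by a tab, and that the audio files sit beside their mapping file. The names `Clip`, `load_clips` and `missing_audio` are hypothetical.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Clip:
    session: str      # subfolder name, one per recording session
    audio_path: Path  # path to the MP3 clip
    sentence: str     # prompt text read by the speaker


def load_clips(root: Path, mapping_suffix: str = ".tsv") -> list[Clip]:
    """Collect one record per mapping row from every session subfolder.

    Assumes each row is `<audio file name><TAB><sentence>`; rows with
    fewer than two fields are skipped.
    """
    clips = []
    for session_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for mapping in sorted(session_dir.glob(f"*{mapping_suffix}")):
            with mapping.open(encoding="utf-8", newline="") as handle:
                reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
                for row in reader:
                    if len(row) < 2:
                        continue
                    name, sentence = row[0].strip(), row[1].strip()
                    # Keep the session in the path so identical file names
                    # from different sessions stay distinct.
                    clips.append(Clip(session_dir.name, session_dir / name, sentence))
    return clips


def missing_audio(clips: list[Clip]) -> list[Clip]:
    """Return records whose audio file does not exist on disk."""
    return [clip for clip in clips if not clip.audio_path.is_file()]
```

Calling `load_clips(Path("path/to/extracted/archive"))` and then `missing_audio` on the result would list mapping rows that have no matching clip. If the mapping files contain a header row, it would appear in that list, so the row handling may need adjusting once the actual file format is known.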

Sample

  1. 471a5418be70283e32788beed0c35e43.mp3 | Mbóló éyɔŋ bə́ngásɔ́ fɔɔ ?

  2. a3b5215616a772efac01d3d11315b1bb.mp3 | Éyɔŋ bə́ngásɔ́, mǎ yəmə́ ki mbóló bəngásɔ́, eyɔŋ bə́ngásɔ́ mǎ yəmə́ kig.

  3. edc87399d50b5a0c1f7067439f7827b2.mp3 | Hǹńǹ !!!

  4. 0e37afa962c66b24a123c2e995aa45aa.mp3 | Eyɔŋ bə́ngákə éngɔ mǎyəm.

  5. 49af8005e0fa9725297e7ad23f45acf2.mp3 | Yə bə́ngá lê wa tɔ minláŋ etə nâ apolo bə́ngásɔ́ yə bə́ngábɔ bitá ?

  6. 4beef3d872f34e18f2c10bfdceece776.mp3 | Éyɔŋ bə́ngásɔ́, bə́ Ndziki bɔ̌ bita.

  7. 355b222d38434ce3a925f8161615b9a1.mp3 | Bə́ngásɔ́ fɔ́ɔ́.

  8. 83adee6354a3fe1450726533911c2b64.mp3 | Nala təgɛ bita.

  9. 3cd75f6d73727a83299bf040fc248d56.mp3 | Vədá, dɔ bə́ngá man tɔbɔ ?

  10. 0975cef73a7f48c58b3a9bc0b036d90b.mp3 | Hǹhń éyɔŋ bə́ngázu tɔbɔ.

  11. 66490bb97f55ff8ff3cde76201316143.mp3 | Mɛ éyɔŋ bə́ngásɔ́, məngáwóg bə́ngákəbálê nâ bə́ngázu tɔb Kamərun.

  12. bd43353495fd7de801d45c95f4a9757e.mp3 | Bə́ngázu tɔb ?

  13. a4b41ae678899585b71763d73de8588f.mp3 | Bə́ngázu kɔb Kamərun.