Bulu-TTS-Dataset 1.0

License:

NOODL-1.0

Steward:

Institute of African Digital Humanities

Task: TTS

Release Date: 2/5/2026

Format: MP3, TSV

Size: 87.40 MB


Description

This dataset comprises denoised audio recordings of read speech from a single Bulu male speaker. Bulu is a Bantu language spoken in Cameroon. The dataset also contains audio/sentence mapping files, making it suitable for TTS tasks on Bulu.

Specifics

Licensing

Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)

https://licensingafricandatasets.com/nwulite-obodo-license

Considerations

Restrictions/Special Constraints

- For research and scientific use only

- You agree that you will not re-host or re-share this dataset

Forbidden Usage

You agree not to use the data for: determining the identity of the speaker in the dataset; attempting to clone the voice or training models that imitate the speaker in this dataset; Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.

Processes

Intended Use

This dataset is intended for the training and/or evaluation of text-to-speech models for the Bulu language, with the aim of developing practical language learning and usage applications. More specifically, the dataset is designed to support language revitalisation and endogenous multilingual education in Cameroon, particularly self-directed language learning for teacher trainers, secondary education, and community-driven language revitalisation.

Metadata

Language

The Administrative Atlas of Cameroon's Languages (Breton and Bikia Fohtung, 1991) classifies Bulu as a Beti-Fang dialect. However, the Bulu people contest this classification, claiming that they speak a distinct language rather than a dialect of an overarching Beti-Fang language. Bulu is spoken in four administrative divisions (departments) of Cameroon's South Region: Ntem, Mvila, Dja-and-Lobo and Océan.

Variants

The Bulu-speaking community generally identifies with two broad speech areas, which may be referred to as dialects: the Ebolowa speech area and the Sangmelima speech area. Ebolowa is the headquarters of the Mvila Division, and Sangmelima is the headquarters of the Dja-and-Lobo Division.

Alphabet

The Bulu alphabet as represented in this dataset consists of the following letters:

Consonants: b, d, f, g, h, j, k, l, m, n, p, s, t, v, w, y, z

Vowels: a, e, i, o, u

Special Orthographic Features:

  1. Apostrophe ('): Used within words to mark glottal stops or syllable boundaries (e.g., "wo'o", "nga'ane", "bo'ok")

  2. Digraphs and Trigraphs:

    • ng (velar nasal)

    • ny (palatal nasal)

    • kp (doubly articulated consonant)

    • ngb (prenasalized consonant)

    • mb, nd, nj (prenasalized consonants)

  3. Vowel Sequences: Vowels may appear in sequence representing distinct syllables or diphthongs (e.g., "ngenane", "woena", "joe")

  4. No Tone Marking: The orthography does not mark lexical tone, despite Bulu being a tonal language

  5. No Diacritics: As noted in the manuscript preface, diacritical marks are not employed in this transcription
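For text preprocessing, the multi-letter graphemes listed above can be kept together with a greedy longest-match pass. The following is a minimal Python sketch and not part of the dataset: the names `MULTIGRAPHS` and `segment` are illustrative, and the code assumes that every occurrence of a listed letter sequence is a single grapheme, which may not hold across syllable boundaries. Characters outside the listed alphabet are passed through unchanged.

```python
# Multi-letter graphemes named in this card, ordered longest first so that
# "ngb" is tried before "ng". The tuple itself is an illustrative assumption.
MULTIGRAPHS = ("ngb", "ng", "ny", "kp", "mb", "nd", "nj")


def segment(word: str) -> list[str]:
    """Split a word into graphemes using greedy longest-match."""
    lowered = word.lower()
    graphemes = []
    i = 0
    while i < len(lowered):
        for mg in MULTIGRAPHS:
            if lowered.startswith(mg, i):
                graphemes.append(mg)
                i += len(mg)
                break
        else:
            # No multi-letter grapheme starts here: keep the single character,
            # including the apostrophe.
            graphemes.append(lowered[i])
            i += 1
    return graphemes


print(segment("nga'ane"))  # ['ng', 'a', "'", 'a', 'n', 'e']
```

The apostrophe is kept as its own unit so that a downstream model can treat it as a separate symbol.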

Source

The prompts read aloud for this dataset were taken from Jean Louis Njemba Medu's Nnanga Kon, a classic of Bulu literature published in 1939. The text was extracted from scanned PDF files using OCR, then cleaned and split into chunks suitable for read speech for TTS.

Domain

The text is a narrative about the first encounters between the missionary Adolph Clemens Good and the Bulu community.

Size

Total size is 87.40 MB.

Structure

This dataset comprises audio clips and audio/text mapping files. There are 1,721 audio clips totalling 3 hours, 16 minutes and 46 seconds, as well as an audio/text mapping file totalling 1,715 lines.
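Because the clip count (1,721) and the mapping line count (1,715) stated above differ, it may be useful to check which clips have a transcript before training. The sketch below is one way to do that in Python. It rests on assumptions this card does not confirm: that the first TSV column holds the clip file name, that the file has no header row, and that the clips sit in a single directory. The function names and any paths passed to them are hypothetical.

```python
import csv
from pathlib import Path


def check_mapping(tsv_path, audio_dir):
    """Compare clip names in the mapping file with the MP3 files on disk.

    Assumes (not confirmed by the card) that column 0 is the clip file name.
    """
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        mapped = {row[0] for row in reader if row}
    on_disk = {p.name for p in Path(audio_dir).glob("*.mp3")}
    return {
        "clips_without_text": sorted(on_disk - mapped),
        "text_without_clip": sorted(mapped - on_disk),
    }


def mean_clip_seconds(hours, minutes, seconds, n_clips):
    """Average clip length in seconds from a total duration and a clip count."""
    return (hours * 3600 + minutes * 60 + seconds) / n_clips


# Figures stated in this card: 3 h 16 min 46 s across 1,721 clips.
print(round(mean_clip_seconds(3, 16, 46, 1721), 2))  # 6.86
```

If the mapping file does have a header row, its first cell will appear under `text_without_clip` and can be ignored.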

Sample

  1. cf6115924c228d85173d3923f9597c1a.mp3 | Jé é ne na môt a kobô’ô,mi ke mi kobô’ô, bi aye ke wô’ô wô’ô ajô éyoñ évé?

  2. 1b4deae160c0964134a7d8440a10e2f7.mp3 | Nja’a kañete bia.

  3. 6b3fcc92688a959f70bc32eddbca5338.mp3 | A Ela, ke bongô ba be ne aval étam!

  4. 7f54a53dca78d0d52caf2ff61c02cbcd.mp3 | “Môt a betek, ve na susuk”.

  5. 1e01222cc58176e6886fb200a0cb0871.mp3 | Nde ñhe me nga bulane ke jome jôm be nga bo je zô’é;

  6. 6ef23f9bc50c316c788f4149047e7157.mp3 | a zu koé na, aval môt étam e tele, ba benya bôtô; môt m’ ajô nye nyô,

  7. 871d72d1e1f638c8901823e980b81502.mp3 | te yeme yeme avale jôm a funane de; éyoñ me nga sili bôt be nga to valé,

  8. 451422ba8f7e379d5ecdea8888a5fa69.mp3 | ane be nga yalane me na.

  9. c38a927d579a5a0f3eff2b045963b367.mp3 | Ane nanga kon!

  10. 0357189aeb7de00fc1735f4fa7f777ea.mp3 | Nyôl é to to nye aya? Ela a sili