Bulu-TTS-Dataset 1.0
License:
NOODL-1.0
Steward:
Institute of African Digital Humanities
Task: TTS
Release Date: 2/5/2026
Format: MP3, TSV
Size: 87.40 MB
Description
This dataset comprises denoised audio recordings of read speech from a single Bulu male speaker. Bulu is a Bantu language spoken in Cameroon. The dataset also contains audio/sentence mapping files, making it suitable for TTS tasks on Bulu.
Specifics
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Considerations
Restrictions/Special Constraints
- For research and scientific use only
- You agree that you will not re-host or re-share this dataset
Forbidden Usage
You agree not to use the data for: determining the identity of the speaker in the dataset; attempting to clone the voice or training models that imitate the speaker in this dataset; Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Processes
Intended Use
This dataset is intended for the training and/or evaluation of text-to-speech models for the Bulu language, with the aim of developing practical language learning and usage applications. More specifically, the dataset is designed to support language revitalisation and endogenous multilingual education in Cameroon, particularly self-directed language learning for teacher trainers, secondary education, and community-driven language revitalisation.
Metadata
Language
The Administrative Atlas of Cameroon's Languages (Breton and Bikia Fohtung, 1991) classifies Bulu as a Beti-Fang dialect. However, the Bulu people contest this classification, claiming that they speak a distinct language rather than a dialect of an overarching Beti-Fang language. Bulu is spoken in four administrative departments in Cameroon's South Region: the Ntem, Mvila, Dja-and-Lobo and Océan divisions.
Variants
The Bulu-speaking community generally identifies with two broad speech areas, which may be referred to as dialects: the Ebolowa speech area and the Sangmelima speech area. Ebolowa is the headquarters of the Mvila Division, and Sangmelima is the headquarters of the Dja-and-Lobo Division.
Alphabet
The Bulu alphabet as represented in this dataset consists of the following letters:
Consonants: b, d, f, g, h, j, k, l, m, n, p, s, t, v, w, y, z
Vowels: a, e, i, o, u
Special Orthographic Features:
Apostrophe ('): Used within words to mark glottal stops or syllable boundaries (e.g., "wo'o", "nga'ane", "bo'ok")
Digraphs and Trigraphs:
ng (velar nasal)
ny (palatal nasal)
kp (doubly articulated consonant)
ngb (prenasalized consonant)
mb, nd, nj (prenasalized consonants)
Vowel Sequences: Vowels may appear in sequence representing distinct syllables or diphthongs (e.g., "ngenane", "woena", "joe")
No Tone Marking: The orthography does not mark lexical tone, despite Bulu being a tonal language
No Diacritics: As noted in the manuscript preface, diacritical marks are not employed in this transcription
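The multi-letter units listed above can be separated from single letters with a greedy longest-match pass. The following is a minimal sketch, not part of the dataset tooling: the function name `split_graphemes` and the `MULTIGRAPHS` tuple are hypothetical, and the sketch assumes that every occurrence of a listed letter sequence is a single unit, which may not hold across syllable boundaries. Characters outside the listed alphabet (the sample transcripts include letters such as ô, é and ñ) are passed through as single units.

```python
# Multi-letter units from the Alphabet section, ordered so that the
# trigraph "ngb" is tried before the digraph "ng".
MULTIGRAPHS = ("ngb", "ng", "ny", "kp", "mb", "nd", "nj")


def split_graphemes(word: str) -> list[str]:
    """Split a word into orthographic units by greedy longest match."""
    units = []
    i = 0
    while i < len(word):
        for unit in MULTIGRAPHS:
            # Compare case-insensitively but keep the original casing.
            if word[i:i + len(unit)].lower() == unit:
                units.append(word[i:i + len(unit)])
                i += len(unit)
                break
        else:
            # Single letter, apostrophe, or any character not in the list.
            units.append(word[i])
            i += 1
    return units


print(split_graphemes("nga'ane"))  # ['ng', 'a', "'", 'a', 'n', 'e']
```

A split of this kind is only a starting point for a TTS front end; because tone is not marked in the orthography, it carries no tonal information.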
Source
The read-speech prompts for this dataset were taken from Jean Louis Njemba Medu's Nnanga Kon, a classic of Bulu literature published in 1939. The text was extracted from scanned PDF files using OCR, then cleaned and split into chunks suitable for read-speech TTS recording.
Domain
The text is a narrative about the first encounters between the missionary Adolph Clemens Good and the Bulu community.
Size
Total size is 87.40 MB.
Structure
This dataset comprises audio clips and audio/text mapping files. There are 1,721 audio clips totalling 3 hours, 16 minutes and 46 seconds, as well as an audio/text mapping file totalling 1,715 lines.
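The figures above imply an average clip length and a small gap between the number of clips and the number of mapping lines. The arithmetic can be checked with a short sketch; the variable names are illustrative, and the card does not explain why six clips have no corresponding mapping line.

```python
# Figures as stated in the Structure section.
hours, minutes, seconds = 3, 16, 46
clip_count = 1721
mapping_lines = 1715

total_seconds = hours * 3600 + minutes * 60 + seconds
mean_clip_seconds = total_seconds / clip_count
unmapped = clip_count - mapping_lines

print(total_seconds, round(mean_clip_seconds, 2), unmapped)  # 11806 6.86 6
```

An average of roughly 6.9 seconds per clip is consistent with the short sentence chunks shown in the sample.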
Sample
cf6115924c228d85173d3923f9597c1a.mp3 | Jé é ne na môt a kobô’ô,mi ke mi kobô’ô, bi aye ke wô’ô wô’ô ajô éyoñ évé?
1b4deae160c0964134a7d8440a10e2f7.mp3 | Nja’a kañete bia.
6b3fcc92688a959f70bc32eddbca5338.mp3 | A Ela, ke bongô ba be ne aval étam!
7f54a53dca78d0d52caf2ff61c02cbcd.mp3 | “Môt a betek, ve na susuk”.
1e01222cc58176e6886fb200a0cb0871.mp3 | Nde ñhe me nga bulane ke jome jôm be nga bo je zô’é;
6ef23f9bc50c316c788f4149047e7157.mp3 | a zu koé na, aval môt étam e tele, ba benya bôtô; môt m’ ajô nye nyô,
871d72d1e1f638c8901823e980b81502.mp3 | te yeme yeme avale jôm a funane de; éyoñ me nga sili bôt be nga to valé,
451422ba8f7e379d5ecdea8888a5fa69.mp3 | ane be nga yalane me na.
c38a927d579a5a0f3eff2b045963b367.mp3 | Ane nanga kon!
0357189aeb7de00fc1735f4fa7f777ea.mp3 | Nyôl é to to nye aya? Ela a sili
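Lines such as those in the sample can be split into a file name and a sentence as sketched below. This is an assumption-laden illustration: the card lists the format as TSV, while the sample is displayed with a `|` separator, so the hypothetical helper `parse_mapping_line` tries a tab first and falls back to `|`. The actual column layout and any header row in the distributed mapping file are not described here and should be checked before use.

```python
def parse_mapping_line(line: str) -> tuple[str, str]:
    """Split one mapping line into (audio file name, sentence)."""
    # Try a tab first (TSV), then the "|" shown in the sample display.
    for sep in ("\t", "|"):
        if sep in line:
            # Split on the first separator only, so the sentence is kept whole.
            filename, sentence = line.split(sep, 1)
            return filename.strip(), sentence.strip()
    raise ValueError(f"no separator found in line: {line!r}")


sample = "1b4deae160c0964134a7d8440a10e2f7.mp3 | Nja’a kañete bia."
print(parse_mapping_line(sample))
```

Stripping whitespace around both fields means the helper also handles lines where the separator is not surrounded by spaces.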