Bamun-TTS-Dataset
License:
NOODL-1.0
Steward:
Institute of African Digital Humanities
Task: TTS
Release Date: 4/2/2026
Format: MP3, TSV
Size: 219.97 MB
Description
This dataset comprises audio recordings of Bamun (Shupamem) speech aligned with textual transcriptions. It is organised into 24 folders totalling 4h 30m 25s of audio, each containing audio files and a tab-separated mapping file that pairs every audio file with its transcription. The audio clips are short, typically 1 to 10 seconds long, and are suitable for training and evaluating Text-to-Speech (TTS) systems. The textual content originates from transcriptions of oral narratives documenting personal histories related to German colonisation in Cameroon. These texts were segmented into short utterances suitable for read speech and TTS modelling.
Specifics
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Considerations
Restrictions/Special Constraints
- For research and scientific use only
- You agree not to re-host or redistribute this dataset
Forbidden Usage
- Generative AI
- Voice cloning or speaker imitation
- Reproduction, duplication, modification, or redistribution
- Commercial use without explicit permission
Processes
Intended Use
This dataset is intended for the training and evaluation of Text-to-Speech (TTS) systems for the Bamun language. It aims to support:
- Language revitalisation
- Development of speech technologies for under-served African languages
- Educational applications in multilingual contexts
Metadata
Language
Bamun, also called Shüpamom or Shupamem, is a Grassfields Bantu language spoken in the Noun Division of the West Region of Cameroon.
Variants
The Bamun language is quite homogeneous within its indigenous territory, the Noun Administrative Division. However, the Administrative Atlas of Cameroon's Languages (Breton and Bikia Fohtung, 1991) indicates a few "islands" outside the Noun Division where the Bamun language exhibits minor variations. These include Bapi in the Mifi Division of the West Region, and Bamalang and Bangolan in the Mezam Division of the Northwest Region.
Writing System
1. Vowels
The vowel inventory reflected in the dataset is: i, e, ɛ, a, ɔ, o, u, ʉ, ə
The vowel ə / ә is particularly frequent and functions as a central vowel.
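The line above appears to show the schwa in two encodings that look identical on screen: Latin ə (U+0259) and Cyrillic ә (U+04D9). A TTS text frontend usually needs a single symbol per vowel. The following is a minimal sketch, assuming that both code points stand for the same vowel and that the Latin form is the preferred one; the function name `normalize_schwa` is hypothetical and not part of the dataset.

```python
import unicodedata

# Assumption: Cyrillic schwa in the transcriptions represents the same vowel
# as Latin schwa, so both cases are folded to the Latin code points.
_SCHWA_MAP = str.maketrans({
    "\u04d9": "\u0259",  # Cyrillic small schwa   -> Latin small schwa
    "\u04d8": "\u018f",  # Cyrillic capital schwa -> Latin capital schwa
})


def normalize_schwa(text: str) -> str:
    """Fold Cyrillic schwa to Latin schwa, then apply NFC normalization."""
    return unicodedata.normalize("NFC", text.translate(_SCHWA_MAP))
```

Applying NFC at the same step keeps tone-marked vowels in one consistent form, so that visually identical transcriptions compare equal.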
2. Consonants
The consonant system includes the following simple consonants: b, d, f, g, h, j, k, l, m, n, ŋ, p, r, s, t, v, w, y, z
Complex and cluster-like consonants attested include: mb, nd, nk, ng, nt, nj, mf, kp
Digraphs: sh, gh
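The inventory above can be turned into a simple grapheme tokenizer for a TTS frontend. The sketch below is illustrative only: it assumes that any two adjacent letters matching a listed complex consonant or digraph always form one unit (which may not hold across morpheme boundaries), and that tone diacritics stay attached to the letter they follow. The names `MULTI` and `tokenize` are hypothetical.

```python
import unicodedata

# Two-letter units listed in the card (complex consonants and digraphs).
MULTI = ("mb", "nd", "nk", "ng", "nt", "nj", "mf", "kp", "sh", "gh")


def tokenize(word: str) -> list[str]:
    """Split a word into graphemes, preferring the two-letter units."""
    # Decompose so that tone marks become separate combining characters.
    text = unicodedata.normalize("NFD", word.lower())
    tokens = []
    i = 0
    while i < len(text):
        step = 2 if text[i:i + 2] in MULTI else 1
        j = i + step
        # Keep combining marks (tone diacritics) with the preceding letter.
        while j < len(text) and unicodedata.combining(text[j]):
            j += 1
        tokens.append(unicodedata.normalize("NFC", text[i:j]))
        i = j
    return tokens
```

For example, `tokenize("nguu")` yields `["ng", "u", "u"]`. Whether `ng` should be one unit in every context is a linguistic question that this sketch does not settle.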
3. Tone system
The transcription encodes lexical tone using diacritics, corresponding to standard tonal categories:
High tone (H): marked with acute accent (á, é, ɔ́, ʉ́, ŋ́)
Low tone (L): marked with grave accent (à, è, ɔ̀)
Mid tone (M): marked with macron (ā, ē)
Rising tone (LH): marked with caron (ǎ, ě, ɔ̌)
Falling tone (HL): marked with circumflex (â, ê)
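The five diacritics listed above map directly onto Unicode combining marks, so tone labels can be read off a transcription after canonical decomposition (NFD). The following is a minimal sketch of that idea; the names `TONE_MARKS` and `tone_sequence` are hypothetical, and the mapping simply restates the table above.

```python
import unicodedata

# Combining marks for the five tone categories described in the card.
TONE_MARKS = {
    "\u0301": "H",   # acute accent
    "\u0300": "L",   # grave accent
    "\u0304": "M",   # macron
    "\u030c": "LH",  # caron
    "\u0302": "HL",  # circumflex
}


def tone_sequence(text: str) -> list[str]:
    """Return the tone labels of all tone-marked characters, in order."""
    decomposed = unicodedata.normalize("NFD", text)
    return [TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS]
```

Unmarked vowels are skipped, because the card does not state how an unmarked vowel should be interpreted.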
Source
This dataset originates from audio recordings documenting personal histories of German colonisation. These recordings were made in the early 1980s as part of a research project led by Prince (Professor) Koum A Ndoumbe III.
Abdou Salam Ntieche Fifen created the transcriptions associated with this dataset in 2017.
For the purpose of creating this dataset, the textual material was segmented into short utterances and aligned with corresponding audio recordings to support TTS modelling.
Domain
This dataset is derived from prompted speech in the form of directed interviews. The content reflects personal narratives related to colonial history in Cameroon.
The dataset has been transformed into read-style segmented speech suitable for speech synthesis tasks.
Size
219.97 MB
The dataset is composed of 24 folders containing audio clips and corresponding mapping files.
Each folder contains between approximately 10 and 280 audio files. Individual audio clips typically range from 1 to 10 seconds in duration.
Folder-level durations range from approximately 1 minute to over 18 minutes of audio. The total duration of the recordings is 4h 30m 25s.
Structure
Each folder in the dataset contains:
A collection of audio files in MP3 format
A tab-separated mapping file linking each audio file to its transcription
Each line in the mapping file follows the format:
audio_filename.mp3 | transcription
The dataset is designed for TTS pipelines requiring paired audio-text data.
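Loading one folder's mapping file could be sketched as follows. The card describes the file as tab-separated, while the format line shows ` | ` between the two fields, so this sketch accepts either delimiter; that choice, the UTF-8 encoding, and the function names `parse_mapping_line` and `read_mapping` are assumptions. The mapping file's name is not stated in the card, so its path is passed in explicitly.

```python
from pathlib import Path


def parse_mapping_line(line: str) -> tuple[str, str]:
    """Split one mapping line into (audio_filename, transcription)."""
    line = line.rstrip("\r\n")
    # Assumption: the delimiter is a tab, or " | " as displayed in the card.
    sep = "\t" if "\t" in line else " | "
    filename, found, text = line.partition(sep)
    if not found:
        raise ValueError(f"no delimiter in line: {line!r}")
    return filename.strip(), text.strip()


def read_mapping(path: Path) -> list[tuple[Path, str]]:
    """Read a mapping file and return (audio_path, transcription) pairs."""
    pairs = []
    # Assumption: the mapping file is UTF-8 encoded.
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            filename, text = parse_mapping_line(line)
            # Audio files are assumed to sit next to the mapping file.
            pairs.append((path.parent / filename, text))
    return pairs
```

Returning full audio paths rather than bare file names lets the pairs from all 24 folders be concatenated into one training manifest without name clashes.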
Sample
03246844d87f5a76ec4fc1f636626bb5.mp3 | Euh mí u tóóshә́ ŋwәt ru
14289dc77904b3edb98afcfbb5776ee1.mp3 | Í nzie Li shá?
2579e22b2b248815938631969ae22200.mp3 | Li shú, nә nguu yúá
33380524f516027e3b6acad30c6a4f0f.mp3 | Ndǔ lʉ́m mú
364c11dd9cc6d98ae53d9fca5ef0b374.mp3 | Mbúá' NJI FIFEN