Kituba-TTS-Dataset
License: NOODL-1.0
Steward: Institute of African Digital Humanities
Task: TTS
Release Date: 12/10/2025
Format: WAV, TSV
Size: 553.28 MB
Description
Paired audio and text data in Kituba (mkw), a language spoken in the Republic of the Congo and the Democratic Republic of the Congo. The audio corpus consists of 8,302 clips recorded by a single speaker, totalling 350 min 11.98 sec (about 5 h 50 min). The dataset also contains a mapping file of audio and text with 8,173 lines; each line begins with the name of an audio file, followed by a tab and then the corresponding text excerpt. The dataset is suitable for TTS tasks.
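The stated total duration can be checked against the audio itself. The sketch below uses only Python's standard `wave` module and assumes the clips are uncompressed PCM WAV files stored together in one directory (the directory layout is an assumption, not something the card specifies). It divides each clip's frame count by its sample rate, sums the results, and formats the total the way this card does.

```python
import wave
from pathlib import Path


def clip_seconds(path: Path) -> float:
    """Duration of one WAV file in seconds (frame count / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration(wav_dir: Path) -> tuple[int, float]:
    """Return (number of clips, total seconds) for every .wav file in wav_dir."""
    durations = [clip_seconds(p) for p in sorted(wav_dir.glob("*.wav"))]
    return len(durations), sum(durations)


def format_duration(seconds: float) -> str:
    """Format seconds in the card's style, e.g. '350 min 11.98 sec'."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)} min {secs:.2f} sec"
```

For a complete copy, `total_duration(Path("path/to/wavs"))` (hypothetical path) should report 8,302 clips and roughly 21,012 seconds, i.e. `format_duration` should print about `350 min 11.98 sec`.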
Specifics
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Considerations
Restrictions/Special Constraints
Although the transcriptions of the audio recordings were created using a standardised writing system, this system may not reflect the standard used by the wider Kituba (mkw) community. This is often the case for low-resource languages, where several writing standards may be in use. Take this into account when using the dataset for TTS and ASR tasks; ideally, any resulting TTS or ASR model should state explicitly which writing system was used.
Forbidden Usage
Generative AI, reproduction, duplication, modification, augmentation, copying, distribution, transmission, display, sale, transfer, publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Processes
Intended Use
The dataset is suitable for speech-related tasks:
- Automatic speech recognition (ASR): the audio–text alignment allows speech recognition models for Kituba (mkw) to be trained or evaluated, a valuable resource for Kituba language technology. Note that the sentences are transcribed in one writing standard, which may co-exist with other writing standards within the Kituba community.
- Text-to-speech (TTS): the dataset contains clean sentence–audio pairs from a single speaker, totalling about 5.84 hours (350 min 11.98 sec). This makes it suitable for training or evaluating text-to-speech models. Again, the writing standard used may be just one of several in use within the community.
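Training or evaluating a model usually requires a held-out set, which the card does not define. The following is a minimal sketch, assuming the (filename, text) pairs have already been read from the mapping file: each clip is assigned to a training or validation split by hashing its filename, so the split stays identical across runs and machines. The 5% validation share is an arbitrary illustration, not a recommendation from the dataset's steward.

```python
import hashlib


def split_of(filename: str, val_percent: int = 5) -> str:
    """Deterministically assign a clip to 'train' or 'val' from its filename."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable value in 0..99
    return "val" if bucket < val_percent else "train"


def split_pairs(pairs, val_percent: int = 5):
    """Partition (filename, text) pairs into train and validation lists."""
    train, val = [], []
    for name, text in pairs:
        target = val if split_of(name, val_percent) == "val" else train
        target.append((name, text))
    return train, val
```

Because every clip comes from the same speaker, this is a split by clip rather than by speaker; hashing instead of shuffling avoids having to store a random seed.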
Metadata
Language
Kituba is a widely used Bantu-based creole lingua franca spoken in the Republic of the Congo and the Democratic Republic of the Congo. Emerging historically as a simplified variety of Kongo (Kikongo), it has expanded into an independent language with millions of speakers across urban and rural regions. Today, Kituba serves as a national language in both countries and functions in everyday communication, education, administration, religious contexts, and the media. Its simplified grammar and broad communicative reach make it one of the most accessible and widely used languages of Central Africa.
Variant
Kituba (mkw) is known by different names depending on region and institutional context. In the Democratic Republic of the Congo (DRC), it is officially called Kikongo ya leta (‘State Kikongo’), highlighting its administrative and educational role, while in Congo-Brazzaville it is commonly referred to simply as Kituba. Other appellations such as Munukutuba or Kikongo-Kituba appear in linguistic and sociolinguistic literature, reflecting its origins as a contact variety based on Kikongo dialects. Despite these different labels, all refer to the same language, whose internal variation remains relatively limited compared to the broader diversity of Kikongo varieties.
Alphabet
The transcriptions of the audio files in this dataset use a Latin-based alphabet consisting of the following letters and digraphs: a, b, d, e, f, g, h, i, k, l, m, n, o, p, r, s, t, u, v, w, y, z, mb, mp, mf, ng, nk, ns, nt, nd
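Before fixing a model's character set, it can help to check the transcripts against this alphabet. The sketch below is an illustrative helper, not part of the dataset: it NFC-normalises and lower-cases a line, then reports any letters not in the list above. The digraphs are spelled with letters already in the list, so only single letters need checking; spaces, digits and punctuation are ignored.

```python
import unicodedata

# Single letters listed on the card; the digraphs (mb, mp, mf, ng, nk, ns, nt, nd)
# are built from these same letters, so they need no separate entry.
KITUBA_LETTERS = set("abdefghiklmnoprstuvwyz")


def out_of_alphabet(text: str) -> set[str]:
    """Return the letters in `text` that are not in the card's listed alphabet."""
    normalised = unicodedata.normalize("NFC", text).lower()
    return {ch for ch in normalised if ch.isalpha() and ch not in KITUBA_LETTERS}
```

Applied to the Sample lines, this flags `q`, `è` and `é`, so the full mapping file is worth scanning before settling on a symbol inventory.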
Source
This dataset was created from self-recorded audio of a male native speaker, who then transcribed the recordings. The aim was to produce data suitable for developing text-to-speech models for Kituba (mkw). The speaker was guided through the process with open questions provided by the research coordinator.
Domain
The questions that prompted the speech recorded by the native speaker of Kituba (mkw) covered a variety of domains relevant to the cultural practices of the Kituba (mkw) community and mostly pertained to the following genres: procedural, opinion and philosophical.
Size
Total size: 553.28 MB
Structure
The dataset contains 8,302 audio clips and a mapping file of audio and text with 8,173 lines. Each line begins with the name of an audio file, followed by a tab and then the corresponding text excerpt.
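A minimal sketch for loading the mapping file and cross-checking it against the audio, using only the standard library. The mapping file's name and the directory layout are assumptions; the code relies only on the structure described above: one filename, a tab, then the text. Each line is split on its first tab, and two lists are returned: clips with no text line, and text lines whose clip is missing.

```python
from pathlib import Path


def read_mapping(tsv_path: Path) -> dict[str, str]:
    """Parse 'filename<TAB>text' lines; later duplicates overwrite earlier ones."""
    mapping = {}
    with open(tsv_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            name, sep, text = line.partition("\t")  # split on the first tab only
            if not sep:
                raise ValueError(f"line {line_no}: no tab separator")
            mapping[name.strip()] = text.strip()
    return mapping


def cross_check(mapping: dict[str, str], wav_dir: Path):
    """Return (clips without a text line, text lines without a clip), sorted."""
    on_disk = {p.name for p in wav_dir.glob("*.wav")}
    listed = set(mapping)
    return sorted(on_disk - listed), sorted(listed - on_disk)
```

With 8,302 clips and 8,173 lines, the first list would contain 129 names if every line names a distinct clip that is present on disk.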
Sample
kituba_TTS_03_T2_T3.wav quel èè plat
kituba_TTS_03_T3_T4.wav bét lénda lamba naa
kituba_TTS_03_T4_T5.wav bangasi
kituba_TTS_03_T5_T6.wav quel èè plat bét lénda lamba
kituba_TTS_03_T6_T7.wav na bangasi
kituba_TTS_03_T7_T8.wav bien na bangasi
kituba_TTS_03_T8_T9.wav bét lénda lamba ba madia mingi
kituba_TTS_03_T10_T11.wav na ba, na bangasi
kituba_TTS_03_T11_T12.wav bét lénda lamba ba madia mingi
kituba_TTS_03_T12_T13.wav ba madia nèt
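The sample filenames follow a pattern such as `kituba_TTS_03_T2_T3.wav`. The card does not document what the numbers mean. The sketch below assumes, purely hypothetically, that they are a recording number followed by start and end segment markers, and that the markers give reading order. Under that assumption, a plain string sort places `T10_T11` before `T2_T3`, so a natural-sort key keeps clips in sequence.

```python
import re

# Hypothetical reading of the filename: recording number, start marker, end marker.
NAME_PATTERN = re.compile(r"kituba_TTS_(\d+)_T(\d+)_T(\d+)\.wav")


def natural_key(filename: str) -> list:
    """Split a name into text and integer chunks so that 'T10' sorts after 'T9'."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", filename)]


def parse_name(filename: str):
    """Return (recording, start, end) as ints, or None if the name does not match."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups())
```

For example, `sorted(names, key=natural_key)` orders `T2_T3`, `T8_T9`, `T10_T11` as they appear in the sample above.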
