Mbosi-TTS-Dataset

License:

NOODL-1.0

Steward:

Institute of African Digital Humanities

Task: TTS

Release Date: 12/11/2025

Format: WAV, TSV

Size: 644.39 MB


Description

The dataset consists of paired audio and text data on Mbosi (mdw), a language spoken in Congo. The audio corpus consists of 2,575 clips read by one speaker totaling 275 min 48.35 sec. The dataset also contains a mapping file of audio and text with 2,597 lines. Each line begins with the name of an audio file, followed by a tab and then the corresponding text excerpt. This dataset is suitable for TTS tasks.

Specifics

Licensing

Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)

https://licensingafricandatasets.com/nwulite-obodo-license

Considerations

Restrictions/Special Constraints

Although the transcription of the audio recording was created using a standardised writing system, it may not reflect the standard used by the wider Mbosi community. This is often the case in communities using a low-resource language, where different writing standards may be in use. Therefore, this should be taken into account when using this dataset for TTS and ASR tasks. Ideally, the resulting TTS or ASR models should explicitly state which writing system was used.

Forbidden Usage

Generative AI, reproduction, duplication, modification, augmentation, copying, distribution, transmission, display, sale, transfer, publication or creation of derivative works without the explicit permission of the legal owner of the dataset.

Processes

Intended Use

The dataset is suitable for speech-related tasks:

- Automatic speech recognition (ASR): Audio–text alignment allows the training or evaluation of speech recognition models for Mbosi, which is a valuable tool for language technology. Please be aware that the read sentences follow one writing standard, which may co-exist with other writing standards within the Mbosi community.

- Text-to-speech (TTS): The dataset contains clean sentence–audio pairs read by the same speaker, totalling about 4.6 hours (275 min 48.35 sec). This makes the dataset suitable for training or evaluating text-to-speech models. Again, note that the writing standard may be just one of several in use within the community.

Metadata

Language

Mbosi is a Bantu language spoken by about 110,000 people in the centre of the Republic of the Congo, particularly in the districts of Boundji, Owando, Oyo, Bokouélé, Tongo, Tchikapika and Mossaka in the Cuvette Department, and in the districts of Abala, Allembé, Ogogni and Ollombo in the Plateaux Department. Mbosi is also known as Mbochi, Mboshe, Mboshi or Embosi. Speakers of Mbosi call their language Embɔ́si.

Variants

Mbosi dialects are spoken north and south of the Alima River. To the west are Mbondzi, Ngaé, Ngilíma, and Ɔbaa; to the south, Olee, Ondinga, and Tsambítsɔ; in the central north, Ɛbɔi; and to the east, Bokwele and Boɲala.

Alphabet

A Latin-based alphabet for Mbosi was developed by the Department of Linguistics and African Languages at Marien Ngouabi University in Brazzaville. However, the alphabet used to transcribe the audio files in this dataset differs slightly from the one designed at Marien Ngouabi University. The following is the set of characters found in the text of this dataset: a, à, e, è, é, i, o, u, ù, b, d, f, g, h, k, l, m, n, p, r, s, t, v, w, y, z, mb, mp, nd, ndz, ng, nv, nt, ts, dz, mw, pw, bw, kw
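A TTS front end often needs to split text into the units listed above. The following is a minimal sketch of greedy longest-match segmentation over that character set. Two assumptions are not stated in this card: that the multi-letter entries (for example `ndz`) should be treated as single units, and that the longest match is always the intended one. The names `GRAPHEMES` and `tokenize` are illustrative only.

```python
import unicodedata

# Character set copied from the Alphabet section, normalised to NFC so that
# accented vowels compare equal whether they are stored composed or decomposed.
GRAPHEMES = {
    unicodedata.normalize("NFC", g)
    for g in [
        "a", "à", "e", "è", "é", "i", "o", "u", "ù",
        "b", "d", "f", "g", "h", "k", "l", "m", "n", "p",
        "r", "s", "t", "v", "w", "y", "z",
        "mb", "mp", "nd", "ndz", "ng", "nv", "nt", "ts",
        "dz", "mw", "pw", "bw", "kw",
    ]
}
MAX_LEN = max(len(g) for g in GRAPHEMES)


def tokenize(text):
    """Split text into listed units, preferring the longest match.

    Returns (tokens, unknown), where unknown is the set of characters
    that are not covered by the listed character set.
    """
    text = unicodedata.normalize("NFC", text).lower()
    tokens, unknown = [], set()
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # whitespace separates words and is not a token
            continue
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in GRAPHEMES:
                tokens.append(chunk)
                i += len(chunk)
                break
        else:
            # No listed unit starts here: keep the character and report it.
            unknown.add(text[i])
            tokens.append(text[i])
            i += 1
    return tokens, unknown


tokens, unknown = tokenize("edza nga")
```

A non-empty `unknown` set on real transcripts would point to characters outside the listed set, which may be worth checking before training.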

Source

This dataset was created from audio self-recorded by a male native speaker, who then transcribed the recordings. The aim was to produce a dataset suitable for developing text-to-speech models for the Mbosi language. The speaker was guided through the process using open questions provided by the research coordinator.

Domain

The questions that prompted the recorded speech covered a variety of domains relevant to the cultural practices of the Mbosi community. The responses fall mostly into the following genres: procedural, opinion and philosophical.

Size

Total size is 644.39 MB.

Structure

The audio corpus consists of 6,933 clips read by one speaker totaling 275 min 48.35 sec. The dataset also contains a mapping file of audio and text with 4,422 lines. Each line begins with the name of an audio file, followed by a tab and then the corresponding text excerpt.
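The mapping format described above (file name, tab, text) can be read with a few lines of Python. This is a minimal sketch, not an official loader: the function name `parse_mapping` is illustrative, and the two example rows are taken from the Sample section with a tab assumed between file name and text.

```python
def parse_mapping(lines):
    """Turn 'filename<TAB>text' lines into a list of (filename, text) pairs."""
    pairs = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first tab only, so any later tab stays in the text.
        filename, sep, text = line.partition("\t")
        if not sep:
            raise ValueError(f"no tab separator in line: {line!r}")
        pairs.append((filename.strip(), text.strip()))
    return pairs


# Rows from the Sample section, joined with a tab as the format describes.
sample_lines = [
    "Mbosi_TTS_04_T1_T2.wav\tikanga lidze bissi lesoa\n",
    "Mbosi_TTS_04_T2_T3.wav\tbana ba ngoh ah ngah\n",
]
pairs = parse_mapping(sample_lines)
```

For the real mapping file, the same function can be given an open file object (opened with `encoding="utf-8"`). Comparing `len(pairs)` with the number of WAV files in the audio folder is a quick way to check the line and clip counts on a local copy.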

Sample

  1. Mbosi_TTS_04_T0_T1.wav Moh mbvoussa a pawa hè bissi léh nga litia ndengue edza nga bissi ledzema la bana ba ngo

  2. Mbosi_TTS_04_T1_T2.wav ikanga lidze bissi lesoa

  3. Mbosi_TTS_04_T2_T3.wav bana ba ngoh ah ngah

  4. Mbosi_TTS_04_T3_T4.wav badi ba kana nga samwè motema

  5. Mbosi_TTS_04_T4_T5.wav bissi ledi lébélé ya mélé midie nga bissi ah bana bah ngo lokia

  6. Mbosi_TTS_04_T5_T6.wav edi pawa wa nga ibali omélé mi

  7. Mbosi_TTS_04_T6_T7.wav yè mpè nga bossa yendi nga litia

  8. Mbosi_TTS_04_T7_T8.wav ah wa nga liwelo voula ngongon

  9. Mbosi_TTS_04_T8_T8.wav silence

  10. Mbosi_TTS_04_T8_T9.wav wa nga liwelo voula ngongon
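Row 9 of the sample has the text `silence`. Assuming this marks a clip without speech rather than a Mbosi word (the card does not say), such rows would likely need to be excluded before TTS or ASR training. A minimal sketch, with the illustrative helper name `drop_silence`:

```python
def drop_silence(pairs, marker="silence"):
    """Return only the (filename, text) pairs whose text is not the marker."""
    return [
        (name, text)
        for name, text in pairs
        if text.strip().lower() != marker
    ]


# Rows 8 to 10 from the sample above, as (filename, text) pairs.
rows = [
    ("Mbosi_TTS_04_T7_T8.wav", "ah wa nga liwelo voula ngongon"),
    ("Mbosi_TTS_04_T8_T8.wav", "silence"),
    ("Mbosi_TTS_04_T8_T9.wav", "wa nga liwelo voula ngongon"),
]
speech_rows = drop_silence(rows)
```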