Common Voice Spontaneous Speech 2.0 - Basaa
License:
CC0-1.0
Steward:
Common Voice
Task: ASR
Release Date: 12/5/2025
Format: MP3
Size: 109.37 MB
Description
A collection of spontaneous spoken phrases in Basaa.
Specifics
Considerations
Forbidden Usage
It is forbidden to attempt to determine the identity of speakers in the Common Voice datasets. It is forbidden to re-host or re-share this dataset.
Processes
Intended Use
This dataset is intended to be used for training and evaluating automatic speech recognition (ASR) models. It may also be used for applications relating to computer-aided language learning (CALL) and language or heritage revitalisation.
Metadata
Basaa — Basaa (bas)
This datasheet is for version 2.0 of the Mozilla Common Voice Spontaneous Speech dataset
for Basaa (bas). The dataset contains 773 clips representing 6 hours of recorded
speech (6 hours validated) from 11 speakers.
Language
Basaa is a Narrow Bantu language spoken across a geographical area spanning three administrative regions in Cameroon: the Centre, Littoral and South regions. It is estimated that there are currently around 600,000–700,000 speakers. This figure includes different varieties, as well as diasporic populations who identify as Basaa speakers.
The vitality of the Basaa language is stable (Ethnologue online). However, intergenerational transmission of Basaa is increasingly threatened among parents aged 50 and under, particularly in urban areas.
Although Basaa is taught in schools, this does not significantly impact the vitality of the language, mainly due to the current pedagogical approach, which relies on rule-based and descriptivist teaching methods.
The glossonym 'Basaa' is a generic term that encompasses a range of varieties, the speakers of which may identify with the 'Basaa' label to varying degrees, depending on a complex set of geographical, social, political, situational and pragmatic factors. Whether a language variant is considered Basaa depends greatly on the perspective of the person 'telling the story'. Some of the most commonly acknowledged varieties of Basaa include:
Mbene
Bikok
Babimbi
Basaa ba Omeng
Basaa ba Yabasi
Basaa ba Duala
Ndog-Bikim
Other varieties, such as Ndonga, Mbaa (also known as Mbay-Bati) and Hijuk, may also be classified as Basaa. However, as previously mentioned, not everyone agrees on this classification.
Data splits for modelling
| Split | Count |
|---|---|
| Train | 220 |
| Test | 291 |
| Dev | 261 |
Transcriptions
Prompts: 74
Duration: 5:22:34 [h:m:s]
Avg. Transcription Len: 283
Avg. Duration: 25.04 [s]
Valid Duration: 18232.452 [s]
Total hours: 5.38 [h]
Valid hours: 5.06 [h]
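The hour figures above can be cross-checked against the duration values. The following is a minimal Python sketch and not part of the dataset tooling; the helper names `hms_to_hours` and `seconds_to_hours` are illustrative assumptions.

```python
def hms_to_hours(hms: str) -> float:
    """Convert an 'h:m:s' string to fractional hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to fractional hours."""
    return seconds / 3600


# Values copied from the statistics listed in this section.
total_hours = round(hms_to_hours("5:22:34"), 2)
valid_hours = round(seconds_to_hours(18232.452), 2)

print(total_hours, valid_hours)  # 5.38 5.06
```

Both results agree with the reported "Total hours" and "Valid hours" values.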
Writing system
The prompts and responses in this dataset are written in the Latin alphabet, following the orthography of Protestant missionaries but with modifications introduced by the dataset's author. One such modification is the use of an apostrophe before the symbols 'y' and 'b' to signal nasal prefixes. For example: 'me n'yo': 'stealing palm wine from the palm trunk' (as opposed to 'me nyo': 'drinking'), and 'm'bôñ': 'cassava' (as opposed to 'mbôñ': 'poison'). As a general rule, the apostrophe signals 'accidentals' of a morphological or prosodic nature.
Samples
Questions
There follows a randomly selected sample of questions used in the corpus.
Kii ba nsébél ni hop u basaa le "litat matat ma ntôdôô" ?
Mambee maéba u nla ti babiina i kel libii jap ?
Inyu kii ba nkal le "mahôla ma mbôk bé bak" ?
Ba nla nugna mbôngôô ni mimbee mintén mi bijek ?
Kii ba nsébel njibngañ ?
Responses
There follows a randomly selected sample of transcribed responses from the corpus.
Litat matat ma ntôdôô ni hop u Basaa li yé jam, ntuk u matjañ. U yé le ba nkal we le u tat yom, ndi u tadak u keblak. Hala nyen ba nsébél le matat ma ntôdôô. Mut nu a ntat bé banga ntat nta. Mut nu a yé le, iyom ba nkal nye le a boñ a m'boñ bé. A m'boñ, a hogok a tjagak kôô i lép kiki ba nkal. Matat ma ntôdôô ma.
Imaéba me nti babiina i kel libii jap ma yé le, mulôm a gwés ñwaa wéé, ñwaa wéé a n'nôgôl nye.
Mahôla ma m'bôk bé bak, hala wee mut a nlama bé gwés man-isañ iloo nyemede. Iba le u nhôla mut, hôla mahôla ma nlama ba ni ngim hihéga. Mut a ngwés bé mut iloo nyemede.
Ibijek ba nla nugna ni mbôngôô, gwo bini le gwôô, masôô, manyogi, manga, bobola, makabô, m'bôñ.
Njibngañ, di nla eeh kal le i yé kiki bo ny bôm-be. Yom i i nlona mut ndutu i nyuu. Yom i i m'boñ mut le a bana mam, ma ma nlona nye ndutu.; ma ma nti bé nye nsañ ; ma ma nla yak kuuha nye ndutu ngandak. Iyom i yon ba nsébél le njibngañ. Mut nu a gwéé yom kiki bo mbom-be. Bom-be lôñni njibngañ bi nhek pôôna. Njibngañ i nlona mut ndutu ; i m'boñ nye le mam malam ma lôl bañ nye, ndigi mam mabe.
Fields
Each row of a tsv file represents a single audio clip, and contains the following information:
client_id - hashed UUID of a given user
audio_id - numeric id for audio file
audio_file - audio file name
duration_ms - duration of audio in milliseconds
prompt_id - numeric id for prompt
prompt - question for user
transcription - transcription of the audio response
votes - number of people who approved a given transcript
age - age of the speaker (see Footnotes)
gender - gender of the speaker (see Footnotes)
language - language name
split - for data modelling, which subset of the data this clip belongs to
char_per_sec - how many characters of transcription per second of audio
quality_tags - some automated assessment of the transcription-audio pair, separated by |
  transcription-length - under 3 characters per second
  speech-rate - over 30 characters per second
  short-audio - audio length under 2 seconds
  long-audio - audio length over 30 seconds
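As an illustration of how these fields might be consumed, here is a minimal Python sketch using only the standard library. Several things in it are assumptions rather than facts about the release: the sample rows are invented placeholders, an empty `quality_tags` cell is assumed to mean "no tags", quoting is disabled on the assumption that prompts and transcriptions may contain literal double quotes, and `derive_quality_tags` simply restates the thresholds listed above without claiming to reproduce the actual tagging pipeline.

```python
import csv
import io

COLUMNS = [
    "client_id", "audio_id", "audio_file", "duration_ms", "prompt_id",
    "prompt", "transcription", "votes", "age", "gender", "language",
    "split", "char_per_sec", "quality_tags",
]

# Hypothetical rows for demonstration only; they are not taken from the dataset.
SAMPLE_ROWS = [
    ["abc123", "1", "clip-1.mp3", "25040", "7", "(prompt)", "(transcription)",
     "2", "", "", "Basaa", "train", "11.3", ""],
    ["def456", "2", "clip-2.mp3", "41000", "9", "(prompt)", "(transcription)",
     "1", "", "", "Basaa", "test", "2.1", "transcription-length|long-audio"],
]
SAMPLE_TSV = "\n".join("\t".join(r) for r in [COLUMNS] + SAMPLE_ROWS) + "\n"


def parse_rows(tsv_file):
    """Read tab-separated rows and convert the numeric and tag fields."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        row["duration_ms"] = int(row["duration_ms"])
        row["char_per_sec"] = float(row["char_per_sec"])
        tags = row["quality_tags"]
        # Tags are separated by "|"; an empty cell is assumed to mean no tags.
        row["quality_tags"] = tags.split("|") if tags else []
        rows.append(row)
    return rows


def derive_quality_tags(duration_ms, char_per_sec):
    """Restate the documented thresholds (assumed strict inequalities)."""
    tags = []
    if char_per_sec < 3:
        tags.append("transcription-length")
    if char_per_sec > 30:
        tags.append("speech-rate")
    if duration_ms < 2000:
        tags.append("short-audio")
    if duration_ms > 30000:
        tags.append("long-audio")
    return tags


rows = parse_rows(io.StringIO(SAMPLE_TSV))
# Keep only clips that carry no quality tags.
untagged = [r for r in rows if not r["quality_tags"]]
print([r["audio_file"] for r in untagged])  # ['clip-1.mp3']
```

To read a real file, pass an open file handle (opened with `encoding="utf-8"` and `newline=""`) to `parse_rows` in place of the `io.StringIO` object.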
Get involved!
Community links
Contribute
Acknowledgements
The recording of spontaneous speech for this dataset was made possible by volunteer contributions from individuals who are not cited here for privacy reasons, but whose invaluable contribution is acknowledged.
Datasheet authors
Emmanuel Ngue Um <ngueum@gmail.com>
Funding
The compilation of this dataset was made possible thanks to a grant awarded by the Mozilla Foundation.
Licence
This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree to not determine the identity of speakers in the dataset.
Footnotes
For a full list of age, gender, and accent options, see the demographics spec. These will only be reported if the speaker opted in to provide that information.
