Ewondo-French Parallel Corpus
License: NOODL-1.0
Steward: Institute of African Digital Humanities
Task: MT
Release Date: 11/8/2025
Format: TSV
Size: 137.84 KB
Description
This dataset is a parallel corpus of Ewondo-to-French texts. The Ewondo text was obtained by transcribing raw audio files, and French translations were added to enrich the original corpus. The Ewondo and French texts were aligned in the process of creating this dataset.
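Since the corpus is distributed as TSV, a minimal sketch of reading the aligned pairs in Python might look like the following. The column layout is an assumption, not something this card states: the sketch supposes two tab-separated columns, Ewondo first and French second, with no header row. The function name `read_pairs` and the placeholder translation are hypothetical.

```python
import csv
import io


def read_pairs(handle):
    """Yield (ewondo, french) tuples from a TSV stream.

    Assumption: column 0 is Ewondo and column 1 is French, with no header row.
    Rows with fewer than two non-empty cells are skipped.
    """
    # QUOTE_NONE keeps quotation marks in the text as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            yield row[0].strip(), row[1].strip()


# In-memory stand-in for the real file; the French cell is a placeholder,
# not an actual translation from the corpus.
sample = io.StringIO("Mə mandziki yen ayen dili.\t<French translation>\n")
pairs = list(read_pairs(sample))
print(pairs)
```

With the real file, the same function could be called on a handle opened with `encoding="utf-8"` and `newline=""`. If the file turns out to have a header row or a different column order, the indices above would need adjusting.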
Specifics
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Considerations
Forbidden Usage
Publication, Re-speaking, Generative AI, Augmentation
Processes
Intended Use
Machine Translation, Language Teaching, Research
Metadata
Language
Ewondo is a Narrow Bantu language indigenous to a population located mainly in the Centre Region of Cameroon, with pockets of settlement in the South and East Regions. It was one of the earlier developed languages within the so-called Beti-Fang group, comparable in this respect to Bulu, another language of the same group. Ewondo serves as a vehicular language for populations in the South and East Regions of Cameroon, and has also developed into a creole known as Mongo Ewondo.
Variants
'Ewondo' is a glossonym that encompasses various dialects: Bene, Yebekolo and Mvele, among others. It is very difficult to draw up an accurate list of what constitutes a variant of Ewondo, as some speakers of these variants would claim to speak their own language and would not want to be subsumed under the name Ewondo. This situation is not unique to Ewondo, but demonstrates the power relationships underlying language policy in the political ecosystem of nation-states.
Writing System
The writing system used in this dataset is based on the General Alphabet of Cameroonian Languages, though it lacks tone marking.
1. Basic Latin letters
Consonants: b, d, f, g, h, j, k, l, m, n, ŋ, p, s, t, v, w, y, z
Vowels: a, e, i, o, u
2. Extended letters & tones
ǹ, ń, á, à, â, ǎ, ə, ǐ, ɔ, ɔ́, ɔ̀, ḿ, ɓ, ɗ
3. Common multigraphs
mb, nd, ng, ny, dz, ts, kp, mv, pf
4. Full alphabet
a, á, à, â, ǎ, b, ɓ, d, ɗ, e, é, è, ê, ə, ə́, ə̀, f, g, h, i, í, ì, î, ǐ, j,
k, l, m, ḿ, n, ń, ǹ, ŋ, o, ó, ò, ô, ɔ, ɔ́, ɔ̀, p, s, t, u, ú, ù, û, v, w, y, z
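The alphabet above can be used to sanity-check text in the corpus. The sketch below is one possible approach, not a tool shipped with the dataset: it decomposes each string to Unicode NFD so that tone marks become separate combining characters, ignores those marks, and reports any remaining letter that is not a base letter of the "Full alphabet" list. The names `BASE_LETTERS` and `unknown_letters` are hypothetical.

```python
import unicodedata

# Base letters from the "Full alphabet" list above, with tone marks removed.
BASE_LETTERS = set("abɓdɗeəfghijklmnŋoɔpstuvwyz")


def unknown_letters(text: str) -> set[str]:
    """Return the letters in `text` whose base form is not in BASE_LETTERS."""
    # NFD splits a toned vowel such as "â" into "a" plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Combining marks, digits, spaces, and punctuation are not alphabetic,
    # so str.isalpha() filters them out.
    return {ch for ch in decomposed if ch.isalpha() and ch not in BASE_LETTERS}


print(unknown_letters("Mə mandziki yen ayen dili."))  # set()
print(unknown_letters("yəgɛ"))  # {'ɛ'}
```

The second call uses a word from the Sample section: it contains `ɛ`, which the list above does not include, so a check of this kind would flag it for review rather than treat it as an error outright.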
Source
Where is the Data from?
This dataset originates from audio recordings documenting personal histories of German colonization. These recordings were made in the early 1980s as part of a research project led by Prince (Professor) Koum A Ndoumbe III.
Who wrote/created the text and when
Hubert Nkoumou created the transcriptions and French translations in this dataset. The transcriptions were made in 2017 with funding from the AfricAvenir Foundation. For the purpose of creating this dataset, Hubert Nkoumou aligned and quality-checked the transcribed text and its translations.
Domain
This dataset is a transcription of prompted speech in the form of a directed interview. The aim of the interview was to elicit personal stories about the German colonial experience in Cameroon. Similar interviews were conducted in many other languages and locations across Cameroon.
Sample
Nala te watyan nâ, a dzal hana, onə wɔtə́tám ngǎ ya wayəm e byom bíngálod afolo ndzaman ?
Eminga nyɔ nyâkə mɔ́ yəgɛ, bə́ngákə a Kpanolɛt, akə man mɔn.
Mə mandziki yen ayen dili.
Ayi mod, okə nye hə mod nabɔ nye bisye hǎ ? Minə nye miayibɔ̌ manyaŋ bân mǎnyaŋ.
Mballa Edzɔmbi, bə Nanga, bəzá, bəzá, bə Nanga Hubɛd.
Eyɔŋ te manə Fulansi, manə Ndzaman anga kad nâ bɔbəjaŋ, ma yəmə́ kə nâ mod a nə va, məngazu kɔa áfan, ndɔ məngá manə bɔ ésye,
