Mada-French Parallel Corpus 1.0
License:
NOODL-1.0
Steward:
Institute of African Digital Humanities
Task: TTS
Release Date: 3/3/2026
Format: TSV
Size: 122.37 KB
Description
This dataset comprises a parallel corpus of Mada–French literary translations totalling 2,154 lines. It is designed to support the training, evaluation and benchmarking of machine translation models for Mada, a language spoken in Cameroon. The corpus provides aligned sentence- and paragraph-level translations that capture the stylistic, lexical and syntactic features of literary Mada discourse and show how these features are rendered in the local variety of French.
Specifics
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Considerations
Restrictions/Special Constraints
You agree:
- To download this dataset for research, scientific or educational (non-profit) use only
- That you will not re-host or re-share this dataset
Forbidden Usage
You agree not to use the data for: Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.
Processes
Intended Use
This dataset is intended for the training or testing of machine learning models. Its purpose is to support the learning and revitalisation of the Mada (mxu) language.
Metadata
Language
Maɗa (mxu) should not be confused with Mada (mda). The former is an Afro-Asiatic language spoken in Cameroon, while the latter is an Atlantic-Congo language spoken in Nigeria (see Ethnologue and Glottolog online). This dataset focuses on Mada (mxu), a Chadic language belonging to the Afro-Asiatic family, which is spoken in Cameroon's Far North Region, specifically in the Mayo-Sava Division and Tokombere Subdivision. It is believed that the Mada-speaking group formerly belonged to the Wandala (or Mandara) kingdom alongside a number of other groups, including the Wuzlam, Mayan, Melokwo, Zelgwa-Gemzek, Zulgo-Gemzek and Gudawa.
Variants
We were unable to find specific information on the sociolinguistic and dialectal situation of Maɗa while preparing this dataset for publication. According to Glottolog Online, Maɗa belongs to the Madaic group, which also includes the Muyang and Wuzlam languages.
Alphabet
| Status | Letters / forms |
|---|---|
| Core alphabet | a, b, c, d, e, f, g, h, j, k, l, m, n, o, p, r, s, t, u, v, w, y, z |
| Diacritic form (grammatical particle) | à |
| Diacritic form (single lexical item) | ñ |
| Diacritic forms (status unclear) | â, é |
| Productive consonant clusters | kw, kl, ng, nd, nz, mb, gw, ft, dw |
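The table can double as a quick orthographic check on the Mada column. The Python sketch below is an illustration, not a tool distributed with the dataset: the function name `unexpected_letters` is hypothetical, matching is assumed to be case-insensitive on NFC-normalised text, and the "status unclear" forms â and é are assumed to be permitted. Consonant clusters are not checked, since they are built from core letters.

```python
import unicodedata

# Letter inventory copied from the Alphabet table. Treating the two
# "status unclear" forms (â, é) as permitted is an assumption.
CORE_LETTERS = set("abcdefghjklmnoprstuvwyz")
DIACRITIC_LETTERS = {"à", "ñ", "â", "é"}
ALLOWED_LETTERS = CORE_LETTERS | DIACRITIC_LETTERS


def unexpected_letters(text: str) -> set[str]:
    """Return the letters in `text` that are not in the documented Mada inventory.

    Matching is case-insensitive and uses NFC normalisation, so a decomposed
    "a" + combining grave accent is treated the same as the precomposed "à".
    Digits, spaces and punctuation are ignored.
    """
    normalized = unicodedata.normalize("NFC", text.lower())
    return {ch for ch in normalized if ch.isalpha() and ch not in ALLOWED_LETTERS}


print(unexpected_letters("Yân, dwa Ane-fan da."))  # set()
print(unexpected_letters("Maɗa"))  # {'ɗ'}
```

Run over the whole Mada column, a check like this would flag ɗ wherever it occurs (as in the spelling Maɗa used in the Language section), since the table does not list it; such hits may indicate typos, inconsistent transcription, or letters the table does not yet cover.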
Source
The texts in this dataset date from around the 1960s and 1970s. They are transcriptions of oral literary performances elicited by missionaries. It is unclear whether these performances were recorded on tape or transcribed on-site by the collectors. The texts were later edited and revised by Hubert Nkoumou in the 2010s, while he was working as a consultant at the local Mada language academy.
Domain
The texts are narratives that deal with a variety of topics, such as procreation, marriage, household life and social life, as well as the supernatural.
Size
122.37 KB
Structure
This parallel corpus comprises 18 texts totalling 2,154 lines. Each text consists of translation units in both the source and target languages. Some texts are missing a few lines in either the source or the target; these gaps have been kept to preserve fidelity to the original texts from which the parallel corpus was created. The French translations reflect informal usage and frequently contain lexical or grammatical inconsistencies. Users may wish to edit the corpus before applying it to specific tasks, or contact the Point of Contact or Legal Owner for updates.
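Because some units lack either the Mada or the French side, a training or evaluation pipeline will usually need to separate complete pairs from partial ones. The following is a minimal Python sketch, assuming one translation unit per TSV row with Mada in the first column and French in the second; the actual column layout of the release is not documented here, so `src_col` and `tgt_col` are exposed as parameters, and the function name `split_pairs` is hypothetical. `csv.QUOTE_NONE` is used so that a double quote at the start of a field is read as a literal character rather than as CSV quoting.

```python
import csv
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def split_pairs(
    lines: Iterable[str], src_col: int = 0, tgt_col: int = 1
) -> Tuple[List[Pair], List[Pair]]:
    """Separate complete (Mada, French) pairs from rows missing one side.

    ASSUMPTION: one translation unit per tab-separated row, with Mada in
    column `src_col` and French in column `tgt_col`. Rows where both sides
    are empty (e.g. blank lines) are dropped.
    """
    complete: List[Pair] = []
    partial: List[Pair] = []
    # QUOTE_NONE: a leading '"' in a field is read literally, not as CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    needed = max(src_col, tgt_col) + 1
    for row in reader:
        # Pad short rows so a missing trailing column reads as "".
        row = row + [""] * (needed - len(row))
        src, tgt = row[src_col].strip(), row[tgt_col].strip()
        if src and tgt:
            complete.append((src, tgt))
        elif src or tgt:
            partial.append((src, tgt))
    return complete, partial


rows = [
    "KWATAR GEDEGA GA VLOM\tLE CONTE DE LA CALEBASSE\n",
    "Tahnal-kala dafa, azam da.\t\n",  # simulated row with the French side missing
    "\n",  # blank line, skipped
]
complete, partial = split_pairs(rows)
print(len(complete), len(partial))  # 1 1
```

To read the downloaded file, pass an open handle instead of a list, e.g. `open(path, encoding="utf-8", newline="")`, where `path` is wherever the TSV is stored. Keeping partial rows in a separate list, rather than discarding them, leaves open the option of repairing them later as the Structure notes suggest.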
Sample
| # | Mada (mxu) | French |
|---|---|---|
| 1 | KWATAR GEDEGA GA VLOM | LE CONTE DE LA CALEBASSE |
| 2 | Walna eke ana ala : Abak ana no endeana nata-kala –va kla ma, needere elle, nangaa ka yam, nehyeea mahnzow napala-ra brom, nagaa a bra va , nahlada-ra ené. | Il y avait une femme qui disait: « Le jour où moi vraiment je trouverai un enfant, je ferai la cuisine avec j'irai à l'eau avec je moudrai la farine avec, j'irai au bois avec, j'irai dans le grenier avec, je dormirai avec aussi ». |
| 3 | adaba na wal nehe ha ete ava dedena efe kla ftek tam. | Car cette femme, depuis qu'elle est venue chez son père n'a jamais eu d'enfant. |
| 4 | Yân, dwa Ane-fan da. | Elle n'a pas le lait. |
| 5 | Kla nehe, avlal daf, azam da. | Cet enfant elle lui donne de la boule, il ne mange pas. |
| 6 | Akagwad ma, Anguv ta manga bra va. | Au chant du coq, elle alla dans le grenier. |
| 7 | Tahnal-kala dafa, azam da. | On lui sert de la boule, il ne mange pas. |
| 8 | Tahala : « uro, kazzama awaka, kunumuro elgwa, Kahamala akla nehe : mawa, metemen edena, kazzama meseka kegyema-la yama, kazzama-lana mengesa. | On lui dit: « va, vous prenez une chèvre, vous allez en brousse, vous dites à l'enfant: on va faire un sacrifice: vous prenez une marmite, vous prenez de l'eau avec vous prenez aussi un couteau avec. |
| 9 | Afalaña kodomuro elgwa zerre Ma, kahamala : « ngta-fan Afa dam nehe menges enne Ejeke-femnere. | Quand vous serez allés loin en brousse! Vous lui direz: « attends auprès de ces choses, on a oublié de prendre un couteau. » |
| 10 | Afalaña Adaa gama ma, menzea-kabarra. | Quand il y sera, qu'il y reste. |