Mada-French Parallel Corpus 1.0

License:

NOODL-1.0

Steward:

Institute of African Digital Humanities

Task: TTS

Release Date: 3/3/2026

Format: TSV

Size: 122.37 KB


Description

This dataset is a parallel corpus of Mada–French literary text translations totalling 2,154 lines. It is designed to support the benchmarking, training and evaluation of machine translation models for Mada, a language spoken in Cameroon. The corpus provides aligned sentence- and paragraph-level translations that capture the stylistic, lexical and syntactic features of literary Mada discourse and how these are rendered in the local variety of French.

Specifics

Licensing

Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)

https://licensingafricandatasets.com/nwulite-obodo-license

Considerations

Restrictions/Special Constraints

You agree:
- To download this dataset for research, scientific or educational (non-profit) use only
- That you will not re-host or re-share this dataset

Forbidden Usage

You agree not to use the data for: Generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication or creation of derivative works without the explicit permission of the legal owner of the dataset.

Processes

Intended Use

This dataset is intended for the training or testing of machine learning models. Its purpose is to support the learning and revitalisation of the Mada (mxu) language.

Metadata

Language

Maɗa (mxu) should not be confused with Mada (mda). The former is an Afro-Asiatic language spoken in Cameroon, while the latter is an Atlantic-Congo language spoken in Nigeria (see Ethnologue and Glottolog online). This dataset focuses on Mada (mxu), a Chadic language belonging to the Afro-Asiatic family, which is spoken in Cameroon's Far North Region, specifically in the Mayo-Sava Division and Tokombere Subdivision. It is believed that the Mada-speaking group formerly belonged to the Wandala (or Mandara) kingdom alongside a number of other groups, including the Wuzlam, Mayan, Melokwo, Zelgwa-Gemzek, Zulgo-Gemzek and Gudawa.

Variants

We were unable to find specific information on the sociolinguistic and dialectal situation of Maɗa while preparing this dataset for publication. According to Glottolog Online, Maɗa belongs to the Madaic group, which also includes the Muyang and Wuzlam languages.

Alphabet

| Status | Letters / forms |
| --- | --- |
| Core alphabet | a, b, c, d, e, f, g, h, j, k, l, m, n, o, p, r, s, t, u, v, w, y, z |
| Diacritic form (grammatical particle) | à |
| Diacritic form (single lexical item) | ñ |
| Diacritic forms (status unclear) | â, é |
| Productive consonant clusters | kw, kl, ng, nd, nz, mb, gw, ft, dw |
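
The letter inventory above can be used as a quick sanity check on Mada-side text, for example to spot characters that fall outside the listed forms. The following is a minimal sketch, not part of the dataset tooling: the function name `unexpected_letters` is hypothetical, and the inventory is copied from the table as listed (core letters plus à, ñ, â, é), which may not be exhaustive.

```python
import unicodedata

# Inventory copied from the Alphabet table (assumed to be the full set).
CORE = set("abcdefghjklmnoprstuvwyz")
DIACRITIC = set("àñâé")
INVENTORY = CORE | DIACRITIC


def unexpected_letters(text):
    """Return the alphabetic characters in `text` not found in the inventory."""
    # NFC folds "a" + combining accent into a single precomposed character.
    normalized = unicodedata.normalize("NFC", text).lower()
    return {ch for ch in normalized if ch.isalpha() and ch not in INVENTORY}


# A line from the Sample section uses only listed forms.
print(unexpected_letters("Yân, dwa Ane-fan da."))  # set()
# The language name as spelled in this card uses a letter outside the table.
print(unexpected_letters("Maɗa"))  # {'ɗ'}
```

A non-empty result does not necessarily indicate an error in the text; it only shows that a character is not among the forms listed here.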

Source

The texts in this dataset were created in the 1960s and 1970s. They are transcriptions of orally performed literary genres, prompted by missionaries. It is unclear whether the performances were recorded on tape or transcribed on-site by the collectors. Hubert Nkoumou further edited and revised the texts in the 2010s, when he was working as a consultant at the local Mada language academy.

Domain

The texts are narratives that deal with a variety of topics, such as procreation, marriage, household life and social life, as well as the supernatural.

Size

122.37 KB

Structure

This parallel corpus comprises 18 texts totalling 2,154 lines. Each text consists of translation units in both the source and target languages. A few lines are missing from either the source or the target side of some texts; these gaps are kept so that the corpus stays faithful to the original texts from which it was created. The French translations reflect informal usage and frequently contain lexical or grammatical inconsistencies. Users may wish to edit the corpus before applying it to specific tasks, or contact the Point of Contact or Legal Owner for updates.
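
Because some lines lack a source or a target side, it can help to separate complete pairs from incomplete ones before any experiment. The sketch below shows one way to do this with Python's standard `csv` module. It rests on assumptions: the TSV is taken to have a header row with columns named `Mada (mxu)` and `French` (as in the Sample section), and the function name `split_pairs` is hypothetical. The actual file layout may differ, so the column names are parameters.

```python
import csv
import io


def split_pairs(tsv_file, src_col="Mada (mxu)", tgt_col="French"):
    """Return (complete, incomplete) lists of (source, target) tuples.

    Column names are assumptions; adjust them to the real file header.
    """
    # QUOTE_NONE keeps quotation marks in the literary text as plain characters.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    complete, incomplete = [], []
    for row in reader:
        src = (row.get(src_col) or "").strip()
        tgt = (row.get(tgt_col) or "").strip()
        if src and tgt:
            complete.append((src, tgt))
        else:
            incomplete.append((src, tgt))
    return complete, incomplete


# In-memory demo. The first row is from the Sample section; the second has its
# French side blanked here purely to illustrate a missing target.
demo = io.StringIO(
    "Mada (mxu)\tFrench\n"
    "Yân, dwa Ane-fan da.\tElle n'a pas le lait.\n"
    "Tahnal-kala dafa, azam da.\t\n"
)
complete, incomplete = split_pairs(demo)
print(len(complete), len(incomplete))  # 1 1
```

Keeping the incomplete rows in a separate list, rather than discarding them, leaves the original alignment available for inspection.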

Sample

| # | Mada (mxu) | French |
| --- | --- | --- |
| 1 | KWATAR GEDEGA GA VLOMLE | CONTE DE LA CALEBASSE |
| 2 | Walna eke ana ala : Abak ana no endeana nata-kala –va kla ma, needere elle, nangaa ka yam, nehyeea mahnzow napala-ra brom, nagaa a bra va , nahlada-ra ené. | Il y avait une femme qui disait: « Le jour où moi vraiment je trouverai un enfant, je ferai la cuisine avec j'irai à l'eau avec je moudrai la farine avec, j'irai au bois avec, j'irai dans le grenier avec, je dormirai avec aussi ». |
| 3 | adaba na wal nehe ha ete ava dedena efe kla ftek tam. | Car cette femme, depuis qu'elle est venue chez son père n'a jamais eu d'enfant. |
| 4 | Yân, dwa Ane-fan da. | Elle n'a pas le lait. |
| 5 | Kla nehe, avlal daf, azam da. | Cet enfant elle lui donne de la boule, il ne mange pas. |
| 6 | Akagwad ma, Anguv ta manga bra va. | Au chant du coq, elle alla dans le grenier. |
| 7 | Tahnal-kala dafa, azam da. | On lui sert de la boule, il ne mange pas. |
| 8 | Tahala : « uro, kazzama awaka, kunumuro elgwa, Kahamala akla nehe : mawa, metemen edena, kazzama meseka kegyema-la yama, kazzama-lana mengesa. | On lui dit: « va, vous prenez une chèvre, vous allez en brousse, vous dites à l'enfant: on va faire un sacrifice: vous prenez une marmite, vous prenez de l'eau avec vous prenez aussi un couteau avec. |
| 9 | Afalaña kodomuro elgwa zerre Ma, kahamala : « ngta-fan Afa dam nehe menges enne Ejeke-femnere. | Quand vous serez allés loin en brousse! Vous lui direz: « attends auprès de ces choses, on a oublié de prendre un couteau. » |
| 10 | Afalaña Adaa gama ma, menzea-kabarra. | Quand il y sera, qu'il y reste. |