Documenting Ekpeye Folktales and Preserving Cultural Heritage
License: CC-BY-NC-SA-4.0
Steward: NaijaVoices (Lanfrica Labs)
Task: OTH
Release Date: 12/2/2025
Format: MP4, TXT, DOCX
Size: 5.97 GB
Description
This dataset presents 21 video-recorded Ekpeye folktales (1 h 28 min in total) narrated by two community elders, each paired with a transcript and an English translation that includes a narrative summary. It offers a multimodal resource for speech, video, storytelling, and cultural heritage research, as well as for training multilingual and multimodal AI systems.
Specifics
Licensing
Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)
https://spdx.org/licenses/CC-BY-NC-SA-4.0.html
Considerations
Restrictions/Special Constraints
Use under the default license is allowed only for non-commercial purposes (academic, educational, or personal).
To use the dataset or its derivatives commercially, you must obtain a commercial waiver from NaijaVoices. Reach out at info@naijavoices.com.
Any published work or product using the dataset must give proper attribution to the dataset creators, including the NaijaVoices community (e.g., by citing their paper).
You must comply with all applicable data-protection and privacy laws in handling the dataset and metadata (e.g., the regulations relevant under the donor’s jurisdiction) and be transparent about your use.
Use must be ethical: you cannot use the dataset in a way that perpetuates stereotypes or biases about any group or community.
Do not use the dataset in ways that misrepresent, appropriate, or misuse cultural identities or expressions (e.g., avoid framing cultural content misleadingly for profit or manipulation).
Forbidden Usage
You must not attempt to identify or reveal the real identities of the voice donors (speakers) in the dataset.
Voice cloning, i.e. creating high-fidelity replicas of individual speakers, is explicitly prohibited.
You may not use the dataset to build or train systems that generate hate speech, discriminatory language, or content that targets groups in harmful ways.
You may not use the dataset for surveillance, intrusive monitoring, or any privacy-violating applications.
Using the dataset to manipulate political discourse, influence elections, or perform political propaganda is forbidden.
It is forbidden to repurpose the dataset (or derivative datasets) to create another dataset that is “substantially similar in content, structure or purpose” for commercial redistribution or sale; i.e., you cannot re-host or resell the dataset or a derived dataset commercially.
Using the dataset to generate violent, inciting, or hateful content, or content promoting violence or aggression, is prohibited.
Metadata
Overview
This dataset contains video recordings of folktales in the Ekpeye language, a Niger-Congo language spoken primarily in Nigeria. The dataset was collected by Dr Uwuma Doris Ugwu as part of the NaijaVoices Micro-Grants Heritage project. This collection focuses on preserving traditional oral folktales from the Ihuaije Folktale Group.
Dataset Statistics
Total Recordings
21 video recordings of Ekpeye folktales with corresponding transcript and translation files
Total duration: 1 hour 28 minutes 23 seconds (5,303.87 seconds)
Average duration per recording: 4 minutes 13 seconds (252.57 seconds)
Video files are stored in the EKPEYE FOLKTALE VIDEOS/ directory
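The duration figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the totals stated in this card (5,303.87 seconds, 21 recordings); the helper name `split_duration` is illustrative and not part of the dataset. It keeps fractional seconds, so its output can differ from the whole-second figures above in the last digit.

```python
def split_duration(seconds: float) -> tuple[int, int, float]:
    """Split a duration in seconds into (hours, minutes, seconds)."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return int(hours), int(minutes), round(secs, 2)


TOTAL_SECONDS = 5303.87  # total duration stated in the card
NUM_RECORDINGS = 21      # number of recordings stated in the card

average_seconds = TOTAL_SECONDS / NUM_RECORDINGS

print(split_duration(TOTAL_SECONDS))    # (1, 28, 23.87)
print(round(average_seconds, 2))        # 252.57
print(split_duration(average_seconds))  # (0, 4, 12.57)
```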
Narrators
2 unique narrators contributing to the dataset
Narrator distribution:
EKP 1: 17 folktales (81.0%)
EKP 2: 4 folktales (19.0%)
Gender Distribution
Male narrators: 21 recordings (100.0%)
Age Range Distribution
70-75 years: 17 recordings (81.0%)
40-45 years: 4 recordings (19.0%)
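The percentage shares reported for narrators and age ranges follow directly from the recording counts. As a rough sketch, the lists below are reconstructed from the counts in this card rather than read from the data; with the real files, the same hypothetical `shares` helper could be applied to a column of metadata.csv instead.

```python
from collections import Counter


def shares(values):
    """Return the percentage share of each distinct value, rounded to one decimal."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: round(100 * n / total, 1) for key, n in counts.items()}


# One entry per recording, rebuilt from the counts stated in the card
narrators = ["EKP 1"] * 17 + ["EKP 2"] * 4
age_ranges = ["70-75"] * 17 + ["40-45"] * 4

print(shares(narrators))   # {'EKP 1': 81.0, 'EKP 2': 19.0}
print(shares(age_ranges))  # {'70-75': 81.0, '40-45': 19.0}
```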
Geographic and Linguistic Information
Country: All recordings are from Nigeria (21 recordings)
Language: Ekpeye (21 recordings)
Folktale Group: All recordings are from the Ihuaije Folktale Group
File Structure
The dataset is organized in the following directory structure:
├── EKPEYE FOLKTALE VIDEOS/
│ ├── ADA.mp4
│ ├── AGAMILO.mp4
│ ├── AKITA_.mp4
│ └── [additional video files]
├── TRANSCRIPTS/
│ ├── ADA_.docx
│ ├── ADA_.txt
│ ├── AGAMILO_.docx
│ ├── AGAMILO_.txt
│ └── [additional transcript files in both .docx and .txt formats]
├── TRANSLATIONS/
│ ├── ADA.docx
│ ├── ADA.txt
│ ├── AGAMILO_.docx
│ ├── AGAMILO_.txt
│ └── [additional translation files in both .docx and .txt formats]
├── metadata.csv
└── dataset-card.md
Each folktale recording consists of:
Video file (.mp4 format) stored in EKPEYE FOLKTALE VIDEOS/
Transcript file available in two formats:
.docx format (original, preserves formatting)
.txt format (converted for faster data processing)
Translation file available in two formats:
.docx format (original, preserves formatting)
.txt format (converted for faster data processing)
The translation files contain a summary of the video explaining what the folktale is about
Metadata entry in metadata.csv with narrator information and file references
File Format Notes
Transcripts and Translations:
Transcripts: The transcript files contain the original Ekpeye language text as spoken in the video recordings.
Translations: The translation files additionally contain summaries of the videos explaining what each folktale is about. These summaries provide context and overview of the narrative content, making the folktales more accessible to those who may not understand the Ekpeye language.
.docx format: The original transcript and translation files are provided in Microsoft Word format (.docx) to preserve their original formatting, including any special typography, spacing, or structural elements that may be important for understanding the folktales.
.txt format: Plain text versions (.txt) of all transcripts and translations are also provided for faster data processing, text analysis, and compatibility with automated tools and scripts. These are converted versions of the original .docx files.
Users can choose the format that best suits their needs:
Use .docx files when original formatting is important
Use .txt files for computational analysis, text processing, or when working with tools that require plain text input
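For the plain-text route, a transcript or translation can be read with the Python standard library alone. The sketch below assumes the .txt files are UTF-8 encoded, which this card does not state; if decoding fails, the `encoding` argument may need adjusting. The function name `load_text` is illustrative.

```python
from pathlib import Path


def load_text(path, encoding="utf-8"):
    """Read a plain-text transcript or translation and return its non-empty lines.

    The UTF-8 default is an assumption; the dataset card does not specify an encoding.
    """
    text = Path(path).read_text(encoding=encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A call such as `load_text("TRANSCRIPTS/ADA_.txt")` would then return the transcript as a list of lines ready for tokenization or alignment.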
Metadata Fields
The metadata CSV file includes the following fields:
NARRATOR ID: Unique identifier for each narrator (EKP 1, EKP 2)
GENDER: Gender of the narrator (MALE)
AGE RANGE: Age range of the narrator (70-75, 40-45)
COUNTRY: Country where recording was made (NIGERIA)
LANGUAGE: Language spoken (EKPEYE)
FOLKTALE_NAME: Name of the video file (e.g., ADA, AGAMILO). This is also the name of the corresponding transcription and translation files.
FOLKTALE GROUP: The folktale group affiliation (IHUAJIE FOLKTALE GROUP)
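These fields can be read with Python's built-in `csv` module. The sketch below assumes the CSV header row uses the field names exactly as listed above; the sample rows are placeholders (`TALE_A`, `TALE_B`, `TALE_C` are not real folktale names) and stand in for the actual metadata.csv.

```python
import csv
import io
from collections import defaultdict


def tales_by_narrator(csv_file):
    """Group FOLKTALE_NAME values by NARRATOR ID from an open metadata CSV."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["NARRATOR ID"].strip()].append(row["FOLKTALE_NAME"].strip())
    return dict(groups)


# Placeholder rows for illustration only; header names are assumed to match the card
SAMPLE = """NARRATOR ID,GENDER,AGE RANGE,COUNTRY,LANGUAGE,FOLKTALE_NAME,FOLKTALE GROUP
EKP 1,MALE,70-75,NIGERIA,EKPEYE,TALE_A,IHUAJIE FOLKTALE GROUP
EKP 2,MALE,40-45,NIGERIA,EKPEYE,TALE_B,IHUAJIE FOLKTALE GROUP
EKP 1,MALE,70-75,NIGERIA,EKPEYE,TALE_C,IHUAJIE FOLKTALE GROUP
"""

print(tales_by_narrator(io.StringIO(SAMPLE)))
# {'EKP 1': ['TALE_A', 'TALE_C'], 'EKP 2': ['TALE_B']}
```

With the real file, the same function could be called as `tales_by_narrator(open("metadata.csv", newline="", encoding="utf-8"))`, again assuming UTF-8.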
File Naming Convention
Files follow a naming pattern based on the folktale title:
Video files: {FOLKTALE_NAME}.mp4 (e.g., ADA.mp4, AGAMILO.mp4)
Transcript files: {FOLKTALE_NAME}_.docx and {FOLKTALE_NAME}_.txt (e.g., ADA_.docx, ADA_.txt)
Translation files: {FOLKTALE_NAME}.docx and {FOLKTALE_NAME}.txt (e.g., ADA.docx, ADA.txt)
Note: Some file names may include trailing underscores or slight variations in naming conventions. The metadata CSV file provides the authoritative reference for matching files.
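Because of these naming variations, looking files up by an exact name can miss matches. One possible approach, sketched below, compares names after stripping trailing underscores and surrounding whitespace and ignoring case. This normalization rule is an assumption, not something the dataset defines, and the helper names `normalize` and `find_files` are illustrative; results should still be checked against metadata.csv.

```python
from pathlib import Path

# Directory names as shown in the file structure above
FOLDERS = ("EKPEYE FOLKTALE VIDEOS", "TRANSCRIPTS", "TRANSLATIONS")


def normalize(name: str) -> str:
    """Strip whitespace and trailing underscores, then upper-case (assumed rule)."""
    return name.strip().rstrip("_").strip().upper()


def find_files(root, folktale_name):
    """Return, per folder, the file names whose normalized stem matches the folktale name."""
    key = normalize(folktale_name)
    matches = {}
    for folder in FOLDERS:
        directory = Path(root) / folder
        if not directory.is_dir():
            matches[folder] = []  # folder missing: report no matches
            continue
        matches[folder] = sorted(
            p.name
            for p in directory.iterdir()
            if p.is_file() and normalize(p.stem) == key
        )
    return matches
```

Calling `find_files(dataset_root, "ADA")` would then return the video, transcript, and translation file names for that folktale, whether or not they carry a trailing underscore.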