Everyday Interactions in Ibọnọ and Obolo Languages

License:

CC-BY-NC-SA-4.0

Steward:

NaijaVoices (Lanfrica Labs)

Task: NLP

Release Date: 12/1/2025

Format: WAV, TXT

Size: 2.43 GB


Description

This dataset offers 11.3 hours of natural everyday speech in Ibọnọ and Obolo, captured from 20 adult speakers across 120 recordings, each paired with a clean transcript and metadata.

Specifics

Licensing

Creative Commons Attribution Non Commercial Share Alike 4.0 International (CC-BY-NC-SA-4.0)

https://spdx.org/licenses/CC-BY-NC-SA-4.0.html

Considerations

Restrictions/Special Constraints

  • Under the default license, the dataset may be used only for non-commercial purposes: academic, educational, or personal.

  • To use the dataset (or derivatives) commercially, you must obtain a commercial waiver from NaijaVoices. Reach out at info@naijavoices.com.

  • Any published work or product using the dataset must give proper attribution to the dataset creators, including the NaijaVoices community (e.g., by citing their paper).

  • You must comply with all applicable data-protection and privacy laws in handling the dataset and metadata (e.g., the regulations that apply in the donor's jurisdiction) and be transparent about your use.

  • Use must be ethical: you cannot use the dataset in a way that perpetuates stereotypes or biases about any group or community.

  • Do not use the dataset in ways that misrepresent, appropriate, or misuse cultural identities or expressions (e.g., avoid mis-framing cultural content for profit or manipulation).

Forbidden Usage

  • You must not attempt to identify or reveal the real identities of the voice donors (speakers) in the dataset.

  • Creating voice clones or other high-fidelity replicas of individual speakers is explicitly prohibited.

  • You may not use the dataset to build or train systems that generate hate speech, discriminatory language, or content that targets groups in harmful ways.

  • You may not use the dataset for surveillance, intrusive monitoring, or any privacy-violating applications.

  • Using the dataset to manipulate political discourse, influence elections, or produce political propaganda is forbidden.

Metadata

Overview

This dataset contains audio recordings of two Niger-Congo languages spoken in Nigeria: Ibọnọ and Obolo. The dataset was collected by Rogers Katelem Edeh as part of the NaijaVoices Micro-Grants Heritage project.

Dataset Statistics

Total Recordings

  • 120 audio recordings with corresponding transcript files

  • 60 recordings in Ibọnọ language

  • 60 recordings in Obolo language

  • Total duration: 11:20:47 (11 hours, 20 minutes, 47 seconds)

  • Average duration per recording: 340.39 seconds (5.67 minutes)
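
The duration figures above can be cross-checked with simple arithmetic. The sketch below converts the stated total duration to seconds and divides it by the stated number of recordings; the helper name hms_to_seconds is illustrative and not part of the dataset.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


total_seconds = hms_to_seconds("11:20:47")  # total duration stated above
num_recordings = 120                        # total recordings stated above

average_seconds = total_seconds / num_recordings

print(f"total:   {total_seconds} s ({total_seconds / 3600:.2f} h)")
print(f"average: {average_seconds:.2f} s ({average_seconds / 60:.2f} min)")
```

The computed average agrees with the 340.39 seconds (5.67 minutes) listed above.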

Speakers

  • 20 unique speakers contributing to the dataset

  • Speaker distribution:

    • Abi: 6 recordings (5.0%)

    • Bfe: 6 recordings (5.0%)

    • Chn: 6 recordings (5.0%)

    • Edt: 6 recordings (5.0%)

    • Ejb: 6 recordings (5.0%)

    • Ekw: 6 recordings (5.0%)

    • Ens: 6 recordings (5.0%)

    • Fja: 6 recordings (5.0%)

    • Gls: 6 recordings (5.0%)

    • Hrf: 6 recordings (5.0%)

    • Ikp: 6 recordings (5.0%)

    • Jnp: 6 recordings (5.0%)

    • Jul: 6 recordings (5.0%)

    • Mok: 6 recordings (5.0%)

    • Mpr: 6 recordings (5.0%)

    • Nga: 6 recordings (5.0%)

    • Rfr: 6 recordings (5.0%)

    • Rsi: 6 recordings (5.0%)

    • Tta: 6 recordings (5.0%)

    • Utn: 6 recordings (5.0%)

Gender Distribution

  • Male speakers: 66 recordings (55.0%)

  • Female speakers: 54 recordings (45.0%)

Age Range Distribution

  • Over 30 years: 120 recordings (100.0%)

Geographic and Linguistic Information

  • Country: All recordings are from Nigeria (120 recordings)

  • Languages:

    • Ibọnọ: 60 recordings (50.0%)

    • Obolo: 60 recordings (50.0%)

File Structure

The tar file contains the following:

  1. audios, where all the audio files are stored.

  2. transcripts, where all the transcript .txt files are kept.

  3. metadata.xlsx, which contains the mappings between the audios and transcripts, with additional details (see below).

Each file is given a unique filename that includes the speaker ID.
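
One way to sanity-check a download against the layout described here is to count the files in each folder without extracting the archive. This is a minimal sketch using Python's standard tarfile module. The archive name is not stated on this page, so dataset.tar below is a placeholder, and the function matches the folder names audios and transcripts anywhere in a member's path in case the archive has an extra top-level directory (an assumption).

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_archive_members(tar_path: str) -> Counter:
    """Count regular files under 'audios', 'transcripts', or anywhere else."""
    counts = Counter()
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other entries
            parts = PurePosixPath(member.name).parts
            if "audios" in parts:
                counts["audios"] += 1
            elif "transcripts" in parts:
                counts["transcripts"] += 1
            else:
                counts["other"] += 1  # e.g. metadata.xlsx
    return counts


# Hypothetical usage; replace the placeholder with the real archive path:
# counts = count_archive_members("dataset.tar")
# print(counts)
```

Given the statistics on this page, a complete download would be expected to show 120 files under audios and 120 under transcripts.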

Metadata Fields

Each entry in metadata.xlsx contains the corresponding audio and transcript filenames along with the following fields:

  • SPKR ID: Unique identifier for each speaker (Abi, Bfe, Chn, etc.)

  • GENDER: Gender of the speaker (M/F)

  • AGE RANGE: Age range of the speaker (over 30)

  • COUNTRY: Country where recording was made (Nigeria)

  • LANGUAGE: Language spoken (Ibọnọ or Obolo)

  • AUDIO FILENAME: Name of the audio file

  • TRANSCRIPT FILENAME: Name of the corresponding transcript file
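
The fields above are enough to pair each recording with its transcript. The sketch below shows one possible approach, under several assumptions: that the spreadsheet's column headers match the field names exactly as written here, that the filename columns hold bare filenames relative to the audios and transcripts folders, and that pandas (with openpyxl) is available for reading the .xlsx file. The function names and the filenames in the example rows are hypothetical.

```python
from pathlib import Path


def load_rows(metadata_path):
    """Read metadata.xlsx into a list of dicts (requires pandas and openpyxl)."""
    import pandas as pd  # imported lazily so pair_files works without pandas
    return pd.read_excel(metadata_path).to_dict(orient="records")


def pair_files(rows, root, language=None):
    """Return (audio_path, transcript_path) pairs, optionally for one language."""
    root = Path(root)
    pairs = []
    for row in rows:
        if language is not None and row["LANGUAGE"] != language:
            continue
        pairs.append((
            root / "audios" / row["AUDIO FILENAME"],
            root / "transcripts" / row["TRANSCRIPT FILENAME"],
        ))
    return pairs


# Example rows with made-up filenames, standing in for load_rows(...) output.
example_rows = [
    {"SPKR ID": "Abi", "GENDER": "F", "LANGUAGE": "Obolo",
     "AUDIO FILENAME": "example_1.wav", "TRANSCRIPT FILENAME": "example_1.txt"},
    {"SPKR ID": "Bfe", "GENDER": "M", "LANGUAGE": "Ibọnọ",
     "AUDIO FILENAME": "example_2.wav", "TRANSCRIPT FILENAME": "example_2.txt"},
]

obolo_pairs = pair_files(example_rows, "dataset", language="Obolo")
for audio_path, transcript_path in obolo_pairs:
    print(audio_path, transcript_path)
```

When filtering on LANGUAGE, compare against the value exactly as it appears in the spreadsheet, since names with diacritics such as Ibọnọ can be encoded in more than one Unicode form.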