Javanese TTS of Banyumasan Dialect


License:

CC-BY-SA-4.0


Steward:

Community

Task: TTS

Release Date: 3/3/2026

Format: WEBM, TSV

Size: 559.08 MB



Description

This dataset comprises speech data produced by a speaker of the Banyumasan dialect of Javanese (locally known as Ngapak), spoken in Central Java Province, Indonesia. All recordings use the informal register (Ngoko) and cover a variety of topics.

Specifics

Licensing

Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

https://spdx.org/licenses/CC-BY-SA-4.0.html

Considerations

Restrictions/Special Constraints

Written permission is required for any use of this dataset. Please contact us to discuss copyright terms. We accept compensation in any form.

Forbidden Usage

Any attempt to clone the speakers' voices, or to train models that imitate the speakers in this dataset, is forbidden.

Processes

Ethical Review

This dataset was created by writing texts in the Banyumasan dialect of Javanese, with code-mixing in Indonesian and/or English. In total, the dataset contains 67,416 words, with an overall audio duration of 9 hours, 40 minutes, and 23 seconds. Native speakers read the texts aloud and recorded them through the hosting platform (https://sabre-2.onrender.com/). All recordings were then compiled and organized into a comprehensive audio dataset.
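The totals above can be cross-checked with a little arithmetic. The sketch below converts the stated duration to seconds and derives an average speaking rate in words per minute. It assumes the 9:40:23 figure covers all recorded audio, pauses and silences included, so the result is only a rough overall average, not a measured articulation rate.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an hours/minutes/seconds duration to total seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Figures taken from the dataset description
TOTAL_WORDS = 67_416
total_seconds = to_seconds(9, 40, 23)

# Average rate over the whole recording time (silences included, by assumption)
words_per_minute = TOTAL_WORDS / (total_seconds / 60)

if __name__ == "__main__":
    print(total_seconds)               # 34823
    print(round(words_per_minute, 1))  # 116.2
```

At roughly 116 words per minute averaged over the full duration, the true speaking rate within utterances is likely somewhat higher, since any silence in the audio lowers this average.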

Intended Use

N/A

Metadata

Language:

This dataset uses Banyumasan Javanese (Ngapak), with Indonesian and/or English code-mixing.

Source(s):

Created by the dataset owners, who are linguists and native speakers.

Domain(s):

General domain; topics include society, environment, media, education, culture, health, and others.

Size:

9 hours, 40 minutes, and 23 seconds

Structure:

Audio file name, text
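The card lists two fields per metadata row (audio file name and text) and a TSV file format. A minimal Python sketch for reading such rows follows. It assumes one row per utterance with the file name in the first column and the transcription in the second; whether the real file has a header row, and whether names carry the `.webm` extension, are assumptions. `parse_metadata`, `audio_path`, and `Utterance` are hypothetical helpers, not part of the dataset release.

```python
import io
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List


@dataclass
class Utterance:
    audio_file: str  # first column: audio file name (assumed order)
    text: str        # second column: transcription text (assumed order)


def parse_metadata(lines: Iterable[str], has_header: bool = False) -> List[Utterance]:
    """Parse '<audio file name>\\t<text>' rows; has_header is an assumption to set per file."""
    rows: List[Utterance] = []
    header_pending = has_header
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if header_pending:
            header_pending = False  # treat the first non-blank line as a header
            continue
        # Split on the first tab only, keeping the rest of the row as text
        name, sep, text = line.partition("\t")
        if not sep:
            raise ValueError(f"line {lineno}: no tab separator found")
        rows.append(Utterance(audio_file=name.strip(), text=text.strip()))
    return rows


def audio_path(audio_dir: str, name: str) -> Path:
    """Hypothetical helper: resolve a listed name to a .webm file in audio_dir."""
    if not name.lower().endswith(".webm"):
        name += ".webm"  # assumes names may be listed without the extension
    return Path(audio_dir) / name


if __name__ == "__main__":
    # Hypothetical row; the real file name and column layout may differ
    sample = "utt_0001\tIntensitas wong maca koran nang lingkungan sekitarku wis ora patia sering.\n"
    for utt in parse_metadata(io.StringIO(sample)):
        print(audio_path("audio", utt.audio_file), "|", utt.text)
```

Splitting on the first tab, rather than using a CSV parser, avoids surprises with the curly quotes and commas that appear in transcriptions. If the released TSV turns out to quote its fields, `csv.reader(f, delimiter="\t")` would be the safer choice.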

Sample:

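“Intensitas wong maca koran nang lingkungan sekitarku wis ora patia sering.”
English: “People around me no longer read newspapers very often.”

“Ana mbireng ahli utawa ilmuan sing bisa ngembangna teknologi kaya teknologi AI, ahli sing bisa ngembangna perangkat kanggo njalanna komputer lan ngelindungi data-data digital.”
English: “There are many experts or scientists who can develop technologies such as AI, experts who can develop tools to run computers and protect digital data.”

“Masak bahan panganan kesuwen juga bisa ngilangna utawa ngurangi kandungan gizi lan vitamin sing ana nang panganan kuwe.”
English: “Cooking food ingredients for too long can also remove or reduce the nutrient and vitamin content of that food.”

“Nang basa Inggris, teknik kiye dijenengi teknik meal prep (meal preparation).”
English: “In English, this technique is called meal prep (meal preparation).”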

Writing System:

Latin alphabet (A–Z), Arabic numerals (0–9)