TTS Central Javanese

License:

CC-BY-SA-4.0

Steward:

Community

Task: TTS

Release Date: 2/2/2026

Format: WEBM, TSV

Size: 440.11 MB


Description

The Central Javanese dialect is a variety of the Javanese language that is politically regarded as “Standard Javanese.” It is generally spoken in Semarang, Solo, and Yogyakarta, Indonesia. Central Java and Yogyakarta are considered the historical centers of the Mataram Kingdom in Java. This dialect comprises three speech levels: ngoko, krama, and krama inggil. Ngoko is commonly used in everyday communication, whereas krama and krama inggil are typically used in palace contexts or when addressing elders and individuals of higher social status. This dataset focuses on the ngoko variety as used in daily interactions, including instances of Indonesian and English code-switching.

Specifics

Licensing

Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

https://spdx.org/licenses/CC-BY-SA-4.0.html

Considerations

Restrictions/Special Constraints

For research use, proper citation is required.

Forbidden Usage

Re-uploading, modifying, or redistributing this dataset without the owner’s permission is strictly prohibited.

Processes

Ethical Review

This dataset was recorded in the Semarang dialect of Central Javanese, with code-mixing involving Indonesian and English. Native speakers read the texts aloud and recorded them through the hosting platform https://sabre-2.onrender.com/. The recordings were then compiled into this dataset.

Intended Use

This dataset is intended for research purposes.

Metadata

Language

This dataset contains the Central Javanese dialect as generally spoken in Semarang, Solo, and Yogyakarta, with code-switching into Indonesian and English.

Source(s):

Created by the owner of the dataset, who is a linguist and a native speaker.

Domain(s):

The collection features Javanese language usage covering everyday topics, such as daily activities, opinions on education, vacation experiences, and social media use.

Size:

7 hours 45 minutes, 457 MB

Structure:

Audio file name, text
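
As a rough illustration of the structure above, the following Python sketch parses tab-separated rows into (audio file name, text) pairs. It assumes the TSV has two columns in that order and no header row; neither detail is confirmed by this card, and the file name `clip_0001.webm` is a made-up placeholder.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_transcripts(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated rows into (audio file name, text) pairs.

    Assumption: column 0 is the audio file name and column 1 is the text,
    with no header row. Adjust if the real TSV differs.
    """
    # QUOTE_NONE keeps quotation marks in the transcript as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# Hypothetical in-memory row; the file name is a placeholder, the text is
# one of the sample sentences from this card.
sample = io.StringIO(
    "clip_0001.webm\tMusik iku termasuk media nggo ekspresi senine menungso.\n"
)
pairs = read_transcripts(sample)
```

To read a real file, an open file handle can be passed in place of the `io.StringIO` object, for example `open(path, encoding="utf-8", newline="")`.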

Sample:

"Saiki, berita online iku okeh banget jenise lan seko ngendi wae."

"Kebiasaan jajan kuwi yo ditularke seko sosial media, okeh influencer makanan sing senengane mangan lan direkam."

"Pasar tradisional ning mben daerah kuwi ono keunikane dewe-dewe."

"Salah siji sing penting seko perusahaan iku sumber daya manusia utowo SDM."

"Musik iku termasuk media nggo ekspresi senine menungso."

Writing System:

Latin alphabet (A–Z), Arabic numerals (0–9)
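
A small sanity check along these lines could flag transcript characters that fall outside the stated writing system. This is only a sketch: treating whitespace and ASCII punctuation as acceptable is an assumption based on the sample sentences, not something this card specifies, and the function name is hypothetical.

```python
import string

# Letters A-Z (both cases) and digits 0-9, as stated in the card.
ALLOWED = set(string.ascii_letters + string.digits)


def unexpected_characters(text: str) -> set:
    """Return characters that are not Latin letters, digits,
    whitespace, or ASCII punctuation (the last two are assumed to be fine)."""
    return {
        ch
        for ch in text
        if ch not in ALLOWED
        and not ch.isspace()
        and ch not in string.punctuation
    }


clean = unexpected_characters("Musik iku termasuk media nggo ekspresi senine menungso.")
flagged = unexpected_characters("kuwi \u00e9")  # contains an accented letter
```

Here `clean` is an empty set, while `flagged` contains the accented character, which may point to a transcription or encoding issue worth reviewing.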