TTS Central Javanese
License:
CC-BY-SA-4.0
Steward:
Community
Task: TTS
Release Date: 2/2/2026
Format: WEBM, TSV
Size: 440.11 MB
Description
The Central Javanese dialect is a variety of the Javanese language that is politically regarded as “Standard Javanese.” It is generally spoken in Semarang, Solo, and Yogyakarta, Indonesia. Central Java and Yogyakarta are considered the historical centers of the Mataram Kingdom in Java. This dialect comprises three speech levels: ngoko, krama, and krama inggil. Ngoko is commonly used in everyday communication, whereas krama and krama inggil are typically used in palace contexts or when addressing elders and individuals of higher social status. This dataset focuses on the ngoko variety as used in daily interactions, including instances of Indonesian and English code-switching.
Specifics
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Considerations
Restrictions/Special Constraints
For research use, proper citation is required.
Forbidden Usage
Re-uploading, modifying, or redistributing this dataset without the owner’s permission is strictly prohibited.
Processes
Ethical Review
This dataset was created using the Semarang dialect of Central Javanese, with code-mixing into Indonesian and English. The texts were read aloud and recorded by native speakers through the hosting platform https://sabre-2.onrender.com/, and the resulting audio recordings were compiled into this dataset.
Intended Use
This dataset is intended for research purposes.
Metadata
Language
This dataset contains Central Javanese as generally spoken in Semarang, Solo, and Yogyakarta, with code-switching into Indonesian and English.
Source(s):
Created by the dataset owner, who is a linguist and a native speaker.
Domain(s):
The collection features Javanese language use on everyday topics, such as daily activities, opinions on education, vacation experiences, and social media usage.
Size:
7 hours 45 minutes, 457 MB
Structure:
Audio file name, text
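The card names only the two fields and the WEBM/TSV formats, so the exact manifest layout is not documented here. As a minimal sketch, assuming each TSV row holds the audio file name and its transcript separated by a single tab (the column order, whether a header row exists, and the names `read_manifest` and `Utterance` are all hypothetical), the pairs could be read with Python's standard `csv` module like this:

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Utterance(NamedTuple):
    audio_file: str  # name of the WEBM clip (column order is an assumption)
    text: str        # transcript read aloud in that clip


def read_manifest(stream: TextIO, has_header: bool = False) -> Iterator[Utterance]:
    """Yield (audio file name, text) pairs from a two-column TSV stream.

    Hypothetical helper: assumes tab-separated columns in the order
    audio file name, text. Header presence is unknown, so it is a parameter.
    """
    # QUOTE_NONE keeps literal quote marks in transcripts instead of
    # treating them as CSV quoting characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header row, if the file has one
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; ignore them
        if len(row) < 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        yield Utterance(row[0].strip(), row[1].strip())


if __name__ == "__main__":
    # Hypothetical one-row manifest using a sample sentence from this card.
    sample = io.StringIO(
        "clip_0001.webm\tMusik iku termasuk media nggo ekspresi senine menungso.\n"
    )
    for utt in read_manifest(sample):
        print(utt.audio_file, "->", utt.text)
```

Passing a text stream rather than a path keeps the parser independent of where the TSV lives; with a real file, open it with `encoding="utf-8", newline=""` as the `csv` documentation recommends.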
Sample:
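"Saiki, berita online iku okeh banget jenise lan seko ngendi wae." ("Nowadays, there are many kinds of online news, coming from everywhere.")
"Kebiasaan jajan kuwi yo ditularke seko sosial media, okeh influencer makanan sing senengane mangan lan direkam." ("That snacking habit is also picked up from social media; there are many food influencers who love eating on camera.")
"Pasar tradisional ning mben daerah kuwi ono keunikane dewe-dewe." ("Traditional markets in each region have their own unique character.")
"Salah siji sing penting seko perusahaan iku sumber daya manusia utowo SDM." ("One of the important parts of a company is its human resources, or SDM.")
"Musik iku termasuk media nggo ekspresi senine menungso." ("Music is one of the media for human artistic expression.")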
Writing System:
Latin alphabet (A–Z), Arabic numerals (0–9)