TTS Balinese Language
License:
CC-BY-SA-4.0
Steward:
Community
Task: TTS
Release Date: 3/11/2026
Format: WEBM, TSV
Size: 301.05 MB
Description
The Balinese TTS dataset was created and narrated by native Balinese speakers and includes code-mixing with Indonesian. It is designed to showcase the use of the Balinese language in everyday contexts, covering topics such as family, social interactions, and routine community activities. Each recording reflects natural language use by Balinese speakers and so represents authentic communication in daily life. The dataset can be used for linguistic research, the development of automatic speech recognition systems, and other applications focused on the preservation and advancement of the Balinese language.
Specifics
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Considerations
Restrictions/Special Constraints
Any use of this dataset requires prior written permission from the data owner, requested by email. Users must also provide appropriate citations and comply with the applicable license terms.
Forbidden Usage
Any attempt to clone the voice or train models that imitate the speakers in this dataset is forbidden.
Processes
Ethical Review
(1) Texts were written in Balinese with code-mixing in Indonesian. (2) Native speakers read and recorded the texts through the hosting platform https://sabre-2.onrender.com/. (3) The audio recordings were compiled into a comprehensive dataset.
Intended Use
This dataset is intended to be used in the development of automatic speech recognition systems specifically focused on the Balinese language.
Metadata
Language
This dataset uses everyday Balinese in the Buleleng and Gianyar dialects and includes code-mixing with Indonesian.
Source(s):
Created by the owner of the dataset, who is a linguist and native speaker.
Domain(s):
General domain: covering topics such as family, social interactions, and routine community activities.
Size:
Over 5 hours.
Structure:
Audio file name, text.
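Given the TSV format and the two-field structure above, the transcript file could be read along these lines. This is a minimal sketch, not the dataset's official loader: the column order (file name first, text second), the absence of a header row, and the file name clip_0001.webm are all assumptions made for illustration.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_manifest(lines: Iterable[str], has_header: bool = False) -> List[Tuple[str, str]]:
    """Parse tab-separated rows into (audio file name, text) pairs.

    Assumes the first column is the audio file name and the second is
    the transcript; this ordering is not confirmed by the dataset card.
    """
    # QUOTE_NONE keeps any quotation marks in the transcripts as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for i, row in enumerate(reader):
        if has_header and i == 0:
            continue  # skip the header row when the caller says there is one
        if len(row) < 2:
            continue  # skip blank or malformed rows
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# Hypothetical row; the real file names in the dataset may look different.
sample = io.StringIO("clip_0001.webm\tTiang lan kuluwarga masi ngajeng godoh.\n")
print(read_manifest(sample))
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open(path, encoding="utf-8", newline="")`, where `path` points to the TSV shipped with the dataset.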
Sample:
“Tiang lan kuluwarga masi ngajeng godoh.”
“Tiang melali sareng timpal tiang e uli cenik.”
“Tiang majalan sareng timpal tiang e uli Batubulan ke Pantai Sanur.”
“Sawetara galah sia semengan, tiang majalan makuli ring tongos mekuli.”
“Sirep nika ngaenin bayu irage seger.”
Writing System:
Latin alphabet (A–Z), Arabic numerals (0–9)
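A transcript can be checked against the stated writing system with a small helper like the one below. This is only a sketch: the set of allowed punctuation and whitespace is an assumption, since the card lists letters and digits but does not say which punctuation marks appear in the transcripts.

```python
import string

# Letters A-Z (both cases) and digits 0-9 come from the dataset card.
# The punctuation and space characters are an assumption for illustration.
ALLOWED = set(string.ascii_letters + string.digits + " .,?!'-")


def unexpected_chars(text: str) -> list:
    """Return the sorted, de-duplicated characters outside the allowed set."""
    return sorted({ch for ch in text if ch not in ALLOWED})


print(unexpected_chars("Tiang melali sareng timpal tiang e uli cenik."))
```

A non-empty result flags characters worth reviewing, such as typographic quotation marks or accented letters, before the text is passed to a tokenizer or phonemizer.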