TTS Balinese Language


License:

CC-BY-SA-4.0


Steward:

Community

Task: TTS

Release Date: 3/11/2026

Format: WEBM, TSV

Size: 301.05 MB



Description

The Balinese TTS dataset was written and narrated by native Balinese speakers and includes code-mixing with Indonesian. It is designed to showcase the use of Balinese in everyday contexts, covering topics such as family, social interactions, and routine community activities. Each recording reflects natural language use by Balinese speakers and thus represents authentic daily communication. The dataset can be used for linguistic research, the development of automatic speech recognition systems, and other applications focused on the preservation and advancement of the Balinese language.

Specifics

Licensing

Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

https://spdx.org/licenses/CC-BY-SA-4.0.html

Considerations

Restrictions/Special Constraints

Any use of this dataset requires prior written permission from the data owner, requested by email. Users must also cite the dataset appropriately and comply with the applicable license terms.

Forbidden Usage

Any attempt to clone the voice or train models that imitate the speakers in this dataset is forbidden.

Processes

Ethical Review

(1) Texts were written in Balinese with code-mixing in Indonesian. (2) The texts were read aloud and recorded by native speakers through the hosting platform https://sabre-2.onrender.com/. (3) The audio recordings were compiled into a comprehensive dataset.

Intended Use

This dataset is intended for the development of automatic speech recognition systems focused specifically on the Balinese language.

Metadata

Language

This dataset uses everyday Balinese in the Buleleng and Gianyar dialects and includes code-mixing with Indonesian.

Source(s):

Created by the dataset owner, a linguist and native speaker.

Domain(s):

General domain: covering topics such as family, social interactions, and routine community activities.

Size:

Over 5 hours of audio.

Structure:

Each entry consists of an audio file name and its corresponding text.
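Given the stated structure (audio file name, text) and the TSV format listed above, a loader might look like the sketch below. It assumes the TSV has exactly two tab-separated columns in that order and that a header row may or may not be present; the `has_header` flag, the `Utterance` type, and the example file names are hypothetical, not part of the dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class Utterance:
    audio_file: str  # name of the WEBM clip (assumed first column)
    text: str        # transcription (assumed second column)


def read_metadata(stream: TextIO, has_header: bool = False) -> List[Utterance]:
    """Parse assumed two-column TSV rows of (audio file name, text)."""
    # QUOTE_NONE: quote marks in transcripts are literal text, not CSV quoting.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    if has_header and rows:
        rows = rows[1:]  # drop the (hypothetical) header row
    utterances = []
    for row in rows:
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        utterances.append(Utterance(audio_file=row[0].strip(), text=row[1].strip()))
    return utterances


if __name__ == "__main__":
    # Hypothetical file names; real names in the dataset may differ.
    sample = "clip_0001.webm\tTiang melali sareng timpal tiang e uli cenik.\n"
    for utt in read_metadata(io.StringIO(sample)):
        print(utt.audio_file, "->", utt.text)
```

Using `csv.QUOTE_NONE` keeps sentences that contain quotation marks intact instead of treating them as field delimiters.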

Sample:

“Tiang lan kuluwarga masi ngajeng godoh.”

“Tiang melali sareng timpal tiang e uli cenik.”

“Tiang majalan sareng timpal tiang e uli Batubulan ke Pantai Sanur.”

“Sawetara galah sia semengan, tiang majalan makuli ring tongos mekuli.”

“Sirep nika ngaenin bayu irage seger.”

Writing System:

Latin alphabet (A–Z), Arabic numerals (0–9)
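Because the transcripts are stated to use only the Latin alphabet and Arabic numerals, a text front end could flag characters outside that set before training. The sketch below is one way to do this; the punctuation whitelist (`ALLOWED_PUNCTUATION`) is an assumption drawn from the samples above and is not part of the dataset specification.

```python
import string
from typing import Set

# Letters and digits follow the card's stated writing system.
# ALLOWED_PUNCTUATION is a hypothetical whitelist, not a dataset spec.
ALLOWED_PUNCTUATION = set(".,?!'\"-:;()“”‘’")
ALLOWED = (
    set(string.ascii_letters)
    | set(string.digits)
    | ALLOWED_PUNCTUATION
    | {" "}
)


def unexpected_chars(text: str) -> Set[str]:
    """Return the characters in `text` that fall outside the assumed allowed set."""
    return {ch for ch in text if ch not in ALLOWED}


if __name__ == "__main__":
    print(unexpected_chars("Sirep nika ngaenin bayu irage seger."))
```

Any non-empty result points to a transcript worth reviewing, for example one containing accented characters or stray encoding debris.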

Useful Link

https://dictionary.basabali.org/-#