Sundanese TTS

License:

CC-BY-SA-4.0

Steward:

Community

Task: TTS

Release Date: 3/9/2026

Format: WEBM, TSV

Size: 298.10 MB


Description

The Sundanese TTS dataset contains Sundanese speech in the Priangan dialect, which serves as the standard variety of Sundanese in West Java province, Indonesia. It reflects both traditional forms and modern variations found in everyday communication. The dataset can be used for linguistic research, cultural documentation, sociolinguistic studies, and the development of regional language technologies that involve code-mixing with Indonesian.

Specifics

Licensing

Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

https://spdx.org/licenses/CC-BY-SA-4.0.html

Considerations

Restrictions/Special Constraints

For commercial use, you must obtain explicit permission from the dataset owner. Please send an email to request authorization. For non-commercial research purposes, proper citation is required, and commercial distribution is prohibited. In addition, the dataset owner accepts compensation in any form.

Forbidden Usage

Re-uploading or redistributing this dataset is prohibited.

Processes

Ethical Review

This dataset was created by writing texts in the Priangan dialect of Sundanese with Indonesian code-mixing. The texts were read aloud and recorded by a native speaker through the hosting platform https://sabre-2.onrender.com/. The audio recordings were then compiled into this dataset.

Intended Use

This dataset is designed to document Sundanese language speech across various domains of community life.

Metadata

Language:

This dataset uses the Priangan dialect of the Sundanese language from West Java, with Indonesian code-mixing.

Source(s):

Created by the dataset owner, a linguist and native speaker of Sundanese.

Domain(s):

General domain, covering themes of culture, family, daily activities, education, and social interaction.

Size:

Over 5 hours.

Structure:

audio file name, text
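
The two-column structure above can be read with a few lines of Python. This is a minimal sketch, not the dataset's official loader. It assumes the TSV has no header row, that the first column is the audio file name and the second is the text, and that the file is UTF-8 encoded. The names `Utterance`, `read_transcripts`, and the `utt_0001.webm`-style file names are hypothetical and used only for illustration.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    audio_file: str  # hypothetical name for the first column
    text: str        # hypothetical name for the second column


def read_transcripts(lines: Iterable[str]) -> List[Utterance]:
    """Parse tab-separated rows of (audio file name, text).

    Assumes there is no header row; rows with fewer than two
    columns (for example blank lines) are skipped.
    """
    # QUOTE_NONE keeps quotation marks in the text as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if len(row) < 2:
            continue
        rows.append(Utterance(row[0].strip(), row[1].strip()))
    return rows


# Illustrative rows: the file names are made up, the text comes from the samples.
sample_tsv = (
    "utt_0001.webm\tKabersihan lingkungan téh kacida pentingna dina kahirupan urang\n"
    "utt_0002.webm\tBarudak mindeng maén bal di lapang saban sore datangna\n"
)
utterances = read_transcripts(io.StringIO(sample_tsv))
```

To read a real file, pass an open file handle instead of the in-memory string, for example `open(path, encoding="utf-8", newline="")`, where `path` points to the TSV file shipped with the dataset. If the file turns out to have a header row, drop the first parsed entry.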

Sample:

“Kabersihan lingkungan téh kacida pentingna dina kahirupan urang”

“prédiksi ngeunaan kaayaan iklim di mangsa nu bakal datang cukup matak hariwang”

“Barudak mindeng maén bal di lapang saban sore datangna”

“tekanan hirup, tekanan gawé, jeung tekanan sosial beuki gedé karasana”

“Indonesia teh nagara nu miboga kaendahan alam jeung sajarah budaya nu kacida kentelna tur beunghar”

Writing System:

Latin alphabet (A–Z), Arabic numerals (0–9)

Useful Link:

https://petabahasa.kemendikdasmen.go.id/mapEnlarge2.php?idp=12