Sundanese TTS
License:
CC-BY-SA-4.0
Steward:
Community
Task: TTS
Release Date: 3/9/2026
Format: WEBM, TSV
Size: 298.10 MB
Description
The Sundanese TTS dataset represents the Sundanese language in the Priangan dialect, which serves as the standard variety of Sundanese in West Java province, Indonesia. It reflects both traditional forms and modern variations in everyday communication. The dataset can be used for linguistic research, cultural documentation, sociolinguistic studies, and the development of regional language technologies that involve code-mixing with Indonesian.
Specifics
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Considerations
Restrictions/Special Constraints
For commercial use, you must obtain explicit permission from the dataset owner; please request authorization by email. For non-commercial research purposes, proper citation is required, and commercial distribution is prohibited. In addition, the dataset owner accepts compensation in any form.
Forbidden Usage
Re-uploading or redistributing this dataset is prohibited.
Processes
Ethical Review
This dataset was created by writing texts in the Priangan dialect of Sundanese with Indonesian code-mixing. The texts were read aloud and recorded by a native speaker through the hosting platform https://sabre-2.onrender.com/. The resulting audio recordings were then compiled into a comprehensive dataset.
Intended Use
This dataset is designed to document Sundanese language speech across various domains of community life.
Metadata
Language:
This dataset uses the Priangan dialect of the Sundanese language from West Java, with Indonesian code-mixing.
Source(s):
Created by the dataset owner, who is a linguist and a native speaker.
Domain(s):
General domain, covering themes of culture, family, daily activities, education, and social interaction.
Size:
Over 5 hours.
Structure:
audio file name, text
Sample:
“Kabersihan lingkungan téh kacida pentingna dina kahirupan urang”
“prédiksi ngeunaan kaayaan iklim di mangsa nu bakal datang cukup matak hariwang”
“Barudak mindeng maén bal di lapang saban sore datangna”
“tekanan hirup, tekanan gawé, jeung tekanan sosial beuki gedé karasana”
“Indonesia teh nagara nu miboga kaendahan alam jeung sajarah budaya nu kacida kentelna tur beunghar”
Writing System:
Latin alphabet (A–Z), Arabic numerals (0–9)
Useful Link:
https://petabahasa.kemendikdasmen.go.id/mapEnlarge2.php?idp=12
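Loading Example:
The Structure field above lists two columns (audio file name, text), and the Format field lists TSV. The following is a minimal Python sketch of how such a transcript file might be parsed, not an official loader. It assumes tab-separated rows with the audio file name first and the text second, and it makes no assumption about a header row, so any header would be returned as an ordinary row. The function name read_transcripts and the file name clip_0001.webm are hypothetical; the example sentence is taken from the Sample field.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_transcripts(tsv_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated rows into (audio file name, text) pairs.

    Assumption: column 0 is the audio file name and column 1 is the text.
    Rows with fewer than two columns (for example blank lines) are skipped.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# Hypothetical row: the file name is invented for illustration.
example = (
    "clip_0001.webm\t"
    "Kabersihan lingkungan téh kacida pentingna dina kahirupan urang\n"
)
pairs = read_transcripts(io.StringIO(example))
print(pairs[0][0])
```

To read a real file, an open file handle can be passed instead of the in-memory string, for example open(path, encoding="utf-8", newline=""), where path points to the TSV file shipped with the dataset.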