TTS Sasak Language
License:
CC-BY-SA-4.0
Steward:
Community
Task: TTS
Release Date: 2/5/2026
Format: WEBM, TSV
Size: 293.92 MB
Description
This dataset was compiled from a series of questions about daily life, such as routine activities, social interactions, personal experiences, and common customs of the Sasak people. Answers were given both orally and in writing in informal Sasak, as used in everyday communication. All language in the dataset was produced by native Sasak speakers, so it represents authentic, natural, and contextual usage.
Sasak is a regional language spoken by the Sasak people of Lombok Island, West Nusa Tenggara Province (NTB), Indonesia. The Sasak are the majority ethnic group in Lombok, and Sasak is the primary means of daily communication within the family, the local community, and informal social interactions. Sasak speakers are spread throughout Lombok, including West Lombok, Central Lombok, East Lombok, North Lombok, and Mataram City.
Collaboration on NLP for Sasak and Indonesian is welcome.
Specifics
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Considerations
Restrictions/Special Constraints
Any use of this dataset requires prior written permission. Please contact the data owner to discuss copyright-related matters. Compensation may be provided in any mutually agreed form.
Forbidden Usage
Distributing derivatives or modifications of the dataset, or models trained on it, without the owner’s permission is prohibited.
Processes
Ethical Review
The dataset was written and read by linguists and native Sasak speakers.
Metadata
Language:
This dataset uses the Sasak language from Mataram City, Lombok, West Nusa Tenggara. There is interference from Indonesian, Javanese, and Balinese.
Source(s):
Created by the dataset owner, a linguist and native speaker.
Domain(s):
This dataset covers general domains: daily-life activities, social interactions, personal experiences, opinions on current issues, and so on.
Size:
Approximately 33,400 tokens; about 5 hours of audio for TTS.
Technical Datasheet:
5 Hours
Structure:
Audio file name, text
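As an illustration of the two-column structure above, here is a minimal Python sketch that parses tab-separated lines into (audio file name, text) pairs. It rests on assumptions not stated in this datasheet: the column order is file name then text, a header row may or may not be present, and the text column may or may not be wrapped in double quotes. The function name parse_transcripts and the clip_0001.webm style file names are hypothetical.

```python
import csv
import io


def parse_transcripts(tsv_text, has_header=False):
    """Parse tab-separated lines of (audio file name, text) into a list of pairs."""
    # QUOTE_NONE keeps quote characters literal, so stray quotes cannot merge lines.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for i, row in enumerate(reader):
        if has_header and i == 0:
            continue  # skip a header row, if the file has one (assumption)
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        audio_name, text = row[0].strip(), row[1].strip()
        # Drop wrapping double quotes, in case the text column is quoted.
        if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
            text = text[1:-1]
        rows.append((audio_name, text))
    return rows


# Hypothetical file names; the sentences are taken from the samples in this datasheet.
example = (
    "clip_0001.webm\tPikiran jugak bisa ndak stabil, emosi gampang naik, kance konsentrasi berkurang.\n"
    "clip_0002.webm\t\"Selain no, gotong royong bersih-bersih jalan, selokan, kance lapangan sering dilakuang bareng-bareng.\"\n"
)
pairs = parse_transcripts(example)
```

Each pair can then be matched to the WEBM file of the same name when preparing training data.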
Sample:
"Tiang seneng duduk di teras sambil ngeliat langit nu cerah, ngerasain hangatne matahari pagi."
"Dengan cara meno, waktu ndak habis jari perkara remeh lagi ndak ngasih dampak besar."
"Selain no, gotong royong bersih-bersih jalan, selokan, kance lapangan sering dilakuang bareng-bareng."
"Pikiran jugak bisa ndak stabil, emosi gampang naik, kance konsentrasi berkurang."
"Jam bertamu no paling ideal waktu ndak ngganggu aktivitas kance istirahat tuan bale."
Writing System:
Latin alphabet (A–Z), Arabic numerals (0–9)
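Given the writing system stated above, a simple sanity check before TTS training is to flag characters outside the expected inventory. The sketch below is only an illustration: the allowed set (ASCII letters, digits, ASCII punctuation, and the space character) is an assumption based on the sample sentences, and the name unexpected_characters is hypothetical.

```python
import string

# Assumed inventory: Latin letters, Arabic numerals, ASCII punctuation, and space.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")


def unexpected_characters(text):
    """Return the sorted list of distinct characters that fall outside ALLOWED."""
    return sorted({ch for ch in text if ch not in ALLOWED})


# A sample sentence from this datasheet, and a made-up string with an accented letter.
sample = "Pikiran jugak bisa ndak stabil, emosi gampang naik, kance konsentrasi berkurang."
clean_result = unexpected_characters(sample)
flagged_result = unexpected_characters("caf\u00e9 2026")
```

Transcripts for which the function returns a non-empty list may need normalization or manual review.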
Useful Link:
https://petabahasa.kemendikdasmen.go.id/provinsi.php?idp=Nusa%20Tenggara%20Barat
https://id.wikipedia.org/wiki/Berkas:Peta_bahasa_di_Lombok.png
https://repositori.kemendikdasmen.go.id/3343/1/Pemetaan%20Bahasa-Bahasa%20di%20Nusa%20Tenggara%20Barat%20%20%20%2095h.pdf