TTS Sasak Language

License:

CC-BY-SA-4.0

Steward:

Community

Task: TTS

Release Date: 2/5/2026

Format: WEBM, TSV

Size: 293.92 MB


Description

This dataset was compiled from a series of questions about daily life, such as routine activities, social interactions, personal experiences, and common customs of the Sasak people. Answers were given both orally and in writing in informal Sasak, as used in everyday communication. All of the language was produced by native Sasak speakers, so it represents authentic, natural, and contextual usage.

Sasak is a regional language spoken by the Sasak people of Lombok Island, West Nusa Tenggara Province (NTB), Indonesia. The Sasak are the majority ethnic group on Lombok, and Sasak is the primary means of daily communication within the family, in the local community, and in informal social interactions. Speakers are spread throughout Lombok, including West Lombok, Central Lombok, East Lombok, North Lombok, and Mataram City.

Collaboration on NLP for Sasak and Indonesian is welcome.

Specifics

Licensing

Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

https://spdx.org/licenses/CC-BY-SA-4.0.html

Considerations

Restrictions/Special Constraints

Any use of this dataset requires prior written permission. Please contact the data owner to discuss copyright-related matters. Compensation may be provided in any mutually agreed form.

Forbidden Usage

Distributing derivatives or modifications of the dataset, or models trained on it, without the owner’s permission is prohibited.

Processes

Ethical Review

The texts were written and read aloud by linguists and native Sasak speakers.

Metadata

Language:

This dataset uses the Sasak language from Mataram City, Lombok, West Nusa Tenggara. There is interference from Indonesian, Javanese, and Balinese.

Source(s):

Created by the dataset owner, a linguist and native speaker.

Domain(s):

The dataset covers general domains: daily-life activities, social interactions, personal experiences, opinions on current issues, and similar topics.

Size:

Around 33,400 tokens and about 5 hours of TTS audio.

Technical Datasheet:

5 Hours

Structure:

Audio file name, text
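
As an illustration of the two-field structure above, the following is a minimal Python sketch that reads tab-separated rows of audio file name and text. The names `Utterance`, `read_utterances`, and the `clip_*.webm` file names are hypothetical, and whether the real TSV has a header row or extra columns is not stated here, so the `has_header` flag is an assumption to adjust after inspecting the file.

```python
import csv
import io
from typing import NamedTuple


class Utterance(NamedTuple):
    audio_file: str  # audio file name (the dataset ships WEBM audio)
    text: str        # Sasak transcript for that audio file


def read_utterances(tsv_file, has_header=False):
    """Parse tab-separated rows of (audio file name, text).

    Assumption: the first two columns are file name and text, in that order.
    """
    # QUOTE_NONE keeps any quotation marks in the transcript as literal text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]
    if has_header:
        rows = rows[1:]
    return [Utterance(row[0], row[1]) for row in rows]


# Hypothetical rows: the file names are made up, the sentences come from the
# Sample section of this datasheet.
sample = (
    "clip_0001.webm\tPikiran jugak bisa ndak stabil, emosi gampang naik, "
    "kance konsentrasi berkurang.\n"
    "clip_0002.webm\tJam bertamu no paling ideal waktu ndak ngganggu "
    "aktivitas kance istirahat tuan bale.\n"
)
utterances = read_utterances(io.StringIO(sample))
for utt in utterances:
    print(utt.audio_file, len(utt.text.split()))
```

For the real data, the same function could be called on a file opened with `open(path, encoding="utf-8", newline="")`, where `path` points at the downloaded TSV.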

Sample:

"Tiang seneng duduk di teras sambil ngeliat langit nu cerah, ngerasain hangatne matahari pagi."

"Dengan cara meno, waktu ndak habis jari perkara remeh lagi ndak ngasih dampak besar."

"Selain no, gotong royong bersih-bersih jalan, selokan, kance lapangan sering dilakuang bareng-bareng."

"Pikiran jugak bisa ndak stabil, emosi gampang naik, kance konsentrasi berkurang."

"Jam bertamu no paling ideal waktu ndak ngganggu aktivitas kance istirahat tuan bale."

Writing System:

Latin alphabet (A–Z), Arabic numerals (0–9)
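
A quick way to confirm that a transcript stays within this writing system is to list any characters outside it. The sketch below is only illustrative: the function name `unexpected_chars` is hypothetical, and the set of permitted punctuation is an assumption based on the sample sentences (spaces, commas, periods, hyphens, and a few common marks), not something the datasheet specifies.

```python
import string

# Letters A-Z in both cases and digits 0-9, per the stated writing system.
# Assumption: spaces and basic punctuation are also expected in transcripts.
ALLOWED = set(string.ascii_letters + string.digits + " .,-'\"?!")


def unexpected_chars(text):
    """Return the sorted list of characters in text that are not in ALLOWED."""
    return sorted(set(text) - ALLOWED)


# Sentence taken from the Sample section of this datasheet.
line = (
    "Selain no, gotong royong bersih-bersih jalan, selokan, "
    "kance lapangan sering dilakuang bareng-bareng."
)
print(unexpected_chars(line))
```

Any characters reported this way are candidates for normalization before training, or for extending the allowed set if they turn out to be legitimate.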

Useful Link:

https://petabahasa.kemendikdasmen.go.id/provinsi.php?idp=Nusa%20Tenggara%20Barat

https://id.wikipedia.org/wiki/Berkas:Peta_bahasa_di_Lombok.png

https://repositori.kemendikdasmen.go.id/3343/1/Pemetaan%20Bahasa-Bahasa%20di%20Nusa%20Tenggara%20Barat%20%20%20%2095h.pdf