Betawi TTS of Cultural Language (BEKAL)

License:

CC-BY-SA-4.0

Steward:

Community

Task: TTS

Release Date: 2/5/2026

Format: WEBM, TSV

Size: 309.99 MB


Description

Betawi TTS of Cultural Language (BEKAL) is a dataset that represents the Betawi language as a living and evolving language within the urban context of Jakarta, reflecting both traditional forms and modern variations that emerge in everyday communicative practices. This dataset can be utilized for linguistic research, cultural documentation, urban sociolinguistic studies, and the development of language technologies based on regional languages with Indonesian code-mixing.

Specifics

Licensing

Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

https://spdx.org/licenses/CC-BY-SA-4.0.html

Considerations

Restrictions/Special Constraints

Please contact the dataset owner to request permission. Research use requires proper citation, and the dataset must not be distributed commercially.

Forbidden Usage

Re-uploading, modifying, or redistributing this dataset without the owner’s permission is prohibited.

Processes

Ethical Review

This dataset was created by writing texts in the Betawi language with Indonesian code-mixing. Native speakers then read the texts aloud and recorded them through the hosting platform https://sabre-2.onrender.com/. The resulting audio recordings were compiled into the dataset.

Intended Use

This dataset is designed to document, map, and analyze the variety of Betawi language speech across various domains of community life.

Metadata

Language:

This dataset uses the Bekasi dialect of the Betawi language, spoken in West Java, with code-mixing from the urban Jakarta Indonesian used by young people.

Source(s):

Created by the dataset creator's team, whose members are linguists and native speakers.

Domain(s):

General domain, covering themes of culture, family, daily activities, education, and social interaction.

Technical Datasheet:

5.5 hours

Size:

Approximately 5.5 hours of audio for TTS.

Structure:

Audio file name, text
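
The two-column structure above could be read along the following lines. This is only a minimal sketch: it assumes the TSV has no header row and that each line holds the audio file name, a tab, and the text. The names Utterance, audio_file, text, and read_pairs are hypothetical and not part of the dataset, and the file name in the demo is made up.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    audio_file: str  # hypothetical name for the first column
    text: str        # hypothetical name for the second column


def read_pairs(lines: Iterable[str]) -> List[Utterance]:
    """Parse tab-separated lines into (audio file name, text) pairs."""
    # QUOTE_NONE keeps quotation marks inside the text as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pairs.append(Utterance(row[0].strip(), row[1].strip()))
    return pairs


# Illustrative input only: the file name is invented for this demo;
# the text is taken from the Sample field of this page.
demo = io.StringIO("clip_0001.webm\tKadang-kadang ngeliatin orang yang lagi bebiakan\n")
for utt in read_pairs(demo):
    print(utt.audio_file, "->", utt.text)
```

If the released TSV turns out to have a header row or additional columns, the parsing would need to be adjusted accordingly.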

Sample:

",,,makanye orang betawi sering dibilang gepyak dan semua tetanggenye dianggep sodara,,,"

",,,Orang kampung pade suka nandak-nandak ame nyawer kalo ade jaipongan,,,"

",,,Kadang-kadang ngeliatin orang yang lagi bebiakan,,,"

",,, kalian tepinin lantai rumeh kering,,,"

",,, kaye ngumpul aje di bale baringan maen hp sama ngopi kopi item,,,"

Writing System:

Latin alphabet (A–Z), Arabic numerals (0–9)

Useful Link

www.linkedin.com/in/riska-legistari-febri-5aab98252