Podcast Homostoria (Indonesia)
License:
CC-BY-SA-4.0
Steward:
Community
Task: ASR
Release Date: 11/25/2025
Format: mp3
Size: 302.97 MB
Description
This dataset features discussions on modern media—including film, podcasts, and social media—and its connection to local customs and traditions. The conversations are primarily in Indonesian, with frequent code-switching between English and Javanese.
Specifics
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Metadata
This dataset is derived from the Homostoria podcast. It features conversations primarily conducted in Indonesian, with frequent code-switching between English and Javanese.
Language
Bahasa Indonesia - Indonesian (id)
Domain
Global and local modern media discussions.
Size
This dataset contains 11 hours of spontaneous speech across 16 audio files.
Process
This dataset was transcribed with an automatic transcription tool (Transkriptor) and then manually reviewed by linguists who are native speakers.
Fields
The columns in the .tsv file contain the following information:
"audio file": the name of the audio file
"start": time when speech begins
"end": time when speech ends
"text": speech transcriptions
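The column layout above can be read with a few lines of Python. This is only a minimal sketch under several assumptions: that the .tsv file has a header row whose names match the four fields exactly, that the file is tab-separated without quoting, and that the rows look roughly like the illustrative ones below. The file name `episode_01.mp3` and the timestamp values are hypothetical, and the "start" and "end" values are kept as strings because the dataset card does not state their format.

```python
import csv
import io

# Illustrative rows only: the file name and timestamps are made up,
# and the time format is an assumption (the card does not specify it).
SAMPLE_TSV = (
    "audio file\tstart\tend\ttext\n"
    "episode_01.mp3\t0.00\t1.20\tYa, secure lah.\n"
    "episode_01.mp3\t1.20\t3.50\tYa, at least secure misal kayak gitu.\n"
)


def read_segments(tsv_text):
    """Parse TSV text into a list of dicts keyed by the header names."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in transcripts literally
    )
    return [dict(row) for row in reader]


def group_by_audio(segments):
    """Group segments by the value of their "audio file" column."""
    grouped = {}
    for segment in segments:
        grouped.setdefault(segment["audio file"], []).append(segment)
    return grouped


segments = read_segments(SAMPLE_TSV)
by_file = group_by_audio(segments)
for name, rows in by_file.items():
    print(name, len(rows), "segments")
```

To read the actual file, the contents can be passed in with something like `read_segments(open(path, encoding="utf-8", newline="").read())`, where `path` points to the downloaded .tsv file.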
Sample
Ya, secure lah.
Ya, at least secure misal kayak gitu.
Jadi mungkin pemaknaan gitu ya.
Mungkin yang kita bawa itu pemaknaan bahwa self-help ini nggak hanya hal-hal yang seperti itu gitu.
Tapi mungkin lebih luas gak sih? Kalau menurutmu gimana nih, Hans?