Corpus of Panjebar Semangat Javanese-Language Magazine
License:
CC-BY-SA-4.0
Steward:
PT Pancaran Semangat Jaya
Task: OTH
Release Date: 1/9/2026
Format: TXT
Size: 4.31 MB
Description
This dataset is a TXT-format collection of popular articles published over three years in the Javanese-language weekly magazine Panjebar Semangat. It contains widely read, non-academic Javanese texts that reflect contemporary themes and language use.
Specifics
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Considerations
Restrictions/Special Constraints
This dataset may be copied and redistributed in any form or medium for non-commercial or commercial purposes, provided that appropriate attribution is given to the dataset creator and owner, Panjebar Semangat. For commercial purposes, in particular, explicit permission is mandatory. Commercial use, resale, or distribution without attribution and explicit permission by email is strictly prohibited.
Forbidden Usage
It is strictly forbidden to use this dataset to train chatbots or large language models, except for educational purposes only. This dataset must not be used for: 1. misrepresentation or distortion of Javanese language or culture; 2. surveillance, profiling, or harmful social applications; 3. AI system training that erases attribution or provenance; 4. any use that conflicts with community values or cultural integrity.
Processes
Ethical Review
The dataset was curated through an internal self-review process from three years (156 editions) of the weekly Javanese-language magazine Panjebar Semangat. From all of the magazine’s content categories (popular articles, literary works, and dialectical articles), approximately 900 popular articles were selected to represent the most common and accessible form of written Javanese in the media, namely the Mataram style.
Intended Use
This dataset is intended to support the preservation and dissemination of the Javanese language as an indigenous language, particularly through educational, linguistic, and cultural research. It may be used for language learning materials and other applications that promote understanding and teaching of Javanese language usage and expression, with attribution as requested.
Metadata
Panjebar Semangat is a Javanese-language weekly magazine that has been published since 1933. It was founded during Indonesia’s pre-independence period as part of a broader movement to educate the indigenous Javanese population, and was initiated by the national hero Dr. Soetomo. Its readership spans all age groups, ranging from 13 to 80 years old.
At the time this dataset was uploaded, Panjebar Semangat had just reached 92 years of age. As an ever-evolving magazine, Panjebar Semangat is open to various forms of collaboration. Individuals, researchers, and institutions are welcome to contact Panjebar Semangat to explore collaborations related to dataset development or other initiatives. Collaboration requests can be sent by email to digital@panjebarsemangat.id.
This dataset, which comprises popular articles selected from 156 weekly editions (2023 to 2025), was made possible through the work of Panjebar Semangat’s editorial team (Donny Toenggoel, Kukuh Setyo Wibowo, D. S. Elisabet Novililiana, and Ahmad Rizky Wahyudi) and the dedicated work of various contributors.
Language
The language used in Panjebar Semangat magazine represents the Mataram style of Javanese language. Its writers are linguists, language observers, literary practitioners, and Javanese cultural activists who understand the technical aspects of content creation. The content consists of reportage, opinion, and general texts written in Javanese.
Source(s)
156 editions of the weekly magazine, published over the three years from 2023 to 2025.
Domain(s)
Popular articles: General, Reportage, Opinion, Culture & Tradition, Politics & Public Affairs, Health & Lifestyle, Humor, Trivia.
Size
Around 1.7 million words.
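The word count of a local copy can be checked with a short script. The following is a minimal sketch, not the steward's own counting method: it assumes the corpus has been extracted into a directory of UTF-8 .txt files, and it counts whitespace-separated tokens, which may differ from how the figure above was computed. The function name count_words and the directory layout are illustrative assumptions.

```python
from pathlib import Path


def count_words(corpus_dir: str) -> int:
    """Count whitespace-separated tokens in every .txt file under corpus_dir.

    Assumption: files are UTF-8 encoded; undecodable bytes are replaced
    rather than raising an error.
    """
    total = 0
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # str.split() with no argument splits on any run of whitespace.
        total += len(text.split())
    return total
```

Calling count_words on the folder holding the extracted files returns a single integer. Any file-name or rubric fields stored inside the files would be counted as words too, so the result is only a rough figure.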
Structure
File name, rubric/theme, original translation
Sample
"Jakarta klebu salah siji kutha gedhe kang lagi ngadhepi ancaman gandha mau. Sebab miturut asil pengukuran rutin, lemah Jakarta rata-rata mudhun utawa ambles 5,17 cm saben taune. Satemene ora ngemungake Jakarta wae sing lemahe mudhun sethithik mbaka sethithik, nanging uga Semarang, Tokyo, Shanghai lan Bangkok."
"Taat ngakoni menawa bakal kereme Jakarta wis ngatonake tandha-tandhane. Upamane ing kampung Muarabaru banyuning segara luwih dhuwur dibandhing dharatan. Ing Penjaringan, Jakarta Lor, ana mesjid sing biyene aman, saiki wiwit keblebeg banyu. Lan sing paling nyata yaiku amblese gedhung bersejarah Onderlinge Levensverzekering Van Eigen Hulp (OLVEH) ing kawasan Kota Lama. Gedhung kuwi mudhun 90 cm saka wiwit pembangunane taun 1921, lan diukur maneh dhek 2015 kepungkur."
"Kethoprak lesung yaiku salah sijine kesenian tradhisional Jawa sing nggabungake sandiwara, musik, lan tarian. Pertunjukan iki ndhisike kerep dipentasake ing acara-acara budaya, pahargyan, uga yen ana warga masyarakat sing nduwe gawe."
Writing System
Latin alphabet (A–Z), Arabic numerals (0–9)