Corpus of Panjebar Semangat Javanese-Language Magazine

License:

CC-BY-SA-4.0

Steward:

PT Pancaran Semangat Jaya

Task: OTH

Release Date: 1/9/2026

Format: TXT

Size: 4.31 MB


Description

This dataset is a TXT-format collection of popular articles drawn from three years of the Javanese-language weekly magazine Panjebar Semangat. It brings together widely read, non-academic Javanese texts that reflect contemporary themes and language use.

Specifics

Licensing

Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

https://spdx.org/licenses/CC-BY-SA-4.0.html

Considerations

Restrictions/Special Constraints

This dataset may be copied and redistributed in any form or medium, for non-commercial or commercial purposes, provided that appropriate attribution is given to the dataset creator and owner, Panjebar Semangat. Commercial use additionally requires explicit permission by email. Commercial use, resale, or distribution without attribution and such permission is strictly prohibited.

Forbidden Usage

Using this dataset to train chatbots or large language models is strictly forbidden, except for educational purposes. In addition, this dataset must not be used for: 1. misrepresentation or distortion of Javanese language or culture; 2. surveillance, profiling, or harmful social applications; 3. AI system training that erases attribution or provenance; 4. any use that conflicts with community values or cultural integrity.

Processes

Ethical Review

The dataset was curated through an internal self-review process from three years (156 editions) of the weekly Javanese-language magazine Panjebar Semangat. From all of the magazine’s content categories (popular articles, literary works, and dialectical articles), approximately 900 popular articles were selected to represent the most common and accessible form of written Javanese in the media, the Mataram style.

Intended Use

This dataset is intended to support the preservation and dissemination of the Javanese language as an indigenous language, particularly through educational, linguistic, and cultural research. It may be used for language learning materials and other applications that promote understanding and teaching of Javanese language usage and expression, with attribution as requested.

Metadata

Panjebar Semangat is a Javanese-language weekly magazine that has been published since 1933. It was founded during Indonesia’s pre-independence period as part of a broader movement to educate the indigenous Javanese population, and was initiated by the national hero Dr. Soetomo. Its readership spans all age groups, ranging from 13 to 80 years old.

At the time this dataset was uploaded, Panjebar Semangat had just reached 92 years of age. As an ever-evolving magazine, Panjebar Semangat is open to various forms of collaboration. Individuals, researchers, and institutions are welcome to reach out to Panjebar Semangat to explore collaborations related to dataset development or other initiatives. Collaboration requests can be sent by email to digital@panjebarsemangat.id.

This dataset, which comprises popular articles selected from 156 weekly editions (2023 to 2025), was made possible through the work of Panjebar Semangat’s editorial team (Donny Toenggoel, Kukuh Setyo Wibowo, D. S. Elisabet Novililiana, and Ahmad Rizky Wahyudi) as well as the dedicated work of various contributors.

Language

The language used in Panjebar Semangat magazine represents the Mataram style of the Javanese language. Its writers are linguists, language observers, literary practitioners, and Javanese cultural activists who understand the technical aspects of content creation. The content consists of reportage, opinion, and general texts written in Javanese.

Source(s)

156 editions of the weekly magazine, published from 2023 to 2025 (three years).

Domain(s)

Popular articles: General, Reportage, Opinion, Culture & Tradition, Politics & Public Affairs, Health & Lifestyle, Humor, Trivia.

Size

Around 1.7 million words.
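
The stated word count can be checked roughly against a local copy of the TXT files. The following is a minimal sketch, not the steward's counting method. The directory argument, the UTF-8 encoding, the `.txt` extension, and the definition of a "word" are all assumptions, since none of them are specified here, so the total may differ from the figure above.

```python
import re
from pathlib import Path

# Assumption: a "word" is a run of letters, optionally joined by hyphens or
# apostrophes (so "tandha-tandhane" counts once). Numbers are not counted.
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def count_words(text: str) -> int:
    """Count words in a string using the assumed word definition above."""
    return len(WORD_RE.findall(text))


def count_corpus_words(corpus_dir: str) -> int:
    """Sum word counts over every .txt file under corpus_dir (hypothetical layout)."""
    total = 0
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        # Assumption: files are UTF-8 encoded.
        total += count_words(path.read_text(encoding="utf-8"))
    return total
```

Calling `count_corpus_words` with the path to the unpacked dataset returns a single integer. A different tokenization rule (for example, counting numbers as words) would change the result.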

Structure

File name, rubric/theme, original translation

Sample

"Jakarta klebu salah siji kutha gedhe kang lagi ngadhepi ancaman gandha mau. Sebab miturut asil pengukuran rutin, lemah Jakarta rata-rata mudhun utawa ambles 5,17 cm saben taune. Satemene ora ngemungake Jakarta wae sing lemahe mudhun sethithik mbaka sethithik, nanging uga Semarang, Tokyo, Shanghai lan Bangkok."

"Taat ngakoni menawa bakal kereme Jakarta wis ngatonake tandha-tandhane. Upamane ing kampung Muarabaru banyuning segara luwih dhuwur dibandhing dharatan. Ing Penjaringan, Jakarta Lor, ana mesjid sing biyene aman, saiki wiwit keblebeg banyu. Lan sing paling nyata yaiku amblese gedhung bersejarah Onderlinge Levensverzekering Van Eigen Hulp (OLVEH) ing kawasan Kota Lama. Gedhung kuwi mudhun 90 cm saka wiwit pembangunane taun 1921, lan diukur maneh dhek 2015 kepungkur."

"Kethoprak lesung yaiku salah sijine kesenian tradhisional Jawa sing nggabungake sandiwara, musik, lan tarian. Pertunjukan iki ndhisike kerep dipentasake ing acara-acara budaya, pahargyan, uga yen ana warga masyarakat sing nduwe gawe."

Writing System

Latin alphabet (A–Z), Arabic numerals (0–9)
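
Users who want to confirm that a text sample stays within the stated writing system could run a check along these lines. This is an illustrative sketch only: the function name is hypothetical, and it assumes that "A–Z" covers both upper and lower case and that punctuation and whitespace are out of scope.

```python
import string
from typing import Set

# Assumption: the allowed set is ASCII letters (both cases) plus digits 0-9.
ALLOWED = set(string.ascii_letters + string.digits)


def unexpected_characters(text: str) -> Set[str]:
    """Return letters or digits in text that fall outside the allowed set.

    Punctuation and whitespace are ignored, because the writing-system
    statement above only names letters and numerals.
    """
    return {ch for ch in text if ch.isalnum() and ch not in ALLOWED}
```

An empty result means every letter and digit in the sample is within A–Z and 0–9. Any returned characters (for example, accented vowels) simply indicate text that goes beyond the stated set, and say nothing about whether it is an error.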

Useful Link

https://panjebarsemangat.id