Polish Public Domain 20th Century Literature Text Corpus
License:
CC0-1.0
Steward:
Taruen
Task: NLP
Release Date: 2/24/2026
Format: TXT
Size: 10.86 MB
Description
This corpus is a curated collection of 54 Polish prose texts (individual volumes are counted separately), covering major novels, multi-volume historical epics, and documentary prose first published between 1873 and 1946. It includes canonical works by Władysław Reymont, Stefan Żeromski, Henryk Sienkiewicz, Bolesław Prus, Józef Ignacy Kraszewski, Eliza Orzeszkowa, Tadeusz Dołęga-Mostowicz, and Zofia Nałkowska. All texts use modern Polish orthography (post-1936 standard), which keeps the corpus consistent and suitable for training contemporary language models. The corpus comprises approximately 4.2 million words across multiple plain text files, each prefaced by YAML front matter containing metadata (title, author, language, year, source URL, license). All included works are in the public domain under Polish law.
Specifics
Considerations
Restrictions/Special Constraints
None
Forbidden Usage
None
Metadata
Polish Public Domain 20th Century Literature Text Corpus
Overview
This text corpus contains a large collection of well-known Polish prose (novels, multi-volume epics, and documentary prose) first published between 1873 and 1946. All included works are in the public domain in Poland.
Statistics
Total Word Count: ~4,220,714
Language: Polish (pl)
Format: Multiple plain text files with YAML front matter
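The card does not say how the total word count was computed. As a rough sketch only, assuming words are whitespace-delimited tokens and that the YAML front matter is excluded, a comparable figure could be obtained as follows. The function names and the `*.txt` file pattern are illustrative assumptions, not part of the dataset.

```python
from pathlib import Path


def strip_front_matter(text):
    """Return the text without a leading '---' ... '---' front matter block."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[i + 1:])
    # No front matter (or no closing delimiter): leave the text unchanged.
    return text


def count_words(text):
    """Count whitespace-delimited tokens in the body (assumed definition)."""
    return len(strip_front_matter(text).split())


def corpus_word_count(directory):
    """Sum word counts over all .txt files in a directory (assumed layout)."""
    return sum(
        count_words(path.read_text(encoding="utf-8"))
        for path in Path(directory).glob("*.txt")
    )
```

A different tokenization (for example, splitting on punctuation) would give a somewhat different total, so small deviations from the published figure are to be expected.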
Included Works
Chłopi, Część pierwsza - Jesień (Władysław Reymont, 1904)
Chłopi, Część druga - Zima (Władysław Reymont, 1904)
Chłopi, Część trzecia - Wiosna (Władysław Reymont, 1906)
Chłopi, Część czwarta - Lato (Władysław Reymont, 1909)
Ziemia Obiecana, Tom 1 (Władysław Reymont, 1899)
Ziemia Obiecana, Tom 2 (Władysław Reymont, 1899)
Komediantka (Władysław Reymont, 1896)
Fermenty, Tom 1 (Władysław Reymont, 1897)
Fermenty, Tom 2 (Władysław Reymont, 1897)
Wampir (Władysław Reymont, 1911)
Bunt (Władysław Reymont, 1924)
Popioły, Tom 1 (Stefan Żeromski, 1904)
Popioły, Tom 2 (Stefan Żeromski, 1904)
Popioły, Tom 3 (Stefan Żeromski, 1904)
Przedwiośnie (Stefan Żeromski, 1924)
Ludzie bezdomni, Tom 1 (Stefan Żeromski, 1899)
Ludzie bezdomni, Tom 2 (Stefan Żeromski, 1899)
Syzyfowe prace (Stefan Żeromski, 1897)
Wierna rzeka (Stefan Żeromski, 1912)
Dzieje grzechu (Stefan Żeromski, 1908)
W pustyni i w puszczy (Henryk Sienkiewicz, 1911)
Ogniem i mieczem, Tom 1 (Henryk Sienkiewicz, 1884)
Ogniem i mieczem, Tom 2 (Henryk Sienkiewicz, 1884)
Potop, Tom 1 (Henryk Sienkiewicz, 1886)
Potop, Tom 2 (Henryk Sienkiewicz, 1886)
Potop, Tom 3 (Henryk Sienkiewicz, 1886)
Pan Wołodyjowski (Henryk Sienkiewicz, 1888)
Quo vadis (Henryk Sienkiewicz, 1896)
Krzyżacy, Tom 1 (Henryk Sienkiewicz, 1900)
Krzyżacy, Tom 2 (Henryk Sienkiewicz, 1900)
Rodzina Połanieckich (Henryk Sienkiewicz, 1894)
Bez dogmatu (Henryk Sienkiewicz, 1891)
Faraon, Tom 1 (Bolesław Prus, 1895)
Faraon, Tom 2 (Bolesław Prus, 1895)
Faraon, Tom 3 (Bolesław Prus, 1895)
Lalka, Tom 1 (Bolesław Prus, 1890)
Lalka, Tom 2 (Bolesław Prus, 1890)
Emancypantki, Tom 1 (Bolesław Prus, 1894)
Emancypantki, Tom 2 (Bolesław Prus, 1894)
Placówka (Bolesław Prus, 1886)
Zemsta (Bolesław Prus, 1908)
Stara baśń, Tom 1 (Józef Ignacy Kraszewski, 1876)
Stara baśń, Tom 2 (Józef Ignacy Kraszewski, 1876)
Stara baśń, Tom 3 (Józef Ignacy Kraszewski, 1876)
Nad Niemnem, Tom 1 (Eliza Orzeszkowa, 1888)
Nad Niemnem, Tom 2 (Eliza Orzeszkowa, 1888)
Nad Niemnem, Tom 3 (Eliza Orzeszkowa, 1888)
Cham (Eliza Orzeszkowa, 1888)
Marta (Eliza Orzeszkowa, 1873)
Kariera Nikodema Dyzmy (Tadeusz Dołęga-Mostowicz, 1932)
Znachor (Tadeusz Dołęga-Mostowicz, 1937)
Profesor Wilczur (Tadeusz Dołęga-Mostowicz, 1939)
Granica (Zofia Nałkowska, 1935)
Medaliony (Zofia Nałkowska, 1946)
Data Format and Metadata
The files are provided in plain text format. Each text is prepended with a YAML front matter block containing relevant metadata:
---
title: "Chłopi, Część pierwsza - Jesień"
author: "Władysław Reymont"
lang: "pl"
year: "1904"
source: "https://wolnelektury.pl/katalog/lektura/chlopi-czesc-pierwsza-jesien"
license: "Public Domain"
---
Field Definitions:
title: The title of the literary work.
author: The author of the work.
lang: Language code ('pl' for Polish).
year: The year of initial publication.
source: The exact URL to the source material on Wolne Lektury.
license: The copyright status of the text itself.
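As an illustration of how the front matter might be read, here is a minimal sketch that uses only the Python standard library. It assumes every metadata line has the simple `key: "value"` shape shown in the example and is not a general YAML parser. The function name `split_front_matter` is hypothetical.

```python
def split_front_matter(text):
    """Split a file's contents into (metadata dict, body text).

    Assumes the file starts with a '---' line, followed by simple
    'key: "value"' lines, followed by a closing '---' line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    # No closing delimiter found: treat the whole text as body.
    return {}, text


sample = '''---
title: "Chłopi, Część pierwsza - Jesień"
author: "Władysław Reymont"
lang: "pl"
year: "1904"
source: "https://wolnelektury.pl/katalog/lektura/chlopi-czesc-pierwsza-jesien"
license: "Public Domain"
---
Body text here.
'''
meta, body = split_front_matter(sample)
```

Note that `year` is stored as a quoted string in the front matter, so it needs an explicit `int(meta["year"])` conversion before numeric comparisons. For files with more complex metadata, a full YAML library would be the safer choice.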
Processing Methodology
Source: Texts were fetched directly from the digital library Wolne Lektury (wolnelektury.pl).
Cleaning: A processing script removed the publisher's legal footer and copyright notice from the end of each file, leaving only the public domain literary text.
Orthography: The texts utilize modern Polish orthography (post-1936 standard).
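The actual processing script is not included in this card. The cleaning step described above could be sketched as follows, assuming the publisher's footer begins at a recognizable separator line. The marker value `-----` and the function name are hypothetical and should be checked against the real source files.

```python
# Hypothetical separator that introduces the publisher's footer.
FOOTER_MARKER = "-----"


def strip_footer(text, marker=FOOTER_MARKER):
    """Drop everything from the last marker line to the end of the text.

    If the marker is not found, the text is returned unchanged.
    """
    lines = text.splitlines()
    # Search from the end so earlier separator lines inside the
    # literary text (e.g. scene breaks) are left untouched.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == marker:
            return "\n".join(lines[:i]).rstrip() + "\n"
    return text
```

Searching from the end of the file is a deliberate choice here: only the final occurrence of the separator is treated as the start of the footer.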
Copyright and License
The literary works themselves are in the public domain. Under Polish law, economic copyrights expire 70 years after the end of the year of the author's death. As of 2026, works by authors who died before 1956 are in the public domain. The authors represented in this corpus and their years of death are:
Władysław Reymont (d. 1925)
Stefan Żeromski (d. 1925)
Henryk Sienkiewicz (d. 1916)
Bolesław Prus (d. 1912)
Józef Ignacy Kraszewski (d. 1887)
Eliza Orzeszkowa (d. 1910)
Tadeusz Dołęga-Mostowicz (d. 1939)
Zofia Nałkowska (d. 1954)
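The 70-year rule stated above can be checked with a few lines of arithmetic. The sketch below assumes the simple case of a single author and a term counted from the end of the year of death, so a work enters the public domain on 1 January of the year `death_year + 71`. It is an illustration, not legal advice. The function names are hypothetical.

```python
TERM_YEARS = 70  # protection term after the year of the author's death

# Years of death as listed in this card.
DEATH_YEARS = {
    "Władysław Reymont": 1925,
    "Stefan Żeromski": 1925,
    "Henryk Sienkiewicz": 1916,
    "Bolesław Prus": 1912,
    "Józef Ignacy Kraszewski": 1887,
    "Eliza Orzeszkowa": 1910,
    "Tadeusz Dołęga-Mostowicz": 1939,
    "Zofia Nałkowska": 1954,
}


def public_domain_from(death_year, term=TERM_YEARS):
    """First calendar year in which the author's works are public domain."""
    # Protection lasts through 31 December of death_year + term.
    return death_year + term + 1


def is_public_domain(death_year, as_of_year):
    """True if the works are in the public domain during as_of_year."""
    return as_of_year >= public_domain_from(death_year)


all_clear = all(is_public_domain(year, 2026) for year in DEATH_YEARS.values())
```

Under this rule the most recent entry, Zofia Nałkowska (d. 1954), entered the public domain on 1 January 2025.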
This dataset is released for the Mozilla Data Collective to aid in the development of free/libre/open-source language technologies.
Support Wolne Lektury
The digitization and proofreading of these texts were performed by the Modern Poland Foundation. If you find this text corpus useful, please consider supporting their mission to keep literature accessible to all:
Donate: https://wolnelektury.pl/pomagam/