Putèr Newspaper Corpus
License: CC0-1.0
Steward: Pro Svizra Rumantscha
Task: OTH
Release Date: 11/26/2025
Format: TSV
Size: 8.94 MB
Description
1.3 million tokens in the Putèr variety of Romansh from the daily newspaper “La Quotidiana”.
Specifics
Metadata
The corpus consists of articles in Putèr published in the Romansh daily newspaper La Quotidiana between 1997 and 2008. The Putèr texts were automatically extracted from a mixed-variety Romansh newspaper corpus using a Support Vector Machine trained on a smaller, manually labeled dataset.
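The source does not say which features, kernel, or training setup the Support Vector Machine used. As a rough illustration only, the sketch below assumes a binary Putèr-versus-other classifier over character trigram counts, with a linear SVM trained by Pegasos (stochastic subgradient descent on the hinge loss) in plain Python. The function names and the four toy strings are hypothetical and are not drawn from the corpus.

```python
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Bag of character n-grams; padding makes word edges visible as features."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(texts, labels, lam=0.01, epochs=50, seed=0):
    """Linear SVM (no bias term) trained with Pegasos, i.e. stochastic
    subgradient descent on the L2-regularized hinge loss. Labels are +1 / -1."""
    rng = random.Random(seed)
    feats = [char_ngrams(s) for s in texts]
    order = list(range(len(feats)))
    w = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)          # Pegasos learning rate
            x, y = feats[i], labels[i]
            margin = y * dot(w, x)            # margin under the current weights
            scale = 1.0 - 1.0 / step          # equals 1 - eta * lam
            for f in w:
                w[f] *= scale                 # regularization shrinkage
            if margin < 1:                    # hinge loss active: step towards y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """+1 = classified as Putèr, -1 = classified as another variety."""
    return 1 if dot(w, char_ngrams(text)) >= 0 else -1


# Hypothetical toy training data (illustrative spellings, NOT taken from the corpus).
texts = ["üna chesa", "chesa", "ina casa", "casa"]
labels = [1, 1, -1, -1]          # +1 = Putèr, -1 = other variety
w = train_linear_svm(texts, labels)
print(predict(w, "chesas"), predict(w, "casas"))  # → 1 -1
```

Character n-grams are a common choice for telling closely related varieties apart, because spelling contrasts (here the assumed chesa/casa pair) surface directly as features. A real extraction pipeline would more likely rely on an established SVM library and report accuracy on held-out, manually labeled articles.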
To the extent possible under law, the newspaper’s publisher Somedia has waived all copyright and related or neighboring rights to this corpus. This work is published from Switzerland.
| Language variant | IETF BCP 47 language tag | Corpus size |
|---|---|---|
| Rumantsch Putèr | rm-puter | 1.3 million tokens |
