Putèr Newspaper Corpus

License: CC0-1.0

Steward: Pro Svizra Rumantscha

Task: OTH

Release Date: 11/26/2025

Format: TSV

Size: 8.94 MB


Description

1.3 million tokens in the Putèr variety of Romansh from the daily newspaper “La Quotidiana”.
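
Since the corpus is distributed as TSV, a first sanity check might be to read the file and count rows and tokens. The sketch below is only an illustration: the column name `text`, the presence of a header row, and the file name in the usage note are assumptions, not documented properties of this release, and a plain whitespace split will not necessarily reproduce the token count stated above.

```python
import csv
import io


def count_tokens(tsv_file, text_column):
    """Return (row_count, token_count) for one column of a TSV stream.

    Tokens are approximated by splitting on whitespace, which may differ
    from the tokenization used for the official corpus statistics.
    """
    # QUOTE_NONE keeps quotation marks inside article text as literal characters.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = 0
    tokens = 0
    for record in reader:
        rows += 1
        # Missing or empty cells count as zero tokens.
        tokens += len((record.get(text_column) or "").split())
    return rows, tokens


# Stand-in data with a hypothetical two-column layout (not real corpus content).
sample = io.StringIO("id\ttext\n1\tplaceholder article one\n2\tplaceholder text\n")
rows, tokens = count_tokens(sample, "text")
print(rows, tokens)
```

To run this against the downloaded file, one would open it with something like `open("corpus.tsv", encoding="utf-8", newline="")` (file name hypothetical) and pass the actual name of the text column.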

Specifics

Licensing

Creative Commons Zero v1.0 Universal (CC0-1.0)

https://spdx.org/licenses/CC0-1.0.html

Metadata

Articles in Putèr, published in the Romansh daily newspaper La Quotidiana between 1997 and 2008. The texts in Putèr were automatically extracted from a mixed Romansh newspaper corpus using a Support Vector Machine trained on a smaller, manually labeled dataset.

To the extent possible under law, the newspaper’s publisher Somedia has waived all copyright and related or neighboring rights to this corpus. This work is published from Switzerland.

Language variant | IETF BCP47 language code | Corpus size
Rumantsch Surmiran | rm-surmiran | 2.9 million tokens