Rumantsch Grischun Newspaper Corpus

License: CC0-1.0

Steward: Pro Svizra Rumantscha

Task: OTH

Release Date: 11/26/2025

Format: TSV

Size: 19.03 MB


Description

The corpus contains 6.1 million tokens in the Rumantsch Grischun variety of Romansh, taken from the daily newspaper “La Quotidiana”.

Specifics

Licensing

Creative Commons Zero v1.0 Universal (CC0-1.0)

https://spdx.org/licenses/CC0-1.0.html

Metadata

Articles in Rumantsch Grischun, published in the Romansh daily newspaper La Quotidiana between 1997 and 2008. The texts in Rumantsch Grischun were automatically extracted from a mixed Romansh newspaper corpus using a Support Vector Machine trained on a smaller, manually labeled dataset.

To the extent possible under law, the newspaper’s publisher Somedia has waived all copyright and related or neighboring rights to this corpus. This work is published from Switzerland.

Language variant: Rumantsch Grischun
IETF BCP47 language code: rm-rumgr
Corpus size: 6.1 million tokens
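
Because the corpus is distributed as TSV, a rough token count can be checked with the Python standard library. The following is a minimal sketch, not a documented loader: the column layout of the file is not described here, so the `text_column` index, the helper name `count_tokens`, and the two-row sample are all assumptions made for illustration, and whitespace splitting may not match the tokenization behind the 6.1 million figure.

```python
import csv
import io


def count_tokens(tsv_file, text_column=0):
    """Count whitespace-separated tokens in one column of a TSV stream.

    text_column is an assumed index; adjust it to the real file layout.
    """
    # QUOTE_NONE treats quotation marks in article text as ordinary characters.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = 0
    for row in reader:
        # Skip rows that are too short to contain the text column.
        if len(row) > text_column:
            total += len(row[text_column].split())
    return total


# Hypothetical two-row sample; the real file's columns may differ.
sample = io.StringIO("id-1\tfirst placeholder text\nid-2\tsecond one\n")
print(count_tokens(sample, text_column=1))
```

To run this on the downloaded file, pass a handle opened with `encoding="utf-8"` and `newline=""` in place of the in-memory sample.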