Rumantsch Grischun Newspaper Corpus
License:
CC0-1.0
Steward:
Pro Svizra Rumantscha
Task: OTH
Release Date: 11/26/2025
Format: TSV
Size: 19.03 MB
Description
6.1 million tokens in the Rumantsch Grischun variety of Romansh from the daily newspaper “La Quotidiana”.
Specifics
Metadata
Articles in Rumantsch Grischun, published in the Romansh daily newspaper La Quotidiana between 1997 and 2008. The texts in Rumantsch Grischun were automatically extracted from a mixed Romansh newspaper corpus using a Support Vector Machine trained on a smaller, manually labeled dataset.
To the extent possible under law, the newspaper’s publisher Somedia has waived all copyright and related or neighboring rights to this corpus. This work is published from Switzerland.
| Language variant | IETF BCP47 language code | Corpus size |
|---|---|---|
| Rumantsch Grischun | rm-rumgr | 6.1 million tokens |
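Since the corpus ships as TSV, a rough token count can be checked with a few lines of Python. The following is a minimal sketch, not a description of how the corpus statistics were produced: the column name `text` is a hypothetical placeholder (the actual header is not documented here), and plain whitespace splitting is an assumed tokenization that may not reproduce the stated 6.1 million figure.

```python
import csv
import io
from typing import TextIO


def count_tokens(tsv_file: TextIO, text_column: str = "text") -> int:
    """Count whitespace-separated tokens in one column of a TSV file.

    `text_column` is a hypothetical column name; inspect the real header first.
    """
    # QUOTE_NONE treats quotation marks in newspaper text as ordinary characters.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise KeyError(f"column {text_column!r} not found in {reader.fieldnames}")
    # Rows shorter than the header yield None for missing cells; count them as empty.
    return sum(len((row[text_column] or "").split()) for row in reader)


# Tiny in-memory stand-in for the real file (invented rows, not corpus data).
sample = io.StringIO("id\ttext\n1\tAllegra a tuts\n2\tBun di\n")
print(count_tokens(sample))  # prints 5
```

For the downloaded file, the same function could be called on a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the TSV was saved and UTF-8 is an assumed encoding.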