Araina Text Corpus (Occitan Aranese)
License:
CC0-1.0
Steward:
Community
Task: LM
Release Date: 3/24/2026
Format: txt
Size: 22.97 MB
Description
This text corpus includes sentences from three sources: public domain literary texts translated by Antòni Nogués, sourced from institutestudisaranesi.cat; language educational material by Jordi Suïls Subirà; and administrative proceedings from the Conselh Generau d'Aran.
Specifics
Considerations
Restrictions/Special Constraints
No restrictions.
Forbidden Usage
No forbidden usages.
Processes
Intended Use
This dataset was compiled in order to launch voice data collection in Common Voice. It can also be used for language modelling.
Metadata
The Araina Project was run by the non-profit cooperative Col·lectivaT to create a speech dataset for Aranese. These are the sentences collected and used to launch Common Voice in this variety of Occitan.
Antòni Nogués's literary works are made publicly available through Institut Aranesi under an open license and were consulted when creating this resource.
Jordi Suïls Subirà, a collaborator of the Araina Project, has permitted his works to be included in this corpus.
This corpus was prepared with support from the Department of Culture of the Government of Catalonia (Departament de Cultura de la Generalitat de Catalunya) and the General Council of Aran (Conselh Generau d'Aran).