Istorima

License:

CC BY-NC-ND 4.0

Steward:

EELLAK - GreekFOSS

Task: NLP

Release Date: 3/19/2026

Format: PARQUET

Size: 416.02 MB

Description

Dataset Language: Greek

Dataset Info: This dataset consists of oral history content collected from the Istorima archive, including transcribed interviews and associated metadata. The material reflects personal narratives and life stories, primarily in Greek, covering a wide range of social, cultural, and historical topics.

Metadata Info: This dataset consists of 13,548 oral history interview records, structured as a tabular dataset with mixed data types. Each record includes a unique identifier (id) along with textual fields such as title, summary, transcription, speaker_name, and researcher_name. Additional metadata fields capture thematic and categorical information (themes, tags), geographic references (geonames, interview_place), and temporal attributes (date, published_at). The dataset also includes numerical and boolean features such as duration_minutes, is_age_restricted, and is_on_demand, as well as a language field indicating the interview language.

Dataset Statistics:

Words: 96,479,186

Tokens: 138,933,365
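As a rough illustration of the schema and statistics described above, the following Python sketch lists the column names given in the description, checks a record for missing fields, and derives two averages from the quoted totals. The helper `missing_columns` and the sample record are hypothetical and not part of the dataset. The description does not say how tokens were counted, so the tokens-per-word ratio is only indicative.

```python
# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "id", "title", "summary", "transcription", "speaker_name",
    "researcher_name", "themes", "tags", "geonames", "interview_place",
    "date", "published_at", "duration_minutes", "is_age_restricted",
    "is_on_demand", "language",
]

# Totals quoted in the dataset description.
N_RECORDS = 13_548
N_WORDS = 96_479_186
N_TOKENS = 138_933_365


def missing_columns(record: dict) -> list:
    """Return the expected column names that are absent from a record.

    Hypothetical helper: `record` is assumed to be one row of the table
    represented as a plain dict keyed by column name.
    """
    return [name for name in EXPECTED_COLUMNS if name not in record]


# Averages derived purely from the quoted totals.
words_per_record = N_WORDS / N_RECORDS
tokens_per_word = N_TOKENS / N_WORDS

# Made-up record with one column deliberately left out.
sample = {name: None for name in EXPECTED_COLUMNS}
del sample["tags"]

print("missing columns:", missing_columns(sample))
print(f"average words per record: {words_per_record:.0f}")
print(f"tokens per word: {tokens_per_word:.2f}")
```

Rows in this dict form could be obtained from the Parquet file with any Parquet reader, for example pandas `read_parquet` followed by `to_dict("records")`, which needs a Parquet engine such as pyarrow installed.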

Specifics

Licensing

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.en

Considerations

Restrictions/Special Constraints

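Released under CC BY-NC-ND 4.0: non-commercial use only, no derivatives, attribution required. Intended for research and educational use. 2,392 records are age-restricted and 297 are metadata-only. Re-identification is not permitted. Intellectual property remains with Istorima.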
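For local analysis it may help to separate the age-restricted and metadata-only entries from the rest. The sketch below is one possible way to do that on records held as plain dicts. It assumes, without confirmation from the dataset description, that a metadata-only entry is one whose transcription field is empty or missing. The function `split_records` and the sample rows are hypothetical.

```python
def split_records(records):
    """Partition records into (usable, age_restricted, metadata_only).

    Assumption: "metadata-only" means the `transcription` field is
    empty or missing. The dataset description does not define the term.
    """
    usable, age_restricted, metadata_only = [], [], []
    for rec in records:
        text = rec.get("transcription") or ""
        if not text.strip():
            metadata_only.append(rec)
        elif rec.get("is_age_restricted"):
            age_restricted.append(rec)
        else:
            usable.append(rec)
    return usable, age_restricted, metadata_only


# Made-up rows for illustration only, not taken from the dataset.
rows = [
    {"id": 1, "transcription": "sample text", "is_age_restricted": False},
    {"id": 2, "transcription": "sample text", "is_age_restricted": True},
    {"id": 3, "transcription": "", "is_age_restricted": False},
]

usable, restricted, meta_only = split_records(rows)
print(len(usable), len(restricted), len(meta_only))
```

An entry that is both age-restricted and without a transcription is counted as metadata-only here, which is a design choice of this sketch and not something the dataset prescribes.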

Forbidden Usage

Prohibitions: no re-identification; no commercial use without permission; no redistribution; no harmful or misleading use; no synthetic mimicry.