Istorima
License:
CC BY-NC-ND 4.0
Steward:
EELLAK - GreekFOSS
Task: NLP
Release Date: 3/19/2026
Format: PARQUET
Size: 416.02 MB
Description
Dataset Language: Greek
Dataset Info: This dataset consists of oral history content collected from the Istorima archive, including transcribed interviews and associated metadata. The material reflects personal narratives and life stories, primarily in Greek, covering a wide range of social, cultural, and historical topics.
Metadata Info: This dataset consists of 13,548 oral history interview records, structured as a tabular dataset with mixed data types. Each record includes a unique identifier (id) along with textual fields such as title, summary, transcription, speaker_name, and researcher_name. Additional metadata fields capture thematic and categorical information (themes, tags), geographic references (geonames, interview_place), and temporal attributes (date, published_at). The dataset also includes numerical and boolean features such as duration_minutes, is_age_restricted, and is_on_demand, as well as a language field indicating the interview language.
Dataset Statistics: Words: 96,479,186; Tokens: 138,933,365
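The record layout described above can be sketched as a typed structure. This is an illustration only: the field names come from the description, but every type and default below is an assumption (the card does not state column types), and `InterviewRecord` and `record_from_row` are hypothetical names.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class InterviewRecord:
    # Field names follow the dataset description; all types are assumed.
    id: str
    title: str = ""
    summary: str = ""
    transcription: Optional[str] = None
    speaker_name: str = ""
    researcher_name: str = ""
    themes: list = field(default_factory=list)    # assumed list-valued
    tags: list = field(default_factory=list)      # assumed list-valued
    geonames: list = field(default_factory=list)  # assumed list-valued
    interview_place: str = ""
    date: Optional[str] = None                    # assumed date string
    published_at: Optional[str] = None            # assumed timestamp string
    duration_minutes: Optional[float] = None
    is_age_restricted: bool = False
    is_on_demand: bool = False
    language: str = ""


def record_from_row(row: dict) -> InterviewRecord:
    """Build a record from a row dict, ignoring keys not listed in the description."""
    known = {f.name for f in fields(InterviewRecord)}
    return InterviewRecord(**{k: v for k, v in row.items() if k in known})
```

A row read from the Parquet file with any reader that yields dictionaries could be passed to `record_from_row`; a row without an `id` key raises `TypeError`.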
Specifics
Licensing
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.en
Considerations
Restrictions/Special Constraints
CC BY-NC-ND 4.0: non-commercial use only; no derivatives; attribution required; research and educational use; 2,392 records are age-restricted; 297 records are metadata-only; no re-identification; intellectual property remains with Istorima.
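As a rough illustration of how these constraints might be handled when preparing the data, the sketch below computes the share of flagged records and filters rows before text analysis. It assumes that a "metadata-only" record is one with an empty or missing transcription, which the card does not confirm; `share` and `unrestricted_with_text` are hypothetical helper names, and rows are plain dictionaries keyed by the documented field names.

```python
TOTAL_RECORDS = 13_548   # from the dataset description
AGE_RESTRICTED = 2_392   # from the restrictions listed above
METADATA_ONLY = 297      # from the restrictions listed above


def share(count: int, total: int = TOTAL_RECORDS) -> float:
    """Fraction of the dataset that a given record count represents."""
    return count / total


def unrestricted_with_text(rows: list) -> list:
    """Keep rows that are not age-restricted and have a non-empty transcription.

    Assumption: metadata-only records carry no transcription text.
    """
    kept = []
    for row in rows:
        if row.get("is_age_restricted", False):
            continue
        text = row.get("transcription") or ""
        if not text.strip():
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    print(f"age-restricted share: {share(AGE_RESTRICTED):.1%}")
    print(f"metadata-only share: {share(METADATA_ONLY):.1%}")
```

Whether the two groups overlap is not stated, so the two shares should not be summed to estimate the number of excluded records.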
Forbidden Usage
Prohibitions: no re-identification; no commercial use without permission; no redistribution; no harmful or misleading use; no synthetic mimicry.