Speech Corpus of Armenian Question-Answer Dialogues

License:

GPL-3.0

Steward:

Unknown Organization

Task: ASR

Release Date: 11/8/2025

Format: WAV, TEXTGRID, TXT

Size: 2.10 GB


Description

A collection of question-answer dialogues in Western and Eastern Armenian.

Specifics

Licensing

GNU General Public License v3.0 or later (GPL-3.0)

https://spdx.org/licenses/GPL-3.0-or-later.html

Metadata

Speech corpus of Armenian question-answer dialogues

This is a corpus of elicited, controlled speech. The stimuli were a sequence of dialogues with intermittent fillers, designed to elicit the intonation patterns of questions and answers in two Armenian dialects: Western Armenian (WA) and Eastern Armenian (EA). The recordings can be used for research on intonation and prosody, forced alignment, or ASR (Automatic Speech Recognition).

The dataset is open-access and contains 8,852 dialogues comprising 23,711 utterances (individual sound files), for a total of 2.7 GB and 8.5 hours of audio. Each utterance has a sound file, a Praat TextGrid (with full linguistic annotation), and a text file with orthographic forms for easier ASR use. Pronunciation dictionaries are also provided for ASR and forced alignment. We generated a forced alignment for these recordings through cross-language alignment with Interlingual-MFA; see the alignments folder.
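To check the stated duration locally, the total length of the audio can be summed from the WAV headers. This is a minimal Python sketch using only the standard library; it assumes the .wav files are uncompressed PCM (which the standard wave module reads) and that you point it at the folder(s) holding the recordings. The function names are illustrative, not part of the dataset.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_hours(root: Path) -> float:
    """Sum the durations of all .wav files anywhere under `root`, in hours."""
    seconds = sum(wav_duration_seconds(p) for p in root.rglob("*.wav"))
    return seconds / 3600
```

For example, total_hours(Path("data")) should come out close to the 8.5 hours stated above if all recordings sit under that folder (an assumption about the layout). As a rough check, 8.5 hours over 23,711 utterances averages about 1.3 seconds per utterance.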

If you use the data in any way, please cite us as:

Chakmakjian, Samuel and Hossep Dolatian. 2022. Speech corpus of Armenian question-answer dialogues. DOI: 10.5281/zenodo.7088365

Stimuli design

A dialogue is made up of at least a question (Q) and an answer (A). Some dialogues include an interjection (I) and a negated verb (N). We call all these elements (Q, A, I, N) utterances.
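The utterance structure can be modeled as a small data type. The sketch below is hypothetical (the class and function names are not part of the dataset) and encodes only the constraint stated above: a dialogue must contain a Q and an A, while I and N are optional. It makes no assumption about the order in which utterances occur.

```python
from dataclasses import dataclass, field
from enum import Enum


class UtteranceType(Enum):
    """The four utterance labels used in the corpus description."""
    QUESTION = "Q"
    ANSWER = "A"
    INTERJECTION = "I"
    NEGATED_VERB = "N"


@dataclass
class Dialogue:
    """A dialogue as a list of utterance types (hypothetical model)."""
    utterances: list[UtteranceType] = field(default_factory=list)

    def is_valid(self) -> bool:
        # Every dialogue needs at least a question and an answer;
        # interjections (I) and negated verbs (N) are optional.
        return (UtteranceType.QUESTION in self.utterances
                and UtteranceType.ANSWER in self.utterances)


def parse_dialogue(codes: str) -> Dialogue:
    """Build a Dialogue from a string of label codes such as 'QA'.

    Raises ValueError for any character that is not Q, A, I, or N.
    """
    return Dialogue([UtteranceType(c) for c in codes])
```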

The question and answer were SOV sentences. The dialogues were of three types, each with a different position of focus. Focus was either on the subject, object, or verb. Dialogues also varied in the choice of the object word. The object word could have either final stress, penultimate stress, or initial stress.
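Assuming the focus position and the stress pattern of the object word were fully crossed (the description lists both factors but does not state that every combination was recorded), the design yields 3 × 3 = 9 conditions. A short Python sketch that enumerates them; the label strings are illustrative, not the values used in utterance-metadata.

```python
from itertools import product

# Design factors as described above; label strings are illustrative only.
FOCUS_POSITIONS = ("subject", "object", "verb")
OBJECT_STRESS = ("final", "penultimate", "initial")


def design_conditions() -> list[tuple[str, str]]:
    """Cross every focus position with every object-word stress position."""
    return list(product(FOCUS_POSITIONS, OBJECT_STRESS))
```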

The file utterance-metadata (provided in Excel and TSV versions) has metadata on the conditions for each recorded utterance.
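The TSV version of the metadata can be read with Python's csv module. This is a sketch under assumptions: the file name utterance-metadata.tsv, a header row, and UTF-8 encoding (with or without a byte-order mark) are all assumed, and the column name passed to count_by, for example a hypothetical "focus" column, must be replaced with a real header from the file.

```python
import csv
from collections import Counter
from pathlib import Path


def load_tsv(path: Path) -> list[dict[str, str]]:
    """Read a tab-separated file with a header row into a list of dicts."""
    # utf-8-sig also strips a byte-order mark if a spreadsheet export added one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def count_by(rows: list[dict[str, str]], column: str) -> Counter:
    """Count how many rows take each value of `column`."""
    return Counter(row[column] for row in rows)
```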

Materials

Recordings were made with 19 speakers: 10 for Eastern Armenian (5 female, 5 male) and 9 for Western Armenian (5 female, 4 male). In terms of origin, the Eastern Armenian speakers were from Yerevan, Armenia, while the Western Armenian speakers were from Aleppo, Syria. All 19 speakers were living in Yerevan during the time of the recording. Speaker metadata is in file speaker-metadata (in Excel and TSV versions).

The participants were recorded reading the dialogues from a PowerPoint presentation. During annotation, we split each dialogue into its component utterances (Q, A, I, N) using a Praat script. Each utterance is stored in the repository as a sound file (.wav), a Praat TextGrid (.TextGrid), and a transcript file (.txt). The data is in the data folder.
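Because every utterance is stored as three files that differ only in extension, a quick completeness check is to group files by their path without the extension. A minimal sketch, assuming the extensions are spelled exactly .wav, .TextGrid, and .txt (case-sensitive); the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# The three files expected for each utterance.
EXPECTED = {".wav", ".TextGrid", ".txt"}


def group_utterances(root: Path) -> dict[Path, set[str]]:
    """Map each utterance (its path minus the extension) to the extensions found."""
    groups: dict[Path, set[str]] = defaultdict(set)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in EXPECTED:
            groups[path.with_suffix("")].add(path.suffix)
    return dict(groups)


def incomplete(groups: dict[Path, set[str]]) -> dict[Path, set[str]]:
    """Return utterances missing at least one expected file, with what is missing."""
    return {stem: EXPECTED - exts for stem, exts in groups.items() if exts != EXPECTED}
```

Running incomplete(group_utterances(Path("data"))) would list any utterance missing its sound file, TextGrid, or transcript.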

We annotated the recordings for quality. Most recordings had little to no disfluency or background noise. These recordings are in the data-few-issues directory.

Some recordings, however, did have such problems. Files with a mild or moderate issue were annotated with the symbol _? and placed in data-moderate-issues; files with a severe issue were annotated with _0 and placed in data-severe-issues. The problems are listed below:

  • Mild or moderate issues:

    • focus-unclear: The intonation is ambiguous.

    • laughing: The participant is laughing.

    • noise-mild: There is mild background noise.

    • pause-mild: There is a small felicitous pause in the middle of the sentence.

    • pause-noise-mild: There is both mild background noise and a small pause.

    • unclear-segments: A segment was pronounced unclearly.

  • Severe issues:

    • focus-wrong-intonation: The participant used the wrong intonation.

    • noise-extreme: There is extreme background noise.

    • pause-extreme: There is a long infelicitous pause in the middle of the sentence.

    • pause-noise-extreme: There is both extreme noise and a long pause.

    • not-template: The utterance was misread in a way that doesn't fit into our templates, such as omitting the subject.

    • stutter-or-missing-sound: The participant stuttered in speech or omitted a sound.
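Since each quality level has its own directory, a file's tier can be read from its path. The sketch below relies only on the three directory names given above; it deliberately does not parse the _? and _0 symbols, because the description does not say exactly where in a file name they appear. The tier labels and function name are illustrative.

```python
from pathlib import Path
from typing import Optional

# Directory names as given in the dataset description; labels are illustrative.
TIERS = {
    "data-few-issues": "few",
    "data-moderate-issues": "moderate",
    "data-severe-issues": "severe",
}


def quality_tier(path: Path) -> Optional[str]:
    """Return the quality tier implied by the directory a file sits in, if any."""
    for part in path.parts:
        if part in TIERS:
            return TIERS[part]
    return None  # not inside any of the three quality directories
```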

Recommendations

The recordings can be used for different purposes. We plan to use them for work on intonational phonetics and forced alignment. For phonetic studies, recordings with no or only moderate issues may be suitable, but recordings with severe issues are not recommended. For forced alignment, however, recordings with severe issues might still be useful for preventing overfitting or for accommodating noisy data.
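The recommendations above translate into a simple selection rule: exclude data-severe-issues for phonetic studies, keep it for forced alignment. A sketch of that rule follows; the purpose labels "phonetics" and "alignment" and the function name are hypothetical, and the directory names are the ones listed earlier in this description.

```python
from pathlib import Path
from typing import Iterator

# Quality directories to draw from for each use case (purpose labels are hypothetical).
PURPOSE_DIRS = {
    "phonetics": ("data-few-issues", "data-moderate-issues"),
    "alignment": ("data-few-issues", "data-moderate-issues", "data-severe-issues"),
}


def wav_files_for(root: Path, purpose: str) -> Iterator[Path]:
    """Yield .wav files under `root` whose quality directory suits `purpose`."""
    allowed = set(PURPOSE_DIRS[purpose])  # KeyError for an unknown purpose
    for path in sorted(root.rglob("*.wav")):
        # Compare only the part of the path below `root`.
        if allowed.intersection(path.relative_to(root).parts):
            yield path
```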

The transcript files (.txt) are provided to make forced alignment tasks easier. The pronunciation dictionaries for Western Armenian (word-pronunciations-WA.tsv) and Eastern Armenian (word-pronunciations-EA.tsv) are intended for forced alignment.
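Before aligning, it helps to confirm that every word in the transcripts has a dictionary entry. This sketch assumes each line of word-pronunciations-WA.tsv or word-pronunciations-EA.tsv holds a word, a tab, and a space-separated phone string, with no header row; check the actual files and adjust if the layout differs. Words that occur on several lines are kept as alternative pronunciations. The function names are hypothetical.

```python
from pathlib import Path


def load_lexicon(path: Path) -> dict[str, list[list[str]]]:
    """Read a word<TAB>phones file into {word: [pronunciation, ...]} (assumed layout)."""
    lexicon: dict[str, list[list[str]]] = {}
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            word, _, phones = line.partition("\t")
            lexicon.setdefault(word, []).append(phones.split())
    return lexicon


def out_of_vocabulary(transcript: str, lexicon: dict[str, list[list[str]]]) -> list[str]:
    """Return transcript words (split on whitespace) that have no lexicon entry."""
    return [w for w in transcript.split() if w not in lexicon]
```

For instance, passing the text of a hypothetical utt.txt and load_lexicon(Path("word-pronunciations-EA.tsv")) to out_of_vocabulary lists unknown words. It splits on whitespace only, so any punctuation in the transcripts would need stripping first.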