Sentence translation difficulty in English - BOUQuET
License:
CC-BY-SA-4.0
Steward:
MDC Curators
Task: NLP
Release Date: 4/3/2026
Format: TSV
Size: 85.61 KB
Description
This dataset is a collection of English sentences from the BOUQuET benchmark (1990 sentences in total), annotated with sentence translation difficulty scores on a Likert scale. The annotators are speakers of six Indigenous languages of Pakistan, who scored the sentences while translating the benchmark into their languages.
Specifics
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Considerations
Restrictions/Special Constraints
By agreeing you accept to share your contact information (email and username) with the repository authors.
- I agree not to re-host BOUQuET in places where it could be picked up by web crawlers
- If I evaluate using BOUQuET, I will ensure that its contents are not in the training data
Processes
Intended Use
This dataset is intended for evaluating models that estimate sentence difficulty in translation, and for use in language learning and teaching.
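One way to compare a model's difficulty estimates against the human scores is sketched below. It assumes the reference values come from the dataset's Average column and that the model outputs one number per sentence on the same 1 to 5 scale. The function names and the choice of metrics (mean absolute error and Pearson correlation) are illustrative assumptions, not something the dataset prescribes.

```python
from math import sqrt


def mean_absolute_error(predicted, reference):
    """Average absolute gap between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def pearson(predicted, reference):
    """Pearson correlation between predicted and reference scores."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_r)


# Made-up numbers for illustration only; they are not taken from the dataset.
model_scores = [1.5, 2.0, 4.5]
human_averages = [1.0, 2.5, 4.0]
print(mean_absolute_error(model_scores, human_averages))
print(pearson(model_scores, human_averages))
```

A rank-based measure may suit Likert data better than Pearson correlation; the structure of the comparison would stay the same.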
Metadata
### Scores
We asked translators to score each sentence as they were translating, on a scale of:
Very Easy (1)
Easy (2)
Moderate (3)
Difficult (4)
Very Difficult (5)
They were asked to take the whole translation process into account: both the difficulty of the language structures and the difficulty of the concepts or terminology.
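For code that consumes these scores, the scale above can be written down as a small enumeration. This is only a sketch: the names `Difficulty`, `LABELS`, and `label_for` are hypothetical, and it assumes scores are stored as the integers 1 to 5.

```python
from enum import IntEnum


class Difficulty(IntEnum):
    """The five-point scale described above (integer storage is an assumption)."""
    VERY_EASY = 1
    EASY = 2
    MODERATE = 3
    DIFFICULT = 4
    VERY_DIFFICULT = 5


# Human-readable labels, spelled as in the scale above.
LABELS = {
    Difficulty.VERY_EASY: "Very Easy",
    Difficulty.EASY: "Easy",
    Difficulty.MODERATE: "Moderate",
    Difficulty.DIFFICULT: "Difficult",
    Difficulty.VERY_DIFFICULT: "Very Difficult",
}


def label_for(score: int) -> str:
    """Return the label for a score; raises ValueError outside 1..5."""
    return LABELS[Difficulty(score)]


print(label_for(4))
```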
### Annotators
- kxp: Speaker of Wadiyara Koli
- bsh: Speaker of Kateviri
- kls: Speaker of Kalasha
- ydg: Speaker of Yagdha
- bft: Speaker of Balti
- skr: Speaker of Saraiki
### Columns:
- Sent-ID: The sentence ID
- Lang-ID: The language ID, in this case eng_Latn (English in Latin script)
- Domain: The domain of the sentence
- Source sentence: The source sentence in English (may be a translation)
- kxp: Difficulty scores from speaker of Wadiyara Koli
- bsh: Difficulty scores from speaker of Kateviri
- kls: Difficulty scores from speaker of Kalasha
- ydg: Difficulty scores from speaker of Yagdha
- bft: Difficulty scores from speaker of Balti
- skr: Difficulty scores from speaker of Saraiki
- Average: The average score
- Stdev: Standard deviation of the scores
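The column layout above can be read with Python's standard `csv` module. The sketch below assumes the file has a header row with exactly these column names and that every annotator cell holds a number. It recomputes a mean and a sample standard deviation per sentence; the card does not say whether the Stdev column uses the sample or the population formula, so treat that choice as an assumption. The sample row is made up for illustration.

```python
import csv
import io
from statistics import mean, stdev

# Annotator columns as listed above.
ANNOTATOR_COLUMNS = ["kxp", "bsh", "kls", "ydg", "bft", "skr"]


def read_scores(tsv_file):
    """Yield (sentence ID, list of six scores) for each row of the TSV."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        scores = [float(row[col]) for col in ANNOTATOR_COLUMNS]
        yield row["Sent-ID"], scores


def summarize(scores):
    """Return (mean, sample standard deviation) of one sentence's scores."""
    return mean(scores), stdev(scores)


# In-memory stand-in for the real file; the row is invented, not dataset content.
SAMPLE = (
    "Sent-ID\tLang-ID\tDomain\tSource sentence\t"
    "kxp\tbsh\tkls\tydg\tbft\tskr\tAverage\tStdev\n"
    "demo-001\tdemo\tdemo\tA made-up sentence.\t1\t2\t3\t4\t5\t3\t\t\n"
)

for sent_id, scores in read_scores(io.StringIO(SAMPLE)):
    avg, sd = summarize(scores)
    print(sent_id, round(avg, 2), round(sd, 2))
```

With the real file, the `io.StringIO(SAMPLE)` argument would be replaced by a file opened with `open(path, encoding="utf-8", newline="")`, where `path` points at the downloaded TSV.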
### Usage and restrictions:
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
By agreeing you accept to share your contact information (email and username) with the repository authors.
- I agree not to re-host BOUQuET in places where it could be picked up by web crawlers
- If I evaluate using BOUQuET, I will ensure that its contents are not in the training data
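A first-pass check for the training-data condition could look like the sketch below. It only detects benchmark sentences that appear as whole lines in a training corpus, after normalizing whitespace and case, so it cannot catch paraphrases, translations, or sentences embedded in longer lines. The function names are hypothetical and the strings are made up.

```python
import re


def normalize(text):
    """Collapse runs of whitespace and ignore case for comparison."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def find_overlap(benchmark_sentences, training_lines):
    """Return the benchmark sentences that also occur as training lines."""
    benchmark = {normalize(s): s for s in benchmark_sentences}
    hits = set()
    for line in training_lines:
        key = normalize(line)
        if key in benchmark:
            hits.add(benchmark[key])
    return hits


# Invented strings for illustration; not dataset content.
benchmark = ["A made-up benchmark sentence.", "Another invented one."]
training = ["unrelated text", "a made-up   benchmark sentence."]
print(find_overlap(benchmark, training))
```

An empty result from a check like this is not proof that the data is clean; it only rules out verbatim line-level copies.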