Common Voice Spontaneous Speech 2.0 - Scots

License:

CC0-1.0

Steward:

Common Voice

Task: ASR

Release Date: 12/5/2025

Format: MP3

Size: 227.79 MB


Description

A collection of spontaneous spoken phrases in Scots.

Considerations

Forbidden Usage

It is forbidden to attempt to determine the identity of speakers in the Common Voice datasets. It is forbidden to re-host or re-share this dataset.

Processes

Intended Use

This dataset is intended to be used for training and evaluating automatic speech recognition (ASR) models. It may also be used for applications relating to computer-aided language learning (CALL) and language or heritage revitalisation.

Metadata

Scots — Scots (sco)

This datasheet is for version 2.0 of the Mozilla Common Voice Spontaneous Speech dataset for Scots (sco). The dataset contains 715 clips representing 12 hours of recorded speech (11 hours validated) from 21 speakers.

Language

Scots, a sister language of English spoken throughout Scotland, has a long history. It arose from northern English dialects around the 14th century and spread east and northwards, supplanting the indigenous Gaelic language and developing into a language of high social and political status, with spoken and written norms distinct from those in England. From the 17th century onwards, political, religious and social events led to a loss of status and thus a shrinking of the domains in which Scots was used; while English norms replaced Scots in writing, spoken Scots continued to be used. Present-day Scots is characterised by the Scots Linguistic Continuum, with Scottish Standard English – generally described as being close to Standard English but with an overlay of distinctly Scottish sounds – at one end and Broad Scots – much further from Standard English, with its own words, sounds and sentence structures – at the other. In terms of social profiles, Scottish Standard English is spoken by middle class speakers and in more formal situations such as in schools, while Broad Scots is spoken by working class speakers and in informal situations such as with family and friends. Speakers may style-shift up and down the continuum according to factors such as interlocutor and context. At the Broad Scots end of the continuum there is significant geographic diversity: speakers in Glasgow, for example, sound very different to speakers in Aberdeen. The speakers in these recordings are from a range of geographic locations, and align more with the Broad Scots end of the continuum.

Data splits for modelling

Split   Count
Train   388
Test    141
Dev     151
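
As a rough illustration of how the split sizes relate to one another, the sketch below computes each split's share of the assigned clips. The counts are copied from the table above; the dictionary keys and the function name are illustrative choices, not part of the dataset. Note that the three splits sum to 680 clips, fewer than the 715 clips reported in the metadata; this datasheet does not explain the difference.

```python
# Clip counts per split, copied from the table in this datasheet.
split_counts = {"train": 388, "test": 141, "dev": 151}


def split_fractions(counts):
    """Return each split's share of the total number of assigned clips."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


fractions = split_fractions(split_counts)
for name, frac in fractions.items():
    print(f"{name}: {split_counts[name]} clips ({frac:.1%})")
```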

Transcriptions

  • Prompts: 47

  • Duration: 11:10:34 [h:m:s]

  • Avg. Transcription Len: 763

  • Avg. Duration: 56.27 [s]

  • Valid Duration: 38478.564 [s]

  • Total hours: 11.18 [h]

  • Valid hours: 10.69 [h]
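
The duration figures above are stated in several units. The following sketch, using only values quoted in this datasheet, shows how they can be cross-checked against one another. The helper names are hypothetical, and the clip count of 715 is taken from the metadata paragraph.

```python
def hms_to_seconds(hms):
    """Convert an 'h:m:s' string such as '11:10:34' to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(seconds):
    """Convert seconds to an 'h:mm:ss' string, rounding to the nearest second."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


total_seconds = hms_to_seconds("11:10:34")  # "Duration" bullet
valid_seconds = 38478.564                   # "Valid Duration" bullet
clip_count = 715                            # from the metadata paragraph

total_hours = total_seconds / 3600
valid_hours = valid_seconds / 3600
avg_clip_seconds = total_seconds / clip_count

print(f"Total hours: {total_hours:.2f}")
print(f"Valid hours: {valid_hours:.2f}")
print(f"Valid duration: {seconds_to_hms(valid_seconds)}")
print(f"Average clip: {avg_clip_seconds:.2f} s")
```

Under these assumptions the computed values agree with the listed total hours, valid hours and average duration.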

The transcription system uses general Latin script.

The speakers are from across Scotland, representing the different dialect areas therein.

Writing system

Present-day Scots has no written standard, and orthographic conventions vary both within and between the different dialects represented. For example, can’t may be cannae or canny in Edinburgh, but canna in Aberdeen. For these transcriptions, we have followed protocols documented in previous research, e.g. https://scotssyntaxatlas.ac.uk

Samples

Questions

There follows a randomly selected sample of questions used in the corpus.

What’s the best gig you’ve ever been to?
How do you try and save money?
What’s the worst technological advancement in recent years?
Tell us about your pets.
What technological advancement would you like to see in the coming years?

Responses

There follows a randomly selected sample of transcribed responses from the corpus.

What’s the biggest gig you’ve ever been to? I’m no a fan of big gigs. I like wee toaty spaces. Wee dark dingy spaces where you can actually see into the eyeballs of the folk that’s singing in front of you or playing in front of you. Eh, I absolutely love that, eh, but I’d probably say the biggest gig I’ve been to was when I took my girl, um, to see Pink. It was at Hampden Park. It was massive. Eh, she was a wee dot millions of miles away it felt like but I really enjoyed being there with my girl. She’s a Pink lover, um, and it was great for her to experience it. I did tell her it’s a’ doonhill from that, though. Hampden. Pink. Um, I don’t know where we go from there but it was a good gig. It was a laugh, um, and it was really nice to experience it with my daughter, so it was. Eh and the joy in her eyes, you know, made it worthwhile. 
Um, I don't save money. I - I don’t earn enough money to save any money. Um, I live very much pay check to pay check. Um, yeah. And I’ve been in that kind of situation since- since like forever, to be honest. Um, when I was- like before I moved out when I was seventeen I did make like a savings account and I would put money like from my part-time job into it. But as soon as I was like out and paying rent, I’ve never once earned enough money to save. And um, it’s a bit scary to be honest. Like maybe I could make some kinda, I don’t know, cuts to other parts of my life in order to save a bit of money but I don’t do anything particularly like exciting. Like I’m not spending crazy money on hobbies or going out places or going on holidays. Like, I don’t know. It'd be nice to save but that would be at the expense of just leading the most boring, depressing life. And I'd rather enjoy even a tiny bit of stuff now than just be worried the whole time about saving. Like, it’s a’ready bleak enough. Yeah. I would like to save, though. I would like to earn enough money to save.
Well, as I said earlier on I’m no really a Luddite. I just kind of try to embrace ony changes that have taen place. I’m no- I was a teacher but I’m no anti-phone or onything like that. I think these are things that are a huge advantage for younger folk. Eh, I’m no convinced at a’ by- probably ‘cause I bide in the sticks- I’m no convinced at a’ aboot electric cars. Eh, thinking of changing my car just noo but I’ll definitely stay the petrol, I think. I hear some horror stories aboot electrical- electric cars and I dinna think I would buy ain.
So I have a little dog called Harvey. He’s a Bichon Frisé, eh, and he’s about fifteen year old, I think now, so yeah. He's a- h- he’s a lovely wee dog. Eh, we got him because, eh, we're all- we’re all asthmatic in the household and he doesn’t moult, he’s just like got this sort of woolly- woolly fleece, so- so that’s Harvey. We did have a couple of, eh, cats in the house until recently. Eh, never keen on having cats but my son and his girlfriend persuaded us to have them and, eh, yeah, now that they’ve gone I actually really miss them, so been totally converted in- from a dog person to a- a cat lover as well.
A personal cleaner [laugh]. A robot that cleans absolutely everything, no just hoover your floor, someone who’ll brush a' your carpets and do a' your dusting in between a’ the heaters, and wash your floors, and yeah something like that would be really good, and having your meals cooked for you walking in the door, rather than just a hot-pot where you can put it on, ‘cause I’m too feared to leave it on, and go oot anyway, so I dinnae use- I have got a hot-pot and I’ll only use it when I’m in the house, I am too feared to leave it on when I’m not. 

Fields

Each row of a tsv file represents a single audio clip, and contains the following information:

  • client_id - hashed UUID of a given user

  • audio_id - numeric id for audio file

  • audio_file - audio file name

  • duration_ms - duration of audio in milliseconds

  • prompt_id - numeric id for prompt

  • prompt - question for user

  • transcription - transcription of the audio response

  • votes - number of people who approved a given transcript

  • age - age of the speaker [1]

  • gender - gender of the speaker [1]

  • language - language name

  • split - for data modelling, which subset of the data does this clip pertain to

  • char_per_sec - how many characters of transcription per second of audio

  • quality_tags - some automated assessment of the transcription--audio pair, separated by |

    • transcription-length - fewer than 3 characters of transcription per second of audio

    • speech-rate - more than 30 characters of transcription per second of audio

    • short-audio - audio length under 2 seconds

    • long-audio - audio length over 30 seconds
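
The field list above can be read with Python's standard `csv` module. The sketch below is a minimal example and not an official loader: the two sample rows are invented for illustration, the column order is assumed to follow the list above, and `expected_tags` simply re-applies the four thresholds as they are described here, which may differ from how the tags are actually generated.

```python
import csv
import io

# Hypothetical two-row sample, invented for illustration only.
SAMPLE_TSV = (
    "client_id\taudio_id\taudio_file\tduration_ms\tprompt_id\tprompt\t"
    "transcription\tvotes\tage\tgender\tlanguage\tsplit\tchar_per_sec\tquality_tags\n"
    "abc123\t1\tclip-1.mp3\t56000\t7\tTell us about your pets.\t"
    "So I have a little dog.\t2\t\t\tScots\ttrain\t0.41\ttranscription-length|long-audio\n"
    "def456\t2\tclip-2.mp3\t5000\t3\tHow do you try and save money?\t"
    "Um, I don't save money.\t1\t\t\tScots\tdev\t4.60\t\n"
)


def parse_tags(field):
    """Split a '|'-separated quality_tags field into a list (empty if no tags)."""
    return [tag for tag in field.split("|") if tag]


def expected_tags(char_per_sec, duration_ms):
    """Re-apply the thresholds described in the field list (an assumption)."""
    tags = []
    if char_per_sec < 3:
        tags.append("transcription-length")
    if char_per_sec > 30:
        tags.append("speech-rate")
    if duration_ms < 2_000:
        tags.append("short-audio")
    if duration_ms > 30_000:
        tags.append("long-audio")
    return tags


def load_rows(handle):
    """Read TSV rows and convert the numeric and tag columns to Python types."""
    # QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        row["duration_ms"] = int(row["duration_ms"])
        row["char_per_sec"] = float(row["char_per_sec"])
        row["quality_tags"] = parse_tags(row["quality_tags"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["audio_file"], row["split"], row["quality_tags"])
```

To read a real file, the `io.StringIO` object would be replaced by a file handle opened with `encoding="utf-8"` and `newline=""`; the actual file name depends on the downloaded archive.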

Get involved!

Community links

Contribute

Acknowledgements

Datasheet authors

Funding

This dataset was partially funded by the Open Multilingual Speech Fund managed by Mozilla Common Voice.

Licence

This dataset is released under the Creative Commons Zero (CC0-1.0) licence. By downloading this data you agree not to determine the identity of speakers in the dataset.

Footnotes

  1. For a full list of age, gender, and accent options, see the demographics spec. These will only be reported if the speaker opted in to provide that information.