Common Voice Spontaneous Speech 3.0 - Scots
License:
CC0-1.0
Steward:
Common Voice
Task: ASR
Release Date: 3/22/2026
Format: MP3
Size: 228.47 MB
Description
A collection of spontaneous responses to questions in Scots (sco).
Specifics
Considerations
Restrictions/Special Constraints
None provided.
Forbidden Usage
It is forbidden to attempt to determine the identity of speakers in the Common Voice datasets. It is forbidden to re-host or re-share this dataset.
Processes
Intended Use
This dataset is intended to be used for training and evaluating automatic speech recognition (ASR) models. It may also be used for applications relating to computer-aided language learning (CALL) and language or heritage revitalisation.
Metadata
sco — Scots (sco)
This datasheet is for sps-corpus-3.0-2026-03-09 of the Mozilla Common Voice Spontaneous Speech dataset for Scots [sco - sco]. The dataset contains 715 clips representing 11.18 hours of recorded speech (10.69 hours validated) from 21 speakers.
Language
Scots, a sister language of English spoken throughout Scotland, has a long history. It arose from northern English dialects around the 14th century and spread east and northwards, supplanting the indigenous Gaelic language and developing into a socially and politically high-status language with spoken and written norms distinct from those of England. From the 17th century onwards, political, religious and social events led to a loss of status and a shrinking of the domains in which Scots was used; while English norms replaced Scots in writing, spoken Scots remained in use.
Present-day Scots is characterised by the Scots Linguistic Continuum. At one end is Standard Scottish English, generally described as close to Standard English but with an overlay of distinctly Scottish sounds; at the other is Broad Scots, much further from Standard English, with its own words, sounds and sentence structures. Socially, Standard Scottish English is spoken by middle-class speakers and in more formal situations such as schools, while Broad Scots is spoken by working-class speakers and in informal situations such as with family and friends. Speakers may style-shift up and down the continuum depending on, among other factors, interlocutor and context. At the Broad Scots end of the continuum there is significant geographic diversity: speakers in Glasgow, for example, sound very different from speakers in Aberdeen. The speakers in these recordings come from a range of geographic locations and align more with the Broad Scots end of the continuum.
Data splits for modelling
The dataset clips are categorised by transcription status and training-set assignment. The following tables summarise the distribution.
Audio clips
| Bucket | Clips | % |
|---|---|---|
| Transcribed & Validated | 680 | 95.1% |
| Transcribed & Pending | 0 | 0.0% |
| Not transcribed | 35 | 4.9% |
Training splits
| Bucket | Clips | % |
|---|---|---|
| Train | 421 | 58.9% |
| Dev | 153 | 21.4% |
| Test | 106 | 14.8% |
| Unassigned | 35 | 4.9% |
Training split coverage: 680 of 680 transcribed & validated clips (100.0%)
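A minimal sketch of how the split table could be recomputed from the per-clip metadata described under Fields. It assumes a tab-separated file with a header row and treats a missing or empty `split` value as unassigned; both are assumptions, since the exact encoding of unassigned clips is not documented here. The helper names are hypothetical.

```python
import csv
import io
from collections import Counter


def count_splits(tsv_text: str) -> Counter:
    """Count clips per value of the `split` column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Assumption: a missing or empty `split` value marks an unassigned clip.
    return Counter((row.get("split") or "unassigned") for row in reader)


def split_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Express each bucket as a percentage of all clips, to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Using the clip counts published in the Training splits table:
published = {"train": 421, "dev": 153, "test": 106, "unassigned": 35}
print(split_percentages(published))
# {'train': 58.9, 'dev': 21.4, 'test': 14.8, 'unassigned': 4.9}
```

Percentages are taken over all 715 clips, which is why the four buckets sum to 100% while the transcribed and validated clips alone account for 95.1%.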
Transcriptions
The transcription system uses general Latin script.
Corpus statistics
| Statistic | Value |
|---|---|
| Prompts | 47 |
| Total duration | 40234608 ms |
| Avg. transcription length | 725 |
| Avg. duration | 56.27 s |
| Valid duration | 38478.56 s |
| Total hours | 11.18 h |
| Valid hours | 10.69 h |
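The hour and average figures follow from simple unit conversions. The sketch below reproduces them, assuming the average duration is taken over all 715 clips and the valid duration covers the 680 validated clips; the function name is hypothetical.

```python
MS_PER_HOUR = 3_600_000
S_PER_HOUR = 3_600


def summarise(total_ms: int, valid_s: float, n_clips: int) -> dict[str, float]:
    """Derive hour totals and mean clip length from the raw duration figures."""
    return {
        # Total duration in milliseconds divided by milliseconds per hour.
        "total_hours": round(total_ms / MS_PER_HOUR, 2),
        # Valid duration is reported in seconds, so divide by seconds per hour.
        "valid_hours": round(valid_s / S_PER_HOUR, 2),
        # Assumption: the mean is over all clips, not only validated ones.
        "avg_duration_s": round(total_ms / 1000 / n_clips, 2),
    }


print(summarise(total_ms=40_234_608, valid_s=38_478.56, n_clips=715))
# {'total_hours': 11.18, 'valid_hours': 10.69, 'avg_duration_s': 56.27}
```

That the computed mean matches the published 56.27 s is consistent with the average being taken over all clips rather than only the validated ones.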
Transcription status
| Bucket | Clips | % |
|---|---|---|
| Validated | 680 | 100.0% |
| Pending | 0 | 0.0% |
| Edited | 190 | 27.9% |
Writing system
Present-day Scots has no written standard, and orthographic conventions vary both within and between the dialects represented. For example, *can’t* may be written *cannae* or *canny* in Edinburgh but *canna* in Aberdeen. For these transcriptions, we have followed protocols documented in previous research (e.g. https://scotssyntaxatlas.ac.uk).
Samples
Questions
There follows a randomly selected sample of questions used in the corpus.
If you could learn any skill, what would it be?
What’s your idea of a perfect weekend?
Tell us about your favourite film and why you like it so much.
What’s the worst technological advancement in recent years?
What’s your biggest hope for the future?
Responses
There follows a randomly selected sample of transcribed responses from the corpus.
*I think the most stressful part of my work is- probably be the fact that I have to organise everything. I’m a, eh, self-employed as a ceramic tiler and, eh, yeah, there can be quite a lot of organising in it and I- I think when y- you work in- in any sort of trade whether it’s joinery or- or plastering or painting or whatever, sometimes things don’t go to plan and, eh, yeah, jobs fall behind or tiles aren’t delivered and it means that, you know, it has a knock-on effect and you let people down and I think that’s what I find hard, the- the sort of time management of things and, eh, letting people down, eh, ‘cause I’m a bit of an inherent people-pleaser so, yeah, that would be the most stressful thing for me I think.*
*I did play the recorder at school, which wasn’t very good at it, other than that, no never ever have played musical instruments.*
*What are you proud of? Um, I’m proud of my two children. They put up with me every day. They forgive everything that falls through my hands that I don’t manage to keep on top of. If I make a mistake, you know, they don’t hold a grudge. They’re brilliant wee humans. So I’d have to say at the forefront of anything at all I am proud of my son and my daughter. Do you know, withoot soonding as if I’m blowing up my own arse, I'm proud of myself as well. I’ve been a single parent for a long time. Don’t have my parents around. Don’t have any family around me, eh, and I manage really, really well most days. Um, there’s always the failure days but do you know what? That’s what makes you. So, as a team I’m super proud of me and the weans. I think we, eh- we win most days in the week.*
*I like pretty much athing aboot farr I bide. I like the fact that you can walk onywaie, even though it’s a relatively big city, you can basically walk everywhere. I like the fact that you look ony day of the week and there’s something going on, that you can go oot to. I like the fact that athing’s on your doorstep really. I like the fact that there’s a whole load of green spaces aroond and a’, um, that’s quite unusual for a big city I think, eh, and I maistly like the folk farr I come fae, really really friendly folk, abody speaks to you on the street. The only thing I dinna like aboot it is, is that it’s no by the sea.*
*What an interesting thing to be asked about. I actually have a professional angle on this, which is to say that I know that most of what people are impressed by when it comes to AI is a lot of shite. Which is to say that people think that it does language very well, and it turns out it actually doesn’t really, it just does a good imitation. It doesn’t really- it imitates certain things, but it cannot imitate other things, it can only get so far. And, eh, it’s amazingly easy to dupe AI, and it’s- it’s amazing to- it’s easy- relatively easy to show its limitations when it comes to its imitation of language. So eh, think it’s going to be a bit of a false dawn you know with the mos- for the most part, it’s kind of- the bottle neck is in the data that you can feed into these things, and they’re only going to be able to do so much. But like you know, they’re good at- it’s good at doing certain pattern recognition shit, so I’m sure that people will find AI useful when it comes to fulfilling menial tasks that people would always- already do, but it’s not going to change our lives in the way that the kind of utopian nutcases in the Silicon Valley think it’s going to change it. So, fuck them.*
Fields
Each row of a TSV file represents a single audio clip and contains the following fields:
- `client_id`: hashed UUID of a given user
- `audio_id`: numeric ID for the audio file
- `audio_file`: audio file name
- `duration_ms`: duration of the audio in milliseconds
- `prompt_id`: numeric ID for the prompt
- `prompt`: question put to the user
- `transcription`: transcription of the audio response
- `votes`: number of people who approved a given transcript
- `age`: age of the speaker¹
- `gender`: gender of the speaker¹
- `language`: language name
- `split`: the data-modelling subset to which this clip belongs
- `char_per_sec`: characters of transcription per second of audio
- `quality_tags`: automated assessments of the transcription–audio pair, separated by `|`:
  - `transcription-length`: under 3 characters per second
  - `speech-rate`: over 30 characters per second
  - `short-audio`: audio shorter than 2 seconds
  - `long-audio`: audio longer than 5 minutes
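The thresholds listed for `quality_tags` can be expressed as a small function. This is a sketch under stated assumptions, not the pipeline Common Voice actually uses: it assumes `char_per_sec` is the transcription's character count divided by the clip duration in seconds, and that the thresholds are strict inequalities. All function names are hypothetical.

```python
def char_per_sec(transcription: str, duration_ms: int) -> float:
    """Characters of transcription per second of audio (assumed definition)."""
    return len(transcription) / (duration_ms / 1000)


def quality_tags(duration_ms: int, cps: float) -> list[str]:
    """Flag a clip using the thresholds documented for `quality_tags`.

    Assumption: boundaries are strict (e.g. exactly 3.0 chars/s is not flagged).
    """
    tags = []
    if cps < 3:
        tags.append("transcription-length")  # transcription unusually short
    if cps > 30:
        tags.append("speech-rate")  # implausibly fast speech
    if duration_ms < 2_000:
        tags.append("short-audio")  # under 2 seconds
    if duration_ms > 5 * 60 * 1_000:
        tags.append("long-audio")  # over 5 minutes
    return tags


def parse_quality_tags(field: str) -> list[str]:
    """Split the pipe-separated `quality_tags` column into a list."""
    return [tag for tag in field.split("|") if tag]


print(quality_tags(1_500, char_per_sec("aye", 1_500)))
# ['transcription-length', 'short-audio']
```

Filtering on the parsed tags (for example, dropping clips tagged `speech-rate`) is one way such flags might be used when preparing ASR training data.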
Acknowledgements
Datasheet authors
Jennifer Smith <jennifer.smith@glasgow.ac.uk>
Funding
This dataset was partially funded by the Open Multilingual Speech Fund managed by Mozilla Common Voice.
Licence
This dataset is released under the Creative Commons Zero (CC0) licence. By downloading this data you agree not to attempt to determine the identity of speakers in the dataset.
Footnotes
¹ For a full list of age, gender, and accent options, see the demographics spec. These are only reported if the speaker opted in to provide that information.