Common Voice Spontaneous Speech 3.0 - Scots

License:

CC0-1.0

Steward:

Common Voice

Task: ASR

Release Date: 2026-03-22

Format: MP3

Size: 228.47 MB

Description

A collection of spontaneous responses to questions in Scots (sco).

Specifics

Licensing

Creative Commons Zero v1.0 Universal (CC0-1.0)

https://spdx.org/licenses/CC0-1.0.html

Considerations

Restrictions/Special Constraints

None provided.

Forbidden Usage

It is forbidden to attempt to determine the identity of speakers in the Common Voice datasets. It is forbidden to re-host or re-share this dataset.

Processes

Intended Use

This dataset is intended to be used for training and evaluating automatic speech recognition (ASR) models. It may also be used for applications relating to computer-aided language learning (CALL) and language or heritage revitalisation.

Metadata

sco — Scots (`sco`)

This datasheet is for version sps-corpus-3.0-2026-03-09 of the Mozilla Common Voice Spontaneous Speech dataset for Scots [sco - sco]. The dataset contains 715 clips representing 11.18 hours of recorded speech (10.69 hours validated) from 21 speakers.

Language

Scots, a sister language of English spoken throughout Scotland, has a long history. It arose from northern English dialects around the 14th century and spread east and northwards, supplanting the indigenous Gaelic language and developing into a language of high social and political status, with spoken and written norms distinct from those in England. From the 17th century onwards, political, religious and social events led to a loss of status and thus a shrinking of the domains in which Scots was used. English norms replaced Scots in writing, but spoken Scots continued to be used.

Present-day Scots is characterised by the Scots Linguistic Continuum. At one end is Scottish Standard English, generally described as close to Standard English but with an overlay of distinctly Scottish sounds. At the other is Broad Scots, which is much further from Standard English and has its own words, sounds and sentence structures. In terms of social profile, Scottish Standard English is spoken by middle-class speakers and in more formal situations such as schools, while Broad Scots is spoken by working-class speakers and in informal situations such as with family and friends. Speakers may style-shift up and down the continuum according to factors such as interlocutor and context.

At the Broad Scots end of the continuum there is significant geographic diversity: speakers in Glasgow, for example, sound very different from speakers in Aberdeen. The speakers in these recordings come from a range of geographic locations and align more closely with the Broad Scots end of the continuum.

Data splits for modelling

The dataset clips are categorised by transcription status and training-set assignment. The following tables summarise the distribution.

Audio clips

Bucket	Clips	%
Transcribed & Validated	680	95.1%
Transcribed & Pending	0	0.0%
Not transcribed	35	4.9%

Training splits

Bucket	Clips	%
Train	421	58.9%
Dev	153	21.4%
Test	106	14.8%
Unassigned	35	4.9%

Training split coverage: 680 of 680 transcribed & validated clips (100.0%)
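The percentages in the two tables above are shares of all 715 clips, while the coverage line is relative to the 680 transcribed and validated clips. The following is a minimal Python sketch of that arithmetic, using only the counts printed in the "Training splits" table; the helper name `share` is illustrative and not part of any Common Voice tooling.

```python
# Clip counts copied from the "Training splits" table above.
split_counts = {"Train": 421, "Dev": 153, "Test": 106, "Unassigned": 35}


def share(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"


total_clips = sum(split_counts.values())             # every clip in the release
assigned = total_clips - split_counts["Unassigned"]  # clips in train, dev or test

# Reproduce the table rows: bucket, clip count, share of all clips.
for bucket, count in split_counts.items():
    print(f"{bucket}\t{count}\t{share(count, total_clips)}")

print(f"Assigned to a split: {assigned} of {total_clips}")
```

The unassigned clips correspond in number to the 35 clips listed as not transcribed, which is why the three modelling splits sum to the 680 validated clips.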

Transcriptions

The transcription system uses general Latin script.

Prompts: 47
Duration: 40234608[ms]
Avg. Transcription Len: 725
Avg. Duration: 56.27[s]
Valid Duration: 38478.56[s]
Total hours: 11.18[h]
Valid hours: 10.69[h]
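The figures above mix milliseconds, seconds and hours. As a rough check, the short Python sketch below converts between them using only the values listed above and the clip count of 715 from the "Audio clips" table; the variable names are illustrative.

```python
total_duration_ms = 40_234_608  # "Duration" above, in milliseconds
valid_duration_s = 38_478.56    # "Valid Duration" above, in seconds
total_clips = 715               # from the "Audio clips" table

total_hours = total_duration_ms / 1000 / 3600   # ms -> s -> h
valid_hours = valid_duration_s / 3600           # s -> h
avg_duration_s = total_duration_ms / 1000 / total_clips

print(f"Total hours: {total_hours:.2f}")
print(f"Valid hours: {valid_hours:.2f}")
print(f"Avg. duration: {avg_duration_s:.2f} s")
```

Rounded to two decimal places, these reproduce the total hours, valid hours and average duration reported above.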

Transcription status

Bucket	Clips	%
Validated	680	100.0%
Pending	0	0.0%
Edited	190	27.9%

Writing system

Present-day Scots has no written standard, and orthographic conventions vary both within and between the dialects represented. For example, can’t may be cannae or canny in Edinburgh, but canna in Aberdeen. For these transcriptions, we have followed protocols documented in previous research, e.g. the Scots Syntax Atlas (https://scotssyntaxatlas.ac.uk).

Samples

Questions

There follows a randomly selected sample of questions used in the corpus.

If you could learn any skill, what would it be?
What’s your idea of a perfect weekend?
Tell us about your favourite film and why you like it so much.
What’s the worst technological advancement in recent years?
What’s your biggest hope for the future?

Responses

There follows a randomly selected sample of transcribed responses from the corpus.

I think the most stressful part of my work is- probably be the fact that I have to organise everything. I’m a, eh, self-employed as a ceramic tiler and, eh, yeah, there can be quite a lot of organising in it and I- I think when y- you work in- in any sort of trade whether it’s joinery or- or plastering or painting or whatever, sometimes things don’t go to plan and, eh, yeah, jobs fall behind or tiles aren’t delivered and it means that, you know, it has a knock-on effect and you let people down and I think that’s what I find hard, the- the sort of time management of things and, eh, letting people down, eh, ‘cause I’m a bit of an inherent people-pleaser so, yeah, that would be the most stressful thing for me I think.
I did play the recorder at school, which wasn’t very good at it, other than that, no never ever have played musical instruments.
What are you proud of? Um, I’m proud of my two children. They put up with me every day. They forgive everything that falls through my hands that I don’t manage to keep on top of. If I make a mistake, you know, they don’t hold a grudge. They’re brilliant wee humans. So I’d have to say at the forefront of anything at all I am proud of my son and my daughter. Do you know, withoot soonding as if I’m blowing up my own arse, I'm proud of myself as well. I’ve been a single parent for a long time. Don’t have my parents around. Don’t have any family around me, eh, and I manage really, really well most days. Um, there’s always the failure days but do you know what? That’s what makes you. So, as a team I’m super proud of me and the weans. I think we, eh- we win most days in the week.
I like pretty much athing aboot farr I bide. I like the fact that you can walk onywaie, even though it’s a relatively big city, you can basically walk everywhere. I like the fact that you look ony day of the week and there’s something going on, that you can go oot to. I like the fact that athing’s on your doorstep really. I like the fact that there’s a whole load of green spaces aroond and a’, um, that’s quite unusual for a big city I think, eh, and I maistly like the folk farr I come fae, really really friendly folk, abody speaks to you on the street. The only thing I dinna like aboot it is, is that it’s no by the sea.
What an interesting thing to be asked about. I actually have a professional angle on this, which is to say that I know that most of what people are impressed by when it comes to AI is a lot of shite. Which is to say that people think that it does language very well, and it turns out it actually doesn’t really, it just does a good imitation. It doesn’t really- it imitates certain things, but it cannot imitate other things, it can only get so far. And, eh, it’s amazingly easy to dupe AI, and it’s- it’s amazing to- it’s easy- relatively easy to show its limitations when it comes to its imitation of language. So eh, think it’s going to be a bit of a false dawn you know with the mos- for the most part, it’s kind of- the bottle neck is in the data that you can feed into these things, and they’re only going to be able to do so much. But like you know, they’re good at- it’s good at doing certain pattern recognition shit, so I’m sure that people will find AI useful when it comes to fulfilling menial tasks that people would always- already do, but it’s not going to change our lives in the way that the kind of utopian nutcases in the Silicon Valley think it’s going to change it. So, fuck them.

Fields

Each row of a tsv file represents a single audio clip, and contains the following information:

client_id - hashed UUID of a given user
audio_id - numeric id for audio file
audio_file - audio file name
duration_ms - duration of audio in milliseconds
prompt_id - numeric id for prompt
prompt - question for user
transcription - transcription of the audio response
votes - number of people who approved a given transcript
age - age of the speaker [1]
gender - gender of the speaker [1]
language - language name
split - the subset of the data this clip is assigned to for modelling
char_per_sec - how many characters of transcription per second of audio
quality_tags - automated quality flags for the transcription-audio pair, separated by |; possible values:
- transcription-length - fewer than 3 characters of transcription per second of audio
- speech-rate - more than 30 characters of transcription per second of audio
- short-audio - audio shorter than 2 seconds
- long-audio - audio longer than 5 minutes
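To illustrate how the fields fit together, here is a hedged Python sketch that reads tab-separated rows, splits `quality_tags` on `|`, and re-derives the tags from the thresholds listed above. The sample rows are invented for illustration and are not taken from the corpus, `derive_tags` is a hypothetical helper, and the exact rules used by the Common Voice pipeline (for example at the boundary values) are an assumption here. Disabling CSV quoting is likewise an assumption about how the files are written.

```python
import csv
import io

# Hypothetical two-row excerpt with a subset of the columns listed above.
# The values are invented for illustration, not copied from the dataset.
SAMPLE_TSV = (
    "audio_file\tduration_ms\ttranscription\tquality_tags\n"
    "a.mp3\t1500\tAye.\tshort-audio|transcription-length\n"
    "b.mp3\t10000\t" + "x" * 120 + "\t\n"
)


def derive_tags(duration_ms: int, transcription: str) -> list[str]:
    """Re-derive quality tags from the thresholds in the field list."""
    seconds = duration_ms / 1000
    chars_per_sec = len(transcription) / seconds
    tags = []
    if chars_per_sec < 3:
        tags.append("transcription-length")
    if chars_per_sec > 30:
        tags.append("speech-rate")
    if seconds < 2:
        tags.append("short-audio")
    if seconds > 5 * 60:
        tags.append("long-audio")
    return tags


# QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
reader = csv.DictReader(
    io.StringIO(SAMPLE_TSV), delimiter="\t", quoting=csv.QUOTE_NONE
)
for row in reader:
    stored = [tag for tag in row["quality_tags"].split("|") if tag]
    derived = derive_tags(int(row["duration_ms"]), row["transcription"])
    print(row["audio_file"], sorted(stored), sorted(derived))
```

To run this against a real file, replace `io.StringIO(SAMPLE_TSV)` with an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the tsv file in the download.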

Get involved

Community links

Discussions

Contribute

Acknowledgements

Datasheet authors

Jennifer Smith <jennifer.smith@glasgow.ac.uk>

Funding

This dataset was partially funded by the Open Multilingual Speech Fund managed by Mozilla Common Voice.

Licence

This dataset is released under the Creative Commons Zero v1.0 Universal (CC0-1.0) licence. By downloading this data you agree not to attempt to determine the identity of speakers in the dataset.

Footnotes

[1] For a full list of age, gender, and accent options, see the demographics spec. These will only be reported if the speaker opted in to provide that information.