Common Voice Spontaneous Speech 1.0 - Scots

Locale: sco

Size: 228 MB

Task: ASR

Format: MP3

License: CC-0


Scots — Scots (sco)

This datasheet has been generated automatically. We would love to include more information; if you would like to help out, get in touch!

This datasheet is for version 23.0 of the Mozilla Common Voice Spontaneous Speech dataset for Scots (sco). The dataset contains 715 clips representing 12 hours of recorded speech (11 hours validated) from 21 speakers.

Demographic information

The dataset includes the following distribution of age and gender.

Gender

Self-declared gender information; frequency refers to the number of clips annotated with this gender.

Age

Self-declared age information; frequency refers to the number of clips annotated with this age band.
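Because each TSV row is one clip, the gender and age frequencies described above can be recomputed by counting rows per value. A minimal sketch, assuming the column names listed under Fields; the helper names `read_tsv` and `clip_frequency` are hypothetical, and blank values are grouped as "not reported" because these fields are only filled in for speakers who opted in.

```python
import csv
from collections import Counter
from typing import Iterable


def read_tsv(path: str) -> list[dict]:
    """Load a tab-separated metadata file into one dict per clip."""
    # QUOTE_NONE stops quote characters inside transcriptions
    # from being treated as CSV field quoting.
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


def clip_frequency(rows: Iterable[dict], field: str) -> Counter:
    """Count clips (not speakers) per value of `field`, e.g. "gender" or "age".

    Missing or blank values are counted under "not reported".
    """
    return Counter((row.get(field) or "").strip() or "not reported" for row in rows)
```

For example, `clip_frequency(read_tsv(path), "gender")`, where `path` points at the downloaded TSV. Counting per `client_id` instead would give per-speaker rather than per-clip figures.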

Data splits for modelling

Split    Count
Train    452
Dev      161

Transcriptions

  • Prompts: 47

  • Duration: 40,234,608 ms

  • Avg. transcription length: 725

  • Avg. duration: 56.27 s

  • Valid duration: 38,478.56 s

  • Total hours: 11.18 h

  • Valid hours: 10.69 h
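The summary figures are consistent with one another: the hour totals follow from the millisecond and second durations, and the average duration follows from the total duration and the 715 clips reported in the overview. The arithmetic below checks this using only figures copied from this datasheet; the whole-hour values in the overview (12 and 11 hours) appear to be these totals rounded up.

```python
# Figures copied from this datasheet.
TOTAL_DURATION_MS = 40_234_608
VALID_DURATION_S = 38_478.56
N_CLIPS = 715

MS_PER_HOUR = 3_600_000
S_PER_HOUR = 3_600


def summary() -> dict:
    """Derive the hour totals and mean clip length, rounded to 2 decimals."""
    total_hours = TOTAL_DURATION_MS / MS_PER_HOUR
    valid_hours = VALID_DURATION_S / S_PER_HOUR
    avg_duration_s = TOTAL_DURATION_MS / N_CLIPS / 1000
    return {
        "total_hours": round(total_hours, 2),
        "valid_hours": round(valid_hours, 2),
        "avg_duration_s": round(avg_duration_s, 2),
    }


print(summary())  # {'total_hours': 11.18, 'valid_hours': 10.69, 'avg_duration_s': 56.27}
```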

Questions

There follows a randomly selected sample of questions used in the corpus.

Describe where you grew up.
If you could learn any skill, what would it be?
What’s the most precious object you own?
What was your favourite subject in school and why?
What’s the most expensive thing you’ve ever bought?

Responses

There follows a randomly selected sample of transcribed responses from the corpus.

Some of the things that I like about where I live is that it’s a quiet area, eh, and you know the neighbours are pretty nosey, which is good, eh and they look after, eh, me and my daughter. Eh, I like that it’s away from the main road, I like that it’s quiet, eh, and I’ve always felt safe there. Eh, what I don’t like about it, eh, that’s really quite hard to answer, ‘cause I can’t think of anything. Eh, no can’t think of anything about- in general in Aberdeen I think eh, I like living here ‘cause it’s all I’ve known, eh, so I feel safe and, eh, secure here. What I don’t like about it is that it’s not very like nice to look at these days, there’s a lot of empty buildings, eh, and it doesn’t seem to be as, eh, productive of a place as I think it could be in terms of other cities like Glasgow and Edinburgh. Eh, so yeah, I feel like eh, Aberdeen can be a bit- I don’t know, unused I suppose to its full potential.
I wish I had travelled more, and I wish I’d actually enjoyed school better, eh, to learn a lot more. However, when I was older I preferred- I went to college, but not as in for to do anything- it was more a course that was called ‘Fresh Start for Women’ and it gave you a wee taster of different things, you know, it gave you a taster of English, eh, it- maths, which I- as I said, I absolutely hated, eh, counselling, we were- I wanted to be a counsellor, so yeah, as in to help other people. However, when I was doing my training for that, my husband was getting paid off from his work, and I knew that I had to try and get a job for- to have money coming in, so I had to give up my college course, so yeah I think I wish that I’d actually stuck at that and become a counsellor. I know they say it’s never too late, but I think in time you know, you sort of lose the- the will to actual learn new things, eh so. So that’s no going to happen now, but yeah that’s a- I think that’s one of the things that I wish that I’d done was to stay on and study for- to be a counsellor.
Well, I dinna really want to describe a typical day in my job because I’m retired now and, eh, I just remember it as being- I liked my job and I like the folk I work with but life was just so hectic, ken?  Um, so I could do- well, I could do a typical day in, um, when I do invigilation at the school because I do that sometimes. So, I have to be there early, I have to be there before eight o’ clock. So when it’s the exams in May, that’s alright because it’s nice and bright and lightie in the morning, um, and you get there and then you access the papers at eight o’ clock because you cannae access them mair than an hour afore the exam starts which is normally nine o’ clock. Issue a’ the papers to the invigilators and then wait for them being returned and then package them a’ up and send them off to the SQA, so a lot of admin but fine ‘cause you’re no having to do any prep work at home, you’re no taking any work home with you. You just go in, do your job and then you leave and abody’s awfu' nice.
Oh favourite toy would’ve definitely, fann I was growing up, would’ve been Subbuteo, the football game. Eh, played a lot of that with my pals as young- we were kind of fairly compulsive obsessives as young loons and used to go roond to each other’s hooses and ended up having a, eh, a- like a little league, a little tournament among half a dozen of us. So undoubtedly- I think I’ve still got it in the hoose somewaie. Undoubtedly, it would be Subbuteo.
Well, if you’d asked me this ten year ago I’d have always said a big night oot on the toon or fittever, but nooadays ‘cause I’m getting older I think that’s less and less my idea of fun. Basically, my ideal party or celebration is something that doesnae involve wearing high heels, because they’re just miserable. Eh probably my ideal party would just be at somebody’s hoose nooadays. Somebody’s hoose, eh fine and comfy, you’ve got your own drink that doesnae cost a fortune, eh you dinna hae to deal with all they drunk folk coming up to you in the pub and speaking nonsense to you, eh you’re with your own pals. Eh yeah, that’s probably my ideal party. I’ve never really had a party party, ken like some folk for their eighteenth or their twenty-first or thirtieth or fittever, they have a big party in a function hall. I’ve never done that kind of partying ‘cause I’ve- I think I would aieways be feart that nobody would come, or I wouldnae hae enough pals or I wouldnae ken enough folk to hae a decent sized party, so I’ve never had that kind of party. But yeah, I probably prefer something low-key, with folk that I like, eh folk that I feel comfortable with, and eh yeah, we can just mak our own fun.

Fields

Each row of a TSV file represents a single audio clip and contains the following information:

  • client_id - hashed UUID of a given user

  • audio_id - numeric id for audio file

  • audio_file - audio file name

  • duration_ms - duration of audio in milliseconds

  • prompt_id - numeric id for prompt

  • prompt - question for user

  • transcription - transcription of the audio response

  • votes - number of people who approved a given transcript

  • age - age of the speaker¹

  • gender - gender of the speaker¹

  • language - language name

  • split - the data subset (e.g. train or dev) that this clip belongs to, for modelling

  • char_per_sec - how many characters of transcription per second of audio

  • quality_tags - automated quality flags for the transcription-audio pair, separated by |

    • transcription-length - fewer than 3 characters of transcription per second

    • speech-rate - characters per second over 30 characters per second

    • short-audio - audio length under 2 seconds

    • long-audio - audio length over 30 seconds
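The tagging thresholds above can be expressed as a small rule function. This is a minimal sketch, not the corpus's own tagging code: it assumes that char_per_sec is the transcription's character count divided by the audio duration in seconds, and that "under" and "over" are strict bounds. The function names are hypothetical.

```python
def char_per_sec(transcription: str, duration_ms: int) -> float:
    """Characters of transcription per second of audio (assumed definition)."""
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    return len(transcription) / (duration_ms / 1000)


def quality_tags(transcription: str, duration_ms: int) -> str:
    """Return the |-separated tags that apply to one clip, or "" if none.

    Thresholds follow the datasheet: under 3 or over 30 characters per
    second, and audio under 2 s or over 30 s (strict bounds assumed).
    """
    cps = char_per_sec(transcription, duration_ms)
    seconds = duration_ms / 1000
    tags = []
    if cps < 3:
        tags.append("transcription-length")
    if cps > 30:
        tags.append("speech-rate")
    if seconds < 2:
        tags.append("short-audio")
    if seconds > 30:
        tags.append("long-audio")
    return "|".join(tags)


def parse_tags(field: str) -> list[str]:
    """Split a stored quality_tags value back into individual tags."""
    return [tag for tag in field.split("|") if tag]
```

Under this sketch, and if the 725 average transcription length is measured in characters, a clip with the corpus averages (725 characters over 56.27 s, roughly 12.9 characters per second) passes both rate checks but is tagged long-audio, which is unsurprising for spontaneous answers.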

Licence

This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data, you agree not to determine the identity of speakers in the dataset.

Footnotes

  1. For a full list of age, gender, and accent options, see the demographics spec. These are only reported if the speaker opted in to provide that information.