Common Voice v24 English - en-AU subset for Everything Open 2026
License: CC0-1.0
Steward: Common Voice
Task: ASR
Release Date: 1/21/2026
Format: CSV, MP3
Size: 1.92 GB
Description
This is a subset of Common Voice v24 English filtered for Australian-clustered accents. It is designed to be used in conjunction with the hands-on Tutorial delivered at Everything Open 2026 in Canberra, Australia.
Specifics
Considerations
Restrictions/Special Constraints
-
Forbidden Usage
It is forbidden to attempt to determine the identity of speakers in the Common Voice datasets. It is forbidden to re-host or re-share this dataset.
Processes
Ethical Review
This is a subset of Common Voice and the Common Voice collection process is documented at: https://commonvoice.mozillafoundation.org
Intended Use
This dataset is intended for use in fine-tuning automatic speech recognition systems to have better acoustic prediction on Australian English. This dataset does _not_ contain samples of **lexical** variation observed in Australian English.
Metadata
Tutorial information
Everything Open: https://2026.everythingopen.au
Tutorial overview: https://2026.everythingopen.au/schedule/presentation/6/
Tutorial GitHub repo: https://github.com/Mozilla-Data-Collective/tutorial-whisper-fine-tuning-australian-EO2026
Preprocessing information
This dataset was extracted from Common Voice v24 English by filtering on the accent field, after assessing the Australian-related accents in the dataset.
The duration of each clip was also calculated to help identify very long or very short clips; it is stored in milliseconds in the field duration_ms.
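The accent filter described above can be sketched as follows. This is a minimal illustration, not the script actually used to build the dataset: the function name filter_by_accent, the case-insensitive exact-match rule, the two example labels, and the tab-separated sample rows are all assumptions made for the example.

```python
import csv
import io

# Illustrative labels only; the full set of accents kept in this dataset
# is listed under "Accents represented".
WANTED_ACCENTS = {"Australian English", "General Australian"}


def filter_by_accent(rows, wanted=WANTED_ACCENTS):
    """Keep rows whose `accents` value matches one of the wanted labels.

    Matching is case-insensitive and exact (an assumption; the real
    preprocessing rule is not documented here).
    """
    wanted_lower = {label.lower() for label in wanted}
    return [
        row for row in rows
        if (row.get("accents") or "").strip().lower() in wanted_lower
    ]


# Made-up rows standing in for the Common Voice metadata file.
SAMPLE = (
    "path\taccents\n"
    "a.mp3\tAustralian English\n"
    "b.mp3\tUnited States English\n"
    "c.mp3\tgeneral australian\n"
    "d.mp3\t\n"
)

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
kept = filter_by_accent(sample_rows)
print([row["path"] for row in kept])  # -> ['a.mp3', 'c.mp3']
```

Rows with an empty accents value are dropped, since they cannot be attributed to any accent cluster.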
File structure
audios => contains the audio files in the format id.mp3, where id is the unique identifier of the clip.
commonvoice-v24_en-AU.csv => a CSV-formatted file.
The CSV fields are:
original row ID from Common Voice v24 English
client_id: unique identifier for each speaker
path: the filename of the audio file
sentence_id: a unique identifier for each written sentence
sentence_domain: a string description of the topic domain of the sentence (may be null)
up_votes: integer indicating how many up votes this clip has
down_votes: integer indicating how many down votes this clip has; can be used to exclude clips
age: age range of speaker, if provided (may be null)
gender: gender identity of speaker, if provided (may be null)
accents: accent descriptor
locale: ISO-639 locale (all samples in this dataset are en)
segment: not applicable to this dataset, included to provide interoperability
duration_ms: duration in milliseconds of the audio file, calculated using librosa
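As a rough sketch of how these fields might be used, the snippet below selects clips by vote counts and duration. The function name select_clips and all threshold values are hypothetical choices for illustration, not recommendations from the dataset authors. The rows are made-up stand-ins for the CSV.

```python
import csv
import io


def select_clips(rows, min_up=2, max_down=0, min_ms=1000, max_ms=15000):
    """Return rows that pass simple vote and duration thresholds.

    All thresholds are hypothetical defaults chosen for this example.
    """
    selected = []
    for row in rows:
        up = int(row["up_votes"])
        down = int(row["down_votes"])
        # duration_ms may be written as an integer or a float string
        duration = int(float(row["duration_ms"]))
        if up >= min_up and down <= max_down and min_ms <= duration <= max_ms:
            selected.append(row)
    return selected


# Made-up rows; with the real file, open commonvoice-v24_en-AU.csv
# (newline="", encoding="utf-8") and pass it to csv.DictReader instead.
SAMPLE_CSV = (
    "path,up_votes,down_votes,duration_ms\n"
    "1.mp3,2,0,4200\n"
    "2.mp3,2,1,3900\n"
    "3.mp3,3,0,350\n"
    "4.mp3,5,0,4800.5\n"
)

clip_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
selected = select_clips(clip_rows)
print([row["path"] for row in selected])  # -> ['1.mp3', '4.mp3']
```

Here clip 2 is rejected for its down vote and clip 3 for being shorter than one second.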
Composition
This dataset comprises 55673 rows of Australian-accented elicited (read) English speech.
The total length of time is approximately 4.68 minutes.
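The row count and total duration can be recomputed from the duration_ms field. The following is a minimal sketch; the function name summarise_durations and the sample values are assumptions made for illustration.

```python
def summarise_durations(rows):
    """Summarise clip count and total duration from `duration_ms` values."""
    total_ms = sum(float(row["duration_ms"]) for row in rows)
    return {
        "clips": len(rows),
        "total_ms": total_ms,
        "minutes": total_ms / 60_000,
        "hours": total_ms / 3_600_000,
    }


# Made-up rows standing in for the CSV contents.
example_rows = [
    {"duration_ms": "3000"},
    {"duration_ms": "4500"},
    {"duration_ms": "2500"},
]

summary = summarise_durations(example_rows)
print(summary["clips"], summary["total_ms"])  # -> 3 10000.0
```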
Accents represented
Australian English
General Australian
South Australia
Educated Australian Accent
Sydney - middle eastern seaboard Australian
Queenslandish