Common Voice v24 English - en-AU subset for Everything Open 2026

License: CC0-1.0

Steward: Common Voice

Task: ASR

Release Date: 1/21/2026

Format: CSV, MP3

Size: 1.92 GB



Description

This is a subset of Common Voice v24 English, filtered for Australian-clustered accents. It is designed to be used with the hands-on tutorial delivered at Everything Open 2026 in Canberra, Australia.

Specifics

Licensing

Creative Commons Zero v1.0 Universal (CC0-1.0)

https://spdx.org/licenses/CC0-1.0.html

Considerations

Restrictions/Special Constraints

-

Forbidden Usage

It is forbidden to attempt to determine the identity of speakers in the Common Voice datasets. It is forbidden to re-host or re-share this dataset.

Processes

Ethical Review

This dataset is a subset of Common Voice. The Common Voice collection process is documented at: https://commonvoice.mozillafoundation.org

Intended Use

This dataset is intended for fine-tuning automatic speech recognition systems to improve their acoustic prediction on Australian English. It does _not_ contain samples of the **lexical** variation observed in Australian English.

Metadata

Tutorial information

Preprocessing information

This dataset was extracted from Common Voice v24 English by filtering on the accent field, after assessing the Australian-related accents in the dataset.

The duration of each clip was also calculated to help identify very long or very short clips. It is stored in milliseconds in the duration_ms field.
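As a rough illustration of how duration_ms might be used to set aside very short or very long clips, here is a minimal sketch using only the Python standard library. The thresholds (1 second and 15 seconds) and the function name split_by_duration are hypothetical choices made for this example, not values recommended by the dataset, and the in-memory sample stands in for the real CSV.

```python
import csv
import io

# Hypothetical thresholds in milliseconds; tune them for your own training setup.
MIN_MS = 1_000
MAX_MS = 15_000


def split_by_duration(rows, min_ms=MIN_MS, max_ms=MAX_MS):
    """Partition rows into (kept, rejected) by their duration_ms value."""
    kept, rejected = [], []
    for row in rows:
        try:
            duration = float(row["duration_ms"])
        except (KeyError, TypeError, ValueError):
            rejected.append(row)  # missing or unparseable duration
            continue
        if min_ms <= duration <= max_ms:
            kept.append(row)
        else:
            rejected.append(row)
    return kept, rejected


# Tiny in-memory sample standing in for commonvoice-v24_en-AU.csv.
sample = io.StringIO(
    "path,duration_ms\n"
    "a.mp3,450\n"
    "b.mp3,5200\n"
    "c.mp3,31000\n"
)
kept, rejected = split_by_duration(csv.DictReader(sample))
print([r["path"] for r in kept])      # ['b.mp3']
print([r["path"] for r in rejected])  # ['a.mp3', 'c.mp3']
```

To run this against the real file, replace the sample with an open handle to commonvoice-v24_en-AU.csv (opened with newline="" and encoding="utf-8").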

File structure

  • audios => contains the audio files, named id.mp3, where id is the unique identifier of the clip.

  • commonvoice-v24_en-AU.csv => a CSV-formatted file.

The CSV fields are:

  • original row ID from Common Voice v24 English

  • client_id: unique identifier for each speaker

  • path: the filename of the audio file

  • sentence_id: a unique identifier for each written sentence

  • sentence_domain: a string description of the topic domain of the sentence (may be null)

  • up_votes: integer indicating how many up votes this clip has

  • down_votes: integer indicating how many down votes this clip has; this can be used to exclude clips

  • age: age range of speaker, if provided (may be null)

  • gender: gender identity of speaker, if provided (may be null)

  • accents: accent descriptor

  • locale: ISO-639 locale (all samples in this dataset are en)

  • segment: not applicable to this dataset, included to provide interoperability

  • duration_ms: duration in milliseconds of the audio file, calculated using librosa
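The fields above could be read and combined along the following lines. This is a sketch under stated assumptions: it assumes that path is a filename relative to the audios folder, and the rule "keep a clip only if up_votes exceeds down_votes" is a hypothetical filter chosen for illustration. The function name usable_clips and the sample rows are made up for this example.

```python
import csv
import io
from pathlib import Path

# Assumption: the path column holds a filename inside the audios folder.
AUDIO_DIR = Path("audios")


def usable_clips(rows, audio_dir=AUDIO_DIR):
    """Return rows whose up_votes exceed down_votes, with a resolved audio path.

    The vote rule is a hypothetical example, not an official Common Voice policy.
    """
    selected = []
    for row in rows:
        up = int(row["up_votes"] or 0)
        down = int(row["down_votes"] or 0)
        if up > down:
            clip = dict(row)
            clip["audio_path"] = audio_dir / row["path"]
            # Nullable fields arrive as empty strings from the csv module.
            clip["age"] = row.get("age") or None
            selected.append(clip)
    return selected


# In-memory sample using a few of the column names listed above.
sample = io.StringIO(
    "client_id,path,up_votes,down_votes,age\n"
    "spk1,clip_1.mp3,2,0,\n"
    "spk2,clip_2.mp3,1,2,thirties\n"
)
clips = usable_clips(csv.DictReader(sample))
for clip in clips:
    print(clip["client_id"], clip["audio_path"], clip["age"])
```

Only the first sample row passes the vote rule here; the second is dropped because it has more down votes than up votes.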

Composition

This dataset comprises 55673 rows of Australian-accented elicited (read) English speech.

The total length of time is approximately 4.68 minutes.
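The total duration can be recomputed from the duration_ms column as a sanity check. The sketch below uses only the standard library; the function name total_duration and the three sample rows are illustrative and are not taken from the dataset.

```python
import csv
import io


def total_duration(rows):
    """Sum duration_ms over all rows and return (milliseconds, minutes, hours)."""
    total_ms = sum(float(row["duration_ms"]) for row in rows)
    return total_ms, total_ms / 60_000, total_ms / 3_600_000


# In-memory sample standing in for commonvoice-v24_en-AU.csv.
sample = io.StringIO(
    "path,duration_ms\n"
    "a.mp3,4000\n"
    "b.mp3,6000\n"
    "c.mp3,5000\n"
)
total_ms, minutes, hours = total_duration(csv.DictReader(sample))
print(f"{total_ms:.0f} ms = {minutes:.2f} min")  # 15000 ms = 0.25 min
```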

Accents represented

  • Australian English

  • General Australian

  • South Australia

  • Educated Australian Accent

  • Sydney - middle eastern seaboard Australian

  • Queenslandish
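To see how many clips fall under each accent label, the accents column can be tallied. This is a minimal sketch; it assumes the accents field holds one descriptor string per row, and the function name accent_counts and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def accent_counts(rows):
    """Count clips per accent descriptor, ignoring surrounding whitespace."""
    return Counter((row["accents"] or "").strip() for row in rows)


# In-memory sample using two of the accent labels listed above.
sample = io.StringIO(
    "path,accents\n"
    "a.mp3,Australian English\n"
    "b.mp3,General Australian\n"
    "c.mp3,Australian English\n"
)
counts = accent_counts(csv.DictReader(sample))
for accent, n in counts.most_common():
    print(f"{n:>6}  {accent}")
```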