UP - DSP - Philippine Languages Database (UP-DSP-PLD)

License: CC-BY-NC-4.0

Steward: UP EEEI - Digital Signal Processing Laboratory

Task: ASR

Release Date: 3/19/2026

Format: WAV, LOG

Size: 45.63 GB


Description

This dataset contains multilingual text and speech pairs for ten Philippine languages, namely Filipino, English, Cebuano, Kapampangan, Hiligaynon, Ilokano, Bikolano, Waray, and Tausug. It comprises over 454 hours of recordings covering multiple domains: news, medical, education, tourism, and spontaneous speech. The corpus has also been applied to adult and child ASR, phoneme transcription, voice conversion, and TTS.
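As a rough illustration of how the stated recording total could be checked after download, the sketch below sums the durations of WAV files under a local folder using Python's standard `wave` module. The function names and the folder layout are assumptions made for illustration; this card does not describe the archive's directory structure, and `wave` reads only uncompressed PCM WAV files.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def total_hours(root):
    """Sum the durations of all *.wav files found recursively under root.

    Assumption: recordings use a lowercase .wav extension; adjust the
    pattern if the extracted files are named differently.
    """
    seconds = sum(wav_duration_seconds(p) for p in Path(root).rglob("*.wav"))
    return seconds / 3600.0
```

Calling `total_hours` on the folder where the archive was extracted would give a figure to compare against the hours reported above.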

Specifics

Licensing

Creative Commons Attribution Non Commercial 4.0 International (CC-BY-NC-4.0)

https://spdx.org/licenses/CC-BY-NC-4.0.html

Considerations

Restrictions/Special Constraints

The dataset is open source and provided “as is,” without warranties or guarantees of any kind, and without any obligation to provide technical support. The dataset may be used solely for research purposes. Redistribution, sharing with third parties, or use of the dataset for commercial purposes is strictly prohibited without prior written consent.

Forbidden Usage

You agree not to attempt to determine the identity of speakers in this dataset.

Processes

Ethical Review

Participants were informed about the details of the data collection process, including the project’s funding information, the permitted scope of use of the recordings (research purposes only), their rights to access and withdraw their recordings, and the anonymization of their personal data prior to the release of the corpus. Each participant signed a participation agreement.

Intended Use

This dataset is designed for research in ASR, phoneme transcription, voice conversion, and TTS applications.

Metadata