Territórios Digitais

License: CC-BY-4.0

Steward: GriôTech

Task: N/A

Release Date: 4/2/2026

Format: DOCX, PDF, XLSX

Size: 4.24 MB

Description

This dataset was developed in accordance with ethical research practices commonly applied in social science and participatory research. All participants were informed about the purpose of the study, the nature of their participation, how the data would be used, and any potential risks involved. Participation was voluntary, and informed consent was obtained prior to data collection. Participants were given the opportunity to ask questions and could withdraw at any stage of the process.

To protect participants, all data was anonymized, and direct and indirect identifiers were removed or generalized. Particular care was taken in handling data from small or closely connected communities, where the risk of re-identification may be higher. The research design prioritized minimizing harm and ensuring that the data collection process respected the dignity, agency, and safety of participants, especially those from marginalized and historically underrepresented groups.

While the dataset is made publicly available, it is shared under conditions that require responsible and ethical use, in line with principles of privacy, confidentiality, and data protection.
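As an illustration only, the sketch below shows one common way to generalize an indirect identifier and flag small groups, which users may find helpful before publishing derived tables. It is not the procedure used by the dataset authors. The field names (`age`, `region`), the band width, and the threshold `k` are all assumptions and do not reflect the dataset's actual schema.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def small_groups(records, keys, k=5):
    """Return each combination of `keys` values shared by fewer than k records."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical records; field names are illustrative, not the dataset's schema.
records = [
    {"age": 34, "region": "North"},
    {"age": 37, "region": "North"},
    {"age": 62, "region": "South"},
]

# Generalize the exact age, then look for combinations that are still rare.
generalized = [
    {"age_band": generalize_age(r["age"]), "region": r["region"]}
    for r in records
]
risky = small_groups(generalized, keys=("age_band", "region"), k=2)
```

Any combination returned in `risky` describes a group small enough that further generalization or suppression may be worth considering before results are shared.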

Specifics

Licensing

Creative Commons Attribution 4.0 International (CC-BY-4.0)

https://spdx.org/licenses/CC-BY-4.0.html

Considerations

Restrictions/Special Constraints

This dataset is intended for research, public interest, and policy-related uses focused on information integrity, digital equity, and AI governance. Users must not attempt to re-identify individuals or communities represented in the dataset, nor combine this dataset with other sources for that purpose. Any use that may harm, stigmatize, misrepresent, or exploit the communities from which the data was collected is strictly prohibited. The dataset must not be used to support surveillance, profiling, or targeting of individuals or groups, particularly in ways that may reinforce structural inequalities. Users are expected to interpret and use the data with attention to its context, limitations, and the socio-political conditions in which it was produced.

Forbidden Usage

Prohibited Uses

* Any attempt to identify, re-identify, or infer the identity of individuals represented in this dataset is strictly prohibited.
* The dataset may not be used for surveillance, monitoring, or profiling of individuals or groups.
* It is prohibited to use the dataset to train, fine-tune, or develop voice cloning, speech synthesis, or any system designed to imitate the original speakers.
* The dataset must not be used to generate deceptive content, including deepfakes, misinformation, or disinformation.
* Commercial use of the dataset is not permitted without explicit authorization from the dataset owners.
* The dataset may not be used in ways that violate human rights, promote discrimination, or cause harm to individuals or communities represented.
* Use in high-risk automated decision-making systems (e.g., predictive policing, credit scoring, biometric surveillance) is prohibited.
* Any use that violates applicable data protection and privacy laws is strictly prohibited.

Processes

Ethical Review

The dataset was developed following ethical research principles, with particular attention to informed consent, privacy, and risk mitigation. All participants were informed about the purpose of the data collection, how their data would be used, and any potential risks associated with participation. Consent procedures ensured that participation was voluntary, and participants had the right to withdraw at any stage of the process. When applicable, consent was documented explicitly, and additional care was taken when working with individuals or communities in situations of potential vulnerability.

Data collection and processing were designed to minimize risks related to identification, misuse, or unintended harm. Measures such as data minimization, controlled access, and the exclusion of sensitive or personally identifiable information were applied where appropriate.

The project follows applicable ethical guidelines and regulatory frameworks for research involving human subjects, including data protection and privacy standards. Where required, the research protocol may be submitted to or reviewed by an institutional ethics committee or equivalent body. Ongoing responsibility for ethical use is shared with dataset users, who are expected to comply with these principles and ensure that their applications do not result in harm, discrimination, or misuse.

Intended Use

This dataset is intended for research and development in speech and language technologies, with a focus on improving automatic speech recognition (ASR), linguistic analysis, and inclusive language technologies. It may also support studies on information ecosystems, communication patterns, and the development of public-interest AI tools. The dataset is particularly suited for applications that prioritize ethical AI, linguistic diversity, and the inclusion of underrepresented voices. It is not intended for use in biometric identification, surveillance systems, or any applications that may harm individuals or communities.

Metadata

This dataset was collected using a structured and ethically guided methodology, with attention to privacy, consent, and contextual integrity. Data collection processes prioritized minimizing risks to participants while preserving the analytical value of the material.

Pre-processing steps may include data cleaning, normalization, segmentation, and formatting to ensure consistency and usability across different research and technical applications. Any transformations applied were designed to preserve the original meaning and context of the data as much as possible.

The dataset reflects specific linguistic, cultural, and territorial contexts, which may influence how the data should be interpreted. Users are encouraged to consider these factors when conducting analysis or developing applications, particularly in cross-cultural or generalized use cases.

Some limitations may be present, including potential sampling biases, incomplete coverage of certain populations or regions, and contextual dependencies that may not be fully captured in the dataset itself. These limitations should be taken into account when drawing conclusions or building models.

This dataset is best suited for applications related to research, public interest technology, and the study of information ecosystems. Users are encouraged to document their use cases and share insights that contribute to transparency, reproducibility, and collective knowledge building.