ViQuA²: Visual Question Answering about Quantities

License:

CC-BY-SA-4.0

Steward:

MDC Curators

Task: CV

Release Date: 4/6/2026

Format: JSON, JPEG

Size: 281.05 MB


Description

This dataset provides 116 images showing varying quantities of different objects, each paired with a question about a quantity in the image and a numeric answer. For example, one image shows a table with multiple fruits and vegetables and is paired with the question "How many carrots?". Other objects appearing in the dataset include rocks, brooms, clothespins, money, motorcycles, screws, and paperclips.

Specifics

Licensing

Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

https://spdx.org/licenses/CC-BY-SA-4.0.html

Considerations

Restrictions/Special Constraints

N/A

Forbidden Usage

N/A

Processes

Intended Use

To evaluate and benchmark multimodal vision models on their ability to distinguish, count, and reason about the quantities of common objects.

Metadata

ViQuA² (Visual Question Answering about Quantities): an evaluation dataset for visual reasoning about quantities.

Overview

This data is intended for evaluating multimodal CV models on their ability to identify specific objects and keep track of quantities. Many cases involve only counting, whereas some also require reading text in the image (e.g., a bag of some product labeled with the number of items inside).
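
As an illustration of how such an evaluation might be scored, the sketch below compares free-text model replies against the numeric answers. It is only a minimal example under stated assumptions: the record key "answer" and the helper names parse_count and score are hypothetical, since the dataset's actual JSON schema is not described here, and exact match plus mean absolute error are one possible choice of metrics rather than a prescribed protocol.

```python
import re


def parse_count(text):
    """Return the first number in a model's free-text reply, or None."""
    # Drop thousands separators such as the comma in "1,000".
    cleaned = re.sub(r"(?<=\d),(?=\d{3}(?!\d))", "", text)
    match = re.search(r"-?\d+(?:\.\d+)?", cleaned)
    return float(match.group()) if match else None


def score(records, predictions):
    """Score replies against ground-truth quantities.

    records: list of dicts with a numeric "answer" field
             (hypothetical key; adapt to the real JSON schema).
    predictions: list of model replies (strings), in the same order.
    """
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have equal length")

    correct = 0
    abs_errors = []
    for record, reply in zip(records, predictions):
        truth = float(record["answer"])
        guess = parse_count(reply)
        if guess is None:
            continue  # an unparseable reply counts as incorrect
        abs_errors.append(abs(guess - truth))
        if guess == truth:
            correct += 1

    total = len(records)
    return {
        "accuracy": correct / total if total else 0.0,
        # Mean absolute error over the replies that contained a number.
        "mean_abs_error": sum(abs_errors) / len(abs_errors) if abs_errors else None,
        "unparsed": total - len(abs_errors),
    }
```

Reporting the number of unparsed replies separately helps distinguish a model that miscounts from one that never commits to a number.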

Data Collection

The data was collected by photographing varying quantities of different household items with mobile phones (Phone and ZTE Blade) and labeling each photo with the corresponding quantities.