ViQua² — Visual Question-answering about Quantities
License: CC-BY-SA-4.0
Steward: MDC Curators
Task: CV
Release Date: 4/6/2026
Format: JSON, JPEG
Size: 281.05 MB
Description
This dataset provides 116 images showing varying quantities of a range of different objects, each with an accompanying question about the quantity in the image and a numeric answer. For example, one image shows a table with multiple fruits and vegetables, paired with the question "How many carrots?". Other objects appearing in this dataset include rocks, brooms, clothespins, money, motorcycles, screws, and paperclips.
Specifics
Licensing
Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)
https://spdx.org/licenses/CC-BY-SA-4.0.html
Considerations
Restrictions/Special Constraints
N/A
Forbidden Usage
N/A
Processes
Intended Use
To evaluate/benchmark multimodal vision models on their ability to distinguish, count, and reason about the quantities of common objects.
Metadata
ViQuA²: Visual Question Answering about Quantities: An evaluation dataset for visual reasoning about quantities.
Overview
This data is intended to be used to evaluate CV multimodal models on their ability to identify specific objects and keep track of quantities. Many cases simply involve counting, whereas some also require reading (e.g., a bag of some product with the number of items inside).
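As a rough illustration of how such an evaluation could be scored, the sketch below compares a model's free-text replies against the numeric answers. The dataset's JSON schema is not described here, so the record fields `question` and `answer`, and the helper names `parse_count` and `score`, are assumptions made for illustration only. Taking the first number in a reply as the prediction is likewise a simplifying choice, not part of the dataset.

```python
import re
from typing import Iterable, Optional


def parse_count(reply: str) -> Optional[float]:
    """Return the first number found in a free-text reply, or None if there is none."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else None


def score(records: Iterable[dict], replies: Iterable[str]) -> dict:
    """Compute exact-match accuracy and mean absolute error over paired items.

    Each record is assumed (hypothetically) to carry a numeric "answer" field.
    Replies with no parseable number count as wrong for accuracy and are
    excluded from the mean absolute error.
    """
    total = answered = correct = 0
    abs_error = 0.0
    for record, reply in zip(records, replies):
        total += 1
        predicted = parse_count(reply)
        if predicted is None:
            continue
        answered += 1
        target = float(record["answer"])
        correct += int(predicted == target)
        abs_error += abs(predicted - target)
    return {
        "accuracy": correct / total if total else 0.0,
        "mae": abs_error / answered if answered else None,
        "unparsed": total - answered,
    }


# Hypothetical records and model replies, for demonstration only.
demo_records = [
    {"question": "How many carrots?", "answer": 3},
    {"question": "How many screws?", "answer": 12},
]
demo_replies = ["There are 3 carrots.", "I count 10 screws."]
demo_result = score(demo_records, demo_replies)
```

Reporting mean absolute error alongside exact-match accuracy separates near misses from large miscounts, which can matter for images with many small objects.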
Data Collection
The data was collected by photographing varying quantities of different household items with mobile phones (Phone and ZTE Blade) and labeling each photo with the quantity shown.