World Factbook (JSON)
License:
CC0-1.0
Steward:
Taruen
Task: NLP
Release Date: 2/13/2026
Format: JSON
Size: 7.10 MB
Description
This dataset contains the full text of the CIA World Factbook converted into machine-readable JSON. It covers over 260 world entities, organized hierarchically by region (e.g., Africa, Europe). It captures the final state of the public data (Jan 23, 2026) before the official website was retired. This archive includes two versions:
1. Standard Version: The popular factbook.json distribution. Clean, simple JSON ideal for NLP and apps.
2. Raw Cache Version: A structural mirror of the original cia.gov site. Includes media metadata (file paths) for flags and maps, which are missing from the standard version.
Specifics
Considerations
Restrictions/Special Constraints
None. US Government public domain work.
Forbidden Usage
None
Processes
Intended Use
This dataset is designed for use in Natural Language Processing (NLP) and Information Extraction tasks. It provides a highly structured, machine-readable corpus of geopolitical, demographic, and economic facts. Example applications include:
• Grounding LLMs: Providing a reliable "source of truth" on country-specific data for RAG (Retrieval-Augmented Generation) systems.
• Knowledge Graph Construction: Automating the population of knowledge bases with entities and attributes.
• Historical Analysis: Serving as a definitive "final snapshot" of the 2026 World Factbook for comparative studies after the official CIA website's retirement.
• Educational Tools: Powering interactive atlases and data visualization platforms.
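For the RAG and knowledge-graph uses above, a common first step is flattening each profile into (entity, attribute, value) triples. The sketch below is a minimal illustration, assuming the standard files nest categories and fields as JSON objects whose leaf values sit under a "text" key; that layout is an assumption, so inspect a real file before relying on it. The function name `iter_triples` is hypothetical.

```python
from typing import Any, Iterator


def iter_triples(entity: str, node: Any, path: tuple[str, ...] = ()) -> Iterator[tuple[str, str, str]]:
    """Yield (entity, attribute_path, text) for every "text" leaf in a profile.

    ASSUMPTION: categories/fields are nested dicts and each leaf value is a
    string stored under a "text" key. Other value types are skipped.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                # Join the keys walked so far into a readable attribute name.
                yield (entity, " > ".join(path), value)
            else:
                yield from iter_triples(entity, value, path + (key,))


# Hypothetical sample shaped like the assumed layout (placeholder text only).
sample = {
    "Introduction": {"Background": {"text": "Example background."}},
    "Geography": {"Area": {"total": {"text": "Example area."}}},
}

for triple in iter_triples("gm", sample):
    print(triple)
```

Each triple can be stored as a knowledge-graph edge or embedded as a small retrieval chunk keyed by its attribute path.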
Metadata
Last Updated: January 23, 2026
Overview
This dataset contains the CIA World Factbook country profiles converted into machine-readable JSON. It covers over 260 world entities.
The data represents the final snapshot of the Factbook before the official website was retired on February 4, 2026.
This archive includes two versions:
Standard: Clean, developer-friendly JSON (text only).
Raw Cache: Strict copy of the original site structure (includes image paths).
1. Standard Data (/data/standard)
Best for: General use, apps, data science.
This is the "clean" distribution found in the primary factbook.json repository.
It contains the core text profiles without the complex nesting of the raw site.
Origin: Forked from the original work by Gerald Bauer at: https://github.com/factbook/factbook.json
Format: JSON files organized by region (e.g., europe/gm.json).
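Because files are laid out as `<region>/<code>.json`, a single profile can be loaded directly from its region and country code. The sketch below assumes the standard data is checked out locally (the `data/standard` root in the usage line is an assumption) and that leaf values are stored under a "text" key; the helper names `load_profile` and `get_text` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_profile(root: Path, region: str, code: str) -> dict:
    """Load one profile, e.g. region="europe", code="gm" -> <root>/europe/gm.json."""
    path = Path(root) / region / f"{code.lower()}.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def get_text(profile: dict, *keys: str) -> Optional[str]:
    """Follow nested keys and return the "text" string at the end, or None.

    ASSUMPTION: fields are nested dicts whose leaf holds a "text" string.
    Returns None if any key is missing or the layout differs.
    """
    node: Any = profile
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if isinstance(node, dict):
        value = node.get("text")
        return value if isinstance(value, str) else None
    return None


# Usage (assumed local path):
# profile = load_profile(Path("data/standard"), "europe", "gm")
# print(get_text(profile, "Introduction", "Background"))
```

Returning None instead of raising keeps lookups safe across entities that omit a field.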
2. Raw Cache Data (/data/raw-cache)
Best for: Linking images, full reconstruction, digital archaeology.
This version preserves the exact structure of the original cia.gov website.
Crucially, it contains the media fields with file paths to the original flags, maps, and locator images.
Origin: Forked from the original work by Gerald Bauer at: https://github.com/factbook/cache.factbook.json
Key Feature: Contains src paths for images (e.g., /attachments/flags/...).
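Since the exact nesting of the media fields in the raw cache is not documented here, a safe way to collect image paths is to walk the whole JSON tree and gather every string stored under a "src" key. This is a minimal sketch under that assumption; `iter_src_paths` and `load_src_paths` are hypothetical helper names.

```python
import json
from typing import Any, Iterator


def iter_src_paths(node: Any) -> Iterator[str]:
    """Recursively yield every string stored under a "src" key.

    Walks all dicts and lists instead of assuming a fixed path to the
    media fields, because the raw cache layout is not specified here.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "src" and isinstance(value, str):
                yield value
            else:
                yield from iter_src_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_src_paths(item)


def load_src_paths(path: str) -> list[str]:
    """Return the sorted, de-duplicated "src" paths found in one raw cache file."""
    with open(path, encoding="utf-8") as f:
        return sorted(set(iter_src_paths(json.load(f))))
```

A generic walk is slower than a targeted lookup but keeps working if different page types nest media differently.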
Linking Images
The raw cache contains paths like:
"src": "/attachments/flags/AG-flag.jpg"
You can retrieve these images from the Internet Archive (Wayback Machine) or the separate media repository:
Media Repo: https://github.com/factbook/media
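To look up an image on the Wayback Machine, a relative src path first has to be turned into the absolute URL it was originally served from. The sketch below does that with a base URL that is purely an assumption (`ASSUMED_BASE` is hypothetical; verify the real host and prefix against an archived page). It relies on the Wayback Machine's `https://web.archive.org/web/<timestamp>/<url>` form, where a partial timestamp such as `20260123` requests the capture closest to that date. The media repository's own layout is not covered here.

```python
# ASSUMPTION: hypothetical host/prefix the relative "src" paths resolved under.
# Check an archived Factbook page before relying on this value.
ASSUMED_BASE = "https://www.cia.gov/the-world-factbook"


def original_url(src: str, base: str = ASSUMED_BASE) -> str:
    """Join a relative src path like "/attachments/flags/AG-flag.jpg" onto the base."""
    return f"{base.rstrip('/')}/{src.lstrip('/')}"


def wayback_url(src: str, timestamp: str = "20260123", base: str = ASSUMED_BASE) -> str:
    """Build a Wayback Machine URL for the capture closest to `timestamp` (YYYYMMDD...)."""
    return f"https://web.archive.org/web/{timestamp}/{original_url(src, base)}"


print(wayback_url("/attachments/flags/AG-flag.jpg"))
```

The default timestamp matches the January 23, 2026 snapshot date, which is a reasonable starting point for finding captures from the final state of the site.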
Content & Scope
Both versions cover ~260 world entities.
Regions: Africa, Antarctica, Australia-Oceania, Central America, Central Asia, East & SE Asia, Europe, Middle East, North America, Oceans, South America, South Asia, World.
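A quick way to check the entity count against a local copy is to count the JSON files in each region directory. This sketch reads directory names from disk rather than hard-coding them, since the on-disk names may not match the display names above; it assumes every subdirectory of the root holds profiles, so any non-region folder would be counted too. `count_entities` is a hypothetical name.

```python
from pathlib import Path


def count_entities(root: Path) -> dict[str, int]:
    """Count *.json files in each immediate subdirectory of `root`.

    ASSUMPTION: each subdirectory is a region folder containing one JSON
    file per entity; non-JSON files are ignored.
    """
    counts: dict[str, int] = {}
    for region_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[region_dir.name] = sum(1 for _ in region_dir.glob("*.json"))
    return counts


# Usage (assumed local path):
# counts = count_entities(Path("data/standard"))
# print(sum(counts.values()))
```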
Country Codes
This dataset uses the standard US Government GEC (formerly FIPS) country codes, which often differ from ISO 3166-1 alpha-2 codes. Some codes even collide: "ch" is China in GEC but Switzerland in ISO.
Germany: gm (FIPS) vs. de (ISO)
Vietnam: vm (FIPS) vs. vn (ISO)
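When joining this data with ISO-keyed sources, codes must be translated explicitly rather than reused as-is. The sketch below uses a tiny hand-picked GEC to ISO 3166-1 alpha-2 table for illustration only (Germany and Vietnam from above, plus the United Kingdom, China, and Switzerland); a real project needs a complete crosswalk. `GEC_TO_ISO2` and `gec_to_iso2` are hypothetical names.

```python
from typing import Optional

# Illustrative subset only; NOT a complete GEC -> ISO 3166-1 alpha-2 crosswalk.
GEC_TO_ISO2 = {
    "gm": "de",  # Germany
    "vm": "vn",  # Vietnam
    "uk": "gb",  # United Kingdom
    "ch": "cn",  # China (GEC "ch" is NOT Switzerland)
    "sz": "ch",  # Switzerland
}


def gec_to_iso2(gec: str) -> Optional[str]:
    """Translate a GEC code (case-insensitive) to ISO alpha-2, or None if unmapped."""
    return GEC_TO_ISO2.get(gec.lower())


print(gec_to_iso2("gm"))
```

Returning None for unmapped codes makes gaps in the crosswalk visible instead of silently passing a GEC code through as if it were ISO.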
Acknowledgements
This archive is built upon the open source work of Gerald Bauer.
Primary Maintainer: Gerald Bauer (factbook.json)
Archive Maintainer: Taruen (This distribution)
License
Public Domain (CC0 1.0 Universal)
The original data is US Government public domain work. The JSON conversion is dedicated to the public domain.