World Factbook (JSON)


License:

CC0-1.0


Steward:

Taruen

Task: NLP

Release Date: 2/13/2026

Format: JSON

Size: 7.10 MB



Description

This dataset contains the full text of the CIA World Factbook converted into machine-readable JSON. It covers over 260 world entities, organized hierarchically by region (e.g., Africa, Europe). It captures the final state of the public data (January 23, 2026) before the official website was retired. This archive includes two versions:

  1. Standard Version: The popular factbook.json distribution. Clean, simple JSON ideal for NLP and apps.

  2. Raw Cache Version: A structural mirror of the original cia.gov site. Includes media metadata (file paths) for flags and maps, which the standard version lacks.

Specifics

Licensing

Creative Commons Zero v1.0 Universal (CC0-1.0)

https://spdx.org/licenses/CC0-1.0.html

Considerations

Restrictions/Special Constraints

None. The source is a US Government public domain work.

Forbidden Usage

None

Processes

Intended Use

This dataset is designed for use in Natural Language Processing (NLP) and Information Extraction tasks. It provides a highly structured, machine-readable corpus of geopolitical, demographic, and economic facts. Example applications include:

  • Grounding LLMs: Providing a reliable "source of truth" for RAG (Retrieval-Augmented Generation) systems regarding country-specific data.

  • Knowledge Graph Construction: Automating the population of knowledge bases with entities and attributes.

  • Historical Analysis: Serving as a definitive "final snapshot" of the 2026 World Factbook for comparative studies after the official CIA website's retirement.

  • Educational Tools: Powering interactive atlases and data visualization platforms.
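For the RAG and knowledge-graph uses, a profile usually has to be flattened into individual facts first. The sketch below assumes (based on the factbook.json convention, not confirmed by this card) that leaf values are objects of the form {"text": "..."}; the function names iter_facts, to_triples, and to_chunks are hypothetical.

```python
from typing import Any, Iterator


def iter_facts(node: Any, path: tuple = ()) -> Iterator[tuple[str, str]]:
    """Yield (field_path, text) pairs from a nested profile.

    Assumption: leaves look like {"text": "..."}; every other key is treated
    as a nested sub-field and visited recursively.
    """
    if isinstance(node, dict):
        text = node.get("text")
        if isinstance(text, str):
            yield " > ".join(path), text
        for key, value in node.items():
            if key != "text":
                yield from iter_facts(value, path + (key,))
    elif isinstance(node, list):
        # Lists are indexed positionally so each item keeps a unique path.
        for i, item in enumerate(node):
            yield from iter_facts(item, path + (str(i),))


def to_triples(entity: str, profile: dict) -> list[tuple[str, str, str]]:
    """(entity, attribute, value) triples for knowledge-graph loading."""
    return [(entity, attr, text) for attr, text in iter_facts(profile)]


def to_chunks(entity: str, profile: dict) -> list[str]:
    """One self-describing text chunk per fact, e.g. for a RAG index."""
    return [f"{entity} | {attr}: {text}" for attr, text in iter_facts(profile)]
```

For a hand-made toy profile {"Government": {"Capital": {"name": {"text": "Berlin"}}}}, to_chunks("gm", profile) returns ["gm | Government > Capital > name: Berlin"]. Keeping the full field path in each chunk lets a retriever match on the attribute name as well as the value.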

Metadata


Last Updated: January 23, 2026

Overview

This dataset contains the CIA World Factbook country profiles converted into machine-readable JSON. It covers over 260 world entities.

The data represents the final snapshot of the Factbook before the official website was retired on February 4, 2026.

This archive includes two versions:

  1. Standard: Clean, developer-friendly JSON (text only).

  2. Raw Cache: Strict copy of the original site structure (includes image paths).

1. Standard Data (/data/standard)

Best for: General use, apps, data science.

This is the "clean" distribution found in the primary factbook.json repository. It contains the core text profiles without the complex nesting of the raw site.
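A minimal sketch for loading the standard profiles into memory. It assumes, without confirmation from this card, that /data/standard holds one sub-folder per region and one JSON file per entity named by its lowercase GEC code (for example europe/gm.json); load_standard is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_standard(root: str) -> dict[str, dict]:
    """Load every entity profile under root, keyed as "<region>/<code>".

    Assumed (hypothetical) layout: root/<region>/<gec-code>.json
    """
    profiles: dict[str, dict] = {}
    # Sorting keeps the result order stable across platforms and runs.
    for path in sorted(Path(root).glob("*/*.json")):
        key = f"{path.parent.name}/{path.stem}"
        with path.open(encoding="utf-8") as fh:
            profiles[key] = json.load(fh)
    return profiles
```

Usage: profiles = load_standard("data/standard"), then profiles["europe/gm"] would hold the Germany profile if the assumed layout holds. Keying by region and code keeps the regional hierarchy without a second index.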

2. Raw Cache Data (/data/raw-cache)

Best for: Linking images, full reconstruction, digital archaeology.

This version preserves the exact structure of the original cia.gov website. Crucially, it contains the media fields with file paths to the original flags, maps, and locator images.
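Because the exact nesting of the media fields is not documented here, a recursive walk is a safe way to gather them. This sketch assumes each media entry stores its file path under a "src" key, as in paths of the form /attachments/flags/AG-flag.jpg; collect_media_paths is a hypothetical name.

```python
from typing import Any


def collect_media_paths(node: Any) -> list[str]:
    """Gather every string stored under a "src" key, at any depth.

    Assumption: raw-cache media entries carry their file path in "src".
    """
    found: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "src" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_media_paths(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_media_paths(item))
    return found
```

Walking the whole document rather than a fixed path means the helper keeps working even if flags, maps, and locator images sit at different depths.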

Linking Images

The raw cache contains paths like: "src": "/attachments/flags/AG-flag.jpg"
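To look such a path up on the Wayback Machine, it first has to become an absolute URL. The sketch assumes the relative paths resolve against https://www.cia.gov; that base is a hypothetical default and should be checked against an archived page. It uses a partial timestamp, which the Wayback Machine generally redirects to the nearest capture. wayback_url is a hypothetical helper.

```python
from urllib.parse import urljoin

# Hypothetical default host for the relative media paths; verify before use.
DEFAULT_SITE = "https://www.cia.gov"


def wayback_url(src: str, site: str = DEFAULT_SITE, timestamp: str = "2026") -> str:
    """Build an Internet Archive lookup URL for a raw-cache media path."""
    # urljoin resolves an absolute path like "/attachments/..." against the host.
    original = urljoin(site + "/", src)
    return f"https://web.archive.org/web/{timestamp}/{original}"
```

For example, wayback_url("/attachments/flags/AG-flag.jpg") returns "https://web.archive.org/web/2026/https://www.cia.gov/attachments/flags/AG-flag.jpg".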

You can retrieve these images from the Internet Archive (Wayback Machine) or the separate media repository:

Content & Scope

Both versions cover ~260 world entities.

  • Regions: Africa, Antarctica, Australia-Oceania, Central America, Central Asia, East & SE Asia, Europe, Middle East, North America, Oceans, South America, South Asia, World.

Country Codes

This dataset uses the standard US Government GEC (formerly FIPS) country codes, which often differ from ISO 3166-1 alpha-2 codes.

  • Germany: gm (FIPS) vs de (ISO)

  • Vietnam: vm (FIPS) vs vn (ISO)
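When joining this data with ISO-coded sources, an explicit crosswalk is safer than reusing codes as-is: GEC "ch" is China, while ISO "ch" is Switzerland. The table below is a deliberately partial, hand-written example (GEC_TO_ISO and to_iso are hypothetical names); a complete mapping should come from an authoritative crosswalk.

```python
from typing import Optional

# Partial GEC -> ISO 3166-1 alpha-2 crosswalk; only a few well-known pairs.
GEC_TO_ISO = {
    "gm": "de",  # Germany
    "vm": "vn",  # Vietnam
    "ch": "cn",  # China (ISO "ch" is Switzerland)
    "sz": "ch",  # Switzerland
    "uk": "gb",  # United Kingdom
}


def to_iso(gec: str) -> Optional[str]:
    """Return the ISO code for a GEC code, or None if it is not mapped."""
    return GEC_TO_ISO.get(gec.lower())
```

Returning None for unmapped codes, instead of passing the GEC code through unchanged, avoids silently mislabeling entities whose codes collide across the two systems.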

Acknowledgements

This archive is built upon the open-source work of Gerald Bauer.

  • Primary Maintainer: Gerald Bauer (factbook.json)

  • Archive Maintainer: Taruen (This distribution)

License

Public Domain (CC0 1.0 Universal). The original data is a US Government public domain work. The JSON conversion is dedicated to the public domain.