@@ -0,0 +1 @@
Certify the UK Populace data release `populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z` as the default UK dataset.
2 changes: 1 addition & 1 deletion docs/countries.md
@@ -33,7 +33,7 @@ Override in any output with `income_variable=`.
| | Dataset |
|---|---|
| US | Enhanced CPS 2024 (`enhanced_cps_2024.h5`) |
| UK | Enhanced FRS 2023/24 (`enhanced_frs_2023_24.h5`) |
| UK | Populace UK 2023 (`populace_uk_2023.h5`) |

## State / regional breakdown

20 changes: 10 additions & 10 deletions docs/microsim.md
@@ -45,7 +45,7 @@ datasets = pe.us.ensure_datasets(
dataset = datasets["enhanced_cps_2024_2026"]
```

The default US dataset is **Enhanced CPS 2024** — CPS ASEC fused with IRS SOI tax-return records and calibrated to IRS, CMS, SNAP, and other administrative totals. The UK default is **Enhanced FRS 2023/24** — the Family Resources Survey fused with tax-return microdata and calibrated to HMRC and DWP totals.
The default US dataset is **Enhanced CPS 2024** — CPS ASEC fused with IRS SOI tax-return records and calibrated to IRS, CMS, SNAP, and other administrative totals. The UK default is **Populace UK 2023** — a Populace-built Family Resources Survey dataset calibrated to UK administrative targets.

List datasets already known to the country:

@@ -57,7 +57,7 @@ pe.us.load_datasets() # or pe.uk.load_datasets()

UK population data uses licensed Family Resources Survey inputs. The default
UK release bundle points to the private
`policyengine/policyengine-uk-data-private` Hugging Face model repository. Set
`policyengine/populace-uk-private` Hugging Face dataset repository. Set
`HUGGING_FACE_TOKEN` to a token from a Hugging Face account with access:

```bash
@@ -73,11 +73,11 @@ import policyengine as pe
from policyengine.core import Simulation

datasets = pe.uk.ensure_datasets(
datasets=["enhanced_frs_2023_24"],
datasets=["populace_uk_2023"],
years=[2026],
data_folder="./data",
)
dataset = datasets["enhanced_frs_2023_24_2026"]
dataset = datasets["populace_uk_2023_2026"]

simulation = Simulation(
dataset=dataset,
@@ -87,27 +87,27 @@ simulation.run()
```

To download the raw h5 artifact directly from Hugging Face, use
`huggingface_hub` and specify `repo_type="model"`:
`huggingface_hub` and specify `repo_type="dataset"`:

```python
import os
from huggingface_hub import hf_hub_download

path = hf_hub_download(
repo_id="policyengine/policyengine-uk-data-private",
filename="enhanced_frs_2023_24.h5",
repo_type="model",
repo_id="policyengine/populace-uk-private",
filename="populace_uk_2023.h5",
repo_type="dataset",
token=os.environ["HUGGING_FACE_TOKEN"],
)

print(path)
```

The repository URL is
<https://huggingface.co/policyengine/policyengine-uk-data-private>. A 404 from
<https://huggingface.co/datasets/policyengine/populace-uk-private>. A 404 from
the website or `RepositoryNotFoundError` from the Hub API usually means the
browser or token is not authenticated as an account with access, or that the
Hub call omitted `repo_type="model"`.
Hub call omitted `repo_type="dataset"`.
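
The troubleshooting note follows from how the Hub lays out URLs: model repositories sit at `https://huggingface.co/{repo_id}`, while dataset repositories sit under a `/datasets/` prefix. That is why moving the UK data from a model repo to a dataset repo changes both the browser URL and the `repo_type` argument. The sketch below uses a hypothetical `hub_repo_url` helper (not part of `policyengine` or `huggingface_hub`) to show the mapping without making any network calls.

```python
# Hypothetical helper (not a policyengine or huggingface_hub API): map a Hub
# repo_id and repo_type to the browser URL for that repository.
_URL_PREFIXES = {
    "model": "",             # model repos sit at the site root
    "dataset": "datasets/",  # dataset repos are namespaced under /datasets/
    "space": "spaces/",      # Spaces are namespaced under /spaces/
}


def hub_repo_url(repo_id: str, repo_type: str = "model") -> str:
    """Return the huggingface.co URL for a repository of the given type."""
    try:
        prefix = _URL_PREFIXES[repo_type]
    except KeyError:
        raise ValueError(f"unknown repo_type: {repo_type!r}") from None
    return f"https://huggingface.co/{prefix}{repo_id}"


# The previous UK data lived in a model repo; the Populace release is a dataset repo.
print(hub_repo_url("policyengine/policyengine-uk-data-private", "model"))
print(hub_repo_url("policyengine/populace-uk-private", "dataset"))
```

`hf_hub_download` treats an omitted `repo_type` as a model repository, so leaving it out for `policyengine/populace-uk-private` asks for a model repo that does not exist, which surfaces as the `RepositoryNotFoundError` described above.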

## Simulations

12 changes: 9 additions & 3 deletions docs/outputs.md
@@ -242,20 +242,26 @@ for row in impacts.district_results:

### UK constituencies / local authorities

Constituency and local-authority breakdowns use externally-maintained weight matrices. The convenience helpers first look for the standard files locally, then download them from the PolicyEngine UK GCS bucket:
Constituency and local-authority breakdowns group household output rows by
dataset-provided longwise geography columns. Constituencies use
`constituency_code_oa`; local authorities use `la_code_oa`. Optional metadata
CSVs add names and map coordinates when available.

```python
from policyengine.outputs import compute_uk_constituency_impacts

impacts = compute_uk_constituency_impacts(
baseline_simulation=baseline,
reform_simulation=reform,
year="2025",
)
impacts.constituency_results
```

`compute_uk_local_authority_impacts` follows the same pattern. Pass explicit paths to use specific local files instead of the default local/GCS lookup; missing explicit paths raise `FileNotFoundError` without falling back to GCS. Pass `download_missing_assets=False` to require the canonical files to exist locally or in the cache. Set `POLICYENGINE_UK_GEOGRAPHY_DATA_DIR` to choose the local lookup and download cache directory. See [Regions](regions.md).
`compute_uk_local_authority_impacts` follows the same pattern. Pass
`constituency_csv_path` or `local_authority_csv_path` to use a specific
metadata file; pass `download_missing_assets=False` to skip metadata downloads
and use code-only labels. Legacy matrix arguments are accepted for backward
compatibility but ignored. See [Regions](regions.md).
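
A minimal pure-Python sketch of the longwise grouping described above, assuming each household output row is a dict carrying its geography code, a `household_weight`, and a per-household `change`. The function name, the row layout, and the `names` mapping (standing in for the optional metadata CSV) are illustrative assumptions, not the internals of `compute_uk_constituency_impacts`. Codes without a metadata entry keep the code as their label, mirroring the documented code-only fallback.

```python
from collections import defaultdict


def group_by_geography(rows, code_column, names=None):
    """Weighted average change per geography code (hypothetical sketch).

    rows: iterable of dicts with code_column, "household_weight", "change".
    names: optional {code: display name}; codes missing from it use the code.
    """
    names = names or {}
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for row in rows:
        code = row[code_column]
        weighted_sum[code] += row["change"] * row["household_weight"]
        total_weight[code] += row["household_weight"]
    return {
        code: {
            "label": names.get(code, code),  # code-only label without metadata
            "average_change": weighted_sum[code] / total_weight[code],
        }
        for code in total_weight
        if total_weight[code] > 0  # skip codes whose weights sum to zero
    }


# Placeholder codes, names, and values, not real constituency data.
rows = [
    {"constituency_code_oa": "C001", "household_weight": 2.0, "change": 10.0},
    {"constituency_code_oa": "C001", "household_weight": 1.0, "change": 40.0},
    {"constituency_code_oa": "C002", "household_weight": 3.0, "change": -5.0},
]
results = group_by_geography(rows, "constituency_code_oa", names={"C001": "Example North"})
print(results)
```

Passing `code_column="la_code_oa"` gives the local-authority version of the same grouping.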

## Writing your own

21 changes: 15 additions & 6 deletions docs/regions.md
@@ -48,20 +48,26 @@ for row in impacts.district_results:

## UK parliamentary constituencies

Constituency-level impacts reweight every household to each constituency's demographic profile using a pre-computed weight matrix. By default, PolicyEngine looks for the standard constituency files locally and downloads them from the PolicyEngine UK GCS bucket if they are not present:
Constituency-level impacts group household output rows by the longwise
`constituency_code_oa` column carried by the dataset. If the constituency CSV is
available locally or from the PolicyEngine UK GCS bucket, PolicyEngine uses it
to attach names and map coordinates; otherwise results still compute and use
the code as the label.

```python
from policyengine.outputs import compute_uk_constituency_impacts

impacts = compute_uk_constituency_impacts(
baseline_simulation=baseline,
reform_simulation=reform,
year="2025",
)
impacts.constituency_results
```

To force specific local files, pass `weight_matrix_path` and `constituency_csv_path`. If either provided path is missing, the helper raises `FileNotFoundError` and does not fall back to GCS. To require the canonical files to be available locally or in the cache, pass `download_missing_assets=False`. To set a reusable local data directory and download cache, set `POLICYENGINE_UK_GEOGRAPHY_DATA_DIR`.
To force a specific metadata file, pass `constituency_csv_path`. To avoid
downloading metadata and fall back to code-only labels, pass
`download_missing_assets=False`. The legacy `weight_matrix_path` and `year`
arguments are accepted for backward compatibility but ignored.

## UK local authorities

@@ -71,12 +77,15 @@ from policyengine.outputs import compute_uk_local_authority_impacts
impacts = compute_uk_local_authority_impacts(
baseline_simulation=baseline,
reform_simulation=reform,
year="2025",
)
impacts.local_authority_results
```

`compute_uk_local_authority_impacts` accepts explicit paths with `weight_matrix_path` and `local_authority_csv_path` when callers need to use specific local files instead of the default local/GCS lookup. It also accepts `download_missing_assets=False` for local-only canonical asset resolution.
Local-authority impacts follow the same longwise pattern using `la_code_oa`.
Pass `local_authority_csv_path` to use a specific metadata CSV, or
`download_missing_assets=False` to skip metadata download and use code-only
labels. The legacy `weight_matrix_path` and `year` arguments are accepted for
backward compatibility but ignored.

## Region registries

@@ -118,7 +127,7 @@ df.groupby("geo").apply(lambda g: (g["change"] * g["weight"]).sum() / g["weight"

## Scoping datasets to a region

For reforms defined only over a sub-national slice, pass a scoping strategy to `Simulation`. `RowFilterStrategy` keeps only matching households; `WeightReplacementStrategy` reweights the full sample to represent the region.
For reforms defined only over a sub-national slice, pass a scoping strategy to `Simulation`. `RowFilterStrategy` keeps only matching households. `WeightReplacementStrategy` is legacy matrix infrastructure and is not used by the UK Populace constituency or local-authority registry.
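
To make the row-filter behaviour concrete, here is a toy sketch using a hypothetical `row_filter` function on plain dicts rather than the real `RowFilterStrategy` class or a `Simulation`. The matching households keep the weights they already have in the national dataset; nothing is reweighted.

```python
def row_filter(households, variable_name, variable_value):
    """Keep households whose variable equals the target value.

    Hypothetical stand-in for RowFilterStrategy: rows are dropped, never
    reweighted, so each kept household retains its national weight.
    """
    return [h for h in households if h[variable_name] == variable_value]


# Placeholder codes and weights, not real survey data.
households = [
    {"household_id": 1, "constituency_code_oa": "C001", "household_weight": 100.0},
    {"household_id": 2, "constituency_code_oa": "C002", "household_weight": 250.0},
    {"household_id": 3, "constituency_code_oa": "C001", "household_weight": 80.0},
]

scoped = row_filter(households, "constituency_code_oa", "C001")
print([h["household_id"] for h in scoped])        # [1, 3]
print(sum(h["household_weight"] for h in scoped))  # 180.0
```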

```python
from policyengine.core.scoping_strategy import RowFilterStrategy
6 changes: 3 additions & 3 deletions pyproject.toml
@@ -44,8 +44,8 @@ graph = [
"networkx>=3.0",
]
uk = [
"policyengine_core>=3.26.1",
"policyengine-uk==2.88.20",
"policyengine_core>=3.27.1",
"policyengine-uk==2.89.2",
]
us = [
"policyengine_core>=3.27.1",
@@ -63,7 +63,7 @@ dev = [
"pytest-asyncio>=0.26.0",
"ruff>=0.9.0",
"policyengine_core>=3.27.1",
"policyengine-uk==2.88.20",
"policyengine-uk==2.89.2",
"policyengine-us==1.729.0",
"towncrier>=24.8.0",
"mypy>=1.11.0",
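
The `uk` and `dev` extras pin `policyengine-uk==2.89.2`, the same version the UK release manifest records as `certified_for_model_version`. A small consistency check along these lines could catch the two drifting apart. It is a sketch with a hypothetical `exact_pin` helper, and it parses requirement strings by hand rather than with a full PEP 508 parser.

```python
def exact_pin(requirements, package):
    """Return the version from a `package==X` entry, or None if not pinned exactly.

    Hypothetical helper with simplified parsing: names are compared
    case-insensitively with "_" treated as "-"; markers and extras are ignored.
    """
    wanted = package.lower().replace("_", "-")
    for requirement in requirements:
        name, sep, version = requirement.partition("==")
        if sep and name.strip().lower().replace("_", "-") == wanted:
            return version.strip()
    return None


uk_extra = ["policyengine_core>=3.27.1", "policyengine-uk==2.89.2"]
manifest_certification = {"certified_for_model_version": "2.89.2"}

pinned = exact_pin(uk_extra, "policyengine-uk")
# Fail loudly if the pin and the manifest's certified model version diverge.
assert pinned == manifest_certification["certified_for_model_version"], pinned
print(pinned)
```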
10 changes: 5 additions & 5 deletions src/policyengine/core/scoping_strategy.py
@@ -5,8 +5,8 @@
1. RowFilterStrategy: Filters dataset rows where a household variable matches
a specific value (e.g., UK countries by 'country' field, US places by 'place_fips').

2. WeightReplacementStrategy: Replaces household weights from a pre-computed weight
matrix resolved locally or from GCS (e.g., UK constituencies and local authorities).
2. WeightReplacementStrategy: Legacy strategy that replaces household weights from
a pre-computed weight matrix resolved locally or from GCS.
"""

import logging
@@ -90,9 +90,9 @@ def cache_key(self) -> str:
class WeightReplacementStrategy(RegionScopingStrategy):
"""Scoping strategy that replaces household weights from a pre-computed matrix.

Used for UK constituencies and local authorities. Instead of removing
households, this strategy keeps all households but replaces their weights
with region-specific values from a locally cached or downloaded weight matrix.
Instead of removing households, this strategy keeps all households but
replaces their weights with region-specific values from a locally cached
or downloaded weight matrix.

The weight matrix is an HDF5 file with shape (N_regions x N_households),
where each row contains household weights for a specific region.
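
The `regions.py` docstring in this change gives the reason the UK registry moved away from this strategy: a matrix's household dimension is fixed when it is built and can drift from a newer default dataset. A minimal sketch of the legacy replacement step, using a hypothetical `replace_weights` function and nested lists in place of the HDF5 matrix, shows the shape check such a step depends on.

```python
def replace_weights(household_ids, weight_matrix, region_index):
    """Return {household_id: weight} from one row of a region x household matrix.

    Hypothetical sketch of the WeightReplacementStrategy idea; the real
    strategy reads an HDF5 file, while this uses nested lists.
    """
    row = weight_matrix[region_index]
    if len(row) != len(household_ids):
        # A matrix built against a different dataset release no longer lines
        # up with the household rows, so weights cannot be assigned safely.
        raise ValueError(
            f"weight matrix has {len(row)} households, dataset has {len(household_ids)}"
        )
    return dict(zip(household_ids, row))


matrix = [
    [1.0, 0.0, 2.5],  # region 0
    [0.0, 3.0, 0.5],  # region 1
]
print(replace_weights([101, 102, 103], matrix, 1))

try:
    replace_weights([101, 102, 103, 104], matrix, 0)  # dataset grew by one row
except ValueError as err:
    print(err)
```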
32 changes: 11 additions & 21 deletions src/policyengine/countries/uk/regions.py
@@ -6,19 +6,17 @@
- Constituencies (loaded from CSV at runtime)
- Local Authorities (loaded from CSV at runtime)

Note: Constituencies and local authorities use weight adjustment rather than
data filtering. They modify household_weight based on pre-computed weights
from H5 files stored in GCS.
Note: Constituencies and local authorities filter from the national dataset
using geography columns carried on each household. This keeps subnational
scoping tied to the dataset rows, not to a separate weight matrix whose
household dimension can drift from the default dataset.
"""

import logging
from typing import TYPE_CHECKING

from policyengine.core.region import Region, RegionRegistry
from policyengine.core.scoping_strategy import (
RowFilterStrategy,
WeightReplacementStrategy,
)
from policyengine.core.scoping_strategy import RowFilterStrategy
from policyengine.data.uk_geography_assets import (
CONSTITUENCY_ASSET_SPEC,
LOCAL_AUTHORITY_ASSET_SPEC,
@@ -153,7 +151,6 @@ def build_uk_region_registry(
)

# 3. Constituencies (optional, loaded from CSV)
# Note: These use weight replacement, not data filtering
if include_constituencies:
constituencies = _load_constituencies_from_csv()
for const in constituencies:
@@ -163,18 +160,14 @@ def build_uk_region_registry(
label=const["name"],
region_type="constituency",
parent_code="uk",
scoping_strategy=WeightReplacementStrategy(
weight_matrix_bucket=CONSTITUENCY_ASSET_SPEC.bucket,
weight_matrix_key=CONSTITUENCY_ASSET_SPEC.weight_matrix_filename,
lookup_csv_bucket=CONSTITUENCY_ASSET_SPEC.bucket,
lookup_csv_key=CONSTITUENCY_ASSET_SPEC.lookup_csv_filename,
region_code=const["code"],
scoping_strategy=RowFilterStrategy(
variable_name="constituency_code_oa",
variable_value=const["code"],
),
)
)

# 4. Local Authorities (optional, loaded from CSV)
# Note: These use weight replacement, not data filtering
if include_local_authorities:
local_authorities = _load_local_authorities_from_csv()
for la in local_authorities:
@@ -184,12 +177,9 @@ def build_uk_region_registry(
label=la["name"],
region_type="local_authority",
parent_code="uk",
scoping_strategy=WeightReplacementStrategy(
weight_matrix_bucket=LOCAL_AUTHORITY_ASSET_SPEC.bucket,
weight_matrix_key=LOCAL_AUTHORITY_ASSET_SPEC.weight_matrix_filename,
lookup_csv_bucket=LOCAL_AUTHORITY_ASSET_SPEC.bucket,
lookup_csv_key=LOCAL_AUTHORITY_ASSET_SPEC.lookup_csv_filename,
region_code=la["code"],
scoping_strategy=RowFilterStrategy(
variable_name="la_code_oa",
variable_value=la["code"],
),
)
)
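
The registry loop above turns each CSV record into a region scoped by a row filter on a geography column. A stdlib-only sketch of that pattern follows. The `SimpleRegion` dataclass and `regions_from_csv` function are hypothetical, the inline CSV stands in for the real metadata files, and the `code` and `name` column names are assumed from the `const["code"]` and `const["name"]` lookups; the real registry's region code format is not shown in this diff.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleRegion:
    """Hypothetical stand-in for a registry entry with a row-filter scope."""

    code: str
    label: str
    region_type: str
    filter_variable: str
    filter_value: str


def regions_from_csv(csv_text, region_type, filter_variable):
    """Build one row-filtered region per CSV record with code and name columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        SimpleRegion(
            code=record["code"],
            label=record["name"],
            region_type=region_type,
            filter_variable=filter_variable,
            filter_value=record["code"],  # households carry this same code
        )
        for record in reader
    ]


# Placeholder codes and names, not real constituency or local-authority data.
constituency_csv = "code,name\nC001,Example North\nC002,Example South\n"
local_authority_csv = "code,name\nLA1,Example Borough\n"

regions = regions_from_csv(constituency_csv, "constituency", "constituency_code_oa")
regions += regions_from_csv(local_authority_csv, "local_authority", "la_code_oa")
for region in regions:
    print(region.region_type, region.code, region.filter_variable)
```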
84 changes: 53 additions & 31 deletions src/policyengine/data/release_manifests/uk.json
@@ -1,53 +1,75 @@
{
"schema_version": 1,
"bundle_id": "uk-4.17.9",
"country_id": "uk",
"policyengine_version": "4.17.9",
"model_package": {
"name": "policyengine-uk",
"version": "2.88.20",
"sha256": "8c3dacb868f3fb18296b8ef2475edaf543f57b8056d24a58bca59b108651f272",
"wheel_url": "https://files.pythonhosted.org/packages/32/f0/c0e7dbcc049501dc968da0a67de4976f305228328f96fe0ad08c65301c4f/policyengine_uk-2.88.20-py3-none-any.whl"
},
"data_package": {
"name": "policyengine-uk-data",
"version": "1.55.10",
"repo_id": "policyengine/policyengine-uk-data-private",
"release_manifest_path": "release_manifest.json",
"release_manifest_revision": "655dd07e4bb9c777b00dac044949611f1feb824f"
"certification": {
"built_with_model_version": "2.89.2",
"certified_by": "policyengine.py certification",
"certified_for_model_version": "2.89.2",
"compatibility_basis": "built_with_model_package",
"data_build_id": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z"
},
"certified_data_artifact": {
"build_id": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z",
"data_package": {
"name": "policyengine-uk-data",
"version": "1.55.10"
"name": "populace-data",
"version": "0.1.0"
},
"build_id": "policyengine-uk-data-1.55.10",
"dataset": "enhanced_frs_2023_24",
"uri": "hf://policyengine/policyengine-uk-data-private/enhanced_frs_2023_24.h5@655dd07e4bb9c777b00dac044949611f1feb824f",
"sha256": "584ae33d80ca0431254610a3f8254d132da73477d31966d6446282861ecae50d"
"dataset": "populace_uk_2023",
"sha256": "f17306ccb2aad7ff0130be3589b560afb2e2a12a943570911cd0c77f07934833",
"uri": "hf://policyengine/populace-uk-private/populace_uk_2023.h5@populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z"
},
"certification": {
"compatibility_basis": "exact_build_model_version",
"data_build_id": "policyengine-uk-data-1.55.10",
"built_with_model_version": "2.88.20",
"certified_for_model_version": "2.88.20",
"data_build_fingerprint": "sha256:77f149725a36055fd89961855230401852b0712d301c6e26d6d16565c6b23809",
"certified_by": "policyengine.py bundled manifest"
"country_id": "uk",
"data_package": {
"name": "populace-data",
"release_manifest_path": "releases/populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z/release_manifest.json",
"release_manifest_revision": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z",
"repo_id": "policyengine/populace-uk-private",
"repo_type": "dataset",
"version": "0.1.0"
},
"default_dataset": "enhanced_frs_2023_24",
"datasets": {
"frs_2023_24": {
"path": "frs_2023_24.h5",
"repo_id": "policyengine/policyengine-uk-data-private",
"revision": "655dd07e4bb9c777b00dac044949611f1feb824f",
"sha256": "df26d4d7af9d164aa2d064181b39290292d2f62bb26fee6126fc095fc06da292"
},
"enhanced_frs_2023_24": {
"path": "enhanced_frs_2023_24.h5",
"repo_id": "policyengine/policyengine-uk-data-private",
"revision": "655dd07e4bb9c777b00dac044949611f1feb824f",
"sha256": "584ae33d80ca0431254610a3f8254d132da73477d31966d6446282861ecae50d"
},
"calibration_diagnostics": {
"path": "releases/populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z/calibration_diagnostics.json",
"repo_id": "policyengine/populace-uk-private",
"revision": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z",
"sha256": "80b98127020aafb049846e0877a3818476aaf7adf13539d62d512fdd6727745d"
},
"populace_uk_2023": {
"path": "populace_uk_2023.h5",
"repo_id": "policyengine/populace-uk-private",
"revision": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z",
"sha256": "f17306ccb2aad7ff0130be3589b560afb2e2a12a943570911cd0c77f07934833"
},
"populace_uk_2023_calibration": {
"path": "populace_uk_2023_calibration.npz",
"repo_id": "policyengine/populace-uk-private",
"revision": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z",
"sha256": "fb2fc115fbae53a501b8acbc1529f319b9e07b74478c7bd02d00c674d4c10022"
}
},
"default_dataset": "populace_uk_2023",
"model_package": {
"name": "policyengine-uk",
"sha256": "80965d3dd7dc767db9b083820d40262ce543020d5a8880a0cf88da10ae641b24",
"version": "2.89.2",
"wheel_url": "https://files.pythonhosted.org/packages/83/db/ce3154ba69b6fcd1e9e922ceee705ef4ddb1f81553da1e63b9296e74a4dc/policyengine_uk-2.89.2-py3-none-any.whl"
},
"policyengine_version": "4.17.9",
"region_datasets": {
"national": {
"path_template": "enhanced_frs_2023_24.h5"
"path_template": "populace_uk_2023.h5"
}
}
},
"schema_version": 1
}
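
In the new manifest the default dataset's `sha256` appears twice, under `datasets.populace_uk_2023` and under `certified_data_artifact`, and the artifact `uri` encodes the same repo, path, and revision as the `datasets` entry. The check below is a hypothetical sketch, not a policyengine.py API, and the `hf://{repo_id}/{path}@{revision}` layout is inferred from the `uri` value in this file. It parses the URI, cross-checks it against the dataset entry, and provides a streaming `hashlib` digest for a downloaded file.

```python
import hashlib


def parse_hf_uri(uri):
    """Split hf://{owner}/{repo}/{path}@{revision} into parts (assumed layout)."""
    if not uri.startswith("hf://"):
        raise ValueError(f"not an hf:// URI: {uri}")
    location, sep, revision = uri[len("hf://"):].rpartition("@")
    if not sep:
        raise ValueError(f"missing @revision in: {uri}")
    owner, repo, path = location.split("/", 2)
    return {"repo_id": f"{owner}/{repo}", "path": path, "revision": revision}


def check_default_dataset(manifest):
    """Raise ValueError if the certified artifact disagrees with its dataset entry."""
    name = manifest["default_dataset"]
    entry = manifest["datasets"][name]
    artifact = manifest["certified_data_artifact"]
    if artifact["dataset"] != name:
        raise ValueError("certified artifact is not the default dataset")
    if artifact["sha256"] != entry["sha256"]:
        raise ValueError("sha256 mismatch between artifact and dataset entry")
    expected = {key: entry[key] for key in ("repo_id", "path", "revision")}
    if parse_hf_uri(artifact["uri"]) != expected:
        raise ValueError("artifact uri does not match dataset entry")


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a large .h5 file is not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


REVISION = "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z"
SHA256 = "f17306ccb2aad7ff0130be3589b560afb2e2a12a943570911cd0c77f07934833"

# Trimmed copy of the UK manifest fields that the check reads.
manifest = {
    "default_dataset": "populace_uk_2023",
    "datasets": {
        "populace_uk_2023": {
            "path": "populace_uk_2023.h5",
            "repo_id": "policyengine/populace-uk-private",
            "revision": REVISION,
            "sha256": SHA256,
        }
    },
    "certified_data_artifact": {
        "dataset": "populace_uk_2023",
        "sha256": SHA256,
        "uri": f"hf://policyengine/populace-uk-private/populace_uk_2023.h5@{REVISION}",
    },
}
check_default_dataset(manifest)  # no exception for the values in this file
print(parse_hf_uri(manifest["certified_data_artifact"]["uri"]))
```

After downloading the artifact with `hf_hub_download`, comparing `sha256_of(path)` with the manifest's `sha256` would confirm that the local file is the certified build.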