diff --git a/changelog.d/certify-uk-populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z.changed.md b/changelog.d/certify-uk-populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z.changed.md new file mode 100644 index 00000000..52b8ed4f --- /dev/null +++ b/changelog.d/certify-uk-populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z.changed.md @@ -0,0 +1 @@ +Certify the UK Populace data release `populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z` as the default UK dataset. diff --git a/docs/countries.md b/docs/countries.md index 959a2aba..29864bea 100644 --- a/docs/countries.md +++ b/docs/countries.md @@ -33,7 +33,7 @@ Override in any output with `income_variable=`. | | Dataset | |---|---| | US | Enhanced CPS 2024 (`enhanced_cps_2024.h5`) | -| UK | Enhanced FRS 2023/24 (`enhanced_frs_2023_24.h5`) | +| UK | Populace UK 2023 (`populace_uk_2023.h5`) | ## State / regional breakdown diff --git a/docs/microsim.md b/docs/microsim.md index d531db6d..5804431c 100644 --- a/docs/microsim.md +++ b/docs/microsim.md @@ -45,7 +45,7 @@ datasets = pe.us.ensure_datasets( dataset = datasets["enhanced_cps_2024_2026"] ``` -The default US dataset is **Enhanced CPS 2024** — CPS ASEC fused with IRS SOI tax-return records and calibrated to IRS, CMS, SNAP, and other administrative totals. The UK default is **Enhanced FRS 2023/24** — the Family Resources Survey fused with tax-return microdata and calibrated to HMRC and DWP totals. +The default US dataset is **Enhanced CPS 2024** — CPS ASEC fused with IRS SOI tax-return records and calibrated to IRS, CMS, SNAP, and other administrative totals. The UK default is **Populace UK 2023** — a Populace-built Family Resources Survey dataset calibrated to UK administrative targets. List datasets already known to the country: @@ -57,7 +57,7 @@ pe.us.load_datasets() # or pe.uk.load_datasets() UK population data uses licensed Family Resources Survey inputs. 
The default UK release bundle points to the private -`policyengine/policyengine-uk-data-private` Hugging Face model repository. Set +`policyengine/populace-uk-private` Hugging Face dataset repository. Set `HUGGING_FACE_TOKEN` to a token from a Hugging Face account with access: ```bash @@ -73,11 +73,11 @@ import policyengine as pe from policyengine.core import Simulation datasets = pe.uk.ensure_datasets( - datasets=["enhanced_frs_2023_24"], + datasets=["populace_uk_2023"], years=[2026], data_folder="./data", ) -dataset = datasets["enhanced_frs_2023_24_2026"] +dataset = datasets["populace_uk_2023_2026"] simulation = Simulation( dataset=dataset, @@ -87,16 +87,16 @@ simulation.run() ``` To download the raw h5 artifact directly from Hugging Face, use -`huggingface_hub` and specify `repo_type="model"`: +`huggingface_hub` and specify `repo_type="dataset"`: ```python import os from huggingface_hub import hf_hub_download path = hf_hub_download( - repo_id="policyengine/policyengine-uk-data-private", - filename="enhanced_frs_2023_24.h5", - repo_type="model", + repo_id="policyengine/populace-uk-private", + filename="populace_uk_2023.h5", + repo_type="dataset", token=os.environ["HUGGING_FACE_TOKEN"], ) @@ -104,10 +104,10 @@ print(path) ``` The repository URL is -. A 404 from +. A 404 from the website or `RepositoryNotFoundError` from the Hub API usually means the browser or token is not authenticated as an account with access, or that the -Hub call omitted `repo_type="model"`. +Hub call omitted `repo_type="dataset"`. ## Simulations diff --git a/docs/outputs.md b/docs/outputs.md index d2fffbdf..2bab1cd1 100644 --- a/docs/outputs.md +++ b/docs/outputs.md @@ -242,7 +242,10 @@ for row in impacts.district_results: ### UK constituencies / local authorities -Constituency and local-authority breakdowns use externally-maintained weight matrices. 
The convenience helpers first look for the standard files locally, then download them from the PolicyEngine UK GCS bucket: +Constituency and local-authority breakdowns group household output rows by +dataset-provided longwise geography columns. Constituencies use +`constituency_code_oa`; local authorities use `la_code_oa`. Optional metadata +CSVs add names and map coordinates when available. ```python from policyengine.outputs import compute_uk_constituency_impacts @@ -250,12 +253,15 @@ from policyengine.outputs import compute_uk_constituency_impacts impacts = compute_uk_constituency_impacts( baseline_simulation=baseline, reform_simulation=reform, - year="2025", ) impacts.constituency_results ``` -`compute_uk_local_authority_impacts` follows the same pattern. Pass explicit paths to use specific local files instead of the default local/GCS lookup; missing explicit paths raise `FileNotFoundError` without falling back to GCS. Pass `download_missing_assets=False` to require the canonical files to exist locally or in the cache. Set `POLICYENGINE_UK_GEOGRAPHY_DATA_DIR` to choose the local lookup and download cache directory. See [Regions](regions.md). +`compute_uk_local_authority_impacts` follows the same pattern. Pass +`constituency_csv_path` or `local_authority_csv_path` to use a specific +metadata file; pass `download_missing_assets=False` to skip metadata downloads +and use code-only labels. Legacy matrix arguments are accepted for backward +compatibility but ignored. See [Regions](regions.md). ## Writing your own diff --git a/docs/regions.md b/docs/regions.md index df76e2bf..f6fda56e 100644 --- a/docs/regions.md +++ b/docs/regions.md @@ -48,7 +48,11 @@ for row in impacts.district_results: ## UK parliamentary constituencies -Constituency-level impacts reweight every household to each constituency's demographic profile using a pre-computed weight matrix. 
By default, PolicyEngine looks for the standard constituency files locally and downloads them from the PolicyEngine UK GCS bucket if they are not present: +Constituency-level impacts group household output rows by the longwise +`constituency_code_oa` column carried by the dataset. If the constituency CSV is +available locally or from the PolicyEngine UK GCS bucket, PolicyEngine uses it +to attach names and map coordinates; otherwise results still compute and use +the code as the label. ```python from policyengine.outputs import compute_uk_constituency_impacts @@ -56,12 +60,14 @@ from policyengine.outputs import compute_uk_constituency_impacts impacts = compute_uk_constituency_impacts( baseline_simulation=baseline, reform_simulation=reform, - year="2025", ) impacts.constituency_results ``` -To force specific local files, pass `weight_matrix_path` and `constituency_csv_path`. If either provided path is missing, the helper raises `FileNotFoundError` and does not fall back to GCS. To require the canonical files to be available locally or in the cache, pass `download_missing_assets=False`. To set a reusable local data directory and download cache, set `POLICYENGINE_UK_GEOGRAPHY_DATA_DIR`. +To force a specific metadata file, pass `constituency_csv_path`. To avoid +downloading metadata and fall back to code-only labels, pass +`download_missing_assets=False`. The legacy `weight_matrix_path` and `year` +arguments are accepted for backward compatibility but ignored. ## UK local authorities @@ -71,12 +77,15 @@ from policyengine.outputs import compute_uk_local_authority_impacts impacts = compute_uk_local_authority_impacts( baseline_simulation=baseline, reform_simulation=reform, - year="2025", ) impacts.local_authority_results ``` -`compute_uk_local_authority_impacts` accepts explicit paths with `weight_matrix_path` and `local_authority_csv_path` when callers need to use specific local files instead of the default local/GCS lookup. 
It also accepts `download_missing_assets=False` for local-only canonical asset resolution. +Local-authority impacts follow the same longwise pattern using `la_code_oa`. +Pass `local_authority_csv_path` to use a specific metadata CSV, or +`download_missing_assets=False` to skip metadata download and use code-only +labels. The legacy `weight_matrix_path` and `year` arguments are accepted for +backward compatibility but ignored. ## Region registries @@ -118,7 +127,7 @@ df.groupby("geo").apply(lambda g: (g["change"] * g["weight"]).sum() / g["weight" ## Scoping datasets to a region -For reforms defined only over a sub-national slice, pass a scoping strategy to `Simulation`. `RowFilterStrategy` keeps only matching households; `WeightReplacementStrategy` reweights the full sample to represent the region. +For reforms defined only over a sub-national slice, pass a scoping strategy to `Simulation`. `RowFilterStrategy` keeps only matching households. `WeightReplacementStrategy` is legacy matrix infrastructure and is not used by the UK Populace constituency or local-authority registry. ```python from policyengine.core.scoping_strategy import RowFilterStrategy diff --git a/pyproject.toml b/pyproject.toml index 0b6f36a1..441f1bcd 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -44,8 +44,8 @@ graph = [ "networkx>=3.0", ] uk = [ - "policyengine_core>=3.26.1", - "policyengine-uk==2.88.20", + "policyengine_core>=3.27.1", + "policyengine-uk==2.89.2", ] us = [ "policyengine_core>=3.27.1", @@ -63,7 +63,7 @@ dev = [ "pytest-asyncio>=0.26.0", "ruff>=0.9.0", "policyengine_core>=3.27.1", - "policyengine-uk==2.88.20", + "policyengine-uk==2.89.2", "policyengine-us==1.729.0", "towncrier>=24.8.0", "mypy>=1.11.0", diff --git a/src/policyengine/core/scoping_strategy.py b/src/policyengine/core/scoping_strategy.py index c77b06d5..2cbc8490 100644 --- a/src/policyengine/core/scoping_strategy.py +++ b/src/policyengine/core/scoping_strategy.py @@ -5,8 +5,8 @@ 1. 
RowFilterStrategy: Filters dataset rows where a household variable matches a specific value (e.g., UK countries by 'country' field, US places by 'place_fips'). -2. WeightReplacementStrategy: Replaces household weights from a pre-computed weight - matrix resolved locally or from GCS (e.g., UK constituencies and local authorities). +2. WeightReplacementStrategy: Legacy strategy that replaces household weights from + a pre-computed weight matrix resolved locally or from GCS. """ import logging @@ -90,9 +90,9 @@ def cache_key(self) -> str: class WeightReplacementStrategy(RegionScopingStrategy): """Scoping strategy that replaces household weights from a pre-computed matrix. - Used for UK constituencies and local authorities. Instead of removing - households, this strategy keeps all households but replaces their weights - with region-specific values from a locally cached or downloaded weight matrix. + Instead of removing households, this strategy keeps all households but + replaces their weights with region-specific values from a locally cached + or downloaded weight matrix. The weight matrix is an HDF5 file with shape (N_regions x N_households), where each row contains household weights for a specific region. diff --git a/src/policyengine/countries/uk/regions.py b/src/policyengine/countries/uk/regions.py index 7aceabd2..e6ea8c14 100644 --- a/src/policyengine/countries/uk/regions.py +++ b/src/policyengine/countries/uk/regions.py @@ -6,19 +6,17 @@ - Constituencies (loaded from CSV at runtime) - Local Authorities (loaded from CSV at runtime) -Note: Constituencies and local authorities use weight adjustment rather than -data filtering. They modify household_weight based on pre-computed weights -from H5 files stored in GCS. +Note: Constituencies and local authorities filter from the national dataset +using geography columns carried on each household. 
This keeps subnational +scoping tied to the dataset rows, not to a separate weight matrix whose +household dimension can drift from the default dataset. """ import logging from typing import TYPE_CHECKING from policyengine.core.region import Region, RegionRegistry -from policyengine.core.scoping_strategy import ( - RowFilterStrategy, - WeightReplacementStrategy, -) +from policyengine.core.scoping_strategy import RowFilterStrategy from policyengine.data.uk_geography_assets import ( CONSTITUENCY_ASSET_SPEC, LOCAL_AUTHORITY_ASSET_SPEC, @@ -153,7 +151,6 @@ def build_uk_region_registry( ) # 3. Constituencies (optional, loaded from CSV) - # Note: These use weight replacement, not data filtering if include_constituencies: constituencies = _load_constituencies_from_csv() for const in constituencies: @@ -163,18 +160,14 @@ def build_uk_region_registry( label=const["name"], region_type="constituency", parent_code="uk", - scoping_strategy=WeightReplacementStrategy( - weight_matrix_bucket=CONSTITUENCY_ASSET_SPEC.bucket, - weight_matrix_key=CONSTITUENCY_ASSET_SPEC.weight_matrix_filename, - lookup_csv_bucket=CONSTITUENCY_ASSET_SPEC.bucket, - lookup_csv_key=CONSTITUENCY_ASSET_SPEC.lookup_csv_filename, - region_code=const["code"], + scoping_strategy=RowFilterStrategy( + variable_name="constituency_code_oa", + variable_value=const["code"], ), ) ) # 4. 
Local Authorities (optional, loaded from CSV) - # Note: These use weight replacement, not data filtering if include_local_authorities: local_authorities = _load_local_authorities_from_csv() for la in local_authorities: @@ -184,12 +177,9 @@ def build_uk_region_registry( label=la["name"], region_type="local_authority", parent_code="uk", - scoping_strategy=WeightReplacementStrategy( - weight_matrix_bucket=LOCAL_AUTHORITY_ASSET_SPEC.bucket, - weight_matrix_key=LOCAL_AUTHORITY_ASSET_SPEC.weight_matrix_filename, - lookup_csv_bucket=LOCAL_AUTHORITY_ASSET_SPEC.bucket, - lookup_csv_key=LOCAL_AUTHORITY_ASSET_SPEC.lookup_csv_filename, - region_code=la["code"], + scoping_strategy=RowFilterStrategy( + variable_name="la_code_oa", + variable_value=la["code"], ), ) ) diff --git a/src/policyengine/data/release_manifests/uk.json b/src/policyengine/data/release_manifests/uk.json index 8497d1a2..b3a9c86e 100644 --- a/src/policyengine/data/release_manifests/uk.json +++ b/src/policyengine/data/release_manifests/uk.json @@ -1,53 +1,75 @@ { - "schema_version": 1, "bundle_id": "uk-4.17.9", - "country_id": "uk", - "policyengine_version": "4.17.9", - "model_package": { - "name": "policyengine-uk", - "version": "2.88.20", - "sha256": "8c3dacb868f3fb18296b8ef2475edaf543f57b8056d24a58bca59b108651f272", - "wheel_url": "https://files.pythonhosted.org/packages/32/f0/c0e7dbcc049501dc968da0a67de4976f305228328f96fe0ad08c65301c4f/policyengine_uk-2.88.20-py3-none-any.whl" - }, - "data_package": { - "name": "policyengine-uk-data", - "version": "1.55.10", - "repo_id": "policyengine/policyengine-uk-data-private", - "release_manifest_path": "release_manifest.json", - "release_manifest_revision": "655dd07e4bb9c777b00dac044949611f1feb824f" + "certification": { + "built_with_model_version": "2.89.2", + "certified_by": "policyengine.py certification", + "certified_for_model_version": "2.89.2", + "compatibility_basis": "built_with_model_package", + "data_build_id": 
"populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z" }, "certified_data_artifact": { + "build_id": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z", "data_package": { - "name": "policyengine-uk-data", - "version": "1.55.10" + "name": "populace-data", + "version": "0.1.0" }, - "build_id": "policyengine-uk-data-1.55.10", - "dataset": "enhanced_frs_2023_24", - "uri": "hf://policyengine/policyengine-uk-data-private/enhanced_frs_2023_24.h5@655dd07e4bb9c777b00dac044949611f1feb824f", - "sha256": "584ae33d80ca0431254610a3f8254d132da73477d31966d6446282861ecae50d" + "dataset": "populace_uk_2023", + "sha256": "f17306ccb2aad7ff0130be3589b560afb2e2a12a943570911cd0c77f07934833", + "uri": "hf://policyengine/populace-uk-private/populace_uk_2023.h5@populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z" }, - "certification": { - "compatibility_basis": "exact_build_model_version", - "data_build_id": "policyengine-uk-data-1.55.10", - "built_with_model_version": "2.88.20", - "certified_for_model_version": "2.88.20", - "data_build_fingerprint": "sha256:77f149725a36055fd89961855230401852b0712d301c6e26d6d16565c6b23809", - "certified_by": "policyengine.py bundled manifest" + "country_id": "uk", + "data_package": { + "name": "populace-data", + "release_manifest_path": "releases/populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z/release_manifest.json", + "release_manifest_revision": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z", + "repo_id": "policyengine/populace-uk-private", + "repo_type": "dataset", + "version": "0.1.0" }, - "default_dataset": "enhanced_frs_2023_24", "datasets": { "frs_2023_24": { "path": "frs_2023_24.h5", + "repo_id": "policyengine/policyengine-uk-data-private", + "revision": "655dd07e4bb9c777b00dac044949611f1feb824f", "sha256": "df26d4d7af9d164aa2d064181b39290292d2f62bb26fee6126fc095fc06da292" }, "enhanced_frs_2023_24": { "path": "enhanced_frs_2023_24.h5", + "repo_id": "policyengine/policyengine-uk-data-private", + "revision": 
"655dd07e4bb9c777b00dac044949611f1feb824f", "sha256": "584ae33d80ca0431254610a3f8254d132da73477d31966d6446282861ecae50d" + }, + "calibration_diagnostics": { + "path": "releases/populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z/calibration_diagnostics.json", + "repo_id": "policyengine/populace-uk-private", + "revision": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z", + "sha256": "80b98127020aafb049846e0877a3818476aaf7adf13539d62d512fdd6727745d" + }, + "populace_uk_2023": { + "path": "populace_uk_2023.h5", + "repo_id": "policyengine/populace-uk-private", + "revision": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z", + "sha256": "f17306ccb2aad7ff0130be3589b560afb2e2a12a943570911cd0c77f07934833" + }, + "populace_uk_2023_calibration": { + "path": "populace_uk_2023_calibration.npz", + "repo_id": "policyengine/populace-uk-private", + "revision": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z", + "sha256": "fb2fc115fbae53a501b8acbc1529f319b9e07b74478c7bd02d00c674d4c10022" } }, + "default_dataset": "populace_uk_2023", + "model_package": { + "name": "policyengine-uk", + "sha256": "80965d3dd7dc767db9b083820d40262ce543020d5a8880a0cf88da10ae641b24", + "version": "2.89.2", + "wheel_url": "https://files.pythonhosted.org/packages/83/db/ce3154ba69b6fcd1e9e922ceee705ef4ddb1f81553da1e63b9296e74a4dc/policyengine_uk-2.89.2-py3-none-any.whl" + }, + "policyengine_version": "4.17.9", "region_datasets": { "national": { - "path_template": "enhanced_frs_2023_24.h5" + "path_template": "populace_uk_2023.h5" } - } + }, + "schema_version": 1 } diff --git a/src/policyengine/data/release_manifests/uk.trace.tro.jsonld b/src/policyengine/data/release_manifests/uk.trace.tro.jsonld index 03addcac..aa9bec66 100644 --- a/src/policyengine/data/release_manifests/uk.trace.tro.jsonld +++ b/src/policyengine/data/release_manifests/uk.trace.tro.jsonld @@ -17,7 +17,7 @@ "schema:name": "PolicyEngine", "schema:url": "https://policyengine.org" }, - "schema:dateCreated": "2026-05-20T20:16:50.641086Z", 
+ "schema:dateCreated": "2026-06-19T02:38:00Z", "schema:description": "TRACE TRO for certified runtime bundle uk-4.17.9 covering the bundle manifest, the certified dataset artifact, the country model wheel, and the country data release manifest when it is available.", "schema:name": "policyengine uk certified bundle TRO", "trov:createdWith": { @@ -45,7 +45,7 @@ "trov:hasArtifact": { "@id": "composition/1/artifact/data_release_manifest" }, - "trov:hasLocation": "https://huggingface.co/policyengine/policyengine-uk-data-private/resolve/655dd07e4bb9c777b00dac044949611f1feb824f/release_manifest.json" + "trov:hasLocation": "https://huggingface.co/datasets/policyengine/populace-uk-private/resolve/populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z/releases/populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z/release_manifest.json" }, { "@id": "arrangement/1/location/dataset", @@ -53,7 +53,7 @@ "trov:hasArtifact": { "@id": "composition/1/artifact/dataset" }, - "trov:hasLocation": "https://huggingface.co/policyengine/policyengine-uk-data-private/resolve/655dd07e4bb9c777b00dac044949611f1feb824f/enhanced_frs_2023_24.h5" + "trov:hasLocation": "https://huggingface.co/datasets/policyengine/populace-uk-private/resolve/populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z/populace_uk_2023.h5" }, { "@id": "arrangement/1/location/model_wheel", @@ -61,7 +61,7 @@ "trov:hasArtifact": { "@id": "composition/1/artifact/model_wheel" }, - "trov:hasLocation": "https://files.pythonhosted.org/packages/32/f0/c0e7dbcc049501dc968da0a67de4976f305228328f96fe0ad08c65301c4f/policyengine_uk-2.88.20-py3-none-any.whl" + "trov:hasLocation": "https://files.pythonhosted.org/packages/83/db/ce3154ba69b6fcd1e9e922ceee705ef4ddb1f81553da1e63b9296e74a4dc/policyengine_uk-2.89.2-py3-none-any.whl" } ] } @@ -75,51 +75,50 @@ "@type": "trov:ResearchArtifact", "schema:name": "policyengine.py bundle manifest for uk", "trov:mimeType": "application/json", - "trov:sha256": 
"97e28fe544c32d9edf91b91d081a6db8d43e12569070cd45d1f4f52e5b4d816f" + "trov:sha256": "5148919573814aa1ed339372f4513e2606954a65b13e33515939c4980b3b178a" }, { "@id": "composition/1/artifact/data_release_manifest", "@type": "trov:ResearchArtifact", - "schema:name": "policyengine-uk-data release manifest 1.55.10", + "schema:name": "populace-data release manifest 0.1.0", "trov:mimeType": "application/json", - "trov:sha256": "9f41a0f14ca93d20e61d33419173c3fedc1c3ba295b6ca67dd3197a41643d179" + "trov:sha256": "687c5c19ee75aa7959d4eaedf310b10a00036dd8f9c46af4cb712c3f30f02921" }, { "@id": "composition/1/artifact/dataset", "@type": "trov:ResearchArtifact", - "schema:name": "enhanced_frs_2023_24", + "schema:name": "populace_uk_2023", "trov:mimeType": "application/x-hdf5", - "trov:sha256": "584ae33d80ca0431254610a3f8254d132da73477d31966d6446282861ecae50d" + "trov:sha256": "f17306ccb2aad7ff0130be3589b560afb2e2a12a943570911cd0c77f07934833" }, { "@id": "composition/1/artifact/model_wheel", "@type": "trov:ResearchArtifact", - "schema:name": "policyengine-uk==2.88.20 wheel", + "schema:name": "policyengine-uk==2.89.2 wheel", "trov:mimeType": "application/zip", - "trov:sha256": "8c3dacb868f3fb18296b8ef2475edaf543f57b8056d24a58bca59b108651f272" + "trov:sha256": "80965d3dd7dc767db9b083820d40262ce543020d5a8880a0cf88da10ae641b24" } ], "trov:hasFingerprint": { "@id": "composition/1/fingerprint", "@type": "trov:CompositionFingerprint", - "trov:sha256": "1c28f0f5eb7251d81e3ce17efd7a58cd69d35eca262d709d9babceea7d37dfd4" + "trov:sha256": "6b93b186c45fc55f27a7b3b936594ae20143dee67f6429e6d0bf800816d52f8c" } }, "trov:hasPerformance": { "@id": "trp/1", "@type": "trov:TransparentResearchPerformance", - "pe:builtWithModelVersion": "2.88.20", - "pe:certifiedBy": "policyengine.py bundled manifest", - "pe:certifiedForModelVersion": "2.88.20", - "pe:compatibilityBasis": "exact_build_model_version", - "pe:dataBuildFingerprint": "sha256:77f149725a36055fd89961855230401852b0712d301c6e26d6d16565c6b23809", - 
"pe:dataBuildId": "policyengine-uk-data-1.55.10", + "pe:builtWithModelVersion": "2.89.2", + "pe:certifiedBy": "policyengine.py certification", + "pe:certifiedForModelVersion": "2.89.2", + "pe:compatibilityBasis": "built_with_model_package", + "pe:dataBuildId": "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z", "pe:emittedIn": "local", - "rdfs:comment": "Certification of build policyengine-uk-data-1.55.10 for policyengine-uk 2.88.20.", + "rdfs:comment": "Certification of build populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z for policyengine-uk 2.89.2.", "trov:accessedArrangement": { "@id": "arrangement/1" }, - "trov:startedAtTime": "2026-05-20T20:16:50.641086Z", + "trov:startedAtTime": "2026-06-19T02:38:00Z", "trov:wasConductedBy": { "@id": "trs" } diff --git a/src/policyengine/outputs/constituency_impact.py b/src/policyengine/outputs/constituency_impact.py index adde6095..2dbd04ee 100644 --- a/src/policyengine/outputs/constituency_impact.py +++ b/src/policyengine/outputs/constituency_impact.py @@ -1,21 +1,22 @@ """UK parliamentary constituency impact output class. -Computes per-constituency income changes using pre-computed weight matrices. -Each constituency has a row in the weight matrix (shape: 650 x N_households) -that reweights all households to represent that constituency's demographics. +Computes per-constituency income changes by grouping household output rows +on longwise geography codes carried by the dataset. 
""" from typing import TYPE_CHECKING, Optional, Sequence -import numpy as np import pandas as pd from pydantic import ConfigDict from policyengine.core import Output +from policyengine.data.uk_geography_assets import CONSTITUENCY_ASSET_SPEC from policyengine.outputs.uk_geography_assets import ( - CONSTITUENCY_ASSET_SPEC, UKGeographyAssetStrategy, - resolve_uk_geography_asset_paths, +) +from policyengine.outputs.uk_geography_impact import ( + compute_longwise_uk_geography_impacts, + resolve_uk_geography_lookup_csv_path, ) if TYPE_CHECKING: @@ -25,81 +26,33 @@ class ConstituencyImpact(Output): """Per-parliamentary-constituency income change from a UK policy reform. - Uses pre-computed weight matrices from GCS to reweight households - for each of 650 constituencies, then computes weighted average and - relative household income changes. + Groups households by ``constituency_code_oa`` and computes weighted + average and relative household income changes. """ model_config = ConfigDict(arbitrary_types_allowed=True) baseline_simulation: "Simulation" reform_simulation: "Simulation" - weight_matrix_path: str - constituency_csv_path: str + weight_matrix_path: Optional[str] = None + constituency_csv_path: Optional[str] = None year: str = "2025" # Results populated by run() constituency_results: Optional[list[dict]] = None def run(self) -> None: - """Load weight matrix and compute per-constituency metrics.""" - # Load constituency metadata (code, name, x, y) - constituency_df = pd.read_csv(self.constituency_csv_path) - - # Load weight matrix: shape (N_constituencies, N_households) - import h5py - - with h5py.File(self.weight_matrix_path, "r") as f: - weight_matrix = f[self.year][...] 
- - # Get household income arrays from output datasets + """Group household output rows and compute per-constituency metrics.""" baseline_hh = self.baseline_simulation.output_dataset.data.household reform_hh = self.reform_simulation.output_dataset.data.household - baseline_income = baseline_hh["household_net_income"].values - reform_income = reform_hh["household_net_income"].values - - results: list[dict] = [] - for i in range(len(constituency_df)): - row = constituency_df.iloc[i] - code = str(row["code"]) - name = str(row["name"]) - x = int(row["x"]) - y = int(row["y"]) - w = weight_matrix[i] - - total_weight = float(np.sum(w)) - if total_weight == 0: - continue - - weighted_baseline = float(np.sum(baseline_income * w)) - weighted_reform = float(np.sum(reform_income * w)) - - # Count of weighted households - count = float(np.sum(w > 0)) - if count == 0: - continue - - avg_change = (weighted_reform - weighted_baseline) / total_weight - rel_change = ( - (weighted_reform / weighted_baseline - 1.0) - if weighted_baseline != 0 - else 0.0 - ) - - results.append( - { - "constituency_code": code, - "constituency_name": name, - "x": x, - "y": y, - "average_household_income_change": float(avg_change), - "relative_household_income_change": float(rel_change), - "population": total_weight, - } - ) - - self.constituency_results = results + self.constituency_results = compute_longwise_uk_geography_impacts( + baseline_household=pd.DataFrame(baseline_hh), + reform_household=pd.DataFrame(reform_hh), + geography_column="constituency_code_oa", + result_key_prefix="constituency", + lookup_csv_path=self.constituency_csv_path, + ) def compute_uk_constituency_impacts( @@ -116,31 +69,29 @@ def compute_uk_constituency_impacts( Args: baseline_simulation: Completed baseline simulation. reform_simulation: Completed reform simulation. - weight_matrix_path: Optional path to parliamentary_constituency_weights.h5. - If omitted, standard local paths are checked before downloading from GCS. 
+ weight_matrix_path: Deprecated and ignored. Constituency outputs now + group by ``constituency_code_oa`` on the household output. constituency_csv_path: Optional path to constituencies_2024.csv. - If omitted, standard local paths are checked before downloading from GCS. - year: Year key in the H5 file (default "2025"). - asset_strategies: Optional resolver strategy chain. If omitted, defaults to - local lookup, then optional GCS download. - download_missing_assets: Whether to download canonical missing assets from GCS. - Set to False to require local/cache files. + If omitted, standard local paths are checked before downloading + from GCS. If still unavailable, results use geography codes as names. + year: Deprecated and ignored. + asset_strategies: Deprecated and ignored. + download_missing_assets: Whether to download the optional lookup CSV + from GCS when no local CSV is found. Returns: ConstituencyImpact with constituency_results populated. """ - paths = resolve_uk_geography_asset_paths( + lookup_csv_path = resolve_uk_geography_lookup_csv_path( CONSTITUENCY_ASSET_SPEC, - weight_matrix_path=weight_matrix_path, lookup_csv_path=constituency_csv_path, - asset_strategies=asset_strategies, download_missing_assets=download_missing_assets, ) impact = ConstituencyImpact.model_construct( baseline_simulation=baseline_simulation, reform_simulation=reform_simulation, - weight_matrix_path=paths.weight_matrix_path, - constituency_csv_path=paths.lookup_csv_path, + weight_matrix_path=weight_matrix_path, + constituency_csv_path=lookup_csv_path, year=year, ) impact.run() diff --git a/src/policyengine/outputs/local_authority_impact.py b/src/policyengine/outputs/local_authority_impact.py index a6aa244a..40d50e8f 100644 --- a/src/policyengine/outputs/local_authority_impact.py +++ b/src/policyengine/outputs/local_authority_impact.py @@ -1,21 +1,22 @@ """UK local authority impact output class. -Computes per-local-authority income changes using pre-computed weight matrices. 
-Each local authority has a row in the weight matrix (shape: 360 x N_households) -that reweights all households to represent that local authority's demographics. +Computes per-local-authority income changes by grouping household output rows +on longwise geography codes carried by the dataset. """ from typing import TYPE_CHECKING, Optional, Sequence -import numpy as np import pandas as pd from pydantic import ConfigDict from policyengine.core import Output +from policyengine.data.uk_geography_assets import LOCAL_AUTHORITY_ASSET_SPEC from policyengine.outputs.uk_geography_assets import ( - LOCAL_AUTHORITY_ASSET_SPEC, UKGeographyAssetStrategy, - resolve_uk_geography_asset_paths, +) +from policyengine.outputs.uk_geography_impact import ( + compute_longwise_uk_geography_impacts, + resolve_uk_geography_lookup_csv_path, ) if TYPE_CHECKING: @@ -25,8 +26,7 @@ class LocalAuthorityImpact(Output): """Per-local-authority income change from a UK policy reform. - Uses pre-computed weight matrices from GCS to reweight households - for each of 360 local authorities, then computes weighted average and + Groups households by ``la_code_oa`` and computes weighted average and relative household income changes. """ @@ -34,71 +34,25 @@ class LocalAuthorityImpact(Output): baseline_simulation: "Simulation" reform_simulation: "Simulation" - weight_matrix_path: str - local_authority_csv_path: str + weight_matrix_path: Optional[str] = None + local_authority_csv_path: Optional[str] = None year: str = "2025" # Results populated by run() local_authority_results: Optional[list[dict]] = None def run(self) -> None: - """Load weight matrix and compute per-local-authority metrics.""" - # Load local authority metadata (code, x, y, name) - la_df = pd.read_csv(self.local_authority_csv_path) - - # Load weight matrix: shape (N_local_authorities, N_households) - import h5py - - with h5py.File(self.weight_matrix_path, "r") as f: - weight_matrix = f[self.year][...] 
- - # Get household income arrays from output datasets + """Group household output rows and compute per-local-authority metrics.""" baseline_hh = self.baseline_simulation.output_dataset.data.household reform_hh = self.reform_simulation.output_dataset.data.household - baseline_income = baseline_hh["household_net_income"].values - reform_income = reform_hh["household_net_income"].values - - results: list[dict] = [] - for i in range(len(la_df)): - row = la_df.iloc[i] - code = str(row["code"]) - name = str(row["name"]) - x = int(row["x"]) - y = int(row["y"]) - w = weight_matrix[i] - - total_weight = float(np.sum(w)) - if total_weight == 0: - continue - - weighted_baseline = float(np.sum(baseline_income * w)) - weighted_reform = float(np.sum(reform_income * w)) - - count = float(np.sum(w > 0)) - if count == 0: - continue - - avg_change = (weighted_reform - weighted_baseline) / total_weight - rel_change = ( - (weighted_reform / weighted_baseline - 1.0) - if weighted_baseline != 0 - else 0.0 - ) - - results.append( - { - "local_authority_code": code, - "local_authority_name": name, - "x": x, - "y": y, - "average_household_income_change": float(avg_change), - "relative_household_income_change": float(rel_change), - "population": total_weight, - } - ) - - self.local_authority_results = results + self.local_authority_results = compute_longwise_uk_geography_impacts( + baseline_household=pd.DataFrame(baseline_hh), + reform_household=pd.DataFrame(reform_hh), + geography_column="la_code_oa", + result_key_prefix="local_authority", + lookup_csv_path=self.local_authority_csv_path, + ) def compute_uk_local_authority_impacts( @@ -115,31 +69,29 @@ def compute_uk_local_authority_impacts( Args: baseline_simulation: Completed baseline simulation. reform_simulation: Completed reform simulation. - weight_matrix_path: Optional path to local_authority_weights.h5. - If omitted, standard local paths are checked before downloading from GCS. + weight_matrix_path: Deprecated and ignored. 
Local-authority outputs + now group by ``la_code_oa`` on the household output. local_authority_csv_path: Optional path to local_authorities_2021.csv. - If omitted, standard local paths are checked before downloading from GCS. - year: Year key in the H5 file (default "2025"). - asset_strategies: Optional resolver strategy chain. If omitted, defaults to - local lookup, then optional GCS download. - download_missing_assets: Whether to download canonical missing assets from GCS. - Set to False to require local/cache files. + If omitted, standard local paths are checked before downloading + from GCS. If still unavailable, results use geography codes as names. + year: Deprecated and ignored. + asset_strategies: Deprecated and ignored. + download_missing_assets: Whether to download the optional lookup CSV + from GCS when no local CSV is found. Returns: LocalAuthorityImpact with local_authority_results populated. """ - paths = resolve_uk_geography_asset_paths( + lookup_csv_path = resolve_uk_geography_lookup_csv_path( LOCAL_AUTHORITY_ASSET_SPEC, - weight_matrix_path=weight_matrix_path, lookup_csv_path=local_authority_csv_path, - asset_strategies=asset_strategies, download_missing_assets=download_missing_assets, ) impact = LocalAuthorityImpact.model_construct( baseline_simulation=baseline_simulation, reform_simulation=reform_simulation, - weight_matrix_path=paths.weight_matrix_path, - local_authority_csv_path=paths.lookup_csv_path, + weight_matrix_path=weight_matrix_path, + local_authority_csv_path=lookup_csv_path, year=year, ) impact.run() diff --git a/src/policyengine/outputs/uk_geography_impact.py b/src/policyengine/outputs/uk_geography_impact.py new file mode 100644 index 00000000..ff2528ea --- /dev/null +++ b/src/policyengine/outputs/uk_geography_impact.py @@ -0,0 +1,174 @@ +"""Longwise UK geography impact helpers.""" + +from pathlib import Path +from typing import Optional + +import pandas as pd + +from policyengine.data.uk_geography_assets import ( + 
UKGeographyAssetSpec, + default_download_dir, + default_local_search_dirs, +) + + +def _normalise_code(value) -> str: + if value is None or pd.isna(value): + return "" + if isinstance(value, bytes): + value = value.decode() + return str(value).strip() + + +def resolve_uk_geography_lookup_csv_path( + spec: UKGeographyAssetSpec, + *, + lookup_csv_path: Optional[str] = None, + download_missing_assets: bool = True, +) -> Optional[str]: + """Resolve a UK geography lookup CSV without requiring a weight matrix.""" + if lookup_csv_path: + path = Path(lookup_csv_path).expanduser() + if not path.is_file(): + raise FileNotFoundError( + f"Provided UK geography lookup CSV path does not exist " + f"or is not a file: {path}" + ) + return str(path) + + for search_dir in default_local_search_dirs(): + path = search_dir / spec.lookup_csv_filename + if path.is_file(): + return str(path) + + if not download_missing_assets: + return None + + try: + from policyengine_core.tools.google_cloud import download_gcs_file + except ImportError: + return None + + try: + target_path = default_download_dir() / spec.lookup_csv_filename + target_path.parent.mkdir(parents=True, exist_ok=True) + return download_gcs_file( + bucket=spec.resolved_lookup_csv_bucket, + file_path=spec.lookup_csv_filename, + local_path=str(target_path), + ) + except Exception: + return None + + +def _load_lookup_metadata( + lookup_csv_path: Optional[str], +) -> tuple[dict[str, dict], list[str]]: + if lookup_csv_path is None: + return {}, [] + + lookup_df = pd.read_csv(lookup_csv_path) + if "code" not in lookup_df.columns: + raise ValueError( + f"UK geography lookup CSV must contain a 'code' column: {lookup_csv_path}" + ) + + metadata: dict[str, dict] = {} + order: list[str] = [] + for _, row in lookup_df.iterrows(): + code = _normalise_code(row["code"]) + if not code: + continue + order.append(code) + metadata[code] = { + "name": _normalise_code(row["name"]) + if "name" in lookup_df.columns + else code, + "x": 
_optional_int(row["x"]) if "x" in lookup_df.columns else None, + "y": _optional_int(row["y"]) if "y" in lookup_df.columns else None, + } + return metadata, order + + +def _optional_int(value) -> Optional[int]: + if value is None or pd.isna(value): + return None + return int(value) + + +def compute_longwise_uk_geography_impacts( + *, + baseline_household: pd.DataFrame, + reform_household: pd.DataFrame, + geography_column: str, + result_key_prefix: str, + lookup_csv_path: Optional[str] = None, +) -> list[dict]: + """Compute UK geography impacts by grouping household rows.""" + for column in [ + geography_column, + "household_net_income", + "household_weight", + ]: + if column not in baseline_household.columns: + raise ValueError( + f"UK geography impacts require baseline household column " + f"'{column}'. Re-run the simulation with an output dataset " + f"that preserves UK geography passthrough columns." + ) + if "household_net_income" not in reform_household.columns: + raise ValueError( + "UK geography impacts require reform household column " + "'household_net_income'." + ) + if len(baseline_household) != len(reform_household): + raise ValueError( + "Baseline and reform household outputs must have the same row " + "count for longwise UK geography impacts." 
+ ) + + metadata, lookup_order = _load_lookup_metadata(lookup_csv_path) + codes = ( + pd.Series(baseline_household[geography_column]).map(_normalise_code).to_numpy() + ) + present_codes = {code for code in pd.unique(codes) if code} + ordered_codes = [code for code in lookup_order if code in present_codes] + sorted( + present_codes - set(lookup_order) + ) + + baseline_income = baseline_household["household_net_income"].to_numpy(dtype=float) + reform_income = reform_household["household_net_income"].to_numpy(dtype=float) + weights = baseline_household["household_weight"].to_numpy(dtype=float) + + results: list[dict] = [] + for code in ordered_codes: + mask = codes == code + w = weights[mask] + total_weight = float(w.sum()) + if total_weight == 0: + continue + + baseline_weighted = float((baseline_income[mask] * w).sum()) + reform_weighted = float((reform_income[mask] * w).sum()) + + avg_change = (reform_weighted - baseline_weighted) / total_weight + rel_change = ( + (reform_weighted / baseline_weighted - 1.0) + if baseline_weighted != 0 + else 0.0 + ) + row_metadata = metadata.get(code, {}) + + results.append( + { + f"{result_key_prefix}_code": code, + f"{result_key_prefix}_name": row_metadata.get("name", code), + "x": row_metadata.get("x"), + "y": row_metadata.get("y"), + "average_household_income_change": float(avg_change), + "relative_household_income_change": float(rel_change), + "population": total_weight, + } + ) + + return results diff --git a/src/policyengine/tax_benefit_models/uk/datasets.py b/src/policyengine/tax_benefit_models/uk/datasets.py index f771af06..6cada238 100644 --- a/src/policyengine/tax_benefit_models/uk/datasets.py +++ b/src/policyengine/tax_benefit_models/uk/datasets.py @@ -102,8 +102,7 @@ def __repr__(self) -> str: def create_datasets( datasets: list[str] = [ - "frs_2023_24", - "enhanced_frs_2023_24", + "populace_uk_2023", ], years: list[int] = [2026, 2027, 2028, 2029, 2030], data_folder: str = "./data", @@ -184,8 +183,7 @@ def 
create_datasets( def load_datasets( datasets: list[str] = [ - "frs_2023_24", - "enhanced_frs_2023_24", + "populace_uk_2023", ], years: list[int] = [2026, 2027, 2028, 2029, 2030], data_folder: str = "./data", @@ -212,8 +210,7 @@ def load_datasets( def ensure_datasets( datasets: list[str] = [ - "frs_2023_24", - "enhanced_frs_2023_24", + "populace_uk_2023", ], years: list[int] = [2026, 2027, 2028, 2029, 2030], data_folder: str = "./data", diff --git a/src/policyengine/tax_benefit_models/uk/model.py b/src/policyengine/tax_benefit_models/uk/model.py index c8a440ae..4dd544a7 100644 --- a/src/policyengine/tax_benefit_models/uk/model.py +++ b/src/policyengine/tax_benefit_models/uk/model.py @@ -20,6 +20,14 @@ from policyengine.core.simulation import Simulation UK_GROUP_ENTITIES = ["benunit", "household"] +UK_HOUSEHOLD_PASSTHROUGH_COLUMNS = [ + "oa_code", + "lsoa_code", + "msoa_code", + "constituency_code_oa", + "la_code_oa", + "region_code_oa", +] class PolicyEngineUK(TaxBenefitModel): @@ -220,6 +228,11 @@ def run(self, simulation: "Simulation") -> "Simulation": var, period=simulation.dataset.year, map_to=entity ).values + household_input_df = pd.DataFrame(dataset.data.household) + for column in UK_HOUSEHOLD_PASSTHROUGH_COLUMNS: + if column in household_input_df.columns and column not in data["household"]: + data["household"][column] = household_input_df[column].values + data["person"] = MicroDataFrame(data["person"], weights="person_weight") data["benunit"] = MicroDataFrame(data["benunit"], weights="benunit_weight") data["household"] = MicroDataFrame( diff --git a/tests/fixtures/household_calculator_snapshots/uk_couple_two_kids.json b/tests/fixtures/household_calculator_snapshots/uk_couple_two_kids.json index bede1c1b..94a4950e 100644 --- a/tests/fixtures/household_calculator_snapshots/uk_couple_two_kids.json +++ b/tests/fixtures/household_calculator_snapshots/uk_couple_two_kids.json @@ -1,7 +1,7 @@ { "benunit.benunit_id": 0.0, "benunit.benunit_weight": 1.0, - 
"benunit.child_benefit": 2328.16, + "benunit.child_benefit": 2337.4, "benunit.child_tax_credit": 0.0, "benunit.family_type": "COUPLE_WITH_CHILDREN", "benunit.income_support": 0.0, @@ -10,17 +10,17 @@ "benunit.universal_credit": 0.0, "benunit.working_tax_credit": 0.0, "household.council_tax": 0.0, - "household.equiv_hbai_household_net_income": 52503.68, + "household.equiv_hbai_household_net_income": 52510.29, "household.fuel_duty": 0.0, - "household.hbai_household_net_income": 73505.15, - "household.household_benefits": 5880.35, + "household.hbai_household_net_income": 73514.4, + "household.household_benefits": 5889.59, "household.household_count_people": 4.0, - "household.household_gross_income": 95880.34, + "household.household_gross_income": 95889.59, "household.household_id": 0.0, "household.household_income_decile": 10.0, "household.household_market_income": 90000.0, - "household.household_net_income": 76898.3, - "household.household_tax": 18982.05, + "household.household_net_income": 76886.55, + "household.household_tax": 19003.05, "household.household_wealth_decile": 10.0, "household.household_weight": 1.0, "household.in_poverty_ahc": 0.0, @@ -32,7 +32,7 @@ "household.vat": 0.0, "person[0].age": 42.0, "person[0].benunit_id": 0.0, - "person[0].child_benefit": 2328.16, + "person[0].child_benefit": 2337.4, "person[0].child_tax_credit": 0.0, "person[0].dividend_income": 0.0, "person[0].earned_income": 55000.0, @@ -61,7 +61,7 @@ "person[0].working_tax_credit": 0.0, "person[1].age": 40.0, "person[1].benunit_id": 0.0, - "person[1].child_benefit": 2328.16, + "person[1].child_benefit": 2337.4, "person[1].child_tax_credit": 0.0, "person[1].dividend_income": 0.0, "person[1].earned_income": 35000.0, @@ -90,7 +90,7 @@ "person[1].working_tax_credit": 0.0, "person[2].age": 8.0, "person[2].benunit_id": 0.0, - "person[2].child_benefit": 2328.16, + "person[2].child_benefit": 2337.4, "person[2].child_tax_credit": 0.0, "person[2].dividend_income": 0.0, "person[2].earned_income": 
0.0, @@ -119,7 +119,7 @@ "person[2].working_tax_credit": 0.0, "person[3].age": 3.0, "person[3].benunit_id": 0.0, - "person[3].child_benefit": 2328.16, + "person[3].child_benefit": 2337.4, "person[3].child_tax_credit": 0.0, "person[3].dividend_income": 0.0, "person[3].earned_income": 0.0, diff --git a/tests/fixtures/household_calculator_snapshots/uk_model_surface.json b/tests/fixtures/household_calculator_snapshots/uk_model_surface.json index 161ef0ec..db163a3a 100644 --- a/tests/fixtures/household_calculator_snapshots/uk_model_surface.json +++ b/tests/fixtures/household_calculator_snapshots/uk_model_surface.json @@ -1,11 +1,11 @@ { "country_id": "uk", - "data_package_name": "policyengine-uk-data", + "data_package_name": "populace-data", "has_employment_income": true, "has_income_tax": true, "has_region_registry": true, "model_package_name": "policyengine-uk", - "num_parameters_bucketed_100s": 20, + "num_parameters_bucketed_100s": 22, "num_variables_bucketed_100s": 8, "region_registry_country": "uk" } diff --git a/tests/fixtures/household_calculator_snapshots/uk_single_adult_employment_income.json b/tests/fixtures/household_calculator_snapshots/uk_single_adult_employment_income.json index 27fa1f63..1d37b7aa 100644 --- a/tests/fixtures/household_calculator_snapshots/uk_single_adult_employment_income.json +++ b/tests/fixtures/household_calculator_snapshots/uk_single_adult_employment_income.json @@ -19,8 +19,8 @@ "household.household_id": 0.0, "household.household_income_decile": 10.0, "household.household_market_income": 30000.0, - "household.household_net_income": 24960.55, - "household.household_tax": 5039.45, + "household.household_net_income": 24939.55, + "household.household_tax": 5060.45, "household.household_wealth_decile": 10.0, "household.household_weight": 1.0, "household.in_poverty_ahc": 0.0, diff --git a/tests/fixtures/household_calculator_snapshots/uk_single_adult_no_income.json 
b/tests/fixtures/household_calculator_snapshots/uk_single_adult_no_income.json index 2e7a2db9..391af449 100644 --- a/tests/fixtures/household_calculator_snapshots/uk_single_adult_no_income.json +++ b/tests/fixtures/household_calculator_snapshots/uk_single_adult_no_income.json @@ -7,20 +7,20 @@ "benunit.income_support": 0.0, "benunit.pension_credit": 0.0, "benunit.tax_credits": 0.0, - "benunit.universal_credit": 5079.13, + "benunit.universal_credit": 5098.8, "benunit.working_tax_credit": 0.0, "household.council_tax": 0.0, - "household.equiv_hbai_household_net_income": 7580.79, + "household.equiv_hbai_household_net_income": 7610.15, "household.fuel_duty": 0.0, - "household.hbai_household_net_income": 5079.13, - "household.household_benefits": 5079.13, + "household.hbai_household_net_income": 5098.8, + "household.household_benefits": 5098.8, "household.household_count_people": 1.0, - "household.household_gross_income": 5079.13, + "household.household_gross_income": 5098.8, "household.household_id": 0.0, "household.household_income_decile": 10.0, "household.household_market_income": 0.0, - "household.household_net_income": 4920.09, - "household.household_tax": 159.04, + "household.household_net_income": 4918.75, + "household.household_tax": 180.04, "household.household_wealth_decile": 10.0, "household.household_weight": 1.0, "household.in_poverty_ahc": 1.0, @@ -57,6 +57,6 @@ "person[0].self_employment_income": 0.0, "person[0].state_pension": 0.0, "person[0].total_income": 0.0, - "person[0].universal_credit": 5079.13, + "person[0].universal_credit": 5098.8, "person[0].working_tax_credit": 0.0 } diff --git a/tests/fixtures/household_calculator_snapshots/uk_single_parent_one_child.json b/tests/fixtures/household_calculator_snapshots/uk_single_parent_one_child.json index 4b5589c0..acec3cc8 100644 --- a/tests/fixtures/household_calculator_snapshots/uk_single_parent_one_child.json +++ b/tests/fixtures/household_calculator_snapshots/uk_single_parent_one_child.json @@ -1,26 
+1,26 @@ { "benunit.benunit_id": 0.0, "benunit.benunit_weight": 1.0, - "benunit.child_benefit": 1400.66, + "benunit.child_benefit": 1406.6, "benunit.child_tax_credit": 0.0, "benunit.family_type": "LONE_PARENT", "benunit.income_support": 0.0, "benunit.pension_credit": 0.0, "benunit.tax_credits": 0.0, - "benunit.universal_credit": 1544.43, + "benunit.universal_credit": 1596.3, "benunit.working_tax_credit": 0.0, "household.council_tax": 0.0, - "household.equiv_hbai_household_net_income": 28120.33, + "household.equiv_hbai_household_net_income": 28186.78, "household.fuel_duty": 0.0, - "household.hbai_household_net_income": 24464.69, - "household.household_benefits": 2945.09, + "household.hbai_household_net_income": 24522.5, + "household.household_benefits": 3002.9, "household.household_count_people": 2.0, - "household.household_gross_income": 27945.09, + "household.household_gross_income": 28002.9, "household.household_id": 0.0, "household.household_income_decile": 10.0, "household.household_market_income": 25000.0, - "household.household_net_income": 24305.64, - "household.household_tax": 3639.45, + "household.household_net_income": 24342.45, + "household.household_tax": 3660.45, "household.household_wealth_decile": 10.0, "household.household_weight": 1.0, "household.in_poverty_ahc": 0.0, @@ -32,7 +32,7 @@ "household.vat": 0.0, "person[0].age": 32.0, "person[0].benunit_id": 0.0, - "person[0].child_benefit": 1400.66, + "person[0].child_benefit": 1406.6, "person[0].child_tax_credit": 0.0, "person[0].dividend_income": 0.0, "person[0].earned_income": 25000.0, @@ -57,11 +57,11 @@ "person[0].self_employment_income": 0.0, "person[0].state_pension": 0.0, "person[0].total_income": 25000.0, - "person[0].universal_credit": 1544.43, + "person[0].universal_credit": 1596.3, "person[0].working_tax_credit": 0.0, "person[1].age": 5.0, "person[1].benunit_id": 0.0, - "person[1].child_benefit": 1400.66, + "person[1].child_benefit": 1406.6, "person[1].child_tax_credit": 0.0, 
"person[1].dividend_income": 0.0, "person[1].earned_income": 0.0, @@ -86,6 +86,6 @@ "person[1].self_employment_income": 0.0, "person[1].state_pension": 0.0, "person[1].total_income": 0.0, - "person[1].universal_credit": 1544.43, + "person[1].universal_credit": 1596.3, "person[1].working_tax_credit": 0.0 } diff --git a/tests/test_constituency_impact.py b/tests/test_constituency_impact.py index 7450d130..4b7e3d1d 100644 --- a/tests/test_constituency_impact.py +++ b/tests/test_constituency_impact.py @@ -1,11 +1,7 @@ """Unit tests for ConstituencyImpact output class.""" -import os -import tempfile from unittest.mock import MagicMock, patch -import h5py -import numpy as np import pandas as pd import pytest from microdf import MicroDataFrame @@ -13,9 +9,7 @@ from policyengine.outputs.constituency_impact import ( compute_uk_constituency_impacts, ) -from policyengine.outputs.uk_geography_assets import ( - CONSTITUENCY_ASSET_SPEC, -) +from policyengine.outputs.uk_geography_assets import CONSTITUENCY_ASSET_SPEC def _make_sim(household_data: dict) -> MagicMock: @@ -30,49 +24,42 @@ def _make_sim(household_data: dict) -> MagicMock: return sim -def _make_weight_matrix_and_csv( - tmpdir, n_constituencies, n_households, weights, csv_rows -): - """Create a temp H5 weight matrix and CSV metadata file.""" - h5_path = os.path.join(tmpdir, "weights.h5") - with h5py.File(h5_path, "w") as f: - f.create_dataset("2025", data=np.array(weights, dtype=np.float64)) - - csv_path = os.path.join(tmpdir, "constituencies.csv") - pd.DataFrame(csv_rows).to_csv(csv_path, index=False) - - return h5_path, csv_path +def _write_lookup_csv(tmp_path, rows) -> str: + csv_path = tmp_path / "constituencies.csv" + pd.DataFrame(rows).to_csv(csv_path, index=False) + return str(csv_path) -def test_basic_constituency_reweighting(): - """Two constituencies with known weight matrices produce correct metrics.""" - n_hh = 3 +def test_basic_constituency_longwise_grouping(tmp_path): + """Two constituencies with household 
geography codes produce metrics.""" baseline = _make_sim( { - "household_net_income": [50000.0, 60000.0, 40000.0], - "household_weight": [1.0, 1.0, 1.0], + "constituency_code_oa": ["C001", "C001", b"C002", ""], + "household_net_income": [50000.0, 60000.0, 40000.0, 30000.0], + "household_weight": [2.0, 1.0, 3.0, 1.0], } ) reform = _make_sim( { - "household_net_income": [52000.0, 62000.0, 42000.0], - "household_weight": [1.0, 1.0, 1.0], + "constituency_code_oa": ["C001", "C001", b"C002", ""], + "household_net_income": [52000.0, 62000.0, 43000.0, 33000.0], + "household_weight": [2.0, 1.0, 3.0, 1.0], } ) + csv_path = _write_lookup_csv( + tmp_path, + [ + {"code": "C001", "name": "Constituency A", "x": 10, "y": 20}, + {"code": "C002", "name": "Constituency B", "x": 30, "y": 40}, + ], + ) - # Constituency 0 weights: [2, 0, 1] → weighted baseline = 2*50k + 0 + 1*40k = 140k - # Constituency 1 weights: [0, 3, 0] → weighted baseline = 0 + 3*60k + 0 = 180k - weight_matrix = [[2.0, 0.0, 1.0], [0.0, 3.0, 0.0]] - csv_rows = [ - {"code": "C001", "name": "Constituency A", "x": 10, "y": 20}, - {"code": "C002", "name": "Constituency B", "x": 30, "y": 40}, - ] - - with tempfile.TemporaryDirectory() as tmpdir: - h5_path, csv_path = _make_weight_matrix_and_csv( - tmpdir, 2, n_hh, weight_matrix, csv_rows - ) - impact = compute_uk_constituency_impacts(baseline, reform, h5_path, csv_path) + impact = compute_uk_constituency_impacts( + baseline, + reform, + constituency_csv_path=csv_path, + download_missing_assets=False, + ) assert impact.constituency_results is not None assert len(impact.constituency_results) == 2 @@ -80,7 +67,6 @@ def test_basic_constituency_reweighting(): by_code = {r["constituency_code"]: r for r in impact.constituency_results} c1 = by_code["C001"] - # Weighted change: (2*2000 + 0 + 1*2000) / 3 = 2000 assert abs(c1["average_household_income_change"] - 2000.0) < 1e-6 assert c1["constituency_name"] == "Constituency A" assert c1["x"] == 10 @@ -88,148 +74,177 @@ def 
test_basic_constituency_reweighting(): assert c1["population"] == 3.0 c2 = by_code["C002"] - # Weighted change: (0 + 3*2000 + 0) / 3 = 2000 - assert abs(c2["average_household_income_change"] - 2000.0) < 1e-6 + assert abs(c2["average_household_income_change"] - 3000.0) < 1e-6 + assert c2["constituency_name"] == "Constituency B" + assert c2["population"] == 3.0 -def test_zero_weight_constituency_skipped(): - """A constituency with all-zero weights produces no result.""" +def test_zero_weight_constituency_skipped(tmp_path): + """A constituency with all-zero household weights produces no result.""" baseline = _make_sim( { + "constituency_code_oa": ["C001", "C002"], "household_net_income": [50000.0, 60000.0], - "household_weight": [1.0, 1.0], + "household_weight": [1.0, 0.0], } ) reform = _make_sim( { + "constituency_code_oa": ["C001", "C002"], "household_net_income": [55000.0, 65000.0], - "household_weight": [1.0, 1.0], + "household_weight": [1.0, 0.0], } ) + csv_path = _write_lookup_csv( + tmp_path, + [ + {"code": "C001", "name": "A", "x": 0, "y": 0}, + {"code": "C002", "name": "B", "x": 0, "y": 0}, + ], + ) - weight_matrix = [[1.0, 1.0], [0.0, 0.0]] - csv_rows = [ - {"code": "C001", "name": "A", "x": 0, "y": 0}, - {"code": "C002", "name": "B", "x": 0, "y": 0}, - ] - - with tempfile.TemporaryDirectory() as tmpdir: - h5_path, csv_path = _make_weight_matrix_and_csv( - tmpdir, 2, 2, weight_matrix, csv_rows - ) - impact = compute_uk_constituency_impacts(baseline, reform, h5_path, csv_path) + impact = compute_uk_constituency_impacts( + baseline, + reform, + constituency_csv_path=csv_path, + download_missing_assets=False, + ) assert len(impact.constituency_results) == 1 assert impact.constituency_results[0]["constituency_code"] == "C001" -def test_relative_change(): +def test_relative_change(tmp_path): """Relative household income change is computed correctly.""" baseline = _make_sim( { + "constituency_code_oa": ["C001"], "household_net_income": [100000.0], 
"household_weight": [1.0], } ) reform = _make_sim( { + "constituency_code_oa": ["C001"], "household_net_income": [110000.0], "household_weight": [1.0], } ) + csv_path = _write_lookup_csv( + tmp_path, + [{"code": "C001", "name": "A", "x": 0, "y": 0}], + ) - weight_matrix = [[1.0]] - csv_rows = [{"code": "C001", "name": "A", "x": 0, "y": 0}] - - with tempfile.TemporaryDirectory() as tmpdir: - h5_path, csv_path = _make_weight_matrix_and_csv( - tmpdir, 1, 1, weight_matrix, csv_rows - ) - impact = compute_uk_constituency_impacts(baseline, reform, h5_path, csv_path) + impact = compute_uk_constituency_impacts( + baseline, + reform, + constituency_csv_path=csv_path, + download_missing_assets=False, + ) - # 10% increase assert ( abs(impact.constituency_results[0]["relative_household_income_change"] - 0.1) < 1e-6 ) -def test_compute_resolves_standard_constituency_assets_from_default_local_dir( - monkeypatch, +def test_compute_uses_local_lookup_csv_without_matrix_or_gcs( + tmp_path, ): - """The helper can run without explicit asset paths when standard files exist.""" + """The helper can enrich labels from local CSV without matrix assets.""" baseline = _make_sim( { + "constituency_code_oa": ["C001", "C002"], "household_net_income": [100.0, 200.0], "household_weight": [1.0, 1.0], } ) reform = _make_sim( { + "constituency_code_oa": ["C001", "C002"], "household_net_income": [110.0, 220.0], "household_weight": [1.0, 1.0], } ) + csv_path = tmp_path / CONSTITUENCY_ASSET_SPEC.lookup_csv_filename + pd.DataFrame( + [ + {"code": "C001", "name": "A", "x": 0, "y": 0}, + {"code": "C002", "name": "B", "x": 1, "y": 1}, + ] + ).to_csv(csv_path, index=False) + + with patch( + "policyengine.outputs.uk_geography_impact.default_local_search_dirs", + return_value=[tmp_path], + ): + impact = compute_uk_constituency_impacts( + baseline, + reform, + download_missing_assets=True, + ) - weight_matrix = [[1.0, 0.0], [0.0, 1.0]] - csv_rows = [ - {"code": "C001", "name": "A", "x": 0, "y": 0}, - {"code": 
"C002", "name": "B", "x": 1, "y": 1}, - ] - - with tempfile.TemporaryDirectory() as tmpdir: - h5_path = os.path.join(tmpdir, CONSTITUENCY_ASSET_SPEC.weight_matrix_filename) - with h5py.File(h5_path, "w") as f: - f.create_dataset("2025", data=np.array(weight_matrix, dtype=np.float64)) - - csv_path = os.path.join(tmpdir, CONSTITUENCY_ASSET_SPEC.lookup_csv_filename) - pd.DataFrame(csv_rows).to_csv(csv_path, index=False) - - monkeypatch.setenv("POLICYENGINE_UK_GEOGRAPHY_DATA_DIR", tmpdir) - with patch( - "policyengine_core.tools.google_cloud.download_gcs_file" - ) as download: - impact = compute_uk_constituency_impacts( - baseline, - reform, - ) - - download.assert_not_called() - - assert impact.weight_matrix_path == h5_path - assert impact.constituency_csv_path == csv_path + assert impact.constituency_csv_path == str(csv_path) assert len(impact.constituency_results) == 2 + assert impact.constituency_results[0]["constituency_name"] == "A" -def test_compute_constituency_impacts_local_only_does_not_call_gcs(tmp_path): +def test_compute_constituency_impacts_does_not_require_lookup_csv_or_matrix( + tmp_path, +): baseline = _make_sim( { + "constituency_code_oa": ["C001"], "household_net_income": [100.0], "household_weight": [1.0], } ) reform = _make_sim( { + "constituency_code_oa": ["C001"], "household_net_income": [110.0], "household_weight": [1.0], } ) - with ( - patch( - "policyengine.data.uk_geography_assets.default_local_search_dirs", - return_value=[tmp_path / "missing"], - ), - patch("policyengine_core.tools.google_cloud.download_gcs_file") as download, + legacy_matrix_path = str(tmp_path / "legacy-unused.h5") + with patch( + "policyengine.outputs.uk_geography_impact.default_local_search_dirs", + return_value=[tmp_path / "missing"], ): - with pytest.raises(FileNotFoundError) as exc_info: - compute_uk_constituency_impacts( - baseline, - reform, - download_missing_assets=False, - ) + impact = compute_uk_constituency_impacts( + baseline, + reform, + 
weight_matrix_path=legacy_matrix_path, + download_missing_assets=False, + ) - download.assert_not_called() - assert "GCS fallback disabled by download_missing_assets=False" in str( - exc_info.value + assert impact.weight_matrix_path == legacy_matrix_path + assert len(impact.constituency_results) == 1 + result = impact.constituency_results[0] + assert result["constituency_code"] == "C001" + assert result["constituency_name"] == "C001" + assert result["x"] is None + assert result["y"] is None + + +def test_compute_constituency_impacts_requires_longwise_geography_column(): + baseline = _make_sim( + { + "household_net_income": [100.0], + "household_weight": [1.0], + } + ) + reform = _make_sim( + { + "household_net_income": [110.0], + "household_weight": [1.0], + } ) + + with pytest.raises(ValueError, match="constituency_code_oa"): + compute_uk_constituency_impacts( + baseline, + reform, + download_missing_assets=False, + ) diff --git a/tests/test_extra_variables.py b/tests/test_extra_variables.py index b57ce6fd..1631566d 100644 --- a/tests/test_extra_variables.py +++ b/tests/test_extra_variables.py @@ -148,6 +148,12 @@ def _uk_fixture_dataset(tmp_path): "household_id": [1], "household_weight": [1_000.0], "region": ["LONDON"], + "oa_code": ["E00000001"], + "lsoa_code": ["E01000001"], + "msoa_code": ["E02000001"], + "constituency_code_oa": ["E14001234"], + "la_code_oa": ["E09000001"], + "region_code_oa": ["E12000007"], "tenure_type": ["RENT_PRIVATELY"], "rent": [12_000.0], "council_tax": [1_500.0], @@ -191,6 +197,13 @@ def test__uk_extra_variables_appear_on_output_dataset(tmp_path) -> None: "likely a regression in uk/model.py:run() bypassing " "resolve_entity_variables" ) + household = sim.output_dataset.data.household + assert household["oa_code"].tolist() == ["E00000001"] + assert household["lsoa_code"].tolist() == ["E01000001"] + assert household["msoa_code"].tolist() == ["E02000001"] + assert household["constituency_code_oa"].tolist() == ["E14001234"] + assert 
household["la_code_oa"].tolist() == ["E09000001"] + assert household["region_code_oa"].tolist() == ["E12000007"] def test__uk_resolve_entity_variables_raises_on_unknown_variable() -> None: diff --git a/tests/test_local_authority_impact.py b/tests/test_local_authority_impact.py index f59c2bb2..8e54a50a 100644 --- a/tests/test_local_authority_impact.py +++ b/tests/test_local_authority_impact.py @@ -1,11 +1,7 @@ """Unit tests for LocalAuthorityImpact output class.""" -import os -import tempfile from unittest.mock import MagicMock, patch -import h5py -import numpy as np import pandas as pd import pytest from microdf import MicroDataFrame @@ -13,9 +9,7 @@ from policyengine.outputs.local_authority_impact import ( compute_uk_local_authority_impacts, ) -from policyengine.outputs.uk_geography_assets import ( - LOCAL_AUTHORITY_ASSET_SPEC, -) +from policyengine.outputs.uk_geography_assets import LOCAL_AUTHORITY_ASSET_SPEC def _make_sim(household_data: dict) -> MagicMock: @@ -30,42 +24,42 @@ def _make_sim(household_data: dict) -> MagicMock: return sim -def _make_weight_matrix_and_csv(tmpdir, weights, csv_rows): - """Create a temp H5 weight matrix and CSV metadata file.""" - h5_path = os.path.join(tmpdir, "la_weights.h5") - with h5py.File(h5_path, "w") as f: - f.create_dataset("2025", data=np.array(weights, dtype=np.float64)) - - csv_path = os.path.join(tmpdir, "local_authorities.csv") - pd.DataFrame(csv_rows).to_csv(csv_path, index=False) +def _write_lookup_csv(tmp_path, rows) -> str: + csv_path = tmp_path / "local_authorities.csv" + pd.DataFrame(rows).to_csv(csv_path, index=False) + return str(csv_path) - return h5_path, csv_path - -def test_basic_local_authority_reweighting(): - """Two LAs with known weight matrices produce correct metrics.""" +def test_basic_local_authority_longwise_grouping(tmp_path): + """Two local authorities with household geography codes produce metrics.""" baseline = _make_sim( { - "household_net_income": [50000.0, 60000.0, 40000.0], - 
"household_weight": [1.0, 1.0, 1.0], + "la_code_oa": ["LA001", "LA001", b"LA002", ""], + "household_net_income": [50000.0, 60000.0, 40000.0, 30000.0], + "household_weight": [2.0, 1.0, 3.0, 1.0], } ) reform = _make_sim( { - "household_net_income": [53000.0, 63000.0, 43000.0], - "household_weight": [1.0, 1.0, 1.0], + "la_code_oa": ["LA001", "LA001", b"LA002", ""], + "household_net_income": [53000.0, 63000.0, 43000.0, 33000.0], + "household_weight": [2.0, 1.0, 3.0, 1.0], } ) + csv_path = _write_lookup_csv( + tmp_path, + [ + {"code": "LA001", "name": "Authority A", "x": 5, "y": 15}, + {"code": "LA002", "name": "Authority B", "x": 25, "y": 35}, + ], + ) - weight_matrix = [[1.0, 1.0, 0.0], [0.0, 1.0, 2.0]] - csv_rows = [ - {"code": "LA001", "name": "Authority A", "x": 5, "y": 15}, - {"code": "LA002", "name": "Authority B", "x": 25, "y": 35}, - ] - - with tempfile.TemporaryDirectory() as tmpdir: - h5_path, csv_path = _make_weight_matrix_and_csv(tmpdir, weight_matrix, csv_rows) - impact = compute_uk_local_authority_impacts(baseline, reform, h5_path, csv_path) + impact = compute_uk_local_authority_impacts( + baseline, + reform, + local_authority_csv_path=csv_path, + download_missing_assets=False, + ) assert impact.local_authority_results is not None assert len(impact.local_authority_results) == 2 @@ -73,124 +67,186 @@ def test_basic_local_authority_reweighting(): by_code = {r["local_authority_code"]: r for r in impact.local_authority_results} la1 = by_code["LA001"] - # Weighted change: (1*3000 + 1*3000) / 2 = 3000 assert abs(la1["average_household_income_change"] - 3000.0) < 1e-6 assert la1["local_authority_name"] == "Authority A" - assert la1["population"] == 2.0 + assert la1["x"] == 5 + assert la1["y"] == 15 + assert la1["population"] == 3.0 la2 = by_code["LA002"] - # Weighted change: (0 + 1*3000 + 2*3000) / 3 = 3000 assert abs(la2["average_household_income_change"] - 3000.0) < 1e-6 + assert la2["local_authority_name"] == "Authority B" + assert la2["population"] == 3.0 
-def test_zero_weight_la_skipped():
-    """A local authority with all-zero weights produces no result."""
+def test_zero_weight_la_skipped(tmp_path):
+    """A local authority with all-zero household weights produces no result."""
     baseline = _make_sim(
         {
-            "household_net_income": [50000.0],
-            "household_weight": [1.0],
+            "la_code_oa": ["LA001", "LA002"],
+            "household_net_income": [50000.0, 60000.0],
+            "household_weight": [1.0, 0.0],
         }
     )
     reform = _make_sim(
         {
-            "household_net_income": [55000.0],
-            "household_weight": [1.0],
+            "la_code_oa": ["LA001", "LA002"],
+            "household_net_income": [55000.0, 65000.0],
+            "household_weight": [1.0, 0.0],
         }
     )
+    csv_path = _write_lookup_csv(
+        tmp_path,
+        [
+            {"code": "LA001", "name": "A", "x": 0, "y": 0},
+            {"code": "LA002", "name": "B", "x": 0, "y": 0},
+        ],
+    )
 
-    weight_matrix = [[1.0], [0.0]]
-    csv_rows = [
-        {"code": "LA001", "name": "A", "x": 0, "y": 0},
-        {"code": "LA002", "name": "B", "x": 0, "y": 0},
-    ]
-
-    with tempfile.TemporaryDirectory() as tmpdir:
-        h5_path, csv_path = _make_weight_matrix_and_csv(tmpdir, weight_matrix, csv_rows)
-        impact = compute_uk_local_authority_impacts(baseline, reform, h5_path, csv_path)
+    impact = compute_uk_local_authority_impacts(
+        baseline,
+        reform,
+        local_authority_csv_path=csv_path,
+        download_missing_assets=False,
+    )
 
     assert len(impact.local_authority_results) == 1
     assert impact.local_authority_results[0]["local_authority_code"] == "LA001"
 
 
-def test_compute_resolves_standard_local_authority_assets_from_default_local_dir(
-    monkeypatch,
+def test_relative_change(tmp_path):
+    """Relative household income change is computed correctly."""
+    baseline = _make_sim(
+        {
+            "la_code_oa": ["LA001"],
+            "household_net_income": [100000.0],
+            "household_weight": [1.0],
+        }
+    )
+    reform = _make_sim(
+        {
+            "la_code_oa": ["LA001"],
+            "household_net_income": [115000.0],
+            "household_weight": [1.0],
+        }
+    )
+    csv_path = _write_lookup_csv(
+        tmp_path,
+        [{"code": "LA001", "name": "A", "x": 0, "y": 0}],
+    )
+
+    impact = compute_uk_local_authority_impacts(
+        baseline,
+        reform,
+        local_authority_csv_path=csv_path,
+        download_missing_assets=False,
+    )
+
+    assert (
+        abs(
+            impact.local_authority_results[0]["relative_household_income_change"] - 0.15
+        )
+        < 1e-6
+    )
+
+
+def test_compute_uses_local_lookup_csv_without_matrix_or_gcs(
+    tmp_path,
 ):
-    """The helper can run without explicit asset paths when standard files exist."""
+    """The helper can enrich labels from local CSV without matrix assets."""
     baseline = _make_sim(
         {
+            "la_code_oa": ["LA001", "LA002"],
             "household_net_income": [100.0, 200.0],
             "household_weight": [1.0, 1.0],
         }
     )
     reform = _make_sim(
         {
+            "la_code_oa": ["LA001", "LA002"],
             "household_net_income": [115.0, 230.0],
             "household_weight": [1.0, 1.0],
         }
     )
-
-    weight_matrix = [[1.0, 0.0], [0.0, 1.0]]
-    csv_rows = [
-        {"code": "LA001", "name": "A", "x": 0, "y": 0},
-        {"code": "LA002", "name": "B", "x": 1, "y": 1},
-    ]
-
-    with tempfile.TemporaryDirectory() as tmpdir:
-        h5_path = os.path.join(
-            tmpdir,
-            LOCAL_AUTHORITY_ASSET_SPEC.weight_matrix_filename,
+    csv_path = tmp_path / LOCAL_AUTHORITY_ASSET_SPEC.lookup_csv_filename
+    pd.DataFrame(
+        [
+            {"code": "LA001", "name": "A", "x": 0, "y": 0},
+            {"code": "LA002", "name": "B", "x": 1, "y": 1},
+        ]
+    ).to_csv(csv_path, index=False)
+
+    with patch(
+        "policyengine.outputs.uk_geography_impact.default_local_search_dirs",
+        return_value=[tmp_path],
+    ):
+        impact = compute_uk_local_authority_impacts(
+            baseline,
+            reform,
+            download_missing_assets=True,
         )
-        with h5py.File(h5_path, "w") as f:
-            f.create_dataset("2025", data=np.array(weight_matrix, dtype=np.float64))
-
-        csv_path = os.path.join(tmpdir, LOCAL_AUTHORITY_ASSET_SPEC.lookup_csv_filename)
-        pd.DataFrame(csv_rows).to_csv(csv_path, index=False)
-
-        monkeypatch.setenv("POLICYENGINE_UK_GEOGRAPHY_DATA_DIR", tmpdir)
-        with patch(
-            "policyengine_core.tools.google_cloud.download_gcs_file"
-        ) as download:
-            impact = compute_uk_local_authority_impacts(
-                baseline,
-                reform,
-            )
-        download.assert_not_called()
-
-    assert impact.weight_matrix_path == h5_path
-    assert impact.local_authority_csv_path == csv_path
+    assert impact.local_authority_csv_path == str(csv_path)
     assert len(impact.local_authority_results) == 2
+    assert impact.local_authority_results[0]["local_authority_name"] == "A"
 
 
-def test_compute_local_authority_impacts_local_only_does_not_call_gcs(tmp_path):
+def test_compute_local_authority_impacts_does_not_require_lookup_csv_or_matrix(
+    tmp_path,
+):
     baseline = _make_sim(
         {
+            "la_code_oa": ["LA001"],
             "household_net_income": [100.0],
             "household_weight": [1.0],
         }
     )
     reform = _make_sim(
         {
+            "la_code_oa": ["LA001"],
             "household_net_income": [115.0],
             "household_weight": [1.0],
         }
     )
 
-    with (
-        patch(
-            "policyengine.data.uk_geography_assets.default_local_search_dirs",
-            return_value=[tmp_path / "missing"],
-        ),
-        patch("policyengine_core.tools.google_cloud.download_gcs_file") as download,
+    legacy_matrix_path = str(tmp_path / "legacy-unused.h5")
+    with patch(
+        "policyengine.outputs.uk_geography_impact.default_local_search_dirs",
+        return_value=[tmp_path / "missing"],
     ):
-        with pytest.raises(FileNotFoundError) as exc_info:
-            compute_uk_local_authority_impacts(
-                baseline,
-                reform,
-                download_missing_assets=False,
-            )
+        impact = compute_uk_local_authority_impacts(
+            baseline,
+            reform,
+            weight_matrix_path=legacy_matrix_path,
+            download_missing_assets=False,
+        )
+
+    assert impact.weight_matrix_path == legacy_matrix_path
+    assert len(impact.local_authority_results) == 1
+    result = impact.local_authority_results[0]
+    assert result["local_authority_code"] == "LA001"
+    assert result["local_authority_name"] == "LA001"
+    assert result["x"] is None
+    assert result["y"] is None
+
+
-    download.assert_not_called()
-    assert "GCS fallback disabled by download_missing_assets=False" in str(
-        exc_info.value
+def test_compute_local_authority_impacts_requires_longwise_geography_column():
+    baseline = _make_sim(
+        {
+            "household_net_income": [100.0],
+            "household_weight": [1.0],
+        }
     )
+    reform = _make_sim(
+        {
+            "household_net_income": [115.0],
+            "household_weight": [1.0],
+        }
+    )
+
+    with pytest.raises(ValueError, match="la_code_oa"):
+        compute_uk_local_authority_impacts(
+            baseline,
+            reform,
+            download_missing_assets=False,
+        )
diff --git a/tests/test_models.py b/tests/test_models.py
index 68b84787..f6a3c10d 100644
--- a/tests/test_models.py
+++ b/tests/test_models.py
@@ -29,12 +29,13 @@ def test_has_release_manifest_metadata(self):
         assert uk_latest.release_manifest is not None
         assert uk_latest.release_manifest.country_id == "uk"
         assert uk_latest.model_package.name == "policyengine-uk"
-        assert uk_latest.model_package.version == "2.88.20"
-        assert uk_latest.data_package.name == "policyengine-uk-data"
-        assert uk_latest.data_package.version == "1.55.10"
+        assert uk_latest.model_package.version == "2.89.2"
+        assert uk_latest.data_package.name == "populace-data"
+        assert uk_latest.data_package.version == "0.1.0"
         assert (
             uk_latest.default_dataset_uri
-            == "hf://policyengine/policyengine-uk-data-private/enhanced_frs_2023_24.h5@655dd07e4bb9c777b00dac044949611f1feb824f"
+            == "hf://policyengine/populace-uk-private/populace_uk_2023.h5"
+            "@populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z"
         )
 
     def test_has_hundreds_of_parameters(self):
diff --git a/tests/test_release_manifests.py b/tests/test_release_manifests.py
index 2a0742c8..af9d9571 100644
--- a/tests/test_release_manifests.py
+++ b/tests/test_release_manifests.py
@@ -58,6 +58,26 @@
 US_RELEASE_MANIFEST_DATASET_URI = (
     f"hf://policyengine/populace-us/populace_us_2024.h5@{US_DATA_RELEASE_REVISION}"
 )
+UK_MODEL_VERSION = "2.89.2"
+UK_BUILT_WITH_MODEL_VERSION = "2.89.2"
+UK_DATA_RELEASE_VERSION = "0.1.0"
+UK_DATA_RELEASE_ID = "populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z"
+UK_DATA_RELEASE_REVISION = UK_DATA_RELEASE_ID
+UK_DATA_RELEASE_PATH = f"releases/{UK_DATA_RELEASE_ID}/release_manifest.json"
+UK_CERTIFICATION_SOURCE = "policyengine.py certification"
+UK_CERTIFIED_DATASET_URI = (
+    f"hf://policyengine/populace-uk-private/populace_uk_2023.h5"
+    f"@{UK_DATA_RELEASE_REVISION}"
+)
+UK_LEGACY_DATA_RELEASE_REVISION = "655dd07e4bb9c777b00dac044949611f1feb824f"
+UK_LEGACY_FRS_DATASET_URI = (
+    "hf://policyengine/policyengine-uk-data-private/frs_2023_24.h5"
+    f"@{UK_LEGACY_DATA_RELEASE_REVISION}"
+)
+UK_LEGACY_ENHANCED_FRS_DATASET_URI = (
+    "hf://policyengine/policyengine-uk-data-private/enhanced_frs_2023_24.h5"
+    f"@{UK_LEGACY_DATA_RELEASE_REVISION}"
+)
 
 
 def _response_with_json(payload: dict) -> MagicMock:
@@ -124,25 +144,27 @@ def test__given_uk_manifest__then_has_pinned_model_and_data_packages(self):
         assert manifest.country_id == "uk"
         assert manifest.policyengine_version == POLICYENGINE_VERSION
         assert manifest.model_package.name == "policyengine-uk"
-        assert manifest.model_package.version == "2.88.20"
-        assert manifest.data_package.name == "policyengine-uk-data"
-        assert manifest.data_package.version == "1.55.10"
+        assert manifest.model_package.version == UK_MODEL_VERSION
+        assert manifest.data_package.name == "populace-data"
+        assert manifest.data_package.version == UK_DATA_RELEASE_VERSION
+        assert manifest.data_package.repo_id == "policyengine/populace-uk-private"
+        assert manifest.data_package.release_manifest_path == UK_DATA_RELEASE_PATH
         assert (
-            manifest.data_package.repo_id == "policyengine/policyengine-uk-data-private"
+            manifest.data_package.release_manifest_revision == UK_DATA_RELEASE_REVISION
         )
         assert manifest.certified_data_artifact is not None
-        assert (
-            manifest.certified_data_artifact.build_id == "policyengine-uk-data-1.55.10"
-        )
-        assert manifest.certified_data_artifact.dataset == "enhanced_frs_2023_24"
+        assert manifest.certified_data_artifact.build_id == UK_DATA_RELEASE_ID
+        assert manifest.certified_data_artifact.dataset == "populace_uk_2023"
+        assert manifest.certified_data_artifact.uri == UK_CERTIFIED_DATASET_URI
         assert manifest.certification is not None
-        assert manifest.certification.data_build_id == "policyengine-uk-data-1.55.10"
-        assert manifest.certification.built_with_model_version == "2.88.20"
-        assert manifest.certification.certified_for_model_version == "2.88.20"
+        assert manifest.certification.data_build_id == UK_DATA_RELEASE_ID
+        assert manifest.certification.compatibility_basis == "built_with_model_package"
+        assert manifest.certification.certified_by == UK_CERTIFICATION_SOURCE
         assert (
-            manifest.certification.data_build_fingerprint
-            == "sha256:77f149725a36055fd89961855230401852b0712d301c6e26d6d16565c6b23809"
+            manifest.certification.built_with_model_version
+            == UK_BUILT_WITH_MODEL_VERSION
         )
+        assert manifest.certification.certified_for_model_version == UK_MODEL_VERSION
 
     def test__given_us_dataset_name__then_resolves_to_versioned_hf_url(self):
         resolved = resolve_dataset_reference("us", "populace_us_2024")
@@ -169,11 +191,17 @@ def test__given_dataset_explicit_revision__then_resolves_to_that_revision(self):
         )
 
     def test__given_uk_dataset_name__then_resolves_to_versioned_hf_url(self):
-        resolved = resolve_dataset_reference("uk", "enhanced_frs_2023_24")
+        resolved = resolve_dataset_reference("uk", "populace_uk_2023")
+
+        assert resolved == UK_CERTIFIED_DATASET_URI
 
+    def test__given_uk_legacy_dataset_names__then_resolves_bundled_aliases(self):
+        assert (
+            resolve_dataset_reference("uk", "frs_2023_24") == UK_LEGACY_FRS_DATASET_URI
+        )
         assert (
-            resolved
-            == "hf://policyengine/policyengine-uk-data-private/enhanced_frs_2023_24.h5@655dd07e4bb9c777b00dac044949611f1feb824f"
+            resolve_dataset_reference("uk", "enhanced_frs_2023_24")
+            == UK_LEGACY_ENHANCED_FRS_DATASET_URI
         )
 
     def test__given_explicit_url__then_resolution_is_noop(self):
@@ -655,12 +683,12 @@ def test__given_manifest_certification__then_release_bundle_exposes_it(self):
         bundle = model_version.release_bundle
 
         assert bundle["bundle_id"] == f"uk-{POLICYENGINE_VERSION}"
-        assert bundle["default_dataset"] == "enhanced_frs_2023_24"
+        assert bundle["default_dataset"] == "populace_uk_2023"
         assert bundle["default_dataset_uri"] == manifest.default_dataset_uri
-        assert bundle["certified_data_build_id"] == "policyengine-uk-data-1.55.10"
-        assert bundle["data_build_model_version"] == "2.88.20"
-        assert bundle["compatibility_basis"] == "exact_build_model_version"
-        assert bundle["certified_by"] == "policyengine.py bundled manifest"
+        assert bundle["certified_data_build_id"] == UK_DATA_RELEASE_ID
+        assert bundle["data_build_model_version"] == UK_BUILT_WITH_MODEL_VERSION
+        assert bundle["compatibility_basis"] == "built_with_model_package"
+        assert bundle["certified_by"] == UK_CERTIFICATION_SOURCE
 
     def test__given_runtime_certification__then_release_bundle_prefers_runtime_value(
         self,
@@ -771,22 +799,22 @@ def test__given_uk_managed_dataset_name__then_resolves_within_bundle(self):
             ),
             patch(
                 "policyengine.tax_benefit_models.uk.model.materialize_dataset_source",
-                return_value="/tmp/enhanced_frs_2023_24.h5",
+                return_value="/tmp/populace_uk_2023.h5",
             ),
         ):
-            microsim = managed_uk_microsimulation(dataset="enhanced_frs_2023_24")
+            microsim = managed_uk_microsimulation(dataset="populace_uk_2023")
 
         dataset = mock_microsimulation.call_args.kwargs["dataset"]
-        assert dataset == "/tmp/enhanced_frs_2023_24.h5"
+        assert dataset == "/tmp/populace_uk_2023.h5"
         assert (
             microsim.policyengine_bundle["policyengine_version"] == POLICYENGINE_VERSION
         )
-        assert microsim.policyengine_bundle["runtime_dataset"] == "enhanced_frs_2023_24"
+        assert microsim.policyengine_bundle["runtime_dataset"] == "populace_uk_2023"
         assert microsim.policyengine_bundle["runtime_dataset_uri"] == (
-            "hf://policyengine/policyengine-uk-data-private/enhanced_frs_2023_24.h5@655dd07e4bb9c777b00dac044949611f1feb824f"
+            UK_CERTIFIED_DATASET_URI
         )
         dataset_source = microsim.policyengine_bundle["runtime_dataset_source"]
-        assert dataset_source == "/tmp/enhanced_frs_2023_24.h5"
+        assert dataset_source == "/tmp/populace_uk_2023.h5"
 
     def test__given_uk_unmanaged_dataset_uri__then_source_is_not_rewritten(self):
         dataset = "hf://policyengine/policyengine-uk-data-private/frs_2022_23.h5@1.40.4"
diff --git a/tests/test_uk_regions.py b/tests/test_uk_regions.py
index b8f92ffc..87a2edd3 100644
--- a/tests/test_uk_regions.py
+++ b/tests/test_uk_regions.py
@@ -4,17 +4,12 @@
 
 from policyengine.core.scoping_strategy import (
     RowFilterStrategy,
-    WeightReplacementStrategy,
 )
 from policyengine.countries.uk.regions import (
     UK_COUNTRIES,
     build_uk_region_registry,
     uk_region_registry,
 )
-from policyengine.data.uk_geography_assets import (
-    CONSTITUENCY_ASSET_SPEC,
-    LOCAL_AUTHORITY_ASSET_SPEC,
-)
 
 
 class TestUKCountries:
@@ -77,7 +72,8 @@ def test__given_uk_registry__then_has_national_region(self):
         assert national.region_type == "national"
         assert (
             national.dataset_path
-            == "hf://policyengine/policyengine-uk-data-private/enhanced_frs_2023_24.h5@655dd07e4bb9c777b00dac044949611f1feb824f"
+            == "hf://policyengine/populace-uk-private/populace_uk_2023.h5"
+            "@populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z"
         )
         assert not national.requires_filter
 
@@ -257,13 +253,13 @@ def test__given_builder__then_accepts_include_local_authorities_flag(self):
         "policyengine.countries.uk.regions._load_constituencies_from_csv",
         return_value=[{"code": "C001", "name": "Constituency A"}],
     )
-    def test__given_constituencies_included__then_uses_canonical_assets(
+    def test__given_constituencies_included__then_filters_on_dataset_geography(
         self,
         _mock_loader,
     ):
         """Given: constituencies are included
         When: Building the registry
-        Then: Weight replacement strategy uses canonical constituency assets
+        Then: They filter on the dataset's longwise constituency code
         """
         # When
         registry = build_uk_region_registry(include_constituencies=True)
@@ -271,35 +267,21 @@ def test__given_constituencies_included__then_uses_canonical_assets(
 
         # Then
         assert constituency is not None
-        assert isinstance(constituency.scoping_strategy, WeightReplacementStrategy)
-        assert (
-            constituency.scoping_strategy.weight_matrix_bucket
-            == CONSTITUENCY_ASSET_SPEC.bucket
-        )
-        assert (
-            constituency.scoping_strategy.weight_matrix_key
-            == CONSTITUENCY_ASSET_SPEC.weight_matrix_filename
-        )
-        assert (
-            constituency.scoping_strategy.lookup_csv_bucket
-            == CONSTITUENCY_ASSET_SPEC.bucket
-        )
-        assert (
-            constituency.scoping_strategy.lookup_csv_key
-            == CONSTITUENCY_ASSET_SPEC.lookup_csv_filename
-        )
+        assert isinstance(constituency.scoping_strategy, RowFilterStrategy)
+        assert constituency.scoping_strategy.variable_name == "constituency_code_oa"
+        assert constituency.scoping_strategy.variable_value == "C001"
 
     @patch(
         "policyengine.countries.uk.regions._load_local_authorities_from_csv",
         return_value=[{"code": "LA001", "name": "Local Authority A"}],
     )
-    def test__given_local_authorities_included__then_uses_canonical_assets(
+    def test__given_local_authorities_included__then_filters_on_dataset_geography(
         self,
         _mock_loader,
     ):
         """Given: local authorities are included
         When: Building the registry
-        Then: Weight replacement strategy uses canonical local-authority assets
+        Then: They filter on the dataset's longwise local-authority code
         """
         # When
         registry = build_uk_region_registry(include_local_authorities=True)
@@ -307,20 +289,6 @@ def test__given_local_authorities_included__then_uses_canonical_assets(
 
         # Then
         assert local_authority is not None
-        assert isinstance(local_authority.scoping_strategy, WeightReplacementStrategy)
-        assert (
-            local_authority.scoping_strategy.weight_matrix_bucket
-            == LOCAL_AUTHORITY_ASSET_SPEC.bucket
-        )
-        assert (
-            local_authority.scoping_strategy.weight_matrix_key
-            == LOCAL_AUTHORITY_ASSET_SPEC.weight_matrix_filename
-        )
-        assert (
-            local_authority.scoping_strategy.lookup_csv_bucket
-            == LOCAL_AUTHORITY_ASSET_SPEC.bucket
-        )
-        assert (
-            local_authority.scoping_strategy.lookup_csv_key
-            == LOCAL_AUTHORITY_ASSET_SPEC.lookup_csv_filename
-        )
+        assert isinstance(local_authority.scoping_strategy, RowFilterStrategy)
+        assert local_authority.scoping_strategy.variable_name == "la_code_oa"
+        assert local_authority.scoping_strategy.variable_value == "LA001"
diff --git a/uv.lock b/uv.lock
index ddb94c9d..1560c7aa 100644
--- a/uv.lock
+++ b/uv.lock
@@ -2820,7 +2820,7 @@ wheels = [
 
 [[package]]
 name = "policyengine"
-version = "4.17.6"
+version = "4.17.9"
 source = { editable = "." }
 dependencies = [
     { name = "diskcache" },
@@ -2893,10 +2893,10 @@ requires-dist = [
     { name = "plotly", marker = "extra == 'dev'", specifier = ">=5.0.0" },
     { name = "plotly", marker = "extra == 'plotting'", specifier = ">=5.0.0" },
     { name = "policyengine-core", marker = "extra == 'dev'", specifier = ">=3.27.1" },
-    { name = "policyengine-core", marker = "extra == 'uk'", specifier = ">=3.26.1" },
+    { name = "policyengine-core", marker = "extra == 'uk'", specifier = ">=3.27.1" },
     { name = "policyengine-core", marker = "extra == 'us'", specifier = ">=3.27.1" },
-    { name = "policyengine-uk", marker = "extra == 'dev'", specifier = "==2.88.20" },
-    { name = "policyengine-uk", marker = "extra == 'uk'", specifier = "==2.88.20" },
+    { name = "policyengine-uk", marker = "extra == 'dev'", specifier = "==2.89.2" },
+    { name = "policyengine-uk", marker = "extra == 'uk'", specifier = "==2.89.2" },
     { name = "policyengine-us", marker = "extra == 'dev'", specifier = "==1.729.0" },
     { name = "policyengine-us", marker = "extra == 'us'", specifier = "==1.729.0" },
     { name = "psutil", specifier = ">=5.9.0" },
@@ -2944,7 +2944,7 @@ wheels = [
 
 [[package]]
 name = "policyengine-uk"
-version = "2.88.20"
+version = "2.89.2"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "microdf-python" },
@@ -2954,9 +2954,9 @@ dependencies = [
     { name = "tables", version = "3.10.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.10.*'" },
     { name = "tables", version = "3.11.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/fb/11/64c8b0269e68d42ffdc58c74b1975dcb6a67487de526855182ecc2479fb1/policyengine_uk-2.88.20.tar.gz", hash = "sha256:3c3939f4b4dc78be2747ec459bad2b5f341580be031af4004a554ce0c3f59682", size = 1189714, upload-time = "2026-05-20T17:38:13.426Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/55/bc/d9cadc5b91804dab0937506e02463a4146a4c996b3d6cc400599b688eb7a/policyengine_uk-2.89.2.tar.gz", hash = "sha256:9eefdc321799f1b610dc1d72b465b6d35a0595469d67c2e4445529c3063a6ef7", size = 1217538, upload-time = "2026-06-18T10:09:46.6Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/32/f0/c0e7dbcc049501dc968da0a67de4976f305228328f96fe0ad08c65301c4f/policyengine_uk-2.88.20-py3-none-any.whl", hash = "sha256:8c3dacb868f3fb18296b8ef2475edaf543f57b8056d24a58bca59b108651f272", size = 1918240, upload-time = "2026-05-20T17:38:11.347Z" },
+    { url = "https://files.pythonhosted.org/packages/83/db/ce3154ba69b6fcd1e9e922ceee705ef4ddb1f81553da1e63b9296e74a4dc/policyengine_uk-2.89.2-py3-none-any.whl", hash = "sha256:80965d3dd7dc767db9b083820d40262ce543020d5a8880a0cf88da10ae641b24", size = 2001007, upload-time = "2026-06-18T10:09:44.808Z" },
 ]
 
 [[package]]