Certify UK Populace default dataset #427

Merged: MaxGhenis merged 2 commits into main from codex/promote-uk-populace on Jun 19, 2026

MaxGhenis (Contributor) commented Jun 19, 2026

Summary

Certifies the published UK Populace release as the default UK dataset in policyengine.py.

Release: populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z
Dataset: hf://policyengine/populace-uk-private/populace_uk_2023.h5@populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z
Model pin: policyengine-uk==2.89.2
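
The dataset reference above packs a repository, a file path, and a release tag into one string. As a rough illustration of how those parts separate, here is a minimal sketch; `parse_hf_uri` and `DatasetRef` are hypothetical names invented for this example and are not taken from policyengine.py.

```python
from typing import NamedTuple


class DatasetRef(NamedTuple):
    repo: str      # "owner/repo" part of the URI
    path: str      # file path inside the repository
    revision: str  # text after "@", empty string if absent


def parse_hf_uri(uri: str) -> DatasetRef:
    """Split an hf://owner/repo/path@revision string (hypothetical helper)."""
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    # Everything after the first "@" is treated as the revision.
    body, _, revision = uri[len(prefix):].partition("@")
    owner, repo, *rest = body.split("/")
    if not rest:
        raise ValueError(f"missing file path in {uri!r}")
    return DatasetRef(f"{owner}/{repo}", "/".join(rest), revision)


ref = parse_hf_uri(
    "hf://policyengine/populace-uk-private/populace_uk_2023.h5"
    "@populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z"
)
```

In this sketch the revision equals the release name listed above, which is what ties the default dataset to one specific published release.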

Why promote

UK Populace materially outperforms the incumbent enhanced FRS surface under the matched-N sound comparison.

| Metric | Populace UK | Incumbent enhanced FRS |
| --- | --- | --- |
| Full loss | 0.0376 | 0.1069 |
| Train loss | 0.0159 | 0.0385 |
| Holdout loss | 0.1239 | 0.3784 |
| Max weight | 167k | 535k |
| Weights >500k | 0 | 1 |
| Per-target wins | 79 | 70 |

Target-fit diagnostics also look acceptable: mean absolute relative error is 1.49%, median is 0.96%, 97.99% of targets are within 10%, and the worst target miss is 23.1%, under the 25% release gate.
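
For readers who want to see how diagnostics of this kind are computed, here is a minimal sketch. It is not the release-gate code; `target_fit_summary` is a hypothetical helper, and the inputs are only the thirteen relative errors quoted in the salient-targets table of this description, so the summary will not reproduce the full-surface figures (1.49%, 0.96%, 97.99%).

```python
from statistics import mean, median

RELEASE_GATE = 0.25  # worst-miss gate quoted in the PR description


def target_fit_summary(rel_errors, gate=RELEASE_GATE):
    """Summarise signed relative errors (hypothetical helper, not the real gate)."""
    abs_err = [abs(e) for e in rel_errors]
    worst = max(abs_err)
    return {
        "mean_abs": mean(abs_err),
        "median_abs": median(abs_err),
        "share_within_10pct": sum(e <= 0.10 for e in abs_err) / len(abs_err),
        "worst": worst,
        "passes_gate": worst < gate,
    }


# Relative errors from the salient-targets table only (a subset of all targets).
populace = [-0.002, 0.012, 0.001, 0.006, 0.002, 0.002, 0.008,
            -0.003, 0.003, 0.001, 0.005, -0.231, -0.144]
incumbent = [0.665, -0.386, -0.354, 0.723, 0.648, -1.0, -0.025,
             -0.215, -0.064, 0.438, -0.169, -0.822, 0.05]

populace_summary = target_fit_summary(populace)
incumbent_summary = target_fit_summary(incumbent)
```

On this subset the Populace UK worst miss is the 23.1% adult dependants grant target, which stays under the gate, while the incumbent's worst miss is the 100% shortfall on public sector employment.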

Salient targets

Shipped-weight relative error versus the registry target, compared with the incumbent enhanced FRS surface:

| Target | Target value | Populace UK error | Incumbent error | Why it matters |
| --- | --- | --- | --- | --- |
| ons/lone_households_under_65 | 4.17m | -0.2% | +66.5% | Household-type fit |
| ons/unrelated_adult_households | 827k | +1.2% | -38.6% | Household-type fit |
| ons/couple_under_3_children_households | 5.21m | +0.1% | -35.4% | Household-type fit |
| ons/lone_parent_dependent_children_households | 1.90m | +0.6% | +72.3% | Household-type fit |
| ons/multi_family_households | 290k | +0.2% | +64.8% | Household-type fit |
| ons/public_sector_employment | 5.90m | +0.2% | -100.0% | New sector passthrough target |
| ons/household_land_value | £4.80tn | +0.8% | -2.5% | Wealth / housing balance-sheet target |
| ons/household_land_value/NORTH_WEST | £402bn | -0.3% | -21.5% | Regional wealth fit |
| ons/tenure_england_owned_with_mortgage | 7.60m | +0.3% | -6.4% | Housing tenure mix |
| ons/tenure_england_social_rent | 4.25m | +0.1% | +43.8% | Housing tenure mix |
| slc/maintenance_loan_recipients | 1.15m | +0.5% | -16.9% | Student-loan population |
| slc/adult_dependants_grant_recipients | 20k | -23.1% | -82.2% | Worst remaining miss, still under gate and materially improved |
| obr/private_school_students | 557k | -14.4% | +5.0% | Salient remaining miss where incumbent is closer |

Likely drivers

  • Larger and better household support before calibration, giving the solver better candidate records rather than relying on extreme weights.
  • Bounded Populace calibration with max_weight_ratio=50, reducing the max weight while improving loss.
  • Fixed mutually exclusive household-type target definitions in the UK-data release branch.
  • Better alignment of target definitions to PolicyEngine UK variables, including council tax using council_tax_less_benefit.
  • Recent UK-data improvements folded into the build flow: bus fare/subsidy calibration, employer sector and SIC passthrough, forward-compatible debt columns, and OBR target hardening.
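
The PR does not spell out how max_weight_ratio is applied. One plausible reading, offered here only as an assumption, is that each calibrated weight is capped at 50 times its starting weight. The sketch below illustrates that reading; `clip_weights` is a hypothetical function and the numbers are made up.

```python
def clip_weights(initial, proposed, max_weight_ratio=50.0):
    """Cap each proposed weight at max_weight_ratio times its initial weight.

    Assumed semantics for illustration; the real calibration may bound
    weights differently (for example, inside the optimiser itself).
    """
    return [min(p, max_weight_ratio * w0) for w0, p in zip(initial, proposed)]


initial = [100.0, 100.0, 100.0]     # made-up starting weights
proposed = [80.0, 4_000.0, 9_000.0]  # made-up unconstrained solver output
bounded = clip_weights(initial, proposed)
```

Under this reading only the third record is affected: 9,000 exceeds 50 × 100, so it is capped at 5,000. A bound like this limits how far any single record can be stretched to hit a target.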

Compatibility

The default UK dataset becomes populace_uk_2023, but the bundled manifest still preserves the old logical dataset aliases:

  • frs_2023_24
  • enhanced_frs_2023_24

Both resolve to the prior pinned policyengine-uk-data-private artifacts, so existing explicit dataset references keep working.
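
The compatibility behaviour can be pictured as a name lookup with a changed default. The following is a sketch under assumptions: `MANIFEST`, `DEFAULT_UK_DATASET`, and `resolve_dataset` are hypothetical names, the real bundled manifest format is not shown in this PR description, and the legacy artifact strings are placeholders rather than real locations.

```python
DEFAULT_UK_DATASET = "populace_uk_2023"

# Hypothetical manifest layout; legacy entries are placeholders, not real URIs.
MANIFEST = {
    "populace_uk_2023": (
        "hf://policyengine/populace-uk-private/populace_uk_2023.h5"
        "@populace-uk-2023-dd68c73-4aa4b14-20260619T023711Z"
    ),
    "frs_2023_24": "<prior pinned policyengine-uk-data-private artifact>",
    "enhanced_frs_2023_24": "<prior pinned policyengine-uk-data-private artifact>",
}


def resolve_dataset(name=None):
    """Return the artifact for a logical dataset name, or the default if omitted."""
    key = name or DEFAULT_UK_DATASET
    try:
        return MANIFEST[key]
    except KeyError:
        raise KeyError(f"unknown UK dataset {key!r}") from None


default_artifact = resolve_dataset()
legacy_artifact = resolve_dataset("enhanced_frs_2023_24")
```

Callers that pass no name pick up Populace UK, while callers that name an old alias still get the previously pinned artifact.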

Subnational geography

The old UK constituency matrix was built for the enhanced FRS household count (650 x 53,508) and would not match Populace UK's 535,080 households. This PR therefore moves the Populace-facing UK constituency and local-authority paths to the longwise approach:

  • UK region registry constituency/LA entries now use RowFilterStrategy on constituency_code_oa and la_code_oa.
  • UK simulation outputs preserve those household geography columns from the input dataset.
  • compute_uk_constituency_impacts and compute_uk_local_authority_impacts group household outputs by those longwise columns instead of loading weight matrices.
  • Optional metadata CSVs are still used for names and map coordinates when available; legacy weight_matrix_path and year arguments are accepted for compatibility but ignored.

This keeps Populace clean: no matrix generation, no matrix artifacts, and no default matrix dependency.
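
To make the longwise idea concrete, here is a minimal sketch of row filtering and grouping by a geography column carried on each household row. It is not the implementation of compute_uk_constituency_impacts or RowFilterStrategy; the function names, the `weight` and `net_income_change` keys, and the area codes are hypothetical, and only the column name constituency_code_oa comes from the PR.

```python
from collections import defaultdict


def filter_rows(households, code_column, code):
    """Keep only households whose geography column equals the given code."""
    return [hh for hh in households if hh[code_column] == code]


def impacts_by_area(households, code_column):
    """Weighted mean household net-income change per area code."""
    gain = defaultdict(float)
    weight = defaultdict(float)
    for hh in households:
        code = hh[code_column]
        gain[code] += hh["weight"] * hh["net_income_change"]
        weight[code] += hh["weight"]
    # Skip areas whose total weight is zero to avoid dividing by zero.
    return {c: gain[c] / weight[c] for c in gain if weight[c] > 0}


# Made-up rows; "A" and "B" stand in for real constituency codes.
households = [
    {"constituency_code_oa": "A", "weight": 100.0, "net_income_change": 10.0},
    {"constituency_code_oa": "A", "weight": 300.0, "net_income_change": 30.0},
    {"constituency_code_oa": "B", "weight": 200.0, "net_income_change": -5.0},
]

by_constituency = impacts_by_area(households, "constituency_code_oa")
only_a = filter_rows(households, "constituency_code_oa", "A")
```

Because the area code travels with each row, the grouping works for any number of households, which is why a fixed 650 x 53,508 matrix is no longer needed.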

Verification

  • ruff format --check .
  • ruff check .
  • pytest tests/test_release_manifests.py tests/test_models.py tests/test_uk_regions.py tests/test_trace_tro.py tests/test_manifest_version_mismatch.py tests/test_dataset_sources.py tests/test_household_calculator_snapshot.py -q
  • pytest tests/test_constituency_impact.py tests/test_local_authority_impact.py tests/test_uk_regions.py tests/test_scoping_strategy.py tests/test_extra_variables.py -q
  • pytest tests/test_release_manifests.py tests/test_models.py tests/test_dataset_sources.py tests/test_manifest_version_mismatch.py tests/test_uk_regions.py tests/test_constituency_impact.py tests/test_local_authority_impact.py -q
  • UK Populace release gates passed separately: nonzero exports, parity, export surface, target surface, and target fit.
  • Independent review cycle completed clean after fixing the flagged artifact, alias, target-fit, stale build-option, and UK subnational geography issues.

MaxGhenis force-pushed the codex/promote-uk-populace branch 2 times, most recently from 8bca058 to 7bf1c08, on June 19, 2026 13:01
MaxGhenis force-pushed the codex/promote-uk-populace branch from 7bf1c08 to f41f5e9 on June 19, 2026 13:17
MaxGhenis merged commit 3ffa682 into main on Jun 19, 2026 (12 checks passed)
MaxGhenis deleted the codex/promote-uk-populace branch on June 19, 2026 13:36