
Build UK geography crosswalk from official sources #142

Merged: MaxGhenis merged 1 commit into main from codex/uk-source-crosswalk-20260619 on Jun 19, 2026


@MaxGhenis

Contributor

Summary

  • add official-source England/Wales and Scotland loaders/builders so Populace can compose the complete UK row-wise geography crosswalk from ONS, NRS, NISRA, and postcode sources
  • keep the existing base-crosswalk repair function for compatibility, but add build_official_uk_geography_crosswalk() as the source-independent path
  • harden source validation, ZIP CSV selection, transient source-download retries, and NI active-postcode handling
  • export the new public UK geography helpers and cover them with synthetic source tests
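
As an illustration of the ZIP CSV selection hardening mentioned above, here is a minimal sketch using only the standard library. The helper name `select_csv_member`, the `name_hint` parameter, and the archive member names are hypothetical and are not taken from this PR; the sketch only shows the general idea of refusing to guess when an archive holds zero or several candidate CSVs.

```python
import io
import zipfile
from typing import Optional


def select_csv_member(archive: zipfile.ZipFile, name_hint: Optional[str] = None) -> str:
    """Return the name of the single CSV member to read from a ZIP archive.

    Hypothetical helper for illustration, not the function added in this PR.
    """
    # Keep only CSV members and skip macOS resource-fork entries.
    candidates = [
        name
        for name in archive.namelist()
        if name.lower().endswith(".csv") and not name.startswith("__MACOSX/")
    ]
    # Optionally narrow the candidates by a case-insensitive substring.
    if name_hint is not None:
        candidates = [name for name in candidates if name_hint.lower() in name.lower()]
    # Fail loudly instead of silently picking the first match.
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one matching CSV, found {sorted(candidates)}")
    return candidates[0]


# Build a small in-memory archive with made-up members to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs/readme.txt", "notes")
    archive.writestr("data/lookup.csv", "pcd,area_code\nAA1 1AA,X00000001\n")
    archive.writestr("data/user_guide.csv", "field,description\n")

with zipfile.ZipFile(buffer) as archive:
    chosen = select_csv_member(archive, name_hint="lookup")
```

Raising on ambiguity rather than taking the first CSV is one way to keep a changed upstream archive layout from being read silently as the wrong table.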

Live validation

  • official crosswalk build: 239,023 rows
  • country rows: England 178,605; Wales 10,275; Scotland 46,363; Northern Ireland 3,780
  • country populations: England 56,490,284; Wales 3,107,463; Scotland 5,440,284; Northern Ireland 1,903,168
  • target coverage using PE uk-data target lists for validation only: 650/650 constituencies covered and sampleable; 360/360 local authorities covered and sampleable
  • one extra official LA code appears outside the PE target list: N09000011 (Northern Ireland), with no missing PE target LAs
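
The per-country figures above can be cross-checked against the reported total with plain arithmetic. The sketch below hard-codes the numbers from this description; the variable names are illustrative only.

```python
# Row counts per country, copied from the live validation figures above.
country_rows = {
    "England": 178_605,
    "Wales": 10_275,
    "Scotland": 46_363,
    "Northern Ireland": 3_780,
}
reported_total_rows = 239_023

# Populations per country, copied from the same list.
country_populations = {
    "England": 56_490_284,
    "Wales": 3_107_463,
    "Scotland": 5_440_284,
    "Northern Ireland": 1_903_168,
}

# The country rows should add up to the reported crosswalk size.
total_rows = sum(country_rows.values())
rows_consistent = total_rows == reported_total_rows

# The UK population implied by the four country totals.
total_population = sum(country_populations.values())
print(total_rows, rows_consistent, total_population)
```

The country rows sum to exactly 239,023, matching the reported build size, and the four populations imply a UK total of 66,941,199.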

Row-wise dataset pilot

Using /Users/maxghenis/.claude-worktrees/populace-uk-build/artifacts/populace_uk_2023.h5 and the official crosswalk:

  • n=1 clone: person (1,157,100, 104), benunit (618,980, 12), household (535,080, 68), exact weight preservation, zero missing geo rows, 650 constituencies assigned, no duplicate source-household/constituency pairs
  • n=2 clone: person (2,314,200, 104), benunit (1,237,960, 12), household (1,070,160, 68), exact weight preservation, zero missing geo rows, 650 constituencies assigned, no duplicate source-household/constituency pairs
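
The reported shapes are consistent with each entity table scaling linearly in the clone count while keeping its columns. A small check of that relationship, using only the (rows, columns) shapes quoted above and illustrative variable names:

```python
# (rows, columns) shapes copied from the n=1 and n=2 pilot results above.
shapes_n1 = {
    "person": (1_157_100, 104),
    "benunit": (618_980, 12),
    "household": (535_080, 68),
}
shapes_n2 = {
    "person": (2_314_200, 104),
    "benunit": (1_237_960, 12),
    "household": (1_070_160, 68),
}

n_clones = 2
scaling_ok = {}
for entity, (rows, cols) in shapes_n1.items():
    rows_n2, cols_n2 = shapes_n2[entity]
    # Rows should multiply by the clone count; the column count should not change.
    scaling_ok[entity] = (rows_n2 == rows * n_clones) and (cols_n2 == cols)

all_scaling_ok = all(scaling_ok.values())
print(scaling_ok)
```

All three tables double their row counts exactly from n=1 to n=2 with unchanged column counts.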

Checks

  • uv run pytest packages/populace-build/tests/test_uk_geography_sources.py packages/populace-build/tests/test_uk_rowwise_geography.py packages/populace-build/tests/test_uk_rowwise_dataset.py -q
  • uv run ruff check packages/populace-build/src/populace/build/uk/geography_sources.py packages/populace-build/src/populace/build/uk/__init__.py packages/populace-build/tests/test_uk_geography_sources.py
  • uv run ruff format --check packages/populace-build/src/populace/build/uk/geography_sources.py packages/populace-build/src/populace/build/uk/__init__.py packages/populace-build/tests/test_uk_geography_sources.py
  • git diff --check origin/main...HEAD
  • env -u UV_FROZEN uv lock --check
  • uv run --all-packages pytest -q

Review

  • /cycle read-only review pass 1 found the NI blank-string doterm active-postcode issue; fixed with regression coverage
  • /cycle read-only review pass 2 found no remaining actionable findings
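
For readers unfamiliar with the blank-string issue found in pass 1, the sketch below shows the general shape of such a bug and its fix. It assumes `doterm` holds a postcode's termination date and that an empty value means the postcode is still active; the helper name, row layout, and postcodes are made up for illustration and are not the code from this PR.

```python
from typing import Optional


def is_active_postcode(doterm: Optional[str]) -> bool:
    """Treat a missing or whitespace-only termination date as 'still active'.

    Hypothetical helper: a check that only tests for None would wrongly
    classify blank strings read from a CSV as terminated postcodes.
    """
    return doterm is None or doterm.strip() == ""


# Made-up rows covering the cases a CSV reader can produce.
rows = [
    {"pcd": "ZZ1 1AA", "doterm": ""},        # blank string: active
    {"pcd": "ZZ1 1AB", "doterm": "   "},     # whitespace only: active
    {"pcd": "ZZ1 1AC", "doterm": "202001"},  # has a termination date: inactive
    {"pcd": "ZZ1 1AD", "doterm": None},      # missing value: active
]

active = [row["pcd"] for row in rows if is_active_postcode(row["doterm"])]
print(active)
```

A regression test for this kind of fix would typically pin the blank-string and whitespace-only cases, since those are the ones a null-only check misses.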

MaxGhenis merged commit d86e017 into main on Jun 19, 2026
4 checks passed
MaxGhenis deleted the codex/uk-source-crosswalk-20260619 branch on June 19, 2026 17:48
