This repository was archived by the owner on Jun 19, 2026. It is now read-only.

Carry FRS employer sector (mjobsect) and SIC industry into the dataset #433

Merged
vahid-ahmadi merged 8 commits into main from carry-frs-employer-sector on Jun 18, 2026

@vahid-ahmadi

Collaborator

What this does

Carries two existing FRS fields into the dataset (passthrough, not imputation — they're already in the FRS spine), populating the new Person-level variables added in PolicyEngine/policyengine-uk#1785:

  • employment_sector ← FRS mjobsect (public/private of main job), via categorical() → NOT_EMPLOYED / PRIVATE / PUBLIC.
  • sic_industry_division ← FRS sic (SIC 2007 division; 84 = public administration & defence).

Mirrors the existing handling of employment_status (from empstati), region (from gvtregno), and tenure_type (from ptentyp2).
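
A minimal sketch of the passthrough idea, not the repository's actual `categorical()` helper (its signature is not shown in this PR). The `map_codes` function and the `MJOBSECT_TO_SECTOR` table are hypothetical names; only `2 → PUBLIC` is confirmed by the validation table, while `1 → PRIVATE` and `0 → NOT_EMPLOYED` are assumptions made for illustration.

```python
from enum import Enum


class EmploymentSector(Enum):
    NOT_EMPLOYED = "NOT_EMPLOYED"
    PRIVATE = "PRIVATE"
    PUBLIC = "PUBLIC"


# Hypothetical code map. Only 2 -> PUBLIC is confirmed in this PR;
# 1 -> PRIVATE and 0 -> NOT_EMPLOYED are assumptions.
MJOBSECT_TO_SECTOR = {
    0: EmploymentSector.NOT_EMPLOYED,
    1: EmploymentSector.PRIVATE,
    2: EmploymentSector.PUBLIC,
}


def map_codes(codes, mapping):
    """Map raw survey codes to enum names, failing loudly on unknown codes."""
    unknown = sorted({c for c in codes if c not in mapping})
    if unknown:
        raise ValueError(f"unmapped codes: {unknown}")
    return [mapping[c].name for c in codes]


raw_mjobsect = [2, 1, 0, 1, 2]  # toy values, not FRS data
employment_sector = map_codes(raw_mjobsect, MJOBSECT_TO_SECTOR)
print(employment_sector)
```

Raising on unknown codes, rather than defaulting them, keeps a passthrough from silently misclassifying rows if the survey coding changes.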

Validation (raw FRS 2022-23 person frame)

| Output | Result |
| --- | --- |
| employment_sector = PUBLIC | 6,332 (exactly matches raw mjobsect == 2) |
| employment_sector = PRIVATE | 18,314 |
| employment_sector = NOT_EMPLOYED | 28,931 (non-workers + children) |
| unmapped / NaN | 0 |
| sic_industry_division == 84 (public admin) | 2,999 |
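
The checks behind a table like this can be sketched as follows. This is an illustration on toy lists, not the script used for the PR; the variable names are hypothetical and the values are not FRS data.

```python
from collections import Counter

# Toy stand-ins for the raw FRS column and the mapped output (not real data).
raw_mjobsect = [2, 1, 0, 0, 1, 2, 1]
employment_sector = [
    "PUBLIC", "PRIVATE", "NOT_EMPLOYED", "NOT_EMPLOYED",
    "PRIVATE", "PUBLIC", "PRIVATE",
]

VALID = {"NOT_EMPLOYED", "PRIVATE", "PUBLIC"}

counts = Counter(employment_sector)
unmapped = sum(1 for v in employment_sector if v not in VALID)
raw_public = sum(1 for c in raw_mjobsect if c == 2)

# Every row must land in a known category.
assert unmapped == 0
# The PUBLIC count must equal the raw count of mjobsect == 2.
assert counts["PUBLIC"] == raw_public
# No rows gained or lost by the mapping.
assert sum(counts.values()) == len(raw_mjobsect)
```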

Ordering

Depends on PolicyEngine/policyengine-uk#1785 (which defines the variables); that PR must merge/release first or the dataset build will not recognise the new variables.

Closes #432.

🤖 Generated with Claude Code

vahid-ahmadi and others added 2 commits June 17, 2026 18:12
Populate the new employment_sector (public/private, from FRS mjobsect) and
sic_industry_division (SIC 2007, from FRS sic) Person-level variables, using
the same categorical() passthrough pattern as employment_status and region.

Requires the matching variables in policyengine-uk (PolicyEngine/policyengine-uk#1785).
Closes #432

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi requested a review from MaxGhenis June 18, 2026 08:31
vahid-ahmadi and others added 6 commits June 18, 2026 09:33
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The create_frs smoke-test fixture builds a minimal person frame without
mjobsect/sic; fall back to 0 (NOT_EMPLOYED / unknown division) when the
columns are absent, matching existing defensive column checks (e.g. fted,
adema). Fixes test_create_frs_smoke_includes_legacy_proxy_columns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The dataset now writes the employment_sector and sic_industry_division
variables, which are defined in policyengine-uk 2.89.2 (PolicyEngine/policyengine-uk#1785).
Update the pin and frozen lock so CI installs a model that recognises them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a national calibration target constraining public-sector employment
(employment_sector == PUBLIC) towards the official ONS Public Sector
Employment headcount (~5.9m), correcting the FRS self-reported over-count
(~7.8m). Wires a compute_public_sector_employment column into the loss
matrix and adds a target source module.

Tests cover the target definition/value (within 20% of ONS) and a
post-data-generation total check asserting the simulated weighted public
sector headcount is within 20% of the target (skipped until a dataset
build includes the variable).
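
A rough sketch of the kind of total check described here, assuming person-level weights and a relative tolerance. The helper names `weighted_headcount` and `within_tolerance` are hypothetical, and the numbers are toy values rather than the ONS target or real survey weights.

```python
def weighted_headcount(sectors, weights, category="PUBLIC"):
    """Sum the weights of all persons whose sector equals `category`."""
    return sum(w for s, w in zip(sectors, weights) if s == category)


def within_tolerance(simulated, target, rel_tol):
    """True if the simulated total is within rel_tol (a fraction) of target."""
    return abs(simulated - target) / target <= rel_tol


# Toy data: four persons with survey weights.
sectors = ["PUBLIC", "PRIVATE", "NOT_EMPLOYED", "PUBLIC"]
weights = [1_500.0, 2_000.0, 3_000.0, 1_000.0]

simulated = weighted_headcount(sectors, weights)  # 1500 + 1000
toy_target = 2_400.0  # stand-in for an official headcount

print(simulated, within_tolerance(simulated, toy_target, rel_tol=0.20))
```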

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Access person.mjobsect/person.sic directly like the other FRS categoricals
(empstati, gvtregno, ptentyp2) instead of falling back to 0 when the column
is absent, which would silently produce all-NOT_EMPLOYED on real data. The
create_frs smoke-test fixture now provides mjobsect/sic, matching how it
already provides empstati.
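
The difference between the two approaches can be sketched as below. A plain dict of lists stands in for the person frame, and both function names are hypothetical; the point is only that a default of 0 hides a missing column, while direct access fails immediately.

```python
def sector_codes_with_fallback(person):
    """Earlier approach: substitute 0 for every row if the column is absent."""
    n_rows = len(next(iter(person.values())))
    return person.get("mjobsect", [0] * n_rows)


def sector_codes_strict(person):
    """Later approach: direct access, so a missing column raises KeyError."""
    return person["mjobsect"]


frame_missing = {"empstati": [1, 2, 3]}  # column dropped or renamed
frame_ok = {"empstati": [1, 2, 3], "mjobsect": [2, 1, 0]}

# The fallback silently yields all zeros (all NOT_EMPLOYED downstream).
print(sector_codes_with_fallback(frame_missing))

# Strict access surfaces the problem instead.
try:
    sector_codes_strict(frame_missing)
except KeyError as exc:
    print(f"missing column: {exc}")

print(sector_codes_strict(frame_ok))
```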

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The FRS over-reports public-sector employment (~7.9m vs ONS ~5.9m) and the
national calibration only partially corrects it, so the simulated-total
check uses a loose tolerance like the other aggregate-vs-target tests
(land value/spending ~0.65-0.70, vehicles ~0.30) instead of 20%.
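
As a worked illustration using the approximate figures quoted above, the uncalibrated gap between the FRS and ONS totals already exceeds 20%, so a 20% check can only pass if calibration closes most of that gap. The constant names are illustrative, and the figures are the rounded values from this commit message.

```python
FRS_SELF_REPORTED = 7.9e6  # approximate FRS public-sector headcount
ONS_HEADCOUNT = 5.9e6      # approximate ONS Public Sector Employment headcount

# Relative over-count of the raw survey total against the official figure.
relative_gap = (FRS_SELF_REPORTED - ONS_HEADCOUNT) / ONS_HEADCOUNT

print(f"{relative_gap:.3f}")  # 0.339
print(relative_gap > 0.20)    # True
```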

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi merged commit 4a66ca1 into main Jun 18, 2026
4 checks passed