This repository was archived by the owner on Jun 19, 2026. It is now read-only.
Carry FRS employer sector (mjobsect) and SIC industry into the dataset #433
Merged
Populate the new employment_sector (public/private, from FRS mjobsect) and sic_industry_division (SIC 2007, from FRS sic) Person-level variables, using the same categorical() passthrough pattern as employment_status and region. Requires the matching variables in policyengine-uk (PolicyEngine/policyengine-uk#1785). Closes #432 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The create_frs smoke-test fixture builds a minimal person frame without mjobsect/sic; fall back to 0 (NOT_EMPLOYED / unknown division) when the columns are absent, matching existing defensive column checks (e.g. fted, adema). Fixes test_create_frs_smoke_includes_legacy_proxy_columns. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The dataset now writes the employment_sector and sic_industry_division variables, which are defined in policyengine-uk 2.89.2 (PolicyEngine/policyengine-uk#1785). Update the pin and frozen lock so CI installs a model that recognises them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a national calibration target constraining public-sector employment (employment_sector == PUBLIC) towards the official ONS Public Sector Employment headcount (~5.9m), correcting the FRS self-reported over-count (~7.8m). Wires a compute_public_sector_employment column into the loss matrix and adds a target source module. Tests cover the target definition/value (within 20% of ONS) and a post-data-generation total check asserting the simulated weighted public sector headcount is within 20% of the target (skipped until a dataset build includes the variable). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
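As a rough illustration of what a loss-matrix column for this target could look like, here is a minimal sketch in plain Python. It is not the repository's compute_public_sector_employment: the helper names, the sample people, and the weights are all hypothetical. The idea is that the column is an indicator (1 for PUBLIC, 0 otherwise), so its weighted sum is the simulated public-sector headcount that calibration pulls towards the target.

```python
# Hypothetical sketch: helper names and sample data are illustrative only,
# not taken from the repository.
PUBLIC = "PUBLIC"


def public_sector_column(employment_sector):
    """Return 1.0 for each person whose sector is PUBLIC, else 0.0."""
    return [1.0 if sector == PUBLIC else 0.0 for sector in employment_sector]


def weighted_total(column, weights):
    """Weighted headcount: sum of indicator * survey weight."""
    return sum(value * weight for value, weight in zip(column, weights))


# Four made-up people with made-up survey weights.
sectors = ["PUBLIC", "PRIVATE", "NOT_EMPLOYED", "PUBLIC"]
weights = [1000.0, 2500.0, 1800.0, 700.0]

column = public_sector_column(sectors)
total = weighted_total(column, weights)  # only the two PUBLIC rows contribute
```

Calibration would then adjust the weights so that this total moves towards the ONS figure.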
Access person.mjobsect/person.sic directly like the other FRS categoricals (empstati, gvtregno, ptentyp2) instead of falling back to 0 when the column is absent, which would silently produce all-NOT_EMPLOYED on real data. The create_frs smoke-test fixture now provides mjobsect/sic, matching how it already provides empstati. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
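The difference between the two approaches can be sketched as follows. This is a simplified stand-in, not the dataset code: the person frame is modelled as a plain dict of lists, the function names are hypothetical, and the code for PRIVATE (1) is an assumption (only mjobsect == 2 for PUBLIC and 0 for NOT_EMPLOYED are stated in this PR).

```python
# Hypothetical sketch. Code 1 -> PRIVATE is an assumption; the PR only states
# 0 -> NOT_EMPLOYED and 2 -> PUBLIC.
SECTOR_BY_CODE = {0: "NOT_EMPLOYED", 1: "PRIVATE", 2: "PUBLIC"}


def sector_with_fallback(person):
    """Defensive variant: a missing column silently becomes all zeros."""
    codes = person.get("mjobsect", [0] * len(person["person_id"]))
    return [SECTOR_BY_CODE[code] for code in codes]


def sector_direct(person):
    """Direct variant: a missing column raises KeyError immediately."""
    return [SECTOR_BY_CODE[code] for code in person["mjobsect"]]


# A frame where the column is missing, e.g. because of a renamed field.
frame_without_column = {"person_id": [1, 2, 3]}

silent_result = sector_with_fallback(frame_without_column)

try:
    sector_direct(frame_without_column)
    raised = False
except KeyError:
    raised = True
```

With the fallback, every person is labelled NOT_EMPLOYED and nothing signals the problem; with direct access, the build fails loudly at the point where the column is read.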
The FRS over-reports public-sector employment (~7.9m vs ONS ~5.9m) and the national calibration only partially corrects it, so the simulated-total check uses a loose tolerance like the other aggregate-vs-target tests (land value/spending ~0.65-0.70, vehicles ~0.30) instead of 20%. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
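To see why a 20% tolerance is too tight here, a small relative-error check helps. The sketch below is illustrative only: within_tolerance is a hypothetical helper, the 0.50 tolerance is an arbitrary example rather than the value used in the test, and the two headcounts are the approximate figures quoted in the commit message.

```python
# Hypothetical sketch: helper name and the 0.50 tolerance are illustrative.
ONS_TARGET = 5.9e6  # approximate ONS public-sector headcount quoted above
FRS_RAW = 7.9e6     # approximate FRS self-reported headcount quoted above


def within_tolerance(simulated, target, tolerance):
    """True if the simulated total is within a relative tolerance of the target."""
    return abs(simulated / target - 1.0) <= tolerance


# Relative gap of the uncalibrated FRS figure: about a third above the target.
raw_gap = FRS_RAW / ONS_TARGET - 1.0

passes_tight = within_tolerance(FRS_RAW, ONS_TARGET, 0.20)
passes_loose = within_tolerance(FRS_RAW, ONS_TARGET, 0.50)
```

Because the uncalibrated gap is roughly a third and calibration closes only part of it, a total that has been partially corrected can still sit outside 20% while being well within a looser bound.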
What this does
Carries two existing FRS fields into the dataset (passthrough, not imputation — they're already in the FRS spine), populating the new Person-level variables added in PolicyEngine/policyengine-uk#1785:
- employment_sector ← FRS mjobsect (public/private of main job), via categorical() → NOT_EMPLOYED / PRIVATE / PUBLIC.
- sic_industry_division ← FRS sic (SIC 2007 division; 84 = public administration & defence).

Mirrors the existing handling of employment_status (from empstati), region (from gvtregno), and tenure_type (from ptentyp2).

Validation (raw FRS 2022-23 person frame)
- employment_sector = PUBLIC (mjobsect == 2)
- employment_sector = PRIVATE
- employment_sector = NOT_EMPLOYED
- sic_industry_division == 84 (public admin)

Ordering
Depends on PolicyEngine/policyengine-uk#1785 (which defines the variables); that PR must merge/release first or the dataset build will not recognise the new variables.
Closes #432.
🤖 Generated with Claude Code