This repository was archived by the owner on Jun 19, 2026. It is now read-only.

Calibrate bus fare and bus subsidy spending to DfT totals #431

Merged
vahid-ahmadi merged 6 commits into
mainfrom
add-bus-fare-dataset-regression-test
Jun 18, 2026

Conversation

@vahid-ahmadi (Collaborator) commented Jun 17, 2026

What

Anchors both bus variables to official DfT Annual Bus Statistics (year ending March 2025) totals, uplifted England → UK by population, and adds an end-to-end test that bus fare reaches the dataset.

| Variable | England (DfT) | Table | UK target (×1.18) |
|---|---|---|---|
| bus_fare_spending | £3.4bn passenger fare receipts | BUS05aii | ~£4.0bn |
| bus_subsidy_spending | £3.0bn net government support | BUS05bii | ~£3.5bn |

Sources:

Why

Both bus variables were imputed but unanchored (no calibration target, not in uprating_factors.csv), so they drifted badly in the dataset:

| Variable | Was | Now (calibrated) |
|---|---|---|
| bus_fare_spending | £10.1bn (~3× high) | ~£4.0bn |
| bus_subsidy_spending | £1.5bn (~0.6× low) | ~£3.5bn |

The fare over-estimate came from inheriting a broader transport-consumption inflation (transport_consumption itself ~2.6× high); the subsidy drifted low.

How

Mirrors the existing calibrate_rail_subsidy_spending pattern — post-calibration scaling that computes the actual weighted total and scales the column to the target:

  • calibrate_bus_fare_spending (consumption.py) + BUS_FARE_TARGETS
  • calibrate_bus_subsidy_spending (services.py) + BUS_SUBSIDY_TARGETS
  • both wired into create_datasets.py alongside the rail/fuel calibration.
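As a rough illustration of the post-calibration scaling described above, here is a minimal sketch. It is not the code from consumption.py or services.py: `scale_column_to_target`, the plain dict-of-lists household table, and the explicit `weights` argument are all hypothetical stand-ins for whatever the real dataset object exposes. It assumes the behaviour the PR describes: return None when there is no target, and fail loudly if the imputed column is missing.

```python
from typing import Dict, List, Optional


def scale_column_to_target(
    household: Dict[str, List[float]],
    weights: List[float],
    column: str,
    target: Optional[float],
) -> Optional[float]:
    """Scale household[column] in place so its weighted total equals target.

    Hypothetical helper: returns the scaling factor, or None if no target
    is defined for the year being built.
    """
    if target is None:
        return None

    # No "column in household" guard: a missing imputed column raises KeyError.
    values = household[column]

    # Weighted total actually present in the data after weight calibration.
    actual = sum(v * w for v, w in zip(values, weights))
    if actual == 0:
        raise ValueError(f"{column} has a zero weighted total; cannot scale")

    factor = target / actual
    household[column] = [v * factor for v in values]
    return factor


# Toy data: three households with illustrative spending and weights.
household = {"bus_fare_spending": [100.0, 0.0, 300.0]}
weights = [1000.0, 2000.0, 500.0]
factor = scale_column_to_target(household, weights, "bus_fare_spending", 500_000.0)
```

Because a single factor multiplies every household's value, the distribution of spending across households is unchanged; only the aggregate moves to the target.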

Coverage / uplift caveat

DfT publishes bus finance for England only. There's no single official GB/UK total, so I scale England → UK by the ONS population ratio (~1.18) as a documented best approximation. It's indicative: bus use per head varies by nation (London lifts England's per-capita use), so the true UK factor is likely a touch below the population ratio. Can be refined with Transport Scotland / StatsWales / DfI NI figures if a direct UK total is wanted.
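The uplift arithmetic can be checked with a short sketch. The population figures (UK 68.3M, England 57.7M, ONS mid-2023) are the ones quoted in this PR's commit message; the variable names are illustrative only, and the factor is rounded to 1.18 on the assumption that the targets were derived from the rounded ratio.

```python
# ONS mid-2023 population estimates, in millions, as quoted in the commit message.
UK_POPULATION_M = 68.3
ENGLAND_POPULATION_M = 57.7

# Rounded England -> UK uplift factor used in the PR description.
uplift = round(UK_POPULATION_M / ENGLAND_POPULATION_M, 2)

# England-only DfT totals in £bn (BUS05aii and BUS05bii).
england_totals_bn = {
    "bus_fare_spending": 3.4,
    "bus_subsidy_spending": 3.0,
}

# Population-uplifted UK targets in £bn.
uk_targets_bn = {name: total * uplift for name, total in england_totals_bn.items()}

for name, total in uk_targets_bn.items():
    print(f"{name}: ~£{total:.2f}bn")
```

With the rounded factor the targets come out at roughly £4.01bn and £3.54bn, matching the ~£4.0bn and ~£3.5bn quoted above. Using the unrounded ratio instead would put the subsidy figure at about £3.55bn, so the rounding choice matters slightly at one decimal place.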

Tests

  • bus_subsidy_spending smoke target set to the uplifted ~£3.5bn.
  • bus_fare_spending smoke target recorded (commented) — enable once a calibrated dataset is published (the released dataset predates this).
  • End-to-end test: bus_fare_spending is present in the current release (enhanced_frs_2024_25.h5). The earlier "missing" result was a stale-file misread (enhanced_frs_2023_24.h5), not a pipeline drop. Issue #430 ("bus_fare_spending dropped between LCFS imputation and the published enhanced dataset") was closed with that correction.

Scaling is deterministic, so calibrated totals hit the targets by construction; takes effect on the next dataset rebuild.
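One of the commits on this PR describes an active 20% total test against the targets. A minimal sketch of that kind of check follows; `within_tolerance` is a hypothetical helper, not the test in the repository, and the totals passed to it are the rounded figures from this description rather than values read from a dataset.

```python
def within_tolerance(actual_total: float, target: float, tolerance: float = 0.20) -> bool:
    """Return True if actual_total is within +/- tolerance of target (relative)."""
    return abs(actual_total / target - 1.0) <= tolerance


# Pre-calibration totals reported in the "Why" section fall outside 20%.
fare_before_ok = within_tolerance(10.1e9, 4.0e9)
subsidy_before_ok = within_tolerance(1.5e9, 3.5e9)

# After deterministic scaling the total equals the target, so the check passes.
fare_after_ok = within_tolerance(4.0e9, 4.0e9)
```

A relative tolerance leaves room for later uprating or rounding differences while still catching the multiple-fold drift seen before calibration.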

🤖 Generated with Claude Code

vahid-ahmadi and others added 2 commits June 17, 2026 16:43
generate_lcfs_table is unit-tested to compute bus_fare_spending, but nothing
checked it survives the QRF predict + enhanced-dataset assembly/save into the
published dataset — and it currently doesn't (issue #430): every other
consumption output lands, bus_fare_spending is dropped downstream.

Add an end-to-end test asserting the enhanced dataset carries a populated
bus_fare_spending column. Marked xfail so it is mergeable and documents the
gap; it will XPASS once the pipeline is fixed.

Refs #430.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vahid-ahmadi force-pushed the add-bus-fare-dataset-regression-test branch from 8fb9d96 to 4493f09 Jun 18, 2026 08:38
Anchor both bus variables to official DfT Annual Bus Statistics (y/e March
2025, England): passenger fare receipts £3.4bn (BUS05aii) and net government
support £3.0bn (BUS05bii). Adds calibrate_bus_fare_spending (consumption) and
calibrate_bus_subsidy_spending (services), mirroring
calibrate_rail_subsidy_spending, called after weight calibration in
create_datasets.

Unanchored, imputed bus fare inherited the broader transport-consumption
over-estimate (~£10bn, ~3x) and bus subsidy drifted low (~£1.5bn). Updates the
bus_subsidy_spending smoke target to the official £3.0bn and de-xfails the
end-to-end bus_fare_spending dataset test (the column is present in the current
release; the earlier "drop" was a stale-file misread, not a pipeline bug).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vahid-ahmadi changed the title from "Add end-to-end regression test for bus_fare_spending in the dataset" to "Calibrate bus fare and bus subsidy spending to DfT totals" Jun 18, 2026
vahid-ahmadi and others added 3 commits June 18, 2026 10:55
DfT bus-finance figures are England-only; scale to UK by the ONS mid-2023
population ratio (UK 68.3M / England 57.7M ≈ 1.18) as a documented best
approximation. Targets: bus fare £3.4bn→~£4.0bn, bus subsidy £3.0bn→~£3.5bn.
Indicative (bus use per head varies by nation); refine with Scotland/Wales/NI
sources if a direct UK figure is needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the presence-only check with an active 20% total test for both
bus_fare_spending and bus_subsidy_spending against the DfT Annual Bus
Statistics targets (England, population-uplifted to UK). Uses the enhanced FRS
dataset, which make data builds but make download does not fetch, so the
baseline fixture skips it in PR CI and runs it on the post-merge build against
the freshly calibrated data (same pattern as test_energy_calibration).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the 'or "<col>" not in dataset.household' guard from
calibrate_bus_fare_spending / calibrate_bus_subsidy_spending so they match the
rail calibration (if target is None: return None) and fail loudly if the
imputed column is unexpectedly absent, rather than silently skipping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vahid-ahmadi merged commit be6fa40 into main Jun 18, 2026
4 checks passed