Reorganize backend by topic (vertical slice) [stacked on #112] by anth-volk · Pull Request #116 · PolicyEngine/policyengine-uk-chat

anth-volk · 2026-06-18T17:26:16Z

Summary

Reorganizes the backend from its mixed structure (HTTP-layer routes/, topic
tooling/+evaluation/, grab-bag package root) into package-by-topic: each
folder is one requirement, each file one sub-responsibility — including that
topic's own route file and config. Pure structural change; runtime behaviour
is identical.

Final tree: api/ · billing/ · chat/ · config/ · conversations/ · engine/ · eval/ · gateway/ · prompts/ · tools/ (+ tests/).

Closes #115. Implements the proposal in #114.

⚠️ Stacked on #112 — do not merge first

This branch is based on feat/chat-gateway (#112) and the PR targets it, not
main. Three packages (config/, gateway/, the gateway-aware chat/ split)
come from files that only exist on #112, and most slices collide with #112's
edits to chatbot.py / prompts.py / evaluation/* / build_reference.py.
Merge after #112, then retarget this PR to main.

How it was done

A 10-step loop — one topic package per commit, leaf → root, each independently
runnable and test-green:

commit	slice
`refactor(engine)`	`tooling/` + `build_reference.py` → `engine/`
`refactor(config)`	`model_config.py` + chat-only model constants → `config/{models,sampling,clients,scope}`
`refactor(prompts)`	`prompts.py` → `prompts/{system,gateway,meta}`
`refactor(tools)`	`tool_definitions.py`/`agent_tools.py` → `tools/{definitions,dispatch}`
`refactor(gateway)`	`gateway.py`/`gateway_config.py` → `gateway/{runtime,policy}`
`refactor(billing)`	`routes/billing.py` → `billing/{pricing,credits,stripe,routes}`
`refactor(conversations)`	`routes/conversations.py` → `conversations/{models,schemas,store,sharing,reports,routes}`
`refactor(chat)`	`routes/chatbot.py` (707 lines) → `chat/{orchestrator,system_blocks,model_selection,schemas,titles,suggestions,routes}`
`refactor(eval)`	`evaluation/` → `eval/`
`refactor(api)`	`main.py`/`rate_limit.py` → `api/{main,errors,rate_limit}`
`docs(skills)`	update `uk-chat-runtime.md` + `ai-evals.md` path maps

Design notes:

Re-exporting __init__s (config, prompts, gateway, billing, conversations,
chat) keep most importers unchanged. Tests that reach into module internals
got precise submodule aliases (chat.orchestrator, gateway.runtime,
tools.dispatch, conversations.models).
config/ collapses the DEFAULT_FAST_MODEL-here / DEFAULT_COMPLEX_MODEL-there
asymmetry — all model-call config now lives in one place; the
scope_descriptor.md path was fixed (parent → parent.parent).
chat/orchestrator.py keeps the SSE streaming tool loop verbatim; the mocked
end-to-end stream + tool-loop test validates it.
Route paths (/chat/*, /billing/*, /conversations/*), the
reference.md/scope_descriptor.md output locations, and the
evals/cases/* resolution are all unchanged.

Verification

Full backend suite: 155 passed, 10 skipped — at every commit.
make eval-ai-offline: 95 passed, 0 failed, 41 skipped (baseline unchanged).
uvicorn api.main:app boots; /health + /version return 200; startup
ensure_table wired.
Deploy entrypoints updated (modal_app.py, backend/Dockerfile, Makefile).
Stale-path sweep clean (no references to any moved module).

🤖 Generated with Claude Code

Rename backend/tooling/ -> backend/engine/ and backend/scripts/build_reference.py -> backend/engine/reference.py, so the deterministic PolicyEngine compute layer is one topic folder (#114 slice 1). - Update all tooling.* imports -> engine.* (internal engine files, agent_tools.py, tool_definitions.py, tests/test_agent_tools.py). - Update run paths: modal_app.py, backend/Dockerfile (python engine/reference.py), plus the reference.py header and the chatbot/prompts comment strings. - reference.md / scope_descriptor.md output paths are unchanged (parent.parent still resolves to backend/). No behaviour change. Full suite: 155 passed, 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the single backend/model_config.py with a config/ topic package (#114 slice 2), and move the chat-only model constants out of chatbot.py so all model-call configuration lives in one place: - config/sampling.py DEFAULT_TEMPERATURE, SUGGESTION_TEMPERATURE - config/models.py DEFAULT_FAST_MODEL, DEFAULT_COMPLEX_MODEL, TITLE_MODEL, SUGGESTION_MODEL, SUGGESTION_TIMEOUT_SECS, FAST_MODEL_MAX_INPUT_TOKENS (the last five moved from chatbot.py — kills the fast-here/complex-there asymmetry) - config/clients.py get_sync_client - config/scope.py load_scope_descriptor (path fixed parent -> parent.parent now that it sits one level under backend/) - config/__init__.py re-exports the public surface, so importers just swap the module name (from model_config import X -> from config import X). Importers updated: gateway.py, evaluation/providers.py, routes/chatbot.py. MAX_ITERATIONS stays in chatbot.py (orchestration, not model config). No behaviour change. Full suite: 155 passed, 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Split the single backend/prompts.py into a prompts/ topic package by audience (#114 slice 3): - prompts/system.py the compute SYSTEM_PROMPT sections + CHARTS_MODE_DIRECTIVE - prompts/gateway.py DEFAULT_SCOPE_DESCRIPTOR, lightweight_system, the gateway classifier prompt, and the 4 per-outcome writer directives - prompts/meta.py TITLE_SYSTEM, SUGGESTION_SYSTEM - prompts/__init__.py re-exports the public surface, so every `from prompts import X` site (gateway.py, chatbot.py, evaluation/runner.py, test_prompts.py) is unchanged. No behaviour change. Full suite: 155 passed, 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move backend/tool_definitions.py -> tools/definitions.py (the schemas the model sees) and backend/agent_tools.py -> tools/dispatch.py (TOOL_HANDLERS + execute_tool), so the model-facing tool seam is one topic folder (#114 slice 4). Importers point at the specific submodule rather than a re-export facade, so lightweight TOOL_DEFINITIONS consumers (gateway, gateway_config) don't pull in the heavy dispatch+engine import chain: - chatbot.py, evaluation/runner.py: from tools.definitions/from tools.dispatch - gateway.py, gateway_config.py: from tools.definitions import TOOL_DEFINITIONS - tests: import tools.dispatch as agent_tools / tools.definitions as tool_definitions (keeps the module-attribute + private-name access intact) No behaviour change. Full suite: 155 passed, 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Split the gateway into a topic package (#114 slice 5): - gateway/runtime.py was gateway.py (run_gateway, emit_plan, GatewayVerdict) - gateway/policy.py was gateway_config.py (criticality, gate, inferable) - gateway/__init__.py re-exports the runtime surface, so chatbot.py and the eval runner's lazy `from gateway import run_gateway` are unchanged. runtime.py now imports `from gateway.policy import ...`. test_gateway.py uses `from gateway import runtime as gateway` so its `gateway.get_sync_client` patch still targets the namespace run_gateway resolves against, and `from gateway.policy import ...` for the gate tests. No behaviour change. Gateway tests 33 passed; full suite 155 passed, 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Split the 301-line route file into a billing/ topic package (#114 slice 6): - billing/pricing.py pricing table + calculate_cost_gbp - billing/credits.py Supabase credit accounting (get_supabase, check_balance, record_usage, ...) - billing/stripe.py Stripe checkout-session + webhook integration (the SDK calls + credit side-effects, extracted verbatim) - billing/routes.py the /billing router + 4 thin endpoints - billing/__init__.py re-exports router (main.py) and check_balance/record_usage (the chat route's dynamic imports become `from billing import ...`) Function bodies are relocated verbatim; route paths are unchanged (/billing/balance,/usage,/checkout,/webhook all still register). Importers updated: main.py, chatbot.py (3 dynamic imports), test_billing.py. No behaviour change. Full suite: 155 passed, 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Split the 348-line route file into a conversations/ topic package (#114 slice 7): - conversations/models.py ChatConversation SQLModel + get_engine + ensure_table - conversations/schemas.py request/response models - conversations/store.py CRUD endpoints - conversations/sharing.py share + read-shared endpoints - conversations/reports.py GitHub-issue report builder + endpoint - conversations/routes.py prefix-less aggregator that include_router's the three endpoint groups (each carries the /conversations prefix so empty "" paths are legal) - conversations/__init__.py re-exports router + ensure_table (main.py startup) Function bodies relocated verbatim; the five route paths are unchanged. conftest's fixture now patches conversations.models._engine (via `from conversations import models as conversations`). No behaviour change. Full suite: 155 passed, 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Decompose the 707-line chat route into a chat/ topic package (#114 slice 8): - chat/orchestrator.py the SSE streaming tool loop (stream_chat) — verbatim - chat/system_blocks.py reference doc + scope descriptor + block assembly - chat/model_selection.py model pick + turn-classification helpers - chat/schemas.py ChatMessage / ChatRequest / TitleRequest - chat/suggestions.py follow-up suggestion chips - chat/titles.py title generation - chat/routes.py the /chat router + thin decorated endpoints - chat/__init__.py re-exports router The async Anthropic client factory moves to config/clients.get_async_client (get_sync_client's sibling). routes/ is now empty and removed. Importers updated: main.py (import chat), test_prompts.py (chat.system_blocks), test_api.py (patches chat.orchestrator: get_async_client / _generate_followup_suggestions / execute_tool; imports chat.routes.chat_message). Route paths /chat/message and /chat/title are unchanged. No behaviour change. Full suite: 155 passed, 10 skipped (incl. the mocked end-to-end stream + tool-loop test). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Rename the eval harness package backend/evaluation/ -> backend/eval/ (#114 slice 9). Updated only the module-path references (from eval.* / import eval.* / python -m eval.*) — the word "evaluation" in docstrings is untouched. - internal eval imports, run.py, __init__.py - tests/test_evaluation.py - Makefile (python -m eval.sync_policyengine_uk, python -m eval.run) SUITE_DIRS still resolves to evals/cases/* (parents[2] depth unchanged). No behaviour change. Full suite: 155 passed, 10 skipped. make eval-ai-offline: 95 passed, 0 failed, 41 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move the HTTP surface into an api/ topic package (#114 slice 10, final): - api/main.py the FastAPI app, CORS, router mounting, /health, /version - api/errors.py NaNSafeJSONResponse + the rate-limit exception handler (wired via app.add_exception_handler instead of a decorator) - api/rate_limit.py the slowapi limiter + key funcs (verbatim) - api/__init__.py Importers updated: chat/routes.py + tests/test_api.py (from api.rate_limit import ...); test_api.py + modal_app.py (from api.main import app); Dockerfile CMD (uvicorn api.main:app). `uvicorn api.main:app` resolves; all routers mount. No behaviour change. Full suite: 155 passed, 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reflect the #114 reorg in the AI-facing skill docs: - uk-chat-runtime.md §Source Boundaries rewritten to the new packages (chat/, gateway/, prompts/, tools/, engine/, config/, api/); also fixes the prompts/, engine/, and config/ references in the Tool Boundary and Deterministic sections. - ai-evals.md: backend/evaluation/ -> backend/eval/. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-18T17:26:23Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
policyengine-uk-chat	Ready	Preview, Comment	Jun 18, 2026 7:41pm

github-actions · 2026-06-18T17:27:12Z

Beta preview is ready.

Frontend: open preview
Backend: open backend

Small fixes from the #116 review (no behaviour change): - config/clients.py: extract _api_key() so get_sync_client / get_async_client stop duplicating the env-var fetch + RuntimeError guard (DRY; the async factory was newly added in the chat slice). - gateway/policy.py: docstring referenced the old prompts.py/chatbot.py paths -> prompts/ / chat/. - .gitignore: comment referenced scripts/build_reference.py -> engine/reference.py. - gateway/runtime.py: import order (config before gateway.policy). Full suite: 155 passed, 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vahid-ahmadi

Review — backend reorg to package-by-topic (draft, stacked on #112)

I verified this as what it claims to be: a faithful pure-structural refactor. I checked the low-similarity splits closely (the risky ones), confirmed the re-export shims, and confirmed the deploy wiring. Findings below.

Faithfulness — verified ✅

The big split routes/chatbot.py (707 lines) → chat/{orchestrator,routes,schemas,suggestions,system_blocks,titles,model_selection}: every function/route present once with identical bodies; constants unchanged (MAX_ITERATIONS=30, max_tokens 16000/200/32, MAX_RESULT_CHARS, max_retries=2); inline model/sampling constants moved to config/* with identical values. SSE tool loop kept verbatim.
prompts.py → prompts/{system,gateway,meta}: every triple-quoted prompt literal is byte-identical (no wording/whitespace drift), same SYSTEM_PROMPT_SECTIONS ordering and join.
routes/billing.py → billing/*: all pricing constants, Stripe params, webhook signature handling, and 4 routes preserved.
main.py → api/{main,errors} and routes/conversations.py → conversations/*: app config, CORS, routers, startup hook, the chat_conversations DB model (column-for-column), and ensure_table() all preserved.
Re-export __init__ shims (prompts, gateway, config, billing, conversations, chat) re-export the public names external importers actually use, so from prompts import … / from gateway import run_gateway etc. still resolve to the same objects.
Entrypoints updated consistently: Dockerfile (uvicorn api.main:app, python engine/reference.py), modal_app.py (from api.main import app, build step), Makefile (python -m eval.run). No stale evaluation./scripts//main:app references left.
Test coverage preserved: I counted def test_ across all six touched test files — identical base vs head (no tests dropped or merged). The 162 → 155 passed delta between this and #112's description is measurement context, not lost coverage; worth a one-line "CI green" confirmation when you de-draft.

The one buried behavioural change — fine, but mildly mis-described

The PR calls the parent → parent.parent edit in the scope-descriptor path a "fix". It's actually compensation: load_scope_descriptor moved from backend/model_config.py to backend/config/scope.py (one directory deeper), so it needs an extra .parent to resolve to the same backend/scope_descriptor.md. Net runtime target is unchanged — correct, but "fix" overstates it; it'd be clearer to describe it as path-compensation for the move so a future reader doesn't go hunting for the bug it "fixed".

Other small undocumented deltas (benign)

api/main.py swaps @app.exception_handler(RateLimitExceeded) for app.add_exception_handler(RateLimitExceeded, rate_limit_handler) — equivalent in Starlette, but it's a behavioural-looking change inside a "pure move" PR; worth a word in the commit body.
conversations/models.py drops an unused timezone import. Harmless.
New tests/conftest.py (rate-limit env bumps, sqlite isolation, a requires_compiled skip marker) is a good addition and well-commented.

My actual concern: the stacking order

The code is sound; the risk is process. This is a 74-file move stacked on #112, which is uncalibrated and explicitly expects tuning (its own follow-up is "run the live eval and tune the gateway prompt + promotions against misroutes"). Any such tuning — prompt edits, the keyword-promotion fixes I flagged on #112, criticality tweaks — lands in the exact files this PR moves (gateway/*, prompts/*, chat/*), forcing a rebase of all 74 files each time. Refactoring on top of a moving, unmerged target is avoidable churn.

Recommendation, matching the draft status:

Land #112 and let it stabilise through its live calibration.
Then do this reorg against main (retarget as the description says).

Keeping it a draft until #112 merges is exactly right. Once it's rebased onto a settled #112, this is a clean, well-sequenced refactor (one topic package per commit, test-green at each) and I'd be happy to approve it. Nice work on the leaf→root commit discipline.

anth-volk and others added 11 commits June 18, 2026 18:36

anth-volk mentioned this pull request Jun 18, 2026

Reorganize backend by topic (vertical slice) instead of by layer #114

Closed

4 tasks

vercel Bot deployed to Preview June 18, 2026 19:41 View deployment

vahid-ahmadi reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reorganize backend by topic (vertical slice) [stacked on #112]#116

Reorganize backend by topic (vertical slice) [stacked on #112]#116
anth-volk wants to merge 12 commits into
feat/chat-gatewayfrom
refactor/backend-topic-structure

anth-volk commented Jun 18, 2026

Uh oh!

vercel Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 18, 2026

Uh oh!

vahid-ahmadi left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

anth-volk commented Jun 18, 2026

Summary

⚠️ Stacked on #112 — do not merge first

How it was done

Verification

Uh oh!

vercel Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 18, 2026

Uh oh!

vahid-ahmadi left a comment

Choose a reason for hiding this comment

Review — backend reorg to package-by-topic (draft, stacked on #112)

Faithfulness — verified ✅

The one buried behavioural change — fine, but mildly mis-described

Other small undocumented deltas (benign)

My actual concern: the stacking order

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel Bot commented Jun 18, 2026 •

edited

Loading