Skip to content

Reorganize backend by topic (vertical slice) [stacked on #112]#116

Draft
anth-volk wants to merge 12 commits into
feat/chat-gatewayfrom
refactor/backend-topic-structure
Draft

Reorganize backend by topic (vertical slice) [stacked on #112]#116
anth-volk wants to merge 12 commits into
feat/chat-gatewayfrom
refactor/backend-topic-structure

Conversation

@anth-volk

Copy link
Copy Markdown
Contributor

Summary

Reorganizes the backend from its mixed structure (HTTP-layer routes/, topic
tooling/+evaluation/, grab-bag package root) into package-by-topic: each
folder is one requirement, each file one sub-responsibility — including that
topic's own route file and config. Pure structural change; runtime behaviour
is identical
.

Final tree: api/ · billing/ · chat/ · config/ · conversations/ · engine/ · eval/ · gateway/ · prompts/ · tools/ (+ tests/).

Closes #115. Implements the proposal in #114.

⚠️ Stacked on #112 — do not merge first

This branch is based on feat/chat-gateway (#112) and the PR targets it, not
main. Three packages (config/, gateway/, the gateway-aware chat/ split)
come from files that only exist on #112, and most slices collide with #112's
edits to chatbot.py / prompts.py / evaluation/* / build_reference.py.
Merge after #112, then retarget this PR to main.

How it was done

A 10-step loop — one topic package per commit, leaf → root, each independently
runnable and test-green:

commit slice
refactor(engine) tooling/ + build_reference.pyengine/
refactor(config) model_config.py + chat-only model constants → config/{models,sampling,clients,scope}
refactor(prompts) prompts.pyprompts/{system,gateway,meta}
refactor(tools) tool_definitions.py/agent_tools.pytools/{definitions,dispatch}
refactor(gateway) gateway.py/gateway_config.pygateway/{runtime,policy}
refactor(billing) routes/billing.pybilling/{pricing,credits,stripe,routes}
refactor(conversations) routes/conversations.pyconversations/{models,schemas,store,sharing,reports,routes}
refactor(chat) routes/chatbot.py (707 lines) → chat/{orchestrator,system_blocks,model_selection,schemas,titles,suggestions,routes}
refactor(eval) evaluation/eval/
refactor(api) main.py/rate_limit.pyapi/{main,errors,rate_limit}
docs(skills) update uk-chat-runtime.md + ai-evals.md path maps

Design notes:

  • Re-exporting __init__s (config, prompts, gateway, billing, conversations,
    chat) keep most importers unchanged. Tests that reach into module internals
    got precise submodule aliases (chat.orchestrator, gateway.runtime,
    tools.dispatch, conversations.models).
  • config/ collapses the DEFAULT_FAST_MODEL-here / DEFAULT_COMPLEX_MODEL-there
    asymmetry — all model-call config now lives in one place; the
    scope_descriptor.md path was fixed (parentparent.parent).
  • chat/orchestrator.py keeps the SSE streaming tool loop verbatim; the mocked
    end-to-end stream + tool-loop test validates it.
  • Route paths (/chat/*, /billing/*, /conversations/*), the
    reference.md/scope_descriptor.md output locations, and the
    evals/cases/* resolution are all unchanged.

Verification

  • Full backend suite: 155 passed, 10 skipped — at every commit.
  • make eval-ai-offline: 95 passed, 0 failed, 41 skipped (baseline unchanged).
  • uvicorn api.main:app boots; /health + /version return 200; startup
    ensure_table wired.
  • Deploy entrypoints updated (modal_app.py, backend/Dockerfile, Makefile).
  • Stale-path sweep clean (no references to any moved module).

🤖 Generated with Claude Code

anth-volk and others added 11 commits June 18, 2026 18:36
Rename backend/tooling/ -> backend/engine/ and
backend/scripts/build_reference.py -> backend/engine/reference.py, so the
deterministic PolicyEngine compute layer is one topic folder (#114 slice 1).

- Update all tooling.* imports -> engine.* (internal engine files,
  agent_tools.py, tool_definitions.py, tests/test_agent_tools.py).
- Update run paths: modal_app.py, backend/Dockerfile (python engine/reference.py),
  plus the reference.py header and the chatbot/prompts comment strings.
- reference.md / scope_descriptor.md output paths are unchanged (parent.parent
  still resolves to backend/).

No behaviour change. Full suite: 155 passed, 10 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the single backend/model_config.py with a config/ topic package
(#114 slice 2), and move the chat-only model constants out of chatbot.py so
all model-call configuration lives in one place:

- config/sampling.py   DEFAULT_TEMPERATURE, SUGGESTION_TEMPERATURE
- config/models.py      DEFAULT_FAST_MODEL, DEFAULT_COMPLEX_MODEL, TITLE_MODEL,
                        SUGGESTION_MODEL, SUGGESTION_TIMEOUT_SECS,
                        FAST_MODEL_MAX_INPUT_TOKENS (the last five moved from
                        chatbot.py — kills the fast-here/complex-there asymmetry)
- config/clients.py     get_sync_client
- config/scope.py       load_scope_descriptor (path fixed parent -> parent.parent
                        now that it sits one level under backend/)
- config/__init__.py re-exports the public surface, so importers just swap the
  module name (from model_config import X -> from config import X).

Importers updated: gateway.py, evaluation/providers.py, routes/chatbot.py.
MAX_ITERATIONS stays in chatbot.py (orchestration, not model config).

No behaviour change. Full suite: 155 passed, 10 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split the single backend/prompts.py into a prompts/ topic package by audience
(#114 slice 3):

- prompts/system.py   the compute SYSTEM_PROMPT sections + CHARTS_MODE_DIRECTIVE
- prompts/gateway.py   DEFAULT_SCOPE_DESCRIPTOR, lightweight_system, the gateway
                       classifier prompt, and the 4 per-outcome writer directives
- prompts/meta.py      TITLE_SYSTEM, SUGGESTION_SYSTEM
- prompts/__init__.py re-exports the public surface, so every `from prompts
  import X` site (gateway.py, chatbot.py, evaluation/runner.py, test_prompts.py)
  is unchanged.

No behaviour change. Full suite: 155 passed, 10 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move backend/tool_definitions.py -> tools/definitions.py (the schemas the
model sees) and backend/agent_tools.py -> tools/dispatch.py (TOOL_HANDLERS +
execute_tool), so the model-facing tool seam is one topic folder (#114 slice 4).

Importers point at the specific submodule rather than a re-export facade, so
lightweight TOOL_DEFINITIONS consumers (gateway, gateway_config) don't pull in
the heavy dispatch+engine import chain:
- chatbot.py, evaluation/runner.py: from tools.definitions/from tools.dispatch
- gateway.py, gateway_config.py: from tools.definitions import TOOL_DEFINITIONS
- tests: import tools.dispatch as agent_tools / tools.definitions as
  tool_definitions (keeps the module-attribute + private-name access intact)

No behaviour change. Full suite: 155 passed, 10 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split the gateway into a topic package (#114 slice 5):
- gateway/runtime.py   was gateway.py (run_gateway, emit_plan, GatewayVerdict)
- gateway/policy.py     was gateway_config.py (criticality, gate, inferable)
- gateway/__init__.py re-exports the runtime surface, so chatbot.py and the
  eval runner's lazy `from gateway import run_gateway` are unchanged.

runtime.py now imports `from gateway.policy import ...`. test_gateway.py uses
`from gateway import runtime as gateway` so its `gateway.get_sync_client` patch
still targets the namespace run_gateway resolves against, and `from
gateway.policy import ...` for the gate tests.

No behaviour change. Gateway tests 33 passed; full suite 155 passed, 10 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split the 301-line route file into a billing/ topic package (#114 slice 6):
- billing/pricing.py   pricing table + calculate_cost_gbp
- billing/credits.py    Supabase credit accounting (get_supabase, check_balance,
                        record_usage, ...)
- billing/stripe.py     Stripe checkout-session + webhook integration (the SDK
                        calls + credit side-effects, extracted verbatim)
- billing/routes.py     the /billing router + 4 thin endpoints
- billing/__init__.py re-exports router (main.py) and check_balance/record_usage
  (the chat route's dynamic imports become `from billing import ...`)

Function bodies are relocated verbatim; route paths are unchanged
(/billing/balance,/usage,/checkout,/webhook all still register).

Importers updated: main.py, chatbot.py (3 dynamic imports), test_billing.py.
No behaviour change. Full suite: 155 passed, 10 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split the 348-line route file into a conversations/ topic package (#114 slice 7):
- conversations/models.py    ChatConversation SQLModel + get_engine + ensure_table
- conversations/schemas.py    request/response models
- conversations/store.py      CRUD endpoints
- conversations/sharing.py    share + read-shared endpoints
- conversations/reports.py    GitHub-issue report builder + endpoint
- conversations/routes.py     prefix-less aggregator that include_router's the
                              three endpoint groups (each carries the
                              /conversations prefix so empty "" paths are legal)
- conversations/__init__.py re-exports router + ensure_table (main.py startup)

Function bodies relocated verbatim; the five route paths are unchanged.
conftest's fixture now patches conversations.models._engine (via
`from conversations import models as conversations`).

No behaviour change. Full suite: 155 passed, 10 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Decompose the 707-line chat route into a chat/ topic package (#114 slice 8):
- chat/orchestrator.py   the SSE streaming tool loop (stream_chat) — verbatim
- chat/system_blocks.py   reference doc + scope descriptor + block assembly
- chat/model_selection.py model pick + turn-classification helpers
- chat/schemas.py         ChatMessage / ChatRequest / TitleRequest
- chat/suggestions.py     follow-up suggestion chips
- chat/titles.py          title generation
- chat/routes.py          the /chat router + thin decorated endpoints
- chat/__init__.py re-exports router

The async Anthropic client factory moves to config/clients.get_async_client
(get_sync_client's sibling). routes/ is now empty and removed.

Importers updated: main.py (import chat), test_prompts.py
(chat.system_blocks), test_api.py (patches chat.orchestrator: get_async_client
/ _generate_followup_suggestions / execute_tool; imports chat.routes.chat_message).
Route paths /chat/message and /chat/title are unchanged.

No behaviour change. Full suite: 155 passed, 10 skipped (incl. the mocked
end-to-end stream + tool-loop test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename the eval harness package backend/evaluation/ -> backend/eval/ (#114
slice 9). Updated only the module-path references (from eval.* / import eval.* /
python -m eval.*) — the word "evaluation" in docstrings is untouched.

- internal eval imports, run.py, __init__.py
- tests/test_evaluation.py
- Makefile (python -m eval.sync_policyengine_uk, python -m eval.run)

SUITE_DIRS still resolves to evals/cases/* (parents[2] depth unchanged).

No behaviour change. Full suite: 155 passed, 10 skipped.
make eval-ai-offline: 95 passed, 0 failed, 41 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the HTTP surface into an api/ topic package (#114 slice 10, final):
- api/main.py      the FastAPI app, CORS, router mounting, /health, /version
- api/errors.py    NaNSafeJSONResponse + the rate-limit exception handler
                   (wired via app.add_exception_handler instead of a decorator)
- api/rate_limit.py  the slowapi limiter + key funcs (verbatim)
- api/__init__.py

Importers updated: chat/routes.py + tests/test_api.py (from api.rate_limit
import ...); test_api.py + modal_app.py (from api.main import app); Dockerfile
CMD (uvicorn api.main:app). `uvicorn api.main:app` resolves; all routers mount.

No behaviour change. Full suite: 155 passed, 10 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reflect the #114 reorg in the AI-facing skill docs:
- uk-chat-runtime.md §Source Boundaries rewritten to the new packages
  (chat/, gateway/, prompts/, tools/, engine/, config/, api/); also fixes the
  prompts/, engine/, and config/ references in the Tool Boundary and
  Deterministic sections.
- ai-evals.md: backend/evaluation/ -> backend/eval/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 18, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
policyengine-uk-chat Ready Ready Preview, Comment Jun 18, 2026 7:41pm

Request Review

@github-actions

Copy link
Copy Markdown

Beta preview is ready.

Small fixes from the #116 review (no behaviour change):
- config/clients.py: extract _api_key() so get_sync_client / get_async_client
  stop duplicating the env-var fetch + RuntimeError guard (DRY; the async
  factory was newly added in the chat slice).
- gateway/policy.py: docstring referenced the old prompts.py/chatbot.py paths
  -> prompts/ / chat/.
- .gitignore: comment referenced scripts/build_reference.py -> engine/reference.py.
- gateway/runtime.py: import order (config before gateway.policy).

Full suite: 155 passed, 10 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@vahid-ahmadi vahid-ahmadi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — backend reorg to package-by-topic (draft, stacked on #112)

I verified this as what it claims to be: a faithful pure-structural refactor. I checked the low-similarity splits closely (the risky ones), confirmed the re-export shims, and confirmed the deploy wiring. Findings below.

Faithfulness — verified ✅

  • The big split routes/chatbot.py (707 lines) → chat/{orchestrator,routes,schemas,suggestions,system_blocks,titles,model_selection}: every function/route present once with identical bodies; constants unchanged (MAX_ITERATIONS=30, max_tokens 16000/200/32, MAX_RESULT_CHARS, max_retries=2); inline model/sampling constants moved to config/* with identical values. SSE tool loop kept verbatim.
  • prompts.pyprompts/{system,gateway,meta}: every triple-quoted prompt literal is byte-identical (no wording/whitespace drift), same SYSTEM_PROMPT_SECTIONS ordering and join.
  • routes/billing.pybilling/*: all pricing constants, Stripe params, webhook signature handling, and 4 routes preserved.
  • main.pyapi/{main,errors} and routes/conversations.pyconversations/*: app config, CORS, routers, startup hook, the chat_conversations DB model (column-for-column), and ensure_table() all preserved.
  • Re-export __init__ shims (prompts, gateway, config, billing, conversations, chat) re-export the public names external importers actually use, so from prompts import … / from gateway import run_gateway etc. still resolve to the same objects.
  • Entrypoints updated consistently: Dockerfile (uvicorn api.main:app, python engine/reference.py), modal_app.py (from api.main import app, build step), Makefile (python -m eval.run). No stale evaluation./scripts//main:app references left.
  • Test coverage preserved: I counted def test_ across all six touched test files — identical base vs head (no tests dropped or merged). The 162 → 155 passed delta between this and #112's description is measurement context, not lost coverage; worth a one-line "CI green" confirmation when you de-draft.

The one buried behavioural change — fine, but mildly mis-described

The PR calls the parentparent.parent edit in the scope-descriptor path a "fix". It's actually compensation: load_scope_descriptor moved from backend/model_config.py to backend/config/scope.py (one directory deeper), so it needs an extra .parent to resolve to the same backend/scope_descriptor.md. Net runtime target is unchanged — correct, but "fix" overstates it; it'd be clearer to describe it as path-compensation for the move so a future reader doesn't go hunting for the bug it "fixed".

Other small undocumented deltas (benign)

  • api/main.py swaps @app.exception_handler(RateLimitExceeded) for app.add_exception_handler(RateLimitExceeded, rate_limit_handler) — equivalent in Starlette, but it's a behavioural-looking change inside a "pure move" PR; worth a word in the commit body.
  • conversations/models.py drops an unused timezone import. Harmless.
  • New tests/conftest.py (rate-limit env bumps, sqlite isolation, a requires_compiled skip marker) is a good addition and well-commented.

My actual concern: the stacking order

The code is sound; the risk is process. This is a 74-file move stacked on #112, which is uncalibrated and explicitly expects tuning (its own follow-up is "run the live eval and tune the gateway prompt + promotions against misroutes"). Any such tuning — prompt edits, the keyword-promotion fixes I flagged on #112, criticality tweaks — lands in the exact files this PR moves (gateway/*, prompts/*, chat/*), forcing a rebase of all 74 files each time. Refactoring on top of a moving, unmerged target is avoidable churn.

Recommendation, matching the draft status:

  1. Land #112 and let it stabilise through its live calibration.
  2. Then do this reorg against main (retarget as the description says).

Keeping it a draft until #112 merges is exactly right. Once it's rebased onto a settled #112, this is a clean, well-sequenced refactor (one topic package per commit, test-green at each) and I'd be happy to approve it. Nice work on the leaf→root commit discipline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants