BNGsim backend hook spawns a fresh helper process per atomic job — parameter scans pay N× Python import overhead #101

@wshlavacek

Summary

The BNGsim backend-hook route delegates each atomic simulation job to the
JSON helper by having BNG2.pl system()-spawn a fresh Python process
(python -m bionetgen.core.tools.bngsim_backend_helper JOB.json). Each spawn
pays the full cost of starting a Python interpreter and importing bngsim
(~0.45 s), plus importing the bionetgen machinery.

For a single large simulation this is amortized and fine. For a
parameter_scan — where BNG2.pl calls simulate (and thus the hook)
once per scan point — the per-job startup is paid N times and dominates the
runtime. The integration ends up slower than the subprocess path it is
meant to accelerate.

Measurements

harmonicOscillator.bngl and
ATG_update_mTORC1_assembly_more_complete_scheme.bngl are both 200-point
parameter_scans (n_scan_pts=>200).

| Run | Time |
|---|---|
| import bngsim (warm, already built) | 0.45 s |
| bngsim route, n_scan_pts=2 | 5.42 s |
| bngsim route, n_scan_pts=20 | 14.98 s |
| bngsim route, n_scan_pts=200 | ~138 s |
| subprocess route, n_scan_pts=200 (full run) | ~10 s |

Linear fit over the scan-point count: ~0.53 s per scan point, of which
~0.45 s is interpreter + import startup and only ~0.08 s is actual BNGsim
work. So ~85 % of a parameter scan's bngsim-route runtime is process/import
overhead. At 200 points that is ~100 s of pure overhead.
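
The per-point figures can be reproduced with a few lines of arithmetic. This is a sketch that assumes the fit runs through the 2-point and 20-point rows only (the 200-point row is approximate, and the fit reported above may have been computed differently).

```python
# Timings copied from the measurements above: (scan points, wall-clock seconds).
# Assumption: only the two exact rows are used for the fit.
measurements = [(2, 5.42), (20, 14.98)]
IMPORT_COST_S = 0.45  # measured warm `import bngsim` time

(n0, t0), (n1, t1) = measurements
per_point = (t1 - t0) / (n1 - n0)          # slope: seconds per scan point
fixed = t0 - per_point * n0                # intercept: one-time cost of the run
work = per_point - IMPORT_COST_S           # remainder: actual BNGsim work per point
overhead_share = IMPORT_COST_S / per_point # startup share of the per-point cost

print(f"per point: {per_point:.2f} s, fixed: {fixed:.2f} s")
print(f"BNGsim work per point: {work:.2f} s")
print(f"startup share of per-point cost: {overhead_share:.0%}")
```

With these inputs the slope comes out at about 0.53 s per point, the remaining work at about 0.08 s, and the startup share at about 85 %, matching the figures quoted above.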

The subprocess route is fast here because BNG2.pl calls run_network, a
precompiled C binary, once per point, and spawning a binary costs ~1 ms.

This surfaced in the parity sweep: both models registered as spurious
ERROR (timeout) at ~185 s against the 180 s budget. (Worked around in the
sweep harness with a per-model TIMEOUT_OVERRIDES entry — that is a harness
band-aid, not a fix.)

Root cause

scripts/apply_bngsim_backend_hook.py — both hook bodies (_NETWORK_BODY,
_NF_BODY) end with:

my $rc = system(@helper_command, $job_file);

i.e. one cold Python process per atomic job. BNGCLI advertises the helper
as BIONETGEN_BNGSIM_BACKEND_HELPER_PYTHON + _MODULE; the helper
(bngsim_backend_helper.py) processes exactly one job per invocation and
exits.
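
The cost of one cold Python process per job can be measured in isolation. The sketch below is not part of the repository; it times repeated `python -c` spawns, with `import json` as a stand-in for the much heavier `import bngsim` so that it runs anywhere. The function name `time_cold_spawns` is hypothetical.

```python
import subprocess
import sys
import time


def time_cold_spawns(n_jobs: int, import_stmt: str = "import json") -> float:
    """Return the mean wall-clock seconds per cold `python -c` process."""
    start = time.perf_counter()
    for _ in range(n_jobs):
        # Each iteration starts a brand-new interpreter, as the hook's
        # system() call does for every atomic job.
        subprocess.run([sys.executable, "-c", import_stmt], check=True)
    return (time.perf_counter() - start) / n_jobs


if __name__ == "__main__":
    print(f"{time_cold_spawns(5):.3f} s per cold process")
```

Passing `import_stmt="import bngsim"` in an environment where bngsim is installed should give a number comparable to the per-point overhead reported above.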

Proposed fix — a persistent helper

Run one long-lived helper process for the whole BNG2.pl run instead of
one per job, so import bngsim is paid once.

  1. bngsim_backend_helper.py: add a serve mode — bind a Unix-domain
    socket, accept one connection per job, read a job-file path, run the
    existing execute_backend_payload, write back the one-line JSON result.
  2. cli.py (BNGCLI): before running BNG2.pl, spawn the helper in serve
    mode, wait for the socket to be ready, advertise its path in a new env
    var (e.g. BIONETGEN_BNGSIM_BACKEND_HELPER_SOCKET); tear it down after.
  3. The two Perl hook bodies: if the socket env var is set, send the job path
    over the socket and read the reply; otherwise fall back to the current
    system() spawn (preserving correctness if the persistent helper is
    unavailable).
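
The socket protocol in step 1 could look roughly like the following. This is a minimal sketch, not the helper's actual code: `run_job` is a hypothetical stand-in for `execute_backend_payload` (whose signature is not shown in this issue), and `serve`, `submit`, and the `max_jobs` limit are illustrative names. `submit` plays the role the Perl hook would play in step 3.

```python
import json
import os
import socket
import tempfile
import threading


def run_job(job_path: str) -> dict:
    """Hypothetical stand-in for execute_backend_payload."""
    return {"status": "ok", "job": job_path}


def serve(sock_path: str, ready: threading.Event, max_jobs: int) -> None:
    """Accept one connection per job: read a job-file path, reply with one JSON line."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)
        srv.listen()
        ready.set()  # signal that clients may connect now
        for _ in range(max_jobs):
            conn, _addr = srv.accept()
            with conn, conn.makefile("rw", encoding="utf-8") as stream:
                job_path = stream.readline().strip()
                try:
                    result = run_job(job_path)
                except Exception as exc:  # report failures instead of dying
                    result = {"status": "error", "message": str(exc)}
                stream.write(json.dumps(result) + "\n")
                stream.flush()


def submit(sock_path: str, job_path: str) -> dict:
    """Client side: send one job path, read back the one-line JSON result."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(sock_path)
        with cli.makefile("rw", encoding="utf-8") as stream:
            stream.write(job_path + "\n")
            stream.flush()
            return json.loads(stream.readline())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "helper.sock")
        ready = threading.Event()
        worker = threading.Thread(target=serve, args=(path, ready, 2))
        worker.start()
        ready.wait()
        print(submit(path, "JOB1.json"))
        print(submit(path, "JOB2.json"))
        worker.join()
```

A newline-terminated path in and a single JSON line out keeps the client trivial, which matters because the real client lives in the Perl hook bodies.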

Estimated effect for these models: per-point cost ~0.53 s → ~0.08 s, i.e.
~138 s → ~20 s, and it speeds up every scan-heavy model. Scope: ~3 files +
re-vendoring the Perl hook + tests; the per-job system() fallback keeps it
safe.

Notes

  • BNGsim's own numerics are not at fault — the per-point ODE integration is
    a few ms. This is purely the backend-hook process boundary.
  • The "8–30× speedup" rationale for the integration holds for single large
    simulations; this issue is specifically the many-small-jobs (scan) regime.
