Summary
The BNGsim backend-hook route delegates each atomic simulation job to the
JSON helper by having BNG2.pl system()-spawn a fresh Python process
(python -m bionetgen.core.tools.bngsim_backend_helper JOB.json). Each spawn
pays the full cost of starting a Python interpreter and import bngsim
(~0.45 s) plus importing the bionetgen machinery.
For a single large simulation this is amortized and fine. For a
parameter_scan — where BNG2.pl calls simulate (and thus the hook)
once per scan point — the per-job startup is paid N times and dominates the
runtime. The integration ends up slower than the subprocess path it is
meant to accelerate.
Measurements
harmonicOscillator.bngl and
ATG_update_mTORC1_assembly_more_complete_scheme.bngl are both 200-point
parameter_scans (n_scan_pts=>200).
| Run | Time |
|---|---|
| import bngsim (warm, already built) | 0.45 s |
| bngsim route, n_scan_pts=2 | 5.42 s |
| bngsim route, n_scan_pts=20 | 14.98 s |
| bngsim route, n_scan_pts=200 | ~138 s |
| subprocess route, n_scan_pts=200 (full run) | ~10 s |
Linear fit over the scan-point count: ~0.53 s per scan point, of which
~0.45 s is interpreter + import startup and only ~0.08 s is actual BNGsim
work. So ~85 % of a parameter scan's bngsim-route runtime is process/import
overhead. At 200 points that is ~100 s of pure overhead.
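The per-point figures can be reproduced from the measurements. This is a minimal sketch, assuming the fit is taken through the n_scan_pts=2 and n_scan_pts=20 runs (the issue does not state which points were used); the variable names are illustrative only.

```python
# Measured wall-clock times on the bngsim route: (n_scan_pts, seconds).
small = (2, 5.42)
medium = (20, 14.98)

# Slope between the two runs: marginal cost of one extra scan point.
per_point = (medium[1] - small[1]) / (medium[0] - small[0])

# Intercept: fixed cost paid once per BNG2.pl run, independent of scan size.
fixed = small[1] - per_point * small[0]

# Measured interpreter + import startup per spawned helper, in seconds.
startup = 0.45

# Share of each scan point spent on startup, and what is left for real work.
overhead_fraction = startup / per_point
work_per_point = per_point - startup

print(f"per point: {per_point:.2f} s, fixed: {fixed:.2f} s")
print(f"startup share: {overhead_fraction:.0%}, work per point: {work_per_point:.2f} s")
```

The split between startup and work is the number that matters: removing the per-job startup leaves only the small work term to scale with the scan size.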
Subprocess is fast here because BNG2.pl calls run_network — a precompiled C
binary — once per point, and binary spawn is ~1 ms.
This surfaced in the parity sweep: both models registered as spurious
ERROR (timeout) at ~185 s against the 180 s budget. (Worked around in the
sweep harness with a per-model TIMEOUT_OVERRIDES entry — that is a harness
band-aid, not a fix.)
Root cause
scripts/apply_bngsim_backend_hook.py — both hook bodies (_NETWORK_BODY,
_NF_BODY) end with:
my $rc = system(@helper_command, $job_file);
i.e. one cold Python process per atomic job. BNGCLI advertises the helper
as BIONETGEN_BNGSIM_BACKEND_HELPER_PYTHON + _MODULE; the helper
(bngsim_backend_helper.py) processes exactly one job per invocation and
exits.
Proposed fix — a persistent helper
Run one long-lived helper process for the whole BNG2.pl run instead of
one per job, so import bngsim is paid once.
- bngsim_backend_helper.py: add a serve mode that binds a Unix-domain
  socket, accepts one connection per job, reads a job-file path, runs the
  existing execute_backend_payload, and writes back the one-line JSON result.
- cli.py (BNGCLI): before running BNG2.pl, spawn the helper in serve
  mode, wait for the socket to be ready, and advertise its path in a new env
  var (e.g. BIONETGEN_BNGSIM_BACKEND_HELPER_SOCKET); tear it down after.
- The two Perl hook bodies: if the socket env var is set, send the job path
  over the socket and read the reply; otherwise fall back to the current
  system() spawn (preserving correctness if the persistent helper is
  unavailable).
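The serve-mode protocol described above could look roughly like the following sketch. It is not the helper's actual code. The body of execute_backend_payload here is a stand-in (the real function's signature is not shown in this issue), and open_server, serve, submit, and demo are hypothetical names. The sketch assumes a POSIX system with AF_UNIX sockets and a newline-terminated job path as the request.

```python
import json
import os
import socket
import tempfile
import threading


def execute_backend_payload(payload):
    """Stand-in for the real helper function (assumed signature)."""
    return {"status": "ok", "job": payload.get("job")}


def open_server(sock_path):
    """Bind and listen; once this returns, clients can connect."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen()
    return server


def serve(server, max_jobs):
    """Handle one connection per job: read a job-file path, reply with one JSON line."""
    with server:
        for _ in range(max_jobs):
            conn, _addr = server.accept()
            with conn, conn.makefile("rw", encoding="utf-8") as stream:
                job_path = stream.readline().strip()
                try:
                    with open(job_path, encoding="utf-8") as fh:
                        result = execute_backend_payload(json.load(fh))
                except Exception as exc:
                    # Report failures to the caller instead of killing the server.
                    result = {"status": "error", "message": str(exc)}
                stream.write(json.dumps(result) + "\n")
                stream.flush()


def submit(sock_path, job_path):
    """Client side (what the Perl hook would do): send the path, read the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        with client.makefile("rw", encoding="utf-8") as stream:
            stream.write(job_path + "\n")
            stream.flush()
            return json.loads(stream.readline())


def demo():
    """Run two jobs against one long-lived server: one valid, one with a bad path."""
    workdir = tempfile.mkdtemp()
    sock_path = os.path.join(workdir, "helper.sock")
    job_path = os.path.join(workdir, "JOB.json")
    with open(job_path, "w", encoding="utf-8") as fh:
        json.dump({"job": "scan_point_0"}, fh)

    server = open_server(sock_path)
    worker = threading.Thread(target=serve, args=(server, 2))
    worker.start()
    replies = [
        submit(sock_path, job_path),
        submit(sock_path, os.path.join(workdir, "missing.json")),
    ]
    worker.join()
    os.unlink(sock_path)
    return replies


if __name__ == "__main__":
    for reply in demo():
        print(reply)
```

Separating open_server from serve means the socket is already listening before any client runs, which corresponds to the "wait for the socket to be ready" step on the BNGCLI side. In a real helper the loop would run until BNGCLI tears the process down rather than stopping after a fixed job count.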
Estimated effect for these models: per-point cost ~0.53 s → ~0.08 s, i.e.
~138 s → ~20 s, and it speeds up every scan-heavy model. Scope: ~3 files +
re-vendoring the Perl hook + tests; the per-job system() fallback keeps
it safe.
Notes
- BNGsim's own numerics are not at fault — the per-point ODE integration is
a few ms. This is purely the backend-hook process boundary.
- The "8–30× speedup" rationale for the integration holds for single large
simulations; this issue is specifically the many-small-jobs (scan) regime.