Standalone C++ library implementing the MACS3 FRAG peak-calling path. The
implementation has been extracted from Chromap-suite/src/peak_caller/,
where it has been validated end-to-end inside libchromap's concurrent
mode and against chromap_callpeaks on the 100K and full PBMC 3k
multiome benchmarks.
This repo is private until the preprint is finalized.
Two consumers are intended:
- libchromap — links libMACS3 to call peaks in-process during the
ATAC mapping pass (concurrent / memory-source mode preserved via a
thin
FragmentIteratoradapter over the in-memory fragment store). macs3fragstandalone CLI — reads a Chromap/ARC 5-col fragments TSV and emits narrowPeak. Replaces / supersedes Chromap-suite'schromap_callpeaks.
See:
- HANDOFF.md — current state, source-of-truth files in Chromap-suite, validated baseline numbers.
- plans/2026-04-26-libmacs3-roadmap.md — multi-phase roadmap (extraction → streaming sidecar → preprint).
Phase A (extraction) and Phase B streaming both landed. The MACS3 FRAG
path (--macs3-frag-narrowpeak) now streams via RunMacs3FragPeakPipeline- FromSortedIterator and the sweep-line workspace by default — no
fragment list materialization, ~tens of KB peak workspace memory.
Falls back to the events-buffer workspace if the input violates
chrom-grouped + start-sorted order.
See BENCHMARKS.md for the standalone + integrated PBMC 3k full-set numbers (10.3× over MACS3 standalone at T=4; byte-identical narrowPeak vs MACS3 v3.0.3 binary; full lo-mem stack saves ~56 GB peak RSS at chromap-concurrent scale).
The legacy custom-caller path (--out-prefix → BuildBinnedCutSignal + CallPeaksOnBins) also streams (uint16 bins, sliding-window lambda,
sparse BH).
Two test lanes guard regressions:
make test-legacy-streaming-100k—--out-prefixexercises the streaming path (uint16 bins, sliding-window lambda, sparse BH). Currently runs the 100K fixture in ~155 MB RSS / 0.35 s and produces a byte-identical 121,592-peak narrowPeak vs upstreamchromap_callpeaks.make test-macs3-frag-100k—--macs3-frag-narrowpeak+--macs3-frag-summitsexercises the MACS3 FRAG path via the sweep-line workspace. Currently runs the 100K in ~29 MB RSS / 0.6 s and produces byte-identical 2,047-peak narrowPeak + summits.
The bin/macs3frag defaults already match MACS3 callpeak's standard
ATAC settings (--extsize 150 --shift -75, -g hs, --nomodel,
-q 0.05, llocal 10000) except for bdgpeakcall_cutoff, which defaults
to 5 (= MACS3 -p 1e-5) so the 100K parity smoke stays byte-identical
against the existing golden. Pass --preset shiftTAG to opt in to MACS3
-p 0.01 (bdgpeakcall_cutoff = 2); on the 100K fixture this produces
75,275 peaks (md5 09c37a96...) instead of the default 2,047 peaks
(md5 e5c88351...). Place --preset shiftTAG before any per-flag
overrides if you want explicit flags to win.
Full PBMC 3k (54M fragments, single-threaded) measured 2026-04-26:
| Path | RSS | Wall | narrowPeak md5 |
|---|---|---|---|
Sweep-line (default for --macs3-frag-narrowpeak) |
2.16 GB | 86 s | ddc1510e... |
Events-buffer (forced via --frag-score-fe-bdg) |
3.18 GB | 211 s | ddc1510e... |
Both produce byte-identical 50,274-peak output. Sweep-line saves ~1 GB by avoiding the events buffer and runs ~2.5× faster (no events sort+coalesce). Remaining 2.16 GB is mostly the per-chrom BdgTrack data at 1 bp resolution; proportional to fragment count, fundamental at this resolution.
Phase B substrate changes (legacy custom caller only — NOT yet applied to the MACS3 FRAG pipeline):
obs[]isuint16_t(saturating at 65535/bin); 4× smaller than the priorint64_t. Saturation is reported in*.summary.tsv(saturated_bins,saturated_bin_fraction).- Local lambda is computed as a sliding-window stream rather than
materialized into a
lambda[n]array. - BH FDR runs over the sparse subset of bins with
obs >= 1; we never buildp_values[n]/q_values[n]. FileFragmentIteratoris strict: malformedstart/end/countproduce an error (path:line:reason) rather than silently defaulting.
Open follow-ups (in rough order):
-
Re-link libchromap against libMACS3 — landed in
Chromap-suitebranchrelink-libmacs3(commitse084822+db09cff). The in-treesrc/peak_caller/was deleted and Chromap-suite now consumeslib/libmacs3.avia a git submodule. Awaits merge to Chromap-suite master. -
libchromap concurrent-path adapter — landed in
Chromap-suitebranchrelink-libmacs3(commit94cbba6).mapping_writer.ccnow buffers fragments asmacs3::FragmentRecord(~16 B each) instead of workspace events (~48 B each), and the driver consumes the buffer viamacs3::WrapVectorFragmentIterator→RunMacs3FragPeakPipelineFromSortedIterator. Validated byte-identical on the 100K integration harness:chromap --call-macs3-frag-peaksnarrowPeak (md5e5c8835..., 2047 peaks) matches standalonechromap_callpeaks. Awaits merge to Chromap-suite master and full PBMC 3k re-validation. -
Tolerance-based parity harness vs upstream MACS3 (Python) and MACS2 for noise-floor reference. (Done at the BED3 level: byte-identical on 100K with
-f FRAG -p 1e-5. Tolerance harness for narrowPeak score columns is still TBD.) -
Namespace rename
chromap::peaks→macs3(mechanical). -
Update
bin/macs3fragusage string (still says "chromap_callpeaks"). -
Full PBMC 3k re-validation through the streamed MACS3 FRAG path via libchromap (now that the adapter on
relink-libmacs3has landed).
narrowPeak BED3 (chrom, start, end): byte-identical on 100K and full
PBMC 3k — Jaccard 1.0, 50,274 peaks each. bedGraphs (treat / lambda):
byte-identical to MACS3's -B output. pValue / qValue / signalValue
columns: match within ≤1e-5 floating-point round-off.
Summit position (col 10): byte-identical to MACS3 by default —
validated 0 / 50,274 differences vs the MACS3 v3.0.3 binary on
the full PBMC 3k benchmark; byte-identical to MACS3 on the 100K
fixture. Cost: 3.19 GB peak RSS / 92 s wall single-threaded on PBMC
3k. Pass --no-macs3-frag-summit-parity for a 1.0 GB-smaller shortcut
that diverges on ~2% of summits at PBMC scale (max ~222 bp; BED3 /
bedGraphs / score columns unaffected).
Diagnosed via an instrumented build of MACS3 v3.0.3: MACS3's internal
pos_array (the array iterated by __close_peak_wo_subpeaks to find
the median-tied summit) is per-event, not per-distinct-value. When
a fragment endpoint lands inside a treat plateau without changing the
running pileup (one fragment ends and another starts at the same
position with net ΔX=0), MACS3 still emits a new pos_array entry. The
bedGraph writer collapses those adjacent same-value runs, so the
exported .bdg matches libMACS3's collapsed treat track byte-for-byte,
but the median over peak_content is computed against more sub-pieces
than the default libMACS3 path sees.
--macs3-frag-summit-parity reproduces MACS3's pos_array directly
(see src/macs3_pos_array.cc): per chrom,
build treat_pv positions via the quick_pileup start/end parallel-
walk (with start==end coincidence skip) on count-duplicated fragment
endpoints, build ctrl_pv positions via the se_all_in_one_pileup
walk at d=llocal (10000 default for FRAG-without-control), then take
the sorted union. The summit argmax then splits each tied plateau at
every interior pos_array position before the median tie-pick.
Worked example on the worst-offender (chr4:76382069-76382561, 222 bp Δ
under default; 0 bp Δ under --macs3-frag-summit-parity):
- bedGraph has 3 plateaus of treat=30 → default ties=3, midindex=1 → summit=76382470.
- MACS3
pos_arrayhas 6 sub-pieces of treat=30 (first plateau split at 76382245, 76382246 from llocal-window endpoints of fragments at 76377245 and 76387246; second plateau split at 76382470) → ties=6, midindex=2 → summit=76382248.--macs3-frag-summit-paritymatches this exactly.
include/libmacs3/ public headers (consumers include "libmacs3/<name>.h")
src/ library implementation (.cc files)
cli/ standalone CLI source (cli/macs3frag.cc)
bin/ build output: macs3frag
lib/ build output: libmacs3.a
tests/
golden/100k/ committed baseline narrowPeak from chromap_callpeaks
unit/ small C++ unit tests
run_100k_smoke.sh parity smoke vs golden baseline
run_unit_tests.sh compiles + runs all tests/unit/*.cc
third_party/htslib git submodule pinned at v1.23 (kfunc only used)
git submodule update --init --recursive # first time only
make # produces lib/libmacs3.a + bin/macs3fragBy default the build links against the system-installed htslib (the
same library Chromap-suite/chromap_callpeaks resolves to, currently
v1.13 on this workstation). The submodule is pinned at v1.23 (matching
Chromap-suite's vendored copy) for self-contained future builds; to
build it and link against it instead:
(cd third_party/htslib && autoreconf -i && ./configure && make)
make HTSLIB_DIR=third_party/htslibmake test # runs both
make test-unit # 3 C++ unit tests (~1s)
make test-100k # smoke: byte-identical narrowPeak vs golden baselineThe 100K smoke baseline is a snapshot of chromap_callpeaks output on
pbmc_unsorted_3k_100k/pbmc_unsorted_3k_100k_chromap/fragments.tsv,
committed at tests/golden/100k/golden.narrowPeak
(md5 87f274f23fe635092bf1635418b98bc9, 121,592 peaks).
Override the fragments path:
FIXTURE_FRAGS=/path/to/fragments.tsv.gz make test-100kOnce Phase A is fully closed (libchromap relinked, full PBMC 3k
re-validated), the next work is the streaming sidecar
(StreamingPeakCaller) — see roadmap §2. The FragmentIterator
interface in include/libmacs3/fragments.h
is the boundary that streaming will plug into; today it is in place
but only exercised by unit tests, not by the production pipeline.