Skip to content

morphic-bio/libMACS3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

libMACS3

Standalone C++ library implementing the MACS3 FRAG peak-calling path. The implementation has been extracted from Chromap-suite/src/peak_caller/, where it has been validated end-to-end inside libchromap's concurrent mode and against chromap_callpeaks on the 100K and full PBMC 3k multiome benchmarks.

This repo is private until the preprint is finalized.

Two consumers are intended:

  1. libchromap — links libMACS3 to call peaks in-process during the ATAC mapping pass (concurrent / memory-source mode preserved via a thin FragmentIterator adapter over the in-memory fragment store).
  2. macs3frag standalone CLI — reads a Chromap/ARC 5-col fragments TSV and emits narrowPeak. Replaces / supersedes Chromap-suite's chromap_callpeaks.

See:

Status

Phase A (extraction) and Phase B streaming both landed. The MACS3 FRAG path (--macs3-frag-narrowpeak) now streams via RunMacs3FragPeakPipeline- FromSortedIterator and the sweep-line workspace by default — no fragment list materialization, ~tens of KB peak workspace memory. Falls back to the events-buffer workspace if the input violates chrom-grouped + start-sorted order.

See BENCHMARKS.md for the standalone + integrated PBMC 3k full-set numbers (10.3× over MACS3 standalone at T=4; byte-identical narrowPeak vs MACS3 v3.0.3 binary; full lo-mem stack saves ~56 GB peak RSS at chromap-concurrent scale).

The legacy custom-caller path (--out-prefixBuildBinnedCutSignal + CallPeaksOnBins) also streams (uint16 bins, sliding-window lambda, sparse BH).

Two test lanes guard regressions:

  • make test-legacy-streaming-100k--out-prefix exercises the streaming path (uint16 bins, sliding-window lambda, sparse BH). Currently runs the 100K fixture in ~155 MB RSS / 0.35 s and produces a byte-identical 121,592-peak narrowPeak vs upstream chromap_callpeaks.
  • make test-macs3-frag-100k--macs3-frag-narrowpeak + --macs3-frag-summits exercises the MACS3 FRAG path via the sweep-line workspace. Currently runs the 100K in ~29 MB RSS / 0.6 s and produces byte-identical 2,047-peak narrowPeak + summits.

The bin/macs3frag defaults already match MACS3 callpeak's standard ATAC settings (--extsize 150 --shift -75, -g hs, --nomodel, -q 0.05, llocal 10000) except for bdgpeakcall_cutoff, which defaults to 5 (= MACS3 -p 1e-5) so the 100K parity smoke stays byte-identical against the existing golden. Pass --preset shiftTAG to opt in to MACS3 -p 0.01 (bdgpeakcall_cutoff = 2); on the 100K fixture this produces 75,275 peaks (md5 09c37a96...) instead of the default 2,047 peaks (md5 e5c88351...). Place --preset shiftTAG before any per-flag overrides if you want explicit flags to win.

Full PBMC 3k (54M fragments, single-threaded) measured 2026-04-26:

Path RSS Wall narrowPeak md5
Sweep-line (default for --macs3-frag-narrowpeak) 2.16 GB 86 s ddc1510e...
Events-buffer (forced via --frag-score-fe-bdg) 3.18 GB 211 s ddc1510e...

Both produce byte-identical 50,274-peak output. Sweep-line saves ~1 GB by avoiding the events buffer and runs ~2.5× faster (no events sort+coalesce). Remaining 2.16 GB is mostly the per-chrom BdgTrack data at 1 bp resolution; proportional to fragment count, fundamental at this resolution.

Phase B substrate changes (legacy custom caller only — NOT yet applied to the MACS3 FRAG pipeline):

  • obs[] is uint16_t (saturating at 65535/bin); 4× smaller than the prior int64_t. Saturation is reported in *.summary.tsv (saturated_bins, saturated_bin_fraction).
  • Local lambda is computed as a sliding-window stream rather than materialized into a lambda[n] array.
  • BH FDR runs over the sparse subset of bins with obs >= 1; we never build p_values[n] / q_values[n].
  • FileFragmentIterator is strict: malformed start / end / count produce an error (path:line:reason) rather than silently defaulting.

Open follow-ups (in rough order):

  • Re-link libchromap against libMACS3 — landed in Chromap-suite branch relink-libmacs3 (commits e084822 + db09cff). The in-tree src/peak_caller/ was deleted and Chromap-suite now consumes lib/libmacs3.a via a git submodule. Awaits merge to Chromap-suite master.

  • libchromap concurrent-path adapter — landed in Chromap-suite branch relink-libmacs3 (commit 94cbba6). mapping_writer.cc now buffers fragments as macs3::FragmentRecord (~16 B each) instead of workspace events (~48 B each), and the driver consumes the buffer via macs3::WrapVectorFragmentIteratorRunMacs3FragPeakPipelineFromSortedIterator. Validated byte-identical on the 100K integration harness: chromap --call-macs3-frag-peaks narrowPeak (md5 e5c8835..., 2047 peaks) matches standalone chromap_callpeaks. Awaits merge to Chromap-suite master and full PBMC 3k re-validation.

  • Tolerance-based parity harness vs upstream MACS3 (Python) and MACS2 for noise-floor reference. (Done at the BED3 level: byte-identical on 100K with -f FRAG -p 1e-5. Tolerance harness for narrowPeak score columns is still TBD.)

  • Namespace rename chromap::peaksmacs3 (mechanical).

  • Update bin/macs3frag usage string (still says "chromap_callpeaks").

  • Full PBMC 3k re-validation through the streamed MACS3 FRAG path via libchromap (now that the adapter on relink-libmacs3 has landed).

Known residuals vs MACS3

narrowPeak BED3 (chrom, start, end): byte-identical on 100K and full PBMC 3k — Jaccard 1.0, 50,274 peaks each. bedGraphs (treat / lambda): byte-identical to MACS3's -B output. pValue / qValue / signalValue columns: match within ≤1e-5 floating-point round-off.

Summit position (col 10): byte-identical to MACS3 by default — validated 0 / 50,274 differences vs the MACS3 v3.0.3 binary on the full PBMC 3k benchmark; byte-identical to MACS3 on the 100K fixture. Cost: 3.19 GB peak RSS / 92 s wall single-threaded on PBMC 3k. Pass --no-macs3-frag-summit-parity for a 1.0 GB-smaller shortcut that diverges on ~2% of summits at PBMC scale (max ~222 bp; BED3 / bedGraphs / score columns unaffected).

Diagnosed via an instrumented build of MACS3 v3.0.3: MACS3's internal pos_array (the array iterated by __close_peak_wo_subpeaks to find the median-tied summit) is per-event, not per-distinct-value. When a fragment endpoint lands inside a treat plateau without changing the running pileup (one fragment ends and another starts at the same position with net ΔX=0), MACS3 still emits a new pos_array entry. The bedGraph writer collapses those adjacent same-value runs, so the exported .bdg matches libMACS3's collapsed treat track byte-for-byte, but the median over peak_content is computed against more sub-pieces than the default libMACS3 path sees.

--macs3-frag-summit-parity reproduces MACS3's pos_array directly (see src/macs3_pos_array.cc): per chrom, build treat_pv positions via the quick_pileup start/end parallel- walk (with start==end coincidence skip) on count-duplicated fragment endpoints, build ctrl_pv positions via the se_all_in_one_pileup walk at d=llocal (10000 default for FRAG-without-control), then take the sorted union. The summit argmax then splits each tied plateau at every interior pos_array position before the median tie-pick.

Worked example on the worst-offender (chr4:76382069-76382561, 222 bp Δ under default; 0 bp Δ under --macs3-frag-summit-parity):

  • bedGraph has 3 plateaus of treat=30 → default ties=3, midindex=1 → summit=76382470.
  • MACS3 pos_array has 6 sub-pieces of treat=30 (first plateau split at 76382245, 76382246 from llocal-window endpoints of fragments at 76377245 and 76387246; second plateau split at 76382470) → ties=6, midindex=2 → summit=76382248. --macs3-frag-summit-parity matches this exactly.

Layout

include/libmacs3/   public headers (consumers include "libmacs3/<name>.h")
src/                library implementation (.cc files)
cli/                standalone CLI source (cli/macs3frag.cc)
bin/                build output: macs3frag
lib/                build output: libmacs3.a
tests/
  golden/100k/      committed baseline narrowPeak from chromap_callpeaks
  unit/             small C++ unit tests
  run_100k_smoke.sh parity smoke vs golden baseline
  run_unit_tests.sh compiles + runs all tests/unit/*.cc
third_party/htslib  git submodule pinned at v1.23 (kfunc only used)

Building

git submodule update --init --recursive   # first time only
make                                       # produces lib/libmacs3.a + bin/macs3frag

By default the build links against the system-installed htslib (the same library Chromap-suite/chromap_callpeaks resolves to, currently v1.13 on this workstation). The submodule is pinned at v1.23 (matching Chromap-suite's vendored copy) for self-contained future builds; to build it and link against it instead:

(cd third_party/htslib && autoreconf -i && ./configure && make)
make HTSLIB_DIR=third_party/htslib

Testing

make test           # runs both
make test-unit      # 3 C++ unit tests (~1s)
make test-100k      # smoke: byte-identical narrowPeak vs golden baseline

The 100K smoke baseline is a snapshot of chromap_callpeaks output on pbmc_unsorted_3k_100k/pbmc_unsorted_3k_100k_chromap/fragments.tsv, committed at tests/golden/100k/golden.narrowPeak (md5 87f274f23fe635092bf1635418b98bc9, 121,592 peaks).

Override the fragments path:

FIXTURE_FRAGS=/path/to/fragments.tsv.gz make test-100k

Phase A → Phase B

Once Phase A is fully closed (libchromap relinked, full PBMC 3k re-validated), the next work is the streaming sidecar (StreamingPeakCaller) — see roadmap §2. The FragmentIterator interface in include/libmacs3/fragments.h is the boundary that streaming will plug into; today it is in place but only exercised by unit tests, not by the production pipeline.

About

Standalone C++ MACS3-compatible peak caller. Spun off from Chromap-suite/src/peak_caller; consumed by libchromap for in-process FRAG peak calling and shipped as a standalone tool for preprint.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors