Skip to content

SQ8-quantized sealed segments bypass HNSW graph navigation (O(N) brute-force per query) #40

@hollanf

Description

@hollanf

Once a vector segment is sealed and SQ8 quantization is built (automatic at ≥ 1000 live vectors), every subsequent search iterates every vector in the segment. HNSW's logarithmic advantage is discarded. Per-query latency scales linearly with segment size.

Summary

VectorCollection::search has two branches per sealed segment. When seg.sq8 is None, it calls seg.index.search(query, top_k, ef) which uses the HNSW graph correctly. When seg.sq8 is Some(_) — which is the common case after a segment crosses DEFAULT_SEAL_THRESHOLD and live_count() >= 1000 — it stops using the graph entirely and performs a full linear scan of the SQ8 bytes, then reranks.

Current code

nodedb-vector/src/collection/search.rs:22-48

let results = if let Some((codec, sq8_data)) = &seg.sq8 {
    // Quantized two-phase search.
    let rerank_k = top_k.saturating_mul(3).max(20);
    let mut candidates: Vec<(u32, f32)> = Vec::with_capacity(seg.index.len());
    let dim = seg.index.dim();
    for i in 0..seg.index.len() {                                    // ← iterates EVERY vector
        if seg.index.is_deleted(i as u32) {
            continue;
        }
        let sq8_vec = &sq8_data[i * dim..(i + 1) * dim];
        let d = match self.params.metric {
            DistanceMetric::L2 => codec.asymmetric_l2(query, sq8_vec),
            DistanceMetric::Cosine => codec.asymmetric_cosine(query, sq8_vec),
            DistanceMetric::InnerProduct => codec.asymmetric_ip(query, sq8_vec),
            _ => {
                let dequant = codec.dequantize(sq8_vec);
                distance(query, &dequant, self.params.metric)
            }
        };
        candidates.push((i as u32, d));
    }
    if candidates.len() > rerank_k {
        candidates.select_nth_unstable_by(rerank_k, |a, b| { ... });
        candidates.truncate(rerank_k);
    }
    // … phase-2 FP32 rerank on candidates …
} else {
    seg.index.search(query, top_k, ef)   // ← only HNSW path
};

Vec::with_capacity(seg.index.len()) and for i in 0..seg.index.len() confirm the intent: full linear scan. There is no graph traversal in this branch.

SQ8 auto-build trigger: nodedb-vector/src/collection/lifecycle.rs:231 — any sealed segment with live_count() >= 1000 gets SQ8 built automatically.

Why it's broken

  • HNSW's purpose is O(log N) ANN search. This branch is O(N) per sealed segment per query.
  • At 1 M vectors × 768 dim, each query reads ~768 MB of SQ8 bytes sequentially and computes 1 M distance calls before reranking. Per-query latency goes from millisecond to seconds.
  • With multiple sealed segments, the cost multiplies.
  • The two-phase design is correct in principle (approximate → rerank), but the candidate-generation step must use the HNSW graph (with SQ8 as the in-graph distance function), not a linear scan.
  • This also wastes all the memory spent maintaining the HNSW graph for sealed segments — the graph is built, stored, and then never consulted once SQ8 is on.

Reproduction

  1. Build a collection; insert ≥ ~100 k vectors so a segment seals and SQ8 is constructed (see lifecycle.rs:231).
  2. Run a single nearest-neighbour query.
  3. Profile: a flamegraph shows ~100 % of time in the for i in 0..seg.index.len() loop at collection/search.rs:27.
  4. Compare against the same collection with SQ8 disabled (or small enough not to trigger seal): HNSW path runs in milliseconds.

Notes

  • Found during a CPU/memory audit sweep of nodedb-vector/src/*.
  • Easy to verify with a criterion benchmark comparing sq8=None vs sq8=Some paths on the same dataset.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions