
perf: optimize vector search SQL to leverage HNSW/GIN index for 40-500x speedup #5285

Open
zhouliang5266 wants to merge 1 commit into 1Panel-dev:v2 from zhouliang5266:perf/hnsw-vector-search


zhouliang5266 commented May 24, 2026

Summary

The current vector search SQL queries perform full table scans even when HNSW/GIN indexes exist, causing severe performance degradation on large datasets.

This PR rewrites the 3 search SQL files to use index-friendly query patterns. It also adds per-knowledge-base query routing so that the partial HNSW indexes are actually used. On a large production dataset, these changes give a 40-500x speedup.

Problem

Two issues prevent HNSW/GIN indexes from being used:

1. SQL queries don't utilize indexes

Although common.py already creates per-knowledge-base HNSW indexes (embedding_hnsw_idx_{k_id}), the current SQL queries are written in ways that keep the planner from using them:

  • embedding_search.sql: the inner query uses ORDER BY distance with no LIMIT. pgvector therefore cannot use the HNSW index and falls back to an exact search that scans the entire embedding table.
  • blend_search.sql: computes both the vector distance and ts_rank_cd for every row in a single pass. The result is a full table scan with no early termination.
  • keywords_search.sql: computes ts_rank_cd() on every row because there is no @@ pre-filter, so the GIN index is never used.

2. Per-KB partial indexes bypassed by knowledge_id__in

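hit_test() and query() use knowledge_id__in=knowledge_id_list, which produces WHERE knowledge_id IN (...). PostgreSQL can use a partial index only when the query's WHERE clause implies the index predicate. An IN list containing several knowledge base IDs does not imply WHERE knowledge_id = '{k_id}'. The planner therefore ignores the per-KB partial indexes and falls back to a full table scan across all knowledge bases.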
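
The rule above can be shown with a toy model. This is an illustration only, not PostgreSQL's actual predicate-proof code. A partial index whose predicate is knowledge_id = '{k_id}' is usable only when the query's filter guarantees that every returned row satisfies that predicate. The function name partial_index_usable is hypothetical.

```python
from typing import Sequence


def partial_index_usable(index_kb_id: str, query_kb_ids: Sequence[str]) -> bool:
    """Toy model: a partial index built WITH `WHERE knowledge_id = index_kb_id`
    can serve the query only if the query restricts knowledge_id to exactly
    that one value, so every matching row also satisfies the index predicate."""
    return set(query_kb_ids) == {index_kb_id}


if __name__ == "__main__":
    print(partial_index_usable("kb1", ["kb1"]))         # exact match: usable
    print(partial_index_usable("kb1", ["kb1", "kb2"]))  # IN (...) over several KBs: not usable
```

Splitting a multi-KB query into one query per KB turns every call into the first, usable case.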

Solution

SQL Optimization (3 files)

embedding_search.sql: Use a CTE (WITH vector_top AS) to first fetch the top-K candidates through the HNSW index with LIMIT LEAST(top_number * 10, 500). DISTINCT ON and threshold filtering are then applied only to that small candidate set.
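
The following sketches the two-stage query shape described above. It is not the PR's actual SQL. It assumes an embedding table with paragraph_id, knowledge_id and embedding columns, cosine distance (<=>) on a plain vector column, and the placeholder order shown. EMBEDDING_SEARCH_SQL, candidate_limit and embedding_search_params are hypothetical names.

```python
from typing import Sequence, Tuple

# Hypothetical SQL: column names, distance operator and %s order are assumptions.
EMBEDDING_SEARCH_SQL = """
WITH vector_top AS (
    -- Stage 1: ORDER BY distance + LIMIT is the shape pgvector's HNSW index serves
    SELECT paragraph_id, embedding <=> %s::vector AS distance
    FROM embedding
    WHERE knowledge_id = %s
    ORDER BY embedding <=> %s::vector
    LIMIT LEAST(%s * 10, 500)
)
SELECT paragraph_id, similarity
FROM (
    -- Stage 2: dedupe per paragraph on the small candidate set only
    SELECT DISTINCT ON (paragraph_id) paragraph_id, 1 - distance AS similarity
    FROM vector_top
    ORDER BY paragraph_id, distance
) AS best_per_paragraph
WHERE similarity > %s
ORDER BY similarity DESC
LIMIT %s
"""


def candidate_limit(top_number: int) -> int:
    """Python mirror of LIMIT LEAST(top_number * 10, 500)."""
    return min(top_number * 10, 500)


def embedding_search_params(
    query_vector: Sequence[float], knowledge_id: str, top_number: int, similarity: float
) -> Tuple:
    """Parameters in the same order as the %s placeholders above."""
    # pgvector's text input format: '[0.1,0.2,...]'
    vec = "[" + ",".join(str(float(x)) for x in query_vector) + "]"
    return (vec, knowledge_id, vec, top_number, similarity, top_number)
```

The candidate cap trades a little recall for bounded work: DISTINCT ON and the threshold see at most 500 rows, however large the table is.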

blend_search.sql: Two-phase approach. The CTE first gets vector candidates via HNSW (with LIMIT). The query then joins embedding to compute ts_rank_cd text scores only for that candidate set. COALESCE(..., 0) gives candidates without a text score a score of 0.
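
The two phases can be emulated in pure Python without a database. This shows that text scoring runs only for the vector candidates. rows, text_score and blend_two_phase are hypothetical names. The sort-and-slice only stands in for the HNSW-backed CTE. The PR does not describe how the two scores are combined, so the sketch returns both.

```python
from typing import Callable, List, Optional, Tuple


def blend_two_phase(
    rows: List[Tuple[str, float]],                 # (paragraph_id, cosine distance)
    text_score: Callable[[str], Optional[float]],  # stand-in for ts_rank_cd; None = no score
    top_number: int,
) -> List[Tuple[str, float, float]]:
    # Phase 1: stands in for the HNSW-backed CTE; keep only the nearest
    # LEAST(top_number * 10, 500) candidates.
    limit = min(top_number * 10, 500)
    candidates = sorted(rows, key=lambda row: row[1])[:limit]

    # Phase 2: text scoring runs on the candidate set only.
    blended = []
    for paragraph_id, distance in candidates:
        score = text_score(paragraph_id)
        # Mirrors COALESCE(text_score, 0) for candidates without a text score.
        blended.append((paragraph_id, 1 - distance, score if score is not None else 0.0))
    return blended
```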

keywords_search.sql: Add an AND search_vector @@ websearch_to_tsquery('simple', %s) pre-filter. This lets the GIN index be used and means ts_rank_cd is no longer computed for every row in the table.
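
Here is a minimal sketch of the keyword query shape. It assumes the embedding table has paragraph_id, knowledge_id and search_vector columns. KEYWORDS_SEARCH_SQL and keywords_search_params are hypothetical names, and the real keywords_search.sql may select other columns or order its placeholders differently.

```python
from typing import Tuple

# Hypothetical SQL: only rows that pass the @@ filter are ranked by ts_rank_cd.
KEYWORDS_SEARCH_SQL = """
SELECT paragraph_id,
       ts_rank_cd(search_vector, websearch_to_tsquery('simple', %s)) AS score
FROM embedding
WHERE knowledge_id = %s
  -- @@ is the operator a GIN index on search_vector can serve
  AND search_vector @@ websearch_to_tsquery('simple', %s)
ORDER BY score DESC
LIMIT %s
"""


def keywords_search_params(query_text: str, knowledge_id: str, top_number: int) -> Tuple:
    """Placeholder order: rank query, KB filter, @@ query, limit."""
    return (query_text, knowledge_id, query_text, top_number)
```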

Per-KB Query Routing (pg_vector.py)

Split hit_test() and query() so that, when multiple KBs are queried, they issue one query per knowledge base. Each per-KB query filters with knowledge_id=kid (exact match), which lets PostgreSQL use the corresponding partial HNSW index. Results from all KBs are then merged and sorted by similarity. The single-KB case is handled separately to avoid this overhead.
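
The routing logic can be sketched as a small merge function. This is not the PR's pg_vector.py code. search_per_kb, search_one_kb and the hit dictionaries are hypothetical. The final truncation to top_number is an assumption, since the description only says results are merged and sorted.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical hit shape: a dict with at least a "similarity" key.
Hit = Dict[str, Any]


def search_per_kb(
    kb_ids: Sequence[str],
    search_one_kb: Callable[[str], List[Hit]],
    top_number: int,
) -> List[Hit]:
    """Run one exact-match query per knowledge base, then merge.

    search_one_kb(kid) stands in for a query filtered with knowledge_id = kid,
    the shape that can use that KB's partial HNSW index.
    """
    if len(kb_ids) == 1:
        # Single KB: one query, no merge step.
        return search_one_kb(kb_ids[0])[:top_number]

    merged: List[Hit] = []
    for kid in kb_ids:
        merged.extend(search_one_kb(kid))

    # Highest similarity first across all knowledge bases.
    merged.sort(key=lambda hit: hit["similarity"], reverse=True)
    return merged[:top_number]
```

Sorting the merged list by raw similarity assumes the scores are comparable across KBs. This holds when all KBs use the same embedding model and distance metric.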

Parameter Update (pg_vector.py)

Update parameter arrays in all 3 search classes (EmbeddingSearch, KeywordsSearch, BlendSearch) to match the new SQL placeholder order.

Performance Results

Tested on production data: 770K vectors, 22GB embedding table, 5 knowledge bases, PostgreSQL 17 + pgvector

The test environment uses a 3840-dimension embedding model. This exceeds the 2,000-dimension limit for HNSW indexes on pgvector's vector type, so a halfvec configuration (indexable up to 4,000 dimensions) was applied separately. The optimizations in this PR do not depend on the embedding dimension and should benefit other deployments as well.

Search Mode        Before       After     Speedup
blend_search       16,358 ms    ~220 ms   74x
embedding_search    6,551 ms    ~160 ms   41x
keywords_search    10,662 ms     ~20 ms   533x
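
The Speedup column is simply the ratio of the two latencies. Below is a quick check using only the numbers from the table. Because the "After" values are approximate, the ratios are approximate too.

```python
# Before/after latencies in milliseconds, copied from the table above.
BENCHMARK_MS = {
    "blend_search": (16_358, 220),
    "embedding_search": (6_551, 160),
    "keywords_search": (10_662, 20),
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Speedup factor = old latency / new latency."""
    return before_ms / after_ms


if __name__ == "__main__":
    for mode, (before, after) in BENCHMARK_MS.items():
        print(f"{mode}: {speedup(before, after):.0f}x")
    # prints 74x, 41x and 533x, matching the table
```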

Additional Recommendation: Disable PostgreSQL JIT

PostgreSQL JIT compilation was designed for long-running analytical queries. For vector search queries that complete in <10ms with HNSW indexes, JIT compilation overhead (50-200ms) far exceeds the actual query execution time.

Recommended: Add jit = off to postgresql.conf. PostgreSQL 19 will disable JIT by default, aligning with this recommendation.

Before disabling JIT (single embedding query):

JIT compilation: 159ms
Query execution: 3ms
Total: ~162ms

After disabling JIT:

Query execution: 3.4ms
Total: ~3.4ms (47x faster)
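
One way to reproduce this comparison is to run the search query under EXPLAIN (ANALYZE) with JIT on and then off, and compare the reported timings. JIT can also be disabled for a single session with SET jit = off, without editing postgresql.conf. The helper below extracts the JIT total and the execution time from the text output. Its name is hypothetical, and the regexes assume PostgreSQL's default TEXT EXPLAIN format.

```python
import re
from typing import Optional, Tuple

# In TEXT format the JIT block contains a line such as
#   Timing: Generation <n> ms, Inlining <n> ms, ..., Total <n> ms
_JIT_TOTAL = re.compile(r"Timing:.*Total ([\d.]+) ms")
_EXECUTION = re.compile(r"Execution Time: ([\d.]+) ms")


def jit_and_execution_ms(explain_text: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (jit_total_ms, execution_ms); None when a line is absent,
    e.g. no JIT block because JIT did not run for this query."""
    jit = _JIT_TOTAL.search(explain_text)
    execution = _EXECUTION.search(explain_text)
    return (
        float(jit.group(1)) if jit else None,
        float(execution.group(1)) if execution else None,
    )
```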

Prerequisites

  • PostgreSQL with pgvector extension (HNSW indexes are already created by common.py per knowledge base)
  • GIN index on search_vector column (recommended for keywords_search optimization):
    CREATE INDEX embedding_search_vector_gin_idx ON embedding USING GIN (search_vector);

Testing

  • Tested with 770K vectors (22GB) on PostgreSQL 17 + pgvector
  • All 3 search modes (embedding/keywords/blend) return correct results
  • Multi-KB query tested (5 knowledge bases)
  • Backward compatible: no schema changes required; only the SQL queries and query routing change

f2c-ci-robot Bot commented May 24, 2026

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

f2c-ci-robot Bot commented May 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zhouliang5266 force-pushed the perf/hnsw-vector-search branch 3 times, most recently from 7c2110e to 68de0ef, May 24, 2026 10:26
…eedup)

Problem: SQL queries perform full table scans despite per-KB HNSW indexes
existing in common.py. Two root causes:
1. SQL patterns (ORDER BY distance without LIMIT, ts_rank_cd without @@)
   don't trigger index usage
2. knowledge_id__in bypasses partial indexes (WHERE knowledge_id = '{k_id}')

Changes (4 files):

SQL optimization:
- embedding_search.sql: CTE + LIMIT to fetch top-K candidates via HNSW
- blend_search.sql: Two-phase - HNSW candidates first, then JOIN for text scores
- keywords_search.sql: Add @@ GIN pre-filter before ts_rank_cd

Query routing (pg_vector.py):
- Split hit_test()/query() to iterate per-KB when multiple knowledge bases
  are queried, ensuring each query hits its partial HNSW index
- Update parameter arrays in 3 search classes for new SQL placeholder order

Benchmark (770K vectors, 3840 dims, 22GB, 5 KBs):
- blend: 16,358ms -> ~220ms (74x)
- embedding: 6,551ms -> ~160ms (41x)
- keywords: 10,662ms -> ~20ms (533x)
zhouliang5266 force-pushed the perf/hnsw-vector-search branch from 68de0ef to c859456, May 24, 2026 11:29