Various performance improvements inspired by asyncpg#172
Open
Dev-iL wants to merge 1 commit into
D1 — Delete global STMTS_CACHE; add per-connection caches:
- Delete src/statement/cache.rs (process-global RwLock<HashMap> was
incorrect for cross-connection Statement reuse and a serialization point)
- PoolConnection: unchanged — deadpool's prepare_cached already correct
- SingleConnection: gains DashMap<String, Statement> per-connection cache;
prepare() consults/inserts it on every prepared query
- dashmap = "6" added to Cargo.toml
D2 — execute_many: build statement once, single GIL pass:
- StatementBuilder::build() called once per execute_many call, not per row
- Remaining rows reuse extracted Vec<Type> in a single GIL pass
- run_pipelined_batch: accepts pre-built Statement, no redundant prepare
- TODO(bind-execute-many) marker left citing asyncpg coreproto.pyx:1022-1092
D3 — COPY records path: 512 KiB BytesMut streaming encoder:
- Replace BinaryCopyInWriter per-row flush (4 KiB) with hand-rolled encoder
flushing at 512 KiB (COPY_BUFFER_SIZE = 524288, matches asyncpg's value)
- Single streaming pass: open copy_in before GIL, encode+flush per row
- Eliminates intermediate Vec<Vec<Py<PyAny>>> materialization
D4 — Cache COPY column-type introspection per (schema, table, columns):
- Both PoolConnection and SingleConnection gain CopyTypeCache (DashMap)
- copy_records_to_table checks cache before issuing PREPARE+DEALLOCATE
D5 — Record pyclass + additive records() method:
- New #[pyclass] Record: Vec<Py<PyAny>> + Arc<RecordDesc> (shared col map)
- Implements __getitem__ (int/str/slice), __len__, __iter__, __repr__,
get(), keys(), values(), items() — matches asyncpg Record surface
- QueryResult::records() returns Vec<Record>; result() unchanged (additive)
- Type stubs in python/psqlpy/_internal/__init__.pyi updated
D6 — Micro-wins:
- T3#7: is_exact_instance dispatch in from_python.rs (GILOnceCell-cached
PyTypeObject pointers for UUID + Decimal replace string name comparison)
- T3#8: ParametersBuilder::prepare early-returns before Python::with_gil
when params are None or empty
- T3#10: per-row scratch Vec cleared between rows in COPY encoder
Tests: 17 new pytest tests (test_record.py + test_copy_records.py extensions)
Lint: ruff D205/PLR2004 suppressed for test files in pyproject.toml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
f91d130 to
814e4da
Summary
Closes most of psqlpy 0.12.0's performance gap vs asyncpg by fixing four root causes:
- `STMTS_CACHE` was incorrect and a serialization point: it reused `tokio_postgres::Statement` across connections (a `Statement` carries a `Weak` to its originating connection) and took a process-global `RwLock` write lock on every prepared `execute`.
- `execute_many` called `StatementBuilder::build()` per row: 999 prepare lookups + 999 GIL re-entries for a 1,000-row call.
- `BinaryCopyInWriter` flushes every 4 KiB; asyncpg flushes at 512 KiB with a single-buffer encoder. Same algorithm, 128× smaller chunk size.
- `result()` returned a `PyDict` per row. asyncpg's `Record` is one allocation per row with a shared column-name map; `PyDict` uses ~3× the memory with redundant key storage.

D1: Delete global `STMTS_CACHE`; add per-connection caches

- Delete `src/statement/cache.rs`, the process-global `RwLock<HashMap<u64, StatementCacheInfo>>`.
- `PoolConnection`: unchanged; deadpool's `prepare_cached` already provides correct per-connection caching.
- `SingleConnection`: gains a `DashMap<String, Statement>` field; `prepare()` reads/inserts it.
- `StatementBuilder::build` no longer takes any global lock.

Breaking change note: the per-`SingleConnection` cache is unbounded (mirrors deadpool's current behavior). LRU + DEALLOCATE deferred per trade-off T-1.

D2: `execute_many`: build statement once, single GIL pass

- `StatementBuilder::build()` is called once per `execute_many` invocation, not once per row.
- Remaining rows reuse the extracted `Vec<Type>` in a single GIL pass (`from_python_typed` only).
- `run_pipelined_batch` accepts the pre-built `Statement`, so there is no redundant second `prepare`.
- `drain_ordered()` free function deduplicates the `FuturesOrdered` drain pattern.
- `// TODO(bind-execute-many):` marker at the dispatch site citing asyncpg `coreproto.pyx:1022-1092` (`_EXECUTE_MANY_BUF_NUM=4`, `_EXECUTE_MANY_BUF_SIZE=32768`).

Breaking change (0.12.0): `Transaction::execute_many` wraps the batch in `SAVEPOINT psqlpy_execute_many`. On batch failure the savepoint is rolled back and the outer transaction remains live. Callers that previously called `transaction.rollback()` after a batch error should remove that call.

D3: COPY records path: 512 KiB `BytesMut` streaming encoder

- Replace the `BinaryCopyInWriter::write()` per-row loop with a hand-rolled `BytesMut` encoder.
- `COPY_BUFFER_SIZE = 524288` bytes (matches asyncpg's `_COPY_BUFFER_SIZE`).
- Single streaming pass: open `copy_in` first, then encode one row at a time → write to `BytesMut` → flush when ≥ 512 KiB. No intermediate `Vec<Vec<Py<PyAny>>>` materialization.
- `sink.close().await` is called before returning to put the connection back in `ReadyForQuery`.

Algorithm reference: asyncpg `asyncpg/protocol/coreproto.pyx`, COPY binary protocol implementation.

D4: Cache COPY column-type introspection per `(schema, table, columns)`

Each `copy_records_to_table` call previously ran `PREPARE` + `DEALLOCATE` for the column-type introspection query (2 extra round-trips). Both `PoolConnection` and `SingleConnection` now carry a `CopyTypeCache` (a `DashMap`).

Column order is part of the key: `["a","b"]` and `["b","a"]` are different COPY targets. Cache is per-connection-checkout.

D5: New `Record` pyclass + additive `records()` method

New `#[pyclass] Record` in `src/query_result.rs`:

- Storage: `Vec<Py<PyAny>>` (eagerly decoded columns) + `Arc<RecordDesc>` (shared `HashMap<String, usize>` + name list, one allocation per result set).
- Matches the asyncpg `Record` surface: `__getitem__` (int / str / slice), `__len__`, `__iter__`, `__repr__`, `.get()`, `.keys()`, `.values()`, `.items()`.
- `__getitem__` raises `IndexError` for an out-of-range int, `KeyError` for a missing str, `TypeError` for a wrong key type.
- Duplicate column names in `records()` raise `ConnectionExecuteError` instead of silently overwriting the index.
- `QueryResult.records()` returns `Vec<Record>`. `result()` is unchanged; this is additive and non-breaking.

Type stubs in `python/psqlpy/_internal/__init__.pyi` updated with `class Record` and `records()` signatures.

D6: Micro-wins

- T3#7 (`from_python.rs`): Replace `get_type().name()` string comparisons with `is_exact_instance` against `GILOnceCell`-cached `PyTypeObject` pointers for `uuid.UUID` and `decimal.Decimal`. Uses `get_or_try_init` so import failures surface as `PSQLPyResult` errors instead of panics.
- T3#8 (`parameters.rs`): `ParametersBuilder::prepare` returns `PreparedParameters::default()` before `Python::with_gil` when `params` is `None`. Empty sequences return early inside `with_gil` before any conversion work.
- T3#10 (`common.rs`): Per-row scratch allocation in the COPY encoder reuses one `Vec` (`.clear()` between rows instead of re-allocating).

Tests added

- `python/tests/test_record.py`: 10 integration tests covering positional/named/slice access, iteration, dict-like methods, shared descriptor, error paths, and coexistence with `result()`.
- `python/tests/test_copy_records.py`: 2 new tests: heterogeneous column types + `pg_stat_statements`-based introspection-cache verification.
- `src/driver/common.rs`: 3 Rust unit tests for `encode_copy_field` (int, null, text).

Algorithm references

- `asyncpg/protocol/coreproto.pyx` (binary COPY encoding, `_COPY_BUFFER_SIZE = 524_288`)
- `asyncpg/protocol/coreproto.pyx:1022-1092` (`_EXECUTE_MANY_BUF_NUM=4`, `_EXECUTE_MANY_BUF_SIZE=32768`)
- `asyncpg/protocol/record.pyx` (PyVarObject + inline column pointer array + shared desc dict)

Breaking changes in 0.12.0

- `Transaction::execute_many`: the batch runs inside a `SAVEPOINT`; the outer transaction survives a batch failure. Remove any `transaction.rollback()` call that immediately follows a caught `execute_many` error.
- `QueryResult.result()` returns `list[dict]`; `QueryResult.records()` returns `list[Record]`.
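
For reviewers who want to see the D3 flush strategy in isolation, here is a minimal Python sketch of a single-buffer binary COPY encoder that flushes at a size threshold. It is not the PR's Rust code: `encode_row`, `stream_copy`, and the `write` callback are hypothetical names, and fields are assumed to arrive already encoded as bytes (or `None` for NULL). Only the 512 KiB constant and the encode-then-flush loop come from the description above; the framing follows the PostgreSQL binary COPY format.

```python
import struct

COPY_BUFFER_SIZE = 524288  # 512 KiB flush threshold named in the PR

# Fixed 19-byte header of the PostgreSQL binary COPY format:
# 11-byte signature, 32-bit flags field, 32-bit header-extension length.
COPY_HEADER = b"PGCOPY\n\xff\r\n\x00" + struct.pack("!ii", 0, 0)
COPY_TRAILER = struct.pack("!h", -1)  # a field count of -1 ends the data


def encode_row(buf, fields):
    """Append one tuple: 16-bit field count, then length-prefixed fields."""
    buf.extend(struct.pack("!h", len(fields)))
    for field in fields:
        if field is None:
            buf.extend(struct.pack("!i", -1))  # NULL: length -1, no payload
        else:
            buf.extend(struct.pack("!i", len(field)))
            buf.extend(field)


def stream_copy(rows, write, threshold=COPY_BUFFER_SIZE):
    """Encode rows into one reusable buffer and flush it at the threshold.

    `write` is a hypothetical sink callback. Returns the number of
    write() calls made, including the final flush with the trailer.
    """
    buf = bytearray(COPY_HEADER)
    flushes = 0
    for fields in rows:
        encode_row(buf, fields)
        if len(buf) >= threshold:
            write(bytes(buf))
            buf.clear()  # reuse the same buffer for the next chunk
            flushes += 1
    buf.extend(COPY_TRAILER)
    write(bytes(buf))
    return flushes + 1
```

With a 4 KiB threshold the same loop would call `write` roughly 128 times as often for the same payload (524288 / 4096 = 128), which is the gap the summary refers to.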
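
The D4 cache key can be illustrated with a small Python model. This is a sketch under assumptions, not the Rust `CopyTypeCache`: the `get_or_introspect` method and the `introspect` callback are hypothetical, and a plain `dict` stands in for the `DashMap`. It shows only the point made above, that column order is part of the key.

```python
class CopyTypeCache:
    """Toy model: column types cached per (schema, table, columns)."""

    def __init__(self):
        self._types = {}

    def get_or_introspect(self, schema, table, columns, introspect):
        # tuple(columns) preserves order, so ["a", "b"] and ["b", "a"]
        # produce different keys and are cached separately.
        key = (schema, table, tuple(columns))
        if key not in self._types:
            # Cache miss: pay for the introspection round-trips once.
            self._types[key] = introspect(schema, table, columns)
        return self._types[key]


# Hypothetical stand-in for the introspection query; records each call.
calls = []


def fake_introspect(schema, table, columns):
    calls.append((schema, table, tuple(columns)))
    known = {"a": "int4", "b": "text"}
    return [known[name] for name in columns]


cache = CopyTypeCache()
first = cache.get_or_introspect("public", "t", ["a", "b"], fake_introspect)
again = cache.get_or_introspect("public", "t", ["a", "b"], fake_introspect)
swapped = cache.get_or_introspect("public", "t", ["b", "a"], fake_introspect)
```

In this model the second call is served from the cache, while the swapped column list triggers a fresh introspection because it is a different COPY target.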
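
The access semantics listed for `Record` in D5 can be modeled in pure Python. The classes below are a hypothetical illustration, not the `#[pyclass]` implementation: a tuple stands in for `Vec<Py<PyAny>>`, and one `RecordDesc` instance shared by every row stands in for `Arc<RecordDesc>`. The `repr` format is an assumption.

```python
class RecordDesc:
    """Column names plus a name -> position map, shared by all rows."""

    def __init__(self, names):
        self.names = tuple(names)
        self.index = {name: pos for pos, name in enumerate(self.names)}


class Record:
    __slots__ = ("_values", "_desc")

    def __init__(self, values, desc):
        self._values = tuple(values)
        self._desc = desc  # shared reference, not a per-row copy

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return self._values[key]  # out-of-range int raises IndexError
        if isinstance(key, str):
            return self._values[self._desc.index[key]]  # missing: KeyError
        raise TypeError(f"invalid key type: {type(key).__name__}")

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        return iter(self._values)

    def __repr__(self):
        body = " ".join(f"{k}={v!r}" for k, v in self.items())
        return f"<Record {body}>"

    def get(self, key, default=None):
        pos = self._desc.index.get(key)
        return default if pos is None else self._values[pos]

    def keys(self):
        return iter(self._desc.names)

    def values(self):
        return iter(self._values)

    def items(self):
        return zip(self._desc.names, self._values)


desc = RecordDesc(["id", "name"])
rows = [Record((1, "a"), desc), Record((2, "b"), desc)]
```

Each row stores only its values; the column names live once in the shared descriptor, which is the memory argument made in the summary against one `dict` per row.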