
Commit e920961

A handful of optimizations for the DRC collector (#12974)
* Cache `TraceInfo` lookups in the DRC collector

  Ideally we would just use a `SecondaryMap<VMSharedTypeIndex, TraceInfo>` here, but allocating `O(num engine types)` space inside a store that uses only a couple of types seems not great. So instead, we just have a fixed-size cache that is probably big enough for most things in practice.

* Combine `dec_ref`, `trace`, and `dealloc` into a single-pass loop

  Inline `dec_ref`, `trace_gc_ref`, and `dealloc` into `dec_ref_and_maybe_dealloc`'s main loop so that we read the `VMDrcHeader` once per object to get `ref_count`, the type index, and `object_size`, avoiding three separate GC heap accesses and bounds checks per freed object. For struct tracing, read `gc_ref` fields directly from the heap slice at known offsets instead of going through `gc_object_data` → `object_range` → `object_size`, which would re-read the `object_size` from the header.

  301,333,979,721 -> 291,038,676,119 instructions (~3.4% improvement)

* Fast-path `gc_alloc_raw` to skip async/fiber machinery when the GC store exists

  When the GC store is already initialized and the allocation succeeds, avoid the async machinery entirely. This avoids the overhead of taking/restoring fiber async state pointers on every allocation.

  291,038,676,119 -> 230,503,364,489 instructions (~20.8% improvement)

* Pass `VMSharedTypeIndex`es to the `gc_alloc_raw` libcall

  Avoids converting `ModuleInternedTypeIndex` to `VMSharedTypeIndex` in host code, which requires lookups in the instance's module's `TypeCollection`. We already have helpers to do this conversion inline in JIT code.

  230,503,364,489 -> 216,937,168,529 instructions (~5.9% improvement)

* Do not effectively double-test for `externref`s during DRC deallocation

  Moves the `externref` host-data cleanup inside the `ty.is_none()` branch of `dec_ref_and_maybe_dealloc`, since only `externref`s have host data. Additionally, the type check is somewhat expensive, since it involves additional bounds-checked reads from the GC heap.
* Fix warning

* Update exceptions disas results

* Fix an overflow bug in the free list's bump allocation

* Revert "Cache `TraceInfo` lookups in the DRC collector"

  This reverts commit 41dcbd931170c0e510b5baf9e0cafa19a83c0ddd.

* Use a custom hasher for the trace-info hash map

* Fix free list tests

* fix free list tests

* Really fix free list tests this time?

* Fix free list `add_capacity` on 32-bit architectures

  `Layout::from_size_align` rejects sizes greater than `isize::MAX`, causing `add_capacity` to silently discard new capacity blocks that exceed this limit. This meant the free list could not grow beyond ~2 GB on 32-bit even though our `u32` indices can address up to ~4 GB.

  Fix by calling `dealloc_impl` directly in `add_capacity`, bypassing the `Layout` construction. The block index and size are already properly aligned `u32` values, so the `Layout` validation is unnecessary for internal free-list bookkeeping.

  Also remove a redundant `debug_assert` in `dealloc_impl` that constructed a `Layout` (hitting the same `isize::MAX` limitation), since the alignment invariant is already checked by the adjacent assertions.

* Fix warnings

* fix unused import warning

* Fix `allocated_bytes` accounting after rebase
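The fixed-size `TraceInfo` cache from the first bullet (later reverted in favor of a hash map with a custom hasher) can be sketched as a small direct-mapped cache. Everything below is illustrative: `TypeIndex`, the payload fields, and the slot count are assumptions for the sketch, not Wasmtime's actual types.

```rust
// Illustrative, direct-mapped version of the fixed-size cache idea.
// `TypeIndex` and `TraceInfo` are stand-ins, not Wasmtime's real types.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct TypeIndex(u32); // stand-in for `VMSharedTypeIndex`

#[derive(Clone, Copy, Debug)]
struct TraceInfo {
    // Placeholder payload: offsets of GC refs within an object.
    gc_ref_offsets: [u32; 4],
}

// A handful of slots, "probably big enough for most things in practice",
// instead of `O(num engine types)` space per store.
const CACHE_SLOTS: usize = 8;

struct TraceInfoCache {
    slots: [Option<(TypeIndex, TraceInfo)>; CACHE_SLOTS],
}

impl TraceInfoCache {
    fn new() -> Self {
        Self { slots: [None; CACHE_SLOTS] }
    }

    fn get_or_insert_with(
        &mut self,
        ty: TypeIndex,
        compute: impl FnOnce() -> TraceInfo,
    ) -> TraceInfo {
        // Direct-mapped: each type index maps to exactly one slot.
        let slot = ty.0 as usize % CACHE_SLOTS;
        match self.slots[slot] {
            // Hit: the slot holds the info for exactly this type.
            Some((cached_ty, info)) if cached_ty == ty => info,
            // Miss: compute the info and evict whatever occupied the slot.
            _ => {
                let info = compute();
                self.slots[slot] = Some((ty, info));
                info
            }
        }
    }
}

fn main() {
    let mut cache = TraceInfoCache::new();
    let mut computes = 0;
    let info = TraceInfo { gc_ref_offsets: [8, 16, 0, 0] };
    cache.get_or_insert_with(TypeIndex(42), || { computes += 1; info });
    cache.get_or_insert_with(TypeIndex(42), || { computes += 1; info });
    // The second lookup hit the cache, so `compute` ran only once.
    assert_eq!(computes, 1);
}
```

A direct-mapped cache like this trades occasional evictions on slot collisions for constant space per store; per the later bullets, the PR ultimately replaced it with a hash map using a custom hasher.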
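The 32-bit `add_capacity` fix hinges on a documented property of `std::alloc::Layout`: `from_size_align` rejects any size that, rounded up to the alignment, exceeds `isize::MAX`. A minimal standalone demonstration of that limit (the free-list code itself is not reproduced here):

```rust
use std::alloc::Layout;

fn main() {
    // `from_size_align` rejects sizes that overflow `isize::MAX` once
    // rounded up to the alignment. On 32-bit targets `isize::MAX` is
    // ~2 GiB, so capacity blocks beyond that cannot be described by a
    // `Layout` even though `u32` indices can address up to ~4 GiB.
    let too_big = (isize::MAX as usize) + 1;
    assert!(Layout::from_size_align(too_big, 8).is_err());

    // Sizes within the limit construct fine.
    assert!(Layout::from_size_align(4096, 8).is_ok());
}
```

This is why the fix calls `dealloc_impl` directly from `add_capacity`, skipping `Layout` construction for purely internal bookkeeping.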
1 parent 7bd61b8 commit e920961

21 files changed

Lines changed: 690 additions & 512 deletions


crates/cranelift/src/func_environ/gc/enabled/drc.rs

Lines changed: 1 addition & 1 deletion
@@ -364,7 +364,7 @@ fn emit_gc_raw_alloc(
         .ins()
         .iconst(ir::types::I32, i64::from(kind.as_u32()));
 
-    let ty = builder.ins().iconst(ir::types::I32, i64::from(ty.as_u32()));
+    let ty = func_env.module_interned_to_shared_ty(&mut builder.cursor(), ty);
 
     assert!(align.is_power_of_two());
     let align = builder.ins().iconst(ir::types::I32, i64::from(align));

crates/environ/src/builtin.rs

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ macro_rules! foreach_builtin_function {
         gc_alloc_raw(
             vmctx: vmctx,
             kind: u32,
-            module_interned_type_index: u32,
+            shared_type_index: u32,
             size: u32,
             align: u32
         ) -> u32;

crates/wasmtime/proptest-regressions/runtime/vm/gc/enabled/free_list.txt

Lines changed: 1 addition & 0 deletions
@@ -5,3 +5,4 @@
 # It is recommended to check this file in to source control so that
 # everyone who runs the test benefits from these saved cases.
 cc b26e69fbaf46deb79652859039538e422818fd40b9afff63faa7aacbddecfd3d # shrinks to (capacity, ops) = (219544665809630458, [(10, Alloc(Layout { size: 193045289231815352, align: 8 (1 << 3) })), (10, Dealloc(Layout { size: 193045289231815352, align: 8 (1 << 3) }))])
+cc 174fe731edb88dd41ae77aeb8ddc0a94e09f8d8ad0849709c440c8995e639bbd # shrinks to (initial_capacity, ops) = (558656369836710805, [Alloc(292159696945657979, Layout { size: 339386803970957424, align: 2 (1 << 1) }), Dealloc(292159696945657979, Layout { size: 339386803970957424, align: 2 (1 << 1) }), AddCapacity(5898306987443335331)])

crates/wasmtime/src/runtime/store.rs

Lines changed: 7 additions & 0 deletions
@@ -2074,6 +2074,13 @@ impl StoreOpaque {
             .expect("attempted to access the store's GC heap before it has been allocated")
     }
 
+    /// Returns a mutable reference to the GC store if it has been allocated.
+    #[inline]
+    #[cfg(feature = "gc-drc")]
+    pub(crate) fn try_gc_store_mut(&mut self) -> Option<&mut GcStore> {
+        self.gc_store.as_mut()
+    }
+
     #[inline]
     pub(crate) fn gc_roots(&self) -> &RootSet {
         &self.gc_roots

crates/wasmtime/src/runtime/store/gc.rs

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ impl StoreOpaque {
         bytes_needed: Option<u64>,
         asyncness: Asyncness,
     ) {
+        log::trace!("collect_and_maybe_grow_gc_heap(bytes_needed = {bytes_needed:#x?})");
         self.do_gc(asyncness).await;
         if let Some(n) = bytes_needed
             // The gc_zeal's allocation counter will pass `bytes_needed == 0` to

crates/wasmtime/src/runtime/vm/gc.rs

Lines changed: 5 additions & 1 deletion
@@ -113,7 +113,11 @@ impl GcStore {
     pub async fn gc(&mut self, asyncness: Asyncness, roots: GcRootsIter<'_>) {
         let collection = self.gc_heap.gc(roots, &mut self.host_data_table);
         collect_async(collection, asyncness).await;
-        self.last_post_gc_allocated_bytes = Some(self.gc_heap.allocated_bytes());
+        self.last_post_gc_allocated_bytes = Some({
+            let size = self.gc_heap.allocated_bytes();
+            log::trace!("After collection, GC heap size = {size} bytes");
+            size
+        });
     }
 
     /// Get the kind of the given GC reference.
