Debugging: preserve original Wasm bytecode inside of compiled ELF artifact. (#12636)
* Debugging: preserve original Wasm bytecode inside of compiled ELF artifact.
This PR adds logic to embed the original core Wasm module(s) from a
compilation into a new ELF section, alongside other metadata
sections. When a component is compiled, the core Wasms inside are
preserved, accessible by their `StaticModuleIndex`es.
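As a rough illustration of "several core modules in one section, looked
up by index", here is a minimal, self-contained sketch. The layout
(a little-endian `u32` count, then a `u32` length before each module's
bytes) and the names `encode_modules` and `module_bytes` are assumptions
made for this example only; they are not the encoding or API that this
PR implements, and a plain `usize` stands in for `StaticModuleIndex`.

```rust
/// Hypothetical section layout (an assumption for illustration):
///   [u32 LE module count] then, per module, [u32 LE length][bytes...]
fn encode_modules(modules: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(modules.len() as u32).to_le_bytes());
    for m in modules {
        out.extend_from_slice(&(m.len() as u32).to_le_bytes());
        out.extend_from_slice(m);
    }
    out
}

/// Read a little-endian u32 at `pos`, or `None` if out of bounds.
fn read_u32(section: &[u8], pos: usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let b = section.get(pos..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Return the bytes of the module at `index` (standing in for a static
/// module index), or `None` if the index is out of range or the section
/// is truncated.
fn module_bytes(section: &[u8], index: usize) -> Option<&[u8]> {
    let count = read_u32(section, 0)? as usize;
    if index >= count {
        return None;
    }
    let mut pos = 4usize;
    for i in 0..count {
        let len = read_u32(section, pos)? as usize;
        let start = pos + 4;
        let end = start.checked_add(len)?;
        let bytes = section.get(start..end)?;
        if i == index {
            return Some(bytes);
        }
        pos = end;
    }
    None
}
```

The lookup walks the length prefixes rather than using an offset table,
which keeps the sketch short; a real format might well prefer direct
offsets.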
The need for this support arises from the guest-debugger
ecosystem. Consider either a debug
component (bytecodealliance/rfcs#45) or a bespoke debugger in native
code using Wasmtime's APIs. In either case, the existing APIs to
introspect execution state provide `Module` references for each
instance from each stack frame, and PC offsets into these `Module`s
are the way in which breakpoints are configured. The debugger will
somehow need to associate these `Module`s with the original Wasm
bytecode, including e.g. any custom sections containing the
producer-specific ways of encoding debug metadata, to do something
useful. In particular also note that the GDB-stub protocol as extended
for Wasm requires read access directly to the Wasm bytecode (it shows
up as part of a "memory map" that is viewed by the standard
read-remote-memory command); we can't delegate this requirement to the
remote end of the stub connection, but have to handle it in the stub
server that runs inside Wasmtime (as a component or bespoke).
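To make the read-remote-memory requirement concrete, the sketch below
shows how a stub server that already holds the bytecode could answer
such a read locally. Everything here is hypothetical: the
`MappedModule` type, the `read_remote_memory` function, and the idea of
a single base address per module are assumptions for illustration, not
the actual memory-map layout of the Wasm GDB-stub extension or any
Wasmtime API.

```rust
/// Hypothetical memory-map entry: one module's bytecode, assumed to be
/// visible to the debugger starting at address `base`.
struct MappedModule<'a> {
    base: u64,
    bytecode: &'a [u8],
}

/// Serve a read of up to `len` bytes at `addr` from the mapped bytecode.
/// Returns `None` if `addr` is not inside any mapped module; a read that
/// runs past the end of a module is truncated to the bytes available.
fn read_remote_memory<'a>(
    map: &[MappedModule<'a>],
    addr: u64,
    len: usize,
) -> Option<&'a [u8]> {
    for m in map {
        if addr < m.base {
            continue;
        }
        let off = addr - m.base;
        if off >= m.bytecode.len() as u64 {
            continue;
        }
        let start = off as usize;
        let end = start.saturating_add(len).min(m.bytecode.len());
        return Some(&m.bytecode[start..end]);
    }
    None
}
```

The point of the sketch is only that the server needs the bytes in hand
to answer; without them it would have to locate and open the original
file itself.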
We have two main choices: carry the original bytecode all the way
through the Wasmtime compilation pipeline and present it via
`Module::bytecode()`, ready to use; or say that this task is
out-of-scope and that the debugger top-half can find it on disk
somehow.
Unfortunately the latter ("out of scope, find the file") is somewhat
at odds with the desired developer experience:
- It means that we need some way of mapping a compiled Wasm artifact
back to a source Wasm; absent "here's the full bytecode", that means
"here's the path to the full bytecode", but that path is an
identifier that may not be universally accessible (consider
e.g. capabilities/preopens present for a debugger component) or
portable (consider e.g. moving the artifact to a different machine).
- Or we don't even provide that metadata, and require the user to
explicitly specify the same module filename twice -- once to
actually run it, and once as an argument to the debugger.
- It means that we should account for stale artifacts and mark the
mismatch somehow; e.g. if the user starts debugging with Wasmtime,
either from a `.cwasm` on disk or with one produced in-memory just
for this run, and then subsequently rebuilds their source `.wasm`,
we no longer have a reference for it. (The same problem exists one
level up if source code is edited, but source to a Wasm producer
toolchain is definitely out-of-scope for Wasmtime.)
- It means that special logic is required in the case of components to
map a module back to a specific component section (we would
essentially have to expose the static module IDs, then require the
debugger top-half to re-implement our exact flattening algorithm to
find that core module).
The permissions issue alone was enough to convince me that we should
do something better than providing a filename (why should we have to
authorize the adapter to read the user's filesystem?) but all of the
other benefits -- ensuring an exact match and ensuring perfect
availability -- are a nice bonus. The main downside is a larger
`.cwasm` (possibly substantially so), but that overhead is incurred only
when guest-debugging is enabled, and in that case the debugger needs the
bytecode anyway, so this is likely not a dealbreaker.
* miri ignore tests with compilation
* Review feedback.