Commit c07c94d

Debugging: preserve original Wasm bytecode inside of compiled ELF artifact. (#12636)
* Debugging: preserve original Wasm bytecode inside of compiled ELF artifact.

  This PR adds logic to embed the original core Wasm module(s) from a compilation into a new ELF section, alongside other metadata sections. When a component is compiled, the core Wasms inside are preserved, accessible by their `StaticModuleIndex`es.

  The need for this support arises from the guest-debugger ecosystem. Consider either a debug component (bytecodealliance/rfcs#45) or a bespoke debugger in native code using Wasmtime's APIs. In either case, the existing APIs to introspect execution state provide `Module` references for each instance from each stack frame, and PC offsets into these `Module`s are the way in which breakpoints are configured. The debugger will somehow need to associate these `Module`s with the original Wasm bytecode, including e.g. any custom sections containing the producer-specific ways of encoding debug metadata, to do something useful.

  In particular also note that the GDB-stub protocol as extended for Wasm requires read access directly to the Wasm bytecode (it shows up as part of a "memory map" that is viewed by the standard read-remote-memory command); we can't delegate this requirement to the remote end of the stub connection, but have to handle it in the stub server that runs inside Wasmtime (as a component or bespoke).

  We have two main choices: carry the original bytecode all the way through the Wasmtime compilation pipeline and present it via `Module::bytecode()`, ready to use; or say that this task is out-of-scope and that the debugger top-half can find it on disk somehow.

  Unfortunately the latter ("out of scope, find the file") is somewhat at odds with the desired developer experience:

  - It means that we need some way of mapping a compiled Wasm artifact back to a source Wasm; absent "here's the full bytecode", that means "here's the path to the full bytecode", but that path is an identifier that may not be universally accessible (consider e.g. capabilities/preopens present for a debugger component) or portable (consider e.g. moving the artifact to a different machine).
    - Or we don't even provide that metadata, and require the user to explicitly specify the same module filename twice -- once to actually run it, and once as an argument to the debugger.
  - It means that we should account for stale artifacts and mark the mismatch somehow; e.g. if the user starts debugging with Wasmtime, either from a `.cwasm` on disk or with one produced in-memory just for this run, and then subsequently rebuilds their source `.wasm`, we no longer have a reference for it. (The same problem exists one level up if source code is edited, but source to a Wasm producer toolchain is definitely out-of-scope for Wasmtime.)
  - It means that special logic is required in the case of components to map a module back to a specific component section (we would essentially have to expose the static module IDs, then require the debugger top-half to re-implement our exact flattening algorithm to find that core module).

  The permissions issue alone was enough to convince me that we should do something better than providing a filename (why should we have to authorize the adapter to read the user's filesystem?) but all of the other benefits -- ensuring an exact match and ensuring perfect availability -- are a nice bonus.

  The main downside is making the `.cwasm` larger (possibly substantially so), but this overhead is only present when enabling guest-debugging, the data has to be present anyway, and this is likely not a dealbreaker.

* miri ignore tests with compilation

* Review feedback.
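As an illustration of why a debugger wants the verbatim bytes, custom sections included, here is a minimal sketch in plain Rust (no Wasmtime dependency) that walks the section headers of a core Wasm binary and lists the names of its custom sections. The helpers `read_leb_u32` and `custom_section_names` are hypothetical names, not part of this change; a real debugger would more likely rely on a full parser such as `wasmparser`.

```rust
/// Reads an unsigned LEB128 `u32` from `bytes` at `*pos`, advancing `*pos`.
/// Hypothetical helper; returns `None` on truncation or 32-bit overflow.
fn read_leb_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        // The fifth byte may only contribute 4 bits and must end the number.
        if shift == 28 && byte > 0x0f {
            return None;
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Lists the names of all custom sections in a core Wasm module binary.
/// Hypothetical helper; returns `None` if the input is malformed.
fn custom_section_names(wasm: &[u8]) -> Option<Vec<String>> {
    // A core module starts with the magic `\0asm` and version 1.
    if !wasm.starts_with(b"\0asm\x01\0\0\0") {
        return None;
    }
    let mut pos = 8;
    let mut names = Vec::new();
    while pos < wasm.len() {
        // Each section is: one id byte, a LEB128 size, then `size` bytes.
        let id = wasm[pos];
        pos += 1;
        let size = read_leb_u32(wasm, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        let body = wasm.get(pos..end)?;
        if id == 0 {
            // Custom section: the body begins with a length-prefixed name.
            let mut p = 0;
            let name_len = read_leb_u32(body, &mut p)? as usize;
            let name = body.get(p..p.checked_add(name_len)?)?;
            names.push(String::from_utf8(name.to_vec()).ok()?);
        }
        pos = end;
    }
    Some(names)
}
```

A debugger top-half could run something like this over the slice returned for a module and then hand any producer-specific sections it recognizes to its own decoder.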
1 parent 2842552 commit c07c94d

8 files changed

Lines changed: 224 additions & 1 deletion

crates/environ/src/compile/module_artifacts.rs

Lines changed: 26 additions & 0 deletions
@@ -245,6 +245,32 @@ impl<'a> ObjectBuilder<'a> {
         dwarf.push((T::id() as u8, offset..offset + data.len() as u64));
     }

+    /// Appends the original Wasm bytecode for one or more core modules as a
+    /// pair of new ELF sections.
+    ///
+    /// `modules` is an iterator of raw Wasm binary slices, one per core
+    /// module, in `StaticModuleIndex` order.
+    pub fn append_wasm_bytecode<'b>(&mut self, modules: impl IntoIterator<Item = &'b [u8]>) {
+        let bytecode_id = self.obj.add_section(
+            self.obj.segment_name(StandardSegment::Data).to_vec(),
+            obj::ELF_WASMTIME_WASM_BYTECODE.as_bytes().to_vec(),
+            SectionKind::ReadOnlyData,
+        );
+        let ends_id = self.obj.add_section(
+            self.obj.segment_name(StandardSegment::Data).to_vec(),
+            obj::ELF_WASMTIME_WASM_BYTECODE_ENDS.as_bytes().to_vec(),
+            SectionKind::ReadOnlyData,
+        );
+        let mut end: u32 = 0;
+        for wasm in modules {
+            self.obj.append_section_data(bytecode_id, wasm, 1);
+            end = end
+                .checked_add(u32::try_from(wasm.len()).expect("module bytecode exceeds 4 GiB"))
+                .expect("total bytecode exceeds 4 GiB");
+            self.obj.append_section_data(ends_id, &end.to_le_bytes(), 4);
+        }
+    }
+
     /// Creates the `ELF_WASMTIME_INFO` section from the given serializable data
     /// structure.
     pub fn serialize_info<T>(&mut self, info: &T)
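The layout produced by `append_wasm_bytecode` can be reproduced without the `object` crate. The following is a minimal sketch, assuming two plain `Vec<u8>` buffers stand in for the two ELF sections; `build_bytecode_sections` is a hypothetical name used only for illustration.

```rust
use std::convert::TryFrom;

/// Concatenates module bytecodes and builds the matching end-offset table.
///
/// Hypothetical stand-in for the two ELF sections: the first `Vec` plays the
/// role of `.wasmtime.wasm_bytecode`, the second that of
/// `.wasmtime.wasm_bytecode_ends` (one little-endian `u32` per module).
fn build_bytecode_sections(modules: &[&[u8]]) -> (Vec<u8>, Vec<u8>) {
    let mut bytecode = Vec::new();
    let mut ends = Vec::new();
    // Running end offset into the concatenated bytecode.
    let mut end: u32 = 0;
    for wasm in modules {
        bytecode.extend_from_slice(wasm);
        let len = u32::try_from(wasm.len()).expect("module bytecode exceeds 4 GiB");
        end = end.checked_add(len).expect("total bytecode exceeds 4 GiB");
        ends.extend_from_slice(&end.to_le_bytes());
    }
    (bytecode, ends)
}
```

Storing cumulative end offsets rather than lengths means a reader can find the start of module `i` by looking at entry `i - 1`, with module 0 starting at offset 0.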

crates/environ/src/obj.rs

Lines changed: 18 additions & 0 deletions
@@ -177,6 +177,24 @@ pub const ELF_NAME_DATA: &'static str = ".name.wasm";
 /// metadata.
 pub const ELF_WASMTIME_DWARF: &str = ".wasmtime.dwarf";

+/// This is the name of the section in the final ELF image which contains the
+/// original Wasm bytecode for the module, preserved verbatim to support
+/// debugger access to the source bytecode.
+///
+/// This section is only emitted when the `guest-debug` tunable is enabled at
+/// compile time. Its contents are the concatenated raw bytes of all core
+/// module Wasm binaries in the artifact.
+pub const ELF_WASMTIME_WASM_BYTECODE: &str = ".wasmtime.wasm_bytecode";
+
+/// This is the name of the companion section to [`ELF_WASMTIME_WASM_BYTECODE`]
+/// that stores the end-offset table used to locate individual module bytecodes
+/// within the concatenated data.
+///
+/// The section contains one little-endian `u32` per core module in
+/// the artifact giving the *end* of that module's bytecode in the
+/// concatenated bytecode section above.
+pub const ELF_WASMTIME_WASM_BYTECODE_ENDS: &str = ".wasmtime.wasm_bytecode_ends";
+
 /// Workaround to implement `core::error::Error` until
 /// gimli-rs/object#747 is settled.
 pub struct ObjectCrateErrorWrapper(pub object::Error);
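To make the end-offset table format concrete, here is a minimal sketch that decodes a raw table into one byte range per module. The function `module_ranges` is a hypothetical name and the validation it performs (length multiple of four, non-decreasing offsets) is an assumption of this sketch rather than something the commit enforces.

```rust
/// Decodes a raw end-offset table (little-endian `u32` entries) into one
/// byte range per module within the concatenated bytecode.
///
/// Hypothetical helper: returns `None` if the table length is not a multiple
/// of four or if the offsets ever decrease.
fn module_ranges(ends: &[u8]) -> Option<Vec<std::ops::Range<usize>>> {
    if ends.len() % 4 != 0 {
        return None;
    }
    let mut ranges = Vec::with_capacity(ends.len() / 4);
    // Module 0 starts at offset 0; each later module starts where the
    // previous one ended.
    let mut start = 0usize;
    for chunk in ends.chunks_exact(4) {
        let end = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]) as usize;
        if end < start {
            return None;
        }
        ranges.push(start..end);
        start = end;
    }
    Some(ranges)
}
```

For example, a table holding the two entries 3 and 5 describes a 3-byte module followed by a 2-byte module.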

crates/wasmtime/src/compile.rs

Lines changed: 19 additions & 0 deletions
@@ -119,6 +119,10 @@ pub(crate) fn build_module_artifacts<T: FinishedObject>(
         dwarf_package,
     )?;

+    if tunables.debug_guest {
+        object.append_wasm_bytecode(std::iter::once(wasm));
+    }
+
     let (info, index) = compilation_artifacts.unwrap_as_module_info();
     let types = types.finish();
     object.serialize_info(&(&info, &index, &types));
@@ -181,6 +185,16 @@ pub(crate) fn build_component_artifacts<T: FinishedObject>(
         t.module.needs_gc_heap |= needs_gc_heap
     }

+    // Collect bytecode slices here before moving `module_translations` below.
+    let module_wasms = if tunables.debug_guest {
+        module_translations
+            .values()
+            .map(|t| t.wasm)
+            .collect::<Vec<_>>()
+    } else {
+        vec![]
+    };
+
     let mut object = compiler.object(ObjectKind::Component)?;
     engine.append_compiler_info(&mut object)?;
     engine.append_bti(&mut object);
@@ -192,6 +206,11 @@ pub(crate) fn build_component_artifacts<T: FinishedObject>(
         module_translations,
         None, // TODO: Support dwarf packages for components.
     )?;
+
+    if tunables.debug_guest {
+        object.append_wasm_bytecode(module_wasms);
+    }
+
     let (types, ty) = types.finish(&component.component);

     let info = CompiledComponentInfo {

crates/wasmtime/src/runtime/code.rs

Lines changed: 8 additions & 0 deletions
@@ -10,6 +10,7 @@ use alloc::sync::Arc;
 use core::ops::{Add, Range, Sub};
 use wasmtime_environ::DefinedFuncIndex;
 use wasmtime_environ::ModuleTypes;
+use wasmtime_environ::StaticModuleIndex;
 #[cfg(feature = "component-model")]
 use wasmtime_environ::component::ComponentTypes;

@@ -227,6 +228,13 @@ impl EngineCode {
         self.original_code.wasm_dwarf()
     }

+    /// Returns the original Wasm bytecode section if preserved in the
+    /// compiled artifact.
+    #[inline]
+    pub fn wasm_bytecode_for_module(&self, module: StaticModuleIndex) -> Option<&[u8]> {
+        self.original_code.wasm_bytecode_for_module(module)
+    }
+
     /// Returns the raw image as bytes (in our internal image format).
     pub fn image(&self) -> &[u8] {
         &self.original_code.mmap()[..]

crates/wasmtime/src/runtime/code_memory.rs

Lines changed: 51 additions & 1 deletion
@@ -5,13 +5,14 @@ use crate::prelude::*;
 use crate::runtime::vm::MmapVec;
 use alloc::sync::Arc;
 use core::ops::Range;
-use object::SectionIndex;
 use object::read::elf::SectionTable;
+use object::{LittleEndian, SectionIndex, U32Bytes};
 use object::{
     elf::{FileHeader64, SectionHeader64},
     endian::Endianness,
     read::elf::{FileHeader as _, SectionHeader as _},
 };
+use wasmtime_environ::StaticModuleIndex;
 use wasmtime_environ::{Trap, lookup_trap_code, obj};
 use wasmtime_unwinder::ExceptionTable;

@@ -45,6 +46,8 @@ pub struct CodeMemory {
     func_name_data: Range<usize>,
     info_data: Range<usize>,
     wasm_dwarf: Range<usize>,
+    wasm_bytecode: Range<usize>,
+    wasm_bytecode_ends: Range<usize>,
 }

 impl Drop for CodeMemory {
@@ -153,6 +156,8 @@ impl CodeMemory {
         let mut func_name_data = 0..0;
         let mut info_data = 0..0;
         let mut wasm_dwarf = 0..0;
+        let mut wasm_bytecode = 0..0;
+        let mut wasm_bytecode_ends = 0..0;
         for section_header in sections.iter() {
             let data = section_header
                 .data(endian, mmap_data)
@@ -212,6 +217,8 @@ impl CodeMemory {
                 obj::ELF_NAME_DATA => func_name_data = range,
                 obj::ELF_WASMTIME_INFO => info_data = range,
                 obj::ELF_WASMTIME_DWARF => wasm_dwarf = range,
+                obj::ELF_WASMTIME_WASM_BYTECODE => wasm_bytecode = range,
+                obj::ELF_WASMTIME_WASM_BYTECODE_ENDS => wasm_bytecode_ends = range,

                 #[cfg(feature = "debug-builtins")]
                 ".debug_info" => has_native_debug_info = true,
@@ -269,6 +276,8 @@ impl CodeMemory {
             wasm_dwarf,
             info_data,
             wasm_data,
+            wasm_bytecode,
+            wasm_bytecode_ends,
         })
     }

@@ -332,6 +341,17 @@ impl CodeMemory {
         &self.mmap[self.frame_tables_data.clone()]
     }

+    /// Returns the concatenated Wasm bytecode section, or an empty slice if
+    /// the artifact was not compiled with `guest-debug` enabled.
+    pub fn wasm_bytecode(&self) -> &[u8] {
+        &self.mmap[self.wasm_bytecode.clone()]
+    }
+
+    /// Returns the Wasm bytecode section end-offset array.
+    pub fn wasm_bytecode_ends(&self) -> &[u8] {
+        &self.mmap[self.wasm_bytecode_ends.clone()]
+    }
+
     /// Returns the contents of the `ELF_WASMTIME_INFO` section, or an empty
     /// slice if it wasn't found.
     #[inline]
@@ -346,6 +366,36 @@ impl CodeMemory {
         &self.mmap[self.trap_data.clone()]
     }

+    /// Returns the Wasm bytecode section end-offset for a given core
+    /// module, or `None` if no bytecode is present.
+    ///
+    /// # Panics
+    ///
+    /// Panics if `index` is out of range.
+    fn wasm_bytecode_end_for_module(&self, index: StaticModuleIndex) -> Option<usize> {
+        if self.wasm_bytecode_ends().is_empty() {
+            return None;
+        }
+        let ends = self.wasm_bytecode_ends();
+        let count = ends.len() / core::mem::size_of::<u32>();
+        let (ends, _) = object::slice_from_bytes::<U32Bytes<LittleEndian>>(ends, count)
+            .expect("Invalid alignment of `ends` section");
+        let index = usize::try_from(index.as_u32()).unwrap();
+        Some(usize::try_from(ends[index].get(LittleEndian)).unwrap())
+    }
+
+    /// Returns the Wasm bytecode for a core module in this
+    /// artifact, or `None` if bytecode was not preserved.
+    pub(crate) fn wasm_bytecode_for_module(&self, index: StaticModuleIndex) -> Option<&[u8]> {
+        let start = if index.as_u32() == 0 {
+            0
+        } else {
+            self.wasm_bytecode_end_for_module(StaticModuleIndex::from_u32(index.as_u32() - 1))?
+        };
+        let end = self.wasm_bytecode_end_for_module(index)?;
+        Some(&self.wasm_bytecode()[start..end])
+    }
+
     /// Publishes the internal ELF image to be ready for execution.
     ///
     /// This method can only be called when the image is not published (its
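The lookup performed by `wasm_bytecode_for_module` can be sketched over plain byte slices. This is a minimal sketch under stated assumptions: `bytecode_for_module` is a hypothetical free function, it takes the two section contents as arguments instead of reading them from a `CodeMemory`, and unlike the code in this commit it returns `None` for an out-of-range index instead of panicking.

```rust
/// Returns the bytecode of module `index`, given the concatenated bytecode
/// and the raw end-offset table (little-endian `u32` entries).
///
/// Hypothetical, non-panicking variant: an empty table, an out-of-range
/// index, or inconsistent offsets all yield `None`.
fn bytecode_for_module<'a>(bytecode: &'a [u8], ends: &[u8], index: usize) -> Option<&'a [u8]> {
    // Reads table entry `i`, or `None` if it lies outside the table.
    let end_at = |i: usize| -> Option<usize> {
        let off = i.checked_mul(4)?;
        let b = ends.get(off..off.checked_add(4)?)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
    };
    // The first module starts at 0; others start at the previous end.
    let start = if index == 0 { 0 } else { end_at(index - 1)? };
    let end = end_at(index)?;
    bytecode.get(start..end)
}
```

Whether to panic or return `None` on a bad index is a design choice; the panicking form in the diff treats an out-of-range `StaticModuleIndex` as an internal invariant violation, while this sketch treats the sections as untrusted input.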

crates/wasmtime/src/runtime/instantiate.rs

Lines changed: 6 additions & 0 deletions
@@ -261,6 +261,12 @@ impl CompiledModule {
     pub fn has_address_map(&self) -> bool {
         !self.engine_code.address_map_data().is_empty()
     }
+
+    /// Returns the original Wasm bytecode for this module, if it is available.
+    pub fn bytecode(&self) -> Option<&[u8]> {
+        self.engine_code
+            .wasm_bytecode_for_module(self.module.module_index)
+    }
 }

 #[cfg(feature = "addr2line")]

crates/wasmtime/src/runtime/module.rs

Lines changed: 12 additions & 0 deletions
@@ -688,6 +688,18 @@ impl Module {
         Some(&module.strings[name])
     }

+    /// Returns the original Wasm bytecode for this module, if it is
+    /// available.
+    ///
+    /// Bytecode is only retained when the [`Engine`] was configured with
+    /// `guest-debug` support enabled (see [`Config::guest_debug`]). Returns
+    /// `None` when the module was compiled without that option.
+    ///
+    /// [`Config::guest_debug`]: crate::Config::guest_debug
+    pub fn debug_bytecode(&self) -> Option<&[u8]> {
+        self.compiled_module().bytecode()
+    }
+
     /// Returns the list of imports that this [`Module`] has and must be
     /// satisfied.
     ///

tests/all/debug.rs

Lines changed: 84 additions & 0 deletions
@@ -1116,3 +1116,87 @@ async fn invalidated_frame_handles_in_dropped_future() -> wasmtime::Result<()> {

     Ok(())
 }
+
+#[test]
+#[cfg_attr(miri, ignore)]
+fn module_bytecode() -> wasmtime::Result<()> {
+    let wasm = wat::parse_str(
+        r#"
+        (module
+            (func (export "add") (param i32 i32) (result i32)
+                local.get 0
+                local.get 1
+                i32.add
+            )
+        )
+        "#,
+    )
+    .unwrap();
+
+    let mut config = Config::default();
+    config.guest_debug(true);
+    let engine = Engine::new(&config)?;
+    let module = Module::new(&engine, &wasm)?;
+
+    assert_eq!(module.debug_bytecode(), Some(&wasm[..]));
+
+    Ok(())
+}
+
+#[test]
+#[cfg_attr(miri, ignore)]
+fn module_bytecode_absent_without_debug() -> wasmtime::Result<()> {
+    let wasm = wat::parse_str("(module)").unwrap();
+
+    let mut config = Config::default();
+    config.guest_debug(false);
+    let engine = Engine::new(&config)?;
+    let module = Module::new(&engine, &wasm)?;
+
+    assert_eq!(module.debug_bytecode(), None);
+
+    Ok(())
+}
+
+#[test]
+#[cfg_attr(miri, ignore)]
+fn component_bytecode() -> wasmtime::Result<()> {
+    use wasmtime::component::{Component, Linker};
+
+    // Build the bytecode for each core module by compiling them
+    // standalone.
+    let m1_body = r#"(func (export "f1") (result i32) i32.const 42)"#;
+    let m2_body = r#"(func (export "f2") (result i32) i32.const 99)"#;
+    let m1_wasm = wat::parse_str(&format!("(module $m1 {m1_body})")).unwrap();
+    let m2_wasm = wat::parse_str(&format!("(module $m2 {m2_body})")).unwrap();
+
+    // Build a component that embeds both core modules inline.
+    let component_wasm = wat::parse_str(&format!(
+        r#"(component
+            (core module $m1 {m1_body})
+            (core instance $i1 (instantiate (module $m1)))
+            (core module $m2 {m2_body})
+            (core instance $i2 (instantiate (module $m2))))
+        "#,
+    ))
+    .unwrap();
+
+    let mut config = Config::default();
+    config.guest_debug(true);
+    let engine = Engine::new(&config)?;
+
+    let component = Component::new(&engine, &component_wasm)?;
+    let linker: Linker<()> = Linker::new(&engine);
+    let mut store = Store::new(&engine, ());
+    linker.instantiate(&mut store, &component)?;
+
+    let modules = store.debug_all_modules();
+    assert_eq!(modules.len(), 2);
+
+    // Modules should be registered in offset order. The API doesn't
+    // guarantee this, but this suffices for a test.
+    assert_eq!(modules[0].debug_bytecode().unwrap(), &m1_wasm[..]);
+    assert_eq!(modules[1].debug_bytecode().unwrap(), &m2_wasm[..]);
+
+    Ok(())
+}
