An ahead-of-time compiler that turns Lua 5.5 source into standalone WebAssembly modules — no interpreter, no bytecode VM, no bundled garbage collector.
lua2wasm leans on the modern WebAssembly type system (the garbage collector, typed-reference,
and exception-handling). There is no linear memory, no bundled collector, and no bytecode interpreter.
Lua tables become real WebAssembly structs, closures become typed function references, and the host's garbage collector (V8 / SpiderMonkey) owns every value.
Live playground: https://rhmoller.github.io/lua2wasm/
Type Lua on the left and see the result on the right — the page compiles and runs your program entirely in the browser, with nothing to download. Flip the output to Show WAT to read the generated WebAssembly text. Works in any recent Chrome, Edge, Firefox, or Safari (see Targets).
It integrates with just-bash so even the os and io packages can be used on the web.
Prefer the command line? See Build & run locally.
-- A "Stack" class — exercises tables, metatables, OO sugar, closures,
-- errors, and pcall in one short program.
local Stack = {} -- a Lua table → a `$LuaTable` struct:
Stack.__index = Stack -- array part + open-addressed hash + `meta` slot
function Stack.new()
return setmetatable({n = 0}, Stack) -- writes Stack into the struct's `meta` field
end -- `Stack.new` itself is a `(ref $LuaClosure)` =
-- struct (funcref code, upvalue array)
function Stack:push(v) -- `:` desugars to `self` as the first parameter
self.n = self.n + 1 -- small ints unbox to `i31ref` — zero allocation
self[self.n] = v -- integer keys land in the dense array part
end
function Stack:pop()
if self.n == 0 then
error("stack underflow") -- compiles to `throw $LuaError` (anyref payload)
end
local v, n = self[self.n], self.n
self[n], self.n = nil, n - 1
return v
end
local s = Stack.new()
s:push("hi"); s:push("from"); s:push("lua2wasm")
-- each string literal materialises once via `array.new_data` from a shared data segment
for i = 1, s.n do print(s[i]) end -- numeric `for` keeps `i` as a raw `i64`;
-- the call to `print` dispatches via `call_ref`
local ok, err = pcall(function() -- `pcall` → `try_table (catch $LuaError ...)`
for _ = 1, 4 do s:pop() end -- the 4th `pop` underflows and throws
end)
print(ok, err) --> false <src>:19: stack underflowThat compiles to a ~13 KB .wasm with --tree-shake (~42 KB out of the box).
Every value is a host-GC object — the host's garbage collector owns lifetimes,
there is no linear memory and no bundled allocator. The Stack instance is a
real $LuaTable struct (array + hash + meta slot, all GC fields); the
:push / :pop methods are (ref $LuaClosure) values carrying a typed
funcref plus an upvalue array; every call dispatches through call_ref,
the WASM indirect-call instruction for typed function references; and
error / pcall ride directly on the exception-handling proposal — a real
throw $LuaError thrown across frames and caught by a try_table, with no
setjmp/longjmp emulation. The full mapping from each Lua construct to the
WASM feature it lowers to is in
How we use modern WebAssembly.
lua2wasm is an AOT compiler. There's no Lua interpreter sitting around at
runtime; what you write is what the compiler statically lowers to WASM
instructions.
Most Lua code will compile and run on lua2wasm. The two primary limitations are:
- no runtime compilation or interpreter, so
loadandload-fileare not supported. - no coroutines (waiting for stack switching support in WebAssembly standard)
| Area | Supported | Not yet |
|---|---|---|
| Values | nil, booleans, integers, floats, strings, tables, first-class functions / closures |
userdata, threads |
| Numbers | int + float subtypes with Lua-compliant promotion (/ always float, + - * keep int if both ints, // floor-div, % floor-mod, ^ always float, hex int 0xFF / 0Xff (wraps mod 2^64 like reference Lua — 0x13121110090807060504030201 denotes its low 64 bits), hex float 0x1.8p3, bitwise & | ~ << >> and unary ~ on 64-bit ints with float-to-integer coercion) |
— |
| Strings | single / double-quoted with full escape set (\n \t \\ \" \' \0 \a \b \f \r \v \xHH \ddd \u{…} \z \<line-break>), long-bracket [[ ... ]] and level-N [=[ ... ]=], concat .. (with __concat), length # (with __len), structural equality |
— |
| Tables | array part + hash part, positional / named / [expr]= constructors, t.k and t[k] read+write, nil-assignment delete, #t border rule, nesting, identity equality via ref.eq (table keys work as in Lua) |
metatable performance tricks |
| Locals | local x, local x, y, z = …, lexical block scoping, shadowing, <const> (compile-time reassignment error) and <close> (calls __close(value, nil) in reverse declaration order at natural block exit; nil/false skip) |
<close> cleanup on return/break/error exits inside the block |
| Globals | Lua-traditional implicit globals — num = 42 at top level just works; reading an undeclared name yields nil. Explicit global x declarations also supported. _G is a real $LuaTable aliasing every global: builtins, library tables, user globals all live there, _G._G == _G, pairs(_G) iterates them. Reassigning a builtin (e.g. print = my_print) writes a new _G entry — the original closure remains reachable through getmetatable / locals capturing it. |
strict mode (global <const> * opt-in) |
| Statements | local, local function, top-level function f() end incl. dotted (function T.x.y() end) and method (function T:m() end) forms, multi-assign, if/elseif/else, while, for i = a,b[,c], generic for k[,v,…] in …, repeat ... until, break, goto NAME / ::NAME::, bare do, expression-statement, return e1, … |
— |
| Operators | + - * / // % ^, & | ~ << >> (binary) and ~ (unary, bitwise NOT), == ~= < <= > >=, and or not, .., # |
— |
| Functions | N-ary arguments, multiple return values (return a, b, c), upvalue capture (mutable shared boxes), transitive captures, method-call sugar obj:m(args), paren-less single-arg call f"x" / f{k=1}, varargs function f(...) / function f(a, ...) with ... spliced into call args, returns, table constructors, and multi-assign, proper tail calls (return f(...) → return_call_ref, doesn't grow the stack) |
— |
| Errors | error(v) / pcall(f, …) / assert(v[, msg]) lowered to WASM exception handling (throw $LuaError + try_table). Internal builtins raise spec-shaped messages with the standard <src>:<line>: position prefix — attempt to perform arithmetic, attempt to index a value, invalid UTF-8 code, data does not fit, out of limits, variable-length format, not power of 2, table overflow (raised before WASM array-size traps so pcall can catch runaway allocations), stack overflow (a $call_depth guard raises it catchably before deep non-tail recursion exhausts the host call stack), etc. |
full traceback (debug.traceback returns the current frame only) |
| Metatables | setmetatable / getmetatable, __index (table chain and function form, with cycle limit), __newindex (table chain and function form), __add __sub __mul __div __mod __pow __unm __idiv __band __bor __bxor __shl __shr __bnot, __concat, __len, __eq __lt __le, __call, __tostring, __name (renames the type in tostring and type-aware error messages), __metatable (protect) |
__close on early exits (return/break/goto/error; natural block exit works), __gc (no finalizers in WasmGC), __mode (no weak refs in WasmGC) |
| Standard lib | print, error (full position prefix <src>:<line>: via the level arg; level 0 disables it), pcall, xpcall, warn, assert (auto-prefixes too), select, type, tostring, tonumber, ipairs, pairs, next, setmetatable, getmetatable, rawequal, rawlen, rawget, rawset, collectgarbage (stub — no real Lua GC, but "incremental"/"generational" round-trip the previous mode), _G, _VERSION; debug.{traceback, getmetatable, setmetatable, gethook}; require + package.{loaded, preload} (static -m FILE modules at compile time — see CLI section); io.{write, read, output, input} (io.output([f]) / io.input([f]) query or set the default file; bare io.write / io.read route through it) plus io.stdout / io.stderr / io.stdin file handles with :write / :read / :close / :flush methods (close/flush are no-ops on the standard streams; stderr writes route to the host's stderr); real files via io.open / io.lines / io.type returning handles with :read (l/L/a/n/N-byte) / :write / :seek (set/cur/end) / :flush / :close / :lines — backed by node:fs under Node and by just-bash's in-memory FS in the playground (bridged through JSPI, persists for the page session); os.{time, clock, date, difftime, getenv, exit, execute, remove, rename, tmpname, setlocale} — light shims over the host. os.difftime(t2,t1) returns the second difference as a float; os.setlocale supports the portable "C" locale (other locale names → nil). os.time/os.clock are real, os.date covers a strftime subset (%Y %m %d %H %M %S %j %p %w %y %c %x %X %%) plus the *t / !*t table form, os.execute() reports "shell available" and os.execute(cmd) returns (nil, "exit", 1), os.{remove, rename} act on the host filesystem and return (nil, msg) on failure; math.{floor, ceil, abs, sqrt, min, max, sin, cos, tan, asin, acos, atan, exp, log, pi, huge, deg, rad, fmod, modf, tointeger, type, ult, maxinteger, mininteger, random, randomseed} (atan/log accept a second arg; numeric-string args coerce like the operators, and a non-number raises a catchable number expected); string.{len, sub, format, upper, lower, reverse, rep, byte, char, find, match, gmatch, gsub, pack, unpack, packsize} (format covers all of %s %d %i %u %o %x %X %c %q %e %E %f %F %g %G %a %A %% with flags - + space # 0, width, precision; pattern subsystem covers every documented class, set, quantifier, anchor, capture, back-ref, %bxy balanced match, %f[set] frontier; gsub accepts string/table/function repl; pack/unpack/packsize cover every option in §6.5.2 — < > = ! [n] b B h H l L j J T i[N] I[N] f d n c[N] z s[N] x Xop — and round-trip byte-for-byte with reference Lua, including data does not fit overflow detection on >8-byte ints in either direction); table.{insert, remove, concat, unpack, pack, move, create, sort}; utf8.{char, len, codepoint, offset, codes, charpattern} (codes raises invalid UTF-8 code with file:line on bad bytes); load returns (nil, "no load") since lua2wasm is AOT — no runtime compiler ships with the module; loadfile is not implemented |
real coroutines |
| Coroutines | — | blocked on the WASM stack-switching proposal shipping in browsers |
Browse tests/fixtures/ to see what valid programs look
like across each capability area.
The whole point of lua2wasm is to take the new WASM proposals seriously —
not as a portable assembler with a hand-rolled allocator on top, but as a
managed runtime that already has most of what a dynamic language needs.
| WASM feature | What we use it for |
|---|---|
GC: struct and array types |
Every Lua value is a host-GC object. $LuaString wraps (array i8), $LuaTable is a struct of (keys, vals, n, cap, meta), $LuaClosure is a struct of (funcref, upvalues). No linear memory. No bundled allocator. The browser's GC owns lifetime. |
i31ref |
Unboxed small integers. Lua ints in the 31-bit range live as tagged immediates with zero allocation; only overflowed ints get boxed in $LuaInt. |
Typed function references + call_ref |
Closures are real references, not table-of-functions indices. Every function call goes through call_ref on a (ref $LuaFn) extracted from the closure struct. |
return_call_ref (tail calls) |
return f(...) lowers to return_call_ref. Deep recursion (e.g. 20 000-step countdown) doesn't grow the WASM call stack — a property the JS embedding can't offer. |
Mutually-recursive types (rec blocks) |
$LuaClosure references $LuaFn, $LuaFn mentions $LuaClosure. We declare them in a single recursion group so the type system accepts the cycle. Same trick for $LuaTable referencing itself via its metatable field. |
| Reference-type tests / casts | Dynamic dispatch (e.g. print of arbitrary values, or + falling back to __add) uses ref.test (ref $LuaTable), ref.cast, ref.is_null to switch on the type without a tag word. |
Exception handling (tag + throw + try_table) |
error(v) is throw $LuaError v carrying the error as an anyref payload. pcall(f, ...) is try_table with a single catch label that lands the error value on a block exit. Real call-stack unwinding, no setjmp/longjmp emulation. |
array.new_data from data segments |
String literals are materialized in a single shared data segment; constructing a $LuaString is one array.new_data instruction that copies the byte range out. |
array.copy between GC arrays |
String concat and table-array resizing copy ranges between GC-managed (array …) instances directly — no manual loop, no memcpy, no linear-memory staging. |
anyref + null tracking in the type system |
Lua values flow as anyref. Non-nullable refs ((ref $X) vs (ref null $X)) are tracked separately so the validator catches whole classes of NPE-style bugs in our generated code at module-instantiation time. |
(start) not used |
We deliberately don't run code at instantiation time, so the JS host can wire up its decoder helpers before main() is called — otherwise imports couldn't see the module's own exports. |
JS Promise Integration (WebAssembly.Suspending / WebAssembly.promising) |
Playground only. io.read and the filesystem imports (io.open/io.lines/os.remove/…, backed by just-bash's async in-memory FS) are wrapped as suspending imports and main as a promising export, so synchronous Lua I/O transparently awaits the async host underneath. No change to the compiled .wasm; pure host-side. Falls back to EOF / soft I/O failure when the host lacks JSPI. |
The practical consequence: a typical compiled module is a few KB. The
host has zero Lua-specific runtime; everything that is the Lua VM lives in
the produced .wasm. A program that defines a closure and calls it once
fits in 5 KB; the full milestone-8 OO demo fits in 5.5 KB.
Does the "no linear memory, everything is a host-GC object" model cost
runtime speed? It doesn't have to. Hand-writing the ideal WAT for a
numeric loop — plain i64/f64 locals, no boxing — runs at par with (or
faster than) reference Lua 5.5's bytecode interpreter on V8. The baseline gap
is representation overhead, not the execution substrate: boxed floats,
per-call argument/result arrays, and generic anyref dispatch through runtime
helpers. So it's recoverable by generating better code.
A numeric/call specialization pass closes it for monomorphic code, and it is
on by default — -O0 selects the boxed fallback, which is behaviour-
identical, just slower. Measured under Node/V8, timed inside the script with
os.clock() so process startup is excluded (numbers vary by machine; compare
the columns, not the absolutes):
| tight-loop workload | reference lua5.5 |
lua2wasm (-O0, boxed) |
lua2wasm (default) |
|---|---|---|---|
| 20M integer ops | 0.07 s | 0.36 s | 0.05 s |
| 20M float ops | 0.05 s | 0.38 s | 0.05 s |
| 20M function calls | 0.22 s | 0.55 s | 0.04 s |
recursive fib(34) |
0.18 s | 0.31 s | 0.05 s |
5M t[i]=i then sum |
0.06 s | 0.24 s | 0.14 s |
What the pass does, all within the WasmGC model (no linear memory, no deopt):
- Unboxes locals, parameters, returns, and recursion. A whole-program
fixpoint infers which slots, function parameters, and return values are
monomorphically
intorfloat, and declares them as rawi64/f64. Integer and float are kept distinct (Lua semantics:3*3is9,3.0*3.0is9.0), so a value seen as both stays boxed. - Specializes arithmetic and comparisons to native
i64/f64opcodes; comparisons inif/while/repeatuse thei32result directly, skipping the boxed boolean and the truthiness helper. - Direct calls. A call to a statically-bound local function — including a
self-recursive one — goes to a typed entry that takes unpacked, unboxed
arguments and returns an unboxed value, skipping the per-call argument and
result arrays. End to end,
s = s + f(i)in a loop compiles to ani64.addover a direct call with no allocation.
Independently, tables use a hybrid array + hash representation (always on,
not gated): integer keys 1..n live in a dense array part for O(1) sequential
access, with everything else in an open-addressing hash — the same shape
reference Lua uses. Genuinely dynamic table values still box, so a sum += t[i]
loop stays a few times off reference; numeric scalar code is where the model
shines.
The honest ceiling: genuinely dynamic float or big-integer code must box
(WasmGC has no way to put a raw f64/64-bit int in an anyref, and NaN-boxing
needs linear memory — which this project forbids), and an AOT compiler with no
deoptimization can't speculate the way a tracing JIT does. Conservative static
specialization plus a boxed generic fallback is the model's ceiling — and for
ordinary numeric and call-heavy code, that ceiling is at or below reference Lua.
Anything with current WASM-GC + reference-types + exception-handling support runs every compiled module:
- Chrome / Edge ≥ 137 — works out of the box, no flags
- Firefox ≥ 131 — works out of the box, no flags
- Safari ≥ 18.4 — works out of the box, no flags
- Node ≥ 22 — needs
--experimental-wasm-exnref(still gated as of Node 24); future Node releases are expected to default it on
The new exception-handling proposal (with exnref / try_table) is the
only opcode family in our compiled output that's still flag-gated
anywhere. It's shipped in every current browser by default; the Node
holdout is a runtime config detail, not a missing implementation.
runtime/playground.html uses JavaScript
Promise Integration (WebAssembly.Suspending + WebAssembly.promising)
to let io.read suspend the running wasm while it awaits a typed line in
the output pane. Status is the same as exnref:
- Chrome / Edge ≥ 137 — default-on
- Firefox / Safari recent — default-on (track the JSPI proposal for the latest)
- Node — behind
--experimental-wasm-jspi(not used by our CLI host)
When JSPI is unavailable the playground silently falls back to returning
EOF from io.read — every other feature keeps working. Nothing in the
compiled .wasm itself depends on JSPI; it's purely a host-side feature
that lets a synchronous-looking wasm import resolve a promise.
Compiled modules need no other runtime files. They import "host" for print
only — and even that can be replaced with whatever host imports your
embedding cares about.
The command-line flow needs no Emscripten and no Binaryen — just
clang, cmake, and Node ≥ 22. The WAT→wasm assembler is built into the
compiler.
# 1. Build the compiler (a native binary)
CC=clang cmake -S . -B build -G Ninja
cmake --build build
# 2. Compile a Lua program straight to a .wasm module
echo 'print("hello from " .. "lua2wasm")' > /tmp/hello.lua
./build/lua2wasm /tmp/hello.lua -o /tmp/hello.wasm
# 3. Run it
node --experimental-wasm-exnref runtime/host.mjs /tmp/hello.wasm
# → hello from lua2wasmUse -o /tmp/hello.wat to emit human-readable text instead. Run the full test
suite with ctest --test-dir build --output-on-failure.
Requirements (required vs optional):
| Tool | Version | Required for |
|---|---|---|
clang |
≥ 19 | building the native compiler binary (C23 #embed requires clang ≥ 19) |
cmake |
≥ 3.25 | building the native compiler binary |
node |
≥ 22 | running compiled .wasm modules from the command line (Browser hosts work equivalently, no Node needed there) |
binaryen |
recent | optional — wasm-as is used only as a differential oracle by the wat2wasm unit tests; the compiler ships its own WAT→wasm assembler |
emcc |
≥ 4.0 | optional — only for cross-compiling the compiler itself to WASM so the playground page can call it. Skip if you don't need the playground. |
./build/lua2wasm input.lua -o output.wasm # binary module (use -o output.wat for text)-o .wasm output runs dead-code elimination by default — prelude functions
and globals the program can't reach are dropped, along with the function-type
signatures left orphaned once those functions are gone (all behavior-preserving).
Add --tree-shake to also prune builtins the program never names literally, which
shrinks modules dramatically (e.g. print("hi"): 42 KB → ~7.6 KB) at the cost of
dynamic _G["name"] lookups of un-named builtins returning nil. As a further
--tree-shake win, a program that writes no tables of its own has _G and the
library tables populated by an append-only bootstrap insert, so the table
grow/rehash write path is dead-code eliminated entirely. --no-dce disables the
DCE pass.
Run under Node:
node --experimental-wasm-exnref runtime/host.mjs output.wasmMulti-file programs are linked at compile time via -m:
./build/lua2wasm main.lua -m util.lua -m wrap.lua -o output.watEach -m FILE becomes a require()-able module keyed by basename(FILE)
sans .lua. require("util") in any source (entry or module) walks
package.preload, calls the loader on first hit, and caches the result
in package.loaded — the standard Lua semantics, statically wired.
The numeric/call specialization pass (see Performance) is on by
default. To emit the boxed fallback instead — every value a host-GC object, no
unboxed i64/f64 — compile with -O0:
./build/lua2wasm input.lua -O0 -o output.watscripts/package-html.sh wraps a compiled .wasm into a single
self-contained HTML page — base64-embeds the module plus a tiny host loader,
about 10 KB of HTML overhead. The result needs no server and no other
files.
./scripts/package-html.sh output.wasm -o output.html
# open output.html — runs in any GC-capable browserThe live playground above is prebuilt, but you can build it yourself if you have Emscripten. It cross-compiles the compiler itself to WASM so the page compiles Lua entirely client-side: a CodeMirror editor on one side, output on the other, with Run to execute and Show WAT to inspect the generated WebAssembly (foldable prelude / stdlib / user code / data sections). Catppuccin theming with a sun/moon toggle between Mocha (dark) and Latte (light).
. ~/path/to/emsdk/emsdk_env.sh
./scripts/build-wasm.sh # produces build-em/lua2wasm.{js,wasm}
python3 -m http.server 8000
# open http://localhost:8000/runtime/playground.htmlEmscripten is needed only for this page — every other path in this README works without it.
flowchart LR
src["Lua source"] --> lex[lexer]
lex --> par[parser]
par --> cg[codegen]
cg --> wa["wat2wasm<br/>(built-in assembler)"]
wa --> out[".wasm"]
prelude[("static WAT prelude<br/>type defs · runtime<br/>helpers · builtin<br/>closures · stdlib init")] -.embedded.-> cg
scope[("scope analysis<br/>function-frame stack →<br/>local / upvalue /<br/>global / builtin slots")] -.feeds.-> par
classDef stage fill:#1e1e2e,stroke:#cba6f7,stroke-width:1.5px,color:#cdd6f4
classDef ioNode fill:#11111b,stroke:#89b4fa,stroke-width:1.5px,color:#cdd6f4
classDef helper fill:#181825,stroke:#7f849c,stroke-dasharray:3 3,color:#bac2de
class lex,par,cg,wa stage
class src,out ioNode
class prelude,scope helper
| Source file | Job |
|---|---|
src/lexer.{c,h} |
Hand-written lexer for the full Lua 5.5 lexical surface |
src/parser.{c,h} |
Recursive-descent + Pratt expressions; scope and upvalue analysis |
src/ast.{c,h} |
Tagged-union AST with a bump-allocator pool |
src/codegen.{c,h} |
Emits WAT to a WatBuilder; embeds a static runtime prelude |
src/builtins.{c,h} |
Single source of truth for builtin names → wasm function symbols |
src/wat_builder.{c,h} |
Dynamic string buffer for WAT emission |
src/wat2wasm.{c,h} |
Self-contained WAT→wasm binary assembler (library + standalone wat2wasm CLI; no Binaryen) |
src/emscripten_entry.c |
One-function entry point used when the compiler is itself compiled to WASM for the playground |
runtime/host.mjs |
Reference host: instantiates a compiled module and renders print output |
runtime/playground.html |
CodeMirror editor + in-browser compile + built-in wat→wasm + execute |
tests/ |
µnit unit tests + bash end-to-end fixtures + the wat2wasm assembler harness (currently 131 in CTest, all green); scripts/smoke-official-tests.sh runs the AOT pipeline over every file in official-tests/lua-5.5.0-tests/ for a compatibility scorecard |
Cards in roughly priority order. Open a discussion before tackling anything large.
<close>cleanup on early exits —__closeruns at natural block exit;return/break/goto/ error unwinds through a scope still skip it. Completing it needs an active-close-scope stack in codegen plus per-scopetry_tableunwinding — a milestone, not a small fix.- Full
debug.*library + multi-frame traceback (todaydebug.tracebackreturns the current frame only;debug.getinfois unimplemented — itsnparams/nups/isvararg/linedefinedfields need per-function metadata a no-bytecode AOT compiler doesn't retain). load(stub returns(nil, "no load")) /loadfile(unimplemented) — full dynamic loading would need the compiler itself shipped at runtime. (requirealready works via-m FILEmodules baked in at compile time.)- Dynamic error messages with the offending value embedded (e.g.
"invalid format option 'r'"instead of just"invalid format"). - Coroutines — blocked on the WASM stack-switching proposal landing in browsers.
- Source maps so DevTools can step from compiled WASM back into Lua.
- A more aggressive size optimizer.
-o .wasmalready runs dead-code elimination (functions, globals, and orphaned signatures), and--tree-shakeprunes unused builtins, but there's no instruction-level optimization or inlining like Binaryen'swasm-opt -Oz. A built-in native pass would close the remaining gap without any external dependency — the project (CLI and playground alike) no longer pulls in Binaryen at all.
Commits follow Conventional Commits.
Common types in this repo: feat:, fix:, refactor:, test:, docs:,
chore:, build:. Use a scope when it helps,
e.g. feat(parser): handle long-string literals.
New language features land behind an end-to-end fixture before they land as syntax in the parser. If you can't print it, you didn't build it.
MIT — see LICENSE.