feat(compiler): support registered host-await builtins for natural function call syntax#667
feat(compiler): support registered host-await builtins for natural function call syntax#667kusha wants to merge 4 commits into
Conversation
Allow hosts to register function names at compile time so that calls to those names emit HostAwait instructions directly, enabling natural syntax like fetch(x) instead of __builtin_host_await(x, "fetch"). - Add host_await_builtins map and register_host_await_builtin() to Compiler - Validate arg_count == 1 and reject reserved __builtin_host_await name - Extend determine_call_target() resolution: explicit > registered > user > builtin - Both explicit and registered paths emit identical HostAwait bytecode - Add compile_from_policy_with_host_await() entry point in rules.rs - Extended test harness with HostAwaitBuiltinSpec and args assertion - 9 YAML test cases: suspend/resume, run-to-completion, multiple names, queue, shadowing, object packing, arg_count rejection, reserved name rejection, standard builtin override - Documentation: instruction-set.md, architecture.md
There was a problem hiding this comment.
Pull request overview
This PR extends the Rego→RVM compiler to support “registered” host-await builtins, allowing natural function-call syntax (e.g. lookup(x)) to compile directly into HostAwait instructions, while retaining the explicit __builtin_host_await(arg, id) path.
Changes:
- Add
Compiler::compile_from_policy_with_host_await(...)andCompiler::register_host_await_builtin(...)to configure host-await builtin names (witharg_count == 1enforced). - Extend call target resolution to prioritize explicit
__builtin_host_await, then registered host-await names, then user-defined functions, then standard builtins. - Update the YAML-based RVM Rego test harness to pass registered builtins and (in suspendable mode) assert the host-await argument payload; add a new YAML suite covering registered host-await scenarios.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/languages/rego/compiler/mod.rs |
Adds compiler state + API to register host-await builtins and validate registration constraints. |
src/languages/rego/compiler/rules.rs |
Adds a new compile entry point that accepts registered host-await builtins and wires it into compilation. |
src/languages/rego/compiler/function_calls.rs |
Implements resolution order + emits HostAwait for registered names by auto-loading the identifier literal. |
tests/rvm/rego/mod.rs |
Extends harness to pass registered builtin specs into compilation and validate suspendable host-await arguments. |
tests/rvm/rego/cases/registered_host_await.yaml |
Adds test coverage for registered host-await behavior (shadowing, queueing, override, validation errors). |
docs/rvm/instruction-set.md |
Documents registered host-await builtin syntax, resolution order, and argument handling constraints. |
docs/rvm/architecture.md |
Updates architecture walkthrough to describe both explicit and registered HostAwait emission paths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Mark Birger <birgerm@yandex.ru>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assert_eq!( | ||
| actual, expected, | ||
| "HostAwait argument mismatch for {:?}: expected {:?}, got {:?}", | ||
| identifier, expected, actual | ||
| ); |
There was a problem hiding this comment.
assert_eq! will panic on a HostAwait argument mismatch, which bypasses this test harness’s normal error propagation (and can skip listing dumps / want_error handling). Prefer returning an anyhow::Error (or using the existing bail-with-listing pattern by threading case context) so failures are reported consistently.
| assert_eq!( | |
| actual, expected, | |
| "HostAwait argument mismatch for {:?}: expected {:?}, got {:?}", | |
| identifier, expected, actual | |
| ); | |
| if actual != expected { | |
| return Err(anyhow::anyhow!( | |
| "HostAwait argument mismatch for {:?}: expected {:?}, got {:?}", | |
| identifier, | |
| expected, | |
| actual | |
| )); | |
| } |
|
|
||
| result := kv_store(input.key, input.value) | ||
| query: data.demo.result | ||
| # Registration panics because arg_count must be 1. |
There was a problem hiding this comment.
The comment says registration "panics" when arg_count is invalid, but the compiler path returns a compile-time error (register_host_await_builtin returns Err). Consider rewording to avoid implying a panic.
| # Registration panics because arg_count must be 1. | |
| # Registration fails with an error because arg_count must be 1. |
| @@ -0,0 +1,240 @@ | |||
| cases: | |||
There was a problem hiding this comment.
More test cases to lock down behavior:
- Out param syntax: e.g:
lookup(in, out) - Test case that uses __builtin_host_await and registerd builtins in same policy
- Empty host await builtins
There was a problem hiding this comment.
Fixed: out works, test cases extended, empty
| pub fn compile_from_policy_with_host_await( | ||
| policy: &CompiledPolicy, | ||
| entry_points: &[&str], | ||
| host_await_builtins: &[(&str, usize)], |
There was a problem hiding this comment.
Scenarios to define behavior for:
- Duplicate entries in host_await_builtins
- Empty host_await_builtins
- A name is empty or whitespace
There was a problem hiding this comment.
Fixed: empty means no registered
| (arg_regs[0], arg_regs[1]) | ||
| } else { | ||
| // Registered host-awaitable builtin — identifier is the function name | ||
| if arg_regs.len() != 1 { |
There was a problem hiding this comment.
Need to define behavior when registered host await builtin shadows a user write policy rule (both regular and function)
There was a problem hiding this comment.
Can we move it out of scope for this PR? Shadowing already happens for __builtin_host_await. Also we need to decide on approach. IMO compilation check is not great choice as it makes compiled policies dependent on engine state. The easiest would be to prioritize user defined if it is present.
| } | ||
|
|
||
| // Check registered host-awaitable builtins | ||
| if let Some(&arg_count) = self.host_await_builtins.get(original_fcn_path) { |
There was a problem hiding this comment.
We need to detect shadowing here.
|
|
||
| // Check registered host-awaitable builtins | ||
| if let Some(&arg_count) = self.host_await_builtins.get(original_fcn_path) { | ||
| return Ok(CallTarget::HostAwait { |
There was a problem hiding this comment.
Maybe it might be clear to distinguish between the builtin and the custom registered builtins via a new variant say RegisteredHostAwait. That way the rest of the code doesn't need to depend on the builtin's name (__builtin_host_await) to distinguish between builtin and registered.
There was a problem hiding this comment.
Fixed, but please do a thorough review of 357fbf3
… registration - Compiler::register_host_await_builtin now rejects duplicate, empty, and whitespace-only names. Previously a duplicate registration would silently overwrite the existing entry, which could mask the host's own registration mistakes. - YAML test cases added: empty registration list as no-op, duplicate name rejection, empty/whitespace name rejection, out-param (a, out) calling syntax with a single-arg registered builtin, and mixed __builtin_host_await + registered builtins in the same policy consuming from their respective identifier queues. - Test harness: replace assert_eq! on HostAwait argument mismatch with anyhow::Error so mismatches propagate through the case reporter instead of panicking and skipping the harness's normal error path. - YAML comment fix: "Registration panics" -> "Registration fails with an error" (registration returns Err, never panics). Addresses anakrish + Copilot inline review comments on PR microsoft#667. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…riants Addresses PR microsoft#667 review item microsoft#8: at the emit site in `compile_function_call`, the discrimination between explicit `__builtin_host_await(arg, id)` and a registered host-awaitable builtin was being recovered by string-comparing `original_fcn_path` against `"__builtin_host_await"`. The information was already known in `determine_call_target` and was being thrown away. Replace the single `CallTarget::HostAwait` variant with two: * `ExplicitHostAwait` (unit) — the two-argument call form. The identifier register comes from the user's second argument. * `RegisteredHostAwait { identifier: String }` — the one-argument call form for registered builtins. The identifier is the registered name and is captured in the variant at recognition time, so the emit site never re-derives it from the function path. This removes the magic-string comparison at the emit site (the source of truth is now `determine_call_target`) and makes both match sites in `compile_function_call` exhaustive over the two forms — adding a third host-await form in the future would force a compile error at every match site instead of silently falling through. Arities are now hardcoded in the `expected_args` extraction (`Some(2)` for explicit, `Some(1)` for registered) rather than carried in the variant; registered builtins are constrained to `arg_count == 1` at registration time, so there is no per-call variability to carry. Bytecode output is unchanged; the full RVM test suite (97 cases) and the registered_host_await suite (15 cases) pass without modification. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@copilot Review this PR using all the skills in this repo. Use a separate agent for each skill. |
Summary
Extends the RVM compiler to accept a list of host function names at compile time. Calls to registered names are compiled directly into
HostAwaitinstructions, allowing policy authors to writelookup(input.account_id)instead of__builtin_host_await(input.account_id, "lookup").Motivation
No compile-time validation of host-supported identifiers: The identifier passed to
__builtin_host_awaitis a runtime string. Typos like"lokup"instead of"lookup"are silently accepted by the compiler and only discovered at runtime when the host rejects the identifier.Wrapper workaround breaks caller identification: Defining a wrapper
lookup(x) := __builtin_host_await(x, "lookup")compiles theHostAwaitinstruction inside the wrapper module, not the calling module — breaking caller identification for authorization and auditing.Host can override standard builtins: Registering a name that matches a standard Rego builtin (e.g.
time.parse_duration_ns) lets the host intercept that call and provide its own implementation.Changes
Compiler (
src/languages/rego/compiler/)register_host_await_builtin()onCompilerto declare host function names. Validatesarg_count == 1and rejects the reserved name__builtin_host_await.compile_from_policy_with_host_await()entry point that accepts the builtin list. Existingcompile_from_policy()delegates with an empty list.determine_call_target()extended with resolution order: explicit__builtin_host_await→ registered host-await → user-defined → standard builtin.HostAwait { dest, arg, id }bytecode.Documentation (
docs/rvm/)instruction-set.md: Registered builtin syntax, resolution order, argument handling.architecture.md: Updated to describe both emission paths.Tests (
tests/rvm/rego/)HostAwaitBuiltinSpecand argument assertion.Constraints
arg_countmust be 1: TheHostAwaitinstruction carries a single argument register. Use object packing to pass multiple values:lookup({"key1": v1, "key2": v2}). This can be lifted in the future by having the compiler auto-pack multiple arguments into an array or object.__builtin_host_awaitis reserved: Attempting to register this name produces a compile-time error. It is handled by a dedicated code path (explicit 2-argument form) and cannot be overridden.Limitations
__builtin_host_awaitstill exists: The raw builtin remains functional. Policy authors who discover it can use arbitrary identifier strings. However, the host controls how identifiers are resolved and can reject or error on unexpected identifiers at runtime.