yeslogic/regent

Regent

The Prince proxy.

A host-allowlisting HTTP/HTTPS proxy companion for Prince. Restricts which URLs Prince can fetch during PDF conversion, so a malicious or compromised input document can't exfiltrate data, hit internal services, or reach attacker-chosen hosts.

Threat model: a trusted client (Prince) processing untrusted input (templates that may contain attacker-supplied URLs). If your HTML and CSS input is fully trusted, you don't need this.

Quick start

cargo build --release
./target/release/regent --allow-external www.example.com
# listening on http://127.0.0.1:8100

Point Prince at it:

prince --http-proxy=http://127.0.0.1:8100 input.html -o out.pdf

Fetches from www.example.com succeed. Everything else returns 403 Forbidden; Prince logs warning: error: 403 and renders the document without the blocked resource.

Prince's single --http-proxy=URL flag covers both HTTP and HTTPS. HTTP requests are forwarded; HTTPS uses CONNECT to tunnel the encrypted bytes through — the proxy never sees inside TLS, so upstream certificates validate end-to-end.
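The CONNECT exchange is easy to watch from the client side. Here is a minimal sketch of what an HTTPS-capable client sends to the proxy before any TLS begins. The helpers build_connect_request, parse_status and probe are hypothetical and are not part of Regent or Prince. The sketch assumes Regent answers an allowed CONNECT with a 2xx status (the standard HTTP success reply for CONNECT) and a denied one with 403, as described above.

```python
import socket


def build_connect_request(host: str, port: int) -> bytes:
    """The request a client sends to open a tunnel (RFC 9110, section 9.3.6)."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def parse_status(status_line: bytes) -> int:
    """Extract the status code from a line such as b'HTTP/1.1 403 Forbidden'."""
    _version, code, *_reason = status_line.decode("latin-1").split(" ", 2)
    return int(code)


def probe(proxy: tuple, host: str, port: int = 443, timeout: float = 5.0) -> int:
    """Ask the proxy to tunnel to host:port and return its status code.

    A 2xx reply means the tunnel is open (a TLS handshake would start next);
    403 means the proxy refused the target. Only the status line is read.
    """
    with socket.create_connection(proxy, timeout=timeout) as sock:
        sock.sendall(build_connect_request(host, port))
        with sock.makefile("rb") as reply:
            status_line = reply.readline()
    return parse_status(status_line)
```

Against the quick-start instance, probe(("127.0.0.1", 8100), "www.example.com") should report a 2xx status, and probe(("127.0.0.1", 8100), "evil.com") should report 403.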

Allowlist

Every host must be classified as external (resolves to a public IP) or internal (resolves to loopback / RFC 1918 / link-local / IPv6 ULA):

regent --allow-external www.example.com \
       --allow-internal localhost

After the hostname matches the allowlist, the proxy resolves it and refuses to connect if the resolved IP class doesn't match the declared class. This catches split-horizon DNS surprises, DNS rebinding, and SSRF attempts via a nominally-public host that resolves to a private address — including the cloud metadata service at 169.254.169.254.
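The class check itself is a range test on the resolved address. Below is a minimal sketch using Python's standard ipaddress module. It assumes that "internal" means exactly the ranges listed above (loopback, RFC 1918, link-local, IPv6 ULA) and that everything else is external. The names ip_class and check_class are hypothetical, and the denial text is modeled on the message Regent prints (shown further down in this README). Unwrapping IPv4-mapped IPv6 addresses is an extra precaution in this sketch; the README does not say whether Regent does this.

```python
import ipaddress

# Assumed "internal" ranges: loopback, RFC 1918, link-local, IPv6 ULA.
INTERNAL_NETS = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "::1/128",                          # loopback
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # RFC 1918
    "169.254.0.0/16", "fe80::/10",                     # link-local (incl. cloud metadata)
    "fc00::/7",                                        # IPv6 ULA
)]


def ip_class(ip: str) -> str:
    """Classify a resolved IP as "internal" or "external"."""
    addr = ipaddress.ip_address(ip)
    # Treat IPv4-mapped IPv6 (::ffff:a.b.c.d) as the IPv4 address it wraps.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return "internal" if any(addr in net for net in INTERNAL_NETS) else "external"


def check_class(declared: str, ip: str, port: int):
    """Return None if the resolved IP matches the declared class, else a denial reason."""
    actual = ip_class(ip)
    if actual == declared:
        return None
    return (f"{ip}:{port} is {actual}, but allowlist entry is {declared}; "
            f"move to --allow-{actual} or check DNS")
```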

Both flags accept comma-separated entries or repeated use: --allow-external a.com,b.com is the same as --allow-external a.com --allow-external b.com.

To pin a port, write the entry as host:port (for example localhost:8080). An entry without a port matches any port.
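The entry rules fit together roughly like this. Below is a minimal sketch of a hypothetical parse_entries helper that you might use to lint an allowlist before passing it to Regent. It flattens comma-separated and repeated values, reads an optional :port, rejects IPv6 literals (see Limitations), and rejects *. patterns unless wildcards are enabled. Lowercasing and skipping empty items between commas are assumptions of this sketch (DNS names are case-insensitive); the README does not say how Regent handles either.

```python
def parse_entries(values, wildcard=False):
    """Flatten repeated or comma-separated flag values into (host, port) pairs.

    port is None when the entry has no ":port", meaning any port matches.
    """
    entries = []
    for value in values:
        for raw in value.split(","):
            raw = raw.strip()
            if not raw:
                continue  # tolerate stray commas (assumption)
            if raw.startswith("[") or raw.count(":") > 1:
                raise ValueError(f"{raw!r}: IPv6 literals are not valid entries")
            host, sep, port_text = raw.partition(":")
            port = None
            if sep:
                port = int(port_text)  # raises ValueError for non-numeric ports
                if not 1 <= port <= 65535:
                    raise ValueError(f"{raw!r}: port out of range")
            if host.startswith("*.") and not wildcard:
                raise ValueError(f'wildcard syntax "{host}" requires --wildcard')
            entries.append((host.lower(), port))
    return entries
```

With this helper, parse_entries(["a.com,b.com"]) and parse_entries(["a.com", "b.com"]) give the same result, mirroring the equivalence of the two flag styles.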

Wildcards (opt-in)

Wildcards are off by default. Enable with --wildcard:

regent --wildcard --allow-external '*.example.com'

*.host matches exactly one subdomain label:

Host                                  *.example.com
foo.example.com                       match
bar.example.com                       match
example.com (apex)                    no match
a.b.example.com (two levels)          no match
evil-example.com (no label boundary)  no match

To allow both apex and subdomains, list both:

regent --wildcard --allow-external 'example.com,*.example.com'
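The one-label rule comes down to a suffix check plus a label-boundary check. Here is a minimal sketch that reproduces the table above. The name wildcard_matches is hypothetical, and Regent's own matcher may be structured differently.

```python
def wildcard_matches(pattern: str, host: str) -> bool:
    """True if host matches pattern; "*.parent" matches exactly one extra label."""
    host = host.lower()  # DNS names are case-insensitive
    pattern = pattern.lower()
    if not pattern.startswith("*."):
        return host == pattern  # exact entry
    suffix = pattern[1:]  # "*.example.com" -> ".example.com" (keeps the dot boundary)
    if not host.endswith(suffix):
        return False  # rejects "evil-example.com" and the apex "example.com"
    label = host[: -len(suffix)]
    return label != "" and "." not in label  # exactly one label, e.g. "foo"
```

Listing both example.com and *.example.com, as in the command above, then amounts to one exact check and one wildcard check.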

Public-suffix warnings

If a wildcard's parent is itself a Mozilla Public Suffix, the proxy emits a startup warning:

WARN  *.github.io: 'github.io' is a public suffix; wildcard matches resources of other registrants

This catches the common footguns: *.co.uk, *.com, *.s3.amazonaws.com, *.github.io, *.cloudfront.net, *.azurewebsites.net. Each of those allows any unrelated third party to register a subdomain — almost never what you want.
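The warning boils down to asking whether the wildcard's parent is itself on the Public Suffix List. This sketch hard-codes a tiny illustrative subset of the list. The real Mozilla list has thousands of entries plus wildcard and exception rules, so production code would load it from a maintained data file or library. The name psl_warning is hypothetical, and the message text is copied from the example above.

```python
# Illustrative subset only; not the real Public Suffix List.
PUBLIC_SUFFIXES = {
    "com", "co.uk", "github.io", "s3.amazonaws.com",
    "cloudfront.net", "azurewebsites.net",
}


def psl_warning(pattern: str):
    """Return a startup warning if a "*.parent" pattern's parent is a public suffix."""
    if not pattern.startswith("*."):
        return None  # exact entries never trigger this warning
    parent = pattern[2:].lower()
    if parent in PUBLIC_SUFFIXES:
        return (f"{pattern}: '{parent}' is a public suffix; "
                "wildcard matches resources of other registrants")
    return None
```

Note that *.amazonaws.com passes silently, because amazonaws.com itself is not the listed suffix. This is the path multi-tenancy gap described next.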

Important limitation: the PSL warning catches subdomain multi-tenancy, not path multi-tenancy. *.amazonaws.com won't warn, but it matches s3.amazonaws.com, which serves every customer's buckets via path-style URLs (s3.amazonaws.com/<bucket>/<key>). The proxy can't see paths inside HTTPS tunnels. Prefer specific hostnames over wildcards on cloud platform domains.

Exact wins over wildcard

When a host matches an exact entry in one class and a wildcard entry in the other, the more specific exact match wins:

regent --wildcard \
    --allow-external '*.example.com' \
    --allow-internal internal.example.com
# internal.example.com → internal (exact match wins)
# marketing.example.com → external (wildcard match)

Listing the same pattern in both classes is a parse error, because a host has one class, not two. This applies to exact entries (--allow-external foo.com --allow-internal foo.com) and to wildcards (--allow-external '*.foo.com' --allow-internal '*.foo.com').
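One way to get "exact wins, duplicates are errors" is to keep every pattern in one table keyed by its text, detect cross-class duplicates while building it, and consult the exact key before the wildcard key. Below is a minimal sketch with hypothetical names (build_allowlist, lookup) that ignores ports for brevity. Because *. covers exactly one label, the wildcard lookup only needs the host's parent domain.

```python
def build_allowlist(external, internal):
    """Map each pattern to its class; the same pattern in both lists is an error."""
    table = {}
    for cls, patterns in (("external", external), ("internal", internal)):
        for pattern in patterns:
            pattern = pattern.lower()
            if pattern in table and table[pattern] != cls:
                raise ValueError(
                    f'entry "{pattern}" appears in both --allow-external and --allow-internal')
            table[pattern] = cls
    return table


def lookup(table, host):
    """Return the host's declared class, or None if it is not allowlisted."""
    host = host.lower()
    if host in table:  # an exact match takes precedence over any wildcard
        return table[host]
    _label, dot, parent = host.partition(".")
    if dot:
        return table.get("*." + parent)  # "*.parent" covers exactly one extra label
    return None
```

Using the example above, build_allowlist(["*.example.com"], ["internal.example.com"]) classifies internal.example.com as internal and marketing.example.com as external.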

Why classification rather than a plain allowlist

A hostname-only allowlist is fooled by DNS:

  • A misconfigured api.theirorg.com that secretly resolves to 192.168.1.5 via split-horizon DNS would otherwise grant intranet access through Prince.
  • An attacker-controlled domain that resolves to 169.254.169.254 (AWS/GCP/Azure metadata service) would otherwise let a template leak IAM credentials.
  • DNS rebinding could swap a public IP for an internal one between checks.

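Classification forces you to declare what you expect. The proxy also resolves each host once and connects to the specific IP it verified, so DNS cannot change the answer between the check and the connection (no TOCTOU window within that connection).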

If you declare a host external but its DNS resolves to a private IP (or vice versa), the connection is denied with a message that names the resolved IP and points at the fix:

50.116.12.169:80 is external, but allowlist entry is internal;
move to --allow-external or check DNS
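Resolve-once/connect-by-IP means the DNS answer that was classified is the same one that gets dialed. Below is a minimal sketch under these assumptions. The names resolve_verified and connect_verified are hypothetical. The class test uses the standard library's is_loopback, is_private and is_link_local flags, which cover the ranges listed earlier but are somewhat broader (is_private also includes IANA special-purpose blocks such as the documentation ranges). The sketch refuses the whole lookup if any returned address has the wrong class; a real implementation might instead skip only the mismatched addresses.

```python
import ipaddress
import socket


def ip_class(ip: str) -> str:
    # Looser stand-in for the exact internal ranges: loopback, private, link-local.
    addr = ipaddress.ip_address(ip.split("%", 1)[0])  # drop any IPv6 zone id
    internal = addr.is_loopback or addr.is_private or addr.is_link_local
    return "internal" if internal else "external"


def resolve_verified(host, port, declared, resolver=socket.getaddrinfo):
    """Resolve once; return [(family, sockaddr)] only if every address has the declared class."""
    verified = []
    for family, _type, _proto, _canon, sockaddr in resolver(host, port, type=socket.SOCK_STREAM):
        actual = ip_class(sockaddr[0])
        if actual != declared:
            raise PermissionError(
                f"{sockaddr[0]}:{port} is {actual}, but allowlist entry is {declared}; "
                f"move to --allow-{actual} or check DNS")
        verified.append((family, sockaddr))
    return verified


def connect_verified(host, port, declared, timeout=10.0):
    """Connect to the first reachable verified address; the name is never looked up again."""
    for family, sockaddr in resolve_verified(host, port, declared):
        sock = socket.socket(family, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)  # dial the IP that was checked, not the hostname
            return sock
        except OSError:
            sock.close()
    raise ConnectionError(f"could not connect to {host}:{port}")
```

The timeout parameter belongs to this sketch only. Per the Limitations section, Regent itself relies on the OS default connect timeout.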

JSON config (recommended for production)

For per-render allowlists, build a JSON config and pipe it to Regent on stdin. This avoids shell-escaping a long allowlist into argv.

import json
import subprocess

regent = subprocess.Popen(
    ["regent", "--config", "-"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
regent.stdin.write(json.dumps({
    "listen": "127.0.0.1:0",          # 0 = ephemeral port
    "wildcard": True,
    "allow": {
        "external": ["www.example.com", "*.example.com"],
        "internal": ["localhost"],
    },
}).encode())
regent.stdin.close()                  # close stdin so regent sees EOF and reads the config

try:
    # First stdout line is "listening on http://127.0.0.1:NNNNN"
    bound = regent.stdout.readline().decode().strip()
    prefix = "listening on http://"
    if not bound.startswith(prefix):
        # regent exited before binding (config errors are printed on stderr)
        raise RuntimeError(f"regent did not start: {bound!r}")
    addr = bound.removeprefix(prefix)

    subprocess.run([
        "prince",
        f"--http-proxy=http://{addr}",
        "input.html", "-o", "out.pdf",
    ])
finally:
    regent.terminate()
    regent.wait()                     # reap the child process

--config /path/to/file.json reads from a file instead.

JSON schema:

{
  "listen": "127.0.0.1:0",
  "wildcard": false,
  "allow": {
    "external": ["host", "*.host", "host:port"],
    "internal": ["host", "host:port"]
  }
}
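If you generate configs programmatically, a client-side check against the same shape can catch typos before Regent sees them. Below is a minimal sketch; validate_config is hypothetical. Regent performs its own validation, and its error wording differs from this sketch's.

```python
TOP_LEVEL_KEYS = {"listen", "wildcard", "allow"}
ALLOW_KEYS = {"external", "internal"}


def validate_config(cfg: dict) -> None:
    """Raise ValueError if cfg does not follow the documented schema (all fields optional)."""
    unknown = set(cfg) - TOP_LEVEL_KEYS
    if unknown:
        raise ValueError(f"unknown field(s) {sorted(unknown)}, expected one of {sorted(TOP_LEVEL_KEYS)}")
    if "listen" in cfg and not isinstance(cfg["listen"], str):
        raise ValueError("listen must be a string such as '127.0.0.1:0'")
    if "wildcard" in cfg and not isinstance(cfg["wildcard"], bool):
        raise ValueError("wildcard must be true or false")
    allow = cfg.get("allow", {})
    if not isinstance(allow, dict):
        raise ValueError("allow must be an object")
    unknown = set(allow) - ALLOW_KEYS
    if unknown:
        raise ValueError(f"unknown allow field(s) {sorted(unknown)}")
    for key in ALLOW_KEYS & set(allow):
        entries = allow[key]
        if not isinstance(entries, list) or not all(isinstance(e, str) for e in entries):
            raise ValueError(f"allow.{key} must be a list of strings")
```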

All fields optional. Unknown fields are rejected with a useful error:

$ echo '{"allow_external":["x"]}' | regent --config -
error: parsing config <stdin>: unknown field `allow_external`, expected one of `listen`, `wildcard`, `allow` at line 1 column 17

--config is mutually exclusive with the equivalent CLI flags (--allow-external, --allow-internal, --listen, --wildcard).

Logging

Logs go to stderr. Stdout is reserved for the listening on http://... line — that's the IPC contract for the parent process to learn the bound port.

Default level is info, which prints one line per allowed connection and one per denial:

2026-05-23T02:10:57.889119Z  INFO ALLOW CONNECT www.example.com:443 → 93.184.216.34:443
2026-05-23T02:10:58.606213Z  WARN DENY CONNECT evil.com:443 (host not in allowlist)

Set verbosity via RUST_LOG:

RUST_LOG=warn           denials and errors only
RUST_LOG=info           (default) the above + allowed connections
RUST_LOG=regent=debug   + ACCEPT/CLOSE, TUNNEL byte counts
RUST_LOG=hyper=debug    hyper's internal events (useful for upstream protocol debugging)

ANSI colors auto-detect: enabled when stderr is a terminal, disabled when piped to a file or aggregator.

Troubleshooting

error: wildcard syntax "*.x" requires --wildcard — add --wildcard (and consider whether you really need it).

error: entry "x" appears in both --allow-external and --allow-internal — a host can have only one class; remove it from one list.

WARN DENY ... (... is external, but allowlist entry is internal ...) — the host's resolved IP doesn't match the declared class. Either move the entry to the other list, or check why DNS is returning what it is.

WARN DENY ... (host not in allowlist) — Prince tried to fetch a URL whose host isn't allowlisted at all. Add it (with the right class), or accept the denial.

Prince exits 0 with PDF produced despite denials — this is Prince's default behavior: it logs warning: error: 403 and continues rendering. Inspect Prince's stderr if you want to detect denials as failures.
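Regent's own log is another place to detect denials. Below is a minimal sketch that collects DENY lines from Regent's stderr. It assumes the line layout shown in the Logging section (level, DENY, method, target, then the reason in parentheses) and that colors are off because stderr is piped. find_denials is a hypothetical helper, and the pattern may need adjusting for other request types.

```python
import re

# Matches e.g. "... WARN DENY CONNECT evil.com:443 (host not in allowlist)"
DENY_RE = re.compile(r"\bDENY (?P<method>\S+) (?P<target>\S+) \((?P<reason>.*)\)\s*$")


def find_denials(stderr_text: str):
    """Return (method, target, reason) for every DENY line in Regent's log output."""
    denials = []
    for line in stderr_text.splitlines():
        m = DENY_RE.search(line)
        if m:
            denials.append((m["method"], m["target"], m["reason"]))
    return denials
```

To capture the log, redirect Regent's stderr to a file and read it after Prince finishes. A pipe that nobody drains can fill up and block Regent while it writes log lines.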

Limitations

  • Plain HTTP doesn't pool upstream connections. Each forwarded HTTP request opens a fresh TCP connection. HTTPS via CONNECT is unaffected (Prince's connection reuse rides through the tunnel transparently). The extra connection setup is noticeable only on HTTP-heavy renders against a remote host.
  • No connect timeout beyond the OS default (~75–130 s per address). Pair with Prince's --http-timeout=N if you need tight latency bounds.
  • PSL warning catches subdomain tenancy, not path tenancy. See the wildcards section.
  • IPv6 literals aren't valid allowlist entries. Hostnames resolving to IPv6 work fine; specifying [::1]:80 as an entry is rejected.
  • IPv4 string canonicalization isn't performed. 127.0.0.1 and 127.000.000.001 are different strings. Failure mode is over-strict (denial) rather than over-permissive — safe, but may surprise.
  • DNS rebinding is mitigated within a single upstream connection (resolve-once, connect-by-IP). Across separate connections, DNS is re-resolved.
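Because IPv4 entries are compared as strings, it can help to lint IPv4-literal entries before handing them to Regent. Below is a minimal sketch; is_canonical_ipv4 is a hypothetical helper that accepts only plain dotted-quad form with no leading zeros. It returns False for hostnames as well, so apply it only to entries that consist of digits and dots.

```python
def is_canonical_ipv4(text: str) -> bool:
    """True for dotted-quad literals like "127.0.0.1"; False for "127.000.000.001"."""
    host = text.rsplit(":", 1)[0] if text.count(":") == 1 else text  # allow host:port
    parts = host.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        if not (part.isascii() and part.isdigit()) or not 0 <= int(part) <= 255:
            return False
        if part != str(int(part)):  # leading zeros, e.g. "000" or "01"
            return False
    return True
```

An entry such as 127.000.000.001 would be flagged for rewriting as 127.0.0.1.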

License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
