The Prince proxy.
A host-allowlisting HTTP/HTTPS proxy companion for Prince. Restricts which URLs Prince can fetch during PDF conversion, so a malicious or compromised input document can't exfiltrate data, hit internal services, or reach attacker-chosen hosts.
Threat model: a trusted client (Prince) processing untrusted input (templates that may contain attacker-supplied URLs). If your HTML and CSS input is fully trusted, you don't need this.
cargo build --release
./target/release/regent --allow-external www.example.com
# listening on http://127.0.0.1:8100
Point Prince at it:
prince --http-proxy=http://127.0.0.1:8100 input.html -o out.pdf
Fetches from www.example.com succeed. Everything else returns 403 Forbidden; Prince logs warning: error: 403 and renders the document without the blocked resource.
Prince's single --http-proxy=URL flag covers both HTTP and HTTPS. HTTP requests are forwarded; HTTPS uses CONNECT to tunnel the encrypted bytes through — the proxy never sees inside TLS, so upstream certificates validate end-to-end.
Every host must be classified as external (resolves to a public IP) or internal (resolves to loopback / RFC 1918 / link-local / IPv6 ULA):
regent --allow-external www.example.com \
--allow-internal localhost
After the hostname matches the allowlist, the proxy resolves it and refuses to connect if the resolved IP class doesn't match the declared class. This catches split-horizon DNS surprises, DNS rebinding, and SSRF attempts via a nominally-public host that resolves to a private address — including the cloud metadata service at 169.254.169.254.
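The class check described above can be sketched in Python with the standard `ipaddress` module. This is an illustration, not regent's actual code: the names `ip_class` and `INTERNAL_NETS` are hypothetical, the network list simply spells out the ranges named above, and the handling of IPv4-mapped IPv6 addresses is an assumption.

```python
import ipaddress

# Ranges the README calls "internal" (list assumed for this sketch).
INTERNAL_NETS = [
    ipaddress.ip_network("127.0.0.0/8"),     # IPv4 loopback
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),   # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),  # RFC 1918
    ipaddress.ip_network("169.254.0.0/16"),  # IPv4 link-local
    ipaddress.ip_network("::1/128"),         # IPv6 loopback
    ipaddress.ip_network("fe80::/10"),       # IPv6 link-local
    ipaddress.ip_network("fc00::/7"),        # IPv6 ULA
]

def ip_class(addr: str) -> str:
    """Return 'internal' or 'external' for a resolved IP address string."""
    ip = ipaddress.ip_address(addr)
    # Assumption: judge ::ffff:a.b.c.d by its embedded IPv4 address.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    for net in INTERNAL_NETS:
        if ip.version == net.version and ip in net:
            return "internal"
    return "external"

def class_matches(declared: str, addr: str) -> bool:
    """True when the resolved address falls in the declared class."""
    return ip_class(addr) == declared
```

With this sketch, a host declared external that resolves to 169.254.169.254 fails `class_matches`, which is the metadata-service case mentioned above.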
Both flags accept comma-separated entries or repeated use: --allow-external a.com,b.com is the same as --allow-external a.com --allow-external b.com.
Port pinning is host:port. Without a port, any port matches.
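As a rough model of the entry syntax above, the following Python sketch splits repeated and comma-separated values into (host, port) pairs, where a port of None means any port. The function names are hypothetical and regent's real parser may differ in details such as whitespace handling.

```python
def parse_entries(values):
    """Flatten repeated, comma-separated flag values into (host, port) pairs."""
    entries = []
    for value in values:
        for item in value.split(","):
            item = item.strip()
            if not item:
                continue
            host, sep, port = item.rpartition(":")
            if sep and port.isdigit():
                entries.append((host, int(port)))  # pinned to one port
            else:
                entries.append((item, None))       # no port: any port matches
    return entries

def port_matches(entry_port, request_port):
    """An entry without a port matches every port."""
    return entry_port is None or entry_port == request_port
```

Under this model, `parse_entries(["a.com,b.com"])` and `parse_entries(["a.com", "b.com"])` produce the same list.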
Wildcards are off by default. Enable with --wildcard:
regent --wildcard --allow-external '*.example.com'
*.host matches exactly one subdomain label:
| Host | *.example.com |
|---|---|
| foo.example.com | match |
| bar.example.com | match |
| example.com (apex) | no match |
| a.b.example.com (two levels) | no match |
| evil-example.com (no label boundary) | no match |
To allow both apex and subdomains, list both:
regent --wildcard --allow-external 'example.com,*.example.com'
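The single-label rule can be expressed compactly. The Python sketch below is one way to implement the matching behavior in the table; `wildcard_matches` is a hypothetical name and this is not taken from regent's source.

```python
def wildcard_matches(pattern: str, host: str) -> bool:
    """Match '*.example.com' against exactly one extra leading label."""
    if not pattern.startswith("*."):
        return pattern == host          # plain entries match exactly
    suffix = pattern[1:]                # '*.example.com' -> '.example.com'
    if not host.endswith(suffix):
        return False                    # apex or no label boundary
    label = host[: -len(suffix)]
    # Exactly one non-empty label in front of the suffix.
    return label != "" and "." not in label
```

Requiring the suffix to start with a dot is what rejects evil-example.com, and rejecting dots in the remaining label is what rejects a.b.example.com.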
If a wildcard's parent is itself a Mozilla Public Suffix, the proxy emits a startup warning:
WARN *.github.io: 'github.io' is a public suffix; wildcard matches resources of other registrants
This catches the common footguns: *.co.uk, *.com, *.s3.amazonaws.com, *.github.io, *.cloudfront.net, *.azurewebsites.net. Each of those allows any unrelated third party to register a subdomain — almost never what you want.
Important limitation: the PSL warning catches subdomain multi-tenancy, not path multi-tenancy. *.amazonaws.com won't warn, but it matches s3.amazonaws.com, which serves every customer's buckets via path-style URLs (s3.amazonaws.com/<bucket>/<key>). The proxy can't see paths inside HTTPS tunnels. Prefer specific hostnames over wildcards on cloud platform domains.
When an entry appears in both classes via different specificities, the more-specific exact match wins:
regent --wildcard \
--allow-external '*.example.com' \
--allow-internal internal.example.com
# internal.example.com → internal (exact match wins)
# marketing.example.com → external (wildcard match)
A pattern listed in both classes is a parse error — a host has one class, not two. Applies to both exact entries (--allow-external foo.com --allow-internal foo.com) and wildcards (--allow-external '*.foo.com' --allow-internal '*.foo.com').
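The precedence and duplicate rules above can be modeled as a two-pass lookup: exact entries first, wildcards second. The Python below is a sketch under that assumption; the function names and the error wording are illustrative, not regent's.

```python
def _one_label_match(pattern, host):
    """'*.x' matches exactly one label in front of '.x'."""
    suffix = pattern[1:]
    label = host[: -len(suffix)] if host.endswith(suffix) else ""
    return label != "" and "." not in label

def check_no_overlap(external, internal):
    """Reject any pattern that is listed in both classes."""
    both = sorted(set(external) & set(internal))
    if both:
        raise ValueError(f'entry "{both[0]}" appears in both lists')

def declared_class(host, external, internal):
    """Return 'external', 'internal', or None if the host is not allowlisted."""
    lists = (("external", external), ("internal", internal))
    # Pass 1: exact entries win over any wildcard.
    for name, patterns in lists:
        if any(not p.startswith("*.") and p == host for p in patterns):
            return name
    # Pass 2: fall back to wildcard entries.
    for name, patterns in lists:
        if any(p.startswith("*.") and _one_label_match(p, host) for p in patterns):
            return name
    return None
```

Because `check_no_overlap` rejects identical patterns up front, the two passes never have to break a tie between the classes at the same specificity.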
A hostname-only allowlist is fooled by DNS:
- A misconfigured api.theirorg.com that secretly resolves to 192.168.1.5 via split-horizon DNS would otherwise grant intranet access through Prince.
- An attacker-controlled domain that resolves to 169.254.169.254 (AWS/GCP/Azure metadata service) would otherwise let a template leak IAM credentials.
- DNS rebinding could swap a public IP for an internal one between checks.
Classification forces you to declare what you expect, and the proxy connects only to a specific verified IP — no TOCTOU window.
If you declare a host external but its DNS resolves to a private IP (or vice versa), the connection is denied with a message that names the resolved IP and points at the fix. For example, an entry declared internal that resolves to a public IP produces:
50.116.12.169:80 is external, but allowlist entry is internal;
move to --allow-external or check DNS
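The resolve-once, connect-by-IP idea can be sketched as follows. This Python is a simplified model, not regent's implementation: `verified_target` and `default_resolve` are hypothetical names, only the first resolved address is considered, and the stdlib flags `is_loopback`, `is_link_local`, and `is_private` are a broader approximation of "internal" than the ranges listed earlier.

```python
import ipaddress
import socket

def default_resolve(host, port):
    """Resolve once with the system resolver; return IP strings."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]

def verified_target(host, port, declared, resolve=default_resolve):
    """Resolve host, check its class, and return the (ip, port) to dial."""
    ips = resolve(host, port)
    if not ips:
        raise PermissionError(f"{host} did not resolve")
    ip = ips[0]  # simplification: only the first address is used
    addr = ipaddress.ip_address(ip)
    internal = addr.is_loopback or addr.is_link_local or addr.is_private
    actual = "internal" if internal else "external"
    if actual != declared:
        raise PermissionError(
            f"{ip}:{port} is {actual}, but allowlist entry is {declared}"
        )
    # The caller connects to this exact IP, e.g.
    # socket.create_connection((ip, port)), so the hostname is never
    # resolved a second time between the check and the connect.
    return ip, port
```

Passing the resolver as a parameter keeps the check testable without network access; a fake resolver can simulate split-horizon or rebinding answers.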
For per-render allowlists, build a JSON config and pipe it via stdin. Avoids shell-escaping a long allowlist into argv.
import json
import subprocess

regent = subprocess.Popen(
["regent", "--config", "-"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
)
regent.stdin.write(json.dumps({
"listen": "127.0.0.1:0", # 0 = ephemeral port
"wildcard": True,
"allow": {
"external": ["www.example.com", "*.example.com"],
"internal": ["localhost"],
},
}).encode())
regent.stdin.close()
# First stdout line is "listening on http://127.0.0.1:NNNNN"
bound = regent.stdout.readline().decode().strip()
addr = bound.removeprefix("listening on http://").strip()
subprocess.run([
"prince",
f"--http-proxy=http://{addr}",
"input.html", "-o", "out.pdf",
])
regent.terminate()
--config /path/to/file.json reads from a file instead.
JSON schema:
{
"listen": "127.0.0.1:0",
"wildcard": false,
"allow": {
"external": ["host", "*.host", "host:port"],
"internal": ["host", "host:port"]
}
}
All fields are optional. Unknown fields are rejected with a useful error:
$ echo '{"allow_external":["x"]}' | regent --config -
error: parsing config <stdin>: unknown field `allow_external`, expected one of `listen`, `wildcard`, `allow` at line 1 column 17
--config and the CLI allowlist flags (--allow-external, --allow-internal, --listen, --wildcard) are mutually exclusive.
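A parent process that builds the config dynamically may want to catch typos before spawning regent. The Python sketch below mirrors the schema above; `check_config` is a hypothetical helper, and rejecting unknown keys inside "allow" is an assumption (the README only shows a top-level unknown field being rejected).

```python
import json

TOP_LEVEL_FIELDS = {"listen", "wildcard", "allow"}
ALLOW_FIELDS = {"external", "internal"}

def check_config(config: dict) -> str:
    """Return the config serialized as JSON, or raise on unknown fields."""
    for key in config:
        if key not in TOP_LEVEL_FIELDS:
            raise ValueError(f"unknown field `{key}`")
    # Assumption: nested keys under "allow" are checked the same way.
    for key in config.get("allow", {}):
        if key not in ALLOW_FIELDS:
            raise ValueError(f"unknown field `allow.{key}`")
    return json.dumps(config)
```

The returned string can be written to regent's stdin when it is started with --config -.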
Logs go to stderr. Stdout is reserved for the listening on http://... line — that's the IPC contract for the parent process to learn the bound port.
Default level is info, which prints one line per allowed connection and one per denial:
2026-05-23T02:10:57.889119Z INFO ALLOW CONNECT www.example.com:443 → 93.184.216.34:443
2026-05-23T02:10:58.606213Z WARN DENY CONNECT evil.com:443 (host not in allowlist)
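For callers that collect regent's stderr, lines shaped like the two samples above can be turned into records with a regular expression. This Python sketch is derived only from those two samples; the field names are hypothetical, and other line shapes (for example debug-level events) are simply not matched.

```python
import re

LOG_LINE = re.compile(
    r"^(?P<time>\S+)\s+(?P<level>[A-Z]+)\s+(?P<verdict>ALLOW|DENY)\s+"
    r"(?P<method>\S+)\s+(?P<target>\S+)"
    r"(?:\s+→\s+(?P<upstream>\S+))?"      # present on ALLOW lines
    r"(?:\s+\((?P<reason>.*)\))?\s*$"     # present on DENY lines
)

def parse_log_line(line: str):
    """Return a dict of fields, or None if the line is not ALLOW/DENY."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None
```

Fields that do not appear on a given line, such as `upstream` on a denial, come back as None.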
Set verbosity via RUST_LOG:
| Setting | Output |
|---|---|
| RUST_LOG=warn | denials and errors only |
| RUST_LOG=info (default) | the above + allowed connections |
| RUST_LOG=regent=debug | + ACCEPT/CLOSE, TUNNEL byte counts |
| RUST_LOG=hyper=debug | hyper's internal events (useful for upstream protocol debugging) |
ANSI colors auto-detect: enabled when stderr is a terminal, disabled when piped to a file or aggregator.
error: wildcard syntax "*.x" requires --wildcard — add --wildcard (and consider whether you really need it).
error: entry "x" appears in both --allow-external and --allow-internal — a host can have only one class; remove it from one list.
WARN DENY ... (... is external, but allowlist entry is internal ...) — the host's resolved IP doesn't match the declared class. Either move the entry to the other list, or check why DNS is returning what it is.
WARN DENY ... (host not in allowlist) — Prince tried to fetch a URL whose host isn't allowlisted at all. Add it (with the right class), or accept the denial.
Prince exits 0 with PDF produced despite denials — this is Prince's default behavior: it logs warning: error: 403 and continues rendering. Inspect Prince's stderr if you want to detect denials as failures.
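To treat denials as failures, a wrapper can scan Prince's stderr for the 403 warning. The Python below is a minimal sketch that assumes the warning text contains "error: 403" as quoted above; `denied_fetches` is a hypothetical helper, and the exact shape of the rest of each Prince log line is not assumed.

```python
def denied_fetches(prince_stderr: str) -> list[str]:
    """Return the stderr lines that report a 403 from the proxy."""
    return [
        line.strip()
        for line in prince_stderr.splitlines()
        if "error: 403" in line
    ]

# Possible usage (not executed here):
#   import subprocess
#   result = subprocess.run(
#       ["prince", "--http-proxy=http://127.0.0.1:8100",
#        "input.html", "-o", "out.pdf"],
#       capture_output=True, text=True,
#   )
#   if denied_fetches(result.stderr):
#       raise SystemExit("render touched a blocked resource")
```

A substring match is deliberately loose; it may also flag 403 responses returned by an allowed upstream server rather than by the proxy.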
- Plain HTTP doesn't pool upstream connections. Each forwarded HTTP request opens a fresh TCP connection. HTTPS via CONNECT is unaffected (Prince's connection reuse rides through the tunnel transparently). Visible only on HTTP-heavy renders to a remote host.
- No connect timeout beyond the OS default (~75–130 s per address). Pair with Prince's --http-timeout=N if you need tight latency bounds.
- PSL warning catches subdomain tenancy, not path tenancy. See the wildcards section.
- IPv6 literals aren't valid allowlist entries. Hostnames resolving to IPv6 work fine; specifying [::1]:80 as an entry is rejected.
- IPv4 string canonicalization isn't performed. 127.0.0.1 and 127.000.000.001 are different strings. Failure mode is over-strict (denial) rather than over-permissive: safe, but may surprise.
- DNS rebinding is mitigated within a single upstream connection (resolve-once, connect-by-IP). Across separate connections, DNS is re-resolved.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.