Add RDP E2E tests#230
Draft
bernie-g wants to merge 14 commits into
Draft
Conversation
# Conflicts: # packages/pam/pam-proxy.go
fix: add keepalive, idle timeout, and session cleanup to gateway
goreleaser v1.x's mode: append cannot find draft releases (GitHub's "get release by tag" API excludes drafts), so the pre-created draft was ignored and goreleaser created a second, separate release. Fix: remove the create-release-draft job and let the goreleaser jobs handle release creation themselves via mode: append. The first job to finish publishing creates the release, the others append to it. Also replaces the broken --skip=build (v2-only flag) with separate dry-run/release goreleaser steps.
fix(release): remove --skip=build flag unsupported in goreleaser v1.26.2
* feat(pam): Oracle database support via proxied-auth gateway
Adds Oracle DB as the 8th PAM handler. Gateway accepts client
connections with a placeholder password, proxies pre-auth bytes
verbatim to upstream, intercepts at O5Logon to swap client-supplied
placeholder-keyed material for real-password-keyed material, and
byte-relays post-auth. Credential injection works end-to-end for
JDBC thin clients (sqlcl, SQL Developer, DBeaver) and go-ora;
user never sees real Oracle credentials.
Handler lives in packages/pam/handlers/oracle/. Ports crypto
primitives (PBKDF2+SHA512, AES-CBC session-key encryption, PKCS5
padding) and TTC codec (compressed ints, CLR byte strings, KVP
encoding) from MIT-licensed sijms/go-ora. See ATTRIBUTION.md and
ORACLE_PAM_NOTES.md for architecture notes and handoff.
Known dead code remains in this commit from the earlier full-
impersonation attempt (ano.go, nego.go, nego_templates.go, parts
of o5logon*.go, upstream.go, handshake_test.go). Kept intact to
preserve history of the approach; cleanup follows as a separate
commit.
* refactor(pam-oracle): remove impersonation-era dead code
The initial attempt used server-side Oracle impersonation (see prior
commit). That design worked through authentication but hit a
state-mismatch problem post-auth: upstream (via go-ora) and client
negotiated different TTC capabilities, so relayed queries were
rejected as protocol violations.
The replacement — proxied-auth — is already in proxy_auth.go and is
the flow wired to HandleConnection. This commit removes the dead
files and vestigial symbols that supported the old path:
Removed entirely:
- ano.go, nego.go, nego_templates.go (pre-auth TNS/TTC negotiation
handlers; pre-auth is now forwarded verbatim to upstream)
- upstream.go (go-ora-based upstream dial + KVP extraction;
replaced by dialUpstreamRaw in proxy_auth.go)
- handshake_test.go (tested the impersonation path, orphaned)
Pruned:
- proxy.go: handleConnectionLegacy (~200 LOC)
- o5logon.go: O5LogonServerState, NewO5LogonServerState,
VerifyClientPassword, deriveKey11g, md5Hash, parseIntVal
- o5logon_server.go: AuthPhaseOne, ParseAuthPhaseOne,
BuildAuthPhaseOneResponse, BuildAuthPhaseTwoResponse,
BuildAuthPhaseTwoResponseFromUpstream, RunServerO5Logon,
dumpBytes, readUint32
- tns.go: AcceptPacket, AcceptFromConnect, ConnectPacket,
ParseConnectPacket, MarkerPacketBytes (we forward raw packet
bytes rather than parse/build CONNECT or ACCEPT)
Kept: crypto primitives, DATA packet codec, TTC reader/builder,
query logger, prependedConn, error helpers — all live in the
proxied-auth flow.
Drops github.com/sijms/go-ora/v2 from go.mod — no longer imported.
Adds .idea and .vscode to .gitignore. Updates ORACLE_PAM_NOTES.md
with a current-state header; historical sections below retained
for context.
Net: ~1,600 LOC removed. The handler directory goes from 12 files
to 8. Build, fmt, and the Oracle SQL test matrix (SELECT, INSERT,
DDL, PL/SQL, bind vars, NLS queries) still pass against sqlcl →
gateway → AWS RDS Oracle 19c.
* fix(pam-oracle): capture SQL in session recordings
The TTC query extractor was logging empty strings for OALL8 payloads
because tryExtractSQL's "skip 6 compressed ints then read a CLR"
heuristic didn't land on the SQL text — the OALL8 wire format has
variable-length headers that differ by client driver and bind
pattern. As a result, session recordings contained only session
headers (124 bytes) with no actual query content.
Replace structured parsing with a simple scan for the longest
printable ASCII run in the payload. In practice the SQL text is
always the longest such run. Verified with sqlcl: .enc file grows
from 124 bytes (empty) to ~880 bytes (with captured queries) for
a SELECT + COMMIT session.
This only affects content — the tap, packet demultiplexing, and
encrypted file I/O were all working correctly. Fix is localised to
tryExtractSQL.
* fix(pam-oracle): capture queries bundled behind piggyback OCLOSE
sqlcl (and other JDBC thin clients) frequently bundle a piggybacked OCLOSE
for the previous cursor with the next OALL8 query in a single TTC packet.
The previous parser checked byte 0 for 0x03 (function call) and bailed when
it saw 0x11 (piggyback marker), missing the OALL8 underneath.
Scan the payload for the function-call + OALL8 byte pair instead, so the
parser finds the query regardless of any preceding piggyback prefix. Same
treatment for COMMIT and ROLLBACK, which also get piggybacked.
* feat(pam-oracle): TLS (TCPS) support for upstream
Matches the other SQL handlers' pattern: the client speaks plain TCP to
our local listener, we do TLS to the upstream database and translate in
the middle. No change to the client-facing UX — the same JDBC URL that
works against a plain-TCP Oracle resource now also works against a
TCPS-enabled one.
Implementing this correctly required discovering Oracle TCPS's two-handshake
flow (by reading go-ora's network/session.go readPacket RESEND branch —
credit there):
1. Dial TCP, do an initial TLS handshake. Forward the client's CONNECT
through this first TLS session.
2. When the upstream RESEND packet's byte-5 flag has 0x08 set, Oracle
expects the client to abandon the first TLS session and run a FRESH
TLS handshake on the bare TCP socket. Server drops its first-round
TLS state in lockstep. We mirror this via upgradeToTLS on the same
rawConn, then continue the Oracle handshake through the new session.
3. Mask byte 5 on packets going downstream so thin clients (JDBC thin,
python-oracledb thin) don't see the 0x08 signal and try to cast their
local TcpNTAdapter to TcpsNTAdapter — the cast would fail because the
client-to-proxy socket is plain TCP.
4. Accept TLS 1.0 as the floor for the upstream dial: Oracle 19c's
second-round handshake negotiates down to 1.0 in some configurations
(AWS RDS's SSL option being one of them). The outer ALPN mTLS tunnel
remains TLS 1.2+.
Also removes now-dead SSLRejectUnauthorized / SSLCertificate fields from
OracleProxyConfig — the shared TLSConfig built in pam-proxy.go carries
that information already.
Verified end-to-end against AWS RDS Oracle 19c (SSL option, port 2484)
with sqlcl: authentication, DDL, DML+COMMIT, PL/SQL, DBMS_OUTPUT, bind
variables, and session recording all work. Plain-TCP Oracle path is
unchanged.
* chore(pam-oracle): tighten TCPS TLS config comment
Pure comment cleanup on buildOracleTLSConfig — no behavior change.
* chore(pam-oracle): update stale comments + attribution
- ATTRIBUTION.md: drop reference to nego.go (deleted in the
impersonation-era cleanup) and add the upstream TCPS two-handshake
flow adaptation from go-ora's session.readPacket RESEND branch.
- o5logon_server.go: the file header still described the
impersonation-era architecture where the gateway acted as an Oracle
server and drove the O5Logon exchange. Rewrite it to describe the
current proxied-auth role — packet-layer helpers used by the byte-
level O5Logon translation in proxy_auth.go — and drop the reference
to upstream.go, which was removed in 3ff9cff.
* chore(pam-oracle): fix go vet warnings in proxy_auth.go
Two warnings, both from the original feat commit (4b25ec2):
- Dead self-assignment state.ServerSessKey = state.ServerSessKey on
the phase-2 request translation path. The inline comment "no-op;
kept for clarity" already admitted it was pointless. Delete.
- Unreachable _ = prefix after a return statement inside
replaceKvpValueKeepingSize. The prefix variable was left over from
a refactor — the new code uses oldStart / oldEnd instead — and the
_ = prefix trick to silence "declared but not used" landed on the
wrong side of the return. Delete both the dead prefix declaration
and the unreachable suppression. Also drops valStart (only used
by the dead prefix computation).
No behavior change; go vet is now clean on the oracle handler.
* chore(pam-oracle): drop TNS_ADMIN dance, shorten placeholder password
Two UX simplifications to the Oracle PAM access command:
1. Stop creating a per-session TNS_ADMIN directory and printing an
`export TNS_ADMIN=...` line in the connect instructions. The
directory only existed to hold a sqlnet.ora that set
DISABLE_OOB=TRUE — a defence against sqlcl's out-of-band Ctrl-C
signalling that we never actually observed breaking. If a real
interrupt problem surfaces we can revisit with a proper fix
instead of a per-session file dance.
2. Change ProxyPasswordPlaceholder from "infisical-pam-proxy" to
"password". The string value is cryptographically arbitrary —
Oracle's O5Logon needs the client and gateway to agree on SOME
string; any works. Shorter is easier to copy-paste. The
accompanying "not a real credential" note in the CLI output stays.
* chore(pam-oracle): drop unrelated changes from branch
Three items that snuck in but aren't part of Oracle PAM:
1. Untrack ORACLE_PAM_NOTES.md. It's a development notes/research log
that belongs with scratch work, not in the branch. The file stays
on disk (uncommitted) for local reference. Its content is also
stale since the placeholder password changed from
"infisical-pam-proxy" to "password" in 3ee7951 without matching
updates to the notes.
2. Revert the github.com/emirpasic/gods indirect bump from v1.12.0
to v1.18.1 in go.mod/go.sum. This was residue from when go-ora/v2
was temporarily added as a direct dep in the initial feat commit —
go-ora needed the newer gods, and Go module MVS held the bumped
version even after go-ora was removed in 3ff9cff. Running
`go mod tidy` against current HEAD (with main's go.mod/go.sum as
the baseline) produces no further changes, confirming nothing we
ship actually needs v1.18.1.
.gitignore additions (.vscode, .idea) stay — reasonable hygiene that
doesn't hurt the PR and will prevent future editor-file noise.
* fix(pam-oracle): AI-review findings + username parity with other handlers
- Password plaintext no longer embedded in "password mismatch" error
(was bubbling to gateway logs via zerolog's .Err chain).
- Long Oracle passwords (≥ 96 chars) now encode correctly: replaceKVPValue
routes AUTH_PASSWORD through TTCBuilder.PutClr so values above the
short-form threshold emit the 0xFE chunked form instead of a truncated
single-byte length.
- Client-supplied username rewritten to InjectUsername in phase-1 and
phase-2 auth requests. Matches the effect of how the postgres/mysql/mssql
handlers overwrite the startup-packet user — the client's choice becomes
inert; upstream always looks up the configured account's verifier.
- Dead / misleading code removed: local PacketTypeResendMarker constant
that duplicated tns.go's PacketTypeResend, package-level min() shadowing
the Go 1.21+ builtin, the if !use32Bit branch in extractDataPayload
where both arms assigned the same value, and the now-unused
encodeCompressedInt helper.
* chore(pam-oracle): remove dead code + tighten attribution
Static-analysis sweep (staticcheck + manual cross-file grep) across the
oracle handler package. All removals are symbols that were defined but
never referenced anywhere:
- tns.go: markerTypeReset, markerTypeInterrupt constants.
- o5logon_server.go: TTCMsgAuthResponse, TTCMsgBreak,
LogonModeUserAndPass, LogonModeNoNewPass constants; AuthPhaseTwo
fields ESpeedyKey / AlterSession / ClientInfo / LogonMode (parser
wrote them, no reader downstream); ParseAuthPhaseTwo's KVP switch
trimmed to the two keys actually consumed (AUTH_SESSKEY,
AUTH_PASSWORD).
- ttc.go: TTCReader.GetNullTermString(), TTCReader.SetUseBigClrChunks()
— both uncalled.
Also added a proper attribution header to o5logon_server.go (it was
adapting go-ora's phase-2 layout + summary-object format but lacked the
same kind of header the other ported files have), and expanded
ATTRIBUTION.md to cover it plus the specific primitives borrowed by
o5logon.go.
No behavior change. go build / go vet / staticcheck clean on the package.
* chore(pam-oracle): remove dead verifier-type and error-code constants
Exhaustive dead-export sweep via `go doc -all` (a more reliable check
than my earlier regex-based grep, which missed untyped constants).
Removed:
- VerifierType10g / VerifierType11g / VerifierType12c — defined but no
callers; the only verifier type our code implements (18453, 12c+
PBKDF2+SHA512) is hardcoded, the three named constants were never
referenced. The 18453-specific comments in the code retain the
documentation.
- ORA12660EncryptionRequired — defined but no callers. Only
ORA1017InvalidCredentials is actually emitted.
Post-sweep: every exported symbol in the oracle handler package has a
call site. `go build` / `go vet` / `staticcheck` clean.
* chore(pam-oracle): remove dead enum-block const members
staticcheck's U1000 has an exemption for const blocks: if any member is
used, sibling members aren't flagged as unused even when they are. An
exhaustive sweep that enumerates every top-level identifier via
manual grep (so const-block membership is irrelevant) caught the ones
staticcheck skipped.
Removed:
- tns.go: PacketTypeAbort, PacketTypeAck, PacketTypeAttn, PacketTypeCtrl,
PacketTypeNull. Only the 7 PacketType values we actually dispatch on
remain.
- query_logger.go: ttcFuncOFETCH, ttcFuncOCLOSE, ttcFuncOSTMT,
ttcFuncOLOGOFF, ttcMsgPiggyback. Only the 4 TTC opcodes the query tap
actually looks for remain (OALL8, OCOMMIT, ORLLBK, the outer msgFunction).
Post-sweep: exhaustive enumeration finds zero unused symbols across the
package (189 candidates checked). go build / go vet / staticcheck clean.
* fix(pam-oracle): show Infisical account name (not real DB user) in the banner
The banner was printing the real upstream DB username
(pamResponse.Metadata["username"]) in the connection URL, even though
the preceding "Account:" label already shows the Infisical account name.
Since the gateway now rewrites the client-supplied username in the
O5Logon exchange to the configured real user, the client can (and
should) connect using the account name — and the banner makes that
explicit.
Before:
Resource: aws-oracledb
Account: admin2
oracle://admin:password@localhost:53521/DATABASE ← confusing
After:
Resource: aws-oracledb
Account: admin2
oracle://admin2:password@localhost:53521/DATABASE ← matches the label
Scope: Oracle only for now. The postgres/mysql/mssql handlers also
overwrite the client username on the wire, so the same banner change
would work there, but that needs a separate verification pass per
dialect before we extend it.
* fix(pam-oracle): inject SERVICE_NAME from config into CONNECT packet
The client's CONNECT description string was forwarded unchanged, requiring
users to know the real Oracle service name. Now we rewrite SERVICE_NAME
to match InjectDatabase from the vault config, consistent with how
username and password are already injected.
* fix(pam-oracle): remove placeholder password verification in phase-2
The gateway no longer checks whether the client sent the placeholder
password "password". It unconditionally encrypts the real password
from the vault, regardless of what the client typed. Auth will
succeed either way since we inject the real credentials.
The placeholder password is still shown to the user in the CLI banner
and still used for key derivation in phase-1 — only the verification
check is removed.
* fix(pam-oracle): forward phase-2 response directly, remove SVR_RESPONSE regen
AUTH_SVR_RESPONSE is encrypted with encKey, which is derived from
session keys + CSK salt — not the password. The client and Oracle
derive the same encKey, so Oracle's original proof is already valid
for the client. No need to regenerate it.
Removes translatePhase2Response, BuildSvrResponse, and the
placeholderEncKey field from ProxyAuthState.
* fix(pam-oracle): replace 3s timeout peek with deterministic supplement drain
The post-ACCEPT supplement peek used a 3-second read deadline to detect
whether a go-ora client sent connect-data as a separate packet. This
added a fixed 3s delay for every non-go-ora client (sqlcl, JDBC thin).
Instead, check the CONNECT packet structure: if connect-data-length +
connect-data-offset exceeds the packet size, the data wasn't inline and
a supplement will follow. Track whether the RESEND handler already
consumed it; if not, do a blocking read (the supplement is guaranteed
to be in the TCP buffer since the client sent it before waiting for
a response).
Removes prependedConn, detectConnectDataSupplement, and the timeout.
* chore(pam-oracle): strip comments, enforce InjectDatabase, fix ora() formatting
- Remove verbose comments across all files (~20% → ~2% comment rate)
- Remove per-file go-ora attribution (ATTRIBUTION.md carries the license)
- Trim ATTRIBUTION.md to just the copyright notice + MIT text
- Make InjectDatabase mandatory (error if empty, always overwrite client's SERVICE_NAME)
- Format unknown Oracle error codes as ORA-XXXXX instead of bare "ERROR"
- Clarify ProxyPasswordPlaceholder as a decoy
* chore(pam-oracle): remove redundant banner comment and auth note
* fix(pam-oracle): forward connect-data supplement before handshake loop
JDBC thin sends the CONNECT description as a separate follow-up packet
when it doesn't fit inline (74-byte header-only CONNECT). Previously we
drained this supplement only after ACCEPT, relying on the server to
RESEND. AWS RDS sends RESEND, but OCI Autonomous DB does not — it waits
for the connect-data and eventually drops the connection.
Fix: check the CONNECT packet for non-inline data (connect-data-offset +
connect-data-length > packet size) and forward the supplement immediately.
Also forward it again after a RESEND+TLS-restart, since the client
re-sends CONNECT + supplement over the new TLS session.
* fix(pam-oracle): restore placeholder password check for clearer errors
Wrong password fails cryptographically anyway (ORA-17452 from garbage
decryption), but the error is confusing. Restore the explicit check
so the client gets a clean ORA-01017 instead.
* fix(pam-oracle): show real DB username in banner, matching other handlers
* chore(pam-oracle): update ProxyPasswordPlaceholder comment
* chore(pam-oracle): simplify ProxyPasswordPlaceholder comment
* chore(pam-oracle): rename ResourceTypeOracle to ResourceTypeOracledb
* fix(pam-oracle): cap handshake loop, zero keys after auth, remove dead comment
- Add maxHandshakeAttempts (10) to prevent infinite RESEND loops
- Zero RealKey, PlaceholderKey, ServerSessKey after auth completes
- Remove leftover debug comment from proxy_auth.go
* chore(pam-oracle): simplify key cleanup to state = nil
* fix(pam-oracle): show Easy Connect and JDBC connection strings in banner
---------
Co-authored-by: saif <11242541+saifsmailbox98@users.noreply.github.com>
* debug(pam): add diagnostic logging for SSH resource creation flake
Add detailed logging when CreateSshPamResource returns non-200 to
diagnose the intermittent 400 error seen only in CI. On failure, dumps:
- Response body (the actual backend error message)
- Request params (gatewayId, projectId, host, port)
- Relay/gateway process status and stderr/stdout
- Gateway API state (heartbeat, name)
- Backend container logs (last 3000 chars)
Also temporarily scopes CI to only PAM SSH tests in a loop (up to 10
runs, stops on first failure) to maximize chance of catching the flake.
* fix(pam): use localhost for SSH resource host to avoid iptables race
The SSH e2e tests were using getOutboundIP() (e.g. 10.1.0.34) as the
resource host. The gateway dials this address to validate the connection
during resource creation. On Linux CI with native Docker, this path goes
through iptables DNAT rules which can have a brief propagation delay
after container start — causing intermittent "connection refused" even
though testcontainers confirmed the port was listening (via localhost).
CI error: "Unable to validate connection to ssh: Connection lost before
handshake" with gateway log: "dial tcp 10.1.0.34:32774: connect:
connection refused"
Fix: use localhost instead. The gateway runs on the host, so
localhost:port reaches docker-proxy directly on the loopback interface
without needing iptables DNAT.
Diagnostic logging kept in place to catch any other failure modes.
* debug(ci): add sleep between loop iterations for port release
The relay binds to hardcoded ports (8443, 2222). When running multiple
test iterations in a loop, the previous relay's ports may still be in
TIME_WAIT when the next iteration starts, causing "Relay is reachable"
to never appear. Add 15s sleep between iterations.
* experiment: revert to getOutboundIP to confirm flake reproduces in CI
Reverting the localhost fix to verify that getOutboundIP triggers the
iptables race ("connection refused") in CI. If this fails, it confirms
the root cause. Will re-apply the fix afterward.
* fix(pam): re-apply localhost fix after confirming getOutboundIP flake
Experiment confirmed:
- getOutboundIP (10.1.0.168): FAILED run 1 — "dial tcp 10.1.0.168:32773:
connect: connection refused" (iptables DNAT race)
- localhost: 20/20 passed across two CI runs
Re-applying the localhost fix.
* experiment: second run with getOutboundIP to confirm reproducibility
* fix(pam): re-apply localhost fix — confirmed root cause with A/B test
A/B experiment in CI confirms getOutboundIP causes the flake:
getOutboundIP (external IP):
- Experiment 1: FAILED run 1 (password, 10.1.0.168:32773)
- Experiment 2: FAILED run 4 (certificate, 10.1.0.114:32799)
- Error: "dial tcp <external-ip>:<port>: connect: connection refused"
localhost:
- 3 CI runs: PASSED 25/25 consecutively
Root cause: on Linux CI with native Docker, port mappings via external
IP go through iptables DNAT rules which have a brief propagation delay
after container start. localhost bypasses this via docker-proxy on
loopback.
* fix(pam): finalize — restore CI workflow, keep diagnostics
- Restore all CI jobs (CLI, Agent, PAM) to normal single-run mode
- Run full PAM test suite (not just SSH)
- Keep diagnostic dump on createSSHPamResource failure
- Log resourceHost value so any future failure shows which path was used
* test: validation loop with log correlation check
Runs SSH tests 5 times with localhost fix, then greps all logs for
"connection refused" to verify the error is completely absent.
With getOutboundIP: "connection refused" appeared in gateway logs
right before every failure.
With localhost: should see zero occurrences across all 5 runs.
* fix(pam): use localhost for SSH test resource host + add failure diagnostics
Root cause: SSH e2e tests used getOutboundIP() (e.g. 10.1.0.168) as
the resource host. During backend's validateConnection(), the gateway
dials this address to reach the SSH container. On Linux CI with native
Docker, this path goes through iptables DNAT rules which have a brief
propagation delay after container start. testcontainers confirms the
port via localhost (docker-proxy on loopback), but the iptables path
isn't ready yet — causing "connection refused" at the gateway.
Fix: use "localhost" instead. The gateway runs on the Docker host, so
localhost reaches the port mapping directly via docker-proxy without
needing iptables DNAT.
Verified with A/B experiment in CI:
getOutboundIP: FAILED 2/2 runs ("connection refused" in gateway logs)
localhost: PASSED 30/30 runs (zero "connection refused" in logs)
Also adds diagnostic dump on createSSHPamResource failure (response
body, relay/gateway logs, backend container logs, gateway API state)
for visibility into any future failure modes.
* fix(pam): get SSH resource host from container.Host() like Postgres/Redis
Use container.Host(ctx) instead of getOutboundIP() or hardcoded
"localhost". This is the same pattern Postgres and Redis tests use,
and returns the correct Docker daemon host (localhost on Linux,
which routes through docker-proxy without iptables DNAT races).
* fix(pam): use container.Host() for SSH resource host, remove debug code
Get the SSH resource host from container.Host(ctx) — same as Postgres
and Redis tests — instead of getOutboundIP(). On Linux CI, Host()
returns localhost which routes through docker-proxy on loopback,
avoiding the iptables DNAT propagation race that caused intermittent
"connection refused" when using the host's external IP.
Removes all diagnostic logging added during investigation.
* chore: restore original workflow file (no changes needed)
* debug: log response body on createSSHPamResource failure
* debug: SSH-only loop to catch second flake mode with response body logging
* ci: retrigger loop to catch rare second flake
* debug: add gateway/relay stderr dump on failure to catch second flake
* ci: retrigger to catch rare flake (attempt 3)
* fix(pam): wait for sshd "Server listening" log, not just port
The SSH container's port 22 becomes listenable before sshd is fully
ready to accept connections. testcontainers' ForListeningPort confirms
the port is open, but sshd may still be generating host keys or loading
config. Connections during this window get "connection reset by peer"
(0 bytes from service), which the backend sees as "Connection lost
before handshake".
Fix: use ForAll(ForListeningPort, ForLog("Server listening")) to wait
for both port availability AND sshd's startup confirmation log.
* ci: retrigger for confidence
* fix(pam): use SSH handshake to verify sshd readiness after SIGHUP
After configureCertAuth sends SIGHUP to reload sshd config, the test
checked readiness with a TCP dial. But sshd never closes its listen
socket during SIGHUP, so the port is always reachable — the dial passes
instantly even while sshd is still processing the config reload.
Connections during this window get accepted then dropped, causing
"Connection to 127.0.0.1 closed by remote host".
Fix: do an actual SSH handshake (ssh.Dial with password auth) instead
of a TCP dial. This confirms sshd is fully serving connections, not
just listening.
* ci: retrigger confidence run
* fix(pam): fix three SSH e2e test flakes
1. Use container.Host(ctx) for resource host instead of getOutboundIP().
On Linux CI, external IPs route through iptables DNAT which has a
propagation delay after container start.
2. Wait for sshd "Server listening" log, not just port open.
ForListeningPort passes before sshd finishes key generation,
causing connection resets.
3. Use SSH handshake to verify sshd readiness after SIGHUP config
reload. sshd never closes its listen socket during SIGHUP, so TCP
dial passes instantly while sshd is still reloading.
---------
Co-authored-by: saif <11242541+saifsmailbox98@users.noreply.github.com>
6 subtests: connection, bad credentials, unreachable target, reconnect, concurrent connections, and session duration. Uses an xrdp container as the target and FreeRDP under xvfb for headless verification. CI pam-test job updated to build the Rust bridge and CLI with -tags rdp.
|
💬 Discussion in Slack: #pr-review-cli-230-add-rdp-e2e-tests Posted by Review Police — reviews, comments, new commits, and CI failures will stream into this channel. |
The backend requires a pam_project_recording_configs row to exist before allowing Windows resource creation. Insert a dummy row directly into the database, bypassing FK checks since we don't need a real app connection for the E2E tests.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
-tags rdp, FreeRDP + xvfb installopenapi-cfg.yamlupdated withcreateWindowsPamResourceandcreateWindowsPamAccountfor future client regenerationTest plan