test: ui-smoke phase 2, g-code execution and endpoint check by grandixximo · Pull Request #4054 · LinuxCNC/linuxcnc

grandixximo · 2026-05-24T09:00:37Z

Phase 2 of the GUI test work tracked in #3756. Stacked on #3999 (now merged).

Summary

Extends each per-GUI ui-smoke test that landed in #3999 with an end-to-end g-code execution stage:

Estop reset, machine on, home all (via c.home(-1), respects HOME_SEQUENCE).
MODE_AUTO, program_open on a shared _lib/smoke.ngc, auto(AUTO_RUN, 0).
Poll linuxcnc.stat until interp_state == INTERP_IDLE and queue == 0 for 5 consecutive 10 ms polls (settle window guards the inter-line moment where interp briefly reports IDLE while queueing the next move).
Assert stat.position[:3] delta against --expect-delta-mm 1,1,0, converted via stat.linear_units so the same arg works on inch (axis, touchy) and mm (gmoccapy, qtdragon) sims.
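The settle window in the polling step can be sketched roughly as follows. This is a minimal illustration, not the code in drive.py: `wait_program_done` and its `poll` callable are hypothetical names, and `INTERP_IDLE` is a local stand-in for `linuxcnc.INTERP_IDLE` so the sketch runs without LinuxCNC installed.

```python
import time

INTERP_IDLE = 1  # stand-in for linuxcnc.INTERP_IDLE (assumption, keeps the sketch standalone)


def wait_program_done(poll, settle_polls=5, interval_s=0.01, timeout_s=60.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Return True once (interp_state, queue) reads (IDLE, 0) settle_polls times in a row.

    poll() is a hypothetical callable returning (interp_state, queue); against a
    real machine it would call stat.poll() and read stat.interp_state / stat.queue.
    """
    deadline = clock() + timeout_s
    streak = 0
    while clock() < deadline:
        interp_state, queue = poll()
        if interp_state == INTERP_IDLE and queue == 0:
            streak += 1
            if streak >= settle_polls:
                return True
        else:
            streak = 0  # a busy sample restarts the settle window
        sleep(interval_s)
    return False


# Simulated samples: busy, a one-poll idle blip between lines, busy again, then idle.
samples = iter([(2, 1), (1, 0), (2, 1)] + [(1, 0)] * 5)
seen = []


def fake_poll():
    sample = next(samples)
    seen.append(sample)
    return sample


settled = wait_program_done(fake_poll, sleep=lambda _s: None)
```

The single idle sample between the two busy ones does not end the wait; only the final run of five consecutive idle polls does.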

No new test directories. The four existing tests/ui-smoke/{axis,touchy,gmoccapy,qtdragon}/test.sh now also pass the --run-program and --expect-delta-mm flags through run-gui.sh. The Phase 1 connect-and-settle path stays as-is when those flags are omitted.

smoke.ngc

G21
G91
G0 X1 Y1
G90
M2

Forces mm input units (G21) and uses relative motion (G91) so the same file is sim-agnostic. The driver records stat.position[:3] after homing and checks (final - start) against the expected delta converted to machine units. This sidesteps each sim's HOME offset (axis homes to 0/0/0, qtdragon to 20/20/-10, etc).
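The delta check described above might look like the sketch below. It assumes `stat.linear_units` is expressed as machine units per mm (1.0 on a mm machine, 1/25.4 on an inch machine); `delta_matches`, `parse_delta` and the tolerance value are hypothetical and not taken from drive.py.

```python
def parse_delta(arg):
    """Parse a '--expect-delta-mm' style value such as '1,1,0' into floats."""
    return [float(part) for part in arg.split(",")]


def delta_matches(start, final, expect_delta_mm, linear_units, tol_mm=0.001):
    """Compare (final - start), in machine units, with an expected delta given in mm.

    linear_units is machine units per mm; tol_mm is an assumed tolerance.
    """
    for s, f, d_mm in zip(start, final, expect_delta_mm):
        expected = d_mm * linear_units          # mm -> machine units
        if abs((f - s) - expected) > tol_mm * linear_units:
            return False
    return True


expect = parse_delta("1,1,0")

# mm sim homed at 20/20/-10: a 1 mm move in X and Y lands on 21/21/-10.
mm_ok = delta_matches((20.0, 20.0, -10.0), (21.0, 21.0, -10.0), expect, 1.0)

# inch sim homed at 0/0/0: the same program moves 1/25.4 inch in X and Y.
inch_ok = delta_matches((0.0, 0.0, 0.0), (1 / 25.4, 1 / 25.4, 0.0), expect, 1 / 25.4)

# A 1 inch move on the inch sim would be a unit-handling bug and must not pass.
inch_bad = delta_matches((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), expect, 1 / 25.4)
```

Because only the difference between the two snapshots is compared, the absolute home position of each sim never enters the check.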

State machine handling

linuxcnc.command.wait_complete() on a state or mode change only proves the NML message was acked, not that task_state / task_mode has transitioned. Polling the stat fields is the only deterministic signal. Two GUIs (gmoccapy reliably, qtdragon intermittently) re-issue their own mode commands during their own startup and revert task_mode AUTO -> MANUAL immediately after the driver sets it; without retry, auto(AUTO_RUN, 0) is then rejected and interp_state stays at IDLE.

ensure_state / ensure_mode helpers implement an issue + wait + stability-check + retry pattern, up to STATE_RETRY_BUDGET = 6 attempts. Intermediate timeouts use a quiet wait variant so spurious UI_SMOKE_FAIL lines do not pollute the log during retries (checkresult.sh greps for ^UI_SMOKE_FAIL on any line, so a retry that ultimately succeeds must not emit the marker).
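A rough sketch of the issue + wait + stability-check + retry pattern, under stated assumptions: `ensure_value`, the `issue` / `read` callables and the `FlakyGui` simulator are hypothetical, and the wait and stability durations are made-up defaults. Only the budget of 6 attempts comes from the description above.

```python
import time


def ensure_value(issue, read, desired, retries=6, wait_s=5.0, stability_s=0.5,
                 poll_s=0.01, sleep=time.sleep, clock=time.monotonic):
    """Issue a command until read() reports `desired` and still does after stability_s.

    For a mode change, issue() might wrap command.mode(...) and read() might
    poll stat and return stat.task_mode (both hypothetical here).
    Returns the 1-based attempt that succeeded, or 0 when the budget is exhausted.
    """
    for attempt in range(1, retries + 1):
        issue()
        deadline = clock() + wait_s
        while read() != desired and clock() < deadline:
            sleep(poll_s)
        if read() != desired:
            continue  # quiet timeout: retry without printing a failure marker
        sleep(stability_s)
        if read() == desired:
            return attempt  # the value survived the stability window
    return 0


class FlakyGui:
    """Simulates a GUI that reverts AUTO back to MANUAL a limited number of times."""

    def __init__(self, reverts):
        self.mode = "MANUAL"
        self.reverts_left = reverts

    def issue(self):
        self.mode = "AUTO"

    def read(self):
        return self.mode

    def sleep(self, _seconds):
        # The simulated GUI acts while the driver waits out the stability window.
        if self.reverts_left and self.mode == "AUTO":
            self.reverts_left -= 1
            self.mode = "MANUAL"


gui = FlakyGui(reverts=1)
attempt = ensure_value(gui.issue, gui.read, "AUTO", sleep=gui.sleep)

stubborn = FlakyGui(reverts=100)
gave_up = ensure_value(stubborn.issue, stubborn.read, "AUTO", sleep=stubborn.sleep)
```

Returning a value instead of printing on each failed attempt keeps the log free of failure markers until the caller decides the whole budget is spent.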

qtdragon-specific bits

The qtdragon sim needs three CI-only workarounds layered in tests/ui-smoke/qtdragon/test.sh:

Writable config mirror. qtvcp's INI-driven LOG_FILE is rooted in the config dir. CI mounts the workspace read-only for the runtime user, so a relative LOG_FILE = qtdragon.log resolves to a path qtvcp cannot create and hal_bridge exits before the driver can attach. The test now mirrors configs/sim/qtdragon/qtdragon_xyz/ into mktemp -d and rewrites LOG_FILE to ~/qtdragon.log.
Offscreen Qt platform. qtvcp under xvfb + xcb on Ubuntu 24.04 segfaults during widget construction (no backtrace). Setting QT_QPA_PLATFORM=offscreen and LINUXCNC_OPENGL_PLATFORM=offscreen renders entirely in memory; xvfb-run still wraps the call so scripts/linuxcnc's X-display assumptions hold.
Block QtWebEngine import. qtdragon's UI embeds WebWidget (QWebEngineView, Chromium). Chromium's browser-process init segfaults inside the qtvcp PID under offscreen + xvfb on the Ubuntu runner even with --no-sandbox --single-process --disable-gpu (Chromium logs Sandboxing disabled by user. and then crashes). Rather than chase Chromium flags for a widget the smoke test never touches, the test drops a sitecustomize.py on PYTHONPATH that installs a sys.meta_path finder blocking qtpy.QtWebEngineWidgets (and PyQt5.QtWebEngineWidgets). lib/python/qtvcp/widgets/web_widget.py already has a fail-safe path that swaps the QWebEngineView for a plain QWidget when that import fails, so the UI loads cleanly with the Web tab inert.
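The import block in the third workaround could be sketched like this. It is an illustration of the sys.meta_path technique, not the sitecustomize.py shipped in the PR; `BlockedImportFinder` and `install` are hypothetical names.

```python
import importlib.abc
import sys

BLOCKED = {"qtpy.QtWebEngineWidgets", "PyQt5.QtWebEngineWidgets"}


class BlockedImportFinder(importlib.abc.MetaPathFinder):
    """Meta path finder that makes selected modules fail to import."""

    def __init__(self, blocked):
        self.blocked = frozenset(blocked)

    def find_spec(self, fullname, path, target=None):
        if fullname in self.blocked:
            raise ImportError(f"{fullname} is blocked for the smoke test")
        return None  # defer every other module to the regular finders


def install(blocked=BLOCKED):
    """Put the finder first on sys.meta_path so it is consulted before the regular finders."""
    finder = BlockedImportFinder(blocked)
    sys.meta_path.insert(0, finder)
    return finder


finder = install()
```

A finder only sees modules that are not yet in sys.modules, which is why doing this from sitecustomize (imported during interpreter startup) is a natural fit: it runs before any GUI code has a chance to import the blocked module.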

debian/control

Adds python3-zmq under the <!nocheck> profile. Without it qtdragon's hal_bridge fails on startup and qtvcp tears down before the driver can attach.

Performance

ui-smoke total wall time on this machine:

| Configuration | Tests | Wall time | Marginal over Phase 1 baseline (~2 min) |
| --- | --- | --- | --- |
| Phase 1 only (#3999) | 4 | ~2 min | baseline |
| Phase 1 + Phase 2 (this PR) | 4 | ~2m43s | +~40s |

Phase 2 reuses each test's existing linuxcnc startup and xvfb instance, so the added CI cost is only the Phase 2 sequence per GUI (estop + home + mode + run + verify, ~5 to 16 s depending on GUI). The ~30 s startup/shutdown overhead is still paid once per test, not a second time.

Sequential by necessity: every ui-smoke test launches linuxcnc which claims a fixed set of SHM keys (SHM_KEYS in _lib/cleanup-runtime.sh), so parallel ui-smoke runs would race. Per-test SHM-key isolation would be a separate refactor.

Test plan

5 consecutive local scripts/runtests tests/ui-smoke runs, 4/4 pass, 0 shmem errors, clean tree
Per-test diagnostics confirm gmoccapy hits the mode-revert path on every run and the retry recovers cleanly
scripts/shellcheck.sh clean on changed scripts
python3 -m py_compile clean on drive.py
CI: rip-and-test, rip-and-test-clang, rip-rtai, and all 6 package-arch / 3 package-indep jobs green on run 26361844630

Out of scope

Phase 3 (screenshot or video on failure, per the discussion on #3999, "test: UI smoke tests for axis, touchy, gmoccapy, qtdragon"), tracked in #3756 ("Add tests starting GUIs, likely falling back to xvfb for it").
Per-test SHM-key isolation for parallel ui-smoke runs, see Performance.

Adds a minimal harness under tests/ui-smoke/ that launches each GUI against its sim config under xvfb-run and verifies it reaches the 'task ready' NML state without crashing. Auto-discovered by scripts/runtests via per-GUI test.sh + checkresult + skip files.

Layout:
  _lib/launch.sh - spawns linuxcnc -r under xvfb, runs driver, handles clean shutdown (group-SIGTERM with 60s wait, escalate to SIGKILL + shm cleanup)
  _lib/drive.py - polls linuxcnc.stat() until task ready, prints UI_SMOKE_OK / UI_SMOKE_FAIL
  _lib/checkresult.sh - grep for UI_SMOKE_OK / absence of FAIL
  _lib/skip-if-missing.sh - skip when xvfb-run absent (dev env)
  _lib/cleanup-runtime.sh - pre/post belt-and-braces daemon + shm cleanup; SHM key list mirrors scripts/runtests:157 (full 6-key set)
  _lib/run-gui.sh - dispatcher taking a relpath under configs/sim/, exec'd by per-GUI test.sh
  axis|touchy|gmoccapy|qtdragon/test.sh - one-line wrappers

Force software OpenGL via LIBGL_ALWAYS_SOFTWARE + Qt RHI/QSG/QtQuick software backends; CI runners have no GPU and Qt GL paths segfault on headless display.

Skip vs fail policy (BsAtHome / hdiethelm review): only xvfb-run absence skips; missing Python/typelib deps fail loudly so review catches them. Required deps are gated under !nocheck in debian/control.top.in (separate commit).

Adds the Python, Qt, GTK and typelib runtime deps needed for the ui-smoke harness under tests/ui-smoke/ to actually exercise each GUI's import path on CI. All gated with <!nocheck> so users building with DEB_BUILD_OPTIONS=nocheck aren't penalised with the extra packages. Includes pyqt5 (+ qsci/qtsvg/qtopengl/qtwebengine/qtpy/dev-tools), python3-dbus.mainloop.pyqt5, python3-cairo, python3-gi(+cairo), gir1.2-gtk-3.0, gir1.2-gtksource-4, python3-numpy, python3-configobj, xvfb and x11-xserver-utils.

Each per-GUI test now also drives estop reset, machine on, home all, mode auto, program_open + auto(RUN) on a tiny shared smoke.ngc, waits for sustained INTERP_IDLE, and asserts stat.position delta against --expect-delta-mm 1,1,0 converted via stat.linear_units so the same arg works on inch (axis, touchy) and mm (gmoccapy, qtdragon) sims.

State/mode commands use ensure_state/ensure_mode helpers with a retry-and-stability pattern: gmoccapy and qtdragon re-issue their own mode commands during startup and can revert task_mode AUTO -> MANUAL right after we set it. The helpers wait for the desired state, then re-check after STATE_STABILITY_S; on revert they retry up to STATE_RETRY_BUDGET times. Intermediate timeouts use a quiet variant so spurious UI_SMOKE_FAIL lines do not pollute the log during retries (checkresult.sh greps for ^UI_SMOKE_FAIL on any line).

smoke.ngc is G21 G91 G0 X1 Y1 G90 M2 - relative move in mm, sim-agnostic. The driver snapshots stat.position[:3] after homing and checks (final - start) against the converted delta, sidestepping each sim's HOME offset.

Adds python3-zmq and python3-opencv to debian/control.top.in under !nocheck: qtdragon's hal_bridge and the camview widget segfault on startup without them, which is invisible to the connect-only Phase 1 smoke but breaks the run-program path before the program can start.

5 consecutive local runs all green at 2m43s wall each.

CI run hit 'timeout waiting for all joints homed after 60.0s' on qtdragon only; locally homing completes in <4s on all four sims. Likely cause: same task_mode revert race as ensure_mode catches for MODE_AUTO, except home() lives outside that helper, so a mid-sequence mode flip back to a non-MANUAL mode silently drops the home command. Wrap the post-c.home(-1) wait in a poll loop that re-asserts MANUAL and re-issues home(-1) every HOME_REISSUE_S (10s). Final timeout now also dumps homed[], task_state, task_mode and exec_state so the next CI failure has actionable diagnostics.
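The re-issue loop described in this commit can be sketched as below. `home_all_with_reissue`, its callables and the `FakeMachine` simulator are hypothetical; only the 10 s re-issue interval and the 60 s timeout come from the message above.

```python
import time


def home_all_with_reissue(issue_home, all_homed, reissue_s=10.0, timeout_s=60.0,
                          poll_s=0.05, sleep=time.sleep, clock=time.monotonic):
    """Wait for homing, re-sending the home request every reissue_s seconds.

    issue_home() would re-assert MANUAL mode and call command.home(-1);
    all_homed() would poll stat and inspect stat.homed (both hypothetical here).
    Returns how many times the request was sent, or 0 on timeout.
    """
    start = clock()
    next_issue = start
    sent = 0
    while clock() - start < timeout_s:
        if clock() >= next_issue:
            issue_home()
            sent += 1
            next_issue = clock() + reissue_s
        if all_homed():
            return sent
        sleep(poll_s)
    return 0


class FakeMachine:
    """Simulated machine with a fake clock; it honours only the Nth home request."""

    def __init__(self, honoured_request):
        self.now = 0.0
        self.requests = 0
        self.honoured_request = honoured_request

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds

    def issue_home(self):
        self.requests += 1

    def all_homed(self):
        return self.requests >= self.honoured_request


# The first request is silently dropped; the re-issue at ~10 s succeeds.
m = FakeMachine(honoured_request=2)
sent = home_all_with_reissue(m.issue_home, m.all_homed, sleep=m.sleep, clock=m.clock)

# A machine that never homes exhausts the timeout.
dead = FakeMachine(honoured_request=10**6)
timed_out = home_all_with_reissue(dead.issue_home, dead.all_homed,
                                  sleep=dead.sleep, clock=dead.clock)
```

Injecting the clock and sleep functions keeps the simulation instant while exercising the same control flow a real 60 s wait would follow.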

CI run hit a PermissionError in qtvcp's logger when it tried to open configs/sim/qtdragon/qtdragon_xyz/qtdragon.log for write: the GitHub Actions workspace is mounted read-only for the docker build user, and qtvcp resolves LOG_FILE = qtdragon.log into the config dir. hal_bridge then exits, linuxcnc tears down, and the driver retries ESTOP_RESET until the budget is exhausted. qtdragon test.sh now mirrors the qtdragon_xyz config dir to a mktemp directory, seds LOG_FILE to ~/qtdragon.log, and passes the absolute INI path to run-gui.sh. run-gui.sh treats any path starting with / as absolute; everything else still resolves under configs/sim. Trap cleans the tmp dir on exit so the working tree stays clean. Does not touch the shipped qtdragon config to avoid changing default behaviour for real users. The same fix would work for any other config that turns out to write into its own dir on CI.
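For illustration, the mirror-and-rewrite step is transliterated here into Python; the actual test.sh does it in shell with mktemp, sed and a trap. `mirror_config`, the `demo.ini` file name and the fake INI contents are hypothetical and exist only so the sketch is self-contained.

```python
import re
import shutil
import tempfile
from pathlib import Path


def mirror_config(src_dir, ini_name, log_file="~/qtdragon.log"):
    """Copy src_dir into a fresh temp dir and repoint LOG_FILE in ini_name.

    Returns the absolute path of the rewritten INI. The caller is responsible
    for removing the temp dir afterwards.
    """
    tmp_root = Path(tempfile.mkdtemp(prefix="ui-smoke-"))
    dst_dir = tmp_root / Path(src_dir).name
    shutil.copytree(src_dir, dst_dir)
    ini = dst_dir / ini_name
    text = ini.read_text()
    # Rewrite only the value, keeping the original 'LOG_FILE =' prefix.
    text = re.sub(r"(?m)^([ \t]*LOG_FILE[ \t]*=).*$",
                  lambda m: m.group(1) + " " + log_file, text)
    ini.write_text(text)
    return ini.resolve()


# Build a throwaway stand-in for the shipped config dir (contents are made up).
src_root = Path(tempfile.mkdtemp(prefix="fake-configs-"))
src = src_root / "qtdragon_xyz"
src.mkdir()
(src / "demo.ini").write_text("[DISPLAY]\nDISPLAY = qtvcp qtdragon\nLOG_FILE = qtdragon.log\n")

ini_path = mirror_config(src, "demo.ini")
is_absolute = ini_path.is_absolute()
mirrored = ini_path.read_text()
original = (src / "demo.ini").read_text()

# Clean up both temp trees so nothing is left behind.
shutil.rmtree(ini_path.parent.parent)
shutil.rmtree(src_root)
```

The source tree is only read, never written, which matches the goal of leaving the shipped config untouched.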

Ubuntu 24.04 rip-and-test runs hit a qtvcp segfault after the log-permission fix let qtvcp get further than Phase 1 had. Debian package-arch passes the same code. Two known asymmetries match:

- python3-opencv on Ubuntu pulls Qt5 GUI bits whose cv2/qt/plugins directory overrides the system PyQt5 platform plugin path under xvfb (opencv-python issue LinuxCNC#572, Qt Forum 119109). qtvcp's camview_widget tolerates ImportError on cv2 and just logs a warning, so dropping the dep restores the harmless fallback path Phase 1 was already exercising.
- xcb_glx is the historical fragile integration under xvfb (Launchpad #1761708, QTBUG-67537); xcb_egl is what software-GL stacks expect anyway. Set as defense in depth.

Local 4/4 still green with both changes.

xvfb + xcb + xcb_egl was not enough for Ubuntu 24.04 rip-and-test: qtvcp still segfaults during widget construction even with opencv and qtwebengine paths quiet, and the same code passes on Debian package-arch. Offscreen renders entirely in memory and exercises a different Qt plugin entirely, dodging the xcb-stack instability. scripts/linuxcnc itself forces QT_QPA_PLATFORM=xcb unless LINUXCNC_OPENGL_PLATFORM is set to a non-glx value, so pin both. Only qtdragon needs this; axis (Tk), touchy and gmoccapy (GTK) are unaffected. Trade-off: no Phase 3 screenshot from qtdragon under this config; Phase 3 would need an opt-out for offscreen tests.

qtdragon embeds QWebEngineView. On rip-and-test (gcc) CI it racy-crashed during Chromium browser-process spawn under offscreen + xvfb, no GPU, no user namespaces. rip-and-test-clang got past it by luck. Force --no-sandbox --single-process --no-zygote --disable-gpu so the renderer runs in-process with software rendering.

QtWebEngine browser-process init segfaults inside the qtvcp process on Ubuntu 24.04 CI even with --no-sandbox --single-process --disable-gpu. The smoke test never touches the WebWidget, so block the qtpy.QtWebEngineWidgets import via a sitecustomize meta_path finder; WebWidget already has a fallback that swaps in a plain QWidget when that import fails. No Chromium spawn, no segfault. The previous chromium-flags attempt was retracted: 'Sandboxing disabled by user.' confirmed Chromium got the flags but still crashed during init, so we are not going to win that race.

grandixximo added 9 commits May 11, 2026 21:16

grandixximo marked this pull request as ready for review May 25, 2026 03:13

grandixximo wants to merge 9 commits into LinuxCNC:master from grandixximo:ui-tests-phase2
