Skip to content

Record replay audio#252

Open
rgarcia wants to merge 11 commits into
mainfrom
hypeship/replay-audio
Open

Record replay audio#252
rgarcia wants to merge 11 commits into
mainfrom
hypeship/replay-audio

Conversation

@rgarcia
Copy link
Copy Markdown
Contributor

@rgarcia rgarcia commented May 26, 2026

Summary

  • add optional audio capture to ffmpeg replay recordings using the PulseAudio output monitor
  • start a shared PulseAudio null sink in the browser images and keep the monitor active with a silent sink input
  • keep audio aligned through the full recording by avoiding forced audio PTS reset and using faster low-latency x264 encoding so ffmpeg does not fall behind real time
  • add e2e coverage that records browser audio, downloads the MP4, verifies an audio track, checks the audio stream duration reaches the end of the recording, and includes a Zombocom archive sample recording

Tests

  • go test ./lib/recorder -count=1
  • DOCKER_BUILDKIT=1 docker build -f images/chromium-headful/Dockerfile -t onkernel/chromium-headful-test:latest .
  • TESTCONTAINERS_RYUK_DISABLED=true TESTCONTAINERS_HOST_OVERRIDE=127.0.0.1 RECORDING_ZOMBO_OUTPUT_PATH=/home/agent/repos/kernel-images/server/e2e/testdata/replay-audio-zombocom.mp4 go test ./e2e -run 'TestReplayRecording(IncludesAudioTrack|ZombocomArchiveAudio)' -count=1 -v

Note

Medium Risk
Touches container boot order, Pulse/Chromium audio routing, ffmpeg encoding defaults, and Neko capture—misconfiguration could break recordings or live audio without affecting unrelated API paths.

Overview
Adds optional AAC audio to ffmpeg replay recordings by capturing KernelOutput.monitor via PulseAudio, with RECORD_AUDIO / AUDIO_SOURCE on the API and supervisor defaults (containers default recording audio on).

Introduces a shared PulseAudio bootstrap (KernelOutput sink + KernelInput null mic for enumerateDevices, silent keepalive on the sink) in headful and headless images: Pulse starts on the hot path before Chromium, Chromium gets PULSE_*, headless drops --mute-audio, and Neko is pointed at KernelOutput.monitor for live WebRTC audio. The Neko client unmutes on first user gesture (capture-phase listeners) because the unmute overlay is disabled.

ffmpeg gains a second Pulse input when enabled, AAC output, and veryfast / zerolatency x264 tuning (drops prior wallclock / async resample behavior). New e2e replay-audio tests and a verify_replay_pipeline.mjs harness exercise tracks, levels, and golden comparison.

Reviewed by Cursor Bugbot for commit ba68d65. Bugbot is set up for automated code reviews on this repo. Configure here.

@rgarcia rgarcia marked this pull request as ready for review May 26, 2026 21:26
@firetiger-agent
Copy link
Copy Markdown

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR modifies chromium launcher, wrapper, recorder, and config packages rather than the kernel API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) specified in the filter.

To monitor this PR anyway, reply with @firetiger monitor this.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Recording cap shorter than script
    • I made the recording duration configurable per test and increased the Zombocom archive scenario cap to 70 seconds so recording no longer ends before the Playwright flow completes.

Create PR

Or push these changes by commenting:

@cursor push 2f6a9b26ef
Preview (2f6a9b26ef)
diff --git a/server/e2e/e2e_recording_audio_test.go b/server/e2e/e2e_recording_audio_test.go
--- a/server/e2e/e2e_recording_audio_test.go
+++ b/server/e2e/e2e_recording_audio_test.go
@@ -52,7 +52,7 @@
 		return await page.title();
 	`, audioSite.ContainerURL())
 
-	recordReplayAudio(t, ctx, c, playwrightCode, os.Getenv("RECORDING_AUDIO_OUTPUT_PATH"), 0.1)
+	recordReplayAudio(t, ctx, c, playwrightCode, os.Getenv("RECORDING_AUDIO_OUTPUT_PATH"), 35, 0.1)
 }
 
 func TestReplayRecordingZombocomArchiveAudio(t *testing.T) {
@@ -132,16 +132,15 @@
 		return playback;
 	`
 
-	recordReplayAudio(t, ctx, c, playwrightCode, outputPath, 0.01)
+	recordReplayAudio(t, ctx, c, playwrightCode, outputPath, 70, 0.01)
 }
 
-func recordReplayAudio(t *testing.T, ctx context.Context, c *TestContainer, playwrightCode string, outputPath string, minPeakLevel float64) {
+func recordReplayAudio(t *testing.T, ctx context.Context, c *TestContainer, playwrightCode string, outputPath string, maxDuration int, minPeakLevel float64) {
 	t.Helper()
 
 	client, err := c.APIClient()
 	require.NoError(t, err, "failed to create API client")
 
-	maxDuration := 35
 	maxFileSize := 100
 	startResp, err := client.StartRecordingWithResponse(ctx, instanceoapi.StartRecordingJSONRequestBody{
 		MaxDurationInSeconds: &maxDuration,

You can send follow-ups to the cloud agent here.

Comment thread server/e2e/e2e_recording_audio_test.go
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Pulse socket before sink ready
    • I load module-null-sink before module-native-protocol-unix so the Pulse socket appears only after KernelOutput is created.
  • ✅ Fixed: Keepalive exit kills PulseAudio
    • I replaced wait -n with a keepalive retry loop and wait only on PulseAudio so a pacat exit no longer terminates the daemon.

Create PR

Or push these changes by commenting:

@cursor push 8197b93a61
Preview (8197b93a61)
diff --git a/shared/start-pulseaudio.sh b/shared/start-pulseaudio.sh
--- a/shared/start-pulseaudio.sh
+++ b/shared/start-pulseaudio.sh
@@ -22,16 +22,12 @@
     --daemonize=no \
     --log-target=stderr \
     --exit-idle-time=-1 \
-    --load="module-native-protocol-unix socket=/tmp/pulse/native auth-anonymous=1" \
-    --load="module-null-sink sink_name=KernelOutput rate=48000 channels=2 sink_properties=device.description=KernelOutput" &
+    --load="module-null-sink sink_name=KernelOutput rate=48000 channels=2 sink_properties=device.description=KernelOutput" \
+    --load="module-native-protocol-unix socket=/tmp/pulse/native auth-anonymous=1" &
 
   pulse_pid=$!
-  keepalive_pid=""
 
   cleanup() {
-    if [ -n "$keepalive_pid" ]; then
-      kill "$keepalive_pid" 2>/dev/null || true
-    fi
     kill "$pulse_pid" 2>/dev/null || true
     wait 2>/dev/null || true
   }
@@ -44,10 +40,12 @@
     sleep 0.1
   done
 
-  (
-    pacat --raw --rate=48000 --channels=2 --format=s16le --device=KernelOutput /dev/zero
-  ) &
-  keepalive_pid=$!
+  while kill -0 "$pulse_pid" 2>/dev/null; do
+    pacat --raw --rate=48000 --channels=2 --format=s16le --device=KernelOutput /dev/zero || true
+    if kill -0 "$pulse_pid" 2>/dev/null; then
+      sleep 0.1
+    fi
+  done
 
-  wait -n "$pulse_pid" "$keepalive_pid"
+  wait "$pulse_pid"
   '

You can send follow-ups to the cloud agent here.

Comment thread shared/start-pulseaudio.sh
Comment thread shared/start-pulseaudio.sh
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Darwin audio path uses incompatible PulseAudio source name
    • Darwin now validates audio sources as numeric AVFoundation device indices or "none" in both parameter validation and ffmpeg argument construction, preventing invalid PulseAudio names from reaching ffmpeg.

Create PR

Or push these changes by commenting:

@cursor push 159fdd82c8
Preview (159fdd82c8)
diff --git a/server/lib/recorder/ffmpeg.go b/server/lib/recorder/ffmpeg.go
--- a/server/lib/recorder/ffmpeg.go
+++ b/server/lib/recorder/ffmpeg.go
@@ -89,8 +89,16 @@
 	if p.MaxDurationInSeconds != nil && *p.MaxDurationInSeconds <= 0 {
 		return fmt.Errorf("max duration must be greater than 0 seconds")
 	}
-	if p.recordAudio() && strings.TrimSpace(p.audioSource()) == "" {
-		return fmt.Errorf("audio source is required when recording audio")
+	if p.recordAudio() {
+		audioSource := strings.TrimSpace(p.audioSource())
+		if audioSource == "" {
+			return fmt.Errorf("audio source is required when recording audio")
+		}
+		if runtime.GOOS == "darwin" && audioSource != "none" {
+			if _, err := strconv.Atoi(audioSource); err != nil {
+				return fmt.Errorf("audio source must be a numeric device index or \"none\" on darwin")
+			}
+		}
 	}
 
 	return nil
@@ -511,10 +519,15 @@
 	case "darwin":
 		audioDevice := "none"
 		if recordAudio {
-			audioDevice = params.audioSource()
-			if strings.TrimSpace(audioDevice) == "" {
+			audioDevice = strings.TrimSpace(params.audioSource())
+			if audioDevice == "" {
 				return nil, fmt.Errorf("audio source is required when recording audio")
 			}
+			if audioDevice != "none" {
+				if _, err := strconv.Atoi(audioDevice); err != nil {
+					return nil, fmt.Errorf("audio source must be a numeric device index or \"none\" on darwin")
+				}
+			}
 		}
 		args = []string{
 			// Input options for AVFoundation

You can send follow-ups to the cloud agent here.

Comment thread server/lib/recorder/ffmpeg.go
rgarcia and others added 8 commits May 31, 2026 21:52
Add a standalone module-null-source (KernelInput) so the browser sees a
real, non-monitor microphone -- Chromium excludes monitor sources from
enumerateDevices(), so the null-sink monitor alone left zero audioinput
devices.

Point neko's audio capture at KernelOutput.monitor so live view streams
the browser audio. Neko's default device (audio_output.monitor) does not
exist here, which is why live view had no audio.

Co-authored-by: Cursor <cursoragent@cursor.com>
@rgarcia rgarcia force-pushed the hypeship/replay-audio branch from ce8e4a2 to bfb3453 Compare June 1, 2026 01:54
rgarcia and others added 2 commits May 31, 2026 22:02
Extend TestReplayRecordingIncludesAudioTrack to enumerate media devices
over a raw CDP websocket and require both an audiooutput and a
non-monitor audioinput. Chromium drops PulseAudio monitor sources from
the input list, so this guards the KernelInput null-source against
regressing back to a monitor-only setup that antibot scripts flag.

Co-authored-by: Cursor <cursoragent@cursor.com>
module-null-source rejects source_properties in this PulseAudio version,
so the fake mic silently failed to load and the browser saw zero
audioinput devices. Drop the unsupported arg (the source keeps the
default description).

In the e2e check, navigate to a file:// page (a secure context) before
calling enumerateDevices, since navigator.mediaDevices is undefined on
about:blank, and assert a non-monitor audioinput is present rather than
matching a specific label.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Pulse ready before sink exists
    • The wrapper now waits for the configured Pulse sink to appear via pactl after the socket is ready before launching Chromium, removing the early-start race.
  • ✅ Fixed: Encoder tuning always on
    • Linux-specific low-latency x264 tuning is now applied only when audio recording is enabled and video-only Linux recordings regain -use_wallclock_as_timestamps.

Create PR

Or push these changes by commenting:

@cursor push 0076192699
Preview (0076192699)
diff --git a/server/cmd/wrapper/main.go b/server/cmd/wrapper/main.go
--- a/server/cmd/wrapper/main.go
+++ b/server/cmd/wrapper/main.go
@@ -188,6 +188,7 @@
 		startAll("mutter")
 	}
 	waitForSocket(pulseSocket, 10*time.Second)
+	waitForPulseSink(os.Getenv("PULSE_SINK"), 10*time.Second)
 	startAll("chromium")
 	waitForSocket(dbusSocket, 10*time.Second)
 	if prof == profileHeadful && webrtc {
@@ -332,3 +333,21 @@
 	logf(format, args...)
 	os.Exit(1)
 }
+
+func waitForPulseSink(sinkName string, timeout time.Duration) {
+	sinkName = strings.TrimSpace(sinkName)
+	if sinkName == "" {
+		logf("WARNING: pulse sink name is empty; skipping sink wait")
+		return
+	}
+
+	deadline := time.Now().Add(timeout)
+	for time.Now().Before(deadline) {
+		out, err := exec.Command("pactl", "-s", "unix:"+pulseSocket, "list", "short", "sinks").Output()
+		if err == nil && strings.Contains(string(out), "\t"+sinkName+"\t") {
+			return
+		}
+		time.Sleep(50 * time.Millisecond)
+	}
+	logf("WARNING: pulse sink %s not ready after %s", sinkName, timeout)
+}

diff --git a/server/lib/recorder/ffmeg_test.go b/server/lib/recorder/ffmeg_test.go
--- a/server/lib/recorder/ffmeg_test.go
+++ b/server/lib/recorder/ffmeg_test.go
@@ -2,6 +2,7 @@
 
 import (
 	"path/filepath"
+	"runtime"
 	"testing"
 	"time"
 
@@ -113,6 +114,20 @@
 	assert.NotContains(t, args, "aresample=async=1:first_pts=0")
 }
 
+func TestFFmpegArgs_VideoOnlyLinuxUsesWallclockTimestamps(t *testing.T) {
+	if runtime.GOOS != "linux" {
+		t.Skip("linux-only ffmpeg flag behavior")
+	}
+
+	tempDir := t.TempDir()
+	args, err := ffmpegArgs(defaultParams(tempDir), filepath.Join(tempDir, "out.mp4"))
+	require.NoError(t, err)
+
+	assert.Contains(t, args, "-use_wallclock_as_timestamps")
+	assert.NotContains(t, args, "veryfast")
+	assert.NotContains(t, args, "zerolatency")
+}
+
 func TestFFmpegRecorder_ForceStop(t *testing.T) {
 	tempDir := t.TempDir()
 	rec := &FFmpegRecorder{

diff --git a/server/lib/recorder/ffmpeg.go b/server/lib/recorder/ffmpeg.go
--- a/server/lib/recorder/ffmpeg.go
+++ b/server/lib/recorder/ffmpeg.go
@@ -563,11 +563,15 @@
 
 		// Video encoding
 		"-c:v", "libx264",
-		"-preset", "veryfast",
-		"-tune", "zerolatency",
 		"-profile:v", "high", // Explicit web-compatible profile
 		"-pix_fmt", "yuv420p", // Web-standard pixel format
 	}...)
+	if runtime.GOOS == "linux" && recordAudio {
+		args = append(args,
+			"-preset", "veryfast",
+			"-tune", "zerolatency",
+		)
+	}
 
 	if recordAudio {
 		args = append(args, []string{
@@ -577,6 +581,9 @@
 			"-ac", "2",
 		}...)
 	}
+	if runtime.GOOS == "linux" && !recordAudio {
+		args = append(args, "-use_wallclock_as_timestamps", "1")
+	}
 
 	args = append(args, []string{
 		// Timestamp handling for reliable playback

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit 114f47b. Configure here.

startAll("mutter")
}
waitForSocket(pulseSocket, 10*time.Second)
startAll("chromium")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulse ready before sink exists

Medium Severity

Boot treats the PulseAudio Unix socket as ready, then starts Chromium, but start-pulseaudio.sh only creates the KernelOutput sink and keepalive stream after the socket exists. Chromium and replay capture can come up in a short window where Pulse is listening but browser playback and monitor capture are not fully wired.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 114f47b. Configure here.

// Timestamp handling for reliable playback
"-use_wallclock_as_timestamps", "1", // Use system time instead of input stream time
"-reset_timestamps", "1", // Reset timestamps to start from zero
"-reset_timestamps", "1",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encoder tuning always on

Low Severity

-preset veryfast, -tune zerolatency, and removal of -use_wallclock_as_timestamps apply to every Linux recording, not only when RecordAudio is enabled. Video-only replays inherit lower-quality, real-time-oriented encoding and different timestamp behavior than before this change.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 114f47b. Configure here.

Replace the host.docker.internal httptest server with an in-container
file:// fixture. host-gateway is not routable in every sandbox, so the
browser could not reach the host fixture and page.goto timed out. A
file:// page is a secure context and keeps the test self-contained.

Co-authored-by: Cursor <cursoragent@cursor.com>
@rgarcia rgarcia force-pushed the hypeship/replay-audio branch from ba68d65 to 3af1e94 Compare June 1, 2026 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant