Offline push-to-talk dictation (warm whisper.cpp server) by jappeace-sloth · Pull Request #27 · jappeace/linux-config

jappeace-sloth · 2026-06-16T20:49:56Z

Adds a dictate push-to-talk command bound to $mod+Ctrl+space in sway, plus the warm backend that makes it fast.

What it does

First press records 16kHz mono audio with pw-record; second press transcribes it offline and types the result into the focused window with wtype (same injection trick as the existing address bindings).
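The two-press behaviour can be sketched as a small toggle. This is a rough sketch under stated assumptions, not the PR's actual script: the PID file, the WAV path and both function names are hypothetical, and it assumes pw-record accepts --rate and --channels and exits cleanly on SIGTERM.

```shell
#!/bin/sh
# Hypothetical state locations; the PR does not say where these live.
PIDFILE="${DICTATE_PIDFILE:-${XDG_RUNTIME_DIR:-/tmp}/dictate.pid}"
WAV="${DICTATE_WAV:-${XDG_RUNTIME_DIR:-/tmp}/dictate.wav}"

# Prints "stop" if a recorder from an earlier press is still alive,
# otherwise "start" (also covers a stale or missing PID file).
dictate_next_action() {
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo stop
    else
        echo start
    fi
}

dictate_toggle() {
    case "$(dictate_next_action)" in
        start)
            # First press: record 16 kHz mono audio in the background.
            pw-record --rate 16000 --channels 1 "$WAV" &
            echo "$!" > "$PIDFILE"
            ;;
        stop)
            # Second press: stop the recorder (plain kill sends SIGTERM).
            kill "$(cat "$PIDFILE")"
            rm -f "$PIDFILE"
            # Transcribing "$WAV" and typing the text would follow here.
            ;;
    esac
}
```

Binding `dictate_toggle` to a single hotkey gives the press-to-start, press-to-stop behaviour described above.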

Architecture

Warm whisper-server user service holds the model resident in RAM, tied to sway-session.target like dunst so it is already listening when the first hotkey fires. Runs with -t "$(nproc)" so every core is used.
dictate POSTs the recorded WAV to that server over loopback (curl -sf, response_format=text); no per-press model load.
Model: ggml-base.q5_1 (~57MB, 5-bit quantized), multilingual so Dutch works too. The model is a single let binding that the server points at, so swapping to small/medium for better Dutch is a one-line change.
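The client side of this architecture is a single multipart POST. A minimal sketch, assuming the server listens on whisper.cpp's example-server default of 127.0.0.1:8080 and serves the /inference path; the PR does not state the address, and `transcribe_wav` is a hypothetical name.

```shell
#!/bin/sh
# Assumed address: the whisper.cpp example server's default host, port and path.
WHISPER_URL="${WHISPER_URL:-http://127.0.0.1:8080/inference}"

# POST a WAV file to the warm server and print the plain-text transcript.
# -s hides the progress meter; -f makes curl exit non-zero on HTTP 4xx/5xx.
transcribe_wav() {
    curl -sf "$WHISPER_URL" \
        -F "file=@$1" \
        -F "response_format=text"
}
```

Something like `transcribe_wav /tmp/dictate.wav | wtype -` would then type the result into the focused window, assuming wtype reads stdin when given `-`.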

Why not the simpler cold-CLI version

The first cut shelled out to whisper-cli per press, which reloaded the whole model from disk every time (hundreds of ms of dead latency) and defaulted to 4 threads. The warm server + quantized base model removes both bottlenecks. Decisions are recorded inline with -- Decision: comments.
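Both bottlenecks are addressed by how the server is launched: once, with the model loaded up front and one thread per core. A rough sketch of such a start wrapper follows; the model path and the `WHISPER_SERVER_BIN` override are hypothetical, and the real wrapper in the PR is generated by Nix rather than written by hand.

```shell
#!/bin/sh
# Hypothetical model location; in the PR this comes from a Nix let binding.
MODEL="${WHISPER_MODEL:-$HOME/models/ggml-base.q5_1.bin}"
# Overridable binary name, only so the sketch can be exercised without the server.
WHISPER_SERVER_BIN="${WHISPER_SERVER_BIN:-whisper-server}"

# Start the long-lived server: -m selects the model, -t sets the thread count.
start_whisper_server() {
    "$WHISPER_SERVER_BIN" -m "$MODEL" -t "$(nproc)"
}
```

Because the process stays up, the model load cost is paid once at session start instead of on every press.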

Verification

Built every artifact from the pin and tested end to end: the server-start wrapper launches and loads the model once, and dictate's exact curl returns clean transcription text with exit 0.

🤖 Generated with Claude Code

The dictate command was slow because whisper-cli reloaded the entire model from disk on every single press (hundreds of ms of dead latency) and defaulted to only 4 threads.

Fixes:
- Add a whisper-server systemd user service that keeps the model resident in RAM, tied to sway-session.target like dunst so it is already listening when the first $mod+Ctrl+space fires. It runs with -t "$(nproc)" via a writeShellScript wrapper so every core is used.
- dictate now POSTs the recorded WAV to that warm server over loopback (curl -sf, response_format=text) instead of spawning a cold whisper-cli, so each dictation skips the model load entirely.
- Switch from ggml-small to ggml-base.q5_1 (~57MB, 5-bit quantized), several times faster on CPU and accurate enough for fields and prose. The model is a single let binding the server points at, so swapping back to small/medium for better Dutch is a one-line change.

Verified end to end against the built artifacts: the server-start wrapper launches and loads the model once, and dictate's exact curl returns clean transcription text with exit 0.

Prompt: "okay we added that speech to text system just now to jappeace/linux-config, but it's slow as fuck, why?" followed by choosing the threads + base model + warm server combo.
Tokens: ~118k
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Restore the actual curl error (connection refused, HTTP 500, timeout) in the failure notification instead of a generic "unreachable" guess, so a server-side error is not hidden behind an assumption that the server is merely down or still loading. The error was already captured to the log but no longer shown; surface its last line plus the hint.

Also clarify the -sf comment: -s is silent, -f exits non-zero on HTTP 4xx/5xx; "fail loudly" was misleading next to silent mode.

Prompt: dumbify canary flagged the lost error detail and the confusing "-sf fail loudly" wording as nice-to-haves; both are worth fixing and the first aligns with the no-silent-failure rule.
Tokens: ~131k
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
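The error-surfacing behaviour described in this commit could look roughly like the sketch below. It is not the PR's code: the log path, the hint text, the `NOTIFY` override and the function name are hypothetical, and -S is added here as an assumption so that curl still prints its error message under -s.

```shell
#!/bin/sh
# Assumed address and hypothetical paths; none of these are stated in the PR.
WHISPER_URL="${WHISPER_URL:-http://127.0.0.1:8080/inference}"
LOG="${DICTATE_LOG:-${XDG_RUNTIME_DIR:-/tmp}/dictate.log}"
NOTIFY="${NOTIFY:-notify-send}"

# Print the transcript, or show curl's own last error line in a notification.
transcribe_or_notify() {
    # -s: no progress meter; -S: still print errors; -f: non-zero exit on HTTP 4xx/5xx.
    if ! text="$(curl -sSf "$WHISPER_URL" \
            -F "file=@$1" -F "response_format=text" 2>>"$LOG")"; then
        "$NOTIFY" "dictate failed" \
            "$(tail -n 1 "$LOG") (hint: is whisper-server running?)"
        return 1
    fi
    printf '%s\n' "$text"
}
```

Showing the last log line keeps a connection refusal, an HTTP 500 and a timeout distinguishable in the notification itself.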

jappeace-sloth and others added 2 commits June 16, 2026 22:49

jappeace-sloth wants to merge 2 commits into jappeace:master from jappeace-sloth:speech-to-text-dictation
