Fix multi-byte Unicode input (CJK/emoji) by keyuchen21 · Pull Request #6 · Sentdex/minion

keyuchen21 · 2026-06-30T18:02:56Z

Summary

Fix garbled input when typing Chinese/Japanese/Korean characters or emoji in the chatbox — _raw_read_key() was reading one byte at a time and immediately decoding, corrupting multi-byte UTF-8 sequences into replacement characters (��)
Fix display width calculations in the input editor — CJK characters occupy 2 terminal columns but were counted as 1, causing cursor misalignment and broken box borders
_raw_read_available() (used for escape sequences and paste) now accumulates raw bytes before decoding, fixing pasted CJK text
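The lead-byte approach described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `utf8_sequence_length` and `read_char` are hypothetical names, and an `io.BytesIO` stands in for raw stdin.

```python
import io


def utf8_sequence_length(lead: int) -> int:
    """Total byte length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx: one continuation byte follows
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx: two continuation bytes (most CJK)
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx: three continuation bytes (most emoji)
    return 1      # stray continuation or invalid lead: consume one byte


def read_char(stream) -> str:
    """Read one complete character from a binary stream (hypothetical helper)."""
    first = stream.read(1)
    if not first:
        return ""  # end of input
    remaining = utf8_sequence_length(first[0]) - 1
    buf = first + stream.read(remaining) if remaining else first
    # Decode only once the whole sequence is in hand.
    return buf.decode("utf-8", errors="replace")


stream = io.BytesIO("a你😀".encode("utf-8"))
chars = [read_char(stream) for _ in range(3)]
```

A real terminal reader would also have to cope with a sequence that arrives truncated; this sketch simply decodes whatever bytes it received and lets `errors="replace"` mark the damage.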

Test plan

Type Chinese characters (你好世界) — should appear correctly, not as replacement chars
Type Japanese (こんにちは) and Korean (안녕하세요) — same
Paste multi-line Chinese text — should insert correctly
Verify cursor positioning is correct when moving left/right through CJK text
Verify the input box border stays aligned with wide characters
Verify normal ASCII input still works as before
Test emoji input (😀🎉) — should appear correctly
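Some of the manual checks above could be complemented by an automated round-trip test. The sketch below does not call the project's functions; `decode_chunks` and `split_bytes` are hypothetical helpers that feed the sample strings through Python's incremental UTF-8 decoder in small chunks, which mimics bytes arriving from the terminal in arbitrary pieces.

```python
import codecs


def decode_chunks(chunks):
    """Decode byte chunks that may split a character across chunk boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush any buffered bytes
    return "".join(parts)


def split_bytes(data: bytes, size: int):
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Sample strings taken from the test plan, plus a multi-line paste.
SAMPLES = ["你好世界", "こんにちは", "안녕하세요", "😀🎉", "hello", "第一行\n第二行"]

for sample in SAMPLES:
    raw = sample.encode("utf-8")
    for size in (1, 2, 3, 5):
        assert decode_chunks(split_bytes(raw, size)) == sample
```

The incremental decoder buffers an incomplete sequence until its remaining bytes arrive, so the result does not depend on where the chunk boundaries fall.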

The raw input reader was reading stdin one byte at a time and decoding each byte immediately, which corrupted multi-byte UTF-8 characters into replacement characters (U+FFFD). It now inspects the lead byte to determine how many continuation bytes to read before decoding. This change also fixes display width calculations: CJK characters occupy 2 terminal columns but were treated as 1, causing cursor misalignment and broken box-drawing in the input editor.
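The width calculation can be approximated with the standard library. This is a sketch under the assumption that East Asian Wide and Fullwidth characters take two columns and combining marks take none; `char_width`, `display_width`, and `cursor_column` are hypothetical names, and the PR may compute widths differently.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate number of terminal columns occupied by one character."""
    if unicodedata.combining(ch):
        return 0  # combining marks attach to the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # Wide / Fullwidth: CJK ideographs, kana, Hangul syllables
    return 1


def display_width(text: str) -> int:
    """Approximate number of terminal columns occupied by a string."""
    return sum(char_width(ch) for ch in text)


def cursor_column(text: str, index: int) -> int:
    """Column of the cursor when it sits before text[index]."""
    return display_width(text[:index])
```

With these helpers, box borders would be padded by `display_width(line)` rather than `len(line)`. The sketch does not handle emoji ZWJ sequences or variation selectors, which need grapheme-aware logic.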

keyuchen21 · 2026-06-30T18:04:47Z

keyuchen21 · 2026-06-30T18:05:12Z

Verification

Tested manually — Chinese input now works correctly:

Before fix: typing "你好" produced "��" (each UTF-8 byte was decoded separately, yielding replacement characters)

After fix: "你好" displays correctly in the input box with proper cursor positioning and box alignment
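The before/after behavior can be reproduced in isolation. The snippet assumes the old code path decoded with `errors="replace"`, which the PR text does not state; how many replacement glyphs a terminal actually shows may also differ from the count here.

```python
data = "你好".encode("utf-8")  # 6 bytes: 3 per character

# Before: decode every byte on its own. No single byte of a multi-byte
# sequence is valid UTF-8 by itself, so each becomes U+FFFD.
per_byte = "".join(bytes([b]).decode("utf-8", errors="replace") for b in data)

# After: decode the accumulated bytes together.
whole = data.decode("utf-8")

print(repr(per_byte))
print(repr(whole))
```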

Test results:

✅ Chinese input (你好世界) renders correctly
✅ Input box borders stay aligned with wide characters
✅ Model responds to Chinese input properly
✅ ASCII input continues to work as before

Sentdex self-assigned this Jul 1, 2026

keyuchen21 wants to merge 1 commit into Sentdex:master from keyuchen21:fix/unicode-input

2 participants
