fix: use StringDecoder to handle UTF-8 chunk boundaries in setEncoding by 398651434 · Pull Request #5035 · nodejs/undici

398651434 · 2026-04-16T04:13:20Z

Description

Fixes a bug where response.body.setEncoding('utf8') corrupts multi-byte UTF-8 characters that span chunk boundaries.

Root Cause

Each chunk was converted to a string on its own via buffer.utf8Slice() (or toString()). When a multi-byte UTF-8 character (for example, a 3-byte Chinese character) is split across two HTTP response chunks, the incomplete byte sequence at the end of the first chunk is decoded as a replacement character (U+FFFD), and the leftover bytes at the start of the second chunk are decoded as another corrupted character.
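A minimal reproduction of the root cause, independent of undici: the split point and variable names below are chosen for illustration only, and plain Buffer.toString('utf8') stands in for the per-chunk conversion described above.

```js
'use strict'

// '中' encodes to three bytes in UTF-8: e4 b8 ad.
const full = Buffer.from('中', 'utf8')

// Simulate a chunk boundary that falls inside the character.
const chunk1 = full.subarray(0, 2) // e4 b8
const chunk2 = full.subarray(2) // ad

// Decoding each chunk on its own loses the character.
const naive = chunk1.toString('utf8') + chunk2.toString('utf8')

console.log(naive === '中') // false
console.log(naive.includes('\uFFFD')) // true
```

Neither chunk holds a complete sequence, so each is decoded to U+FFFD replacement output and the original character cannot be recovered by concatenating the strings.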

Fix

Use Node.js's built-in StringDecoder (from node:string_decoder), which buffers incomplete byte sequences between write() calls:

setEncoding(encoding): Initialize a StringDecoder when an encoding is set
consumePush: When a decoder exists, call decoder.write(chunk) instead of storing the raw buffer, so incomplete UTF-8 bytes are held inside the decoder until the rest of the character arrives
consumeFinish: Reset the decoder to allow garbage collection
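The three steps above can be sketched roughly as follows. BodySketch is a hypothetical stand-in, not undici's actual body implementation; only the method names are borrowed from the description. The call to decoder.end() before resetting is an assumption of this sketch (it flushes any trailing incomplete bytes) and is not something the PR description states.

```js
'use strict'

const { StringDecoder } = require('node:string_decoder')

// Hypothetical stand-in for the body consumer; not undici's internals.
class BodySketch {
  constructor () {
    this.decoder = null
    this.parts = []
  }

  // Create the decoder as soon as an encoding is requested.
  setEncoding (encoding) {
    this.decoder = new StringDecoder(encoding)
  }

  // With a decoder, store decoded text; otherwise store the raw buffer.
  consumePush (chunk) {
    this.parts.push(this.decoder ? this.decoder.write(chunk) : chunk)
  }

  // Assemble the result, then drop the decoder so it can be collected.
  consumeFinish () {
    let result
    if (this.decoder) {
      // Assumption of this sketch: flush leftover bytes with end().
      result = this.parts.join('') + this.decoder.end()
      this.decoder = null
    } else {
      result = Buffer.concat(this.parts)
    }
    this.parts = []
    return result
  }
}

const body = new BodySketch()
body.setEncoding('utf8')

const bytes = Buffer.from('你好', 'utf8') // 6 bytes, 3 per character
body.consumePush(bytes.subarray(0, 4)) // ends one byte into '好'
body.consumePush(bytes.subarray(4))

const text = body.consumeFinish()
console.log(text) // 你好
```

The first write() returns only the complete character and keeps the dangling byte; the second write() completes it, so no chunk is ever decoded in isolation.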

Testing

The bug manifests when:

HTTP response contains multi-byte UTF-8 text (e.g., Chinese characters, emoji)
setEncoding('utf8') is called on the body
The text spans multiple TCP packets/chunks

After the fix, characters are correctly reassembled across chunk boundaries.
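One way to exercise the boundary conditions listed above, without a network, is to split a sample payload at every possible byte offset and decode the two halves through a single StringDecoder. This is a sketch at the decoder level only; decodeChunks and the sample string are illustrative, and the PR's own test setup is not shown in this description.

```js
'use strict'

const { StringDecoder } = require('node:string_decoder')

// Decode a list of chunks through one shared StringDecoder.
function decodeChunks (chunks) {
  const decoder = new StringDecoder('utf8')
  let out = ''
  for (const chunk of chunks) out += decoder.write(chunk)
  return out + decoder.end()
}

// Mix of 1-byte, 3-byte, and 4-byte UTF-8 characters.
const expected = '你好, 🌍!'
const bytes = Buffer.from(expected, 'utf8')

// Try every split point, including ones inside a character.
const failures = []
for (let i = 0; i <= bytes.length; i++) {
  const actual = decodeChunks([bytes.subarray(0, i), bytes.subarray(i)])
  if (actual !== expected) failures.push(i)
}

console.log(failures.length) // 0
```

An end-to-end test would additionally need a server that flushes the response in separate writes so the client really receives multiple chunks.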

Closes #5002

When setEncoding('utf8') is called, each chunk was being converted to a string individually, which corrupts multi-byte UTF-8 characters that span chunk boundaries.

This fix:
- Initializes a StringDecoder when setEncoding is called
- Uses StringDecoder.write() in consumePush to properly handle incomplete UTF-8 sequences at chunk boundaries
- Resets the decoder in consumeFinish to allow garbage collection

Closes nodejs#5002

398651434 wants to merge 1 commit into nodejs:main from 398651434:main
