std::wstring + deprecated <codecvt> conversion is non-portable (BMP-only on Windows) #25

@jkalias

Problem

The public string type for text fields is std::wstring, and conversion to/from UTF-8 (StringUtilities::ToUtf8 / FromUtf8, src/string_utilities.cc) uses std::wstring_convert<std::codecvt_utf8<wchar_t>>.

Impact

  • std::wstring_convert and <codecvt> have been deprecated since C++17 and are removed in C++26, so this code already triggers deprecation warnings and will stop compiling on newer standard modes.
  • wchar_t is 16-bit on Windows but 32-bit on Linux/macOS. When wchar_t is 16-bit, std::codecvt_utf8<wchar_t> converts to UCS-2 rather than UTF-16, so it only handles the BMP. Characters outside the BMP (e.g. many emoji, some CJK extensions) therefore do not round-trip correctly on Windows, while they do on Linux/macOS. The same data behaves differently per platform.
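
To make the BMP limitation concrete, here is a minimal sketch of how a code point maps to UTF-16 code units. EncodeUtf16 is a hypothetical helper written for this issue, not something in the repository. Anything above U+FFFF needs a surrogate pair, which a UCS-2 conversion has no way to produce.

```cpp
#include <string>

// Hypothetical helper (not part of the project): encode one Unicode
// scalar value as UTF-16 code units. Assumes cp is a valid scalar value.
std::u16string EncodeUtf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // BMP code point: fits in a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary code point: split into a high/low surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

For example, U+00E9 (é) yields one unit, while U+1F600 (an emoji) yields the two units 0xD83D 0xDE00. With a 16-bit wchar_t, that second case is exactly what the current conversion cannot represent.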

Suggested direction

Move the internal text representation to UTF-8 std::string (or char8_t/std::u8string), or use a portable UTF-8 ⇄ UTF-16/32 conversion that doesn't depend on the platform width of wchar_t and doesn't rely on the deprecated <codecvt> facet.
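
As a rough illustration of the second option, the sketch below converts between UTF-8 std::string and std::u32string by hand, using only fixed-width character types and no <codecvt>. The names EncodeUtf8 and DecodeUtf8 are placeholders, not the existing StringUtilities API, and throwing on malformed input is an assumption; the project may prefer a different error policy or an established library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-32 -> UTF-8. Throws on non-scalar values.
std::string EncodeUtf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: UTF-8 -> UTF-32. Throws on malformed input
// (bad lead/continuation bytes, truncation, overlong forms, surrogates).
std::u32string DecodeUtf8(const std::string& in) {
    // Smallest code point allowed for each sequence length (index = length).
    static const char32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range sequence");
        out += cp;
        i += len;
    }
    return out;
}
```

Because char32_t is 32 bits wide on every platform, a string containing U+1F600 should round-trip identically on Windows, Linux, and macOS with this approach. If std::wstring has to stay in the public API for now, a Windows-only UTF-32 to UTF-16 step could sit behind the same interface.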
