Problem
The public string type for text fields is std::wstring, and conversion to/from UTF-8 (StringUtilities::ToUtf8 / FromUtf8, src/string_utilities.cc) uses std::wstring_convert<std::codecvt_utf8<wchar_t>>.
Impact
std::wstring_convert and <codecvt> were deprecated in C++17 and are removed from the standard in C++26, so this code already produces deprecation warnings and will eventually stop compiling cleanly.
wchar_t is 16-bit on Windows but 32-bit on Linux/macOS. std::codecvt_utf8<wchar_t> only handles the BMP when wchar_t is 16-bit, so characters outside the BMP (e.g. many emoji, some CJK extensions) do not round-trip correctly on Windows, while they do on Linux/macOS. The same data therefore behaves differently per platform.
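To make the 16-bit limitation concrete, here is a minimal sketch of how a code point maps to UTF-16 code units. The helper name EncodeUtf16 is hypothetical and not part of this project; it assumes its input is a valid Unicode scalar value. Anything above U+FFFF needs two code units (a surrogate pair), which a facet that treats each 16-bit wchar_t as one whole character cannot produce or consume.

```cpp
#include <string>

// Hypothetical helper for illustration only; not part of StringUtilities.
// Assumes cp is a valid Unicode scalar value (not a surrogate, <= U+10FFFF).
inline std::u16string EncodeUtf16(char32_t cp) {
  std::u16string out;
  if (cp < 0x10000) {
    // BMP code point: fits in a single 16-bit code unit.
    out.push_back(static_cast<char16_t>(cp));
  } else {
    // Supplementary code point: split the 20-bit offset into a
    // high surrogate (top 10 bits) and a low surrogate (bottom 10 bits).
    const char32_t v = cp - 0x10000;
    out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
    out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
  }
  return out;
}
```

For example, U+1F600 (an emoji) becomes the two units 0xD83D 0xDE00, while U+20AC (the euro sign) stays a single unit.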
Suggested direction
Move the internal text representation to UTF-8 std::string (or, with C++20, char8_t/std::u8string), or use a portable UTF-8 ⇄ UTF-16/32 conversion that doesn't depend on the platform width of wchar_t and doesn't rely on the deprecated <codecvt> facet.
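As a rough illustration of the second option, here is a sketch of a hand-written UTF-8 ⇄ UTF-32 conversion that uses char32_t and therefore behaves the same on every platform. The function names (AppendUtf8, Utf32ToUtf8, Utf8ToUtf32) are hypothetical and are not the existing StringUtilities API; the sketch assumes C++17 and reports invalid input by returning std::nullopt, which is only one possible error policy. A maintained library such as ICU may be a better fit in practice.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration; not the project's existing API.

// Appends the UTF-8 encoding of one code point. Returns false if cp is not
// a Unicode scalar value (a surrogate, or above U+10FFFF).
inline bool AppendUtf8(char32_t cp, std::string& out) {
  if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return false;
  if (cp < 0x80) {
    out.push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
  return true;
}

// Encodes UTF-32 as UTF-8; std::nullopt if any code point is invalid.
inline std::optional<std::string> Utf32ToUtf8(std::u32string_view in) {
  std::string out;
  out.reserve(in.size());
  for (char32_t cp : in) {
    if (!AppendUtf8(cp, out)) return std::nullopt;
  }
  return out;
}

// Decodes UTF-8 to UTF-32; std::nullopt on truncated, overlong,
// surrogate, or out-of-range sequences.
inline std::optional<std::u32string> Utf8ToUtf32(std::string_view in) {
  std::u32string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char b0 = static_cast<unsigned char>(in[i]);
    char32_t cp = 0;
    std::size_t len = 0;
    char32_t min = 0;  // smallest code point allowed for this length
    if (b0 < 0x80) { cp = b0; len = 1; min = 0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
    else return std::nullopt;  // stray continuation byte or invalid lead
    if (in.size() - i < len) return std::nullopt;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = static_cast<unsigned char>(in[i + k]);
      if ((b & 0xC0) != 0x80) return std::nullopt;
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return std::nullopt;
    }
    out.push_back(cp);
    i += len;
  }
  return out;
}
```

With helpers along these lines, Utf8ToUtf32(*Utf32ToUtf8(text)) should return the original text for any valid input, including characters outside the BMP, regardless of how wide wchar_t is on the build platform.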