The \u escape range in Section 6.4 #131

@xfq

6.4 Escape Sequences
https://www.w3.org/TR/2026/WD-rdf12-turtle-20260320/#sec-escapes

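Section 6.4 says \u represents a Unicode code point in the ranges U+0000 to U+D7FF and U+E000 to U+D7FF. The second range is inverted: its upper bound is lower than its lower bound. For a four-hex-digit escape, the non-surrogate part of the BMP above the surrogate block should be U+E000 to U+FFFF.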
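
To make the intended ranges concrete, here is a minimal Python sketch of the check a four-hex-digit \u escape would need to pass if surrogates are excluded. The helper name `is_allowed_u_escape` is hypothetical and not taken from the spec or from any Turtle parser; the ranges are U+0000 to U+D7FF plus the corrected U+E000 to U+FFFF.

```python
import re

# Hypothetical helper, not part of any Turtle parser: checks a four-hex-digit
# \uXXXX escape against the ranges U+0000..U+D7FF and U+E000..U+FFFF.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def is_allowed_u_escape(escape: str) -> bool:
    match = _U_ESCAPE.fullmatch(escape)
    if match is None:
        return False  # not a well-formed \uXXXX escape
    code_point = int(match.group(1), 16)
    # Everything in the BMP except the surrogate block U+D800..U+DFFF.
    return code_point <= 0xD7FF or 0xE000 <= code_point <= 0xFFFF
```

With the range as currently written in the draft (U+E000 to U+D7FF), no code point could satisfy the second condition, which is why the upper bound matters.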

There is also a bigger problem: Unicode surrogates are not allowed. Allowing surrogates can cause problems with incomplete characters, but we (i18n WG) believe this issue should be resolved by a higher-level protocol rather than at the rdf-turtle level.
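
As an illustration of the "incomplete characters" concern, the following Python sketch shows that a lone surrogate has no strict UTF-8 encoding, while a high/low surrogate pair can be combined into one supplementary character. It only demonstrates Python's own string behavior. Reading the concern as being about unpaired surrogates is an assumption, and both helper names are hypothetical.

```python
def can_encode_utf8(text: str) -> bool:
    """Return True if text can be encoded as strict UTF-8.

    In Python, any surrogate code point left in a str makes this fail.
    """
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def combine_surrogates(text: str) -> str:
    """Round-trip through UTF-16 so that well-formed surrogate pairs
    become single supplementary code points."""
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


lone_high = "\ud83d"       # high surrogate with no partner: an incomplete character
pair = "\ud83d\ude00"      # high + low surrogate, together denoting U+1F600
```

Here `can_encode_utf8(lone_high)` is false, whereas `combine_surrogates(pair)` yields the single character U+1F600, which encodes without error.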

rdf-turtle shouldn't prohibit surrogates, and it seems there's no such restriction in RDF 1.1 Turtle:

https://www.w3.org/TR/2014/REC-turtle-20140225/#sec-escapes

So it is also a breaking change.

Labels: i18n-needs-resolution (Issue the Internationalization Group has raised and looks for a response on.)
