
Issue #21559 has been updated by ima1zumi (Mari Imaizumi).

Assignee set to ima1zumi (Mari Imaizumi)

This looks like a bug. Per Unicode TR15, the identity toNFD(x) == toNFD(toNFC(x)) must be maintained.
https://unicode.org/reports/tr15/#Design_Goals

It seems the NFC process is combining characters across U+11930, even though its CCC is 0.

CC: @duerst

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114480

* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: ima1zumi (Mari Imaizumi)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
I expect `nfd(nfc(str)) == nfd(str)` but found a string that doesn't.

~~~ruby
# Ruby 3.1 - 3.5
str = "s\u{11930}\u{323}\u{11930}\u{307}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd) #=> false
~~~

~~~ruby
# ruby 3.5.0dev
str = "s\u{1611e}\u{323}\u{1611e}\u{307}\u{1611f}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd) #=> false
~~~

-- 
https://bugs.ruby-lang.org/
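
A minimal sketch that expands the check from the report by printing the code points at each stage, which may make it easier to see where the combining marks U+0323 and U+0307 end up relative to U+11930. The helper names `nfd_roundtrip_ok?` and `codepoints_hex` are hypothetical and introduced only for illustration; `String#unicode_normalize` is the only Ruby API assumed. The result depends on the Ruby version and its Unicode tables, so no expected output is shown.

```ruby
# Hypothetical helper (not part of Ruby or the issue): checks the TR15
# identity toNFD(x) == toNFD(toNFC(x)) for a single string.
def nfd_roundtrip_ok?(str)
  str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
end

# Hypothetical helper: renders each code point as U+XXXX so that
# two normalization results can be compared by eye.
def codepoints_hex(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
end

# First string from the report.
str = "s\u{11930}\u{323}\u{11930}\u{307}"

nfd      = str.unicode_normalize(:nfd)
nfc      = str.unicode_normalize(:nfc)
nfd_back = nfc.unicode_normalize(:nfd)

puts "NFD:           #{codepoints_hex(nfd)}"
puts "NFC:           #{codepoints_hex(nfc)}"
puts "NFD after NFC: #{codepoints_hex(nfd_back)}"
puts "round trip ok: #{nfd_roundtrip_ok?(str)}"
```

If the "NFC" line shows a precomposed letter followed by both U+11930 characters, that would be consistent with composition reaching across a character whose CCC is 0, as suggested above.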