[ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible

Issue #21559 has been reported by tompng (tomoya ishida).

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559

* Author: tompng (tomoya ishida)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
I expect `nfd(nfc(str)) == nfd(str)` but found a string that doesn't.

~~~ruby
# Ruby 3.1 - 3.5
str = "s\u{11930}\u{323}\u{11930}\u{307}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd) #=> false
~~~

~~~ruby
# ruby 3.5.0dev
str = "s\u{1611e}\u{323}\u{1611e}\u{307}\u{1611f}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd) #=> false
~~~

--
https://bugs.ruby-lang.org/
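
The check in the report can be wrapped in a small helper that also prints the code points on both sides, which makes it easier to see where the two results diverge. This is only a sketch: `nfd_roundtrip_stable?` and `codepoint_dump` are hypothetical names, not part of Ruby or of the report, and the printed values depend on the Ruby version in use.

```ruby
# Hypothetical helper (not from the report): true when NFD(x) equals NFD(NFC(x)).
def nfd_roundtrip_stable?(str)
  str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
end

# Hypothetical helper: render a string as space-separated U+XXXX code points.
def codepoint_dump(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
end

str = "s\u{11930}\u{323}\u{11930}\u{307}"
puts "stable:      #{nfd_roundtrip_stable?(str)}"
puts "NFD(x):      #{codepoint_dump(str.unicode_normalize(:nfd))}"
puts "NFD(NFC(x)): #{codepoint_dump(str.unicode_normalize(:nfc).unicode_normalize(:nfd))}"
```

Comparing the two dumps side by side shows which characters were reordered or merged by the intermediate NFC step.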

Issue #21559 has been updated by nobu (Nobuyoshi Nakada).

```ruby
"s\u{11930 323 11930 307}".unicode_normalize(:nfc).dump #=> "\u1E69\u{11930}\u{11930}"
"s\u{323 307}".unicode_normalize(:nfc).dump #=> "\u1E69"
```

Are U+0323 and U+0307 being composed with `s`, jumping over U+11930?

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114479

* Author: tompng (tomoya ishida)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN

Issue #21559 has been updated by ima1zumi (Mari Imaizumi).

Assignee set to ima1zumi (Mari Imaizumi)

This looks like a bug. Per Unicode TR15, the identity toNFD(x) == toNFD(toNFC(x)) must be maintained.
https://unicode.org/reports/tr15/#Design_Goals

It seems the NFC process is combining characters across U+11930, even though its CCC is 0.

CC: @duerst

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114480

* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: ima1zumi (Mari Imaizumi)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
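
To illustrate why a CCC of 0 matters here, the following is a minimal sketch of the blocking rule from the canonical composition algorithm as I understand it: a combining character is blocked from an earlier starter if any character between them has a combining class of 0, or one greater than or equal to its own. The `CCC` table and the `blocked?` helper are assumptions made for this example; they are not Ruby APIs and not the code in Ruby's normalizer. The table covers only the code points from the report, with the value for U+11930 taken from the statement above.

```ruby
# Illustrative canonical combining classes for the code points in the report.
# 220 and 230 are the UnicodeData values for the two dots; the 0 for U+11930
# is the value stated in the thread.
CCC = {
  0x0073  => 0,    # LATIN SMALL LETTER S (starter)
  0x0323  => 220,  # COMBINING DOT BELOW
  0x0307  => 230,  # COMBINING DOT ABOVE
  0x11930 => 0,    # per the thread
}.freeze

# Hypothetical helper: is the character at c_idx blocked from the starter
# at s_idx? It is blocked when some character strictly between them has
# ccc == 0 or ccc >= the ccc of the character at c_idx.
def blocked?(codepoints, s_idx, c_idx)
  ccc_c = CCC.fetch(codepoints[c_idx])
  codepoints[(s_idx + 1)...c_idx].any? do |b|
    ccc_b = CCC.fetch(b)
    ccc_b.zero? || ccc_b >= ccc_c
  end
end

reported = "s\u{11930}\u{323}\u{11930}\u{307}".codepoints
plain    = "s\u{323}\u{307}".codepoints

p blocked?(reported, 0, 2)  # U+0323 sits behind U+11930
p blocked?(plain, 0, 2)     # U+0307 sits behind U+0323
```

Under this reading, U+0323 in the reported string may not combine with `s`, while in the plain string nothing stands between `s` and the two dots, which is consistent with the expectation that the composition should not cross U+11930.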

Issue #21559 has been updated by duerst (Martin Dürst).

Assignee changed from ima1zumi (Mari Imaizumi) to duerst (Martin Dürst)

@ima1zumi Not sure this is even allowed, but I'm sure I'm responsible for this behavior and want to fix it myself, so I am changing the Assignee to myself.

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114486

* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN

Issue #21559 has been updated by ima1zumi (Mari Imaizumi).

@duerst Thank you, I appreciate you taking care of it.

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114496

* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN

participants (4)
- duerst
- ima1zumi (Mari Imaizumi)
- nobu (Nobuyoshi Nakada)
- tompng (tomoya ishida)