
Issue #19908 has been updated by duerst (Martin Dürst).

There is a more serious issue than just whether to use an '_' or an '=' in the property name: Unicode 15.1 makes some significant changes to grapheme clusters. Our implementation (function 'node_extended_grapheme_cluster' in regparse.c) is based on Unicode 11.0, in particular https://www.unicode.org/reports/tr29/tr29-33.html#Grapheme_Cluster_Boundarie.... This is quite a bit different from the current version at https://www.unicode.org/reports/tr29/tr29-43.html#Grapheme_Cluster_Boundarie....

One major difference is that Unicode 11.0 gave a regular expression for grapheme clusters, which I simply implemented in the above function. Unicode 15.1 only says that it is possible to use a regular expression, but doesn't give that regular expression.

From reading through https://www.unicode.org/versions/Unicode15.1.0/#Migration, that's the main issue affecting Ruby.

----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-105854

* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
----------------------------------------
Unicode 15.1 has been released.
The current enc-unicode.rb seems to fail because of the `Indic_Conjunct_Break` properties, which have values.

I'm not sure how best to handle these properties.
Should it be `/\p{InCB_Linker}/`, or `/\p{InCB=Linker}/` as in the comments in that file?

https://github.com/nobu/ruby/tree/unicode-15.1 implements the former.

--
https://bugs.ruby-lang.org/
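
For reference, a minimal sketch of how the two questions discussed in this thread could be probed from Ruby. It is an illustration only, not part of the proposed patch: the helper names `clusters` and `property_supported?` are made up here, and the Devanagari result depends on which Unicode version the running Ruby implements. The sketch assumes that `\X` is what `node_extended_grapheme_cluster` ultimately serves, and that the conjunct rule in the newer UAX #29 (GB9c) is the change that would affect the Devanagari sample.

```ruby
# Split a string into extended grapheme clusters using the regexp engine's \X.
# (Hypothetical helper name, for illustration only.)
def clusters(str)
  str.scan(/\X/)
end

# Check whether the regexp engine accepts a given \p{...} property name.
# (Hypothetical helper name; relies on Regexp.new raising RegexpError
# for unknown property names.)
def property_supported?(name)
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  false
end

# Stable across Unicode versions: base letter + combining mark, and CR LF.
p clusters("e\u0301").length   # one cluster
p clusters("\r\n").length      # one cluster

# Version-dependent: KA + VIRAMA + SSA. Under the Unicode 11.0 rules the
# virama attaches to KA and SSA starts a new cluster; under the 15.1 rules
# the whole conjunct is expected to stay together. No result is asserted here.
p clusters("\u0915\u094D\u0937").length

# The two spellings discussed in the issue; either may be unsupported.
p property_supported?("InCB_Linker")
p property_supported?("InCB=Linker")
```

Running this against builds before and after an update would show whether the conjunct handling and the property spelling have changed.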