Issue #19908 has been updated by duerst (Martin Dürst).
There is a more serious issue than just whether to use an '_' or an '=' in the
property name: Unicode 15.1 makes some substantial changes to grapheme clusters.
Our implementation (function 'node_extended_grapheme_cluster' in regparse.c) is
based on Unicode 11.0, in particular
https://www.unicode.org/reports/tr29/tr29-33.html#Grapheme_Cluster_Boundari…. This is
quite a bit different from the current version at
https://www.unicode.org/reports/tr29/tr29-43.html#Grapheme_Cluster_Boundari…. One major
difference is that Unicode 11.0 gave a regular expression for grapheme clusters,
which I simply implemented as-is in the above function. Unicode 15.1 only says that
it is possible to use a regular expression, but does not give that regular expression.
From reading through
https://www.unicode.org/versions/Unicode15.1.0/#Migration, the grapheme cluster
change appears to be the main issue affecting Ruby.
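
For readers less familiar with where grapheme clusters surface in Ruby, here is a minimal sketch using `\X` and `String#grapheme_clusters`. The sample strings are illustrative only; they were picked because their segmentation should not depend on the Unicode version, so they do not demonstrate the 15.1 changes themselves.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points,
# but a single extended grapheme cluster.
combining = "e\u0301"

# CR followed by LF is never split inside a grapheme cluster.
crlf = "\r\n"

# String#grapheme_clusters returns an array of clusters.
clusters_combining = combining.grapheme_clusters

# \X in a regular expression matches one extended grapheme cluster.
clusters_crlf = crlf.scan(/\X/)

puts combining.length          # number of code points
puts clusters_combining.length # number of grapheme clusters
puts clusters_crlf.length
```

Any change to the cluster rules in regparse.c would show up through both of these entry points.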
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-105854
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
----------------------------------------
Unicode 15.1 has been released.
The current enc-unicode.rb seems to fail because of the `Indic_Conjunct_Break` properties,
which have values.
I'm not sure how these properties should best be handled:
`/\p{InCB_Linker}/`, or `/\p{InCB=Linker}/` as in the comments in that file?
https://github.com/nobu/ruby/tree/unicode-15.1 uses the former.
--
https://bugs.ruby-lang.org/