[ruby-core:118487] [Ruby master Bug#20617] /\pArabic/ character property doesn't match certain Arabic characters

Issue #20617 has been reported by kytrinyx (Katrina Owen).

----------------------------------------
Bug #20617: /\pArabic/ character property doesn't match certain Arabic characters
https://bugs.ruby-lang.org/issues/20617

* Author: kytrinyx (Katrina Owen)
* Status: Open
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-darwin21]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
I am not sure this is a bug.

On some occasions I have Arabic text, but the Arabic character property rejects it as being Arabic.

Example:

```
str = "شغل مرحلة أولى ، جداً؟"
/^\p{Arabic}$/.match(str).inspect # => nil
str.chars.reject {|char| /\p{Arabic}/.match(char)}.uniq
# arabic space, arabic comma, arabic question mark, and arabic fatahan
```

This isn't a problem, since I defined my own regex to include the missing characters, but wanted to raise it in case it is, in fact, a bug.

-- 
https://bugs.ruby-lang.org/
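
To see exactly which characters fall outside `\p{Arabic}`, it can help to print their code points. The following is a minimal sketch, not part of the report: it assumes a shortened sample spelled with `\u` escapes so that every code point is unambiguous, and `outside_script` is a hypothetical helper name.

```ruby
# Shortened sample in the spirit of the report, spelled with escapes so
# each code point is unambiguous (an assumption, not the reporter's string).
sample = "\u0634\u063A\u0644 \u060C \u062C\u062F\u0627\u064B\u061F"

# Hypothetical helper: unique characters that do not match the given
# pattern, formatted as U+XXXX code points.
def outside_script(str, pattern)
  str.chars.reject { |c| c.match?(pattern) }.uniq.map { |c| format("U+%04X", c.ord) }
end

p outside_script(sample, /\p{Arabic}/)
# U+0020 SPACE, U+060C ARABIC COMMA, U+064B ARABIC FATHATAN,
# U+061F ARABIC QUESTION MARK

# ^\p{Arabic}$ can only match a line that holds a single character, so a
# whole-string check needs a quantifier, for example \A\p{Arabic}+\z.
# It still fails here because of the four characters listed above.
p sample.match?(/\A\p{Arabic}+\z/)
```

The space in this sample is the ordinary U+0020, which belongs to neither the Arabic script nor an Arabic block.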

Issue #20617 has been updated by alanwu (Alan Wu).

Status changed from Open to Closed

The "Arabic" property is a "scripts" property, which doesn't include punctuation:
https://www.unicode.org/standard/supported.html

Ruby documentation for Unicode properties is here:
https://docs.ruby-lang.org/en/3.3/regexp/unicode_properties_rdoc.html
The Regexp class level documentation has more general information about matching with Unicode properties.

A way to additionally match the punctuation in your test string is by matching their [Unicode block]:

```ruby
"شغلمرحلةأولى،جداً؟".chars.all? { /\p{In_Arabic}/.match?(_1) } # => true
```

[Unicode block]: https://en.wikipedia.org/wiki/Unicode_block

----------------------------------------
Bug #20617: /\pArabic/ character property doesn't match certain Arabic characters
https://bugs.ruby-lang.org/issues/20617#change-109013

* Author: kytrinyx (Katrina Owen)
* Status: Closed
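
The difference between the script property and the block can be checked per character. In the Unicode script data the Arabic comma and question mark are assigned to the Common script and the fathatan to Inherited, which is why `\p{Arabic}` skips them even though all three sit in the Arabic block. A minimal sketch follows; the sample string and the name `arabic_text` are assumptions chosen for illustration.

```ruby
# Characters that \p{Arabic} rejects, keyed by their Unicode names.
chars = {
  "ARABIC COMMA"         => "\u060C",
  "ARABIC QUESTION MARK" => "\u061F",
  "ARABIC FATHATAN"      => "\u064B",
}

# Print, for each character, whether the script and block properties match.
chars.each do |name, c|
  puts format("%-21s script Arabic: %-5s block In_Arabic: %s",
              name, c.match?(/\p{Arabic}/), c.match?(/\p{In_Arabic}/))
end

# One possible whole-string check: accept the Arabic script, the Arabic
# block, and whitespace (the plain space is in neither of the first two).
arabic_text = /\A[\p{Arabic}\p{In_Arabic}\s]+\z/

p arabic_text.match?("\u0634\u063A\u0644 \u060C \u062C\u062F\u0627\u064B\u061F")
p arabic_text.match?("abc")
```

Keeping `\p{Arabic}` in the class alongside `\p{In_Arabic}` also covers Arabic-script letters that live in other blocks.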

Issue #20617 has been updated by duerst (Martin Dürst).

`\p{In_Arabic}` may not be enough. There are 8 blocks with a name containing 'Arabic'. For details, see e.g. https://www.unicode.org/Public/15.1.0/ucd/Blocks.txt. They would be selectable with:
`(\p{In_Arabic}|\p{In_Arabic_Extended_A}|\p{In_Arabic_Extended_B}|\p{In_Arabic_Extended_C}|\p{In_Arabic_Mathematical_Alphabetic_Symbols}|\p{In_Arabic_Presentation_Forms_A}|\p{In_Arabic_Presentation_Forms_B}|\p{In_Arabic_Supplement})`.

----------------------------------------
Bug #20617: /\pArabic/ character property doesn't match certain Arabic characters
https://bugs.ruby-lang.org/issues/20617#change-109035

* Author: kytrinyx (Katrina Owen)
* Status: Closed
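
The alternation above can also be assembled from a list of block names. The sketch below assumes that the set of block names a given Ruby recognizes depends on the Unicode version it was built with, so it drops any name for which `Regexp.new` raises `RegexpError`. The names `arabic_blocks`, `block_supported?` and `any_arabic_block` are hypothetical.

```ruby
# The eight block names containing "Arabic", as listed in the message above.
arabic_blocks = %w[
  Arabic
  Arabic_Extended_A
  Arabic_Extended_B
  Arabic_Extended_C
  Arabic_Mathematical_Alphabetic_Symbols
  Arabic_Presentation_Forms_A
  Arabic_Presentation_Forms_B
  Arabic_Supplement
]

# Hypothetical helper: true if this Ruby accepts \p{In_<name>}.
# An unknown property name makes Regexp.new raise RegexpError.
def block_supported?(name)
  Regexp.new("\\p{In_#{name}}")
  true
rescue RegexpError
  false
end

supported = arabic_blocks.select { |name| block_supported?(name) }

# Join the supported blocks into one alternation, as in the message above.
any_arabic_block = Regexp.new(supported.map { |n| "\\p{In_#{n}}" }.join("|"))

p "\u0750".match?(any_arabic_block) # U+0750 is in the Arabic Supplement block
p "\uFE70".match?(any_arabic_block) # U+FE70 is in Arabic Presentation Forms-B
p "a".match?(any_arabic_block)
```

Filtering the names first means the pattern degrades to the blocks the running interpreter knows about instead of failing outright on an older Ruby.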