
Issue #20039 has been reported by dbrown9@gmail.com (Dustin Brown).

----------------------------------------
Bug #20039: Matching US-ASCII string to copied UTF-8 Regexp causes invalid multibyte character error
https://bugs.ruby-lang.org/issues/20039

* Author: dbrown9@gmail.com (Dustin Brown)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.3.0dev (2023-12-03 master 85bc80a)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
Matching a US-ASCII string against a UTF-8 encoded regexp that contains multibyte characters works as expected.

```ruby
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(re)
# => false
```

However, if that regexp is used to initialize a new regexp, the match fails with an "invalid multibyte character" error.

```ruby
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(Regexp.new(re))
# => ArgumentError: regexp preprocess failed: invalid multibyte character
```
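
To compare the two cases on another build, something like the following script may help. It is only a sketch: `outcome_of` and `reproduce_issue` are hypothetical helper names that are not part of the report, and the script assumes the error surfaces as an `ArgumentError`, as shown above. It runs both matches and records either the boolean result or the exception, so it completes whether or not the build is affected.

```ruby
# Hypothetical helper (not from the report): run the block and return
# either its result or the ArgumentError it raised.
def outcome_of
  yield
rescue ArgumentError => e
  e
end

# Hypothetical helper: runs both cases from the report and returns
# their outcomes keyed by :direct and :copied.
def reproduce_issue
  re = Regexp.new("\u2018".encode("UTF-8"))
  ascii = "".encode("US-ASCII")

  {
    direct: outcome_of { ascii.match?(re) },             # original regexp
    copied: outcome_of { ascii.match?(Regexp.new(re)) }, # regexp built from a regexp
  }
end

reproduce_issue.each do |label, outcome|
  shown = outcome.is_a?(Exception) ? "#{outcome.class}: #{outcome.message}" : outcome.inspect
  puts "#{label}: #{shown}"
end
```

On an affected build the `copied` line should show the `ArgumentError`; on an unaffected build both lines should print `false`.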