[ruby-core:115588] [Ruby master Bug#20039] Matching US-ASCII string to copied UTF-8 Regexp causes invalid multibyte character error

4 Dec 2023

Issue #20039 has been reported by dbrown9@gmail.com (Dustin Brown).

----------------------------------------
Bug #20039: Matching US-ASCII string to copied UTF-8 Regexp causes invalid multibyte character error
https://bugs.ruby-lang.org/issues/20039

* Author: dbrown9@gmail.com (Dustin Brown)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.3.0dev (2023-12-03 master 85bc80a)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
Matching a US-ASCII string against a UTF-8 encoded regexp that contains multibyte characters works as expected.

```ruby
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(re)
# => false
```
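For reference, the encodings involved in the working case can be inspected with core `Regexp` and `String` methods. This snippet is an illustrative sketch, not part of the original report, and it assumes a UTF-8 source file (Ruby's default).

```ruby
# Same regexp as in the report. Because its source contains a non-ASCII
# character, Ruby fixes the regexp's encoding to UTF-8.
re = Regexp.new("\u2018")
str = "".encode("US-ASCII")

puts re.encoding         # UTF-8
puts re.fixed_encoding?  # true
puts str.encoding        # US-ASCII
puts str.ascii_only?     # true
puts str.match?(re)      # false
```

An ASCII-only string is encoding-compatible with a UTF-8 regexp, so the match runs normally and simply returns `false`.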

However, if that regexp is used to initialize a new regexp with `Regexp.new(re)`, the match raises an `ArgumentError` for an invalid multibyte character.

```ruby
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(Regexp.new(re))
# raises ArgumentError: regexp preprocess failed: invalid multibyte character
```
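One possible workaround on affected builds is to rebuild the regexp from its source string instead of passing the `Regexp` object to `Regexp.new`. This sketch is not from the report and has not been checked against the affected 3.3.0dev build: `rebuild_regexp` is a hypothetical helper, and it assumes that the `String` path shown working in the first example is unaffected by the bug. It also keeps only the `i`, `x`, and `m` option bits.

```ruby
# Hypothetical helper: copy a regexp by recompiling its source String
# (which keeps its UTF-8 encoding) rather than calling Regexp.new(regexp).
def rebuild_regexp(re)
  # Keep only the standard option bits accepted by Regexp.new(String, Integer).
  opts = re.options & (Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE)
  Regexp.new(re.source, opts)
end

re = Regexp.new("\u2018")
copy = rebuild_regexp(re)

p copy.fixed_encoding?               # true
p "".encode("US-ASCII").match?(copy) # false
```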
