[ruby-core:117990] [Ruby master Misc#20504] Interpolated string literal in regexp encoding handling

Issue #20504 has been reported by kddnewton (Kevin Newton).

----------------------------------------
Misc #20504: Interpolated string literal in regexp encoding handling
https://bugs.ruby-lang.org/issues/20504

* Author: kddnewton (Kevin Newton)
* Status: Open
----------------------------------------
There is some very odd behavior here, and I'm not sure whether it is intentional, so I'm looking for guidance. In this example:

```ruby
# encoding: us-ascii
interp = "\x80"
regexp = /#{interp}/
```

the `regexp` variable is an ascii-8bit regular expression with the byte interpolated into it. However, if you inline that interpolation:

```ruby
# encoding: us-ascii
regexp = /#{"\x80"}/
```

you get a syntax error saying it's an invalid multi-byte character. I'm not sure what the rule is here, as the two cases seem inconsistent. Is this the correct behavior? I would prefer that the inlined form create an ascii-8bit regular expression like the first example, which would be consistent.

--
https://bugs.ruby-lang.org/
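The first case in the report can be checked directly. The following is a minimal sketch, not part of the original report: it assumes that `"\x80".b` is an acceptable stand-in for the `"\x80"` literal under the `# encoding: us-ascii` magic comment (both give an ASCII-8BIT string), so that the snippet does not depend on the magic comment being the first line of the file.

```ruby
# Assumption: "\x80".b stands in for the literal "\x80" as it would appear in a
# file that starts with `# encoding: us-ascii`; both are ASCII-8BIT strings.
interp = "\x80".b

# Interpolating through a variable: the regexp is built at run time and takes
# its encoding from the interpolated binary string.
regexp = /#{interp}/

puts regexp.encoding == Encoding::ASCII_8BIT
puts regexp.match?("a\x80b".b)
```

Both lines should print `true`, which is the outcome the report describes for the variable form.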

Issue #20504 has been updated by Eregon (Benoit Daloze).

Agreed, the current behavior breaks referential transparency and unexpectedly analyzes string literals inside interpolated parts. This leads to extra confusion, and I would think it has no value in real-world uses of interpolated regexps, because it only raises an error where there would otherwise be none.

So I think this is a bug: the implementation should not analyze those parts, and consequently the behavior should be the same as with the extra local variable.

----------------------------------------
Misc #20504: Interpolated string literal in regexp encoding handling
https://bugs.ruby-lang.org/issues/20504#change-108433

* Author: kddnewton (Kevin Newton)
* Status: Open
----------------------------------------

Issue #20504 has been updated by kddnewton (Kevin Newton).

I'm fine with it analyzing the string literals. I would just prefer that it take the same codepath as the interpolated variable case, in which it would produce an ascii-8bit regular expression as opposed to raising an error.

----------------------------------------
Bug #20504: Interpolated string literal in regexp encoding handling
https://bugs.ruby-lang.org/issues/20504#change-108463

* Author: kddnewton (Kevin Newton)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

Issue #20504 has been updated by mame (Yusuke Endoh).

Discussed at the dev meeting, and @matz said `/#{"\x80"}/` should not raise a SyntaxError but return a binary encoded regexp object.

----------------------------------------
Bug #20504: Interpolated string literal in regexp encoding handling
https://bugs.ruby-lang.org/issues/20504#change-108686

* Author: kddnewton (Kevin Newton)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
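To see which behavior a given interpreter has for the inlined form, one option is to load a small file that begins with the magic comment and report what happens. This is only a sketch: the helper name `inline_interpolation_outcome` and the global `$inline_regexp` are hypothetical, introduced here for illustration, and the thread does not say which Ruby release carries the change.

```ruby
require "tempfile"

# Hypothetical helper (not part of Ruby): writes a file that starts with the
# us-ascii magic comment, loads it, and reports how the inlined form behaved.
def inline_interpolation_outcome
  # Single-quoted heredoc: the interpolation and \x80 escape stay literal text.
  source = <<~'RUBY'
    # encoding: us-ascii
    $inline_regexp = /#{"\x80"}/
  RUBY

  Tempfile.create(["inline_regexp", ".rb"]) do |file|
    file.write(source)
    file.flush
    load file.path
  end

  # Reached only if the file parsed and ran: report the regexp's encoding.
  $inline_regexp.encoding
rescue SyntaxError
  # The behavior described in the original report.
  :syntax_error
rescue RegexpError
  # Assumption: kept as a fallback in case an interpreter fails at run time.
  :regexp_error
end

p inline_interpolation_outcome
```

An interpreter with the behavior from the original report should return `:syntax_error`. Once the decision above is implemented, the expectation is that the helper returns the binary (ASCII-8BIT) encoding instead.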
participants (3)
- Eregon (Benoit Daloze)
- kddnewton (Kevin Newton)
- mame (Yusuke Endoh)