[ruby-core:122722] [Ruby Bug#21507] Regexp considers variable repetition quantifiers invalid in lookbehind

Issue #21507 has been reported by tiago-macedo (Tiago Macedo).

----------------------------------------
Bug #21507: Regexp considers variable repetition quantifiers invalid in lookbehind
https://bugs.ruby-lang.org/issues/21507

* Author: tiago-macedo (Tiago Macedo)
* Status: Open
* ruby -v: ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
This is my first bug submission, please feel free to tell me if I can do anything better.

# Description

Attempting to use "variable" repetition quantifiers (`?`, `+`, `*`, `{n,}`, ...) inside lookbehind anchors raises a **SyntaxError** (invalid pattern in look-behind), but the same quantifiers work fine in lookahead anchors.

Examples of lookahead working:

```ruby
irb(main):100> "axb".split /(?=x)/
=> ["a", "xb"]
irb(main):101> "axb".split /(?=x?)/
=> ["a", "x", "b"]
irb(main):102> "axb".split /(?=x+)/
=> ["a", "xb"]
irb(main):103> "axb".split /(?=x*)/
=> ["a", "x", "b"]
irb(main):104> "axb".split /(?=x{1,})/
=> ["a", "xb"]
irb(main):105> "axb".split /(?=x{,1})/
=> ["a", "x", "b"]
irb(main):106> "axb".split /(?=x{1,2})/
=> ["a", "xb"]
```

Examples of lookbehind **working only with non-variable quantifiers**:

```ruby
irb(main):107> "axb".split /(?<=x)/
=> ["ax", "b"]
irb(main):108> "axb".split /(?<=x?)/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):108: invalid pattern in look-behind: /(?<=x?)/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):109> "axb".split /(?<=x*)/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):109: invalid pattern in look-behind: /(?<=x*)/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):110> "axb".split /(?<=x{1,})/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):110: invalid pattern in look-behind: /(?<=x{1,})/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):111> "axb".split /(?<=x{,1})/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):111: invalid pattern in look-behind: /(?<=x{,1})/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):112> "axb".split /(?<=x{1,2})/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):112: invalid pattern in look-behind: /(?<=x{1,2})/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):113> "axb".split /(?<=x{1})/
=> ["ax", "b"]
irb(main):114> "axb".split /(?<=x{1,1})/
=> ["ax", "b"]
```

# Note

I have searched the internet and, to my knowledge, this behavior is not intended. [This documentation page on regular expressions](https://ruby-doc.org/core-3.1.0/doc/regexp_rdoc.html), for example, does not say anything about limitations specific to lookbehinds.

--
https://bugs.ruby-lang.org/

Issue #21507 has been updated by mame (Yusuke Endoh).

Status changed from Open to Feedback

This is currently an intended implementation limitation.

This behavior comes from the specifications of Onigmo, which Ruby's regular expression engine is based on. The Onigmo documentation states the following about look-behinds:

```
  (?<=subexp)        look-behind
  (?<!subexp)        negative look-behind

                     Subexp of look-behind must be fixed-width.
                     But top-level alternatives can be of various lengths.
                     ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
```

https://github.com/k-takata/Onigmo/blob/1d7ee878b3e4a9e41bf9825c937ae6cf0a9c...

I'm hesitant about whether we should add Onigmo's detailed implementation specifics to the Ruby documentation. However, seeing that there's already a precedent for it, I've opened a PR for now.

https://github.com/ruby/ruby/pull/13857

----------------------------------------
Bug #21507: Regexp considers variable repetition quantifiers invalid in lookbehind
https://bugs.ruby-lang.org/issues/21507#change-114001

* Author: tiago-macedo (Tiago Macedo)
* Status: Feedback
* ruby -v: ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------

--
https://bugs.ruby-lang.org/
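
As an illustration of the fixed-width rule quoted from the Onigmo documentation above, here is a minimal sketch that checks which look-behind patterns compile. The `compiles?` helper is hypothetical (it is not part of Ruby or Onigmo), and the rewrite of `x{1,2}` as the top-level alternatives `x|xx` is only one possible workaround, assuming the engine behaves as the quoted documentation describes.

```ruby
# Hypothetical helper (not part of Ruby): true if the pattern source compiles.
# Regexp.new is used instead of a literal so that an invalid pattern raises a
# rescuable RegexpError at run time rather than a SyntaxError at parse time.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

compiles?('(?<=x{1,2})')       # => false (variable-width quantifier)
compiles?('(?<=x|xx)')         # => true  (top-level alternatives of different lengths)
compiles?('(?<=aaa(?:b|cd))')  # => false (nested alternatives of different lengths)

# Same split as in the report, with the quantifier spelled out as alternatives.
"axb".split(/(?<=x|xx)/)       # => ["ax", "b"]
```

Open-ended quantifiers such as `*`, `+`, or `{1,}` have no upper bound, so they cannot be spelled out as a finite list of alternatives this way.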
participants (2)
- mame (Yusuke Endoh)
- tiago-macedo (Tiago Macedo)