[ruby-core:113660] [Ruby master Feature#19694] Add Regexp#timeout= setter

Issue #19694 has been reported by aharpole (Aaron Harpole). ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by austin (Austin Ziegler). Since you have to `.dup` a regexp literal, it might be better to just call `Regexp.new`: ```ruby emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x emoji_filter_pattern = Regexp.new(emoji_filter_pattern, timeout: 1.0).freeze ``` I am a little surprised that regex literals are frozen, but Regexp.new regexen aren’t. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103305 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by byroot (Jean Boussier).
I am a little surprised that regex literals are frozen, but Regexp.new regexen aren’t.
It's because: ```ruby def foo /foo/.object_id end ``` Always return the same instance. So if it's mutable, it allow to leak state where it's not expected. Another reason if that deduplicating literal regexp could save a decent amount of memory: #16557. They were frozen to allow deduplicating them, but the implementation is not there yet. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103310 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by nobu (Nobuyoshi Nakada). I agree with @austin, calling `new` with `timeout:` feels simpler and better than `dup` then set. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103313 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by Eregon (Benoit Daloze). On TruffleRuby all Regexp instances are frozen, notably because they are all globally cached and deduplicated, so this pattern with `dup` + setter cannot work. Also it seems clearly better that a Regexp timeout is passed when creating the Regexp, not after, semantically. I do get the point that `Regexp.new` is less nice than `/.../`. However if you are on 3.2 and the Regexp is linear, then the timeout is most likely unnecessary, and so in general I think it's more effective to work on making a Regexp linear than to use individual timeouts. `Regexp.with_timeout(5.0) do` could be a way, as you show, although it introduces fiber-local state just to pass probably to a single Regexp, so it feels like too much complexity and footprint for that. --- I think `Regexp.timeout=` is insufficent anyway to prevent ReDoS, because it needs to be a fairly high value to deal with e.g. variance in timings and CPU usage, scheduling, etc, and if a user manages to hit the timeout on every request they could likely still cause a ReDoS. I think the real solution is to make as many Regexps as possible linear, and warn when a Regexp is not linear (I still need to file a proposal for that). ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103316 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by byroot (Jean Boussier).
I do get the point that Regexp.new is less nice than /.../.
The main issue with `Regexp.new` is that it compile the regexp on each invocation, which is slow. So you can't just translate from one syntax to the other, you also have to make sure to keep it in a constant.
On TruffleRuby all Regexp instances are frozen, notably because they are all globally cached and deduplicated, so this pattern with dup + setter cannot work.
I assume you could have an internal regexp object that is deduplicated and immutable, and the actual `Regexp` object that is exposed could be a tuple of `(internal_regexp, timeout)`. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103317 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by Eregon (Benoit Daloze). byroot (Jean Boussier) wrote in #note-5:
I assume you could have an internal regexp object that is deduplicated and immutable, and the actual `Regexp` object that is exposed could be a tuple of `(internal_regexp, timeout)`.
Not really, it would be very messy and have a performance cost. The reason is we want to embed Regexp objects in the context-independent AST (for persisting JITed code), and so it must not refer to context-specific state. We could have two types, one for immutable regexps and one for mutable regexps, we do this for String, but it is a nightmare and is the only type I will do that for. Similarly, Ractor wants to be able to use Regexp objects on any Ractor, the requirement for that is that as many Regexp as possible are immutable. Probably we should make all Regexps immutable in CRuby too for consistency (both across Rubies and between literal/non-literal) and it would benefit Ractor. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103319 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by janosch-x (Janosch Müller). I think it is better if no code can mutate the timeout of the Regexps that are passed into it, even if that affected only dupped or non-literal Regexps. If it is really necessary to set custom timeouts on literals, maybe something like this could work: ```ruby regexp = Regexp.with_timeout(2.0) { /foo/ } regexp.timeout # => 2.0 ``` ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103343 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by Eregon (Benoit Daloze). janosch-x (Janosch Müller) wrote in #note-7:
```ruby regexp = Regexp.with_timeout(2.0) { /foo/ } regexp.timeout # => 2.0 ```
That, and as proposed in the description, doesn't really work if literal Regexps are created at parse time, before execution. This is the case on CRuby: ``` $ ruby --disable-gems -e 'pp ObjectSpace.count_objects; O=Object.new; R=/a/' | grep REGEX :T_REGEXP=>3, $ ruby --disable-gems -e 'pp ObjectSpace.count_objects; O=Object.new' | grep REGEX :T_REGEXP=>2, ``` and it is the case on TruffleRuby as well. Also it could be confusing for `[2.0, 4.0].each { |t| Regexp.with_timeout(t) { /foo/ } }` (it would either set it to 2.0 or to the global timeout, never to 4.0). Furthermore, even if that worked, it would then break Regexp interning, where the timeout at one place would affect another literal Regexp with the same pattern. Basically, I think there is no way besides `Regexp.new(pattern, timeout: t)` if you want a custom timeout for a Regexp. Literal Regexp are created too early to set anything. And adding state (timeout=) to Regexp feels wrong, since most instances are already immutable, and they might become all immutable. I would suggest to close this, `Regexp.new("a", timeout: 2.0)` already works and I think there is no alternative that works well to set the timeout per Regexp. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103347 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by janosch-x (Janosch Müller). I guess the only noteworthy argument for a change goes like this: A custom `timeout` only being available on `Regexp::new` might lead people to write less performant code. I'm not sure this is a strong argument. The big Regexps that take a noteworthy time to compile are often those with interpolation, as seen in the OP, and I assume these aren't so easy to pre-compile or deduplicate anyway. Maybe there could be an instance method, `Regexp#with_timeout`, that creates *and memoizes* a copy with a specific timeout? This also sounds a bit hacky, though... ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103350 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by nobu (Nobuyoshi Nakada). janosch-x (Janosch Müller) wrote in #note-9:
I guess the only noteworthy argument for a change goes like this:
A custom `timeout` only being available on `Regexp::new` might lead people to write less performant code.
I made a patch to improve `Regexp.new(/RE/)` (and `Regexp#dup`). https://github.com/nobu/ruby/tree/re_copy Iteration per second (i/s) against master 441302be1a: | |compare-ruby|built-ruby| |:-------|-----------:|---------:| |dup | 16.684k| 817.776k| | | -| 49.02x| |string | 16.520k| 16.538k| | | -| 1.00x| |regexp | 16.365k| 842.451k| | | -| 51.48x|
I'm not sure this is a strong argument. The big Regexps that take a noteworthy time to compile are often those with interpolation, as seen in the OP, and I assume these aren't so easy to pre-compile or deduplicate anyway.
A regexp with interpolation can be replaced with `Regexp.new` at almost same performance. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103486 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by byroot (Jean Boussier).
I made a patch to improve Regexp.new(/RE/) (and Regexp#dup).
Interesting. Given that literal regexp are frozen, and even for unfrozen ones most of their state is immutable, have you considered using Copy on Write at the Ruby object level, like `Array#dup` / `String#dup` ? Even if copying the bytes is relatively fast, if used in a tight loop it may cause some `malloc` churn. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103491 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by Eregon (Benoit Daloze). janosch-x (Janosch Müller) wrote in #note-9:
A custom `timeout` only being available on `Regexp::new` might lead people to write less performant code.
I think it is very well known and easy to know though profiling that one should always cache the result of `Regexp.new` (e.g., in a constant). The docs should point this out though.
I'm not sure this is a strong argument. The big Regexps that take a noteworthy time to compile are often those with interpolation, as seen in the OP, and I assume these aren't so easy to pre-compile or deduplicate anyway.
Actually TruffleRuby inline-caches interpolated Regexp creation, so if it's the same pattern then the same Regexp is used. In the OP's case it seems easy to store the Regexp in a constant.
Maybe there could be an instance method, `Regexp#with_timeout`, that creates *and memoizes* a copy with a specific timeout? This also sounds a bit hacky, though...
I think timeouts are inefficient and insufficient to address ReDoS. IMO the real solution is #19720. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103497 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by nobu (Nobuyoshi Nakada). byroot (Jean Boussier) wrote in #note-12:
I made a patch to improve Regexp.new(/RE/) (and Regexp#dup).
Interesting. Given that literal regexp are frozen, and even for unfrozen ones most of their state is immutable, have you considered using Copy on Write at the Ruby object level, like `Array#dup` / `String#dup` ?
Do you mean compiled pattern and so on in `OnigRegexType`? They are never changed once initialized until destruction, "Copy-on-Write" won't be a proper word. Currently `timelimit` is embedded at the same level as other fields, so the struct must be reconfigured to share other fields.
Even if copying the bytes is relatively fast, if used in a tight loop it may cause some `malloc` churn.
It is faster than re-parsing the source at least. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103504 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/

Issue #19694 has been updated by byroot (Jean Boussier).
They are never changed once initialized until destruction, "Copy-on-Write" won't be a proper word.
Right, I just meant directly sharing that structure, or more accurately having some `T_STRUCT` simply keeping a reference to another `T_STRUCT`. Making `Regexp.dup` and `Regexp.new(Regexp)` a simple embeded `T_STRUCT` allocation, like for `String#dup`. ---------------------------------------- Feature #19694: Add Regexp#timeout= setter https://bugs.ruby-lang.org/issues/19694#change-103505 * Author: aharpole (Aaron Harpole) * Status: Open * Priority: Normal ---------------------------------------- # Abstract In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter. # Background To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex. It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized. Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter. The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification. # Proposal I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil. This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first): ``` emoji_filter_pattern = %r{ (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)}) #{EmojiFilter.unicodes_pattern} (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })}) }x.dup emoji_filter_pattern.timeout = 1.0 emoji_filter_pattern.freeze ``` # Implementation This setter has been implemented in https://github.com/ruby/ruby/pull/7847. # Evaluation It's just a setter, so pretty straightforward in terms of implementation and use. # Discussion It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well: ``` Regexp.timeout = 1.0 Regexp.with_timeout(5.0) do evaluate_slower_regexes end ``` It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`. # Summary Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis. It's a minor change but the added consistency and flexibility help us optimize for developer happiness. -- https://bugs.ruby-lang.org/
participants (9)
-
aharpole (Aaron Harpole)
-
austin (Austin Ziegler)
-
byroot (Jean Boussier)
-
byroot (Jean Boussier)
-
Eregon (Benoit Daloze)
-
Eregon (Benoit Daloze)
-
janosch-x
-
nobu (Nobuyoshi Nakada)
-
nobu (Nobuyoshi Nakada)