
Issue #19694 has been updated by nobu (Nobuyoshi Nakada).

byroot (Jean Boussier) wrote in #note-12:
> > I made a patch to improve Regexp.new(/RE/) (and Regexp#dup).
>
> Interesting. Given that literal regexps are frozen, and even for unfrozen ones most of their state is immutable, have you considered using Copy on Write at the Ruby object level, like `Array#dup` / `String#dup`?

Do you mean the compiled pattern and so on in `OnigRegexType`? They never change between initialization and destruction, so "Copy-on-Write" isn't quite the right term. Currently `timelimit` is embedded at the same level as the other fields, so the struct would have to be restructured to share those other fields.

> Even if copying the bytes is relatively fast, if used in a tight loop it may cause some `malloc` churn.

It is faster than re-parsing the source at least.

----------------------------------------
Feature #19694: Add Regexp#timeout= setter
https://bugs.ruby-lang.org/issues/19694#change-103504

* Author: aharpole (Aaron Harpole)
* Status: Open
* Priority: Normal
----------------------------------------
# Abstract

In addition to allowing for a Regexp timeout to be set on individual instances by setting a `timeout` argument in `Regexp.new`, I'm proposing that we also allow setting the timeout on Regexp objects with a `#timeout=` setter.

# Background

To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate.

While the `timeout` keyword argument was added to `Regexp.new`, this isn't always a viable option. In the case of regex literal syntax (`/ab*/` or `%r{ab*}`, for instance), it's not possible to set a timeout at all right now without converting to `Regexp.new`, which may be awkward depending on the contents of the regex.

It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized.

Finally, because we offer a `Regexp#timeout` getter, for consistency it would be nice to also offer a setter.

The introduction of a `Regexp#timeout=` setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification.

# Proposal

I propose that we add the method `Regexp#timeout=`. It works the same way the `timeout` argument works in `Regexp.new`, taking either a float or nil.

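For reference, here is a minimal sketch of the existing behavior the setter would mirror. It assumes Ruby 3.2 or later (where `Regexp.new` accepts `timeout:` and `Regexp#timeout` exists); the variable names and the sample pattern are only illustrative.

```ruby
# Assumes Ruby 3.2+; earlier versions have no `timeout:` keyword.

# A per-instance timeout, in seconds, given as a float.
with_limit = Regexp.new("ab*", timeout: 1.0)

# nil (the default) sets no per-instance limit, so the global
# Regexp.timeout, if one is set, applies instead.
without_limit = Regexp.new("ab*", timeout: nil)

p with_limit.timeout     # 1.0
p without_limit.timeout  # nil

# A literal cannot take the keyword and is frozen, so today the only
# way to give it its own timeout is to rebuild it from source and options.
literal = /ab*/i
p literal.frozen?        # true on Ruby 3.0 and later

rebuilt = Regexp.new(literal.source, literal.options, timeout: 0.5)
p rebuilt.timeout        # 0.5
```

The rebuild step at the end is the part that gets awkward for long or interpolated literals, and the proposed `#timeout=` would accept the same float-or-nil values without it.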
This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to `dup` them first):

```
emoji_filter_pattern = %r{
  (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)})
  #{EmojiFilter.unicodes_pattern}
  (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })})
}x.dup
emoji_filter_pattern.timeout = 1.0
emoji_filter_pattern.freeze
```

# Implementation

This setter has been implemented in https://github.com/ruby/ruby/pull/7847.

# Evaluation

It's just a setter, so pretty straightforward in terms of implementation and use.

# Discussion

It's worth considering other options for overriding `Regexp.timeout`. I'd love to see something like the following for overriding regexp timeouts as well:

```
Regexp.timeout = 1.0
Regexp.with_timeout(5.0) do
  evaluate_slower_regexes
end
```

It's possible to implement something like `Regexp.with_timeout` but it's not thread-safe by default since it would involve overwriting `Regexp.timeout`.

# Summary

Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global `Regexp.timeout` by making it simpler to adjust timeouts on a regex by regex basis.

It's a minor change but the added consistency and flexibility help us optimize for developer happiness.

--
https://bugs.ruby-lang.org/