Issue #19720 has been updated by Eregon (Benoit Daloze).
duerst (Martin Dürst) wrote in #note-5:
Introducing such a warning might be a good idea. But
there are several issues:
Thanks for the feedback.
1) The warning should only be used when asked for with
an option (i.e. default off).
Yes, as I mentioned in the description, it would be opt-in.
2) To a very large extent, whether a regular
expression is linear or not depends on the implementation. (The recent improvements to the
CRuby implementation show that very clearly). Does that mean that different
implementations would warn for different regular expressions?
Yes it can differ, although the restrictions for linear engines are fairly similar, see
https://bugs.ruby-lang.org/issues/19104#note-3.
3) In some cases, it may not be possible to
conclusively say whether a regular expression will run in linear time or not. The proposed
warning text makes this clear with the word "might".
4) Non-linear can be quadratic, cubic, or exponential, and so on. A quadratic case on
data with limited length (e.g. out of a database with fixed field lengths) might be
absolutely harmless. Even an exponential case on very short data can be harmless.
In general, anything non-linear with user input is highly susceptible to DoS. Even
quadratic is often enough to be problematic.
5) In many cases, the only person who would do a DoS
attack would be the programmer him/herself.
There were about a dozen ReDoS CVEs in the last 1-2 years. I don't think the reporters
were the gem maintainers.
I would think in some cases it was reported by people who actually got a ReDoS in
production (it seems the typical way these things get noticed).
6) Overall, careful design and implementation is
needed to make sure that this doesn't become a "crying wolf" warning that
quickly gets deactivated and then no longer helps anybody.
I think any company or individual which cares about security should take these warnings
seriously.
We need to make them actionable though so they get resolved.
At least we should address the non-linear Regexps in default and bundled gems, as well as
in RubyGems.
For the RubyGems Regexps above, the issue is the atomic groups.
One idea would be to check if the Regexp without atomic groups is `Regexp.linear_time?`
and if so use it, that seems easy.
Another idea would be a way to specify atomic groups for some Regexp(s) can be ignored for
semantics and are only meant for performance (seems almost always the case). That would
then use the linear engine if the Regexp with atomic groups changed to non-capturing
groups can run on that linear engine, otherwise use the original regexp with atomic groups
and warn.
----------------------------------------
Feature #19720: Warning for non-linear Regexps
https://bugs.ruby-lang.org/issues/19720#change-103499
* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
----------------------------------------
I believe the best way to solve ReDoS is to ensure all Regexps used in the process are
linear.
Using `Regexp.timeout = 5.0` or so does not really prevent ReDoS, given enough requests
causing that timeout the servers will still be very unresponsive.
To this purpose, we should make it easy to identify non-linear Regexps and fix them.
I suggest we either use
1. a performance warning (enabled with `Warning[:performance] = true`, #19538) or
2. a new regexp warning category (enabled with `Warning[:regexp] = true`).
I think we should warn only once per non-linear Regexp, to avoid too many such warnings.
We could warn as soon as the Regexp is created, or on first match.
On first match might makes more sense for Ruby implementations which compile the Regexp
lazily (since that is costly during startup), and also avoids warning for Regexps which
are never used (which can be good or bad).
OTOH, if the warning is enabled, we could always compile the Regexp eagerly (or at least
checks whether it's linear), and that would then provide a better way to guarantee
that all Regexps created so far are linear.
Because warnings are easily customizable, it is also possible to e.g. `raise/abort` on
such a warning, if one wants to ensure their application does not use a non-linear Regexp
and so cannot be vulnerable to ReDoS:
```ruby
Warning.extend Module.new {
def warn(message, category: nil, **)
raise message if category == :regexp
super
end
}
```
A regexp warning category seems better for that as it makes it easy to filter by category,
if a performance warning one would need to match the message which is less clean.
As a note, TruffleRuby already has a similar warning, as a command-line option:
```
$ truffleruby --experimental-options --warn-truffle-regex-compile-fallback -e
'Gem'
truffleruby-dev/lib/mri/rubygems/version.rb:176: warning: Regexp
/\A\s*([0-9]+(?>\.[0-9a-zA-Z]+)*(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?)?\s*\z/
at_start=false encoding=US-ASCII requires backtracking and will not match in linear time
truffleruby-dev/lib/mri/rubygems/requirement.rb:105: warning: Regexp
/\A\s*(=|!=|>|<|>=|<=|~>)?\s*([0-9]+(?>\.[0-9a-zA-Z]+)*(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?)\s*\z/
at_start=false encoding=US-ASCII requires backtracking and will not match in linear time
```
So the warning message could be like
`FILE:LINE: warning: Regexp /REGEXP/ requires backtracking and might not match in linear
time and might cause ReDoS`
or more concise:
`FILE:LINE: warning: Regexp /REGEXP/ requires backtracking and might cause ReDoS`
--
https://bugs.ruby-lang.org/