December 2022 - ruby-core - ml.ruby-lang.org

[ruby-core:111262] [Ruby master Bug#19195] Pattern match pin becomes syntax error if there is newline before closing paren
by tompng (tomoya ishida) 12 Dec '22

Issue #19195 has been reported by tompng (tomoya ishida).

----------------------------------------
Bug #19195: Pattern match pin becomes syntax error if there is newline before closing paren
https://bugs.ruby-lang.org/issues/19195

* Author: tompng (tomoya ishida)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin20]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
Syntax OK

~~~ruby
1 in ^(
  1) # => true
~~~

Syntax error. I think it should be syntax ok.

~~~ruby
1 in ^(
  1
) # => syntax error, unexpected '\n', expecting ')'
~~~

--
https://bugs.ruby-lang.org/
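Both forms can be checked from a script without aborting on the parse error by compiling the source text and rescuing `SyntaxError`. The following is a minimal sketch, assuming CRuby (it relies on `RubyVM::InstructionSequence`); the helper name `syntax_ok?` is hypothetical, and what it reports for the second snippet depends on the Ruby version in use.

~~~ruby
# Hypothetical helper: true if the source compiles, false on a syntax error.
# The source is only compiled, never executed. CRuby-specific.
def syntax_ok?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

newline_after_open   = "1 in ^(\n  1)"
newline_before_close = "1 in ^(\n  1\n)"

puts "newline after '(' only:       #{syntax_ok?(newline_after_open)}"
puts "newline before ')' as well:   #{syntax_ok?(newline_before_close)}"
~~~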

[ruby-core:111259] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
by hsbt (Hiroshi SHIBATA) 12 Dec '22

Issue #19007 has been updated by hsbt (Hiroshi SHIBATA).

Thanks. I created the [3.3 milestone](https://bugs.ruby-lang.org/versions/71)

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data
https://bugs.ruby-lang.org/issues/19007#change-100569

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------
I found that the header in the Unicode Emoji 14.0 data files had changed slightly (and again at 15.0), but `enc/unicode/case-folding.rb` didn't follow it.

When I fixed it and rebuilt the headers under `enc/unicode/14.0.0`, `name2ctype.h` differed from master, as shown below.

The changes to `CR_Lower`, `CR_Cased` and `CR_Other_Lowercase` seem to have simply been missed in the previous update, and are not a problem.

But U+11720..U+11721 in `CR_Grapheme_Cluster_Break_SpacingMark` is absent from the original Unicode.org data.
According to @naruse's investigation, it was removed in the commit [Update to Unicode 14.0.0], while U+11720 is still SpacingMark in the latest https://www.unicode.org/reports/tr29/.

[Update to Unicode 14.0.0]: https://github.com/latex3/unicode-data/commit/5570040ac8a30e2c2ca4912d415ec…

```diff
diff --git a/enc/unicode/14.0.0/name2ctype.h b/enc/unicode/14.0.0/name2ctype.h
index 99a3eeca190..f49e5cd7273 100644
--- a/enc/unicode/14.0.0/name2ctype.h
+++ b/enc/unicode/14.0.0/name2ctype.h
@@ -1565,7 +1565,7 @@ static const OnigCodePoint CR_Graph[] = {
 
 /* 'Lower': [[:Lower:]] */
 static const OnigCodePoint CR_Lower[] = {
-	664,
+	668,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
 	0x00b5, 0x00b5,
@@ -2196,6 +2196,10 @@ static const OnigCodePoint CR_Lower[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10cc0, 0x10cf2,
 	0x118c0, 0x118df,
 	0x16e60, 0x16e7f,
@@ -12651,7 +12655,7 @@ static const OnigCodePoint CR_Math[] = {
 
 /* 'Cased': Derived Property */
 static const OnigCodePoint CR_Cased[] = {
-	151,
+	155,
 	0x0041, 0x005a,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
@@ -12763,6 +12767,10 @@ static const OnigCodePoint CR_Cased[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10c80, 0x10cb2,
 	0x10cc0, 0x10cf2,
 	0x118a0, 0x118df,
@@ -22615,7 +22623,7 @@ static const OnigCodePoint CR_Extender[] = {
 
 /* 'Other_Lowercase': Binary Property */
 static const OnigCodePoint CR_Other_Lowercase[] = {
-	20,
+	24,
 	0x00aa, 0x00aa,
 	0x00ba, 0x00ba,
 	0x02b0, 0x02b8,
@@ -22636,6 +22644,10 @@ static const OnigCodePoint CR_Other_Lowercase[] = {
 	0xa770, 0xa770,
 	0xa7f8, 0xa7f9,
 	0xab5c, 0xab5f,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 }; /* CR_Other_Lowercase */
 
 /* 'Other_Uppercase': Binary Property */
@@ -37049,7 +37061,7 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_Extend[] = {
 
 /* 'Grapheme_Cluster_Break_SpacingMark': Grapheme_Cluster_Break=SpacingMark */
 static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
-	161,
+	160,
 	0x0903, 0x0903,
 	0x093b, 0x093b,
 	0x093e, 0x0940,
@@ -37183,7 +37195,6 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
 	0x116ac, 0x116ac,
 	0x116ae, 0x116af,
 	0x116b6, 0x116b6,
-	0x11720, 0x11721,
 	0x11726, 0x11726,
 	0x1182c, 0x1182e,
 	0x11838, 0x11838,
```

--
https://bugs.ruby-lang.org/
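In the diff, the number at the head of each array changes together with the ranges (for example 664 becomes 668 when four ranges are added to `CR_Lower`), which suggests a layout of one range count followed by (first, last) code point pairs. The sketch below models that layout in Ruby under that assumption; `CR_SAMPLE`, `ranges_of` and `in_table?` are hypothetical names for illustration and are not part of Ruby or Onigmo.

```ruby
# Hypothetical miniature table in the assumed layout:
# a range count, then (first, last) pairs sorted by code point.
CR_SAMPLE = [
  4,
  0x10780, 0x10780,
  0x10783, 0x10785,
  0x10787, 0x107b0,
  0x107b2, 0x107ba
].freeze

# Turn the flat table into Range objects and check the count header.
def ranges_of(table)
  count, *points = table
  ranges = points.each_slice(2).map { |first, last| first..last }
  unless ranges.size == count
    raise ArgumentError, "header says #{count} ranges, found #{ranges.size}"
  end
  ranges
end

# Binary search for the first range ending at or after the code point,
# then check that the range really covers it.
def in_table?(table, code_point)
  range = ranges_of(table).bsearch { |r| r.end >= code_point }
  !range.nil? && range.cover?(code_point)
end

puts in_table?(CR_SAMPLE, 0x10784)
puts in_table?(CR_SAMPLE, 0x10786)
```

A stale count header, like the ones the diff corrects, would be caught by the `ArgumentError` in `ranges_of`.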

[ruby-core:111258] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
by duerst 12 Dec '22

Issue #19007 has been updated by duerst (Martin Dürst).

hsbt (Hiroshi SHIBATA) wrote in #note-6:
> @duerst Is there any action for Ruby 3.2 related this? If there is nothing to do for Ruby 3.2, I'll remove this from Ruby 3.2 milestone.

No, nothing should be needed for Ruby 3.2. I removed it myself. BTW, can we add a "3.3" target version?

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data
https://bugs.ruby-lang.org/issues/19007#change-100566

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------

--
https://bugs.ruby-lang.org/

[ruby-core:111257] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
by hsbt (Hiroshi SHIBATA) 12 Dec '22

Issue #19007 has been updated by hsbt (Hiroshi SHIBATA).

@duerst Is there any action for Ruby 3.2 related this? If there is nothing to do for Ruby 3.2, I'll remove this from Ruby 3.2 milestone.

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data
https://bugs.ruby-lang.org/issues/19007#change-100561

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 3.2
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------

--
https://bugs.ruby-lang.org/

[ruby-core:111256] [Ruby master Bug#19136] OpenSSL::PKey::EC.check_key is useless when linked against OpenSSL 3
by hsbt (Hiroshi SHIBATA) 12 Dec '22

Issue #19136 has been updated by hsbt (Hiroshi SHIBATA).

Status changed from Open to Third Party's Issue

This issue has been filed at https://github.com/ruby/openssl/issues/563

----------------------------------------
Bug #19136: OpenSSL::PKey::EC.check_key is useless when linked against OpenSSL 3
https://bugs.ruby-lang.org/issues/19136#change-100560

* Author: bannable (Joe Truba)
* Status: Third Party's Issue
* Priority: Normal
* ruby -v: 3.1.2, 2.7.2
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
When calling `#check_key` against an `OpenSSL::PKey::EC` instance representing an invalid point for the group, the method always returns true.

I believe this is because OpenSSL 3 deprecated `EC_KEY_check_key`, and the underlying call is swapped out for `EVP_PKEY_public_check` in `ruby/openssl` 3+.

However, the `EVP_PKEY_public_check` does not serve the same purpose as `EC_KEY_check_key`. `EVP_PKEY_public_check` validates only the resulting public component, and does not validate the private component.

**Reproducer**

```ruby
# check.rb
ver = ARGV[0]
gem 'openssl', ver
require 'openssl'

# ECDSA secp384r1 encoded key where the point is not on the curve
pem = <<~INVALID_KEY
  -----BEGIN EC PRIVATE KEY-----
  MIGkAgEBBDDA1Tm0m7YhkfeVpFuarAJYVlHp2tQj+1fOBiLa10t9E8TiQO/hVfxB
  vGaVEQwOheWgBwYFK4EEACKhZANiAASyGqmryZGqdpsq5gEDIfNvgC3AwSJxiBCL
  XKHBTFRp+tCezLDOK/6V8KK/vVGBJlGFW6/I7ahyXprxS7xs7hPA9iz5YiuqXlu+
  lbrIpZOz7b73hyQQCkvbBO/Avg+hPAk=
  -----END EC PRIVATE KEY-----
INVALID_KEY

begin
  result = OpenSSL::PKey::EC.new(pem).check_key
rescue => e
  result = e.message
end

puts format('%25s: %s','RUBY_VERSION', RUBY_VERSION)
puts format('%25s: %s','OPENSSL_LIBRARY_VERSION', OpenSSL::OPENSSL_LIBRARY_VERSION)
puts format('%25s: %s','OPENSSL_VERSION', OpenSSL::VERSION)
puts format('%25s: %s','result', result)
```

**OpenSSL 1.1.1**

```
$ rvm 2.7 do ruby check.rb 2.1.2
             RUBY_VERSION: 2.7.2
  OPENSSL_LIBRARY_VERSION: OpenSSL 1.1.1f 31 Mar 2020
          OPENSSL_VERSION: 2.1.2
                   result: EC_KEY_check_key: invalid private key

$ rvm 2.7 do ruby check.rb 3.0.1
             RUBY_VERSION: 2.7.2
  OPENSSL_LIBRARY_VERSION: OpenSSL 1.1.1f 31 Mar 2020
          OPENSSL_VERSION: 3.0.1
                   result: EVP_PKEY_public_check: invalid private key

$ rvm 3.1.2 do ruby check.rb
             RUBY_VERSION: 3.1.2
  OPENSSL_LIBRARY_VERSION: OpenSSL 1.1.1f 31 Mar 2020
          OPENSSL_VERSION: 3.0.1
                   result: EVP_PKEY_public_check: invalid private key
```

**OpenSSL 3.0.2**

```
$ ruby check.rb
             RUBY_VERSION: 3.1.2
  OPENSSL_LIBRARY_VERSION: OpenSSL 3.0.2 15 Mar 2022
          OPENSSL_VERSION: 3.0.1
                   result: true

$ ruby check.rb 3.0.0
             RUBY_VERSION: 3.1.2
  OPENSSL_LIBRARY_VERSION: OpenSSL 3.0.2 15 Mar 2022
          OPENSSL_VERSION: 3.0.0
                   result: true
```

--
https://bugs.ruby-lang.org/
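The "invalid private key" result in the OpenSSL 1.1.1 runs comes from a check that the private component and the public point belong together. A rough way to picture that check is to recompute the public point from the private scalar and compare the two. The sketch below does so with freshly generated keys. It is only an illustration under that assumption, not the fix adopted by `ruby/openssl` and not a complete key validation; the helper name `ec_pair_consistent?` is hypothetical.

```ruby
require "openssl"

# Hypothetical helper (not part of ruby/openssl): multiply the group
# generator by the private scalar and compare with the given public point.
def ec_pair_consistent?(group, private_bn, public_point)
  group.generator.mul(private_bn) == public_point
end

key   = OpenSSL::PKey::EC.generate("secp384r1")
other = OpenSSL::PKey::EC.generate("secp384r1")

# A key checked against its own public point, then against a foreign one.
puts "own public point:     #{ec_pair_consistent?(key.group, key.private_key, key.public_key)}"
puts "foreign public point: #{ec_pair_consistent?(key.group, key.private_key, other.public_key)}"
```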

[ruby-core:111248] [Ruby master Bug#19192] IO has third data mode, document is incomplete.
by YO4 (Yoshinao Muramatsu) 11 Dec '22

Issue #19192 has been reported by YO4 (Yoshinao Muramatsu).

----------------------------------------
Bug #19192: IO has third data mode, document is incomplete.
https://bugs.ruby-lang.org/issues/19192

* Author: YO4 (Yoshinao Muramatsu)
* Status: Open
* Priority: Normal
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The documentation on the mode parameter of File.open is incomplete, so I would like to clarify the actual behavior of IO's data mode here.

The documentation says:

```
To specify whether data is to be treated as text or as binary data, either of the following may be suffixed to any of the string read/write modes above:

    't': Text data; sets the default external encoding to Encoding::UTF_8; on Windows, enables conversion between EOL and CRLF and enables interpreting 0x1A as an end-of-file marker.
    'b': Binary data; sets the default external encoding to Encoding::ASCII_8BIT; on Windows, suppresses conversion between EOL and CRLF and disables interpreting 0x1A as an end-of-file marker.

If neither is given, the stream defaults to text data.
```

But actually it's more complicated than that.

There are three data modes:

* text mode. Can convert encoding and newlines.
* binary mode. Can convert neither encoding nor newlines. The encoding is treated as Encoding::ASCII_8BIT.
* third mode: DOS TEXT mode. It enables conversion between EOL and CRLF and enables interpreting 0x1A as an end-of-file marker.

On Windows:

* 't': text mode with universal newline conversion.
* 'b': binary mode.
* If neither is given, DOS TEXT mode.

On other platforms:

* 't': text mode with universal newline conversion.
* 'b': binary mode.
* If neither is given, text mode without newline conversion.

On Windows, there are some special cases:

* If encoding conversion is specified, DOS TEXT mode is ignored and universal newline conversion is applied.
* If the access mode is "a+", the last EOF character (only one) is overwritten in DOS TEXT mode.

There are more parameter combinations, see https://gist.github.com/YO4/262e9bd5e44a37a7a2fa9118e271b30b

Is this all? I have not fully investigated.
Since the topic of data mode spans access mode and encoding conversion, I don't think my English skills will allow me to summarize this into rdoc without breaking something...

--
https://bugs.ruby-lang.org/
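The difference between the suffixes can be observed by writing CRLF-terminated bytes and reading them back with each mode. This is a minimal sketch; the helper name `read_with_modes` is hypothetical, and per the report the result for a bare `"r"` depends on the platform, so it is printed here without any claim about its value.

```ruby
require "tempfile"

# Hypothetical helper: write the bytes verbatim, then read them back with
# each data-mode suffix. Returns { mode => [external_encoding, data] }.
def read_with_modes(bytes)
  Tempfile.create("data-mode") do |tmp|
    tmp.close
    File.binwrite(tmp.path, bytes)
    %w[rb rt r].to_h do |mode|
      File.open(tmp.path, mode) { |f| [mode, [f.external_encoding, f.read]] }
    end
  end
end

read_with_modes("a\r\nb\r\n").each do |mode, (encoding, data)|
  puts format("%-2s  %-12s %p", mode, encoding, data)
end
```

With `"rb"` the bytes come back untouched and tagged as ASCII-8BIT, while `"rt"` applies universal newline conversion on read.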

[ruby-core:111254] [Ruby master Feature#12813] Calling chunk_while, slice_after, slice_before, slice_when with no block
by sawa (Tsuyoshi Sawada) 11 Dec '22

Issue #12813 has been updated by sawa (Tsuyoshi Sawada).

I think this has already been implemented by now. It should be closed.

----------------------------------------
Feature #12813: Calling chunk_while, slice_after, slice_before, slice_when with no block
https://bugs.ruby-lang.org/issues/12813#change-100548

* Author: marcandre (Marc-Andre Lafortune)
* Status: Assigned
* Priority: Normal
* Assignee: matz (Yukihiro Matsumoto)
----------------------------------------
Currently, `chunk_while`, `slice_after`, `slice_before`, `slice_when` all require a block.

If one needs the index within the block, there is no good way to do this; `enum.each_with_index.chunk_while` would have indices in the results, so `enum.enum_for(:chunk_while).with_index` is the best solution.

I feel that we should return `enum_for(:chunk_while)`. This is strictly more useful than raising as we currently do.

--
https://bugs.ruby-lang.org/
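The remark that `each_with_index.chunk_while` leaves indices in the results can be seen in a small example. The sketch below uses made-up sample data and groups runs of consecutive integers by comparing `value - index`, then strips the indices in a second pass.

```ruby
values = [1, 2, 4, 9, 10, 11]

# The block receives [element, index] pairs, and the chunks contain them too.
# Within a run of consecutive integers, value - index stays constant.
with_indices = values.each_with_index
                     .chunk_while { |(a, i), (b, j)| a - i == b - j }
                     .to_a

# A second pass is needed to get rid of the indices again.
plain = with_indices.map { |chunk| chunk.map(&:first) }

p with_indices
p plain
```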

[ruby-core:111253] [Ruby master Feature#19104] Introduce the cache-based optimization for Regexp matching
by mame (Yusuke Endoh) 10 Dec '22

Issue #19104 has been updated by mame (Yusuke Endoh).

mmizutani (Minoru Mizutani) wrote in #note-7:
> Regex fuzzing encountered an edge-case regression:

Thanks. This has nothing to do with this ticket. It's a different issue caused by my change commit:1d2d25dcadda0764f303183ac091d0c87b432566 .

----------------------------------------
Feature #19104: Introduce the cache-based optimization for Regexp matching
https://bugs.ruby-lang.org/issues/19104#change-100546

* Author: make_now_just (Hiroya Fujinami)
* Status: Open
* Priority: Normal
----------------------------------------
Regexp matching can cause a time-complexity explosion problem known as ReDoS ([https://en.wikipedia.org/wiki/ReDoS](https://en.wikipedia.org/wiki/ReDoS)). ReDoS has become a serious vulnerability in many places in recent years, and Ruby is no exception. The following is an incomplete list of such vulnerability reports:

- [https://github.com/sinatra/sinatra/pull/1823](https://github.com/sinatra/si…
- [https://github.com/lostisland/faraday-net_http/pull/27](https://github.com/…

These reports have been addressed by fixing the library/software implementation. But if the language's Regexp implementation becomes safe, the vulnerability is fundamentally resolved.

A few months ago, Ruby implemented a Regexp matching timeout ([https://bugs.ruby-lang.org/issues/17837](https://bugs.ruby-lang.org/issues/…). It is one of the useful methods for preventing ReDoS vulnerabilities, but setting a correct timeout value is hard. The value depends on input length, environment, network status, system load, etc. When the value is too small, a system may break, but when the value is too large, it is not useful for preventing ReDoS.

Therefore, as a new way to prevent ReDoS, we propose to introduce a cache-based optimization for Regexp matching. As fundamental CS knowledge, the result of automaton matching depends on the input position and the state. In addition, the matching-time explosion is caused by arriving at the same position and state many times. ReDoS can therefore be prevented by recording (caching) each pair of position and state once it has been reached. In fact, under such an optimization, the matching time complexity is known to be linear in the input size [1].

[1]: Davis, James C., Francisco Servant, and Dongyoon Lee. "Using selective memoization to defeat regular expression denial of service (ReDoS)." *2021 IEEE symposium on security and privacy (SP)*. IEEE, 2021. [https://www.computer.org/csdl/proceedings-article/sp/2021/893400a543/1oak98…

See the following example.

```bash
$ ruby --version
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21]

$ time ruby -e '/^(a|a)*$/ =~ "a" * 28 + "b"'
ruby -e '/^(a|a)*$/ =~ "a" * 28 + "b"'  8.49s user 0.04s system 98% cpu 8.663 total

$ ./miniruby --version
ruby 3.2.0dev (2022-10-27T13:39:56Z recache bc59b7cc1e) [arm64-darwin21]

$ time ./miniruby -e '/^(a|a)*$/ =~ "a" * 28 + "b"'
./miniruby -e '/^(a|a)*$/ =~ "a" * 28 + "b"'  0.01s user 0.01s system 8% cpu 0.310 total
```

In this example, using ruby v3.2.0-preview2, matching `/^(a|a)*$/` against `"a" * 28 + "b"` takes 8.6 seconds because the matching time of this Regexp grows exponentially for strings of the form `"a" * N + "b"`. But, using the patched version of ruby, it takes 0.01 seconds. Incredibly, it is 860 times faster because matching is done in linear time.

With this optimization, the matching time is linear in the input size. It sounds secure and good. Unfortunately, when a Regexp uses some extensions (e.g. look-around, back-reference, subexpression call), the optimization is not applied. Also, the optimization needs a bit more memory for caching. However, our investigation suggests that these are not major problems (see the "Limitation" section).

## Implementation

The basic cache implementation is complete at this time and can be found in the following Pull Request.

[https://github.com/ruby/ruby/pull/6486](https://github.com/ruby/ruby/pull/6…

Some tests fail, but this is not a problem: the failing tests are for the Regexp timeout, and they fail as expected because the optimization works correctly. Of course, we need to fix these tests before merging.

Implementation notes:

- A null-check on a loop causes non-linear behavior, so a field that indexes the latest null-check item on the stack is added to OnigStackType. ([https://github.com/ruby/ruby/pull/6486/files#diff-4347460e379cd970ba0b88b4a…)
- When the loop is null and this loop has a capture, matching behaves as a monomaniac. To reproduce this behavior, caches in the loop are cleared as necessary. ([https://github.com/ruby/ruby/pull/6486/files#diff-c3cfe19efff0cc51813847413…) As with the flip-flop operator, we hope to drop this if possible, but it remains for backward compatibility.

### Limitation

Cache-based optimization is not applied in the following cases:

1. Regexps using some extensions (back-reference and subexpression call, look-around, atomic, absent operators) are not optimized because it is impossible or hard. However, it may be possible for look-around operators.
2. A bounded or fixed-count repetition nested in another repetition (e.g. `/(a{2,3})*/`). It is entirely an implementation issue, but we believe it is hard to support this case correctly.
3. A bounded or fixed-count repetition that is too large (e.g. `/(a|b){100000,200000}/`). The cache table size is proportional to the product of the number of cache points of the regex and the input size. In this case, since the number of cache points becomes too large, the optimization cannot be applied.

Experiments were conducted to investigate how problematic these limitations are in practice. We used ObjectSpace to collect Regexps and investigated whether they could be optimized and how many cache points they have. Regexps were collected from the standard library, Webrick, and Rails. See the following gist for the details ([https://gist.github.com/makenowjust/83e1e75a2d7de8b956e93bdac004a06b](https…).

The results of the experiments are shown in the following table.

| Collected from | # Regexp | # non-optimizable | Maximum number of cache points |
| --- | --- | --- | --- |
| stdlib | 1009 | 86 (8.52%) | 81 |
| Webrick | 356 | 44 (12.36%) | 20 |
| Rails | 759 | 74 (7.75%) | 27 |
| Total <br> (Duplications are reduced) | 1506 | 118 (7.84%) | 81 |

This result shows that the overall percentage of non-optimizable Regexps is less than 10%, and the amount of memory used for optimization is about 10 times the length of the string (81/8, for a bit array) at worst in this case. We consider that a sufficient number of Regexps can be optimized in practice.

## Specification

The detailed specification has not been fixed yet. We have some ideas and we would like to discuss them.

- When is the optimization enabled? Currently, it turns on when the number of backtracks reaches the input length.
- How many cache points are allowed, and how much memory can be allocated? It is not determined for now.
- If the above parameters can be specified by users, how are they specified? (via command-line flags, or static / instance parameters like `Regexp.timeout=` and `Regexp#timeout=`)
- Unless the input size is too large, the availability of the optimization can be determined at compile time. So, we would like to add a new flag to Regexp to determine whether a cache is available. It becomes one of the criteria for whether a Regexp is efficiently executable or not. We believe it helps users. Thus, which letter is preferred for this purpose? `l` (linear) or `r` (regular) sounds good, but I am not sure which is the best.

Thank you.

--
https://bugs.ruby-lang.org/
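The caching idea described in the proposal (record each pair of input position and state once it has been reached) can be illustrated with a toy backtracking matcher. The sketch below is not Onigmo's implementation: the tiny instruction set, the hand-compiled program for `/^(a|a)*$/`, and the names `PROGRAM`, `vm_run` and `toy_match` are all assumptions made for the illustration.

```ruby
require "set"

# Hand-compiled toy program for /^(a|a)*$/ (hypothetical instruction set).
PROGRAM = [
  [:split, 1, 6],   # 0: enter the loop body, or leave the loop
  [:split, 2, 4],   # 1: first or second alternative
  [:char, "a"],     # 2
  [:jmp, 5],        # 3
  [:char, "a"],     # 4
  [:jmp, 0],        # 5: back to the loop head
  [:match_end]      # 6: succeed only at the end of the input
].freeze

def vm_run(input, pc, pos, seen, stats)
  # Memoization: a (pc, pos) pair that was explored before cannot lead to a
  # match, otherwise the search would already have stopped with success.
  return false if seen && !seen.add?([pc, pos])

  stats[:steps] += 1
  op, x, y = PROGRAM[pc]
  case op
  when :char      then input[pos] == x && vm_run(input, pc + 1, pos + 1, seen, stats)
  when :jmp       then vm_run(input, x, pos, seen, stats)
  when :split     then vm_run(input, x, pos, seen, stats) || vm_run(input, y, pos, seen, stats)
  when :match_end then pos == input.length
  end
end

# Returns [matched, number_of_executed_steps].
def toy_match(input, memoize:)
  stats = { steps: 0 }
  matched = vm_run(input, 0, 0, memoize ? Set.new : nil, stats)
  [matched, stats[:steps]]
end

input = "a" * 12 + "b"
p toy_match(input, memoize: false)
p toy_match(input, memoize: true)
```

With memoization, each (instruction, position) pair is executed at most once, so the step count is bounded by the program size times the input length plus one. Without it, the step count roughly doubles with every additional `a`.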

[ruby-core:111252] [Ruby master Feature#18951] Object#with to set and restore attributes around a block
by retro 10 Dec '22

Issue #18951 has been updated by retro (Josef Šimánek).

> The use case isn't seen as common enough, so I added 3 real world example in the description, If that's not enough I can add as much as you want, this pattern is extremely common.

@matz I have seen this pattern repeated in the test suites of almost every mid-sized project. It is also present in various popular gems like I18n and Globalize. I can confirm that, in my eyes, it is an extremely common one as well.

https://github.com/globalize/globalize/blob/3c146abbba4200aed1bbdbf3e63d572…
https://github.com/ruby-i18n/i18n/blob/75fc49b08d254ad657ebd589ad37cda3c6fe…

It is also common on `ENV` (to temporarily change `ENV`).

----------------------------------------
Feature #18951: Object#with to set and restore attributes around a block
https://bugs.ruby-lang.org/issues/18951#change-100545

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
### Use case

A very common pattern in Ruby, especially in testing, is to save the value of an attribute, set a new value, and then restore the old value in an `ensure` clause.

e.g. in unit tests

```ruby
def test_something_when_enabled
  enabled_was, SomeLibrary.enabled = SomeLibrary.enabled, true
  # test things
ensure
  SomeLibrary.enabled = enabled_was
end
```

Or sometimes in actual APIs:

```ruby
def with_something_enabled
  enabled_was = @enabled
  @enabled = true
  yield
ensure
  @enabled = enabled_was
end
```

There is no inherent problem with this pattern, but it can be easy to make a mistake, for instance in the unit test example:

```ruby
def test_something_when_enabled
  some_call_that_may_raise
  enabled_was, SomeLibrary.enabled = SomeLibrary.enabled, true
  # test things
ensure
  SomeLibrary.enabled = enabled_was
end
```

In the above, if `some_call_that_may_raise` actually raises, `SomeLibrary.enabled` is set back to `nil` rather than its original value. I've seen this mistake quite frequently.

### Proposal

I think it would be very useful to have a method on Object to implement this pattern in a correct and easy-to-use way. The naive Ruby implementation would be:

```ruby
class Object
  def with(**attributes)
    old_values = {}
    attributes.each_key do |key|
      old_values[key] = public_send(key)
    end
    begin
      attributes.each do |key, value|
        public_send("#{key}=", value)
      end
      yield
    ensure
      old_values.each do |key, old_value|
        public_send("#{key}=", old_value)
      end
    end
  end
end
```

NB: `public_send` is used because I don't think such a method should be usable if the accessors are private.

With usage:

```ruby
def test_something_when_enabled
  SomeLibrary.with(enabled: true) do
    # test things
  end
end
```

```ruby
GC.with(measure_total_time: true, auto_compact: false) do
  # do something
end
```

### Alternate names and signatures

If `#with` isn't good, I can also think of:

- `Object#set`
- `Object#apply`

But the `with_` prefix is by far the most used one when implementing methods that follow this pattern.

Also, if accepting a Hash is deemed too much, alternative signatures could be:

- `Object#set(attr_name, value)`
- `Object#set(attr1, value1, [attr2, value2], ...)`

# Some real world code examples that could be simplified with this method

- `redis-client` `with_timeout` https://github.com/redis-rb/redis-client/blob/23a5c1e2ff688518904f206df8d4a…
- Lots of tests in Rails's codebase:
  - Changing `Thread.report_on_exception`: https://github.com/rails/rails/blob/2d2fdc941e7497ca77f99ce5ad404b6e58f043e…
  - Changing a class attribute: https://github.com/rails/rails/blob/2d2fdc941e7497ca77f99ce5ad404b6e58f043e…

--
https://bugs.ruby-lang.org/
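To see the save, set and restore behavior in action without patching `Object`, the same logic can be placed in a module and mixed into a sample class. This is a sketch only: the module name `WithAttributes` and the `Settings` class are hypothetical, and the method body follows the naive implementation from the proposal.

```ruby
# Hypothetical mixin carrying the naive implementation from the proposal.
module WithAttributes
  def with(**attributes)
    # Remember the current values through the public readers.
    old_values = attributes.each_key.to_h { |key| [key, public_send(key)] }
    begin
      attributes.each { |key, value| public_send("#{key}=", value) }
      yield
    ensure
      # Runs on normal exit and when the block raises.
      old_values.each { |key, old_value| public_send("#{key}=", old_value) }
    end
  end
end

# Hypothetical sample class with two public accessors.
class Settings
  include WithAttributes
  attr_accessor :enabled, :level

  def initialize
    @enabled = false
    @level = 1
  end
end

settings = Settings.new
inside = settings.with(enabled: true, level: 5) { [settings.enabled, settings.level] }

p inside
p [settings.enabled, settings.level]
```

Because the old values are captured before any setter runs, an exception raised inside the block still restores the attributes to what they were on entry.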

[ruby-core:111251] [Ruby master Feature#18285] NoMethodError#message uses a lot of CPU/is really expensive to call
by ivoanjo (Ivo Anjo) 10 Dec '22

Issue #18285 has been updated by ivoanjo (Ivo Anjo).

Hey everyone!

I actually had someone reach out to me (because I had blogged about this) the other day, since they had been bitten by this issue.

Interestingly, it appears that this keeps getting "rediscovered" in the Ruby ecosystem. Here's a Rails ticket from 2011 -- https://github.com/rails/rails/issues/1525 -- and looking at the "mentioned this issue" backlinks at the bottom, some gems and apps are even getting workarounds on their inspect to avoid causing issues.

Here's one example: https://github.com/alphagov/whitehall/blob/main/config/initializers/small_i… .

----------------------------------------
Feature #18285: NoMethodError#message uses a lot of CPU/is really expensive to call
https://bugs.ruby-lang.org/issues/18285#change-100544

* Author: ivoanjo (Ivo Anjo)
* Status: Open
* Priority: Normal
----------------------------------------
Hello there! I'm working at Datadog on the ddtrace gem -- https://github.com/DataDog/dd-trace-rb and we ran into this issue on one of our internal testing applications. I also blogged about this issue in <https://ivoanjo.me/blog/2021/11/01/nomethoderror-ruby-cost/>.

### Background

While testing an application that threw a lot of `NoMethodError`s in a Rails controller (this was used for validation), we discovered that service performance was very much impacted when we were logging these exceptions. While investigating with a profiler, we found that the performance impact was caused by calls to `NoMethodError#message`, because this Rails controller had a quite complex `#inspect` method that was getting called every time we tried to get the `#message` from the exception.

### How to reproduce

```ruby
require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'

  gem 'benchmark-ips'
end

puts RUBY_DESCRIPTION

class GemInformation
  # ...

  def get_no_method_error
    method_does_not_exist
  rescue => e
    e
  end

  def get_runtime_error
    raise 'Another Error'
  rescue => e
    e
  end

  def inspect # <-- expensive method gets called when calling NoMethodError#message
    Gem::Specification._all.inspect
  end
end

NO_METHOD_ERROR_INSTANCE = GemInformation.new.get_no_method_error
RUNTIME_ERROR_INSTANCE = GemInformation.new.get_runtime_error

Benchmark.ips do |x|
  x.config(:time => 5, :warmup => 2)

  x.report("no method error message cost") { NO_METHOD_ERROR_INSTANCE.message }
  x.report("runtime error message cost") { RUNTIME_ERROR_INSTANCE.message }

  x.compare!
end
```

### Expectation and result

Getting the `#message` from a `NoMethodError` should be no more costly than getting it from any other exception.

In reality:

```
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]

no method error message cost
                        115.390  (± 1.7%) i/s -    580.000  in   5.027822s
runtime error message cost
                          6.938M (± 0.5%) i/s -     35.334M in   5.092617s

Comparison:
runtime error message cost:  6938381.6 i/s
no method error message cost:      115.4 i/s - 60130.02x  (± 0.00) slower
```

### Suggested solutions

1. Do not call `#inspect` on the object on which the method was not found (see <https://github.com/ruby/ruby/blob/e0915ba67964d843832148aeca29a1f8244ca7b1/…>)
2. Cache result of calling `#message` after the first call. Ideally this should be done together with suggestion 1.

--
https://bugs.ruby-lang.org/
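The application-side workaround mentioned in the update (overriding `inspect` so that it stays cheap) can be sketched as follows. This is an illustration under assumptions, not the code from the linked projects: the short `inspect` format is made up, and whether `#message` consults `#inspect` at all depends on the Ruby version.

```ruby
class GemInformation
  def get_no_method_error
    # An explicit receiver makes this a NoMethodError rather than a NameError.
    self.method_does_not_exist
  rescue NoMethodError => e
    e
  end

  # Cheap, constant-size description instead of dumping internal state.
  def inspect
    "#<#{self.class.name}>"
  end
end

error = GemInformation.new.get_no_method_error
puts error.class
puts error.message
```

With a constant-size `inspect`, building the message no longer scales with the size of the object's state on Ruby versions that include the receiver's `inspect` output in the message.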
