April 2024 - ruby-core - ml.ruby-lang.org

[ruby-core:117510] [Ruby master Bug#20427] Heap buffer overflow in `Array#sort!` when block modifies target array
by zack.ref＠gmail.com (Zack Deveau) 29 May '24

Issue #20427 has been reported by zack.ref(a)gmail.com (Zack Deveau).

----------------------------------------
Bug #20427: Heap buffer overflow in `Array#sort!` when block modifies target array
https://bugs.ruby-lang.org/issues/20427

* Author: zack.ref(a)gmail.com (Zack Deveau)
* Status: Open
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
**(note: It was decided we should handle this in the public issue tracker in security ticket #2327648)**

The attached patch [has been applied to `master`](https://github.com/ruby/ruby/pull/10522) and should apply to latest `3.3.0` for backport.

Could not reproduce on the following builds:
- ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
- ruby 3.1.4p249 (2024-01-11 revision 2b608349bb) [x86_64-linux]

---

In cases where `rb_ary_sort_bang` is called with a block and `tmp` is an embedded array, we need to account for the block potentially impacting the capacity of `ary`.

Reproduction script for x86 targets:

```ruby
var_0 = (1..70).to_a
var_0.sort! do |var_0_block_129, var_1_block_129|
  var_0.pop
  var_1_block_129 <=> var_0_block_129
end.shift(3)
```

Reproduction script for ARM targets:

```ruby
10.times do
  var_0 = (1..70).to_a
  var_0.sort! do |var_0_block_129, var_1_block_129|
    var_0.pop
    var_1_block_129 <=> var_0_block_129
  end.shift(3)
end
```

The above example can put the array into a corrupted state (`ary` after block has `len=0` and `capa=14`):

```
================== ary ===================
ary: BD99908
is_embedded?: 0
is_shared?: 0
heap.len: 0
heap.capa: 14
heap.shared_root: 14
================== tmp ===================
ary: BD1EB18
is_embedded?: 1
is_shared?: 0
embed_len: 70
embed_capa: 78
heap.len: 141
heap.capa: 139
heap.shared_root: 139
```

This results in a heap buffer overflow and possible segfault:

```
==19964==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60b0000034f0 at pc 0x00010c35ee6c bp 0x0003070fb290 sp 0x0003070faa50
WRITE of size 560 at 0x60b0000034f0 thread T0
    #0 0x10c35ee6b in wrap_memcpy+0x2ab (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x18e6b)
    #1 0x100e0b085 in ruby_nonempty_memcpy memory.h:671
    #2 0x100e0e43e in ary_memcpy0 array.c:335
    #3 0x100e0cb00 in ary_memcpy array.c:352
    #4 0x100e1426c in rb_ary_sort_bang array.c:3519
    [ ... ]
```

Was able to reproduce on the following builds:
- ruby 3.4.0dev (2024-01-17T14:48:46Z ef4a08eb65) [x86_64-linux]
- ruby 3.3.0 (2024-01-05 revision 634d4e29ef) [x86_64-darwin23]
- ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]

This patch adds a conditional to determine when the capacity of `ary` has been modified by the provided block. If this is the case, ensure that the capacity of `ary` is adjusted to handle at minimum the len of `tmp`.

`test-all` passes locally:

```
Finished tests in 70.194526s, 369.6727 tests/s, 89373.2939 assertions/s.
25949 tests, 6273516 assertions, 0 failures, 0 errors, 292 skips
```

---Files--------------------------------
rb_ary_sort_bang_heap_overflow.patch (2.06 KB)

-- https://bugs.ruby-lang.org/
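The trigger in both reproduction scripts is that the comparison block mutates the array being sorted. As a rough illustration only (this is not the patch, and the variable names are made up), code that does not actually need to mutate the receiver from inside the block can sort a private copy, so the block never touches the array that `sort!` is working on:

```ruby
# Hypothetical caller-side pattern: sort a snapshot, not the shared array.
values = (1..70).to_a

# The block only compares; it never pops from or pushes to the sorted array.
sorted = values.dup.sort! { |a, b| b <=> a }

# Mutations happen after the sort has finished.
dropped = sorted.shift(3)
```

This sidesteps the pattern rather than fixing it; the capacity check described in the report is still what protects code that does mutate inside the block.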

[ruby-core:116901] [Ruby master Bug#20289] Bug in Zlib::GzipReader#eof? breaks reading certain sizes of gzipped files.
by martinemde (Martin Emde) 29 May '24

Issue #20289 has been reported by martinemde (Martin Emde).

----------------------------------------
Bug #20289: Bug in Zlib::GzipReader#eof? breaks reading certain sizes of gzipped files.
https://bugs.ruby-lang.org/issues/20289

* Author: martinemde (Martin Emde)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Hello,

A bug in the implementation of Zlib::GzipReader#eof? makes it very difficult to read certain rubygems using readpartial without receiving an EOFError. The bug occurs when the chunk size read from the gzip leaves only empty gzip "overhead" bytes at the end of the unread portion. The result is that a simple `readpartial until eof?` fails on some real-world gems.

### Bug Explanation

Imagine a gzip file: it has compressed data and some bytes that indicate the start and end of a chunk and then the end of the file.

```
gzip file = "[... 2048 chunk of gzip data ...][ \0 bytes, checksum info, with no new readable bytes ]"
```

When reading this file, zlib.c will pull a chunk of data according to READ_SIZE and return that data uncompressed. With exactly the right size of data (within exactly the range of READ_SIZE*n - 24 to READ_SIZE*n - 15 for any multiple of READ_SIZE), all of the file data is read and returned on the first read.

The bug happens when there are bytes remaining in the gzip file that haven't been read but contain no new data that can be returned to the buffer. Before reading the final empty chunk, #eof? returns false. After reading the final chunk, EOFError is raised because no bytes were returned into the buffer.

What should happen is similar to how a Socket eof? is checked. On a socket, #eof? is not "passive" but actively reads ahead, filling the buffer. If that read-ahead fails to find new data, then the EOF is reached and eof? returns true. In zlib, #eof? does not read ahead to check if the end has been reached, causing this bug.

### Solution

Samuel Giddins and I have both submitted alternate, but very similar, solutions for this bug: [Samuel's PR](https://github.com/ruby/zlib/pull/73) and [My PR](https://github.com/ruby/zlib/pull/72). We'd leave it to ruby core to decide which is the best solution.

Unfortunately, these bug fix PRs have been waiting for more than 2 months now with no movement on fixing this bug. The [original issue](https://github.com/ruby/zlib/issues/56) that I opened to point out this bug was submitted Aug 16, 2023. We've been working to find a solution for this bug for more than half a year.

This bug was discovered because rubygems wanted to improve the efficiency of reading a gem by using readpartial to conserve memory. Sam submitted an additional PR that vastly improves the performance of the zlib gem by managing buffers better. We've already proven that this dramatically decreases memory usage when parsing gems. The [performance PR by Samuel](https://github.com/ruby/zlib/pull/61) has been open since September 13th, 2023.

### Plea

We've been trying for months to get this solved. Opening a ruby-lang bug is my next attempt after communicating through our contacts at Ruby core has gone nowhere.

Please merge and release these fixes, or tell us what is wrong with them so we can fix them so they can be merged. It will improve rubygems for everyone.

-- https://bugs.ruby-lang.org/
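The failure mode described above can be seen from the caller's side. A minimal sketch, assuming the caller wants chunked reads: the helper name `gunzip_in_chunks` and the chunk size are illustrative and not part of zlib or rubygems. Instead of polling `eof?`, it treats the `EOFError` raised by `readpartial` as the end-of-stream signal, so the loop does not depend on whether `eof?` reads ahead.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: read a gzip stream in fixed-size chunks.
# EOFError from readpartial marks the end of the stream, so eof? is never consulted.
def gunzip_in_chunks(io, chunk_size = 16 * 1024)
  gz = Zlib::GzipReader.new(io)
  out = String.new # binary buffer
  begin
    loop { out << gz.readpartial(chunk_size) }
  rescue EOFError
    # No more decompressed bytes are available.
  ensure
    gz.close
  end
  out
end

data = "hello gzip " * 1_000
compressed = Zlib.gzip(data)
restored = gunzip_in_chunks(StringIO.new(compressed), 2048)
```

This is only a workaround shape for callers; it does not change the behaviour of `eof?` that the linked PRs address.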

[ruby-core:117047] [Ruby master Bug#20324] `(1..).overlap?('foo'..)` returns true
by kyanagi (Kouhei Yanagita) 29 May '24

Issue #20324 has been reported by kyanagi (Kouhei Yanagita).

----------------------------------------
Bug #20324: `(1..).overlap?('foo'..)` returns true
https://bugs.ruby-lang.org/issues/20324

* Author: kyanagi (Kouhei Yanagita)
* Status: Open
* ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
While thinking about finding the intersection of two ranges, I found that `(1..).overlap?('foo'..)` returns true.

In the current implementation, it seems that `(a..).overlap?(b..)` or `(..a).overlap?(..b)` returns true regardless of what `a` or `b` are.

However, I think it should return true if and only if `a` and `b` are comparable.
(What is the intersection of `1..` and `'foo'..`?)

-- https://bugs.ruby-lang.org/
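The proposed rule for two endless ranges can be sketched in plain Ruby. This is an illustration of the "comparable begins" condition from the report, not the implementation of `Range#overlap?`; the helper name `endless_ranges_overlap?` is made up. It relies on `<=>` returning `nil` for values that cannot be compared.

```ruby
# Hypothetical helper: two endless ranges overlap only if their begins are comparable.
def endless_ranges_overlap?(r1, r2)
  raise ArgumentError, "expected endless ranges" unless r1.end.nil? && r2.end.nil?

  # <=> yields nil when the two begins have no defined ordering.
  !(r1.begin <=> r2.begin).nil?
end

ints = (1..)
more_ints = (5..)
strs = ("foo"..)

ints_overlap = endless_ranges_overlap?(ints, more_ints)
mixed_overlap = endless_ranges_overlap?(ints, strs)
```

Under this rule the mixed Integer/String case is reported as non-overlapping, which matches the behaviour the report argues for.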

[ruby-core:116888] [Ruby master Bug#20285] Stale inline method caches when refinement modules are reloaded
by jhawthorn (John Hawthorn) 29 May '24

Issue #20285 has been reported by jhawthorn (John Hawthorn).

----------------------------------------
Bug #20285: Stale inline method caches when refinement modules are reloaded
https://bugs.ruby-lang.org/issues/20285

* Author: jhawthorn (John Hawthorn)
* Status: Assigned
* Priority: Normal
* Assignee: jhawthorn (John Hawthorn)
* Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: UNKNOWN
----------------------------------------
This is essentially the same issue as #11672, but for inline method caches rather than class caches.

In Ruby 3.3 we started using inline caches for refinements. However, we weren't clearing inline caches when defined on a reopened refinement module.

``` ruby
class C
end

module R
  refine C do
    def m
      :foo
    end
  end
end

using R

def m
  C.new.m
end

raise unless :foo == m()

module R
  refine C do
    alias m m
    def m
      :bar
    end
  end
end

v = m()
raise "expected :bar, got #{v.inspect}" unless :bar == v
```

This will raise in Ruby 3.3 as the inline cache finds a stale refinement, but passes in previous versions.

-- https://bugs.ruby-lang.org/

[ruby-core:116344] [Ruby master Bug#20195] 3.3.0 YJIT mishandles splat into methods taking a rest parameter
by alanwu (Alan Wu) 29 May '24

Issue #20195 has been reported by alanwu (Alan Wu).

----------------------------------------
Bug #20195: 3.3.0 YJIT mishandles splat into methods taking a rest parameter
https://bugs.ruby-lang.org/issues/20195

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------
Check with:

```ruby
ruby2_keywords def foo(*args) = args

def bar(*args, **kw) = [args, kw]

def pass_bar(*args) = bar(*args)

def body
  args = foo(a: 1)
  pass_bar(*args)
end

p body
```

```shell
$ ruby ../test.rb
[[{:a=>1}], {}]
$ ruby --yjit-call-threshold=1 ../test.rb
[[], {:a=>1}]
```

-- https://bugs.ruby-lang.org/

[ruby-core:116374] [Ruby master Bug#20204] 3.3.0 YJIT rises TypeError instead of ArgumentError with some incorrect calls
by alanwu (Alan Wu) 29 May '24

Issue #20204 has been reported by alanwu (Alan Wu).

----------------------------------------
Bug #20204: 3.3.0 YJIT rises TypeError instead of ArgumentError with some incorrect calls
https://bugs.ruby-lang.org/issues/20204

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------
Test with:

```ruby
def foo(a, *) = a

def call(args, &)
  foo(1)
  foo(*args, &)
end

call([1, 2])
call([])
```

```
$ ruby ../test.rb
../test.rb:1:in `foo': wrong number of arguments (given 0, expected 1+) (ArgumentError)
$ ruby --yjit-call-threshold=1 ../test.rb
../test.rb:5:in `call': wrong argument type Array (expected Proc) (TypeError)
```

-- https://bugs.ruby-lang.org/

[ruby-core:116311] [Ruby master Bug#20192] YJIT in 3.3.0 miscompiles `yield` with keyword splats
by alanwu (Alan Wu) 28 May '24

Issue #20192 has been reported by alanwu (Alan Wu).

----------------------------------------
Bug #20192: YJIT in 3.3.0 miscompiles `yield` with keyword splats
https://bugs.ruby-lang.org/issues/20192

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
* Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------
Test with:

```ruby
def splat_kw(kwargs) = yield(**kwargs)

p splat_kw({}) { _1 }
```

```shell
% ruby --yjit-call-threshold=1 test.rb
{}
% ruby test.rb
nil
```

-- https://bugs.ruby-lang.org/

[ruby-core:116911] [Ruby master Bug#20294] Parser no longer warns on some duplicated keys
by kddnewton (Kevin Newton) 28 May '24

Issue #20294 has been reported by kddnewton (Kevin Newton).

----------------------------------------
Bug #20294: Parser no longer warns on some duplicated keys
https://bugs.ruby-lang.org/issues/20294

* Author: kddnewton (Kevin Newton)
* Status: Open
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Previously, the parser would warn on all duplicated keys. Now some cases are not handled:

```ruby
{ 100.0 => 1, 1e2 => 1 }
{ 100.0 => 1, 1E2 => 1 }
{ 100.0 => 1, 100.00 => 1 }
{ 100.0r => 1, 100.00r => 1 }
{ 100.0i => 1, 100.00i => 1 }
```

-- https://bugs.ruby-lang.org/
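For context on why these literals count as duplicates: different spellings of the same Float are the same Hash key at runtime. A small sketch covering only the Float spellings from the list above (the variable names are illustrative); the keys are inserted at runtime rather than written as a hash literal, so the parser warning itself is not involved.

```ruby
# Different source spellings of the same Float value.
spellings = [100.0, 1e2, 1E2, 100.00]

# Hash keys are matched with #eql? and #hash, so these all collapse to one key.
all_same_key = spellings.all? { |k| k.eql?(100.0) && k.hash == 100.0.hash }

table = {}
spellings.each_with_index { |key, index| table[key] = index }
```

After the loop `table` holds a single entry whose value is the last index written, which is the silent overwrite that the warning is meant to flag.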

[ruby-core:117619] [Ruby master Bug#20438] String format "%\n" and "%\0" does not raise format error
by tompng (tomoya ishida) 28 May '24

Issue #20438 has been reported by tompng (tomoya ishida).

----------------------------------------
Bug #20438: String format "%\n" and "%\0" does not raise format error
https://bugs.ruby-lang.org/issues/20438

* Author: tompng (tomoya ishida)
* Status: Open
* ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT +MN [arm64-darwin22]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
`"%" % 1` raises `incomplete format specifier; use %% (double %) instead`.
`"%=" % 1` raises `malformed format string - %=`.
But `"%\n" % 1` and `"%\0" % 1` do not raise an error.

In `sprintf.c`, `\n` and `\0` are explicitly accepted. Is this expected?

Some other languages behave as follows:

- Perl: Warns `Invalid conversion in printf`. Just prints. `"%d% " → "1% "`
- Python: Error `ValueError: unsupported format character '?' (0xa)` with `print("%\n" % 123)`
- PHP: Error `Unknown format specifier` with `sprintf("%\n", 3)`
- C, C++: Warns `incomplete format specifier`. `"%\n" → "\n"`, `"%" → ""`, `"% " → ""`

`sprintf("%f%\n",x)` and `"%f%\n" % x` are used in some existing code:
https://github.com/search?q=language%3ARuby+%22f%25%5Cn%22&type=code

-- https://bugs.ruby-lang.org/
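The cases in the report can be probed with a small helper. This is a sketch for experimenting, not part of `sprintf.c`; the name `format_outcome` is hypothetical. It returns either the formatted string or the `ArgumentError` that `format` raised, so the different templates can be compared side by side.

```ruby
# Hypothetical probe: return the formatted string, or the ArgumentError raised.
def format_outcome(template, *args)
  format(template, *args)
rescue ArgumentError => e
  e
end

incomplete = format_outcome("%", 1)     # lone trailing percent
malformed  = format_outcome("%=", 1)    # unknown conversion character
escaped    = format_outcome("%d%%", 1)  # %% is the documented way to emit a percent sign
newline    = format_outcome("%\n", 1)   # the case under discussion
```

No expected value is given for `newline` on purpose: whether it is a string or an error is exactly what the report asks about, and it may differ between Ruby builds.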

[ruby-core:117469] [Ruby master Feature#20415] Precompute literal String hash code during compilation
by byroot (Jean Boussier) 28 May '24

Issue #20415 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #20415: Precompute literal String hash code during compilation
https://bugs.ruby-lang.org/issues/20415

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
I worked on a proof of concept with @etienne which I think has some potential, but I'm looking for feedback on what would be the best implementation.

The proof of concept is here: https://github.com/Shopify/ruby/pull/553

### Idea

Most string literals are relatively short, hence embedded, and have some wasted bytes at the end of their slot. We could use that wasted space to store the string hash.

The goal is to make **looking up a literal String key in a hash as fast as a Symbol key**. The goal isn't to memoize the hash code of all strings, but to **only selectively precompute the hash code of literal strings in the compiler**. The compiler could even selectively do this when a literal string is used to look up a hash (`opt_aref`).

Here's the benchmark we used:

```ruby
require "benchmark/ips" # from the benchmark-ips gem

hash = 10.times.to_h do |i|
  [i, i]
end

dyn_sym = "dynamic_symbol".to_sym
hash[:some_symbol] = 1
hash[dyn_sym] = 1
hash["small"] = 2
hash["frozen_string_literal"] = 2

Benchmark.ips do |x|
  x.report("symbol") { hash[:some_symbol] }
  x.report("dyn_symbol") { hash[:some_symbol] }
  x.report("small_lit") { hash["small"] }
  x.report("frozen_lit") { hash["frozen_string_literal"] }
  x.compare!(order: :baseline)
end
```

On Ruby 3.3.0, looking up a String key is a bit slower based on the key size:

```
Calculating -------------------------------------
              symbol     24.175M (± 1.7%) i/s -    122.002M in   5.048306s
          dyn_symbol     24.345M (± 1.6%) i/s -    122.019M in   5.013400s
           small_lit     21.252M (± 2.1%) i/s -    107.744M in   5.072042s
          frozen_lit     20.095M (± 1.3%) i/s -    100.489M in   5.001681s

Comparison:
              symbol: 24174848.1 i/s
          dyn_symbol: 24345476.9 i/s - same-ish: difference falls within error
           small_lit: 21252403.2 i/s - 1.14x  slower
          frozen_lit: 20094766.0 i/s - 1.20x  slower
```

With the proof of concept, performance is pretty much identical:

```
Calculating -------------------------------------
              symbol     23.528M (± 6.9%) i/s -    117.584M in   5.033231s
          dyn_symbol     23.777M (± 4.7%) i/s -    120.231M in   5.071734s
           small_lit     23.066M (± 2.9%) i/s -    115.376M in   5.006947s
          frozen_lit     22.729M (± 1.1%) i/s -    115.693M in   5.090700s

Comparison:
              symbol: 23527823.6 i/s
          dyn_symbol: 23776757.8 i/s - same-ish: difference falls within error
           small_lit: 23065535.3 i/s - same-ish: difference falls within error
          frozen_lit: 22729351.6 i/s - same-ish: difference falls within error
```

### Possible implementation

The reason I'm opening this issue early is to get feedback on which would be the best implementation.

#### Store hashcode after the string terminator

Right now the proof of concept simply stores the `st_index_t` after the string null terminator, and only when the string is embedded and has enough leftover space. Strings with a precomputed hash are marked with a user flag.

Pros:

- Very simple implementation, no need to change a lot of code, and very easy to strip out if we want to.
- Doesn't use any extra memory. If the string doesn't have enough leftover bytes, the optimization simply isn't applied.
- The worst case overhead is a single `FL_TEST_RAW` in `rb_str_hash`.

Cons:

- The optimization won't apply to certain string sizes, e.g. strings between `17` and `23` bytes won't have a precomputed hash code.
- Extracting the hash code requires some not so nice pointer arithmetic.

#### Create another RString union

Another possibility would be to add another entry in the `RString` struct union, such that we'd have:

```c
struct RString {
    struct RBasic basic;
    long len;
    union {
        // ... existing members
        struct {
            st_index_t hash;
            char ary[1];
        } embded_literal;
    } as;
};
```

Pros:

- The optimization can now be applied to all string sizes.
- The hashcode is always at the same offset and properly aligned.

Cons:

- Some strings would be bumped by one slot size, so would use marginally more memory.
- Makes the code base more complex: a lot more string-related code would need to be modified (e.g. `RSTRING_PTR` and many others).
- When compiling such a string, if an equal string already exists in the `fstring` table, we'd need to replace it; we can't just mutate it in place to add the hashcode.

### Prior art

[Feature #15331] is somewhat similar in its idea, but it does it lazily for all strings. Here it's much simpler because it is limited to string literals, which are the ones likely to be used as Hash keys, and the overhead is on compilation, not runtime (aside from a single flag check).

So I think most of the caveats of that original implementation don't apply here.

-- https://bugs.ruby-lang.org/
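The core idea, computing a key's hash once up front instead of on every lookup, can be modelled at the Ruby level. This is only an analogy under stated assumptions: the `PrecomputedKey` class is hypothetical, and the actual proposal stores the hash inside the `RString` slot at compile time in C rather than wrapping strings in an object.

```ruby
# Hypothetical Ruby-level model of a key that carries a precomputed hash code.
class PrecomputedKey
  attr_reader :string, :hash

  def initialize(string)
    @string = string.dup.freeze
    @hash = @string.hash # computed once, reused by every Hash lookup
  end

  # Hash lookups call #hash first, then #eql? to confirm the match.
  def eql?(other)
    other.is_a?(PrecomputedKey) && other.hash == hash && other.string == string
  end
  alias == eql?
end

table = { PrecomputedKey.new("frozen_string_literal") => 2 }
hit = table[PrecomputedKey.new("frozen_string_literal")]
miss = table[PrecomputedKey.new("small")]
```

In this model the cost moves to key construction, much as the proposal moves it to compilation; the wrapper itself adds method-call overhead, so it says nothing about the speedups measured above.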
