March 2024 - ruby-core - ml.ruby-lang.org

[ruby-core:117115] [Ruby master Feature#20331] Should parser warn hash duplication and when clause?

by yui-knk (Kaneko Yuichiro)

Issue #20331 has been reported by yui-knk (Kaneko Yuichiro).

----------------------------------------
Feature #20331: Should parser warn hash duplication and when clause?
https://bugs.ruby-lang.org/issues/20331

* Author: yui-knk (Kaneko Yuichiro)
* Status: Open
----------------------------------------

# Background

Right now, the parser warns about duplicated hash keys (#1) and duplicated `when` clauses (#2). For example,

```ruby
{1 => :a, 1 => :b} # => warning: key 1 is duplicated and overwritten on line 1
```

```ruby
case 2
when 1, 1
else
end
# => test.rb:2: warning: duplicated `when' clause with line 2 is ignored
```

The parser also compares numbers written in different radixes.

```ruby
{
  1 => :a,
  0x1 => :b,
  0b1 => :b,
  0d1 => :b,
  0o1 => :b,
}
# => test.rb:2: warning: key 1 is duplicated and overwritten on line 3
# => test.rb:3: warning: key 1 is duplicated and overwritten on line 4
# => test.rb:4: warning: key 1 is duplicated and overwritten on line 5
# => test.rb:5: warning: key 1 is duplicated and overwritten on line 6
```

# Problem

Currently this is implemented by converting a string like `"123"` to a Ruby Object and comparing the resulting objects. Ruby Objects need to be removed from parse.y for the Universal Parser. I created PR https://github.com/ruby/ruby/pull/10079, which implements bignum for parse.y without a dependency on Ruby Object. However, nobu and mame expressed concern about the cost and benefit of implementing bignum for the parser. I want to discuss which approach is best for this problem.

By the way, if we keep the current warning messages, the parser also needs to reduce Rational keys to an irreducible fraction.

```ruby
$ ruby -wc -e '{10.2r => :a, 10.2r => :b}'
-e:1: warning: key (51/5) is duplicated and overwritten on line 1
-e:1: warning: unused literal ignored
Syntax OK
```

# Options

## 1. Warnings on parser

Pros:

* Users of the Universal Parser don't need to implement the warnings themselves. I guess developers of other Ruby implementations would benefit from the reduced effort.
* Warnings are shown by `ruby -wc`.

Cons:

* We need to maintain a bignum implementation for the parser.

There are two approaches for this option.

### 1-1. Implement bignum for parser

The PR takes this approach, implementing a subset of Ruby bignum for the parser.

### 1-2. Extract existing bignum implementation then use it

Make the existing bignum implementation independent of Ruby Object and use it from both bignum.c and parse.y.

## 2. Moving warnings logic into compile phase

We can use Ruby Object in compile.c, so moving the logic into compile.c solves this problem.

Pros:

* No need to implement bignum for the parser.

Cons:

* Users of the Universal Parser need to implement the warnings themselves.
* Warnings are not shown by `ruby -wc`.

--
https://bugs.ruby-lang.org/
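To illustrate why literals have to be normalized before they can be compared, the following sketch checks at runtime that the literals from the examples above denote the same keys. It only shows the equality the warning relies on; it is not how parse.y or the PR implements the check.

```ruby
# Integer literals written in different radixes denote the same value,
# so they collide when used as hash keys.
keys = [1, 0x1, 0b1, 0d1, 0o1]
puts keys.uniq.length

# A later duplicate key overwrites the earlier value.
# (This literal may itself trigger the duplicated-key warning.)
h = { 1 => :a, 0x1 => :b }
puts h.size
puts h[1].inspect

# Rational literals are reduced to lowest terms, which is why the
# warning message shows (51/5) for the literal 10.2r.
r = 10.2r
puts r.inspect
puts r == Rational(51, 5)
```

A parser that cannot use Ruby Objects would need its own way to reach the same conclusions, which is the cost being weighed in the options above.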


[ruby-core:116941] [Ruby master Bug#20301] `Set#add?` does two hash look-ups

by AMomchilov (Alexander Momchilov)

Issue #20301 has been reported by AMomchilov (Alexander Momchilov).

----------------------------------------
Bug #20301: `Set#add?` does two hash look-ups
https://bugs.ruby-lang.org/issues/20301

* Author: AMomchilov (Alexander Momchilov)
* Status: Open
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

A common usage of `Set`s is to keep track of seen objects, and do something different whenever an object is seen for the first time, e.g.:

```ruby
require "set" # needed before Ruby 3.2

SEEN_VALUES = Set.new

def receive_value(value)
  if SEEN_VALUES.add?(value)
    puts "Saw #{value} for the first time."
  else
    puts "Already seen #{value}, ignoring."
  end
end

receive_value(1) # Saw 1 for the first time.
receive_value(2) # Saw 2 for the first time.
receive_value(3) # Saw 3 for the first time.
receive_value(1) # Already seen 1, ignoring.
```

Readers might reasonably assume that `add?` only looks up into the set a single time, but it's actually doing two separate look-ups! ([source](https://github.com/ruby/ruby/blob/c976cb5/lib/set.rb#L517-L525))

```rb
class Set
  def add?(o)
    # 1. `include?(o)` looks up into `@hash`
    # 2. if the value isn't there, `add(o)` does a second look-up into `@hash`
    add(o) unless include?(o)
  end
end
```

This gets especially expensive if the values are large hashes/arrays/objects, whose `#hash` is expensive to compute.

We could optimize this if it were possible to set a value in a hash, *and* retrieve the value that was already there, in a single go. I propose adding `Hash#update_value` to do exactly that. If that existed, we could re-implement `#add?` as:

```rb
class Set
  def add?(o)
    # Only requires a single look-up into `@hash`!
    self unless @hash.update_value(o, true)
  end
end
```

Here's a PR: https://github.com/ruby/ruby/pull/10093

How much of a benefit this has depends on two things:

1. How much `#hash` is called, which depends on how many new objects are added to the set.
   * If every object is new, then `#hash` is called twice on every `#add?`. This is where this improvement makes the biggest (2x!) change.
   * If every object has already been seen, then `#hash` was never being called twice before anyway, so there would be no improvement.
   * Every other case lies somewhere in between those two.
2. How slow `#hash` is to compute for the key.
   * If the hash is slow to compute, this change will make a bigger improvement.
   * If the hash value is fast to compute, then it won't matter as much. Even if we called it half as much, it's a minority of the total time, so it won't have much net impact.

Here is a summary of the benchmark results:

|                           | All objects are new | All objects are preexisting |
|---------------------------|--------------------:|----------------------------:|
| objects with slow `#hash` |              100.0% |                       ~0.0% |
| objects with fast `#hash` |               24.5% |                        4.6% |
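The double look-up can be observed on a plain `Hash` with a key that counts its own `#hash` calls. This is a minimal sketch, not the benchmark from the PR: `CountingKey` is a hypothetical helper, and the exact counts are deliberately not asserted because they may vary between Ruby versions.

```ruby
# Hypothetical key class that counts how often #hash is invoked.
class CountingKey
  attr_reader :value, :hash_calls

  def initialize(value)
    @value = value
    @hash_calls = 0
  end

  def hash
    @hash_calls += 1
    @value.hash
  end

  def eql?(other)
    other.is_a?(CountingKey) && value == other.value
  end
end

# Start from a non-empty Hash, since a look-up in an empty table
# may be answered without hashing the key at all.
store = { existing: true }

# Pattern used by Set#add?: check membership, then insert.
two_step = CountingKey.new([1, 2, 3])
store[two_step] = true unless store.include?(two_step)

# A single insert without the preceding membership check.
one_step = CountingKey.new([4, 5, 6])
store[one_step] = true

puts "include? then []=: #{two_step.hash_calls} call(s) to #hash"
puts "[]= only:          #{one_step.hash_calls} call(s) to #hash"
```

The check-then-insert path should report more `#hash` calls than the single insert, which is the overhead the proposed `Hash#update_value` is meant to remove.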


[ruby-core:116491] [Ruby master Bug#20225] Inconsistent behavior of regex matching for a regex has a null loop

by make_now_just (Hiroya Fujinami)

Issue #20225 has been reported by make_now_just (Hiroya Fujinami).

----------------------------------------
Bug #20225: Inconsistent behavior of regex matching for a regex has a null loop
https://bugs.ruby-lang.org/issues/20225

* Author: make_now_just (Hiroya Fujinami)
* Status: Open
* Priority: Normal
* Assignee: make_now_just (Hiroya Fujinami)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

Usually, in Ruby (Onigmo), when a null loop (a loop consuming no characters) occurs during regex matching, the loop is terminated. But if the loop has a capture and some complex condition is satisfied, this causes backtracking instead.

This behavior produces unexpected results, for example:

```ruby
p /(?:.B.(?<a>(?:[C-Z]|.)*)+){2}/ =~ "ABCABC" # => nil
p /(?:.B.(?:(?:[C-Z]|.)*)+){2}/ =~ "ABCABC" # => 0
```

Because the first regex has a capture and the second does not, different matching results are returned. It is not very intuitive that the presence of a capture changes the matching result.

The detailed condition for changing the null-loop behavior is 1) a previous capture in this loop holds the empty string, and 2) this capture's position is different from the current matching position. This condition is checked in `STACK_NULL_CHECK_MEMST` (https://github.com/ruby/ruby/blob/bbb7ab906ec64b963bd4b5d37e47b14796d64371/…).

Perhaps you cannot understand what this condition means. Don't worry, I cannot understand it either. This condition was introduced at least 20 years ago, and probably no one remembers why it was necessary. (If you know, please tell me!) Even if there is a reason, I believe there is no reasonable justification for allowing counter-intuitive behavior such as the above example.

This behavior can also cause memoization to be buggy. Memoization relies on the fact that backtracking only depends on positions and states (byte-code offsets of a regex). However, this condition additionally refers to captures, so the memoization is broken.

My proposal is to **correct this inconsistent behavior**. Specifically, a null loop should be determined solely on the basis of whether the matching position has changed, without referring to captures.

This fix changes the behavior of regex matching, but I believe that the probability that this will actually cause backward compatibility problems is remarkably low. This is because I have never seen any mention of this puzzling behavior before.


[ruby-core:116983] [Ruby master Feature#20309] Bundled gems for Ruby 3.5

by hsbt (Hiroshi SHIBATA)

Issue #20309 has been reported by hsbt (Hiroshi SHIBATA).

----------------------------------------
Feature #20309: Bundled gems for Ruby 3.5
https://bugs.ruby-lang.org/issues/20309

* Author: hsbt (Hiroshi SHIBATA)
* Status: Assigned
* Assignee: hsbt (Hiroshi SHIBATA)
----------------------------------------

I propose migrating the following default gems to bundled gems in Ruby 3.5. This means users will get warnings if they try to load them.

* ostruct
* irb
* reline
* readline (wrapper file for readline-ext and reline)
* io-console
* logger
* fiddle
* pstore
* open-uri
* yaml (wrapper file for psych)
* win32ole

I plan to migrate the following default gems too, but I need more feedback from other committers about them.

* rdoc
  * We need to change the build task, for example to download the rdoc gem before document generation.
  * Or we make document generation optional from Ruby 3.5.
  * We explicitly separate `make install` and `make install-doc`.
* un
  * `ruby -run` is one of the cool features of Ruby. Should we avoid uninstalling the `un` gem?
* singleton
  * This is a famous design pattern. Should we force users to add it to their Gemfile?
* forwardable
  * `reline` needs to add forwardable to its `runtime_dependency` after migration.
* weakref
  * I'm not sure how much impact migrating it to bundled gems would have.
* fcntl
  * Should we integrate these constants into Ruby core?

I would like to migrate `ipaddr` and `uri` too, but these are used by webrick, which is the mock server for our test suite. We need to rewrite `webrick` with `TCPSocket` or extract the `ipaddr` and `uri` dependencies from `webrick`.

Other default gems depend deeply on our build process or on other libraries. I will update this proposal if I can extract them from default gems.


[ruby-core:117186] [Ruby master Bug#20339] Parser segfault with ractor comment in the middle of a statement

by kddnewton (Kevin Newton)

Issue #20339 has been reported by kddnewton (Kevin Newton).

----------------------------------------
Bug #20339: Parser segfault with ractor comment in the middle of a statement
https://bugs.ruby-lang.org/issues/20339

* Author: kddnewton (Kevin Newton)
* Status: Open
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

```ruby
foo(
  # shareable_constant_value: literal
  (C = { bar => baz })
)
```


[ruby-core:116016] [Ruby master Bug#20150] Memory leak in grapheme clusters

by peterzhu2118 (Peter Zhu)

Issue #20150 has been reported by peterzhu2118 (Peter Zhu).

----------------------------------------
Bug #20150: Memory leak in grapheme clusters
https://bugs.ruby-lang.org/issues/20150

* Author: peterzhu2118 (Peter Zhu)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED
----------------------------------------

GitHub PR: https://github.com/ruby/ruby/pull/9414

String#grapheme_clusters and String#each_grapheme_cluster leak memory because, if the string is not UTF-8, the created regex is not freed.

For example:

```ruby
str = "hello world".encode(Encoding::UTF_32LE)

10.times do
  1_000.times do
    str.grapheme_clusters
  end

  # Print this process's resident set size as reported by ps
  puts `ps -o rss= -p #{$$}`
end
```

Before:

```
26000
42256
59008
75792
92528
109232
125936
142672
159392
176160
```

After:

```
9264
9504
9808
10000
10128
10224
10352
10544
10704
10896
```


[ruby-core:117132] [Ruby master Bug#20334] Time.to_i truncates a fractional timestamp instead of rounding up

by werelnon (Malcolm Patterson)

Issue #20334 has been reported by werelnon (Malcolm Patterson).

----------------------------------------
Bug #20334: Time.to_i truncates a fractional timestamp instead of rounding up
https://bugs.ruby-lang.org/issues/20334

* Author: werelnon (Malcolm Patterson)
* Status: Open
* ruby -v: 3.2.2
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

Simple steps that can be executed in a Ruby interactive shell:

```
irb(main):007> t = Time.now
=> 2024-03-14 00:23:55.983885525 +0000
irb(main):008> t.to_f
=> 1710375835.9838855
irb(main):009> t.to_i
=> 1710375835
```

Based on this example, wouldn't the result of `t.to_f.round` be the better result?
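A short sketch of the difference, rebuilding the timestamp from the report instead of calling `Time.now`. Callers who want rounding can ask for it explicitly with `Time#round` or `Float#round`; this illustrates the existing methods and is not a proposed change.

```ruby
# Rebuild the reported timestamp with nanosecond precision.
t = Time.at(1710375835, 983885525, :nsec)

truncated     = t.to_i        # fractional seconds are dropped
rounded_time  = t.round.to_i  # Time#round rounds to the nearest second first
rounded_float = t.to_f.round  # the alternative suggested in the report

puts truncated      # 1710375835
puts rounded_time   # 1710375836
puts rounded_float  # 1710375836
```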


[ruby-core:115912] [Ruby master Bug#20090] Anonymous arguments are now syntax errors in unambiguous cases

by willcosgrove (Will Cosgrove)

Issue #20090 has been reported by willcosgrove (Will Cosgrove).

----------------------------------------
Bug #20090: Anonymous arguments are now syntax errors in unambiguous cases
https://bugs.ruby-lang.org/issues/20090

* Author: willcosgrove (Will Cosgrove)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

It looks like the changes that were made in #19370 may have gone further than intended. It's also possible I'm misunderstanding what decision was made. But it was my understanding that the goal was to make ambiguous cases a syntax error. The test cases added are all testing the ambiguous cases:

```rb
assert_syntax_error("def b(&) ->(&) {c(&)} end", /anonymous block parameter is also used/)
# ...
assert_syntax_error("def b(*) ->(*) {c(*)} end", /anonymous rest parameter is also used/)
assert_syntax_error("def b(a, *) ->(*) {c(1, *)} end", /anonymous rest parameter is also used/)
assert_syntax_error("def b(*) ->(a, *) {c(*)} end", /anonymous rest parameter is also used/)
# ...
assert_syntax_error("def b(**) ->(**) {c(**)} end", /anonymous keyword rest parameter is also used/)
assert_syntax_error("def b(k:, **) ->(**) {c(k: 1, **)} end", /anonymous keyword rest parameter is also used/)
assert_syntax_error("def b(**) ->(k:, **) {c(**)} end", /anonymous keyword rest parameter is also used/)
```

However, it is now also producing syntax errors in all of these cases:

```rb
def b(&) -> { c(&) } end
def b(*) -> { c(*) } end
def b(a, *) -> { c(1, *) } end
def b(*) ->(a) { c(a, *) } end
def b(**) -> { c(**) } end
def b(k:, **) -> { c(k: 1, **) } end
def b(**) ->(k:) { c(k:, **) } end
```

Again, it's possible I misunderstood the scope of the previous change. But it would be sad to lose the unambiguous case, as I've used that pattern quite a bit in my own projects.

This is my first time opening an issue here, so I apologize in advance if I've done anything non-standard.
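Until the behavior is settled, one version-independent way to write the unambiguous cases is to name the parameters, so the lambda body refers to ordinary local variables rather than anonymous ones. A minimal sketch; `c` here is a hypothetical stand-in for whatever method the arguments are forwarded to.

```ruby
# Hypothetical receiver that just echoes what it was given.
def c(*args, **opts, &blk)
  [args, opts, blk]
end

# Named parameters instead of anonymous *, ** and & inside the lambda.
def b(*args, **opts, &blk)
  -> { c(*args, **opts, &blk) }
end

args, opts, blk = b(1, 2, k: 3) { :from_block }.call

puts args.length
puts opts.keys.first
puts blk.call
```

This gives up the brevity of the anonymous form, which is the pattern the report asks to keep working.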


[ruby-core:116356] [Ruby master Bug#20198] Threaded DNS resolver does not propagate errno to the calling thread

by kjtsanaktsidis (KJ Tsanaktsidis)

Issue #20198 has been reported by kjtsanaktsidis (KJ Tsanaktsidis).

----------------------------------------
Bug #20198: Threaded DNS resolver does not propagate errno to the calling thread
https://bugs.ruby-lang.org/issues/20198

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
* Assignee: kjtsanaktsidis (KJ Tsanaktsidis)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

If we get a return value of `EAI_SYSTEM` from `getaddrinfo`, we transform that into an appropriate `Errno::` exception on the Ruby side. However, because we now run the actual call to `getaddrinfo` in a thread, we lose that `errno` value (because `errno` is thread-local).

So, what we actually raise in case of `EAI_SYSTEM` is just the last error which happened on the calling thread - e.g. this `ECHILD` which presumably got set in the bowels of pthreads somewhere:

```
1)
Socket::IPSocket#getaddress raises an error on unknown hostnames ERROR
Expected SocketError but got: Errno::ECHILD (No child processes - getaddrinfo)
/home/runner/work/ruby/ruby/src/spec/ruby/library/socket/ipsocket/getaddress_spec.rb:22:in `getaddress'
/home/runner/work/ruby/ruby/src/spec/ruby/library/socket/ipsocket/getaddress_spec.rb:22:in `block (3 levels) in <top (required)>'
/home/runner/work/ruby/ruby/src/spec/ruby/library/socket/ipsocket/getaddress_spec.rb:21:in `block (2 levels) in <top (required)>'
/home/runner/work/ruby/ruby/src/spec/ruby/library/socket/ipsocket/getaddress_spec.rb:4:in `<top (required)>'
```


[ruby-core:116073] [Ruby master Bug#20161] Memory leak in regexp grapheme clusters

by peterzhu2118 (Peter Zhu)

Issue #20161 has been reported by peterzhu2118 (Peter Zhu).

----------------------------------------
Bug #20161: Memory leak in regexp grapheme clusters
https://bugs.ruby-lang.org/issues/20161

* Author: peterzhu2118 (Peter Zhu)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED
----------------------------------------

GitHub PR: https://github.com/ruby/ruby/pull/9447

The cc->mbuf gets overwritten, so we need to free it to not leak memory.

For example:

```ruby
str = "hello world".encode(Encoding::UTF_32LE)

10.times do
  1_000.times do
    str.grapheme_clusters
  end

  puts `ps -o rss= -p #{$$}`
end
```

Before:

```
15536
15760
15920
16144
16304
16480
16640
16784
17008
17280
```

After:

```
15584
15584
15760
15824
15888
15888
15888
15888
16048
16112
```
