Issue #20331 has been reported by yui-knk (Kaneko Yuichiro).
----------------------------------------
Feature #20331: Should parser warn hash duplication and when clause?
https://bugs.ruby-lang.org/issues/20331
* Author: yui-knk (Kaneko Yuichiro)
* Status: Open
----------------------------------------
# Background
Right now, the parser warns about duplicated hash keys (#1) and duplicated `when` clauses (#2).
For example,
```ruby
{1 => :a, 1 => :b}
# => warning: key 1 is duplicated and overwritten on line 1
```
```ruby
case 2
when 1, 1
else
end
# => test.rb:2: warning: duplicated `when' clause with line 2 is ignored
```
The parser also treats integer literals written with different radix prefixes as the same key.
```ruby
{
1 => :a,
0x1 => :b,
0b1 => :b,
0d1 => :b,
0o1 => :b,
}
# => test.rb:2: warning: key 1 is duplicated and overwritten on line 3
# => test.rb:3: warning: key 1 is duplicated and overwritten on line 4
# => test.rb:4: warning: key 1 is duplicated and overwritten on line 5
# => test.rb:5: warning: key 1 is duplicated and overwritten on line 6
```
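The comparison above can be pictured with a small sketch. It is only an illustration, not the code in parse.y: `RADIX_PREFIXES`, `integer_literal_value`, and `overwritten_keys` are hypothetical names, and only the four prefixes shown above are handled (no sign, no bare leading-`0` octal).

```ruby
# Hypothetical table of the radix prefixes used in the example above.
RADIX_PREFIXES = { "0x" => 16, "0b" => 2, "0o" => 8, "0d" => 10 }.freeze

# Hypothetical helper: map the source text of an integer literal to its value.
def integer_literal_value(source)
  digits = source.delete("_")
  base = RADIX_PREFIXES[digits[0, 2].downcase]
  base ? digits[2..].to_i(base) : digits.to_i(10)
end

# Returns [earlier_index, later_index] pairs for keys that get overwritten.
def overwritten_keys(literals)
  last_seen = {}
  pairs = []
  literals.each_with_index do |source, index|
    value = integer_literal_value(source)
    pairs << [last_seen[value], index] if last_seen.key?(value)
    last_seen[value] = index
  end
  pairs
end

p overwritten_keys(%w[1 0x1 0b1 0d1 0o1])
```

Each pair corresponds to one "duplicated and overwritten" warning: every occurrence of the key is reported as overwritten by the next one.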
# Problem
Currently this is implemented by converting a string like `"123"` to a Ruby Object and comparing the resulting objects.
Ruby Objects need to be removed from parse.y for the Universal Parser.
I created PR https://github.com/ruby/ruby/pull/10079, which implements bignum for parse.y without depending on Ruby Objects. However, nobu and mame expressed concern about the cost and benefit of implementing bignum for the parser.
I want to discuss which approach is best for this problem.
By the way, if we keep the warning messages, the parser also needs to reduce Rational keys to irreducible fractions.
```
$ ruby -wc -e '{10.2r => :a, 10.2r => :b}'
-e:1: warning: key (51/5) is duplicated and overwritten on line 1
-e:1: warning: unused literal ignored
Syntax OK
```
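As a rough sketch of what that reduction involves, the following turns the source text of a decimal rational literal into a reduced numerator and denominator using only integer arithmetic. `euclid_gcd` and `reduced_rational_literal` are hypothetical names, and the sketch assumes a non-negative literal of the form `digits[.digits]r`.

```ruby
# Euclid's algorithm, written out because a parser without Ruby Objects
# could not rely on Integer#gcd.
def euclid_gcd(a, b)
  a, b = b, a % b while b != 0
  a
end

# Hypothetical helper: "10.2r" -> [51, 5]
def reduced_rational_literal(source)
  digits = source.delete_suffix("r").delete("_")
  integer_part, fraction_part = digits.split(".", 2)
  fraction_part ||= ""
  numerator = (integer_part + fraction_part).to_i
  denominator = 10**fraction_part.length
  divisor = euclid_gcd(numerator, denominator)
  [numerator / divisor, denominator / divisor]
end

p reduced_rational_literal("10.2r") # => [51, 5]
```

For large literals the numerator and denominator no longer fit in a machine integer, which is where the bignum question comes back in.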
# Options
## 1. Warnings on parser
Pros:
* Users of the Universal Parser don't need to implement the warnings themselves. I guess developers of other Ruby implementations may benefit from the reduced effort.
* Warnings are shown by `ruby -wc`.
Cons:
* We need to maintain a bignum implementation for the parser.
There are two approaches for this option.
### 1-1. Implement bignum for parser
The PR takes this approach, implementing a subset of Ruby bignum for the parser.
### 1-2. Extract existing bignum implementation then use it
Make the existing bignum implementation independent of Ruby Objects and use it from both bignum.c and parse.y.
## 2. Moving warnings logic into compile phase
We can use Ruby Objects in compile.c, so moving the logic into compile.c solves this problem.
Pros:
* No need to implement bignum for parser.
Cons:
* Users of the Universal Parser need to implement the warnings themselves.
* Warnings are not shown by `ruby -wc`.
--
https://bugs.ruby-lang.org/
Issue #20301 has been reported by AMomchilov (Alexander Momchilov).
----------------------------------------
Bug #20301: `Set#add?` does two hash look-ups
https://bugs.ruby-lang.org/issues/20301
* Author: AMomchilov (Alexander Momchilov)
* Status: Open
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
A common usage of `Set`s is to keep track of seen objects, and do something different whenever an object is seen for the first time, e.g.:
```ruby
SEEN_VALUES = Set.new
def receive_value(value)
  if SEEN_VALUES.add?(value)
    puts "Saw #{value} for the first time."
  else
    puts "Already seen #{value}, ignoring."
  end
end
receive_value(1) # Saw 1 for the first time.
receive_value(2) # Saw 2 for the first time.
receive_value(3) # Saw 3 for the first time.
receive_value(1) # Already seen 1, ignoring.
```
Readers might reasonably assume that `add?` is only looking up into the set a single time, but it's actually doing two separate look-ups! ([source](https://github.com/ruby/ruby/blob/c976cb5/lib/set.rb#L517-L525))
```rb
class Set
  def add?(o)
    # 1. `include?(o)` looks up into `@hash`
    # 2. if the value isn't there, `add(o)` does a second look-up into `@hash`
    add(o) unless include?(o)
  end
end
```
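One way to observe the look-ups is to count calls to `#hash` on the element being added. This is a minimal sketch: `CountingKey` is a hypothetical helper class, and the printed count is deliberately not stated here because it depends on the Ruby version and on `Hash` internals.

```ruby
require "set"

# Hypothetical key class that counts how often Ruby asks for its hash value.
class CountingKey
  attr_reader :id, :hash_calls

  def initialize(id)
    @id = id
    @hash_calls = 0
  end

  def hash
    @hash_calls += 1
    @id.hash
  end

  def eql?(other)
    other.is_a?(CountingKey) && other.id == @id
  end
end

seen = Set.new([CountingKey.new(:existing)])
key = CountingKey.new(:fresh)

first = seen.add?(key)   # new element: returns the set
second = seen.add?(key)  # already present: returns nil

puts "#hash was called #{key.hash_calls} time(s) across two add? calls"
```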
This gets especially expensive if the values are large hashes, arrays, or objects whose `#hash` is expensive to compute.
We could optimize this if it were possible to set a value in a hash *and* retrieve the value that was already there, in a single go. I propose adding `Hash#update_value` to do exactly that. If that existed, we could re-implement `#add?` as:
```rb
class Set
  def add?(o)
    # Only requires a single look-up into `@hash`!
    self unless @hash.update_value(o, true)
  end
end
```
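To make the intended semantics concrete, here is a plain-Ruby sketch. It is only an assumption about the behavior, inferred from how `add?` uses the return value: `update_value_sketch` is a hypothetical stand-in, it assumes `nil` is returned when the key was absent, and unlike the proposed method it still performs two look-ups.

```ruby
# Semantic sketch only: stores new_value and returns whatever was stored before.
# This is NOT the proposed implementation and does not save a look-up.
def update_value_sketch(hash, key, new_value)
  previous = hash[key]
  hash[key] = new_value
  previous
end

table = {}
first = update_value_sketch(table, :a, true)   # nothing was stored before
second = update_value_sketch(table, :a, true)  # the earlier value comes back

p first, second
```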
Here's a PR: https://github.com/ruby/ruby/pull/10093
How much of a benefit this has depends on two things:
1. How much `#hash` is called, which depends on how many new objects are added to the set.
   * If every object is new, then `#hash` is called twice on every `#add?`. This is where this improvement makes the biggest (2x!) change.
   * If every object has already been seen, then `#hash` was never being called twice before anyway, so there would be no improvement.
   * Every other case lies somewhere in between those two.
2. How slow `#hash` is to compute for the key
   * If the hash is slow to compute, this change will make a bigger improvement.
   * If the hash value is fast to compute, then it won't matter as much. Even if we called it half as much, it's a minority of the total time, so it won't have much net impact.
Here is a summary of the benchmark results:
| | All objects are new | All objects are preexisting |
|---------------------------|-------:|------:|
| objects with slow `#hash` | 100.0% | ~0.0% |
| objects with fast `#hash` | 24.5% | 4.6% |
--
https://bugs.ruby-lang.org/
Issue #20225 has been reported by make_now_just (Hiroya Fujinami).
----------------------------------------
Bug #20225: Inconsistent behavior of regex matching for a regex has a null loop
https://bugs.ruby-lang.org/issues/20225
* Author: make_now_just (Hiroya Fujinami)
* Status: Open
* Priority: Normal
* Assignee: make_now_just (Hiroya Fujinami)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Usually, in Ruby (Onigmo), when a null loop (a loop consuming no characters) occurs during regex matching, the loop is terminated. But if a loop has a capture and some complex condition is satisfied, this causes backtracking instead. This behavior produces unexpected results, for example:
```ruby
p /(?:.B.(?<a>(?:[C-Z]|.)*)+){2}/ =~ "ABCABC" # => nil
p /(?:.B.(?:(?:[C-Z]|.)*)+){2}/ =~ "ABCABC" # => 0
```
Because the first regex has a capture and the second does not, different matching results are returned. It is not very intuitive that the presence of a capture changes the matching result.
The detailed condition for changing the null-loop behavior is 1) a previous capture in this loop holds the empty string, and 2) this capture's position is different from the current matching position. This condition is checked in `STACK_NULL_CHECK_MEMST` (https://github.com/ruby/ruby/blob/bbb7ab906ec64b963bd4b5d37e47b14796d64371/…).
You may not understand what this condition means. Don't worry, I don't understand it either. This condition was introduced at least 20 years ago, and probably no one remembers why it was necessary. (If you know, please tell me!) Even if there is a reason, I believe nothing justifies counter-intuitive behavior such as the above example.
This behavior can also cause memoization to be buggy. Memoization relies on the fact that backtracking only depends on positions and states (byte-code offsets of a regex). However, this condition additionally refers to captures, and the memoization is broken.
My proposal is to **correct this inconsistent behavior**. Specifically, a null loop should be determined solely on the basis of whether the matching position has changed, without referring to captures.
This fix changes the behavior of regex matching, but I believe that the probability that this will actually cause backward compatibility problems is remarkably low. This is because I have never seen any mention of this puzzling behavior before.
--
https://bugs.ruby-lang.org/
Issue #20309 has been reported by hsbt (Hiroshi SHIBATA).
----------------------------------------
Feature #20309: Bundled gems for Ruby 3.5
https://bugs.ruby-lang.org/issues/20309
* Author: hsbt (Hiroshi SHIBATA)
* Status: Assigned
* Assignee: hsbt (Hiroshi SHIBATA)
----------------------------------------
I propose migrating the following default gems to bundled gems in Ruby 3.5. This means users will get warnings if they try to load them.
* ostruct
* irb
* reline
* readline (wrapper file for readline-ext and reline)
* io-console
* logger
* fiddle
* pstore
* open-uri
* yaml (wrapper file for psych)
* win32ole
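For applications managed with Bundler, the practical consequence is that these libraries have to be declared as dependencies. A minimal Gemfile sketch, with three gems picked from the list above purely as an example:

```ruby
# Gemfile (sketch): declare the former default gems the application requires.
source "https://rubygems.org"

gem "ostruct"
gem "logger"
gem "pstore"
```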
I plan to migrate the following default gems too, but I need more feedback from other committers about them.
* rdoc
  * We need to change the build task, e.g. download the rdoc gem before document generation,
  * or we make document generation optional from Ruby 3.5.
  * We explicitly separate `make install` and `make install-doc`.
* un
  * `ruby -run` is one of the cool features of Ruby. Should we avoid uninstalling the `un` gem?
* singleton
  * This is a famous design pattern. Should we force users to add it to their Gemfile?
* forwardable
  * `reline` needs to add forwardable to its `runtime_dependency` after migration.
* weakref
  * I'm not sure how much impact migrating it to bundled gems would have.
* fcntl
  * Should we integrate these constants into ruby core?
I would like to migrate `ipaddr` and `uri` too, but these are used by webrick, which is the mock server for our test suite. We need to rewrite `webrick` with `TCPSocket` or extract the `ipaddr` and `uri` dependencies from `webrick`.
Other default gems depend deeply on our build process or other libraries. I will update this proposal if I can extract them from default gems.
--
https://bugs.ruby-lang.org/
Issue #20150 has been reported by peterzhu2118 (Peter Zhu).
----------------------------------------
Bug #20150: Memory leak in grapheme clusters
https://bugs.ruby-lang.org/issues/20150
* Author: peterzhu2118 (Peter Zhu)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED
----------------------------------------
GitHub PR: https://github.com/ruby/ruby/pull/9414
`String#grapheme_clusters` and `String#each_grapheme_cluster` leak memory because, if the string is not UTF-8, the created regex is not freed.
For example:
```ruby
str = "hello world".encode(Encoding::UTF_32LE)
10.times do
  1_000.times do
    str.grapheme_clusters
  end
  puts `ps -o rss= -p #{$$}`
end
```
Before:
```
26000
42256
59008
75792
92528
109232
125936
142672
159392
176160
```
After:
```
9264
9504
9808
10000
10128
10224
10352
10544
10704
10896
```
--
https://bugs.ruby-lang.org/
Issue #20334 has been reported by werelnon (Malcolm Patterson).
----------------------------------------
Bug #20334: Time.to_i truncates a fractional timestamp instead of rounding up
https://bugs.ruby-lang.org/issues/20334
* Author: werelnon (Malcolm Patterson)
* Status: Open
* ruby -v: 3.2.2
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Simple steps that can be executed in a Ruby interactive shell:
```
irb(main):007> t = Time.now
=> 2024-03-14 00:23:55.983885525 +0000
irb(main):008> t.to_f
=> 1710375835.9838855
irb(main):009> t.to_i
=> 1710375835
```
Based on the example, wouldn't the result of `t.to_f.round` be the better result?
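For comparison, here is a short sketch that puts the truncating and rounding conversions side by side. It rebuilds the timestamp from the session above with `Time.at`; the variable names are illustrative only.

```ruby
# Same instant as in the irb session: whole seconds plus nanoseconds.
t = Time.at(1710375835, 983885525, :nsec)

truncated = t.to_i        # drops the sub-second part
rounded   = t.round.to_i  # Time#round rounds to whole seconds first
via_float = t.to_f.round  # the conversion suggested above

p truncated, rounded, via_float
```

Callers that want rounding rather than truncation can get it explicitly with `Time#round` or `Float#round`.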
--
https://bugs.ruby-lang.org/
Issue #20090 has been reported by willcosgrove (Will Cosgrove).
----------------------------------------
Bug #20090: Anonymous arguments are now syntax errors in unambiguous cases
https://bugs.ruby-lang.org/issues/20090
* Author: willcosgrove (Will Cosgrove)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
It looks like the changes that were made in #19370 may have gone further than intended. It's also possible I'm misunderstanding what decision was made. But it was my understanding that the goal was to make ambiguous cases a syntax error. The test cases added are all testing the ambiguous cases:
```rb
assert_syntax_error("def b(&) ->(&) {c(&)} end", /anonymous block parameter is also used/)
# ...
assert_syntax_error("def b(*) ->(*) {c(*)} end", /anonymous rest parameter is also used/)
assert_syntax_error("def b(a, *) ->(*) {c(1, *)} end", /anonymous rest parameter is also used/)
assert_syntax_error("def b(*) ->(a, *) {c(*)} end", /anonymous rest parameter is also used/)
# ...
assert_syntax_error("def b(**) ->(**) {c(**)} end", /anonymous keyword rest parameter is also used/)
assert_syntax_error("def b(k:, **) ->(**) {c(k: 1, **)} end", /anonymous keyword rest parameter is also used/)
assert_syntax_error("def b(**) ->(k:, **) {c(**)} end", /anonymous keyword rest parameter is also used/)
```
However, it is now also producing syntax errors in all of these cases:
```rb
def b(&) -> { c(&) } end
def b(*) -> { c(*) } end
def b(a, *) -> { c(1, *) } end
def b(*) ->(a) { c(a, *) } end
def b(**) -> { c(**) } end
def b(k:, **) -> { c(k: 1, **) } end
def b(**) ->(k:) { c(k:, **) } end
```
Again, it's possible I misunderstood the scope of the previous change. But it would be sad to lose the unambiguous case, as I've used that pattern quite a bit in my own projects.
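Until this is resolved, naming the parameters sidesteps the question of which anonymous parameter is meant. A minimal sketch, where `c` is a placeholder method standing in for the callee used in the examples above:

```ruby
# Placeholder callee that simply reports what it received.
def c(*args, **opts, &block)
  [args, opts, block]
end

# Named parameters can be forwarded from inside the lambda on any Ruby version.
def b(*args, **opts, &block)
  -> { c(*args, **opts, &block) }
end

p b(1, k: 2).call
```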
This is my first time opening an issue here, so I apologize in advance if I've done anything non-standard.
--
https://bugs.ruby-lang.org/
Issue #20198 has been reported by kjtsanaktsidis (KJ Tsanaktsidis).
----------------------------------------
Bug #20198: Threaded DNS resolver does not propagate errno to the calling thread
https://bugs.ruby-lang.org/issues/20198
* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
* Assignee: kjtsanaktsidis (KJ Tsanaktsidis)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
If we get a return value of `EAI_SYSTEM` from `getaddrinfo`, we transform that into an appropriate `Errno::` exception on the Ruby side. However, because we now run the actual call to `getaddrinfo` in a thread, we lose that `errno` value (because `errno` is thread-local). So, what we actually raise in case of `EAI_SYSTEM` is just the last error which happened on the calling thread - e.g. this `ECHILD` which presumably got set in the bowels of pthreads somewhere:
```
1)
Socket::IPSocket#getaddress raises an error on unknown hostnames ERROR
Expected SocketError
but got: Errno::ECHILD (No child processes - getaddrinfo)
/home/runner/work/ruby/ruby/src/spec/ruby/library/socket/ipsocket/getaddress_spec.rb:22:in `getaddress'
/home/runner/work/ruby/ruby/src/spec/ruby/library/socket/ipsocket/getaddress_spec.rb:22:in `block (3 levels) in <top (required)>'
/home/runner/work/ruby/ruby/src/spec/ruby/library/socket/ipsocket/getaddress_spec.rb:21:in `block (2 levels) in <top (required)>'
/home/runner/work/ruby/ruby/src/spec/ruby/library/socket/ipsocket/getaddress_spec.rb:4:in `<top (required)>'
```
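The underlying pattern can be sketched in Ruby as an analogy. This is not the C code in the socket extension: a thread variable named `:simulated_errno` (a made-up name) stands in for the thread-local `errno`. A value set inside the worker is invisible to the caller unless the worker hands it back explicitly.

```ruby
# Analogy only: :simulated_errno stands in for C's thread-local errno.
def lookup_in_worker
  Thread.new do
    # The "failure" is recorded in the worker's own thread-local storage.
    Thread.current.thread_variable_set(:simulated_errno, :ENETUNREACH)
    # Returning the value is what lets the calling thread see it.
    Thread.current.thread_variable_get(:simulated_errno)
  end
end

# A stale value left over on the calling thread.
Thread.current.thread_variable_set(:simulated_errno, :ECHILD)

propagated = lookup_in_worker.value
stale = Thread.current.thread_variable_get(:simulated_errno)

p stale       # the caller's own, unrelated value
p propagated  # the value the worker handed back
```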
--
https://bugs.ruby-lang.org/