Issue #19237 has been updated by mame (Yusuke Endoh).
Discussed at the dev meeting. @matz said "The performance penalty is not acceptable.
Addition to the documentation looks good enough."
----------------------------------------
Bug #19237: Hash default_proc is not thread-safe to lazy-initialize value for a given key
https://bugs.ruby-lang.org/issues/19237#change-101345
* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
```ruby
1000.times do
  h = Hash.new do |hash, key|
    # Note, with [] instead of Array.new below it seems to work
    # but then e.g. anything else in this block like any call also breaks it
    hash[key] = Array.new
  end
  go = false
  threads = 100.times.map do
    Thread.new do
      Thread.pass until go
      h[:na] << true
    end
  end
  go = true
  threads.each(&:join)
  raise "Expected 100 elements but was #{h[:na].size}" if h[:na].size != 100
end
```
gives (in 3 runs):
```
concurrent_hash_default_proc.rb:17:in `block in <main>': Expected 100 elements but was 1 (RuntimeError)
concurrent_hash_default_proc.rb:17:in `block in <main>': Expected 100 elements but was 3 (RuntimeError)
concurrent_hash_default_proc.rb:17:in `block in <main>': Expected 100 elements but was 2 (RuntimeError)
```
So what happens is the same Hash entry gets assigned multiple times, which feels quite
unexpected and causes fairly surprising bugs (e.g. elements "disappearing", etc.).
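The race can be made concrete by hand-unrolling one possible interleaving of two threads in plain single-threaded code (an illustration of the problem, not real thread code):

```ruby
h = {}

# Both "threads" see :na missing and each run the default block:
a1 = Array.new   # thread 1's fresh array
a2 = Array.new   # thread 2's fresh array

h[:na] = a1      # thread 1 stores its array
a1 << true       # thread 1 appends to the array it just stored

h[:na] = a2      # thread 2 overwrites, discarding a1 and its element
a2 << true       # thread 2 appends to the new array

h[:na].size      # => 1, even though both "threads" appended once
```

This is exactly the "elements disappearing" effect: the element pushed via `a1` is lost when the second assignment replaces the stored array.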
Any thoughts on how to solve that in CRuby?
One approach I found in my PhD thesis is to pass a different object than the Hash
itself as the first argument to the block.
See
https://eregon.me/blog/assets/research/thesis-thread-safe-data-representati…
page 83 `Idiomatic Concurrent Hash Operations`. In short, it replaces `[]=` calls in the
initializer block with `put_if_absent` by passing a different object than the
Concurrent::Hash itself, which overrides `[]=` and delegates the rest.
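A minimal Ruby sketch of that idea (the class name `PutIfAbsentWrapper` and the use of a `Mutex` are my assumptions for illustration, not the thesis API): the object handed to the block intercepts `[]=` so that only the first writer for a key stores anything, and delegates everything else. Note that a real implementation would also need the hash read to return the stored value, since a Ruby assignment expression always evaluates to its right-hand side:

```ruby
require 'delegate'

# Hypothetical wrapper passed as the first block argument instead of the
# Hash itself: `[]=` becomes a store-if-absent, everything else delegates.
class PutIfAbsentWrapper < SimpleDelegator
  def initialize(hash)
    super
    @lock = Mutex.new
  end

  # Store only if the key is still absent; the first writer wins.
  def []=(key, value)
    @lock.synchronize do
      __getobj__[key] = value unless __getobj__.key?(key)
    end
  end
end

h = {}
w = PutIfAbsentWrapper.new(h)
first  = Array.new
second = Array.new
w[:na] = first    # wins: :na was absent
w[:na] = second   # ignored: :na is already present
h[:na].equal?(first)  # => true
```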
#19069 could be another way but there are compatibility issues (e.g. storing when it
previously would not for `Hash.new { |h,k| k * 3 }`).
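To make that compatibility concern concrete: today a default block that does not assign computes a value on every miss but stores nothing, which auto-storing à la #19069 would change:

```ruby
h = Hash.new { |hash, key| key * 3 }

h[2]       # => 6, computed by the block on each miss
h.key?(2)  # => false: nothing was stored
h.size     # => 0
```

Auto-storing would make `key?`, `size`, and iteration observe entries that existing code expects to be absent.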
There is also the question of whether the block should be allowed to execute more than once
for a given key.
I think that is difficult to solve (probably the only way is via some lock, but then that
can lead to deadlocks), and less important than ensuring the value is only assigned once
for a given key.
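To illustrate the lock approach and its limits at the user level (a sketch, not a general fix): serializing initialization through a `Mutex` does give both a single assignment and a single effective initialization per key, but the init code must not touch the hash again for another missing key, since Ruby's `Mutex` is not reentrant and recursive locking raises `ThreadError`:

```ruby
lock = Mutex.new
h = Hash.new do |hash, key|
  # Only one thread initializes a given key; late arrivals block on the
  # Mutex and then find the stored value via fetch.
  # Caveat: reading another missing key from inside this block would
  # re-enter lock.synchronize and raise (recursive locking).
  lock.synchronize { hash.fetch(key) { hash[key] = Array.new } }
end

threads = 20.times.map do
  Thread.new { h[:na] << true }
end
threads.each(&:join)
h[:na].size  # => 20 on CRuby: no appended element is lost
```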
---
Note that concurrent-ruby has the same issue because it uses ::Array/::Hash on CRuby:
https://github.com/ruby-concurrency/concurrent-ruby/issues/970#issuecomment…
There is also more discussion of the trade-offs there, which applies to `Hash` too.
--
https://bugs.ruby-lang.org/