[ruby-core:123951] [Ruby Bug#21719] Thread deadlock with explicit require of a base class in Linux Ruby 3.4
Issue #21719 has been reported by jcuello@fu.do (Juan Manuel Cuello).

----------------------------------------
Bug #21719: Thread deadlock with explicit require of a base class in Linux Ruby 3.4
https://bugs.ruby-lang.org/issues/21719

* Author: jcuello@fu.do (Juan Manuel Cuello)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
I originally [reported the issue](https://github.com/fxn/zeitwerk/issues/321) in Zeitwerk, but we later figured out that it appears to be related to Ruby itself.

In short, I get a thread deadlock when combining explicit `require` calls with autoloadable classes:

```ruby
# jobs/base.rb
#
module Jobs
  class Base
  end
end

# jobs/a.rb
#
require './jobs/base'

module Jobs
  class A < Base
    def perform
      puts self.class.name
    end
  end
end

# jobs/b.rb
#
module Jobs
  class B < Base
    def perform
      puts self.class.name
    end
  end
end

# start.rb
#
module Jobs
  autoload :Base, './jobs/base'
  autoload :A, './jobs/a'
  autoload :B, './jobs/b'
end

a = Thread.new { Jobs::A.new.perform }
b = Thread.new { Jobs::B.new.perform }

a.join
b.join
```

```
ruby --version && ruby start.rb
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]
start.rb:12:in 'Thread#join': No live threads left. Deadlock? (fatal)
3 threads, 3 sleeps current:0x00005ca30dfbc500 main thread:0x00005ca30dd30330
* #<Thread:0x000076b1a7d6a658 sleep_forever>
   rb_thread_t:0x00005ca30dd30330 native:0x000076b1a81b87c0 int:0

* #<Thread:0x000076b1a7d2e928 start.rb:9 sleep_forever>
   rb_thread_t:0x00005ca30dfbc500 native:0x000076b18c74d6c0 int:0
    depended by: tb_thread_id:0x00005ca30dd30330

* #<Thread:0x000076b1a7d2e3b0 start.rb:10 sleep_forever>
   rb_thread_t:0x00005ca30dfb0fd0 native:0x000076b18c54b6c0 int:0 mutex:0x00005ca30dfe2b60 cond:1

    from start.rb:12:in '<main>'
```

Note the `require './jobs/base'` in `jobs/a.rb`. If I remove it, everything works.
The same happens if I also add the same explicit require to `jobs/b.rb`.

It seems to have been fixed in Ruby 3.4 by commit:ea2af5782df63266577ba08a4ef4c30b6d63e564, but the fix did not become apparent on Linux (which is my case) until commit:6fbc32b5d0da31535cccc0eca1853273313a0b52.

I'm not familiar with the Ruby codebase, so it's not clear to me why the change to Prism fixed the threading issue, or why it had no effect on Linux until the other fix; this is simply what I found by bisecting the source and running each revision against the code above.

I can create a PR to backport the Linux fix to the `ruby_3_4` branch, since everything works as expected on `master`.

-- 
https://bugs.ruby-lang.org/
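One possible mitigation until a fix is backported, offered as a sketch under assumptions rather than as a Ruby or Zeitwerk recommendation: resolve the shared base class once on the main thread before starting any workers, so that the explicit require in `jobs/a.rb` finds `base.rb` already loaded and returns `false` instead of racing with an autoload. The helper `write_job_files`, the temporary directory, and the absolute autoload paths are hypothetical stand-ins for the report's `./jobs/*` layout, and `perform` returns the class name instead of printing it so the result can be checked.

```ruby
require "tmpdir"

# Hypothetical helper: recreates the report's jobs/*.rb files under dir.
# perform returns the class name instead of printing it, to ease checking.
def write_job_files(dir)
  jobs = File.join(dir, "jobs")
  Dir.mkdir(jobs)
  File.write(File.join(jobs, "base.rb"), <<~RUBY)
    module Jobs
      class Base
      end
    end
  RUBY
  File.write(File.join(jobs, "a.rb"), <<~RUBY)
    # Same explicit require as in the report, resolved relative to this file.
    require File.expand_path("base", __dir__)

    module Jobs
      class A < Base
        def perform
          self.class.name
        end
      end
    end
  RUBY
  File.write(File.join(jobs, "b.rb"), <<~RUBY)
    module Jobs
      class B < Base
        def perform
          self.class.name
        end
      end
    end
  RUBY
  jobs
end

module Jobs; end

results = Dir.mktmpdir do |tmp|
  # realpath avoids a symlinked temp dir (e.g. on macOS) producing two
  # different spellings of the same feature path.
  jobs = write_job_files(File.realpath(tmp))
  Jobs.autoload(:Base, File.join(jobs, "base"))
  Jobs.autoload(:A, File.join(jobs, "a"))
  Jobs.autoload(:B, File.join(jobs, "b"))

  # Mitigation sketch: trigger the Base autoload once on the main thread,
  # before any worker starts, so no worker ever races on base.rb.
  Jobs.const_get(:Base)

  workers = [Thread.new { Jobs::A.new.perform }, Thread.new { Jobs::B.new.perform }]
  workers.map(&:value) # Thread#value joins each worker and returns its result
end

p results # => ["Jobs::A", "Jobs::B"]
```

The trade-off is that `Jobs::Base` is loaded eagerly even when no job runs; it sidesteps the race but does not change the underlying autoload behavior.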
Issue #21719 has been updated by mame (Yusuke Endoh).

The original example is not reliably reproducible because it depends heavily on a race condition. Here is a simplified version that reproduces more reliably.

```ruby
# start.rb
#
autoload :Target, "./target"

# a hack to trigger context switch after Kernel#require
TracePoint.new(:script_compiled) { sleep 2 }.enable

# just for debug print
Thread.current.name = "main"
TracePoint.new(:line) { p [Thread.current.name, it] }.enable

Thread.new do
  sleep 1
  Target
end.name = "sub"

require "./target"

# target.rb
#
class Target
end
```

The deadlock reproduces on both Ruby 3.4.7 and master.

```
$ ruby --disable-gems start.rb
["main", #<TracePoint:line start.rb:10>]
["main", #<TracePoint:line start.rb:15>]
["sub", #<TracePoint:line start.rb:11>]
["sub", #<TracePoint:line start.rb:12>]
["main", #<TracePoint:line /home/mame/work/ruby/target.rb:1>]
/home/mame/work/ruby/target.rb:1:in '<top (required)>': No live threads left. Deadlock? (fatal)
2 threads, 2 sleeps current:0x000058ecbdbd5330 main thread:0x000058ecbdbd5330
* #<Thread:0x00007a1d1a7f8a08@main sleep_forever>
   rb_thread_t:0x000058ecbdbd5330 native:0x00007a1d34cecc00 int:0
   /home/mame/work/ruby/target.rb:1:in '<top (required)>'
   start.rb:15:in 'Kernel#require'
   start.rb:15:in '<main>'
* #<Thread:0x00007a1d18b4f190@sub start.rb:10 sleep_forever>
   rb_thread_t:0x000058ecbdd74e70 native:0x00007a1d18a3e6c0 int:0 mutex:1 cond:1
   start.rb:12:in 'Kernel#require'
   start.rb:12:in 'block in <main>'
    from start.rb:15:in 'Kernel#require'
    from start.rb:15:in '<main>'
```

@akr @nobu I suspect the hack that hides constant definitions when requiring via autoload isn't working properly.

```ruby
# target.rb
#
# autoload of Target is not hidden here. Is this correct?
p [Thread.current.name, autoload?(:Target)] #=> actual: ["main", "./target"], expected: ["main", nil]

class Target # autoload is fired here and attempts to load target.rb recursively, which leads to the deadlock
end
```

Do you understand what's happening?
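For contrast, a minimal single-threaded sketch of the case that does work (the `Demo` namespace and the temporary file are hypothetical). The same thread registers the autoload and then requires the file explicitly, so when `class Target` runs inside that file, Ruby is expected to treat the pending autoload as not yet defined and create the class, rather than loading `target.rb` a second time. This does not reproduce the deadlock; it only shows the outcome mame expects, which breaks down once a second thread has already triggered the autoload.

```ruby
require "tmpdir"

# Hypothetical namespace for the demo.
module Demo; end

Dir.mktmpdir do |tmp|
  path = File.join(File.realpath(tmp), "target.rb")
  File.write(path, <<~RUBY)
    module Demo
      class Target
      end
    end
  RUBY

  # Register the autoload, then require the very same file explicitly on the
  # same thread, without referencing Demo::Target first.
  Demo.autoload(:Target, path)
  p Demo.autoload?(:Target) # still pending: prints the registered path
  require path
end

# The explicit require defined the class, so the constant is now an
# ordinary class and no autoload is pending for it anymore.
p Demo::Target            # => Demo::Target
p Demo.autoload?(:Target) # => nil
```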
https://bugs.ruby-lang.org/issues/21719#change-115556
participants (2)
- jcuello@fu.do (Juan Manuel Cuello)
- mame (Yusuke Endoh)