
Issue #19473 has been updated by ko1 (Koichi Sasada). Eregon (Benoit Daloze) wrote in #note-20:
I think a simple way to look at this issue is to analyze all cases: 1. The Mutex is not held by any Thread, and the signal handler acquires it (the common case): this is fully correct and would work just fine without the unnecessary `can't be called from trap context (ThreadError)` from CRuby which breaks this. 2. The Mutex is held by the main Thread, and the signal handler tries to acquire it: it would be ` deadlock; recursive locking (ThreadError)` which is as good as the current `can't be called from trap context (ThreadError)`. 3. The Mutex is held by another Thread, and the signal handler tries to acquire it: the signal handler will wait until that Thread releases the Mutex. This is totally normal, it is Mutex semantics. You can't argue the Thread might keep the Mutex forever because that would be a bug in the first place. In the worst case it hangs, the user would Ctrl+C and see the stacktrace, which is a reasonable outcome for that case. CRuby is even sometimes able to detect such a deadlock and raises a `fatal` error in that case.
If anyone disagrees, can they explain how it is better for all of these cases to fail with `can't be called from trap context (ThreadError)`?
In most cases, (1) occurs, so it is difficult to notice the possibility of (2). With the current behavior, it is possible to recognize the risk of (2) even in case (1). This is the reason why this behavior is merged into Ruby. (In this thread, I mentioned as "hard to predict" — this is what it refers to.) ---------------------------------------- Bug #19473: can't be called from trap context (ThreadError) is too limiting https://bugs.ruby-lang.org/issues/19473#change-113951 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Simple reproducer: ``` $ ruby -ve 'm=Mutex.new; trap(:HUP) { m.synchronize { p :OK } }; Process.kill :HUP, Process.pid; sleep 0.1' ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux] -e:1:in `synchronize': can't be called from trap context (ThreadError) from -e:1:in `block in <main>' from -e:1:in `kill' from -e:1:in `<main>' ``` Expected behavior: ``` $ ruby -ve 'm=Mutex.new; trap(:HUP) { m.synchronize { p :OK } }; Process.kill :HUP, Process.pid; sleep 0.1' truffleruby 22.3.1, like ruby 3.0.3, GraalVM CE Native [x86_64-linux] :OK $ ruby -ve 'm=Mutex.new; trap(:HUP) { m.synchronize { p :OK } }; Process.kill :HUP, Process.pid; sleep 0.1' jruby 9.4.0.0 (3.1.0) 2022-11-23 95c0ec159f OpenJDK 64-Bit Server VM 17.0.6+10 on 17.0.6+10 +jit [x86_64-linux] :OK ``` This exception is highly problematic, for instance it breaks `Timeout.timeout` in `trap`: https://github.com/ruby/timeout/issues/17#issuecomment-1142035939 I suppose this behavior is because *sometimes* it's problematic to lock a Mutex in trap, e.g., if it's already locked by the main thread/fiber. But that would otherwise already raise `deadlock; recursive locking (ThreadError)`, so there is no point to fail early. And that's just one case, not all, so we should not always raise an exception. There seems to be no valid reason to prevent *all* `Mutex#synchronize` in `trap`. After all, if the Mutex for instance is only used in `trap`, it's well-defined AFAIK. For instance a given trap handler does not seem executed concurrently: ``` $ ruby -ve 'trap(:HUP) { puts "in trap\n"+caller.join("\n")+"\n\n"; sleep 0.1 }; pid = Process.pid; Process.wait fork { 20.times { Process.kill :HUP, pid } }; sleep 1' ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux] in trap -e:1:in `wait' -e:1:in `<main>' in trap -e:1:in `wait' -e:1:in `<main>' in trap -e:1:in `wait' -e:1:in `<main>' in trap -e:1:in `wait' -e:1:in `<main>' in trap -e:1:in `wait' -e:1:in `<main>' in trap -e:1:in `wait' -e:1:in `<main>' ``` And if the trap handler using the Mutex is never called while the Mutex is held by the main thread/fiber, there is also no problem. -- https://bugs.ruby-lang.org/