Issue #22098 has been updated by Eregon (Benoit Daloze). jhawthorn (John Hawthorn) wrote in #note-12:
I feel very strongly that we must not allow arbitrary Ruby code in these hooks.
I agree, I expressed that poorly. I meant to say one would expect the GVL is held (given the name of the hook); and it would be nice if we can call various `rb_*()` functions, as in regular C extension code. But the latter does conflict with what you say, good points. Anything that check interrupts or may switch threads or release the GVL, etc is clearly a no-go.
As a subscriber to this API (as a profiler)
I'm also using this API for a profiler :)
I need the guarantee that these events are delivered reliably and in order (for the given thread). It's not acceptable that an earlier subscriber delays and changes the order in which events are delivered, breaking the documented invariants of the GVL state machine (ex. if we re-suspend in a hook the hook after it will see the SUSPENDED event before the previous RESUMED, will later see multiple RESUMED in a row). We could detect re-entrancy and disable the hooks (as tracepoint does), I really do not want that. It's important for my profiler to have accurate delivery of all events. This is a very low level API and we should expect consumers to all be careful and well behaved.
Agreed. (FTR, Ruby 3.2 does emit multiple SUSPENDED events in a row with ConditionVariable, but 3.3+ does not seem to have the issue) Better documentation would go a long way. I think we still want the RESUMED hook to be called on the thread which just acquired the GVL, it's too surprising otherwise and doesn't seem advantageous either.
However, in internal events, you can not use any Ruby APIs (even object creations).
"you can not use any Ruby APIs" is too strict, at minimum we need: * rb_internal_thread_specific_get()/rb_internal_thread_specific_set() * rb_postponed_job_trigger() * TypedData_Get_Struct() * some way to check if on main Ractor or not (which is BTW what the datadog gem uses, and similar for gvltools) If we forbid APIs like allocation, would it be possibly to reliably fail if called from these hooks with a RUBY_DEBUG build? That'd be a good way to validate it (which we could even document on `rb_internal_thread_add_event_hook()`). ---------------------------------------- Bug #22098: RUBY_INTERNAL_THREAD_EVENT_RESUMED runs without GVL held https://bugs.ruby-lang.org/issues/22098#change-117573 * Author: luke-gru (Luke Gruber) * Status: Open * Assignee: luke-gru (Luke Gruber) * Backport: 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Today, it's possible to get a deadlock when allocating during a hook for this event. I attached a reproduction script using the `gvltools` gem. One way to fix it would be to not allocate during this hook and change the documentation to be clear that the GVL is not held. Any gems relying on this behavior, like `gvltools`, would need to be patched. Another way to approach it would be to change the call site for this hook invocation. I believe it would need to be added to quite a few places. To maintain Ractor safety, it would also need to be invoked without any locks held. I'm curious about your thoughts @byroot. ---Files-------------------------------- repro.rb (3.11 KB) run_loop.sh (2.82 KB) -- https://bugs.ruby-lang.org/