[ruby-core:111146] [Ruby master Bug#19172] `ruby_thread_has_gvl_p` is innacurate sometimes -- document or change?

Issue #19172 has been reported by ivoanjo (Ivo Anjo). ---------------------------------------- Bug #19172: `ruby_thread_has_gvl_p` is innacurate sometimes -- document or change? https://bugs.ruby-lang.org/issues/19172 * Author: ivoanjo (Ivo Anjo) * Status: Open * Priority: Normal * ruby -v: (All Ruby versions) * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN ---------------------------------------- Howdy 👋! I work for Datadog [on the ddtrace gem](https://github.com/DataDog/dd-trace-rb) and I found a... sharp edge on the internal `ruby_thread_has_gvl_p` API. I am aware that `ruby_thread_has_gvl_p` is documented an experimental API that is exported as a symbol but not present on the VM include files. ### Background In the ddtrace profiling component, we setup a signal handler and then periodically send SIGPROF signals to try to interrupt the running Ruby thread (e.g. the thread that is holding the global VM lock or equivalent). In the signal handler, we need to perform some API calls which are not safe to do without the GVL. So we need to check if the signal handler got called in the thread that has the GVL. ### The issue ```ruby int ruby_thread_has_gvl_p(void) { rb_thread_t *th = ruby_thread_from_native(); if (th && th->blocking_region_buffer == 0) { return 1; } else { return 0; } } ``` In its current implementation, `ruby_thread_has_gvl_p` only checks if the thread has a `blocking_region_buffer` or not. Unfortunately, this means that when called from a thread that lost the GVL but not due to blocking (e.g. via `rb_thread_schedule()`), it can still claim that a thread is holding the GVL when that is not the case. I ran into this issue in https://github.com/DataDog/dd-trace-rb/pull/2415, and needed to find a workaround. ### Next steps Since this is an internal VM API, I'm not sure you'd want to change the current behavior, so I was thinking of perhaps two options: * Is it worth changing `ruby_thread_has_gvl_p` to be accurate in the case I've listed? * If not, would you accept a PR to document its current limitations, so that others don't run into the same issue I did? -- https://bugs.ruby-lang.org/

Issue #19172 has been updated by luke-gru (Luke Gruber). In doing some reading of the code, it's my understanding that `(TH_SCHED(th)->running == th)` would be a better way to tell if a thread has the GVL. Maybe I'm understanding it wrong though. ---------------------------------------- Bug #19172: `ruby_thread_has_gvl_p` is innacurate sometimes -- document or change? https://bugs.ruby-lang.org/issues/19172#change-101610 * Author: ivoanjo (Ivo Anjo) * Status: Open * Priority: Normal * ruby -v: (All Ruby versions) * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN ---------------------------------------- Howdy 👋! I work for Datadog [on the ddtrace gem](https://github.com/DataDog/dd-trace-rb) and I found a... sharp edge on the internal `ruby_thread_has_gvl_p` API. I am aware that `ruby_thread_has_gvl_p` is documented an experimental API that is exported as a symbol but not present on the VM include files. ### Background In the ddtrace profiling component, we setup a signal handler and then periodically send SIGPROF signals to try to interrupt the running Ruby thread (e.g. the thread that is holding the global VM lock or equivalent). In the signal handler, we need to perform some API calls which are not safe to do without the GVL. So we need to check if the signal handler got called in the thread that has the GVL. ### The issue ```c int ruby_thread_has_gvl_p(void) { rb_thread_t *th = ruby_thread_from_native(); if (th && th->blocking_region_buffer == 0) { return 1; } else { return 0; } } ``` In its current implementation, `ruby_thread_has_gvl_p` only checks if the thread has a `blocking_region_buffer` or not. Unfortunately, this means that when called from a thread that lost the GVL but not due to blocking (e.g. via `rb_thread_schedule()`), it can still claim that a thread is holding the GVL when that is not the case. I ran into this issue in https://github.com/DataDog/dd-trace-rb/pull/2415, and needed to find a workaround. ### Next steps Since this is an internal VM API, I'm not sure you'd want to change the current behavior, so I was thinking of perhaps two options: * Is it worth changing `ruby_thread_has_gvl_p` to be accurate in the case I've listed? * If not, would you accept a PR to document its current limitations, so that others don't run into the same issue I did? -- https://bugs.ruby-lang.org/

Issue #19172 has been updated by ivoanjo (Ivo Anjo). Yeah, that's my understanding, and what I'm using in that PR (although with a lot more complexity since I'm still trying to support older Rubies...). ---------------------------------------- Bug #19172: `ruby_thread_has_gvl_p` is innacurate sometimes -- document or change? https://bugs.ruby-lang.org/issues/19172#change-101616 * Author: ivoanjo (Ivo Anjo) * Status: Open * Priority: Normal * ruby -v: (All Ruby versions) * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN ---------------------------------------- Howdy 👋! I work for Datadog [on the ddtrace gem](https://github.com/DataDog/dd-trace-rb) and I found a... sharp edge on the internal `ruby_thread_has_gvl_p` API. I am aware that `ruby_thread_has_gvl_p` is documented an experimental API that is exported as a symbol but not present on the VM include files. ### Background In the ddtrace profiling component, we setup a signal handler and then periodically send SIGPROF signals to try to interrupt the running Ruby thread (e.g. the thread that is holding the global VM lock or equivalent). In the signal handler, we need to perform some API calls which are not safe to do without the GVL. So we need to check if the signal handler got called in the thread that has the GVL. ### The issue ```c int ruby_thread_has_gvl_p(void) { rb_thread_t *th = ruby_thread_from_native(); if (th && th->blocking_region_buffer == 0) { return 1; } else { return 0; } } ``` In its current implementation, `ruby_thread_has_gvl_p` only checks if the thread has a `blocking_region_buffer` or not. Unfortunately, this means that when called from a thread that lost the GVL but not due to blocking (e.g. via `rb_thread_schedule()`), it can still claim that a thread is holding the GVL when that is not the case. I ran into this issue in https://github.com/DataDog/dd-trace-rb/pull/2415, and needed to find a workaround. ### Next steps Since this is an internal VM API, I'm not sure you'd want to change the current behavior, so I was thinking of perhaps two options: * Is it worth changing `ruby_thread_has_gvl_p` to be accurate in the case I've listed? * If not, would you accept a PR to document its current limitations, so that others don't run into the same issue I did? -- https://bugs.ruby-lang.org/
participants (2)
-
ivoanjo (Ivo Anjo)
-
luke-gru (Luke Gruber)