[ruby-core:123961] [Ruby Feature#21722] Expose rb_gc_mark_weak API for use in extensions
Issue #21722 has been reported by ivoanjo (Ivo Anjo). ---------------------------------------- Feature #21722: Expose rb_gc_mark_weak API for use in extensions https://bugs.ruby-lang.org/issues/21722 * Author: ivoanjo (Ivo Anjo) * Status: Open ---------------------------------------- In https://bugs.ruby-lang.org/issues/21710 it came up that 1. On top of [deprecating _id2ref](https://bugs.ruby-lang.org/issues/15408) on Ruby 4.0, [it's a bad idea to be using object_id from the NEWOBJ tracepoint](https://bugs.ruby-lang.org/issues/21710#note-8) 2. `rb_gc_mark_weak` which would be the alternative for an extension that needs weak reference-like behavior [is not available for extensions](https://bugs.ruby-lang.org/issues/21710#note-9) So I've opened this ticket to request exposing `rb_gc_mark_weak` so it can be used by extensions? --- The [Datadog Ruby profiler](https://github.com/datadog/dd-trace-rb) is currently using object_id and id2ref to implement its "heap profiling" -- that is, we have a NEWOBJ tracepoint, and from time to time (e.g. not for every object), we select an object, and track its lifetime by keeping its id and checking from time to time if it's still alive. We're using this approach instead of: * The FREEOBJ event => Reduced overhead, as we don't need to be called for every object (+ not needing to deal with corner cases of when FREEOBJ may not be called for an object) * WeakMap => Weakmap APIs are Ruby-level and need the GVL, and thus make it hard to use from low-level tracepoints and to avoid overhead by doing profiler work with the GVL released. For our purposes, it would be OK if this API is not "official" -- e.g. if it's one of those that gets exposed as a public symbol but not documented and no promises made for future Ruby releases. -- https://bugs.ruby-lang.org/
Issue #21722 has been updated by peterzhu2118 (Peter Zhu). Hi, author of `rb_gc_mark_weak` here. I think it would be good to have such an API available. However, I don't think the current API is it. This is because the API is very tricky to use with incremental marking. Since incremental marking splits marking into several steps interleaved with Ruby code execution, it's possible that the state of the object changes after it has been marked. But since `rb_gc_mark_weak` operates on pointers, the underlying memory of the pointers may have been freed or realloced. This is why `rb_gc_remove_weak` exists and also why the ST tables in WeakMap/WeakKeyMap is an ST table that has keys and values that point to malloc memory containing the actual keys and values. I have proposed #21084 (implemented in [this PR](https://github.com/ruby/ruby/pull/12606)) that I think may be an easier to use API. ---------------------------------------- Feature #21722: Expose rb_gc_mark_weak API for use in extensions https://bugs.ruby-lang.org/issues/21722#change-115364 * Author: ivoanjo (Ivo Anjo) * Status: Open ---------------------------------------- In https://bugs.ruby-lang.org/issues/21710 it came up that 1. On top of [deprecating _id2ref](https://bugs.ruby-lang.org/issues/15408) on Ruby 4.0, [it's a bad idea to be using object_id from the NEWOBJ tracepoint](https://bugs.ruby-lang.org/issues/21710#note-8) 2. `rb_gc_mark_weak` which would be the alternative for an extension that needs weak reference-like behavior [is not available for extensions](https://bugs.ruby-lang.org/issues/21710#note-9) So I've opened this ticket to request exposing `rb_gc_mark_weak` so it can be used by extensions? --- The [Datadog Ruby profiler](https://github.com/datadog/dd-trace-rb) is currently using object_id and id2ref to implement its "heap profiling" -- that is, we have a NEWOBJ tracepoint, and from time to time (e.g. not for every object), we select an object, and track its lifetime by keeping its id and checking from time to time if it's still alive. We're using this approach instead of: * The FREEOBJ event => Reduced overhead, as we don't need to be called for every object (+ not needing to deal with corner cases of when FREEOBJ may not be called for an object) * WeakMap => Weakmap APIs are Ruby-level and need the GVL, and thus make it hard to use from low-level tracepoints and to avoid overhead by doing profiler work with the GVL released. For our purposes, it would be OK if this API is not "official" -- e.g. if it's one of those that gets exposed as a public symbol but not documented and no promises made for future Ruby releases. -- https://bugs.ruby-lang.org/
Issue #21722 has been updated by ivoanjo (Ivo Anjo). Thanks for the hint, Peter! I had not spotted https://bugs.ruby-lang.org/issues/21084 :) Based on your notes it looks like exposing a `rb_gc_declare_weak_references` is a much better choice 👍 ---------------------------------------- Feature #21722: Expose rb_gc_mark_weak API for use in extensions https://bugs.ruby-lang.org/issues/21722#change-115365 * Author: ivoanjo (Ivo Anjo) * Status: Open ---------------------------------------- In https://bugs.ruby-lang.org/issues/21710 it came up that 1. On top of [deprecating _id2ref](https://bugs.ruby-lang.org/issues/15408) on Ruby 4.0, [it's a bad idea to be using object_id from the NEWOBJ tracepoint](https://bugs.ruby-lang.org/issues/21710#note-8) 2. `rb_gc_mark_weak` which would be the alternative for an extension that needs weak reference-like behavior [is not available for extensions](https://bugs.ruby-lang.org/issues/21710#note-9) So I've opened this ticket to request exposing `rb_gc_mark_weak` so it can be used by extensions? --- The [Datadog Ruby profiler](https://github.com/datadog/dd-trace-rb) is currently using object_id and id2ref to implement its "heap profiling" -- that is, we have a NEWOBJ tracepoint, and from time to time (e.g. not for every object), we select an object, and track its lifetime by keeping its id and checking from time to time if it's still alive. We're using this approach instead of: * The FREEOBJ event => Reduced overhead, as we don't need to be called for every object (+ not needing to deal with corner cases of when FREEOBJ may not be called for an object) * WeakMap => Weakmap APIs are Ruby-level and need the GVL, and thus make it hard to use from low-level tracepoints and to avoid overhead by doing profiler work with the GVL released. For our purposes, it would be OK if this API is not "official" -- e.g. if it's one of those that gets exposed as a public symbol but not documented and no promises made for future Ruby releases. -- https://bugs.ruby-lang.org/
Issue #21722 has been updated by Eregon (Benoit Daloze). Continuing from https://bugs.ruby-lang.org/issues/21710#note-17 ivoanjo (Ivo Anjo) wrote:
Weakmap APIs are Ruby-level and can't be used from the NEWOBJ tracepoint.
Because that would recursively invoke the NEWOBJ tracepoint? I thought TracePoint prevents that, doesn't it? And if not, maybe it should? ---------------------------------------- Feature #21722: Expose rb_gc_mark_weak API for use in extensions https://bugs.ruby-lang.org/issues/21722#change-115437 * Author: ivoanjo (Ivo Anjo) * Status: Open ---------------------------------------- In https://bugs.ruby-lang.org/issues/21710 it came up that 1. On top of [deprecating _id2ref](https://bugs.ruby-lang.org/issues/15408) on Ruby 4.0, [it's a bad idea to be using object_id from the NEWOBJ tracepoint](https://bugs.ruby-lang.org/issues/21710#note-8) 2. `rb_gc_mark_weak` which would be the alternative for an extension that needs weak reference-like behavior [is not available for extensions](https://bugs.ruby-lang.org/issues/21710#note-9) So I've opened this ticket to request exposing `rb_gc_mark_weak` so it can be used by extensions? --- The [Datadog Ruby profiler](https://github.com/datadog/dd-trace-rb) is currently using object_id and id2ref to implement its "heap profiling" -- that is, we have a NEWOBJ tracepoint, and from time to time (e.g. not for every object), we select an object, and track its lifetime by keeping its id and checking from time to time if it's still alive. We're using this approach instead of: * The FREEOBJ event => Reduced overhead, as we don't need to be called for every object (+ not needing to deal with corner cases of when FREEOBJ may not be called for an object) * WeakMap => Weakmap APIs are Ruby-level and can't be used from the NEWOBJ tracepoint. (Edit I previously claimed the issue was needing the GVL as well which was not accurate -- we have the GVL inside the tracepoint anyway) For our purposes, it would be OK if this API is not "official" -- e.g. if it's one of those that gets exposed as a public symbol but not documented and no promises made for future Ruby releases. -- https://bugs.ruby-lang.org/
Issue #21722 has been updated by jhawthorn (John Hawthorn). The reason I know of that it's unsafe to allocate (newobj or xmalloc) inside of the NEWOBJ hook is that a lot of code expects there to be a grace period between newobj and the next allocation where it can fill in the object without issuing write barriers and without the object being safe to mark. There are probably other reasons that not everywhere we allocate is a tracepoint.
I thought TracePoint prevents that, doesn't it? And if not, maybe it should?
The `NEWOBJ`/`FREEOBJ` tracepoints are internal so do not have many guardrails. In Ruby 3.4 NEWOBJ was changed to take a VM lock and to disable GC, which makes it a lot more forgiving, but in my opinion is untenable long term. @ivoanjo I don't know if this would work, but what about using `rb_postponed_job_trigger` to delay adding the object into a `WeakMap`? (there would also probably need to be a temporary buffer used to mark the object until it is added) ---------------------------------------- Feature #21722: Expose rb_gc_mark_weak API for use in extensions https://bugs.ruby-lang.org/issues/21722#change-115448 * Author: ivoanjo (Ivo Anjo) * Status: Open ---------------------------------------- In https://bugs.ruby-lang.org/issues/21710 it came up that 1. On top of [deprecating _id2ref](https://bugs.ruby-lang.org/issues/15408) on Ruby 4.0, [it's a bad idea to be using object_id from the NEWOBJ tracepoint](https://bugs.ruby-lang.org/issues/21710#note-8) 2. `rb_gc_mark_weak` which would be the alternative for an extension that needs weak reference-like behavior [is not available for extensions](https://bugs.ruby-lang.org/issues/21710#note-9) So I've opened this ticket to request exposing `rb_gc_mark_weak` so it can be used by extensions? --- The [Datadog Ruby profiler](https://github.com/datadog/dd-trace-rb) is currently using object_id and id2ref to implement its "heap profiling" -- that is, we have a NEWOBJ tracepoint, and from time to time (e.g. not for every object), we select an object, and track its lifetime by keeping its id and checking from time to time if it's still alive. We're using this approach instead of: * The FREEOBJ event => Reduced overhead, as we don't need to be called for every object (+ not needing to deal with corner cases of when FREEOBJ may not be called for an object) * WeakMap => Weakmap APIs are Ruby-level and can't be used from the NEWOBJ tracepoint. (Edit I previously claimed the issue was needing the GVL as well which was not accurate -- we have the GVL inside the tracepoint anyway) For our purposes, it would be OK if this API is not "official" -- e.g. if it's one of those that gets exposed as a public symbol but not documented and no promises made for future Ruby releases. -- https://bugs.ruby-lang.org/
Issue #21722 has been updated by ivoanjo (Ivo Anjo).
@ivoanjo (Ivo Anjo) I don't know if this would work, but what about using rb_postponed_job_trigger to delay adding the object into a WeakMap? (there would also probably need to be a temporary buffer used to mark the object until it is added)
That's a good question. I know when we started off to make this work we were still trying to support old Rubies where `rb_gc_force_recycle` was even a thing so that factored into the design. (We've since abandoned all hope there and support 3.1+ only for heap profiling). There was also not the redesigned `WeakMap`, etc. You're very right in calling this out -- there's a number of assumptions we need to invalidate about why we picked the current design. I'll try it out and report back! I suspect it may take me a bit as the "let's use the object id" assumption is baked pretty deeply in the current design. 😅 ---------------------------------------- Feature #21722: Expose rb_gc_mark_weak API for use in extensions https://bugs.ruby-lang.org/issues/21722#change-115456 * Author: ivoanjo (Ivo Anjo) * Status: Open ---------------------------------------- In https://bugs.ruby-lang.org/issues/21710 it came up that 1. On top of [deprecating _id2ref](https://bugs.ruby-lang.org/issues/15408) on Ruby 4.0, [it's a bad idea to be using object_id from the NEWOBJ tracepoint](https://bugs.ruby-lang.org/issues/21710#note-8) 2. `rb_gc_mark_weak` which would be the alternative for an extension that needs weak reference-like behavior [is not available for extensions](https://bugs.ruby-lang.org/issues/21710#note-9) So I've opened this ticket to request exposing `rb_gc_mark_weak` so it can be used by extensions? --- The [Datadog Ruby profiler](https://github.com/datadog/dd-trace-rb) is currently using object_id and id2ref to implement its "heap profiling" -- that is, we have a NEWOBJ tracepoint, and from time to time (e.g. not for every object), we select an object, and track its lifetime by keeping its id and checking from time to time if it's still alive. We're using this approach instead of: * The FREEOBJ event => Reduced overhead, as we don't need to be called for every object (+ not needing to deal with corner cases of when FREEOBJ may not be called for an object) * WeakMap => Weakmap APIs are Ruby-level and can't be used from the NEWOBJ tracepoint. (Edit I previously claimed the issue was needing the GVL as well which was not accurate -- we have the GVL inside the tracepoint anyway) For our purposes, it would be OK if this API is not "official" -- e.g. if it's one of those that gets exposed as a public symbol but not documented and no promises made for future Ruby releases. -- https://bugs.ruby-lang.org/
participants (4)
-
Eregon (Benoit Daloze) -
ivoanjo (Ivo Anjo) -
jhawthorn (John Hawthorn) -
peterzhu2118 (Peter Zhu)