[ruby-core:125475] [Ruby Feature#22067] New TypedData bit to allow the type to be freed in parallel
Issue #22067 has been reported by luke-gru (Luke Gruber).

----------------------------------------
Feature #22067: New TypedData bit to allow the type to be freed in parallel
https://bugs.ruby-lang.org/issues/22067

* Author: luke-gru (Luke Gruber)
* Status: Open
----------------------------------------
CRuby `TypedData` types are used internally in the VM and in C extensions. Currently, any `free` functions for these types are run with the VM lock held as well as the VM barrier (or, in the case of MMTk, are not freed in parallel). In short, no other Ruby threads or GC threads can run while these `free` functions are called.

In order to allow one or more worker GC threads to free these `TypedData` objects, we need a way to specify that a `TypedData` type can be freed in parallel alongside the Ruby GC thread or Ruby code that is being run by another thread. Otherwise, we cannot free these types in the workers and must rely on the Ruby GC thread to do so. This is because it can be unsafe to call these `free` functions, depending on how they're implemented.

Most `TypedData` are safe to free in parallel. The exceptions are those whose `free` functions read or modify global state without locking.

### Examples

```c
static void
example_data_free(void *ptr)
{
    st_delete(live_example_datas, (st_data_t*)&ptr, NULL); // Not thread-safe!
}
```

If two of these `TypedData` objects are freed at the same time, this could corrupt the `st_table`. A lock must be held when manipulating this table. Because Ruby code can also run alongside a sweep worker, the lock must also be held when adding to or iterating this table.

```c
rb_nativethread_lock_t example_data_lock;

static void
example_data_free(void *ptr)
{
    rb_native_mutex_lock(&example_data_lock);
    st_delete(live_example_datas, (st_data_t*)&ptr, NULL);
    rb_native_mutex_unlock(&example_data_lock);
}

static void
live_examples_add(void *ptr)
{
    rb_native_mutex_lock(&example_data_lock);
    st_insert(live_example_datas, (st_data_t)ptr, (st_data_t)ptr);
    rb_native_mutex_unlock(&example_data_lock);
}

static void
Init_example(void)
{
    rb_native_mutex_initialize(&example_data_lock);
}
```

If a CRuby developer or user wants to make their code compaction-safe, they don’t need to worry about parallel sweep workers because the workers don't run during compaction.

### Proposal

I propose adding a new flag to `TypedData` that allows both CRuby developers and extension authors to opt in to having their `TypedData` type freed in parallel. It could be called something like `RUBY_TYPED_PARALLEL_FREE_SAFE`. In our branch where we are developing parallel sweeping, most `TypedData` internal to the VM are given this flag. I believe C extension authors would opt in as well if they see that parallel sweeping gives good performance benefits to Ruby applications.

We would need to document what is safe and what is not safe inside these `free` functions for types that are marked with this bit. If the user needs to use a native mutex to protect their `TypedData` from being corrupted when freed in parallel, they must do so. However, if they lock this native mutex in non-free function code paths as well, they must not allocate objects or call `ruby_xmalloc` while the mutex is held. These are some examples of trickiness when it comes to concurrency in the Ruby VM. I believe having a section in the extension documentation about these types of free functions (or elsewhere such as the Concurrency Guide) would give CRuby developers and extension authors more confidence in adding this bit to their types.

### A possible future: parallel marking?

If in the future CRuby gets parallel marking, we believe we probably would need another bit for `TypedData` so we can register them as parallel-mark safe. If that’s the case, and if the parallel-free bit were made public, it’s unfortunate that authors who want the new feature would need to update their extension again with a second bit.

One could argue that we should have a single bit that indicates safety for both parallel freeing and parallel marking. However, we believe the specifics of what could and couldn’t be executed inside these `free` and `mark` functions would be too hard to work out today for a combined bit without locking us into a specific parallel marking implementation.

### Alternative

We could choose not to expose the parallel-free-safe bit to extension authors and only free internal `TypedData` objects in parallel. However, this does slow the current implementation of parallel sweeping down, because if even one of these objects is on a heap page, the Ruby GC thread needs to further post-process the page after the sweep thread sweeps it. It would also limit further optimizations to parallel sweeping.

### Details of the bit

Worker threads could only free `TypedData` objects that have this new bit set alongside the `RUBY_TYPED_FREE_IMMEDIATELY` bit. Otherwise, the Ruby GC thread must free them. Also, if the type has `0` or `RUBY_TYPED_DEFAULT_FREE`/`RUBY_DEFAULT_FREE` as the free function, it can be freed in parallel.

### Conclusion

Parallel sweeping is being actively developed, and adding this bit increases the performance of the implementation under development. It allows `TypedData` types to be freed by worker thread(s), which decreases GC pause times. Since `TypedData` are also exposed to extension authors, we need to whitelist these types for parallel freeing, as not all of their `free` functions are currently safe. By documenting what is and isn’t safe, extension authors can write correct `free` functions that are safe to run on worker threads. Alternatively, we could have this bit be internal to CRuby.

-- https://bugs.ruby-lang.org/
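The eligibility rule from "Details of the bit" can be written down as a small predicate. The following is only a standalone sketch under stated assumptions: the `sketch_`/`SKETCH_` names, the struct layout, and the bit values are all placeholders invented for illustration so that it compiles without `ruby.h`, and `SKETCH_TYPED_PARALLEL_FREE_SAFE` mirrors the flag that is merely proposed in this issue. It is not how CRuby or the parallel sweeping branch implements the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the real CRuby definitions (hypothetical names and values). */
typedef void (*sketch_free_func)(void *);

#define SKETCH_DEFAULT_FREE             ((sketch_free_func)-1)
#define SKETCH_TYPED_FREE_IMMEDIATELY   (1u << 0)
#define SKETCH_TYPED_PARALLEL_FREE_SAFE (1u << 1) /* proposed bit, placeholder value */

struct sketch_data_type {
    sketch_free_func dfree; /* custom free function, NULL, or the default */
    unsigned int flags;
};

/* Example of a custom free function that touches no global state. */
static void
sketch_custom_free(void *ptr)
{
    free(ptr);
}

/* True if a sweep worker may free objects of this type itself;
 * false if the Ruby GC thread has to do it. */
static bool
sketch_worker_can_free_p(const struct sketch_data_type *type)
{
    /* No free function, or the default one: nothing user-defined runs. */
    if (type->dfree == NULL || type->dfree == SKETCH_DEFAULT_FREE) {
        return true;
    }
    /* A custom free function needs both bits set. */
    unsigned int required =
        SKETCH_TYPED_FREE_IMMEDIATELY | SKETCH_TYPED_PARALLEL_FREE_SAFE;
    return (type->flags & required) == required;
}
```

Under this reading, a type that sets only the proposed bit without `RUBY_TYPED_FREE_IMMEDIATELY` would still be left to the Ruby GC thread.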
Issue #22067 has been updated by peterzhu2118 (Peter Zhu).

I'm in favor of this feature. We already have the `obj_can_parallel_free_p` function in MMTk to determine if the type can be freed in parallel (which currently does not include `T_DATA`), and this will help improve the percentage of objects that can be freed in parallel.

However, there is one thing that concerns me. You're using the concept of concurrent and parallel interchangeably, which I am worried may not be true in all cases. Concurrent means that the object can be freed while the Ruby VM is running. Parallel means that two objects can be freed at the same time on different threads. Objects that share global state may be safe to be freed concurrently (assuming no live objects can access this global state). However, it may not be safe to be freed in parallel since there may be race conditions. One example of this could be reference counting. However, I can't think of any examples in the wild that are safe to be freed concurrently but not in parallel (or vice versa).

----------------------------------------
Feature #22067: New TypedData bit to allow the type to be freed in parallel
https://bugs.ruby-lang.org/issues/22067#change-117288
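To make the reference-counting concern concrete, here is a minimal standalone sketch of a native resource shared by several wrapper objects, where each wrapper's `free` function would drop one reference. Everything in it (`shared_resource`, its fields, the function names) is hypothetical and not taken from any real extension. With a plain `int` and `refcount--`, two workers freeing two wrappers at the same moment could both read the same value, so the resource might be released twice or never; the version below uses a C11 atomic decrement, which is one possible way to make the release step safe when frees run in parallel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical native resource shared by several wrapper objects. */
struct shared_resource {
    atomic_int refcount; /* number of wrapper objects still alive */
    void *buffer;        /* memory owned by the resource */
};

static struct shared_resource *
shared_resource_new(int owners)
{
    struct shared_resource *res = malloc(sizeof(*res));
    if (res == NULL) return NULL;
    atomic_init(&res->refcount, owners);
    res->buffer = malloc(64);
    return res;
}

/* Called from each wrapper's free function.
 * Returns true if this call dropped the last reference and released the resource. */
static bool
shared_resource_release(struct shared_resource *res)
{
    /* atomic_fetch_sub returns the value before the decrement, so exactly
     * one caller observes 1, even if several threads release at once. */
    if (atomic_fetch_sub(&res->refcount, 1) == 1) {
        free(res->buffer);
        free(res);
        return true;
    }
    return false;
}
```

This only covers the decrement itself. If live objects can also retain the resource while a worker is freeing a dead wrapper, the increment path would need the same care.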
Issue #22067 has been updated by luke-gru (Luke Gruber).

Our current implementation of parallel sweeping has the sweep thread run both concurrently with the mutator and in parallel with the Ruby GC thread. I would prefer a bit that represents the safety in both situations. For MMTk, since it doesn't run concurrently with the mutator, that might mean taking uncontended locks where there were none before when freeing certain T_DATA while the mutator is running. Is this a fair compromise, or do you have something else in mind?

----------------------------------------
Feature #22067: New TypedData bit to allow the type to be freed in parallel
https://bugs.ruby-lang.org/issues/22067#change-117289
Issue #22067 has been updated by peterzhu2118 (Peter Zhu). I'm worried that having both semantics would mean that certain types cannot use it even when it could satisfy one of the two characteristics. If such a case never arises, then it's fine that this one flag satisfies both semantics.
> For MMTk, since it doesn't run concurrently with the mutator,
I think it might be possible to run sweeping concurrently with the mutator, but I've never tried it because it wasn't safe to do before.
> A possible future: parallel marking?
Should already be possible. MMTk already does parallel marking. I don't believe any TypedData object marking is not safe to run in parallel (since marking should never mutate).

----------------------------------------
Feature #22067: New TypedData bit to allow the type to be freed in parallel
https://bugs.ruby-lang.org/issues/22067#change-117290
Issue #22067 has been updated by wks (Kunshan Wang).

File Screenshot_20260513_185452.webp added
File Screenshot_20260513_192449.webp added
File Screenshot_20260513_192734.webp added
File Screenshot_20260513_192910.webp added

If I read correctly, this proposal is about improving the performance of sweeping by running `obj_free` of TypedData concurrently with mutators. But I think this does not change the fact that the TypedData is still subject to finalization (in the form of `obj_free`) even with the proposed `RUBY_TYPED_PARALLEL_FREE_SAFE`.

Note that CRuby has two forms of finalization (here "finalization" refers to the actions that need to be done when an object is deemed unreachable):

1. `obj_free` which, in the case of `TypedData`, can be customized by providing a `dfree` pointer.
2. `ObjectSpace::define_finalizer`.

If an object needs finalization in `obj_free`, its memory cannot be reclaimed until `obj_free` finishes because `obj_free` needs to read its fields. In CRuby, this leads to calling `obj_free` in lazy sweeping (in allocation slow paths), or to "zombie" objects which are kept alive after a GC. If an object needs to release some resources (e.g. freeing memory, closing files, etc.) in `obj_free`, it must be done during sweeping, regardless of whether it is done in the mutator thread or concurrently in a "worker" thread.

# Can we use `ObjectSpace::define_finalizer` instead?

Before talking about the `RUBY_TYPED_PARALLEL_FREE_SAFE` proposal, I'd like to point out that `ObjectSpace::define_finalizer` is a better alternative. If the purpose of this proposal is optimizing the clean-up of TypedData types in the CRuby runtime itself, we should see if `ObjectSpace::define_finalizer` (or a similar finalization mechanism based on weak tables) is feasible. We may encourage extension writers to use `ObjectSpace::define_finalizer` if possible.

The advantage of `ObjectSpace::define_finalizer` is that, because the finalizer registered with `ObjectSpace::define_finalizer` does not have access to the dead object itself, the dead object can be reclaimed regardless of whether the finalizer has been executed or not. This means if a TypedData uses `ObjectSpace::define_finalizer` (instead of `dfree`) as its clean-up mechanism, it can just use `RUBY_TYPED_DEFAULT_FREE`, and `obj_free` doesn't need to invoke `dfree`, either. If the TypedData is also `RUBY_TYPED_EMBEDDABLE`, the cell can be trivially reclaimed without invoking `obj_free()` at all. Compared to concurrent sweeping, which still needs to call `obj_free`, it's better to skip `obj_free` altogether. (In MMTk, we skip calling `obj_free` on TypedData instances that are both `RUBY_TYPED_EMBEDDABLE` and `RUBY_TYPED_DEFAULT_FREE`. In the default GC, `obj_free` is basically a no-op for such objects.)

Note that the Java programming language deprecated `Object.finalize()` in Java 9, and deprecated it for removal in Java 18. Java's alternative to `Object.finalize()` is `PhantomReference` and its convenient wrapper `Cleaner`. It behaves very similarly to Ruby's `ObjectSpace::define_finalizer` in that `PhantomReference` does not have access to the dead object, either, and avoids the infamous "object resurrection" problem in Java's `Object.finalize()`.

# If we have to use `dfree`

If we have to use `dfree` to finalize a TypedData instance, it may be profitable to run it in a "worker" thread concurrently with the mutator. But it may not be profitable to run them in multiple "worker" threads in parallel. From our experience of integrating MMTk with CRuby (see [this paper](https://wks.github.io/downloads/pdf/ruby-ismm-2025.pdf)), (1) finalization in the form of `obj_free` takes up the majority of the GC time when using MMTk, and (2) running `obj_free` in multiple parallel threads slows down the execution of `obj_free` instead of speeding it up because the `free()` function in libc does not scale when called in parallel. Since `dfree` usually calls the `free()` function in libc, it may happen that parallelizing the execution of `obj_free` will make it even slower.

`RUBY_TYPED_PARALLEL_FREE_SAFE` probably doesn't benefit MMTk directly. In MMTk, most GC algorithms, including high-performance algorithms like Immix and StickyImmix, don't do lazy sweeping. (MarkSweep supports lazy sweeping, but it has inferior performance.) They make the memory of the dead objects ready for allocation immediately after GC. This means we need to call `obj_free` during GC in a weak reference processing stage (which is after determining the objects' life and death, but before reclaiming their space). We are currently working hard on reducing the number of objects that need `obj_free`, because in CRuby almost all types are subject to `obj_free` (at least in some states).

It is possible to replicate what Java used to do, i.e. retaining an unreachable object after GC so that a dedicated finalizer thread can finalize it. This deprecated feature is quite similar to this proposal, i.e. a concurrent "worker" thread that sweeps objects. MMTk is able to retain dead objects and postpone the finalization until after GC this way. But the problem is still that the space of those objects cannot be reclaimed (and therefore used for allocation) until their `dfree` functions are called. If we retain too many dead objects, the GC will not free up enough memory to keep up with the pace of allocation. And we will be in an awkward situation where a mutator triggered GC, but the GC cannot reclaim more memory until the finalizer thread (which is another mutator) finalizes more objects and stops retaining them.

It may be worth annotating not just whether a TypedData is "safe to be freed concurrently with the mutator", but also whether "it is profitable to free it in parallel to other objects". For this to be true, the TypedData instance should not hold `malloc()`-ed memory that needs to be `free()`-ed. But that's too much implementation detail, and I don't think programmers can easily reason about it.

# About parallel marking

As Peter said, MMTk is using parallel marking (multiple GC workers marking objects in parallel). It should just work unless the TypedData does anything other than marking in `dmark`, which it shouldn't anyway. `dmark` should just visit all fields that hold references. If the TypedData uses `RUBY_TYPED_DECL_MARKING`, it will have even less interference with the GC.

Marking can profit greatly from parallelization. I think it should be easy for the default GC to parallelize marking.

I added an attachment that shows the timeline of a GC in MMTk when running railsbench using the (non-generational) Immix GC. (Note that this is using the MMTk GC module in https://github.com/ruby/mmtk/ and it doesn't include optimizations mentioned in [this paper](https://wks.github.io/downloads/pdf/ruby-ismm-2025.pdf) such as making `obj_free` unnecessary for T_STRING, T_ARRAY and T_MATCH. So finalization (calling `obj_free`) still dominates.)

I am currently working on using eBPF to visualize the default GC, too. From my current observation (see attachment), there are several long marking events during early execution. But as the program makes progress, sweeping gradually starts to dominate. I am curious what the timeline would look like if we have concurrent sweeping.
----------------------------------------
Feature #22067: New TypedData bit to allow the type to be freed in parallel
https://bugs.ruby-lang.org/issues/22067#change-117304
Issue #22067 has been updated by luke-gru (Luke Gruber).

Thank you for your input Peter and Kunshan, I greatly appreciate it. As you may know, I am no GC expert :)

### Parallel marking

For parallel marking, I agree that the vast majority of all TypedData mark functions are thread-safe. I was involved in a discussion where some others said they've seen problematic mark functions in some extensions (reading/mutating global state, etc.). Basically, it's an "if it can go wrong, it will go wrong" type of situation where user code is concerned when using lots of extensions. But if it's been working for MMTk, of course we could reconsider if we ever get to implementing it.

### Parallel Sweeping Perf

We do both concurrent and parallel sweeping, but parallel sweeping does give benefits even when using libc and not another allocator like jemalloc/tcmalloc that can handle parallel calls to `free` well. We only have 1 "worker" thread, so this reduces contention, and lots of objects don't require calls to `free`, as they're embedded. I think in the future, many more objects will be embedded as well. CRuby's GC has many size classes in its allocator, and in the future hopefully it gets large size classes so we can embed even more objects that now require an external allocation. We also want to look into having special pages for objects that require no finalization.

### Lazy Sweeping

We're working under the constraints of Ruby's current GC architecture, so it was definitely easiest to keep lazy sweeping. Background (concurrent) sweeping is working out well for us too. Also, CRuby has historically cared a lot about having a lightweight footprint. We couldn't add so many worker threads to CRuby's default GC imo.

### Using define_finalizer

Finalizers need to run on the main Ruby thread while the thread holds the GVL. Maybe I'm confused about the terminology you're using.

### Having finalization thread

I like this idea and I'm aware that other GCs do it. I also don't think it's orthogonal to having a worker thread, especially when the worker works in parallel with Ruby's GC thread. It could definitely reduce pause time, I think. The downside would be that objects get kept around for an extra cycle after being swept, like you said. We would also need a separate finalization thread for it, and there would have to be synchronization with the mutator since it's putting objects back on the page's freelist.

### Responses
RUBY_TYPED_PARALLEL_FREE_SAFE probably doesn't profit MMTk directly.
That's too bad if that's the case. Peter seems to think it would, although maybe you meant something by __directly__ that I didn't catch.

----------------------------------------
Feature #22067: New TypedData bit to allow the type to be freed in parallel
https://bugs.ruby-lang.org/issues/22067#change-117308
Issue #22067 has been updated by jhawthorn (John Hawthorn). peterzhu2118 (Peter Zhu) wrote in #note-3:
since marking should never mutate
FWIW, `ractor_local_storage_mark`, `rb_thread_sched_mark_zombies`, `rb_iseq_mark_and_move` are examples where mutation happens as part of the mark step. I suspect/hope they do end up safe to run in parallel because they're only mutating their own storage. I don't know whether any C extensions have even less well behaved mark functions, but hopefully we can assume mark is safe to run in parallel and consider that a bug in the extension. wks (Kunshan Wang) wrote in #note-4:
# Can we use `ObjectSpace::define_finalizer` instead?
I think we're all in agreement that the ideal user-defined objects are `RUBY_TYPED_EMBEDDABLE` and use `RUBY_TYPED_DEFAULT_FREE`, i.e. they require no finalization and can just be immediately reclaimed.

For other objects, we are stuck doing some form of finalization; the question is when that is expected to run. With no `RUBY_TYPED_*` flags, the requirement is that they are run where an `ObjectSpace::define_finalizer` finalizer would run: on the mutator thread, holding the GVL, at a point where interrupts are checked, with a real Ruby frame set up. `RUBY_TYPED_FREE_IMMEDIATELY` relaxes this requirement to allow it to be run as part of GC rather than deferred. I think MMTk also re-interpreted this flag to mean that it's safe to run on a separate thread, which is great if it hasn't caused compatibility issues. However, a few types using `RUBY_TYPED_FREE_IMMEDIATELY` are still leaning on the assumption that the mutator has stopped; mostly we see this as removing from a linked list or deleting from a global table.

`ObjectSpace::define_finalizer`, because it requires the full GVL and needs to be deferred to an interrupt check, is awkward and bad for performance (actually there are even more guarantees it _should_ require that aren't currently implemented correctly, like sticking to the defining Ractor). It also tends to hurt JIT performance (at least on microbenchmarks). You can get `ObjectSpace::define_finalizer`-style behavior everywhere by simply setting no flags on the TypedData: that allows the object slot itself to be freed immediately and defers the dfree function on the external malloc memory until later, but I don't think that's desirable. `ObjectSpace::define_finalizer` should be discouraged.

I think I understand your objections to needing to keep the slot around like Java's `Object.finalize`, but I think that objection should be with the `RUBY_TYPED_EMBEDDABLE` flag, rather than `RUBY_TYPED_PARALLEL_FREE_SAFE`. This proposal only gives the GC more flexibility with these objects, and the flag is safe to ignore if you want.

---

Being run concurrently with the mutator is the significant win here, but I would like the flag to guarantee that both concurrent and parallel freeing are fine. The vast majority of dfree functions out there are already safe for both. (Also, is there a name that would better imply that? `RUBY_TYPED_THREAD_SAFE_FREE`?)

A second goal of this flag is that if a Ractor-local GC is introduced in the future, this flag would provide a superset of the semantics that it requires (i.e. the dfree is safe to run without stopping other Ractors' mutators, and the dfree is safe to run if another Ractor happens to be GC-ing).

I also think we should make `RUBY_TYPED_PARALLEL_FREE_SAFE` imply `RUBY_TYPED_FREE_IMMEDIATELY`.

----------------------------------------
Feature #22067: New TypedData bit to allow the type to be freed in parallel
https://bugs.ruby-lang.org/issues/22067#change-117309
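As a rough illustration of the flag semantics discussed in this thread, the decision of where a dead `TypedData` object may be freed could be sketched as below. This is a standalone sketch, not CRuby code: the `SKETCH_*` bit values, `enum free_site`, `choose_free_site`, `sketch_default_free` and `custom_free` are hypothetical, and the rule that the parallel-safe bit implies free-immediately is only the suggestion made above, not settled behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bit values for illustration only; not CRuby's real ones. */
enum {
    SKETCH_FREE_IMMEDIATELY   = 1 << 0,
    SKETCH_PARALLEL_FREE_SAFE = 1 << 1,
};

typedef void (*dfree_fn)(void *);

/* Stand-in for the default free function (assumption). */
static void sketch_default_free(void *p) { free(p); }

/* Stand-in for an extension's own free function (assumption). */
static void custom_free(void *p) { (void)p; }

enum free_site {
    FREE_ON_WORKER,     /* a sweep worker may call dfree */
    FREE_ON_GC_THREAD,  /* only the Ruby GC thread may call dfree */
    FREE_DEFERRED       /* dfree is deferred, finalizer-style */
};

static enum free_site
choose_free_site(unsigned flags, dfree_fn dfree)
{
    /* No free function, or the default one: fine on a worker. */
    if (dfree == NULL || dfree == sketch_default_free) return FREE_ON_WORKER;
    /* Suggested rule: the parallel-safe bit implies free-immediately. */
    if (flags & SKETCH_PARALLEL_FREE_SAFE) return FREE_ON_WORKER;
    /* Safe during GC, but written assuming other threads are stopped. */
    if (flags & SKETCH_FREE_IMMEDIATELY) return FREE_ON_GC_THREAD;
    /* No flags: run later on the mutator thread. */
    return FREE_DEFERRED;
}
```

A GC that wants to ignore the new bit can simply treat `FREE_ON_WORKER` like `FREE_ON_GC_THREAD`, which is why the flag only adds flexibility.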
Issue #22067 has been updated by wks (Kunshan Wang). luke-gru (Luke Gruber) wrote in #note-5:
... some others said they've seen problematic mark functions in some extensions (reading/mutating global state, etc.) ...
jhawthorn (John Hawthorn) wrote in #note-6:
FWIW, `ractor_local_storage_mark`, `rb_thread_sched_mark_zombies`, `rb_iseq_mark_and_move` are examples where mutation happens as part of the mark step. I suspect/hope they do end up safe to run in parallel because they're only mutating their own storage. I don't know whether any C extensions have even less well behaved mark functions, but hopefully we can assume mark is safe to run in parallel and consider that a bug in the extension.
Yes. That's exactly the case I was referring to. For example, `rb_iseq_mark_and_move` tries to do clean-up during marking, and `rb_imemo_mark_and_move` attempts to clean up `imemo_callcache` during reference updating. This breaks the abstraction layer. From the GC's point of view, the VM should provide an object-scanning function that simply enumerates fields that hold references, and does nothing else. Cleaning up is part of finalization (e.g. the `callcache` should have used weak reference semantics to do finalization). That's why Peter introduced the `*_mark_and_move` functions in the first place, i.e. to make a unified field-enumerating function that works for both marking and reference updating (and may even be directly called by `rb_objspace_reachable_objects_from`).

`RUBY_TYPED_DECL_MARKING` is even better because the developer of the TypedData doesn't need to write C functions for dmark or dcompact, leaving no room for them to introduce behaviors like attempting to clean up objects or accessing global data during marking. MMTk and other GC modules can benefit from `RUBY_TYPED_DECL_MARKING` because they can interpret the offsets table directly, avoiding the overhead of calling into C functions and setting up callbacks that intercept `rb_gc_impl_mark`.

luke-gru (Luke Gruber) wrote in #note-5:
Finalizers need to run on the main Ruby thread while the thread holds the GVL. Maybe I'm confused about the terminology you're using.
jhawthorn (John Hawthorn) wrote in #note-6:
`ObjectSpace::define_finalizer` because it requires the full GVL and needs to be deferred to an interrupt check is awkward and bad for performance (actually there are even more guarantees it _should_ require that aren't currently implemented correctly, like sticking to the defining Ractor)...
Sorry about the confusion. I didn't realize `ObjectSpace::define_finalizer` needs the GVL. Java doesn't have a GIL, so the threads that process finalizers, `PhantomReference` and `Cleaner` can run concurrently with other Java threads.

When I mentioned `ObjectSpace::define_finalizer`, I was referring to a finalization mechanism that is similar to `PhantomReference` and `Cleaner` in Java. Specifically, it

1. Doesn't have access to the dead object itself, therefore allowing the memory of the dead object to be immediately reused after GC.
2. Has all the context (e.g. the file descriptor, the memory pointer, etc.) needed to do the cleanup, and therefore can be executed even if the dead object is reclaimed.

In the MMTk GC module, we have something called "final jobs". Each "final job" is basically a tuple of `(free_func, data)`, and it is executed by calling `free_func(data)`. It is used to do what "zombies" do in CRuby's default GC, but does not leave dead objects in the heap after GC. Currently they are still executed at interrupts like zombies, but it should be possible to offload their execution to a dedicated thread, because currently zombies are for cleaning up IO or file objects, and they shouldn't need the GVL to clean up.

So what I think is better for cleaning up TypedData is something like the "final job" (tuple of `(free_func, data)`) that can be executed concurrently with the mutator. It can even run concurrently with the sweeper thread which sweeps the heap pages. In this case, a `RUBY_TYPED_EMBEDDABLE | RUBY_TYPED_DEFAULT_FREE` object can be trivially swept.

I wonder how we can present this interface to the extension developer. The simplest API is just a global function that lets the developer register the triple `(object, free_func, data)`, for example,

```c
void ruby_register_cleanup_function(VALUE object, void (*free_func)(void*), void *data);
```

which basically means "if `object` is dead, call `free_func(data)`".
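For illustration, the "final job" idea could be sketched as a small queue that the GC fills and a separate cleanup thread drains. This is a standalone sketch using pthreads, not the MMTk GC module's actual code: `struct final_job`, `struct final_job_queue`, `final_job_queue_init`, `final_job_enqueue` and `final_jobs_run` are hypothetical names, and a real GC would probably avoid calling `malloc` while collecting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One "final job": call free_func(data) once the owning object is dead. */
struct final_job {
    void (*free_func)(void *);
    void *data;
    struct final_job *next;
};

/* Hypothetical queue shared by the GC (producer) and a cleanup thread
 * (consumer). */
struct final_job_queue {
    pthread_mutex_t lock;
    struct final_job *head;
};

static void
final_job_queue_init(struct final_job_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = NULL;
}

/* Called when the GC finds a dead object. It only records the job; the
 * object's slot can be reclaimed right away. Returns 0 on success. */
static int
final_job_enqueue(struct final_job_queue *q,
                  void (*free_func)(void *), void *data)
{
    struct final_job *job = malloc(sizeof(*job));
    if (job == NULL) return -1;
    job->free_func = free_func;
    job->data = data;
    pthread_mutex_lock(&q->lock);
    job->next = q->head;          /* LIFO order in this sketch */
    q->head = job;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Called from the cleanup thread: detach the whole batch under the lock,
 * then run the jobs without holding it. Returns the number of jobs run. */
static size_t
final_jobs_run(struct final_job_queue *q)
{
    pthread_mutex_lock(&q->lock);
    struct final_job *job = q->head;
    q->head = NULL;
    pthread_mutex_unlock(&q->lock);

    size_t n = 0;
    while (job != NULL) {
        struct final_job *next = job->next;
        job->free_func(job->data);
        free(job);
        job = next;
        n++;
    }
    return n;
}
```

Running the jobs outside the lock keeps the GC's enqueue path short, and it means a slow `free_func` only delays other clean-ups, not the mutator or the sweeper.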
We can require that this `free_func` must be callable without holding the GVL so that it can be executed concurrently with mutators (I think that is the intended semantics of `RUBY_TYPED_PARALLEL_FREE_SAFE`). Maybe we can give a warning if the `object` is not `RUBY_TYPED_EMBEDDABLE | RUBY_TYPED_DEFAULT_FREE`, because if we use this mechanism for finalization, we won't need anything other than the default free, and the finalizer doesn't need to read the object body, so it doesn't matter if it is embedded or not.

luke-gru (Luke Gruber) wrote in #note-5:
That's too bad if that's the case. Peter seems to think it would, although maybe you meant something by *directly* that I didn't catch.
By "directly" I meant MMTk won't be able to use it if the MMTk GC module still calls `obj_free` on all dead objects during GC. But if we are able to offload the clean-up to mutator time (e.g. using the `(free_func, data)` approach I mentioned earlier), it will still be applicable to MMTk. MMTk just needs to identify dead objects and enqueue the "final jobs" during GC, and they can be executed during mutator time. (Similarly, the default GC can identify such objects during sweeping, and the `free_func(data)` can be executed in parallel to sweeping.)

I don't think resurrecting objects like Java's `Object.finalize` is a good idea.

jhawthorn (John Hawthorn) wrote in #note-6:
... `RUBY_TYPED_FREE_IMMEDIATELY` relaxes this requirement to allow it to be run as part of GC rather than deferred...
Thank you for pointing this out. I have always thought `RUBY_TYPED_FREE_IMMEDIATELY` was a constraint (rather than a relaxation): that an object must be freed before GC finishes (i.e. its freeing cannot be postponed to mutator time) so that its memory can be reclaimed immediately after GC. Currently, MMTk always runs the `obj_free` of *all dead objects* during GC.

With the intention of "`RUBY_TYPED_FREE_IMMEDIATELY` being a relaxation" in mind, I think that means the cleanup of those objects can be postponed to mutator time (in a thread that does not hold the GVL), as long as we can extract the `(free_func, data)` pair for each of them. The rationale is that parallelizing the invocation of `obj_free` is unprofitable because `free()` doesn't scale (glibc, jemalloc and tcmalloc scale negatively, and mimalloc scales only up to a 4x speedup; see Table 1 of [this paper](https://wks.github.io/downloads/pdf/ruby-ismm-2025.pdf)), most applications spend only a small fraction of their time on GC, and CRuby's GVL leaves much CPU capacity idle during mutator time.

P.S. It is interesting that some people want to offload finalization to GC time because of the GVL, while others want to offload it to mutator time because running it in parallel during GC is unprofitable given how poorly `free()` scales. Their goals converge, though: both want finalization to be executed at an appropriate time without keeping other threads (mutators or GC workers) waiting.

----------------------------------------
Feature #22067: New TypedData bit to allow the type to be freed in parallel
https://bugs.ruby-lang.org/issues/22067#change-117318

* Author: luke-gru (Luke Gruber)
* Status: Open
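The postponement idea in the reply above (record a `(free_func, data)` pair during GC, run it later at mutator time) could look roughly like the following. This is a minimal, single-threaded sketch, not how CRuby or MMTk implement it: every name here (`deferred_free_queue_t`, `deferred_free_push`, `deferred_free_drain`) is hypothetical, and a real implementation would need synchronization between the GC and the draining thread and would likely avoid calling `malloc` during GC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types and names; none of these are CRuby or MMTk APIs. */
typedef void (*free_func_t)(void *);

typedef struct deferred_free {
    free_func_t free_func;        /* the type's free function */
    void *data;                   /* the pointer that function should receive */
    struct deferred_free *next;
} deferred_free_t;

typedef struct {
    deferred_free_t *head;
    size_t length;
} deferred_free_queue_t;

/* During GC: record the pair for a dead object instead of calling
 * free_func right away. Returns 0 on success, -1 if out of memory. */
static int
deferred_free_push(deferred_free_queue_t *q, free_func_t free_func, void *data)
{
    assert(free_func != NULL);
    deferred_free_t *node = malloc(sizeof(*node));
    if (node == NULL) return -1;
    node->free_func = free_func;
    node->data = data;
    node->next = q->head;
    q->head = node;
    q->length++;
    return 0;
}

/* At mutator time, from a thread that does not hold the GVL: run every
 * recorded free function. Returns the number of entries processed. */
static size_t
deferred_free_drain(deferred_free_queue_t *q)
{
    deferred_free_t *node = q->head;
    size_t count = 0;
    q->head = NULL;
    q->length = 0;
    while (node != NULL) {
        deferred_free_t *next = node->next;
        node->free_func(node->data);
        free(node);
        node = next;
        count++;
    }
    return count;
}
```

A caller would start from a zero-initialized `deferred_free_queue_t`, call `deferred_free_push(&q, free_func, data)` once per dead object during GC, and later call `deferred_free_drain(&q)` from the background thread. Only free functions that are safe to run concurrently with Ruby code could be queued this way, which is the same safety question the proposed bit is meant to answer.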
participants (4)
- jhawthorn (John Hawthorn)
- luke-gru (Luke Gruber)
- peterzhu2118 (Peter Zhu)
- wks (Kunshan Wang)