Issue #22067 has been updated by luke-gru (Luke Gruber).

Thank you for your input, Peter and Kunshan; I greatly appreciate it. As you may know, I am no GC expert :)

### Parallel marking

For parallel marking, I agree that the vast majority of TypedData mark functions are thread-safe. However, I was involved in a discussion where others said they had seen problematic mark functions in some extensions (reading or mutating global state, etc.). Basically, once an application uses lots of extensions, user code is an "if it can go wrong, it will go wrong" situation. But if this has been working for MMTk, we could of course reconsider if we ever get around to implementing parallel marking.

### Parallel Sweeping Perf

We do both concurrent and parallel sweeping. Parallel sweeping does give benefits even when using libc's allocator rather than one like jemalloc/tcmalloc that handles parallel calls to `free` well. We only have one worker thread, which limits contention, and lots of objects don't require calls to `free` at all because they're embedded. I think many more objects will be embedded in the future as well: the CRuby GC allocator has many size classes, and hopefully it will eventually get large size classes so we can embed even more objects that currently require an external allocation. We also want to look into having special pages for objects that require no finalization.

### Lazy Sweeping

We're working under the constraints of Ruby's current GC architecture, so it was definitely easiest to keep lazy sweeping. Background (concurrent) sweeping is working out well for us too. Also, CRuby has historically cared a lot about having a lightweight footprint, so in my opinion we couldn't add that many worker threads to CRuby's default GC.

### Using `define_finalizer`

Finalizers need to run on the main Ruby thread while that thread holds the GVL. Maybe I'm confused about the terminology you're using.

### Having a finalization thread

I like this idea and I'm aware that other GCs do it.
I also don't think it's orthogonal to having a worker thread, especially when the worker runs in parallel with Ruby's GC thread. I think it could reduce pause times. The downside would be that objects get kept around for an extra cycle after being swept, like you said. We would also need a separate thread for finalization, and it would have to synchronize with the mutator, since it would be putting objects back on the page's freelist.

### Responses
> RUBY_TYPED_PARALLEL_FREE_SAFE probably doesn't profit MMTk directly.
That's too bad if that's the case. Peter seems to think it would, although maybe you meant something by __directly__ that I didn't catch.

----------------------------------------
Feature #22067: New TypedData bit to allow the type to be freed in parallel
https://bugs.ruby-lang.org/issues/22067#change-117308

* Author: luke-gru (Luke Gruber)
* Status: Open

----------------------------------------
CRuby `TypedData` types are used internally in the VM and in C extensions. Currently, any `free` functions for these types are run with the VM lock held as well as the VM barrier (or, in the case of MMTk, are not run in parallel). In short, no other Ruby threads or GC threads can run while these `free` functions are called.

In order to allow one or more GC worker threads to free these `TypedData` objects, we need a way to specify that a `TypedData` type can be freed in parallel with the Ruby GC thread or with Ruby code being run by another thread. Otherwise, we cannot free these types in the workers and must rely on the Ruby GC thread to do so, because it can be unsafe to call these `free` functions, depending on how they're implemented.

Most `TypedData` are safe to free in parallel. The exceptions are `free` functions that read or modify global state without locking.

### Examples

```c
#include <ruby/st.h>

static st_table *live_example_datas; /* global registry of live objects */

static void example_data_free(void *ptr) {
    st_delete(live_example_datas, (st_data_t *)&ptr, NULL); // Not thread-safe!
}
```

If two of these `TypedData` objects are freed at the same time, this could corrupt the `st_table`. A lock must be held when manipulating this table. Because Ruby code can also run alongside a sweep worker, the lock must also be held when adding to or iterating over this table.
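The locking requirement above can be modeled outside of Ruby. The sketch below is a hypothetical standalone program, not CRuby code: it uses POSIX threads and a fixed-size array in place of `rb_native_mutex_*` and `st_table`, and the names `registry_add`, `registry_remove`, `sweep_worker`, `mutator`, and `run_model` are invented for illustration. Two threads play sweep workers running the `free` path while a third plays the mutator registering new objects; because every access takes the same lock, the final count is deterministic.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the extension's global st_table:
 * a fixed-size array of live pointers guarded by one mutex. */
#define REGISTRY_CAP 1024
#define N_PREEXISTING 400 /* objects swept by the two workers */
#define N_NEW 200         /* objects created by the mutator meanwhile */

static void *registry[REGISTRY_CAP];
static size_t registry_len;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Mutator path: register a newly created object (dropped if full). */
static void registry_add(void *ptr) {
    pthread_mutex_lock(&registry_lock);
    if (registry_len < REGISTRY_CAP)
        registry[registry_len++] = ptr;
    pthread_mutex_unlock(&registry_lock);
}

/* Free-function path: may run on a sweep worker concurrently with
 * registry_add() and with other frees. Swap-removes under the lock. */
static void registry_remove(void *ptr) {
    pthread_mutex_lock(&registry_lock);
    for (size_t i = 0; i < registry_len; i++) {
        if (registry[i] == ptr) {
            registry[i] = registry[--registry_len];
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
}

static char objs[N_PREEXISTING + N_NEW]; /* addresses serve as fake objects */

struct range { size_t begin, end; };

static void *sweep_worker(void *arg) {
    const struct range *r = arg;
    for (size_t i = r->begin; i < r->end; i++)
        registry_remove(&objs[i]);
    return NULL;
}

static void *mutator(void *arg) {
    (void)arg;
    for (size_t i = N_PREEXISTING; i < N_PREEXISTING + N_NEW; i++)
        registry_add(&objs[i]);
    return NULL;
}

/* Registers the preexisting objects, then lets two workers free them all
 * while the mutator adds N_NEW more. Returns the number of live entries. */
static size_t run_model(void) {
    for (size_t i = 0; i < N_PREEXISTING; i++)
        registry_add(&objs[i]);

    struct range first = {0, N_PREEXISTING / 2};
    struct range second = {N_PREEXISTING / 2, N_PREEXISTING};
    pthread_t w1, w2, m;
    pthread_create(&w1, NULL, sweep_worker, &first);
    pthread_create(&w2, NULL, sweep_worker, &second);
    pthread_create(&m, NULL, mutator, NULL);
    pthread_join(w1, NULL);
    pthread_join(w2, NULL);
    pthread_join(m, NULL);

    pthread_mutex_lock(&registry_lock);
    size_t live = registry_len;
    pthread_mutex_unlock(&registry_lock);
    return live;
}
```

A single call to `run_model()` returns `N_NEW` (200) regardless of how the threads interleave. Dropping the lock from either `registry_add` or `registry_remove` turns this into a data race, which is the situation the unlocked `example_data_free` above is in.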
```c
#include <ruby/ruby.h>
#include <ruby/st.h>
#include <ruby/thread_native.h>

static st_table *live_example_datas;
static rb_nativethread_lock_t example_data_lock;

static void example_data_free(void *ptr) {
    rb_native_mutex_lock(&example_data_lock);
    st_delete(live_example_datas, (st_data_t *)&ptr, NULL);
    rb_native_mutex_unlock(&example_data_lock);
}

static void live_examples_add(void *ptr) {
    rb_native_mutex_lock(&example_data_lock);
    st_insert(live_example_datas, (st_data_t)ptr, (st_data_t)ptr);
    rb_native_mutex_unlock(&example_data_lock);
}

/* Not static: Ruby must be able to find Init_example when loading the extension. */
void Init_example(void) {
    rb_native_mutex_initialize(&example_data_lock);
    live_example_datas = st_init_numtable();
}
```

If a CRuby developer or user wants to make their code compaction-safe, they don’t need to worry about parallel sweep workers, because the workers don't run during compaction.

### Proposal

I propose adding a new flag to `TypedData` that allows both CRuby developers and extension authors to opt in to having their `TypedData` type freed in parallel. It could be called something like `RUBY_TYPED_PARALLEL_FREE_SAFE`. In our branch where we are developing parallel sweeping, most `TypedData` types internal to the VM are given this flag. I believe C extension authors would opt in as well if they see that parallel sweeping gives Ruby applications good performance benefits.

We would need to document what is safe and what is not safe inside the `free` functions of types marked with this bit. If the user needs a native mutex to protect their `TypedData` from being corrupted when freed in parallel, they must use one. However, if they also lock this native mutex in code paths other than the `free` function, they must not allocate objects or call `ruby_xmalloc` while the mutex is held. These are some examples of the trickiness of concurrency in the Ruby VM. I believe a section about these kinds of `free` functions in the extension documentation (or elsewhere, such as the Concurrency Guide) would give CRuby developers and extension authors more confidence in adding this bit to their types.

### A possible future: parallel marking?
If CRuby gets parallel marking in the future, we believe we would probably need another `TypedData` bit so types can be registered as parallel-mark safe. If that's the case and this bit were made public, it's unfortunate that authors who want that feature would need to update their extensions again with a new bit. One could argue that we should have a single bit that indicates safety for both parallel freeing and parallel marking. However, we believe the specifics of what could and couldn’t be executed inside these `free` and `mark` functions would be too hard to work out today for a combined bit without locking us into a specific parallel marking implementation.

### Alternative

We could avoid exposing the parallel-free-safe bit to extension authors and only free internal `TypedData` objects in parallel. However, this slows down the current implementation of parallel sweeping: if even one of these objects is on a heap page, the Ruby GC thread needs to further post-process the page after the sweep thread sweeps it. It would also limit further optimizations to parallel sweeping.

### Details of the bit

Worker threads could only free `TypedData` objects whose type has this new bit set alongside the `RUBY_TYPED_FREE_IMMEDIATELY` bit. Otherwise, the Ruby GC thread must free them. Also, if the type has `0` or `RUBY_TYPED_DEFAULT_FREE`/`RUBY_DEFAULT_FREE` as its free function, it can be freed in parallel.

### Conclusion

Parallel sweeping is being actively developed, and adding this bit improves the performance of the in-progress implementation. It allows `TypedData` types to be freed by worker thread(s), which decreases GC pause times. Since `TypedData` are also exposed to extension authors, we need to whitelist these types for parallel freeing, as not all of their `free` functions are currently safe. By documenting what is and isn’t safe, extension authors can write correct `free` functions that are safe to be called by worker threads.
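One reading of the rule under "Details of the bit" can be written as a small predicate. This is a hypothetical, self-contained model, not CRuby's code: `struct model_type`, `MODEL_FREE_IMMEDIATELY`, `MODEL_PARALLEL_FREE_SAFE`, and `model_default_free` are invented stand-ins for `rb_data_type_t`, `RUBY_TYPED_FREE_IMMEDIATELY`, the proposed `RUBY_TYPED_PARALLEL_FREE_SAFE`, and `RUBY_TYPED_DEFAULT_FREE`, and the bit values are arbitrary.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag bits; values are arbitrary, not CRuby's. */
#define MODEL_FREE_IMMEDIATELY   (1u << 0)
#define MODEL_PARALLEL_FREE_SAFE (1u << 1) /* the proposed opt-in bit */

typedef void (*model_dfree_t)(void *);

/* Hypothetical stand-in for a type descriptor (cf. rb_data_type_t). */
struct model_type {
    model_dfree_t dfree; /* NULL models a `0` free function */
    unsigned flags;
};

/* Stand-in for RUBY_TYPED_DEFAULT_FREE: the data pointer is simply freed,
 * with no type-specific logic. */
static void model_default_free(void *ptr) { free(ptr); }

/* Returns true if a sweep worker may free objects of this type;
 * false means the Ruby GC thread must free them. */
static bool worker_may_free(const struct model_type *type) {
    /* No free function, or only the default one: nothing custom runs. */
    if (type->dfree == NULL || type->dfree == model_default_free)
        return true;
    /* A custom free function must opt in with both bits. */
    const unsigned required = MODEL_FREE_IMMEDIATELY | MODEL_PARALLEL_FREE_SAFE;
    return (type->flags & required) == required;
}
```

Under this model, a type with a custom free function and only `MODEL_FREE_IMMEDIATELY` set stays on the Ruby GC thread, while setting both bits makes it eligible for a worker. The text does not say whether types with a `0` or default free function also need `RUBY_TYPED_FREE_IMMEDIATELY`; the sketch assumes they do not.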
Alternatively, we could have this bit be internal to CRuby.

---Files--------------------------------
Screenshot_20260513_185452.webp (99.8 KB)
Screenshot_20260513_192449.webp (31.9 KB)
Screenshot_20260513_192734.webp (22.4 KB)
Screenshot_20260513_192910.webp (27.3 KB)

-- 
https://bugs.ruby-lang.org/