[ruby-core:124634] [Ruby Feature#21852] New improved allocator function interface
Issue #21852 has been reported by byroot (Jean Boussier). ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by Eregon (Benoit Daloze). Are `initialize_copy/initialize_dup/initialize_clone` still called when using a `copy_allocator`? In the `backtrace_alloc` example you seem to already do some copying in that function, which then makes it unclear where the copying should be done.
First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs.
How about using `rb_undef_method(klass, "allocate");` for such cases? I think that's a simple solution and requires no changes. Fundamentally there is a difference between an `allocate` method and an `alloc function`, new/dup/clone all require an `alloc function`, but they should not require (and they don't IIRC) an `allocate` method. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116245 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by byroot (Jean Boussier).
Are initialize_copy/initialize_dup/initialize_clone still called when using a copy_allocator?
Yes, it is unchanged.
In the backtrace_alloc example you seem to already do some copying in that function, which then makes it unclear where the copying should be done.
Indeed. Technically it wouldn't be required, but I think it's more reliable to do it there than in `initialize_copy` as the later could e redefined and cause corruption.
How about using rb_undef_method(klass, "allocate"); for such cases?
It's a corner case, but that allows redefining it later on. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116246 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by Eregon (Benoit Daloze). byroot (Jean Boussier) wrote in #note-2:
Indeed. Technically it wouldn't be required, but I think it's more reliable to do it there than in `initialize_copy` as the later could be redefined and cause corruption.
That could cause leaks if copying state involves extra allocations though, as a previously-existing `initialize_copy` might allocate and just set the pointers, but not free that first copy done in the `copy_allocator`. It makes the contract unclear about what is supposed to copy what. I think we need to trust `initialize_copy`, or invent a new Ruby-level protocol for copying objects. Inventing a new Ruby-level protocol for copying objects and for creation without uninitialized state would be great. Some core classes already do this but then they typically don't support dup/clone. It'd be great to have this in general, so one could write classes that never have to care about uninitialized state since there are no instances in that uninitialized state ever. We'd have some method to do both allocation + initialization at once, and another method to create a copy + initialize it as one call.
How about using rb_undef_method(klass, "allocate"); for such cases?
It's a corner case, but that allows redefining it later on.
I don't think we need to worry about this corner case. Such things are clearly violating internals of the class, and then they might as well `rb_define_alloc_func()` and break it too. But I suppose for core classes there might be a point to try to not segfault in that case, mmh. Maybe classes should have a flag for "allow/disallow Class#allocate" and then `Class#allocate` would check that? There is already a check for singleton classes, so we could merge it with that check for free and just have singleton classes always set that flag to false. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116247 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by byroot (Jean Boussier).
Inventing a new Ruby-level protocol for copying objects and for creation without uninitialized state would be great.
Yes, this is somewhat what this new allocator API does. It also solves the problem that the Ractor API must be able to clone objects but can hardly trust user defined `initialize_copy` methods. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116248 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by Eregon (Benoit Daloze). It only does it for classes defined in C, and if they do all state copying in `copy_allocator`. I think this would be valuable to have for any class, i.e. also for classes defined in Ruby and not in C. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116249 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by Eregon (Benoit Daloze). Thinking more about this I think there should be a protocol or an easy way to avoid the allocated-but-uninitialized state completely, which is problematic for classes defined in C but also in Ruby (though Ruby-defined classes will typically raise instead of segfault in such cases, but still the error can be cryptic and it's bad to have to deal with potentially-uninitialized objects). I think we would need only two things: * A method to allocate+initialize at once. This must be defined as a method of `Class`. * There is already `Class#new`, so overriding that seems the simplest. * A method to allocate+initialize_copy at once. * This could be either an instance method, or a class taking an existing object as argument. * Overriding `Kernel#dup` seems the easiest here, and maybe define `Kernel#clone` as `dup` + extra clone logic to simplify. In code: ```ruby module Kernel def clone copy = dup # extra clone logic like freezing end end # Usage class MyClassWithoutUninitializedState ALLOCATE = method(:allocate) private_const :ALLOCATE def self.new(*args) ALLOCATE.call.tap { it.send(:initialize) } end def dup ALLOCATE.call.tap { |copy| copy.send(:initialize_copy, self) } end # Could either: undef_method :allocate # but that still allow using e.g. Class.instance_method(:allocate).bind_call(MyClassWithoutUninitializedState) # Or remove_alloc_function # Same as rb_undef_alloc_func(), this would be a new method of Class # but that probably would break `ALLOCATE.call` above end ``` And similar for classes defined in C. In C it's easy to call the alloc function directly even when not registered as such on the Class. The tricky part of classes defined in Ruby is how to call the alloc function in overridden new/dup and still not expose it to users. So I think a better solution would be a new Class method that defines `new` and `dup` and `clone` while taking a snapshot of the current `alloc_func` and then removes the `alloc_func`, then that would work reliably. Say `Class#safe_initialization` which does something equivalent to (but more performant): ```ruby class Class def safe_initialization alloc_function = rb_get_alloc_func(self) define_singleton_method(:new) do |...| alloc_function.call.tap { it.send(:initialize) } end define_method(:dup) do alloc_function.call.tap { |copy| copy.send(:initialize_dup, self) } end define_method(:clone) do alloc_function.call.tap { |copy| copy.send(:initialize_clone, self) } # extra clone logic end rb_undef_alloc_func(self) return self end end # Usage class MyClassWithoutUninitializedState safe_initialization end ``` And that would work for both classes defined in Ruby and C. The only methods which would see uninitialized instances would be `initialize, initialize_copy, initialize_dup, initialize_clone`, which seems reasonable and also unavoidable anyway (I don't think anyone wants to e.g. write `def self.new = ALLOCATE.call.tap { it.instance_variable_set(:@foo, 42); ... }` instead of `def initialize; @foo = 42; ...; end`). ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116694 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by Eregon (Benoit Daloze). Going further, `Class#safe_initialization` instead of redefining these 3 methods could just set a new `internal_alloc_func` field in RClass (only used by new/dup/clone and can never be read by anything else) + rb_undef_alloc_func(). And `rb_define_alloc_func()` (as long as the current alloc_func is not NULL, i.e. hasn't been removed) could set that field too so that `new/dup/clone` only need to read that field and not the "public alloc_func" field. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116695 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by Eregon (Benoit Daloze). I'm aware what I propose doesn't solve `Can't support objects of variable width`, i.e. the case for `Thread::Backtrace`. But AFAIK variable width is only available for core classes (not as public API), and `Thread::Backtrace` is core too so could just define `dup` & `clone` to allocate the right size and use some internal functions to not have to reimplement the rest of dup/clone. Even with #21853 the alloc size would be fixed so it wouldn't need `rb_copy_alloc_func_t`. One clear downside of `rb_copy_alloc_func_t` is performance as it introduces an extra check for new/dup/clone to check whether to use a normal or copy alloc_func. BTW `Thread::Backtrace` is AFAIK never exposed to users, so there might be more direct ways to deal with that. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116696 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by akr (Akira Tanaka). The separation of allocation and initialization is essential to marshal cyclically referenced objects. We can use Marshal to dump and load cyclically referenced objects: ``` x = [] y = [x] x << y z = Marshal.load(Marshal.dump(x)) p z #=> [[[...]]] ``` Marshal.load generates objects using allocate method and initialize them later as (pseudo Ruby code). ``` x = Array.allocate y = Array.allocate x << y y << x ``` This needs that objects can be allocated without any information except the class. So, it is not a good idea to blur the separation between allocation and initialization. But the proposal merges them. It is interesting that variable width allocations and Marshal. I'm not sure that we can support variable width objects with marshal if such objects needs the width for allocation. I guess byroot try to solve a problem with Ractor. However, the problem is not explained well. So, it's difficult to sympathize with the necessity. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116785 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by byroot (Jean Boussier).
This needs that objects can be allocated without any information except the class.
This is unchanged.
But the proposal merges them.
Not really? There is a single function yes, but the function has to check for `UNDEF_P(other)` in all cases. It's a single function because we have to store the function pointer on the class, so storing two pointers is complicated (we have no space spaces in `RClass`).
However, the problem is not explained well.
Please let me know what you don't understand and I will do my best to clarify it. the crux of the problem is that Objects now have variable width, so when duping or cloning an object, the allocator need to be able to inspect the source object to know how large of a slot to allocate. The linked PR does copy right after allocation, but this is in no way a requirement. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116786 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by Eregon (Benoit Daloze). From the [dev meeting log](https://github.com/ruby/dev-meeting-log/blob/master/2026/DevMeeting-2026-03-...) it seems other agreed there should be no copying done in `backtrace_alloc`, but instead done in `initialize_copy`. For Thread::Backtrace the obvious fix seems to just define `dup` and `clone`, and that can reuse internal helper functions to avoid duplication, as said in https://bugs.ruby-lang.org/issues/21852#note-8. I suppose the main motivation for this issue is to copy Embedded TypedData (#21853)? If the Embedded TypedData has a fixed size (uses `TypedData_Make_Struct()`) then it should be no big issue because the size is fixed per `rb_data_type_t`. But the docs also mention `rb_data_typed_object_zalloc()` and I guess that's the problem? Could the `.dsize` member be used there perhaps? Maybe another way to go is exposing in the C-API functions that do the same as `dup` & `clone`, but work on an already-allocated object. Then the C extension could easily define `dup` & `clone` and just use the correct way to allocate, and it would even work with `rb_undef_alloc_func()`. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116814 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by Eregon (Benoit Daloze). akr (Akira Tanaka) wrote in #note-10:
The separation of allocation and initialization is essential to marshal cyclically referenced objects.
Thank you for that, I didn't think about it. That would then be only a small tweak for my idea in https://bugs.ruby-lang.org/issues/21852#note-7, where Marshal would also use `internal_alloc_func`. I'll extract that idea to a separate proposal, it's mostly orthogonal to this issue anyway. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116815 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by byroot (Jean Boussier). I feel like I'm repeating the same thing in a loop, so I guess I'm bad at explaining? But here it goes:
it seems other agreed there should be no copying done in backtrace_alloc, but instead done in initialize_copy.
Already answered twice, I did it in my example PR but it's no obligation at all, perhaps I shouldn't have done it, but please forget about it. This is in no way the reason for the requested API.
For Thread::Backtrace the obvious fix seems to just define dup and clone, and that can reuse internal helper functions to avoid duplication, as said in https://bugs.ruby-lang.org/issues/21852#note-8.
Pretty sure I repeated that one many times too. Ractors don't call `#clone/#dup` on objects (they can't trust it). It calls `rb_obj_clone` -> `mutable_obj_clone`: ```c static VALUE mutable_obj_clone(VALUE obj, VALUE kwfreeze) { VALUE clone = rb_obj_alloc(rb_obj_class(obj)); return rb_obj_clone_setup(obj, clone, kwfreeze); } ```
I suppose the main motivation for this issue is to copy Embedded TypedData (#21853)?
Yes, and types of dynamic slot size in general.
If the Embedded TypedData has a fixed size (uses TypedData_Make_Struct()) t
Yes, the happy path is handled fine by the current API. But even with `TypedData_Make_Struct` you can use the same `rb_data_type_t` with different `struct` hence different sizes. It's not very common, but I believe I've seen it in the past.
Could the .dsize member be used there perhaps?
No. `dsize` is meant to report all allocated memory, not just the main struct passed to `TypedData_Make_Struct`. This also assume that `clone/dup` do want to allocate the exact same size as the source object. You could imagine types where this is the opportunity to recompact the object, similar to how `Array#dup` and other core types behave.
Then the C extension could easily define dup & clone and just use the correct way to allocate
This would require to free space in `RClass` for more pointer functions. We're out of space right now. It's also a much larger API for not much gain IMO. ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116816 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
Issue #21852 has been updated by Eregon (Benoit Daloze). I had a call with @byroot and I understand better the proposal now. I think the proposal makes sense, but with the small tweaks proposed in https://bugs.ruby-lang.org/issues/21963#note-8 it would completely avoid allocated-but-uninitialized objects, which I think would be game changer for classes defined in C (never need to worry about objects `allocate`-d but not `initialize`-d, and no need to check for it in every instance method). ---------------------------------------- Feature #21852: New improved allocator function interface https://bugs.ruby-lang.org/issues/21852#change-116875 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- When implementing native types with the `TypedData` API, You have to define an allocator function. That function receive the class to allocate and is supposed to return a new instance. ```c /** * This is the type of functions that ruby calls when trying to allocate an * object. It is sometimes necessary to allocate extra memory regions for an * object. When you define a class that uses ::RTypedData, it is typically the * case. On such situations define a function of this type and pass it to * rb_define_alloc_func(). * * @param[in] klass The class that this function is registered. * @return A newly allocated instance of `klass`. */ typedef VALUE (*rb_alloc_func_t)(VALUE klass); ``` ### Current API shortcomings There are a few limitations with the current API. #### Hard to disallow `.allocate` without breaking `#dup` and `#clone`. First, it is frequent for extensions to want to disable `Class#allocate` for their native types via `rb_undef_alloc_func`, as very often allowing uninitialized object would lead to bugs. The problem with `rb_undef_alloc_func` is that the alloc func is also used internally by `dup` and `clone`, so most types that undefine the allocator also prevent object copy without necessarily realizing it. If you want to both disable `Class#allocate` yet still allow copying, you need to entirely implement the `#dup` and `#clone` methods, which is non-trivial and very few types do. One notable exception is `Binding`, which has to implement these two methods: https://github.com/ruby/ruby/blob/bea48adbcacc29cce9536977e15ceba0d65c8a02/p... This works for Ruby code, however it doesn't work with C-level `rb_obj_dup(VALUE)`, as used by the Ractor logic to copy objects across ractors. In the case of `Binding` we probably wouldn't allow it anyway, but for other types it may be a problem. #### Can't support objects of variable width When duping or cloning an object of variable width, you need access to the original object to be able to allocate the right slot size. An example of that is `Thread::Backtrace` objects, as evidenced by [Bug #21818]. To support sending exception objects across ractors, we'd need to make `rb_obj_dup()` work for `Thread::Backtrace`, but to correctly duplicate a backtrace, the allocator needs to know the size. ### Proposed new API I'd like to propose a new API for defining allocators: ```ruby typedef VALUE (*rb_copy_alloc_func_t)(VALUE klass, VALUE other); ``` In addition to the class to allocate, the function also receives the instance to copy. When called by `Class#allocate`, the `other` argument is set to `Qundef`. Example usage: ```c static VALUE backtrace_alloc(VALUE klass, VALUE other) { rb_backtrace_t *bt; if (UNDEF_P(other)) { // Regular alloc return TypedData_Make_Struct(klass, rb_backtrace_t, &backtrace_data_type, bt); } else { // Copy rb_backtrace_t *other_bt; TypedData_Get_Struct(other, rb_backtrace_t, &backtrace_data_type, other_bt); VALUE self = backtrace_alloc_capa(other_bt->backtrace_size, &bt); bt->backtrace_size = other_bt->backtrace_size; MEMCPY(bt->backtrace, other_bt->backtrace, rb_backtrace_location_t, other_bt->backtrace_size); return self; } } ``` ### Backward compatibility Older-style allocator can keep being supported as long as we wish. The one backward potential compatibility concern is third party code that calls `rb_alloc_func_t rb_get_alloc_func(VALUE klass);`. As its documentation suggest, there's not much valid use case for it, but regardless we can keep supporting it by returning a "jump function". See `copy_allocator_adapter`: https://github.com/ruby/ruby/pull/15795/changes#diff-884a5a8a369ef1b4c7597e0... ### Opportunity for more changes? I was discussing this new interface with @ko1 and it appears that the current allocator interface may also be a limitation for Ractors and Ractor local GC. i.e. it might be useful to let the allocator function know that we're copying from one Ractor to another. But I know to little about Ractor local GC to make a proposition here, so I will let @ko1 make suggestions. ### Implementation I implemented this idea in https://github.com/ruby/ruby/pull/15795, to solve [Bug #21818]. It could remain a purely private API, but I think it would make sense to expose it. -- https://bugs.ruby-lang.org/
participants (3)
-
akr (Akira Tanaka) -
byroot (Jean Boussier) -
Eregon (Benoit Daloze)