[ruby-core:124635] [Ruby Feature#21853] Make Embedded TypedData a public API
Issue #21853 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
As part of Ruby 3.3, we added a private `RUBY_TYPED_EMBEDDABLE` flag to the `TypedData` API to allow `TypedData` objects to use variable width allocation.

Technically, we inadvertently exposed that flag in public headers, so third-party extensions can make use of it. However, it is not considered public API because it is not documented, so relying on it would be a poor decision.

This API has both memory and speed benefits, as it avoids some `malloc`/`free` churn, reduces pointer chasing, etc. For instance, when we converted `Time` to be embedded, it improved allocation performance by 30% and also reduced memory usage by 20%: https://github.com/ruby/ruby/commit/aa6642de630cfc10063154d84e45a7bff30e9103

I believe numerous third-party native extensions could benefit from it (I would certainly make use of it in `ruby/json`). Now that we have used it internally for several years, I'd like to work on making it a public API for Ruby 4.1.

-- 
https://bugs.ruby-lang.org/
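To make the "`malloc`/`free` churn" and "pointer chasing" points concrete, here is a minimal standalone sketch in plain C. It is only a toy model, not CRuby's implementation: `toy_time`, `toy_boxed`, `toy_embedded`, and `toy_alloc` are hypothetical names invented for illustration, and the object header is reduced to a single field. It contrasts a layout where the payload is a separate allocation with one where the payload lives inside the object's own allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy payload standing in for an extension's struct (hypothetical). */
typedef struct { double seconds; long utc_offset; } toy_time;

/* Counts allocator calls so the two layouts can be compared. */
static int toy_alloc_calls = 0;

static void *toy_alloc(size_t size)
{
    toy_alloc_calls++;
    return calloc(1, size);
}

/* Non-embedded model: the object only holds a pointer to the payload,
 * so creating it takes two allocations and every access follows a pointer. */
typedef struct { const char *type_name; toy_time *data; } toy_boxed;

static toy_boxed *toy_boxed_new(void)
{
    toy_boxed *obj = toy_alloc(sizeof(*obj));
    if (!obj) return NULL;
    obj->data = toy_alloc(sizeof(*obj->data));   /* second allocation */
    if (!obj->data) { free(obj); return NULL; }
    return obj;
}

static void toy_boxed_free(toy_boxed *obj)
{
    if (!obj) return;
    free(obj->data);   /* second free */
    free(obj);
}

/* Embedded model: the payload is part of the object's own allocation. */
typedef struct { const char *type_name; toy_time data; } toy_embedded;

static toy_embedded *toy_embedded_new(void)
{
    return toy_alloc(sizeof(toy_embedded));      /* single allocation */
}

static void toy_embedded_free(toy_embedded *obj)
{
    free(obj);
}
```

In this model, `toy_boxed_new()` calls the allocator twice and `toy_embedded_new()` calls it once. The 30% and 20% figures quoted above come from the linked commit, not from this sketch.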
Issue #21853 has been updated by Eregon (Benoit Daloze).

I'm thinking about this in the context of TruffleRuby, where `RTypedData` never moves (it's allocated via system `calloc()`). I think the best approach there would be to ignore this new flag entirely, so the public API should be designed in a way that lets it be implemented as if the data were not embedded.

Related: https://github.com/truffleruby/truffleruby/issues/4130

So on TruffleRuby I think we could always use the same allocation for the `RTypedData` + `data` struct, when using `TypedData_Make_Struct()`, effectively the same as embedded TypedData but never moving. But not when using `TypedData_Wrap_Struct()`, since that uses an existing data pointer.

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853#change-116224

* Author: byroot (Jean Boussier)
* Status: Open
Issue #21853 has been updated by byroot (Jean Boussier).
> So on TruffleRuby I think we could always use the same allocation for the `RTypedData` + `data` struct, when using `TypedData_Make_Struct()`, effectively the same as embedded TypedData but never moving.
I don't think so, because you still need to support `DATA_PTR(obj) = ptr`, which isn't allowed for embedded typed data.

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853#change-116225

* Author: byroot (Jean Boussier)
* Status: Open
Issue #21853 has been updated by Eregon (Benoit Daloze).

Good point! How do embedded typed datas handle this, do they raise an exception in such a case? That seems tricky given that the `DATA_PTR(obj)` API returns a pointer.

I'd actually love if we had a separate API for changing the data pointer as a macro or function (e.g. `RTYPEDDATA_SET_DATA(obj, new_data_pointer)`, to follow `RTYPEDDATA_GET_DATA`), so we know better when it can be changed. Currently TruffleRuby has to work around this: after every native call that accesses a `T_DATA`, we have to check whether the data pointer has changed :/

Of course we wouldn't be able to remove `DATA_PTR()` yet, but we could maybe deprecate it and at some point make it return a `const` pointer or so to prevent writes.

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853#change-116241

* Author: byroot (Jean Boussier)
* Status: Open
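The idea of a dedicated setter can be sketched in plain C. This is a toy model under stated assumptions, not Ruby's API: `RTYPEDDATA_SET_DATA` is only a proposal in this thread, and `toy_typed_data`, `toy_get_data`, and `toy_set_data` are hypothetical names. The point it illustrates is that when every pointer change goes through one function, the runtime can both refuse the change for embedded objects and know exactly when the pointer changed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a typed data object (hypothetical layout). */
typedef struct {
    bool embedded;             /* true: payload is not replaceable */
    void *data;                /* current data pointer */
    unsigned long data_writes; /* how often the pointer was changed */
} toy_typed_data;

/* Read-only access, in the spirit of a getter macro. */
static void *toy_get_data(const toy_typed_data *obj)
{
    return obj->data;
}

/* The single place where the data pointer may change.
 * Returns 0 on success, -1 if the object is embedded. */
static int toy_set_data(toy_typed_data *obj, void *new_data)
{
    if (obj->embedded) {
        return -1;             /* refuse instead of corrupting the payload */
    }
    obj->data = new_data;
    obj->data_writes++;        /* the runtime can observe every change */
    return 0;
}
```

With such a setter, an implementation would no longer need to re-check the pointer after every native call; it could rely on `toy_set_data` being the only writer. Whether a real API would return an error code or raise is left open here.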
Issue #21853 has been updated by byroot (Jean Boussier).
> How do embedded typed datas handle this, do they raise an exception in such a case?

Unfortunately not. It ends up causing data corruption.

> I'd actually love if we had a separate API for changing the data pointer as a macro or function
Makes sense.

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853#change-116244

* Author: byroot (Jean Boussier)
* Status: Open
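A rough illustration of how a blind pointer store can corrupt an embedded payload, as a standalone C toy model. The assumption here (not confirmed in this thread) is that the embedded payload occupies the storage where the data pointer would otherwise live; the union below models that overlap. `toy_obj`, `toy_payload`, and the helper functions are hypothetical names.

```c
#include <string.h>

/* Toy payload standing in for an extension's struct (hypothetical). */
typedef struct { long width; long height; } toy_payload;

/* Toy object: the pointer slot and the embedded payload share storage. */
typedef struct {
    int embedded;
    union {
        void *ptr;                  /* used when not embedded */
        toy_payload inline_payload; /* used when embedded */
    } as;
} toy_obj;

/* Models `DATA_PTR(obj) = p`: an unconditional store to the pointer slot. */
static void toy_store_data_ptr(toy_obj *obj, void *p)
{
    obj->as.ptr = p;
}

/* Models a getter that knows about the embedded layout. */
static toy_payload *toy_get_payload(toy_obj *obj)
{
    return obj->embedded ? &obj->as.inline_payload
                         : (toy_payload *)obj->as.ptr;
}

/* Returns 1 if the blind store changed the bytes of the embedded payload. */
static int toy_blind_store_corrupts(toy_obj *obj, void *p)
{
    toy_payload before;
    memcpy(&before, &obj->as.inline_payload, sizeof before);
    toy_store_data_ptr(obj, p);
    return memcmp(&before, &obj->as.inline_payload, sizeof before) != 0;
}
```

In this model nothing fails at the point of the store; the payload fields simply hold pointer bytes afterwards, which matches the "no exception, data corruption" behavior described above.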
Issue #21853 has been updated by Eregon (Benoit Daloze).

One tricky aspect of `RUBY_TYPED_EMBEDDABLE`: if the `struct` contains a pointer into itself, that pointer becomes invalid when the object is moved. Is there a way to handle that correctly and update such pointers?

If the `struct` is ever passed to a native library, I would consider it extremely dangerous to use `RUBY_TYPED_EMBEDDABLE`.

Overall it sounds quite error-prone, also considering there is no safeguard to avoid writing to `DATA_PTR`, so I'm not sure it's appropriate to expose this to user extensions.

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853#change-116697

* Author: byroot (Jean Boussier)
* Status: Open
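The interior-pointer hazard can be reproduced without Ruby at all. The sketch below is a plain C toy model: `toy_parser` and its helpers are hypothetical, and `toy_move` merely imitates a compacting GC by copying the object's bytes to a new address. It also shows one general C technique for avoiding the problem, storing an offset instead of a pointer; nothing in this thread says that Ruby provides a hook for fixing up such pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy struct that refers to its own buffer in two ways. */
typedef struct {
    char buf[16];
    char *cursor;       /* interior pointer: breaks when the object moves */
    size_t cursor_off;  /* offset into buf: survives a move */
} toy_parser;

static void toy_parser_init(toy_parser *p)
{
    memset(p, 0, sizeof *p);
    memcpy(p->buf, "hello", 6);
    p->cursor = p->buf + 2;
    p->cursor_off = 2;
}

/* Imitates compaction: the bytes are copied to a new address as-is. */
static toy_parser *toy_move(const toy_parser *old)
{
    toy_parser *moved = malloc(sizeof *moved);
    if (moved) memcpy(moved, old, sizeof *moved);
    return moved;
}

/* Returns 1 if cursor points inside this object's own buffer. */
static int toy_cursor_is_interior(const toy_parser *p)
{
    uintptr_t start = (uintptr_t)p->buf;
    uintptr_t cur = (uintptr_t)p->cursor;
    return cur >= start && cur < start + sizeof p->buf;
}
```

After `toy_move`, the copy's `cursor` still points into the old object, while `buf[cursor_off]` keeps addressing the right byte. In a real compacting GC the old location may be reused, so dereferencing the stale pointer would be an out-of-bounds access.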
Issue #21853 has been updated by Eregon (Benoit Daloze). Eregon (Benoit Daloze) wrote in #note-5:
> also considering there is no safeguard to avoid writing to `DATA_PTR`
One idea to address this (but not the other 2 concerns) would be to raise when `DATA_PTR()` is used on a `RUBY_TYPED_EMBEDDABLE` object, and only allow `RTYPEDDATA_GET_DATA()`.

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853#change-116700

* Author: byroot (Jean Boussier)
* Status: Open
Issue #21853 has been updated by Eregon (Benoit Daloze).

Were the various safety concerns mentioned above (https://bugs.ruby-lang.org/issues/21853#note-5, https://bugs.ruby-lang.org/issues/21853#note-6) even discussed in the meeting? It seems not, according to the log: https://github.com/ruby/dev-meeting-log/blob/master/2026/DevMeeting-2026-03-...

So we are giving users a new API that can easily cause data corruption (due to writing to `DATA_PTR`), out-of-bounds access (due to compaction), and subtle GC-related bugs (due to lack of `RB_GC_GUARD`), and that probably has no checks currently against any of those, only docs? Seems like a recipe for nasty segfaults to me.

cc @ko1 @matz

At least we should check the parts we can. I'm adding some comments on the PR: https://github.com/ruby/ruby/pull/16455

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853#change-116813

* Author: byroot (Jean Boussier)
* Status: Closed
Issue #21853 has been updated by larskanis (Lars Kanis).

I found the original PR which added `RUBY_TYPED_EMBEDDABLE` very readable and expressive: https://github.com/ruby/ruby/pull/7440/changes

It allowed me to understand and use the feature in swig and fxruby without any docs.

With the latest changes in https://github.com/ruby/ruby/pull/16455, https://github.com/ruby/ruby/pull/16509, and https://github.com/ruby/ruby/pull/16518, I think it is easy and clear how to use it safely.

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853#change-116867

* Author: byroot (Jean Boussier)
* Status: Closed
Issue #21853 has been updated by Eregon (Benoit Daloze).

With the follow-up PRs for improving the docs and adding a check for `RTYPEDDATA_DATA` (`ruby-debug`-only, since it's a `RUBY_ASSERT`), linked by Lars above, I'm OK with this.

It's still a bit dangerous and people using it need to be careful, notably about `RB_GC_GUARD` and about not storing pointers to or into the struct anywhere, but there is no way around that if we want to provide this optimization.

----------------------------------------
Feature #21853: Make Embedded TypedData a public API
https://bugs.ruby-lang.org/issues/21853#change-116868

* Author: byroot (Jean Boussier)
* Status: Closed
participants (3)
- byroot (Jean Boussier)
- Eregon (Benoit Daloze)
- larskanis (Lars Kanis)