[ruby-core:121060] [Ruby master Feature#21140] Add a method to get the address of certain JIT related functions

Issue #21140 has been reported by tenderlovemaking (Aaron Patterson).
----------------------------------------
Feature #21140: Add a method to get the address of certain JIT related functions
https://bugs.ruby-lang.org/issues/21140
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
Feature #21116 extracted RJIT as a gem. But RJIT accesses certain internal functions which it cannot access as a gem. For example it used the `rb_str_bytesize` function, but this symbol is not exported, so we cannot access it (even from a C extension).
Instead of exporting these symbols, I would like to propose an API for getting access to their addresses in Ruby. For example
```ruby
RubyVM::RJIT.address_of(:rb_str_bytesize) # => 123456
```
I would like to limit the addresses to [this list](https://github.com/ruby/ruby/blob/f32d5071b7b01f258eb45cf533496d82d5c0f6a1/t...) which are the ones required by RJIT.
--
https://bugs.ruby-lang.org/
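To illustrate what a caller would do with the integer that such an API returns, here is a minimal sketch using Fiddle. Because `RubyVM::RJIT.address_of` is only a proposal, the address below comes from `Fiddle::Handle::DEFAULT`, and the target is libc's `strlen`. Both are stand-ins chosen for illustration and assume a typical Unix-like system.

```ruby
require "fiddle"

# Stand-in for the proposed address_of: look up an exported symbol and get
# its address back as a plain Integer. `strlen` is used only because it is
# exported on typical Unix-like systems (an assumption of this sketch).
addr = Fiddle::Handle::DEFAULT["strlen"]

# An Integer address is not callable by itself. The caller has to supply the
# C signature, here: size_t strlen(const char *).
strlen = Fiddle::Function.new(addr, [Fiddle::TYPE_VOIDP], Fiddle::TYPE_SIZE_T)

length = strlen.call("ruby-core")
puts length
```

The point of the sketch is that the address alone carries no type information: whoever uses it must already know the function's signature and be able to make native calls.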

Issue #21140 has been updated by k0kubun (Takashi Kokubun).
+1 This seems like the right approach to me too for the reasons pointed out in the description.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-111975

Issue #21140 has been updated by nobu (Nobuyoshi Nakada).
I think the functions that belong to built-in classes can simply be exported, i.e., those not related to the VM. I want to manage that list as little as possible.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-111977

Issue #21140 has been updated by tenderlovemaking (Aaron Patterson). nobu (Nobuyoshi Nakada) wrote in #note-2:
> I think the functions that belong to built-in classes can simply be exported, i.e., those not related to the VM. I want to manage that list as little as possible.
I understand. I think though, if YJIT uses some function, then I think we should make that function available for 3rd party JITs. Can we use YJIT's bindgen code to generate the list? Then we don't have to specifically maintain a separate list. I made a list of symbols that RJIT uses but are not available:
```
"rb_ary_entry_internal"
"rb_ary_tmp_new_from_values"
"rb_ary_unshift_m"
"rb_ec_ary_new_from_values"
"rb_ec_str_resurrect"
"rb_ensure_iv_list_size"
"rb_fix_aref"
"rb_fix_div_fix"
"rb_fix_mod_fix"
"rb_fix_mul_fix"
"rb_get_symbol_id"
"rb_gvar_get"
"rb_hash_new_with_size"
"rb_hash_resurrect"
"rb_obj_as_string_result"
"rb_reg_new_ary"
"rb_str_bytesize"
"rb_str_concat_literals"
"rb_str_eql_internal"
"rb_str_getbyte"
"rb_sym_to_proc"
"rb_vm_bh_to_procval"
"rb_vm_concat_array"
"rb_vm_defined"
"rb_vm_get_ev_const"
"rb_vm_getclassvariable"
"rb_vm_ic_hit_p"
"rb_vm_opt_newarray_hash"
"rb_vm_opt_newarray_max"
"rb_vm_opt_newarray_min"
"rb_vm_opt_newarray_pack"
"rb_vm_set_ivar_id"
"rb_vm_setclassvariable"
"rb_vm_setinstancevariable"
"rb_vm_splat_array"
"rb_vm_throw"
"rb_vm_yield_with_cfunc"
```
We could probably export many of these functions, but I guess there are a significant number of `rb_vm_*` functions. If we could reuse YJIT's bindgen code, that might make maintenance easier.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-111987

Issue #21140 has been updated by tenderlovemaking (Aaron Patterson).
As an example, both RJIT and YJIT use `rb_ary_entry_internal`. YJIT solves this by wrapping the function [here](https://github.com/ruby/ruby/blob/ba148e71e590d057d930681ae9c93450b9cfef96/y...) (as `rb_yarv_ary_entry_internal`), then it adds the function to bindgen [here](https://github.com/ruby/ruby/blob/ba148e71e590d057d930681ae9c93450b9cfef96/y...). We could change RJIT to use `rb_yarv_ary_entry_internal`, and also use YJIT's bindgen code to generate `RubyVM::RJIT.address_of(:rb_yarv_ary_entry_internal)`. Then we don't have to maintain a specific list of symbols since YJIT's bindgen code must be updated when YJIT needs to update.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-111988

Issue #21140 has been updated by Eregon (Benoit Daloze).
IMO it's better to have them through `RubyVM::RJIT.address_of` than exporting them, because that way it's very clear they are not part of the public Ruby C API. And so e.g. it's expected that TruffleRuby does not expose these internal functions and that they should only be used for RJIT purposes.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-111991

Issue #21140 has been updated by tenderlovemaking (Aaron Patterson).
I made a patch [here](https://github.com/tenderlove/ruby/pull/new/rjit-addr). It generates a function based on YJIT's bindgen file. Whatever functions YJIT exposes in bindgen are also available via the API. I named the method:
```ruby
RubyVM::Internals.address_of(:rb_vm_ci_argc)
```
I don't have any particular opinion on what the name should be, but I think `address_of` makes sense. Also I chose `RubyVM::Internals` because I don't think it should be specific to RJIT (I would like to use this in my own JIT compilers).
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-111992
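The idea of deriving the list from a bindgen file, rather than maintaining it by hand, can be sketched in plain Ruby. This is not the patch itself: the `.allowlist_function("...")` line format of the sample text is an assumption, `AddressTable` is a hypothetical name, and the addresses are fake placeholders where a real implementation would have the C side fill them in.

```ruby
# Sample text standing in for a bindgen configuration. The
# `.allowlist_function("...")` line format is an assumption for illustration.
BINDGEN_SAMPLE = <<~RUST
  let bindings = builder
      .allowlist_function("rb_vm_ci_argc")
      .allowlist_function("rb_shape_id")
      // .allowlist_function("rb_commented_out")
      .allowlist_type("rb_iseq_t")
RUST

# Collect function names from uncommented allowlist_function lines.
def allowlisted_functions(source)
  source.each_line.filter_map do |line|
    next if line.lstrip.start_with?("//")
    line[/\.allowlist_function\("([^"]+)"\)/, 1]
  end
end

# Hypothetical lookup object: only names from the generated list resolve.
class AddressTable
  def initialize(names, resolver)
    @addresses = names.to_h { |name| [name.to_sym, resolver.call(name)] }.freeze
  end

  def address_of(name)
    @addresses.fetch(name.to_sym) do
      raise ArgumentError, "#{name} is not in the generated list"
    end
  end
end

names = allowlisted_functions(BINDGEN_SAMPLE)
# Fake resolver: a deterministic placeholder, not a real function address.
table = AddressTable.new(names, ->(name) { name.bytes.sum })
p names
```

With this shape, adding a function to the bindgen file is the only step needed to make it resolvable, and anything outside the list raises instead of returning an address.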

Issue #21140 has been updated by maximecb (Maxime Chevalier-Boisvert).
I'm skeptical of the idea of having third-party JITs as gems. This is exposing a ton of internal APIs that were not previously exposed, which could be potentially problematic if people start to rely on them. You have to think that random gems that are not actually JITs could begin to use these APIs.
I can't stop you from making this change, but Ruby has a history of merging new features too fast without carefully considering the full implications. This is going to sound cynical, but Ruby is not your personal side-project, it's a piece of software that millions of people rely on.
If you want a playground to build a JIT and have fun, why not build your own implementation of Lox from Crafting Interpreters or fork an existing one? I'm sorry if this sounds harsh, but I think we all need to ponder merging big changes really carefully. You too should at least try to play the devil's advocate here. What are the downsides?
My two biggest concerns:
1. The additional maintenance burden of random gems relying on internal APIs they shouldn't rely on. Think of the JIT challenges we run into with people abusing binding now. The Ruby public API surface is already too big imo.
2. What does this mean for security? If you have access to these APIs from Rubyland you can potentially take control of the Ruby VM. Is access to these internal APIs restricted somehow?
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112023

Issue #21140 has been updated by tenderlovemaking (Aaron Patterson). maximecb (Maxime Chevalier-Boisvert) wrote in #note-8:
> I'm skeptical of the idea of having third-party JITs as gems. This is exposing a ton of internal APIs that were not previously exposed, which could be potentially problematic if people start to rely on them. You have to think that random gems that are not actually JITs could begin to use these APIs.
These APIs are already exposed via RJIT in current releases. Since we've extracted RJIT as a gem, I don't think RJIT can work without access to these.
> I can't stop you from making this change, but Ruby has a history of merging new features too fast without carefully considering the full implications. This is going to sound cynical, but Ruby is not your personal side-project, it's a piece of software that millions of people rely on.
I think it can be both a side project as well as a piece of software that millions of people rely on. That's been at the core of the culture of the global Ruby community since its inception. I think for many of us on the Ruby-core team, it is a side-project, and I don't think it's right to take that aspect away. What Kokubun and I are trying to achieve is to give people a way to experiment with the language in ways you may not be able to imagine right now.
> If you want a playground to build a JIT and have fun, why not build your own implementation of Lox from Crafting Interpreters or fork an existing one? I'm sorry if this sounds harsh, but I think we all need to ponder merging big changes really carefully. You too should at least try to play the devil's advocate here. What are the downsides?
> My two biggest concerns: 1. The additional maintenance burden of random gems relying on internal APIs they shouldn't rely on. Think of the JIT challenges we run into with people abusing binding now. The Ruby public API surface is already too big imo.
I think it's important we document that this API is unstable / unreliable. That is why I called it `RubyVM::Internals` to try to indicate how private it is. Additionally it's an API that just returns an integer, so using this API is particularly hard.
> 2. What does this mean for security? If you have access to these APIs from Rubyland you can potentially take control of the Ruby VM. Is access to these internal APIs restricted somehow?
I don't think this change has any impact with regard to security. This information can be recovered via `dlsym` or parsing ELF / DWARF. This change just makes access somewhat easier.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112025

Issue #21140 has been updated by maximecb (Maxime Chevalier-Boisvert).
> I think it can be both a side project as well as a piece of software that millions of people rely on.
I apologize for the tone of my post which was rather hostile. I woke up with a pretty bad headache this morning and was in a grumpy mood.
My main point is that I am afraid that things get merged into Ruby without fully weighing the implications, e.g. Ractors. This was merged because of the enthusiasm of one specific core dev, but it's been in a non-working state until recently. I am sure that your API will work but as we were discussing in the YJIT meeting, we have a problem where C extensions already have access to lots of things which they really shouldn't have direct access to.
We should make sure to tell Ruby extension developers "this is a JIT API and there are no stability guarantees between Ruby versions". This should be made 1000% clear in the documentation with **ALL CAPS AND BOLD FONTS** but there is still a risk that people will abuse it and some gems could break. We can say "oh well, their fault, they were stupid", but imagine if 5 years from now we make some CRuby change and 3 gems that Shopify depends on blow up. What do we do then? We'd have no choice but to roll back those CRuby changes and it could stall CRuby development in some areas.
My recommendation: guard this API behind a special configure flag that is separate from YJIT's. Something like `--enable-jit-gem-api`. That way you get to have your cake and eat it too. Ruby devs can build a custom Ruby and do anything they want with it. You get to write your own Ruby JIT gem and embed it into your IoT toaster. You can even build your own Ruby and deploy it into production at your startup if you want to, but you also effectively shield the rest of Ruby users from security risks and avoid Ruby gems that have no JIT needs becoming dependent on this API.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112026

Issue #21140 has been updated by ufuk (Ufuk Kayserilioglu). maximecb (Maxime Chevalier-Boisvert) wrote in #note-10:
> if 5 years from now we make some CRuby change and 3 gems that Shopify depends on blow up. What do we do then?
We would fix them forward. That's why we have our daily Ruby-head CI running, so that we can catch these kinds of changes as early as possible and fix any code that needs changing. This is something we've been doing for the last 3 years and we've fixed many similar incompatibilities in our codebase and/or our dependencies as appropriate.
> My recommendation: guard this API behind a special configure flag that is separate from YJIT's. Something like `--enable-jit-gem-api`.
I think that would result in RJIT and any other experimental JIT related project being completely irrelevant and kill any kind of experimentation on the platform. Ruby has always been a language of folks running with scissors, and I don't think we should stop doing that now.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112028

Issue #21140 has been updated by maximecb (Maxime Chevalier-Boisvert).
> We would fix them forward. That's why we have our daily Ruby-head CI running, so that we can catch these kinds of changes as early as possible and fix any code that needs changing.
You wouldn't know about the gems that might break until someone actually merges the change into CRuby. At this point, you would be forced to delay deploying new versions of CRuby to Core/SFR until the gem issues are fixed. This would mean either changing our code to not use the broken gem (unknown time / difficulty), or trying to fork the gem or get the author to fix it (unknown time / difficulty). Either way, it's another small crisis that we need to handle and we potentially waste a significant amount of time on.
> I think that would result in RJIT and any other experimental JIT related project being completely irrelevant and kill any kind of experimentation on the platform.
No it wouldn't. YJIT was guarded by a configure option at the beginning. Building a custom Ruby is not that difficult for people to do. If you want a slightly less difficult option, then I would say make it a command-line flag. Still has largely the same benefits but it saves you the building a custom Ruby step.
> Ruby has always been a language of folks running with scissors, and I don't think we should stop doing that now.
So people have made bad decisions in the past, and there have been negative consequences, e.g. broken Ractors negatively impacting Ruby's credibility as a viable language... But we should keep making bad decisions with predictable outcomes because there is a precedent for doing so?
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112029

Issue #21140 has been updated by Eregon (Benoit Daloze).
Would this API be needed for e.g. a JIT in the FFI gem? (https://railsatscale.com/2025-02-12-tiny-jits-for-a-faster-ffi/) Looking at the PoC I'm unsure but I think not.
I think it's a good idea to have this API not accessible by default because it is deep internals which are meant for experimentation and nothing else. If there is a convincing non-experimental use case that needs it, then it's a feature that it's not enabled by default, because we would actually want to re-discuss and potentially expose the needed parts properly.
> This information can be recovered via dlsym
Why not just use dlsym() then? (e.g. after `dlopen(NULL, flags)`) It seems basically equivalent to this API (it also takes a function name and returns an integer), and anyway one needs the ability to call native functions to use this API.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112032

Issue #21140 has been updated by tenderlovemaking (Aaron Patterson). Eregon (Benoit Daloze) wrote in #note-13:
> Would this API be needed for e.g. a JIT in the FFI gem? (https://railsatscale.com/2025-02-12-tiny-jits-for-a-faster-ffi/) Looking at the PoC I'm unsure but I think not.
It's hard to know for sure until we finish it 😅
> I think it's a good idea to have this API not accessible by default because it is deep internals which are meant for experimentation and nothing else. If there is a convincing non-experimental use case that needs it, then it's a feature that it's not accessible by default, because we would actually want to re-discuss and potentially expose the needed parts properly.
I don't think providing the method _hurts_ anything. It's experimental and unstable, so people who build anything with it should understand the risks of depending on it.
> This information can be recovered via dlsym
> Why not just use dlsym() then? (e.g. after `dlopen(NULL, flags)`) It seems basically equivalent to this API (it also takes a function name and returns an integer), and anyway one needs the ability to call native functions to use this API.
Many of the functions would work, but the list of functions I provided above do not. Their symbols are not available via `dlsym`. They can be recovered via DWARF, but from my experience with TenderJIT v1, it's a _huge_ pain.
```
[aaron@tc-lan-adapter ~]$ ruby -v -r fiddle -e'p Fiddle::Handle::DEFAULT["rb_vm_throw"]'
ruby 3.5.0dev (2025-02-14T21:16:53Z master ba148e71e5) +PRISM [arm64-darwin24]
-e:1:in 'Fiddle::Handle#[]': unknown symbol "rb_vm_throw" (Fiddle::DLError)
	from -e:1:in '<main>'
[aaron@tc-lan-adapter ~]$ ruby -v -r fiddle -e'p Fiddle::Handle::DEFAULT["rb_shape_id"]'
ruby 3.5.0dev (2025-02-14T21:16:53Z master ba148e71e5) +PRISM [arm64-darwin24]
4306064836
```
Both `rb_vm_throw` and `rb_shape_id` are listed in the YJIT bindgen file, but only one symbol is visible via `dlsym`. If all of the symbols were available via `dlsym` then I would not propose this API.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112033
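The check in the transcript above can be wrapped in a small helper that reports which symbols the dynamic linker can see. This is a sketch only: `visible_address` is a hypothetical name, `strlen` is included merely as a symbol that is normally exported on Unix-like systems, and the results for the `rb_*` names depend on how a given Ruby was built.

```ruby
require "fiddle"

# Returns the address of `name` if the dynamic linker can see it, else nil.
# Fiddle::Handle#[] raises Fiddle::DLError for symbols it cannot resolve.
def visible_address(name)
  Fiddle::Handle::DEFAULT[name]
rescue Fiddle::DLError
  nil
end

%w[rb_vm_throw rb_shape_id strlen].each do |name|
  addr = visible_address(name)
  puts "#{name}: #{addr ? format('0x%x', addr) : 'not visible via dlsym'}"
end
```

Any name that prints as not visible is one that `dlsym` alone cannot provide, which is the gap the proposed API is meant to cover.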

Issue #21140 has been updated by maximecb (Maxime Chevalier-Boisvert).
> I don't think providing the method hurts anything. It's experimental and unstable, so people who build anything with it should understand the risks of depending on it.
People do lots of things they shouldn't do though. What do you think about the idea of guarding it behind a Ruby command-line argument such as `--enable-jit-gem-api` or something similar?
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112034

Issue #21140 has been updated by tenderlovemaking (Aaron Patterson). maximecb (Maxime Chevalier-Boisvert) wrote in #note-15:
> > I don't think providing the method hurts anything. It's experimental and unstable, so people who build anything with it should understand the risks of depending on it.
> People do lots of things they shouldn't do though.
> What do you think about the idea of guarding it behind a Ruby command-line argument such as `--enable-jit-gem-api` or something similar?
I'd rather not add more flags and conditionals. But if that's what it takes to add this API then I would do it.
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112035

Issue #21140 has been updated by Eregon (Benoit Daloze). tenderlovemaking (Aaron Patterson) wrote in #note-14:
> I don't think providing the method _hurts_ anything. It's experimental and unstable, so people who build anything with it should understand the risks of depending on it.
I used to be of that opinion but clearly time has shown that people or gems have abused RubyVM too much already. A prime example of that is quite a few gems depend or depended on RubyVM::AbstractSyntaxTree, even though it was marked as experimental/unstable/etc (e.g. the order of node children could change incompatibly at any time, and no way to access by name). From my POV it took several years of work on Prism, making it the official API for parsing Ruby, and then migrating those gems to Prism to clean that mess.
I think RubyVM::AbstractSyntaxTree should have never been available by default, it should have been behind a configure flag or so, as a debug/research tool (or only as text output, like `--dump=parsetree`). A runtime flag might have been enough (I'm not sure, it feels more risky) to discourage using it for anything but experiments.
Similar for RubyVM::InstructionSequence, it would be possible to design an API which is not tied to CRuby bytecode but can support serializing & deserializing a binary form of source code. But because RubyVM::InstructionSequence exists it will probably never be attempted. Things like getting the start/end column and end line got delayed by years because there were workarounds with `RubyVM`, which other Ruby implementations must not implement to not break more code.
I can see how this request is not really of the same scope, but it is asking to expose deep internals of the VM like non-exported (`static`) functions. Those are likely to change so it feels wrong to expose them in any way. When RJIT was part of core it was more OK to use such internals as it could be evolved with CRuby, but that's no longer the case so it should use more stable APIs/rely less on internals.
How about reviewing the functions RJIT really needs, and those it could work around? And then maybe having the ones really needed as exported functions (e.g. without declaration in header to make it not too easy) or static inline (in a separate header to make it clear it's not part of the Ruby C API). BTW for `static inline` functions those can easily be accessed with a C extension, e.g. `rb_ary_entry_internal()` in https://github.com/ruby/ruby/blob/27ba268b75bbe461460b31426e377b42d4935f70/i....
----------------------------------------
https://bugs.ruby-lang.org/issues/21140#change-112036

Issue #21140 has been updated by tenderlovemaking (Aaron Patterson).

Eregon (Benoit Daloze) wrote in #note-17:

> How about reviewing the functions RJIT really needs, and those it could work around?

This seems like a lot of work. Are you volunteering? 😜 The functions listed in RJIT's bindgen file are very similar to YJIT's, which is no surprise as RJIT is based on YJIT. AFAICT, they use the same functions, just that YJIT wraps some (as I mentioned [here](https://bugs.ruby-lang.org/issues/21140#note-5)).

> And then maybe having the ones really needed as exported functions (e.g. without declaration in header to make it not too easy) or static inline (in a separate header to make it clear it's not part of the Ruby C API).

This seems reasonable, but I'm worried about getting bogged down debating about each function and whether or not it is "really needed". That's why I wanted to use the set that RJIT/YJIT use already. I assume it's not just using the function "for fun" and actually has a purpose.

Issue #21140 has been updated by maximecb (Maxime Chevalier-Boisvert).

> That's why I wanted to use the set that RJIT/YJIT use already. I assume it's not just using the function "for fun" and actually has a purpose.

It's tough because the set of functions a JIT needs tends to grow over time. For example, YJIT never did much to optimize hash access, but in the future we would probably want to do that, so we would need to expose more functions. As such I don't think it makes sense to have some kind of a fixed list.

Benoit is making the same argument I've made, which is that it's fragile to expose such a large set of internal functions without any safeguards. I think this point is 100% valid. I know that you would prefer no safeguards because it seems more convenient slash fun to play with in the short term, but IMO it's only reasonable to ask for either a configure flag or a command-line flag. It probably should be a configure flag.
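To make the safeguard idea concrete, here is a minimal Ruby sketch of a runtime opt-in gate. Everything in it is hypothetical: the `InternalsGate` module and the `RUBY_EXPOSE_INTERNALS` environment variable are made up for illustration and do not exist in CRuby. A configure flag, as suggested above, would instead decide at build time whether the method exists at all.

```ruby
# Hypothetical sketch of a runtime opt-in safeguard; none of these names exist in CRuby.
module InternalsGate
  class Disabled < StandardError; end

  # Assumed opt-in switch; a real design might use a configure or command-line flag instead.
  ENV_FLAG = "RUBY_EXPOSE_INTERNALS"

  # True only when the user has explicitly opted in.
  def self.enabled?(env = ENV)
    env[ENV_FLAG] == "1"
  end

  # Runs the block only when opted in; otherwise raises instead of exposing internals.
  def self.guard(env = ENV)
    raise Disabled, "set #{ENV_FLAG}=1 to access VM internals" unless enabled?(env)
    yield
  end
end
```

The `env` parameter exists only so the gate can be exercised with a plain Hash instead of the process environment.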

Issue #21140 has been updated by Eregon (Benoit Daloze).

Thinking a bit more about this, how is RJIT going to work since on master it seems everything related to it has been removed?

For example when looking at the [FFI JIT PoC](https://gist.github.com/tenderworks/f4cbb60f2c0dc3ab334eb73fec36f702) it uses `RubyVM::RJIT::C` (which is only defined when passing `--rjit` on the command line BTW) but that no longer exists at all on master. `Primitive.cexpr!()` (and Primitives in general) as used [here](https://github.com/ruby/ruby/blob/f32d5071b7b01f258eb45cf533496d82d5c0f6a1/r...) is not available outside of core, and so AFAIK there won't be any way to find out struct offsets of `rb_control_frame_t`.

BTW exposing struct offsets is IMO more dangerous and risky than the list of internal functions above. It reminds me of old debugger gems which used to copy internal headers from some given CRuby version, that's very brittle.

IOW, I think it would be good to figure out how RJIT is going to work as a gem, I suppose right now it cannot work as a gem. Maybe some part of it not depending on Fiddle should stay in core?

Issue #21140 has been updated by maximecb (Maxime Chevalier-Boisvert).

Correct me if I'm wrong: for struct offsets I'm assuming they will have to parse C header files or DWARF files?

Issue #21140 has been updated by matz (Yukihiro Matsumoto).

I accept the concept. In the developers' meeting, RubyVM::Internals.address_of() was proposed. And it looks good to me.

Matz.
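For readers following along, the shape of the accepted idea (look up an address by name, restricted to a fixed list) can be sketched in plain Ruby. This is only an illustration under stated assumptions: `InternalsSketch` is a made-up stand-in rather than the real `RubyVM::Internals`, and the address is a placeholder rather than an actual function pointer.

```ruby
# Hypothetical stand-in for the proposed RubyVM::Internals.address_of; not the real implementation.
module InternalsSketch
  # Placeholder address for illustration only. In CRuby the VM itself would
  # supply real function pointers, limited to the agreed list of functions.
  ALLOWED = {
    rb_str_bytesize: 0x1000,
  }.freeze

  # Returns the address for an allowlisted name, or raises for anything else.
  def self.address_of(name)
    ALLOWED.fetch(name) do
      raise ArgumentError, "#{name} is not an exposed internal function"
    end
  end
end
```

The point of the allowlist is that unknown names fail loudly instead of returning an arbitrary address.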

Issue #21140 has been updated by nobu (Nobuyoshi Nakada).

If we introduce this method, will yjit/bindgen also use it?
participants (7)
- Eregon (Benoit Daloze)
- k0kubun (Takashi Kokubun)
- matz (Yukihiro Matsumoto)
- maximecb (Maxime Chevalier-Boisvert)
- nobu (Nobuyoshi Nakada)
- tenderlovemaking (Aaron Patterson)
- ufuk (Ufuk Kayserilioglu)