- ruby-core - ml.ruby-lang.org

[ruby-core:116039] [Ruby master Bug#20154] aarch64: configure overrides `-mbranch-protection` if it was set in CFLAGS via environment

by jprokop (Jarek Prokop)

Issue #20154 has been reported by jprokop (Jarek Prokop).

----------------------------------------
Bug #20154: aarch64: configure overrides `-mbranch-protection` if it was set in CFLAGS via environment
https://bugs.ruby-lang.org/issues/20154

* Author: jprokop (Jarek Prokop)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Recently a GH PR was merged <https://github.com/ruby/ruby/pull/9306> for PAC/BTI support on ARM CPUs for Coroutine.S.

Without proper compilation support in configure.ac it segfaults Ruby with fibers on CPUs where PAC is supported: https://bugs.ruby-lang.org/issues/20085

At the time of writing, configure.ac takes the first option from a list of values for the `-mbranch-protection` flag that successfully compiles a program <https://github.com/ruby/ruby/blob/master/configure.ac#L829> and appends it to XCFLAGS, and now also to ASFLAGS, to fix issue 20085 for Ruby master.

This is suboptimal for Fedora as we set -mbranch-protection=standard by default in C{,XX}FLAGS:

```
CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Werror=implicit-function-declaration -Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer '
export CFLAGS
CXXFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer'
export CXXFLAGS
```

The appended flag overrides the distribution's compilation configuration, which in this case ends up omitting BTI instructions and only using PAC.

Would it make sense to check if such flags exist and not overwrite them if they do?

Serious proposals:

1. The simplest fix, which does not overwrite what is set in the distribution and results in higher security, is to prepend `-mbranch-protection=standard` to the list of options. It should cause no problems on ARMv8 CPUs and newer: BTI instructions, like PAC instructions, result in NOPs, so it only extends the capability. See attached 0001-aarch64-Check-mbranch-protection-standard-first-to-u.patch

2. The other fix, which sounds more sane IMO and dodges this kind of guessing about all the correct places for the flag, is what another Fedora contributor, Florian Weimer, suggested: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org…

"The reliable way to do this would be to compile a C file and check whether that enables __ARM_FEATURE_PAC_DEFAULT, and if that's the case, define a *different* macro for use in the assembler implementation. This way, you don't need to care about the exact name of the option."

IOW, instead of using __ARM_FEATURE_* directly in that code, define a macro in the style of "USE_PAC" with the value of the feature if it is defined. I think that way we shouldn't need to append to ASFLAGS anymore.

However, it's also important to capture the values of those macros, as their values have meaning. I have an idea how to do that, but I'd get to it on Monday at the earliest.

---Files--------------------------------
0001-aarch64-Check-mbranch-protection-standard-first-to-u.patch (1004 Bytes)

--
https://bugs.ruby-lang.org/


[ruby-core:117905] [Ruby master Bug#20493] Segfault on rb_io_getline_fast

by josegomezr (Jose Gomez)

Issue #20493 has been reported by josegomezr (Jose Gomez).

----------------------------------------
Bug #20493: Segfault on rb_io_getline_fast
https://bugs.ruby-lang.org/issues/20493

* Author: josegomezr (Jose Gomez)
* Status: Open
* ruby -v: 3.3.1
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
We've spotted a consistent segfault when running bundle install with `--jobs 4`.

When running `bundle install -j 4` we'd get a segfault at:

```
/usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu]
```

Full [log is available here][0].

I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1].

We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c.

At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference.

With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6].

Adding a `RB_GC_GUARD` in `rb_io_getline_fast` @ `io.c:4004`, just before the return

```diff
--- ruby3.3.orig/ruby-3.3.1/io.c
+++ ruby3.3/ruby-3.3.1/io.c
@@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc
     ENC_CODERANGE_SET(str, cr);
     fptr->lineno++;
+    RB_GC_GUARD(str);
     return str;
 }
```

fixes the segfault in our tests. `bundle` finishes the installation and the image is built.

I've set up a project in OBS to provide reproducers.

- [ruby3.3.1 package][5].
- [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1].

The corresponding container is exported in the `containers-patched` repository.

Here are the docker images generated by OBS:

- 3.3.1 [without patches, segfaults.]
```
registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/b…
```

- 3.3.1 [with patch, does not fail]
```
registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-p…
```

[0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699
[1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322
[2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58
[3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001
[4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003
[5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3
[6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3…
[7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:conta…
[8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+…

--
https://bugs.ruby-lang.org/


[ruby-core:114070] [Ruby master Bug#19753] IO::Buffer#get_string can't handle negative offset

by noteflakes (Sharon Rosner)

Issue #19753 has been reported by noteflakes (Sharon Rosner).

----------------------------------------
Bug #19753: IO::Buffer#get_string can't handle negative offset
https://bugs.ruby-lang.org/issues/19753

* Author: noteflakes (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 3.2
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
```ruby
irb(main):001:0> b = IO::Buffer.for('abc')
=>
#<IO::Buffer 0x00007f858f5450c0+3 EXTERNAL READONLY SLICE>
...
irb(main):002:0> b.get_string(-1)
=> "\x00abc"
irb(main):003:0> b.get_string(-1000, 3)
(irb):3:in `get_string': Specified offset+length exceeds data size! (ArgumentError)
        from (irb):3:in `<main>'
        from /home/sharon/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/irb-1.7.1/exe/irb:9:in `<top (required)>'
        from /home/sharon/.rbenv/versions/3.2.0/bin/irb:25:in `load'
        from /home/sharon/.rbenv/versions/3.2.0/bin/irb:25:in `<main>'
```

Using a negative offset returns garbage in the string but it also might segfault:

```ruby
irb(main):003:0> b = IO::Buffer.map(File.open('sgt-nodes.sql', 'r+'))
=>
#<IO::Buffer 0x00007f189de14000+2008858 EXTERNAL MAPPED SHARED>
irb(main):004:0> b.get_string(-1000)
(irb):4: [BUG] Segmentation fault at 0x00007f189de13c18
ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0021 p:---- s:0109 e:000108 CFUNC  :get_string
...
```

## Expected behaviour

I think it might be nice to have `#get_string` behave like other methods taking an offset, like `String#[]`. For example:

```ruby
irb(main):001:0> b = IO::Buffer.for('abc')
=>
#<IO::Buffer 0x00007f858f5450c0+3 EXTERNAL READONLY SLICE>
...
irb(main):002:0> b.get_string(-1)
=> "c"
irb(main):003:0> b.get_string(-2)
=> "bc"
irb(main):003:0> b.get_string(-1000)
=> "abc"
irb(main):003:0> b.get_string(-1000, 2)
=> "ab"
```

--
https://bugs.ruby-lang.org/
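The expected behaviour above can be sketched in plain Ruby. This is only an illustration of the offset arithmetic the report asks for, not the `IO::Buffer` implementation: the helper name `get_string_with_clamped_offset` is hypothetical, and it works on a `String` of bytes rather than on a buffer. A negative offset counts from the end and is clamped to the start of the data, which matches the four examples in the report.

```ruby
# Hypothetical helper (not part of IO::Buffer): resolve a possibly negative
# offset against the data size, clamp it to 0, then slice by bytes.
def get_string_with_clamped_offset(bytes, offset = 0, length = nil)
  size = bytes.bytesize
  # Negative offsets count back from the end; anything before the start is clamped.
  start = offset.negative? ? [size + offset, 0].max : offset
  raise ArgumentError, "Specified offset exceeds data size!" if start > size

  length ||= size - start
  raise ArgumentError, "Specified offset+length exceeds data size!" if start + length > size

  bytes.byteslice(start, length)
end

p get_string_with_clamped_offset("abc", -1)
p get_string_with_clamped_offset("abc", -1000, 2)
```

Validating the resolved offset before touching memory is the point of the sketch: a negative value never reaches the read as a raw pointer offset.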


[ruby-core:117901] [Ruby master Feature#20492] Debug option for tempfile

by hadmut (Hadmut Danisch)

Issue #20492 has been reported by hadmut (Hadmut Danisch).

----------------------------------------
Feature #20492: Debug option for tempfile
https://bugs.ruby-lang.org/issues/20492

* Author: hadmut (Hadmut Danisch)
* Status: Open
----------------------------------------
Hi,

the ruby lib tempfile is quite useful, but since it always deletes files once the object is garbage collected or the program terminates (or the program explicitly asks to remove the file), it is difficult to debug programs and to check the file contents after program termination.

Replacing all tempfile uses with regular file operations is awkward.

It would therefore be useful if file deletion of tempfiles could be completely turned off, e.g. through an environment variable or by the program itself, like through a --debug option or when catching errors, like deleting all files during normal program termination, but not if there's a runtime error.

regards

--
https://bugs.ruby-lang.org/
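Until such an option exists, something close to the request can be approximated in user code. The following is a minimal sketch, not an existing tempfile feature: the wrapper name `with_debuggable_tempfile` and the `KEEP_TEMPFILES` environment variable are assumptions made up for illustration. It relies on the non-block form of `Tempfile.create`, which returns a plain `File` that is not removed automatically.

```ruby
require "tempfile"

# Hypothetical wrapper: keep the file on disk when `keep` is true
# (by default, when the made-up KEEP_TEMPFILES variable is set).
def with_debuggable_tempfile(basename = "debug", keep: ENV.key?("KEEP_TEMPFILES"))
  file = Tempfile.create(basename) # non-block form: no automatic deletion
  begin
    yield file
  ensure
    file.close unless file.closed?
    if keep
      warn "keeping temporary file #{file.path}"
    else
      File.unlink(file.path)
    end
  end
end

with_debuggable_tempfile("report", keep: false) do |f|
  f.puts "intermediate data"
end
```

Running a program with `KEEP_TEMPFILES=1` would then leave the files behind for inspection, while normal runs still clean up.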


[ruby-core:117658] [Ruby master Feature#20448] Make coverage event hooking C API public

by ms-tob (Matt S)

Issue #20448 has been reported by ms-tob (Matt S).

----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448

* Author: ms-tob (Matt S)
* Status: Open
----------------------------------------
# Abstract

Gathering code coverage information is a well-known goal within software engineering. It is most commonly used to assess code coverage during automated testing. A lesser known use-case is coverage-guided fuzz testing, which will be the primary use-case presented in this issue. This issue exists to request that Ruby coverage event hooking be made part of its official, public C API.

# Background

Ruby currently provides a number of avenues for hooking events *or* gathering coverage information:

1. The [Coverage](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html) module
2. The [TracePoint](https://ruby-doc.org/3.3.0/TracePoint.html) module
3. The [rb_add_event_hook](https://ruby-doc.org/3.3.0/extension_rdoc.html#label-Hoo… extension function

Unfortunately, none of these pieces of functionality solve this issue's specific use-case. The `Coverage` module is not a great fit for real-time coverage analysis with an unknown start and stop point. Coverage-guided fuzz testing requires this. The `TracePoint` module and `rb_add_event_hook` are not able to hook branch and line coverage events. Coverage-guided fuzz testing typically tracks branch events.

# Proposal

The ultimate goal is to enable Ruby C extensions to process coverage events in real-time. I did some cursory investigation into the Ruby C internals to determine what it would take to achieve this, but I'm by no means an expert, so my list may be incomplete.

The good news is that much of this functionality already exists, but it's part of the private, internal-only C API.

1. Make `RUBY_EVENT_COVERAGE_LINE` and `RUBY_EVENT_COVERAGE_BRANCH` public: https://github.com/ruby/ruby/blob/v3_3_0/vm_core.h#L2182-L2184
    a. This would be an addition to the current public event types: https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/internal/event.h#L32-…
2. Allow initializing global coverage state so that coverage tracking can be fully enabled
    a. Currently, if `Coverage.setup` or `Coverage.start` is not called, then coverage events cannot be hooked. I do not fully understand why this is, but I believe it has something to do with `rb_get_coverages` and `rb_set_coverages`. If calls to `rb_get_coverages` return `NULL` (https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L641-L647, https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L864-L868), then coverage hooking will not be enabled. I believe the `Coverage` module initializes that state via a `rb_set_coverages` call here: https://github.com/ruby/ruby/blob/v3_3_0/ext/coverage/coverage.c#L112-L120.
    b. So, to achieve this goal, a C extension would need to be able to call `rb_set_coverages` or somehow initialize the global coverage state.

I've actually been able to achieve this functionality by calling undocumented features and defining `RUBY_EVENT_COVERAGE_BRANCH`:

```c
#include <ruby.h>
#include <ruby/debug.h>

#define RUBY_EVENT_COVERAGE_BRANCH 0x020000

// ...

rb_event_flag_t events = RUBY_EVENT_COVERAGE_BRANCH;
rb_event_hook_flag_t flags = (
    RUBY_EVENT_HOOK_FLAG_SAFE | RUBY_EVENT_HOOK_FLAG_RAW_ARG
);
rb_add_event_hook2(
    (rb_event_hook_func_t) event_hook_branch,
    events,
    counter_hash,
    flags
);
```

If I call `Coverage.setup(branches: true)`, and add this event hook, then branch hooking works as expected. `rb_add_event_hook2` will still respect the `RUBY_EVENT_COVERAGE_BRANCH` value if it's passed. But it would be better if I could rely on official functionality rather than undocumented features.

The above two points would be requirements for this functionality, but there's an additional nice-to-have:

3. Extend the public `tracearg` functionality to include additional coverage information
    a. Currently, `tracearg` offers information like `rb_tracearg_lineno` and `rb_tracearg_path`. It would be helpful if it also provided additional coverage information like `coverage.c`'s column information and a unique identifier for each branch. Currently, I can only use `(path, lineno)` as a unique identifier for a branch because that's what's offered by the public API, but more information like column number would be helpful for uniquely identifying branches. Since there can be multiple `if` statements on a single line, `(path, lineno)` alone can provide ambiguous identification for a branch event.

# Use cases

This use-case was born out of a new coverage-guided Ruby fuzzer: https://github.com/trailofbits/ruzzy. You can read more about its implementation details here: https://blog.trailofbits.com/2024/03/29/introducing-ruzzy-a-coverage-guided…. You can also find the Ruby C extension code behind its implementation here: https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L220-L….

So, the primary use-case here is enabling real-time, coverage-guided fuzz testing of Ruby code. However, as mentioned in the abstract, gathering code coverage information is useful in many domains. For example, it could enable new workflows in standard unit/integration test coverage. It could also enable gathering coverage information in real-time as an application is running. I see this as the most generalized form of gathering code coverage information, and something like the `Coverage` module as a specialized implementation. As another example, https://bugs.ruby-lang.org/issues/20282 may be solved by this more generalized solution.

We are tracking this request downstream here: https://github.com/trailofbits/ruzzy/issues/9

# Discussion

Fuzz testing is another tool in a tester's toolbelt. It is an increasingly common way to improve software's robustness. Go has it built in to the standard library, Python has Atheris, Java has Jazzer, JavaScript has Jazzer.js, etc. OSS-Fuzz has helped identify and fix over 10,000 vulnerabilities and 36,000 bugs [using fuzzing](https://google.github.io/oss-fuzz/#trophies). Ruby deserves a good fuzzer, and improving coverage gathering would help achieve that goal.

The `Coverage` module, `TracePoint` module, and `rb_add_event_hook` function seem like they could fulfill this goal. However, after deeper investigation, none of them fit the exact requirements for this use-case.

# See also

- https://bugs.ruby-lang.org/issues/20282
- https://github.com/google/atheris
- https://security.googleblog.com/2020/12/how-atheris-python-fuzzer-works.html
- https://github.com/CodeIntelligenceTesting/jazzer/
- https://www.code-intelligence.com/blog/java-fuzzing-with-jazzer
- https://go.dev/doc/security/fuzz/

--
https://bugs.ruby-lang.org/
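For readers unfamiliar with what the `Coverage` module reports, here is a small Ruby-level sketch of the branch data mentioned in the proposal. It is an illustration only: the sample source and the helper name `branch_coverage_for` are made up, and it uses nothing beyond `Coverage.start(branches: true)` and `Coverage.result`. It shows that the module already records column positions per branch, but only hands results back after the fact rather than as events, which is the gap the request describes.

```ruby
require "coverage"
require "tempfile"

# Made-up sample program with a single `if` branch.
SAMPLE = <<~RUBY
  def sign(n)
    if n < 0
      :neg
    else
      :pos
    end
  end
  sign(1)
RUBY

# Hypothetical helper: load `source` from a temporary file with branch
# coverage enabled and return the branch data recorded for that file.
def branch_coverage_for(source)
  Coverage.start(branches: true)
  Tempfile.create(["sample", ".rb"]) do |file|
    file.write(source)
    file.flush
    load file.path
    # Result keys are file paths; match on the basename in case the path was normalized.
    _path, data = Coverage.result.find { |path, _| File.basename(path) == File.basename(file.path) }
    data[:branches]
  end
end

BRANCHES = branch_coverage_for(SAMPLE)
BRANCHES.each do |(type, _id, line, column, *_rest), targets|
  puts "#{type} at line #{line}, column #{column}: hit counts #{targets.values.inspect}"
end
```

Each branch key carries its type, an id, and start/end line and column numbers, so `(path, line, column)` would be enough to tell apart several `if` statements on one line if the same data were exposed to event hooks.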


[ruby-core:116460] [Ruby master Bug#20218] aset/masgn/op_asgn with keyword arguments

by jeremyevans0 (Jeremy Evans)

Issue #20218 has been reported by jeremyevans0 (Jeremy Evans).

----------------------------------------
Bug #20218: aset/masgn/op_asgn with keyword arguments
https://bugs.ruby-lang.org/issues/20218

* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
I found that use of keyword arguments in multiple assignment is broken in 3.3 and master:

```ruby
h = {a: 1}
o = []
def o.[]=(*args, **kw)
  replace([args, kw])
end

# This segfaults as RHS argument is not a hash
o[1, a: 1], _ = [1, 2]

# This passes the RHS argument as keywords to the method, treating keyword splat as positional argument
o[1, **h], _ = [{b: 3}, 2]
o # => [[1, {:a=>1}], {:b=>3}]
```

Before 3.3, keyword arguments were treated as positional arguments.

This is similar to #19918, but for keyword arguments instead of block arguments.

@matz indicated he wanted to prohibit block arguments in aset/masgn and presumably also op_asgn (making them SyntaxErrors). Can we also prohibit keyword arguments in aset/masgn/op_asgn?

Note that aset treats keyword arguments as regular arguments:

```ruby
o[1, a: 1] = 2
o # => [[1, {:a=>1}, 2], {}]

o[1, **h] = {b: 3}
o # => [[1, {:a=>2}, {:b=>3}], {}]
```

While op_asgn treats keyword arguments as keywords:

```ruby
h = {a: 2}
o = []
def o.[](*args, **kw)
  concat([:[], args, kw])
  x = Object.new
  def x.+(v)
    [:x, v]
  end
  x
end
def o.[]=(*args, **kw)
  concat([:[]=, args, kw])
end

o[1, a: 1] += 2
o # => [:[], [1], {:a=>1}, :[]=, [1, [:x, 2]], {:a=>1}]

o.clear
o[1, **h] += {b: 3}
o # => [:[], [1], {:a=>2}, :[]=, [1, [:x, {:b=>3}]], {:a=>2}]
```

--
https://bugs.ruby-lang.org/


[ruby-core:117431] [Ruby master Misc#20407] Question about applying encoding modifier to an interpolated Regexp

by andrykonchin (Andrew Konchin)

Issue #20407 has been reported by andrykonchin (Andrew Konchin).

----------------------------------------
Misc #20407: Question about applying encoding modifier to an interpolated Regexp
https://bugs.ruby-lang.org/issues/20407

* Author: andrykonchin (Andrew Konchin)
* Status: Open
----------------------------------------
I am wondering how Regexp encoding modifiers (u, s, e, n) interfere with encoding negotiation of an interpolated Regexp literal.

Example #1

```ruby
# encoding: us-ascii

# Unicode: Ф - U+0424
# windows-1251: Ф - 0xD4

# without encoding modifier
puts /a #{ "\xd4".force_encoding("windows-1251") } c/.encoding # Windows-1251
puts /a #{ "b".encode("windows-1251") } c/.encoding            # US-ASCII
puts /a #{ "\u0424".force_encoding("UTF-8") } c/.encoding      # UTF-8
puts /a #{ "\xc2\xa1".b } c/.encoding                          # ASCII-8BIT

# with encoding modifier
puts /a #{ "\xd4".force_encoding("windows-1251") } c/e.encoding # Windows-1251
puts /a #{ "b".encode("windows-1251") } c/e.encoding            # EUC-JP
puts /a #{ "\u0424".force_encoding("UTF-8") } c/e.encoding      # UTF-8
puts /a #{ "\xc2\xa1".b } c/e.encoding                          # ASCII-8BIT

# string literals concatenation
puts ("a" + "\xd4".force_encoding("windows-1251") + "c").encoding # Windows-1251
puts ("a" + "b".encode("windows-1251") + "c").encoding            # US-ASCII
puts ("a" + "\u0424".force_encoding("UTF-8") + "c").encoding      # UTF-8
puts ("a" + "\xc2\xa1".b + "c").encoding                          # ASCII-8BIT
```

Example #2

```ruby
# encoding: utf-8

# windows-1251: Ф - 0xD4
# unicode: Ф - U+0424

# without encoding modifier
puts /a #{ "\xd4".force_encoding("windows-1251") } c/.encoding # Windows-1251
puts /a #{ "b".encode("windows-1251") } c/.encoding            # US-ASCII
puts /a #{ "\u0424".force_encoding("UTF-8") } c/.encoding      # UTF-8
puts /a #{ "\xc2\xa1".b } c/.encoding                          # ASCII-8BIT

# with encoding modifier
puts /a #{ "\xd4".force_encoding("windows-1251") } c/e.encoding # Windows-1251
puts /a #{ "b".encode("windows-1251") } c/e.encoding            # EUC-JP
puts /a #{ "\u0424".force_encoding("UTF-8") } c/e.encoding      # UTF-8
puts /a #{ "\xc2\xa1".b } c/e.encoding                          # ASCII-8BIT

# string literals concatenation
puts ("a" + "\xd4".force_encoding("windows-1251") + "c").encoding # Windows-1251
puts ("a" + "b".encode("windows-1251") + "c").encoding            # UTF-8
puts ("a" + "\u0424".force_encoding("UTF-8") + "c").encoding      # UTF-8
puts ("a" + "\xc2\xa1".b + "c").encoding                          # ASCII-8BIT
```

In the examples above the `e` modifier changes the Regexp's encoding only in one case, when the Regexp's encoding would be `US-ASCII` without the modifier:

```ruby
# encoding: us-ascii

puts /a #{ "b".encode("windows-1251") } c/.encoding  # US-ASCII
puts /a #{ "b".encode("windows-1251") } c/e.encoding # EUC-JP
```

```ruby
# encoding: utf-8

puts /a #{ "b".encode("windows-1251") } c/.encoding  # US-ASCII
puts /a #{ "b".encode("windows-1251") } c/e.encoding # EUC-JP
```

And the `e` modifier doesn't change the Regexp's final encoding in all the other cases, where the Regexp's encoding without the modifier is either a source file encoding or `ASCII-8BIT`.

Looking at the following example:

```ruby
# encoding: us-ascii

# without modifier
p /\xc2\xa1 #{ "a" }\xc2\xa1/.encoding                        # ASCII-8BIT
p /a #{ "\xc2\xa1".force_encoding("EUC-JP") } b/.encoding     # EUC-JP
p /a #{ "\xc2\xa1".b } b/.encoding                            # ASCII-8BIT

# with modifier
p /\xc2\xa1 #{ "a" }\xc2\xa1/e.encoding                       # EUC-JP
p /a #{ "\xc2\xa1".force_encoding("EUC-JP") } b/e.encoding    # EUC-JP
p /a #{ "\xc2\xa1".b } b/e.encoding                           # ASCII-8BIT
```

we can notice that the `e` modifier changes `ASCII-8BIT` to `EUC-JP` in the first case but doesn't in the third one. So I assume that the `e` modifier could be applied to the Regexp fragments (`\xc2\xa1` and `\xc2\xa1`) before encoding negotiation and not to the whole result after negotiation.

Could you please clarify how it works?

--
https://bugs.ruby-lang.org/


[ruby-core:117765] [Ruby master Feature#20470] Extract Ruby's Garbage Collector

by peterzhu2118 (Peter Zhu)

Issue #20470 has been reported by peterzhu2118 (Peter Zhu).

----------------------------------------
Feature #20470: Extract Ruby's Garbage Collector
https://bugs.ruby-lang.org/issues/20470

* Author: peterzhu2118 (Peter Zhu)
* Status: Open
----------------------------------------
# Extract Ruby's Garbage Collector

## Background

As described in [[Feature #20351]](https://bugs.ruby-lang.org/issues/20351), we are working on the ability to plug alternative garbage collector implementations into Ruby. Our goal is to allow developers and researchers to create and experiment with new implementations of garbage collectors in Ruby in a simplified way. This will also allow experimentation with different GC implementations in production systems so users can choose the best GC implementation for their workloads.

## Implementation

GitHub PR: [10721](https://github.com/ruby/ruby/pull/10721)

In this patch, we have split the current `gc.c` file into two files: `gc.c` and `gc_impl.c`.

`gc.c` now only contains code not specific to Ruby GC. This includes code to mark objects (which the GC implementation may choose not to use) and wrappers for internal APIs that the implementation may need to use (e.g. locking the VM).

`gc_impl.c` now contains the implementation of Ruby's GC. This includes marking, sweeping, compaction, and statistics. Most importantly, `gc_impl.c` only uses public APIs in Ruby and a limited set of functions exposed in `gc.c`. This allows us to build `gc_impl.c` independently of Ruby and plug Ruby's GC into itself.

## Demonstration

After [checking out the branch](https://github.com/ruby/ruby/pull/10721), we can first configure with `--with-shared-gc`:

```bash
$ ./configure --with-shared-gc
...
$ make -j
...
```

Let's now change the slot size of the GC to 64 bytes:

```bash
$ sed -i 's/\(#define BASE_SLOT_SIZE\).*/\1 64/' gc_impl.c
```

We can compile `gc_impl.c` independently using the following commands for clang or gcc (you may have to change the last `-I` to match your architecture and platform):

```bash
$ clang -Iinclude -I. -I.ext/include/arm64-darwin23 -undefined dynamic_lookup -g -O3 -dynamiclib -o libgc.dylib gc_impl.c
$ gcc -Iinclude -I. -I.ext/include/x86_64-linux -Wl,-undefined,dynamic_lookup -fPIC -g -O3 -shared -o libgc.so gc_impl.c
```

We can see that by default, the slot size is 40 bytes and objects are 40 bytes in size:

```bash
$ ./ruby -e "puts GC.stat_heap(0, :slot_size)"
40
$ ./ruby -robjspace -e "puts ObjectSpace.dump(Object.new)"
{"address":"0x1054a23f0", "type":"OBJECT", "shape_id":3, "slot_size":40, "class":"0x10528fd38", "embedded":true, "ivars":0, "memsize":40, "flags":{"wb_protected":true}}
```

We can now load our new GC using the `RUBY_GC_LIBRARY_PATH` environment variable (note that you may have to change the path to the DSO):

```bash
$ RUBY_GC_LIBRARY_PATH=./libgc.dylib ./ruby -e "puts GC.stat_heap(0, :slot_size)"
64
$ RUBY_GC_LIBRARY_PATH=./libgc.dylib ./ruby -robjspace -e "puts ObjectSpace.dump(Object.new)"
{"address":"0x1038de440", "type":"OBJECT", "shape_id":3, "slot_size":64, "class":"0x10355fc00", "embedded":true, "ivars":0, "memsize":64, "flags":{"wb_protected":true}}
```

## Benchmark

Benchmarks were run on commit [c78cebb](https://github.com/ruby/ruby/commit/c78cebb469fe56b45ee5daad16ae97… on Ubuntu 22.04 using [yjit-bench](https://github.com/Shopify/yjit-bench/) on commit [cc5a76e](https://github.com/Shopify/yjit-bench/commit/cc5a76ef6240113650547….

Compiling the gc_impl branch without `--with-shared-gc` (i.e. how the default Ruby is built), the benchmarks show little to no decrease in performance, with most of them being 0% to 1% slower:

```
--------------  -----------  ----------  ------------  ----------  ---------------  --------------
bench           master (ms)  stddev (%)  gc_impl (ms)  stddev (%)  gc_impl 1st itr  master/gc_impl
activerecord    73.9         0.3         74.6          0.3         1.00             0.99
chunky-png      911.1        0.2         937.2         0.2         0.97             0.97
erubi-rails     1582.4       0.1         1583.5        0.0         1.00             1.00
hexapdf         2716.2       1.1         2760.2        0.7         1.00             0.98
liquid-c        68.9         0.5         68.6          0.4         1.00             1.00
liquid-compile  67.9         0.1         68.2          0.2         0.99             1.00
liquid-render   172.8        0.1         174.9         0.1         0.99             0.99
lobsters        1033.9       0.4         1036.0        0.3         1.08             1.00
mail            135.1        0.2         136.5         0.2         0.99             0.99
psych-load      2250.8       0.1         2274.9        0.3         0.99             0.99
railsbench      2499.2       0.2         2502.9        0.1         1.00             1.00
rubocop         178.3        0.5         179.8         0.4         1.00             0.99
ruby-lsp        116.8        0.1         118.5         0.2         1.00             0.99
sequel          75.4         0.2         76.2          0.3         0.99             0.99
--------------  -----------  ----------  ------------  ----------  ---------------  --------------
```

Compiling the gc_impl branch with `--with-shared-gc` and loading Ruby's current GC using `RUBY_GC_LIBRARY_PATH`, the benchmarks are still fairly good, with a performance decrease of only around 1% to 2%:

```
--------------  -----------  ----------  ------------  ----------  ---------------  --------------
bench           master (ms)  stddev (%)  gc_impl (ms)  stddev (%)  gc_impl 1st itr  master/gc_impl
activerecord    74.2         0.2         75.4          0.5         0.98             0.98
chunky-png      916.3        0.3         933.2         0.1         0.98             0.98
erubi-rails     1597.6       0.1         1586.3        0.2         1.01             1.01
hexapdf         2731.4       0.5         2776.8        0.7         1.00             0.98
liquid-c        68.5         0.1         68.9          0.4         0.97             0.99
liquid-compile  67.4         0.4         68.3          0.2         0.95             0.99
liquid-render   171.8        0.1         175.6         0.2         0.97             0.98
lobsters        1031.9       0.3         1041.4        0.3         0.94             0.99
mail            135.5        0.4         136.7         0.1         0.99             0.99
psych-load      2246.0       0.1         2281.3        0.1         0.99             0.98
railsbench      2490.9       0.0         2490.0        0.1         1.01             1.00
rubocop         179.8        2.3         180.0         0.4         0.94             1.00
ruby-lsp        117.3        0.1         118.5         0.1         0.99             0.99
sequel          75.8         0.5         76.3          0.2         0.99             0.99
--------------  -----------  ----------  ------------  ----------  ---------------  --------------
```

## Limitations

We recognize that our current implementation does not yet offer the flexibility required for a generic plug-in GC. Specifically, the set of APIs that the plug-in GC has to implement is relatively large, at around 70 functions. Additionally, some of these functions are specific to the current GC. We would like to emphasize that the API is NOT stable and is subject to change. We will be working on improving this API and reducing the surface area. This will be future work and we're not working on it in this phase.

## Future plans

- Refactor and improve `gc_impl.c`.
- Implement alternate GC implementations, such as the Epsilon GC and [MMTk](https://www.mmtk.io/) to prove that this API allows for alternate implementations of the GC.
- Reduce and improve the API of the GC implementation.
- Benchmark and improve performance of the DSO API.

--
https://bugs.ruby-lang.org/


[ruby-core:117835] [Ruby master Feature#20484] A new pragma for eager resolution of classes referenced in rescue clauses.

by jfrisby (Jon Frisby)

Issue #20484 has been reported by jfrisby (Jon Frisby).

----------------------------------------
Feature #20484: A new pragma for eager resolution of classes referenced in rescue clauses.
https://bugs.ruby-lang.org/issues/20484

* Author: jfrisby (Jon Frisby)
* Status: Open
----------------------------------------
I've been using Ruby for 20 years, and just today learned (the hard way) that the class name(s) referenced in a `rescue` clause are not resolved until an exception occurs.

Upon reflection, this behavior probably makes sense in a lot of situations. Late resolution may simplify code loading for the developer.

I would, however, love to see an opt-in feature (a la `frozen-string-literal`) to force resolution when the code is loaded/parsed.

--
https://bugs.ruby-lang.org/
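The late resolution described above is easy to reproduce. In this minimal sketch the method name `guarded` and the constant `NotYetDefinedError` are made up for illustration; the constant is deliberately never defined. The method loads and runs without complaint as long as nothing is raised, and the missing constant only surfaces as a `NameError` once an exception reaches the `rescue` clause.

```ruby
# `NotYetDefinedError` is a hypothetical constant that is never defined.
def guarded
  yield
rescue NotYetDefinedError
  :rescued
end

# No exception is raised, so the rescue clause's constant is never looked up.
p guarded { :fine }

# Once an exception reaches the rescue clause, the lookup happens and fails.
begin
  guarded { raise "boom" }
rescue NameError => e
  puts "#{e.class} for #{e.name.inspect}"
end
```

An eager-resolution pragma, as requested, would presumably move that failure to load time instead of the first failing call.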


[ruby-core:116589] [Ruby master Misc#20238] Use prism for mk_builtin_loader.rb

by kddnewton (Kevin Newton)

Issue #20238 has been reported by kddnewton (Kevin Newton).

----------------------------------------
Misc #20238: Use prism for mk_builtin_loader.rb
https://bugs.ruby-lang.org/issues/20238

* Author: kddnewton (Kevin Newton)
* Status: Open
* Priority: Normal
----------------------------------------
I would like to propose that we use prism for mk_builtin_loader.rb.

Right now the Ruby syntax that you can use in builtin classes is restricted to the base Ruby version (2.7). This means you can't use a lot of the nicer syntax that Ruby has shipped in the last couple of years.

If we switch to using prism to parse the builtin files instead of using ripper, then we can always use the latest version of Ruby syntax. A pull request for this is here: https://github.com/kddnewton/ruby/pull/65. The approach for the PR is taken from how RJIT bindgen works.

--
https://bugs.ruby-lang.org/

