Issue #20154 has been reported by jprokop (Jarek Prokop).
----------------------------------------
Bug #20154: aarch64: configure overrides `-mbranch-protection` if it was set in CFLAGS via environment
https://bugs.ruby-lang.org/issues/20154
* Author: jprokop (Jarek Prokop)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Recently a GitHub PR <https://github.com/ruby/ruby/pull/9306> was merged that adds PAC/BTI support on ARM CPUs for Coroutine.S.
Without proper compilation support in configure.ac, Ruby segfaults when using fibers on CPUs where PAC is supported: https://bugs.ruby-lang.org/issues/20085
At the time of writing, configure.ac tries a list of options for the `-mbranch-protection` flag and appends the first one that successfully compiles a test program <https://github.com/ruby/ruby/blob/master/configure.ac#L829>
to XCFLAGS, and now also to ASFLAGS, to fix issue 20085 on Ruby master.
This is suboptimal for Fedora as we set -mbranch-protection=standard by default in C{,XX}FLAGS:
```
CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Werror=implicit-function-declaration -Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer '
export CFLAGS
CXXFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer'
export CXXFLAGS
```
The appended flag overrides the distribution's compilation configuration, which in this case ends up omitting BTI instructions and using only PAC.
Would it make sense to check whether such flags already exist and not override them if they do?
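As a rough illustration of that check, here is a Ruby sketch. It is not the actual configure.ac logic (which is written in m4/shell); the helper names and the candidate list are assumptions made for this example, and a real check would still test that the chosen candidate compiles.

```ruby
require "shellwords"

# Return the last -mbranch-protection=... word in a flags string, or nil.
# Compilers typically let the last occurrence win, so search from the end.
def existing_branch_protection(flags)
  Shellwords.split(flags.to_s).reverse.find do |word|
    word.start_with?("-mbranch-protection=")
  end
end

# Hypothetical decision helper: only pick a flag when the environment
# has not already chosen one.
def branch_protection_to_append(env_cflags, candidates)
  return nil if existing_branch_protection(env_cflags)
  candidates.first
end

# Shortened, illustrative version of the Fedora CFLAGS.
fedora_cflags = "-O2 -fstack-protector-strong -mbranch-protection=standard -fno-omit-frame-pointer"
candidates = ["-mbranch-protection=pac-ret"]

p existing_branch_protection(fedora_cflags)
p branch_protection_to_append(fedora_cflags, candidates)
p branch_protection_to_append("-O2", candidates)
```

With the distribution flags present the helper returns nil, so nothing would be appended and the distribution's choice would be left alone.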
Serious proposals:
1. The simplest fix, which does not override what the distribution sets and results in higher security, is to prepend `-mbranch-protection=standard` to the list of options. It should cause no problems on ARMv8 CPUs and later: BTI instructions, like PAC instructions, execute as NOPs on CPUs that do not support them, so this only extends the capability.
See attached 0001-aarch64-Check-mbranch-protection-standard-first-to-u.patch
2. Another fix, which sounds saner IMO and avoids guessing all the correct places for the flag, is what another Fedora contributor, Florian Weimer, suggested: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org…
"The reliable way to do this would be to compile a C file and check
whether that enables __ARM_FEATURE_PAC_DEFAULT, and if that's the case,
define a *different* macro for use in the assembler implementation.
This way, you don't need to care about the exact name of the option."
In other words, instead of using __ARM_FEATURE_* directly in that code, define a macro in the style of "USE_PAC" that carries the value of the feature macro when it is defined. I think that way we would no longer need to append to ASFLAGS.
However, it is also important to capture the values of those macros, since the values carry meaning. I have an idea how to do that, but I can get to it on Monday at the earliest.
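One possible shape of that probe, sketched in Ruby rather than in configure.ac's m4/shell: it asks the compiler for its predefined macros under the given CFLAGS and captures the values of the two feature macros. The helper names, the `USE_PAC`/`USE_BTI` spellings, and the sample dump are assumptions for illustration; the sketch also assumes a compiler that accepts `-dM -E`.

```ruby
require "open3"
require "shellwords"

# Extract the branch-protection feature macros and their integer values
# from a `cc -dM -E` style macro dump.
def parse_branch_protection(dump)
  dump.scan(/^#define (__ARM_FEATURE_(?:PAC|BTI)_DEFAULT) (\d+)/)
      .to_h { |name, value| [name, Integer(value)] }
end

# Run the preprocessor with the caller's flags and parse what it predefines.
# Returns an empty hash when the compiler invocation fails.
def probe_branch_protection(cc: ENV.fetch("CC", "cc"), cflags: ENV.fetch("CFLAGS", ""))
  out, status = Open3.capture2(cc, *Shellwords.split(cflags), "-dM", "-E", "-x", "c", File::NULL)
  status.success? ? parse_branch_protection(out) : {}
end

# Hypothetical wrapper macros for the assembler source, carrying the values.
def wrapper_defines(macros)
  macros.map do |name, value|
    short = name[/__ARM_FEATURE_(PAC|BTI)_DEFAULT/, 1]
    "#define USE_#{short} #{value}"
  end
end

# Illustrative dump only; real output depends on compiler, target, and flags.
sample_dump = <<~DUMP
  #define __aarch64__ 1
  #define __ARM_FEATURE_BTI_DEFAULT 1
  #define __ARM_FEATURE_PAC_DEFAULT 1
DUMP

macros = parse_branch_protection(sample_dump)
puts wrapper_defines(macros)
```

Because the probe looks only at the resulting macros, it does not need to know which option name enabled the protection, which is the point of the suggestion quoted above.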
---Files--------------------------------
0001-aarch64-Check-mbranch-protection-standard-first-to-u.patch (1004 Bytes)
--
https://bugs.ruby-lang.org/
Issue #20492 has been reported by hadmut (Hadmut Danisch).
----------------------------------------
Feature #20492: Debug option for tempfile
https://bugs.ruby-lang.org/issues/20492
* Author: hadmut (Hadmut Danisch)
* Status: Open
----------------------------------------
Hi,
the ruby lib tempfile is quite useful, but since it always deletes files once the object is garbage collected or the program terminates (or the program explicitly asks to remove the file), it is difficult to debug programs and to check the file contents after program termination.
Replacing all tempfile uses with regular file operations is awkward.
It would therefore be useful if deletion of tempfiles could be turned off completely, e.g. through an environment variable or by the program itself, such as through a --debug option or when catching errors: for example, deleting all files on normal program termination, but not if there is a runtime error.
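For illustration, the requested behaviour can be approximated today with a small wrapper around `Tempfile.create`, which does not remove the file automatically when called without a block. This is only a sketch: the helper name and the `KEEP_TEMPFILES` variable are assumptions for this example, not existing tempfile features.

```ruby
require "tempfile"

# Hypothetical helper: keeps the file when asked to, or when the block raises.
def with_debuggable_tempfile(basename = "debug", keep: ENV["KEEP_TEMPFILES"] == "1")
  file = Tempfile.create(basename) # no block given: not removed automatically
  failed = false
  begin
    yield file
  rescue Exception
    failed = true
    raise
  ensure
    file.close unless file.closed?
    if keep || failed
      warn "tempfile kept for inspection: #{file.path}"
    elsif File.exist?(file.path)
      File.unlink(file.path)
    end
  end
end

# Normal termination: the file is deleted.
deleted_path = nil
with_debuggable_tempfile("report", keep: false) do |f|
  deleted_path = f.path
  f.write("all good")
end

# Runtime error: the file survives and can be inspected afterwards.
kept_path = nil
begin
  with_debuggable_tempfile("report", keep: false) do |f|
    kept_path = f.path
    f.write("partial output")
    raise "simulated failure"
  end
rescue RuntimeError
  puts File.read(kept_path)
end
```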
regards
Issue #20448 has been reported by ms-tob (Matt S).
----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448
* Author: ms-tob (Matt S)
* Status: Open
----------------------------------------
# Abstract
Gathering code coverage information is a well-known goal within software engineering. It is most commonly used to assess code coverage during automated testing. A lesser known use-case is coverage-guided fuzz testing, which will be the primary use-case presented in this issue. This issue exists to request that Ruby coverage event hooking be made part of its official, public C API.
# Background
Ruby currently provides a number of avenues for hooking events *or* gathering coverage information:
1. The [Coverage](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html) module
2. The [TracePoint](https://ruby-doc.org/3.3.0/TracePoint.html) module
3. The [rb_add_event_hook](https://ruby-doc.org/3.3.0/extension_rdoc.html#label-Hoo… extension function
Unfortunately, none of these pieces of functionality solve this issue's specific use-case. The `Coverage` module is not a great fit for real-time coverage analysis with an unknown start and stop point. Coverage-guided fuzz testing requires this. The `TracePoint` module and `rb_add_event_hook` are not able to hook branch and line coverage events. Coverage-guided fuzz testing typically tracks branch events.
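To make the limitation concrete, here is a sketch of what is possible from Ruby today: polling `Coverage.peek_result` between calls and counting newly covered branch arms. The target method and file are made up for the example. A fuzzer would have to poll like this after every input instead of receiving events as they happen.

```ruby
require "coverage"
require "tempfile"

# Hypothetical code under test. It lives in a file because Coverage only
# instruments code loaded after coverage has been started.
source = <<~RUBY
  def classify(n)
    if n > 0
      :positive
    else
      :non_positive
    end
  end
RUBY

Coverage.start(branches: true)

target = Tempfile.create(["target", ".rb"])
target.write(source)
target.close
load target.path

# Count branch arms in the target file that have been hit at least once.
covered_arms = lambda do
  _path, data = Coverage.peek_result.find do |path, _|
    File.basename(path) == File.basename(target.path)
  end
  data[:branches].values.flat_map(&:values).count(&:positive?)
end

classify(1)
after_first = covered_arms.call  # one arm of the `if` taken so far
classify(-1)
after_second = covered_arms.call # both arms taken now

puts "covered arms: #{after_first} -> #{after_second}"
File.unlink(target.path)
```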
# Proposal
The ultimate goal is to enable Ruby C extensions to process coverage events in real-time. I did some cursory investigation into the Ruby C internals to determine what it would take to achieve this, but I'm by no means an expert, so my list may be incomplete.
The good news is that much of this functionality already exists, but it's part of the private, internal-only C API.
1. Make `RUBY_EVENT_COVERAGE_LINE` and `RUBY_EVENT_COVERAGE_BRANCH` public: https://github.com/ruby/ruby/blob/v3_3_0/vm_core.h#L2182-L2184
a. This would be an addition to the current public event types: https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/internal/event.h#L32-…
2. Allow initializing global coverage state so that coverage tracking can be fully enabled
a. Currently, if `Coverage.setup` or `Coverage.start` is not called, then coverage events cannot be hooked. I do not fully understand why this is, but I believe it has something to do with `rb_get_coverages` and `rb_set_coverages`. If calls to `rb_get_coverages` return `NULL` (https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L641-L647, https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L864-L868), then coverage hooking will not be enabled. I believe the `Coverage` module initializes that state via a `rb_set_coverages` call here: https://github.com/ruby/ruby/blob/v3_3_0/ext/coverage/coverage.c#L112-L120.
b. So, to achieve this goal, a C extension would need to be able to call `rb_set_coverages` or somehow initialize the global coverage state.
I've actually been able to achieve this functionality by calling undocumented features and defining `RUBY_EVENT_COVERAGE_BRANCH`:
```c
#include <ruby.h>
#include <ruby/debug.h>

// Not exposed by the public headers; value copied from vm_core.h
#define RUBY_EVENT_COVERAGE_BRANCH 0x020000

// ...

rb_event_flag_t events = RUBY_EVENT_COVERAGE_BRANCH;
rb_event_hook_flag_t flags = (
    RUBY_EVENT_HOOK_FLAG_SAFE | RUBY_EVENT_HOOK_FLAG_RAW_ARG
);
rb_add_event_hook2(
    (rb_event_hook_func_t) event_hook_branch,
    events,
    counter_hash,
    flags
);
```
If I call `Coverage.setup(branches: true)` and add this event hook, then branch hooking works as expected. `rb_add_event_hook2` still respects the `RUBY_EVENT_COVERAGE_BRANCH` value if it's passed. But it would be better if I could rely on official functionality rather than undocumented features.
The above two points would be requirements for this functionality, but there's an additional nice-to-have:
3. Extend the public `tracearg` functionality to include additional coverage information
a. Currently, `tracearg` offers information like `rb_tracearg_lineno` and `rb_tracearg_path`. It would be helpful if it also provided additional coverage information like `coverage.c`'s column information and a unique identifier for each branch. Currently, I can only use `(path, lineno)` as a unique identifier for a branch because that's what's offered by the public API, but more information, like the column number, would help to uniquely identify branches: since there can be multiple `if` statements on a single line, `(path, lineno)` alone can identify a branch event ambiguously.
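The ambiguity can be seen from the Ruby side with the `Coverage` module, whose branch keys already carry column information. In this sketch (the `pick` method and file name are made up for the example), two conditionals share a line and are told apart only by their columns and ids:

```ruby
require "coverage"
require "tempfile"

# Hypothetical target with two conditionals on the same line.
source = <<~RUBY
  def pick(a, b)
    [(a ? 1 : 2), (b ? 3 : 4)]
  end
RUBY

Coverage.start(branches: true)
target = Tempfile.create(["two_branches", ".rb"])
target.write(source)
target.close
load target.path
pick(true, false)

_path, data = Coverage.result.find do |path, _|
  File.basename(path) == File.basename(target.path)
end
File.unlink(target.path)

# Keys look like [type, id, first_line, first_column, last_line, last_column].
branch_keys = data[:branches].keys
branch_keys.each { |key| p key }

lines   = branch_keys.map { |key| key[2] }.uniq
columns = branch_keys.map { |key| key[3] }.uniq
puts "distinct lines: #{lines.size}, distinct columns: #{columns.size}"
```

A hook that only sees `(path, lineno)` would merge these two branches into one, whereas the column or the id keeps them separate.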
# Use cases
This use-case was born out of a new coverage-guided Ruby fuzzer: https://github.com/trailofbits/ruzzy. You can read more about its implementation details here: https://blog.trailofbits.com/2024/03/29/introducing-ruzzy-a-coverage-guided…. You can also find the Ruby C extension code behind its implementation here: https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L220-L….
So, the primary use-case here is enabling real-time, coverage-guided fuzz testing of Ruby code. However, as mentioned in the abstract, gathering code coverage information is useful in many domains. For example, it could enable new workflows in standard unit/integration test coverage. It could also enable gathering coverage information in real-time as an application is running. I see this as the most generalized form of gathering code coverage information, and something like the `Coverage` module as a specialized implementation. Another example, https://bugs.ruby-lang.org/issues/20282 may be solved by this more generalized solution.
We are tracking this request downstream here: https://github.com/trailofbits/ruzzy/issues/9
# Discussion
Fuzz testing is another tool in a tester's toolbelt. It is an increasingly common way to improve software's robustness. Go has it built into the standard library, Python has Atheris, Java has Jazzer, JavaScript has Jazzer.js, etc. OSS-Fuzz has helped identify and fix over 10,000 vulnerabilities and 36,000 bugs [using fuzzing](https://google.github.io/oss-fuzz/#trophies). Ruby deserves a good fuzzer, and improving coverage gathering would help achieve that goal.
The `Coverage` module, `TracePoint` module, and `rb_add_event_hook` function seem like they could fulfill this goal. However, after deeper investigation, none of them fit the exact requirements for this use-case.
# See also
- https://bugs.ruby-lang.org/issues/20282
- https://github.com/google/atheris
- https://security.googleblog.com/2020/12/how-atheris-python-fuzzer-works.html
- https://github.com/CodeIntelligenceTesting/jazzer/
- https://www.code-intelligence.com/blog/java-fuzzing-with-jazzer
- https://go.dev/doc/security/fuzz/
Issue #20218 has been reported by jeremyevans0 (Jeremy Evans).
----------------------------------------
Bug #20218: aset/masgn/op_asgn with keyword arguments
https://bugs.ruby-lang.org/issues/20218
* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
I found that use of keyword arguments in multiple assignment is broken in 3.3 and master:
```ruby
h = {a: 1}
o = []
def o.[]=(*args, **kw)
  replace([args, kw])
end
# This segfaults as RHS argument is not a hash
o[1, a: 1], _ = [1, 2]
# This passes the RHS argument as keywords to the method, treating keyword splat as positional argument
o[1, **h], _ = [{b: 3}, 2]
o
# => [[1, {:a=>1}], {:b=>3}]
```
Before 3.3, keyword arguments were treated as positional arguments.
This is similar to #19918, but for keyword arguments instead of block arguments.
@matz indicated he wanted to prohibit block arguments in aset/masgn and presumably also op_asgn (making them SyntaxErrors). Can we also prohibit keyword arguments in aset/masgn/op_asgn?
Note that aset treats keyword arguments as regular arguments:
```ruby
o[1, a: 1] = 2
o
# => [[1, {:a=>1}, 2], {}]
o[1, **h] = {b: 3}
o
# => [[1, {:a=>1}, {:b=>3}], {}]
```
While op_asgn treats keyword arguments as keywords:
```ruby
h = {a: 2}
o = []
def o.[](*args, **kw)
  concat([:[], args, kw])
  x = Object.new
  def x.+(v)
    [:x, v]
  end
  x
end
def o.[]=(*args, **kw)
  concat([:[]=, args, kw])
end
o[1, a: 1] += 2
o
# => [:[], [1], {:a=>1}, :[]=, [1, [:x, 2]], {:a=>1}]
o.clear
o[1, **h] += {b: 3}
o
# => [:[], [1], {:a=>2}, :[]=, [1, [:x, {:b=>3}]], {:a=>2}]
```
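For comparison, here is a sketch of the explicit method-call form, which is ordinary call syntax and is therefore unaffected by how index assignment treats keywords. Whether this would be the recommended replacement if the syntax is prohibited is an assumption on my part, not something decided in this issue.

```ruby
o = []
def o.[]=(*args, **kw)
  replace([args, kw])
end

# Explicit call: the value is a positional argument, a: 1 is a real keyword.
o.[]=(1, 2, a: 1)
p o
```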
Issue #20470 has been reported by peterzhu2118 (Peter Zhu).
----------------------------------------
Feature #20470: Extract Ruby's Garbage Collector
https://bugs.ruby-lang.org/issues/20470
* Author: peterzhu2118 (Peter Zhu)
* Status: Open
----------------------------------------
# Extract Ruby's Garbage Collector
## Background
As described in [[Feature #20351]](https://bugs.ruby-lang.org/issues/20351), we are working on the ability to plug alternative garbage collector implementations into Ruby. Our goal is to allow developers and researchers to create and experiment with new implementations of garbage collectors in Ruby in a simplified way. This will also allow experimentation with different GC implementations in production systems so users can choose the best GC implementation for their workloads.
## Implementation
GitHub PR: [10721](https://github.com/ruby/ruby/pull/10721)
In this patch, we have split the current `gc.c` file into two files: `gc.c` and `gc_impl.c`.
`gc.c` now only contains code not specific to Ruby GC. This includes code to mark objects (which the GC implementation may choose not to use) and wrappers for internal APIs that the implementation may need to use (e.g. locking the VM).
`gc_impl.c` now contains the implementation of Ruby's GC. This includes marking, sweeping, compaction, and statistics. Most importantly, `gc_impl.c` only uses public APIs in Ruby and a limited set of functions exposed in `gc.c`. This allows us to build `gc_impl.c` independently of Ruby and plug Ruby's GC into itself.
## Demonstration
After [checking out the branch](https://github.com/ruby/ruby/pull/10721), we can first configure with `--with-shared-gc`:
```bash
$ ./configure --with-shared-gc
...
$ make -j
...
```
Let's now change the slot size of the GC to 64 bytes:
```bash
$ sed -i 's/\(#define BASE_SLOT_SIZE\).*/\1 64/' gc_impl.c
```
We can compile `gc_impl.c` independently using the following commands for clang or gcc (you may have to change the last `-I` to match your architecture and platform):
```bash
$ clang -Iinclude -I. -I.ext/include/arm64-darwin23 -undefined dynamic_lookup -g -O3 -dynamiclib -o libgc.dylib gc_impl.c
$ gcc -Iinclude -I. -I.ext/include/x86_64-linux -Wl,-undefined,dynamic_lookup -fPIC -g -O3 -shared -o libgc.so gc_impl.c
```
We can see that by default, the slot size is 40 bytes and objects are 40 bytes in size:
```bash
$ ./ruby -e "puts GC.stat_heap(0, :slot_size)"
40
$ ./ruby -robjspace -e "puts ObjectSpace.dump(Object.new)"
{"address":"0x1054a23f0", "type":"OBJECT", "shape_id":3, "slot_size":40, "class":"0x10528fd38", "embedded":true, "ivars":0, "memsize":40, "flags":{"wb_protected":true}}
```
We can now load our new GC using the `RUBY_GC_LIBRARY_PATH` environment variable (note that you may have to change the path to the DSO):
```bash
$ RUBY_GC_LIBRARY_PATH=./libgc.dylib ./ruby -e "puts GC.stat_heap(0, :slot_size)"
64
$ RUBY_GC_LIBRARY_PATH=./libgc.dylib ./ruby -robjspace -e "puts ObjectSpace.dump(Object.new)"
{"address":"0x1038de440", "type":"OBJECT", "shape_id":3, "slot_size":64, "class":"0x10355fc00", "embedded":true, "ivars":0, "memsize":64, "flags":{"wb_protected":true}}
```
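The same check can be written as a short script that lists the slot size of every heap, which may be handy when comparing GC builds. This sketch uses only `GC.stat_heap`; the values printed depend on the build and on which GC library is loaded.

```ruby
# Map each heap index to its slot size in bytes.
slot_sizes = GC.stat_heap.transform_values { |stats| stats[:slot_size] }

slot_sizes.each do |heap_index, slot_size|
  puts "heap #{heap_index}: slot_size=#{slot_size}"
end
```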
## Benchmark
Benchmarks were run on commit [c78cebb](https://github.com/ruby/ruby/commit/c78cebb469fe56b45ee5daad16ae97… on Ubuntu 22.04 using [yjit-bench](https://github.com/Shopify/yjit-bench/) on commit [cc5a76e](https://github.com/Shopify/yjit-bench/commit/cc5a76ef6240113650547….
Compiling the gc_impl branch without `--with-shared-gc` (i.e. how the default Ruby is built), the benchmarks show little to no decrease in performance, with most benchmarks being 0% to 1% slower:
```
-------------- ----------- ---------- ------------ ---------- --------------- --------------
bench master (ms) stddev (%) gc_impl (ms) stddev (%) gc_impl 1st itr master/gc_impl
activerecord 73.9 0.3 74.6 0.3 1.00 0.99
chunky-png 911.1 0.2 937.2 0.2 0.97 0.97
erubi-rails 1582.4 0.1 1583.5 0.0 1.00 1.00
hexapdf 2716.2 1.1 2760.2 0.7 1.00 0.98
liquid-c 68.9 0.5 68.6 0.4 1.00 1.00
liquid-compile 67.9 0.1 68.2 0.2 0.99 1.00
liquid-render 172.8 0.1 174.9 0.1 0.99 0.99
lobsters 1033.9 0.4 1036.0 0.3 1.08 1.00
mail 135.1 0.2 136.5 0.2 0.99 0.99
psych-load 2250.8 0.1 2274.9 0.3 0.99 0.99
railsbench 2499.2 0.2 2502.9 0.1 1.00 1.00
rubocop 178.3 0.5 179.8 0.4 1.00 0.99
ruby-lsp 116.8 0.1 118.5 0.2 1.00 0.99
sequel 75.4 0.2 76.2 0.3 0.99 0.99
-------------- ----------- ---------- ------------ ---------- --------------- --------------
```
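As a reading aid for the `master/gc_impl` column, the following sketch recomputes the ratio and the corresponding slowdown for three rows of the table above (timings copied from the table; a ratio below 1.00 means gc_impl is slower):

```ruby
# [master ms, gc_impl ms], copied from the table above.
rows = {
  "activerecord" => [73.9, 74.6],
  "chunky-png"   => [911.1, 937.2],
  "railsbench"   => [2499.2, 2502.9],
}

# master/gc_impl, rounded the same way as the table column.
ratios = rows.transform_values { |master_ms, gc_impl_ms| (master_ms / gc_impl_ms).round(2) }

rows.each do |bench, (master_ms, gc_impl_ms)|
  slowdown = ((gc_impl_ms / master_ms - 1) * 100).round(1)
  puts format("%-13s master/gc_impl=%.2f slowdown=%.1f%%", bench, ratios[bench], slowdown)
end
```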
Compiling the gc_impl branch with `--with-shared-gc` and loading Ruby's current GC using `RUBY_GC_LIBRARY_PATH`, the benchmarks are still fairly good, with a performance decrease of only around 1% to 2%:
```
-------------- ----------- ---------- ------------ ---------- --------------- --------------
bench master (ms) stddev (%) gc_impl (ms) stddev (%) gc_impl 1st itr master/gc_impl
activerecord 74.2 0.2 75.4 0.5 0.98 0.98
chunky-png 916.3 0.3 933.2 0.1 0.98 0.98
erubi-rails 1597.6 0.1 1586.3 0.2 1.01 1.01
hexapdf 2731.4 0.5 2776.8 0.7 1.00 0.98
liquid-c 68.5 0.1 68.9 0.4 0.97 0.99
liquid-compile 67.4 0.4 68.3 0.2 0.95 0.99
liquid-render 171.8 0.1 175.6 0.2 0.97 0.98
lobsters 1031.9 0.3 1041.4 0.3 0.94 0.99
mail 135.5 0.4 136.7 0.1 0.99 0.99
psych-load 2246.0 0.1 2281.3 0.1 0.99 0.98
railsbench 2490.9 0.0 2490.0 0.1 1.01 1.00
rubocop 179.8 2.3 180.0 0.4 0.94 1.00
ruby-lsp 117.3 0.1 118.5 0.1 0.99 0.99
sequel 75.8 0.5 76.3 0.2 0.99 0.99
-------------- ----------- ---------- ------------ ---------- --------------- --------------
```
## Limitations
We recognize that our current implementation does not yet offer the flexibility required for a generic plug-in GC. Specifically, the set of APIs that the plug-in GC has to implement is relatively large, at around 70 functions. Additionally, some of these functions are specific to the current GC.
We would like to emphasize that the API is NOT stable and is subject to change. We will be working on improving this API and reducing the surface area. This will be future work and we're not working on it in this phase.
## Future plans
- Refactor and improve `gc_impl.c`.
- Implement alternate GC implementations, such as the Epsilon GC and [MMTk](https://www.mmtk.io/) to prove that this API allows for alternate implementations of the GC.
- Reduce and improve the API of the GC implementation.
- Benchmark and improve performance of the DSO API.
Issue #20484 has been reported by jfrisby (Jon Frisby).
----------------------------------------
Feature #20484: A new pragma for eager resolution of classes referenced in rescue clauses.
https://bugs.ruby-lang.org/issues/20484
* Author: jfrisby (Jon Frisby)
* Status: Open
----------------------------------------
I've been using Ruby for 20 years, and just today learned (the hard way) that the class name(s) referenced in a `rescue` clause are not resolved until an exception occurs.
Upon reflection, this behavior probably makes sense in a lot of situations. Late resolution may simplify code loading for the developer.
I would, however, love to see an opt-in feature (a la `frozen-string-literal`) to force resolution when the code is loaded/parsed.
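A minimal reproduction of the behaviour described above (the constant name is deliberately undefined and made up for the example):

```ruby
# NotYetDefinedError is a deliberately undefined, hypothetical constant.
def risky(should_fail:)
  raise ArgumentError, "boom" if should_fail
  :ok
rescue NotYetDefinedError
  :rescued
end

# No exception: the rescue clause is never evaluated, so nothing complains.
no_error = risky(should_fail: false)

# An exception occurs: only now is the constant looked up, and it is missing.
late_error =
  begin
    risky(should_fail: true)
  rescue NameError => e
    e
  end

puts no_error.inspect
puts "#{late_error.class}: #{late_error.message}"
```

The method definition loads and runs without complaint until the first exception, at which point the caller sees a `NameError` instead of the `ArgumentError` that was actually raised.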
Issue #20238 has been reported by kddnewton (Kevin Newton).
----------------------------------------
Misc #20238: Use prism for mk_builtin_loader.rb
https://bugs.ruby-lang.org/issues/20238
* Author: kddnewton (Kevin Newton)
* Status: Open
* Priority: Normal
----------------------------------------
I would like to propose that we use prism for mk_builtin_loader.rb.
Right now the Ruby syntax that you can use in builtin classes is restricted to the base Ruby version (2.7). This means you can't use a lot of the nicer syntax that Ruby has shipped in the last couple of years.
If we switch to using prism to parse the builtin files instead of using ripper, then we can always use the latest version of Ruby syntax. A pull request for this is here: https://github.com/kddnewton/ruby/pull/65. The approach for the PR is taken from how RJIT bindgen works.
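As a small illustration of the idea (not taken from the PR), prism can be asked to parse snippets that use syntax newer than 2.7. The sketch assumes the prism gem is available and uses only `Prism.parse` and `ParseResult#success?`; the snippets are made up for the example.

```ruby
require "prism"

# Syntax introduced after Ruby 2.7.
snippets = {
  "endless method"    => "def answer = 42",
  "hash shorthand"    => "x = 1; h = {x:}",
  "rightward pattern" => "config = {a: 1}; config => {a:}",
}

results = snippets.transform_values { |code| Prism.parse(code).success? }

results.each do |label, ok|
  puts "#{label}: #{ok ? 'parsed' : 'syntax error'}"
end
```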