[ruby-core:118939] [Ruby master Bug#20695] Elevated GC allocations in parse.y parser

23 Aug 2024

Issue #20695 has been reported by alanwu (Alan Wu).

----------------------------------------
Bug #20695: Elevated GC allocations in parse.y parser
https://bugs.ruby-lang.org/issues/20695

* Author: alanwu (Alan Wu)
* Status: Open
* ruby -v: ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) [arm64-darwin23]
* Backport: 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED
----------------------------------------
Testing on the `lobsters` benchmark from [`yjit-bench`](https://github.com/shopify/yjit-bench), the latest `master` parser allocates more objects than it did at commit:98eeadc932 ("Development of 3.4.0 started."). The following patch prints the allocation count after code loading:

```diff

diff --git a/benchmarks/lobsters/benchmark.rb b/benchmarks/lobsters/benchmark.rb
index 240c50c..6cdd0ac 100644
--- a/benchmarks/lobsters/benchmark.rb
+++ b/benchmarks/lobsters/benchmark.rb
@@ -7,6 +7,8 @@ Dir.chdir __dir__
 use_gemfile
 
 require_relative 'config/environment'
+printf "allocated_after_load=%d\n", GC.stat(:total_allocated_objects)
+exit
 require_relative "route_generator"
 
 # For an in-mem DB, we need to load all data on every boot
```

```
$ ruby benchmarks/lobsters/benchmark.rb
ruby 3.4.0dev (2023-12-25T09:13:40Z master 98eeadc932) [arm64-darwin23]
<snip>
allocated_after_load=1747084
$ chruby master
$ ruby benchmarks/lobsters/benchmark.rb
ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) [arm64-darwin23]
<snip>
allocated_after_load=2163031
$ ruby --parser=prism benchmarks/lobsters/benchmark.rb
ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) +PRISM [arm64-darwin23]
<snip>
allocated_after_load=932571
$ math 2163031 / 1747084
1.238081
```
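The same counter can also be read around a single compile to look at parser allocations in isolation. The following is only a minimal sketch under stated assumptions: `allocations_while_compiling` is a hypothetical helper, and the synthetic source full of integer literals is made up for illustration; it is not part of `lobsters`. The count also includes everything else the compile allocates, not just parser objects.

```ruby
# Hypothetical helper (not from yjit-bench): counts objects allocated
# while CRuby compiles the given source string.
def allocations_while_compiling(source)
  before = GC.stat(:total_allocated_objects)
  RubyVM::InstructionSequence.compile(source)
  GC.stat(:total_allocated_objects) - before
end

# Synthetic input (an assumption for illustration): many integer literals.
source = Array.new(1_000) { |i| "x#{i} = #{i}" }.join("\n")

count = allocations_while_compiling(source)
puts "allocated_during_compile=#{count}"
```

Running this script under each build (and with `--parser=prism`) gives a rough per-compile comparison without booting the whole app.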

---

Profiling shows extra GC allocations coming from set_number_literal(); set_yylval_node() expands to a call to rb_enc_str_new(). So this issue seems related to #20659. A surprising side effect of these extra allocations is that they seem to slow down the benchmark iterations themselves, not only code loading, even though the body of the benchmark *does not do any Ruby parsing* after warmup iterations. The speed impact seems to be due to poorer data locality, as perf(1) shows elevated levels of `CYCLE_ACTIVITY.STALLS_L2_MISS` and related events. Using Prism speeds up benchmark iterations, probably because it allocates far fewer objects during code loading, leaving the heap in a better layout in terms of locality. With Prism, fewer cycles stall on data cache misses.

```
$ ruby run_benchmarks.rb --chruby 'master;mstr+p::master --parser=prism' lobsters
<snip>
master: ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) [arm64-darwin23]
mstr+p: ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) +PRISM [arm64-darwin23]

--------  -----------  ----------  -----------  ----------  --------------  -------------
bench     master (ms)  stddev (%)  mstr+p (ms)  stddev (%)  mstr+p 1st itr  master/mstr+p
lobsters  567.9        0.7         557.0        0.8         1.00            1.02         
--------  -----------  ----------  -----------  ----------  --------------  -------------
Legend:
- mstr+p 1st itr: ratio of master/mstr+p time for the first benchmarking iteration.
- master/mstr+p: ratio of master/mstr+p time. Higher is better for mstr+p. Above 1 represents a speedup.
```

In hindsight, it's clear that the parser's allocation pattern can have a significant impact on heap layout, since a typical app boot with Kernel#require alternates between running app code and running the parser. Parser allocations essentially end up as gaps between app objects.
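
The gap effect can be illustrated with a toy model. This is only a sketch and not how CRuby's GC actually lays out its heap: the flat slot list, `PAGE_SLOTS`, and `pages_holding_app_objects` are all hypothetical names and numbers chosen for readability. It counts how many "pages" still hold at least one long-lived app object when each loaded file first allocates some short-lived parser objects.

```ruby
# Toy model only: the heap is a flat list of slots cut into fixed-size pages.
PAGE_SLOTS = 8 # hypothetical page size, kept small for readability

# Each "file" allocates parser_objs short-lived parser objects, then
# app_objs long-lived app objects, mimicking require alternating
# between the parser and app code.
def pages_holding_app_objects(files:, parser_objs:, app_objs:)
  heap = []
  files.times do
    parser_objs.times { heap << :parser }
    app_objs.times { heap << :app }
  end
  # After boot the parser objects are garbage; count the pages that
  # still contain at least one app object.
  heap.each_slice(PAGE_SLOTS).count { |page| page.include?(:app) }
end

lean  = pages_holding_app_objects(files: 100, parser_objs: 2, app_objs: 2)
heavy = pages_holding_app_objects(files: 100, parser_objs: 6, app_objs: 2)
puts "pages holding app objects: lean=#{lean} heavy=#{heavy}"
```

In this model the same 200 app objects end up spread over more pages when the parser allocates more per file, which is the kind of layout that would plausibly lead to more cache misses later.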




-- 
https://bugs.ruby-lang.org/

    
