New subject: [ruby-core:112435] [Ruby master Bug#19438] Ruby 2.7 -> 3.2 Performance Regression in CSV

15 Feb 2023

Issue #19438 has been reported by nick.schwaderer (Nicholas Schwaderer).

----------------------------------------
Bug #19438: Ruby 2.7 -> 3.2 Performance Regression
https://bugs.ruby-lang.org/issues/19438

* Author: nick.schwaderer (Nicholas Schwaderer)
* Status: Open
* Priority: Normal
* ruby -v: 3.2.0
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
## Introduction

Recently I had been going through some of the old benchmarks in the [Ruby Great
Implementation Shootout](https://programmingzen.com/the-great-ruby-shootout-july-2010/)
from around 2010. 

As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit,
TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10.

Most results were as expected. However, there was one benchmark on which Ruby 2.6.10
_consistently_ outperformed all of the newer Rubies.

## Method

After pairing with @eightbitraptor, we discovered that this old benchmark was remarkably
similar to an existing benchmark in the `/benchmark` directory,
[so_k_nucleotide.yml](https://github.com/ruby/ruby/blob/master/benchmark/so_….
We decided to go with that benchmark. For brevity I have not included the full 150 lines
of the benchmark here.

I ran this benchmark 100 times using `benchmark-driver` against Ruby 2.7, 3.0, 3.1,
and 3.2. (I had discovered that 2.7 was even faster than 2.6.)

It appears that about half of the regression occurred from 2.7 -> 3.0 and the other half
from 3.0 -> 3.2. Another interesting finding is that each minor version appears to
regress from the previous one, even if only slightly.

## Code

This is my benchmark-running code and harness. [The full code and data can be found
here](https://gist.github.com/Schwad/16edf3d7cc5316af4baf23497f3c6a8f).

```ruby
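require 'csv'

RUNS = 100

# Collect one measurement per Ruby version for each run,
# keyed by the version string found in the benchmark-driver output.
results = Hash.new { |h, k| h[k] = [] }
RUNS.times do |i|
  puts i
  # Keep the command on one line: a newline inside backticks would make the
  # shell treat `-o simple` as a separate command.
  run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple`
  # Pair the Nth version string in the output with the Nth x.xxx measurement.
  measurements = run.scan(/\d\.\d\d\d/)
  run.scan(/\d\.\d\.\d/).each_with_index do |version, index|
    results[version] << measurements[index]
  end
end

# Write one column per Ruby version and one row per run.
columns = results.keys
outdata = CSV.generate do |csv|
  csv << columns
  RUNS.times do |i|
    csv << columns.map { |c| results[c][i] }
  end
end

File.write("output.csv", outdata)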
```
## Data

Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this benchmark.

![Screenshot 2023-02-15 at 13 16
10](https://user-images.githubusercontent.com/7865030/219038430-4a124cc6-0d…
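
For anyone who wants to reproduce the comparison from `output.csv`, the following is a minimal sketch of one way to summarise it. It assumes the layout written by the harness above (a header row of Ruby versions, then one row per run). The helper names `column_means` and `percent_vs` are hypothetical and are not part of the linked gist. The sketch only reports the relative difference between means; whether a larger number means faster or slower depends on the metric `benchmark-driver` reported.

```ruby
require 'csv'

# Mean of each column in a CSV whose header row holds Ruby versions
# (assumed to match the layout of output.csv from the harness).
def column_means(csv_text)
  table = CSV.parse(csv_text, headers: true)
  table.headers.to_h do |version|
    # Skip empty cells, which appear if a run produced no measurement.
    values = table[version].compact.map { |v| Float(v) }
    [version, values.sum(0.0) / values.size]
  end
end

# Relative difference, in percent, of each version's mean against a baseline.
def percent_vs(means, baseline)
  base = means.fetch(baseline)
  means.transform_values { |m| ((m - base) / base * 100).round(1) }
end

# Only runs when the harness output is present in the working directory.
if File.exist?("output.csv")
  means = column_means(File.read("output.csv"))
  percent_vs(means, "2.7.5").each { |version, pct| puts "#{version}: #{pct}%" }
end
```

Comparing means is the simplest choice here; with 100 runs per version, medians or a per-run paired comparison may be more robust to outliers.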

## Next Steps

I am happy to help investigate or learn more about this regression if anyone has any
ideas. 

-- 
https://bugs.ruby-lang.org/
