[ruby-core:112431] [Ruby master Bug#19438] Ruby 2.7 -> 3.2 Performance Regression

Issue #19438 has been reported by nick.schwaderer (Nicholas Schwaderer). ---------------------------------------- Bug #19438: Ruby 2.7 -> 3.2 Performance Regression https://bugs.ruby-lang.org/issues/19438 * Author: nick.schwaderer (Nicholas Schwaderer) * Status: Open * Priority: Normal * ruby -v: 3.2.0 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- ## Introduction Recently I had been going through some of the old benchmarks in the [Ruby Great Implementation Shootout](https://programmingzen.com/the-great-ruby-shootout-july-2010/) from around 2010. As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit, TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10. Most results were as expected. However there was a benchmark that Ruby 2.6.10 was _consistently_ outperforming all new Rubies on. ## Method After pairing with @eightbitraptor, we discovered that this old benchmark was remarkably similar to an existing benchmark in the `/benchmark` directory, [so_k_nucleotide.yml](https://github.com/ruby/ruby/blob/master/benchmark/so_k_nucleotide.yml). We decided to go with that benchmark. For brevity I have not included the full 150 lines of the benchmark here. I tested this benchmark out with 100 runs using `benchmark-driver` against Ruby 2.7,3.0,3.1,3.2. (I had discovered that 2.7 was even faster than 2.6.). It appears that about half of the regression occured from 2.7 -> 3.0; the other half from 3.0 -> 3.2. One other interesting finding is that each minor version does appear to regress from the last, even if a little. ## Code This is my benchmark running code and harnass. [The full code and data can be found here](https://gist.github.com/Schwad/16edf3d7cc5316af4baf23497f3c6a8f) ```ruby RUNS = 100 results = Hash.new { |h, k| h[k] = [] } RUNS.times do |i| puts i run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple` run.scan(/\d\.\d\.\d/).each_with_index do |version, index| results[version] << run.scan(/\d\.\d\d\d/)[index] end end require 'csv' columns = results.keys outdata = CSV.generate do |csv| csv << columns RUNS.times do |i| csv << columns.map { |c| results[c][i] } end end File.write("output.csv", outdata) ``` ## Data Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this Benchmark  ## Next Steps I am happy to help investigate or learn more about this regression if anyone has any ideas. -- https://bugs.ruby-lang.org/

Issue #19438 has been updated by byroot (Jean Boussier). Subject changed from Ruby 2.7 -> 3.2 Performance Regression to Ruby 2.7 -> 3.2 Performance Regression in CSV CSV being a gem, it's very likely that the regression isn't in ruby itself, but in the successive versions of the csv gem being bundled with Ruby: ``` ruby 3.2 => csv 3.2.6 ruby 3.1 => csv 3.2.2 ruby 3.0 => csv 3.1.9 ruby 2.7 => csv 3.1.2 ruby 2.6 => csv 3.0.9 ``` So if you wish to investigate this further, the best start would be to benchmark `csv 3.0.9` on Ruby 3.2, and `csv 3.2.6` on Ruby 2.7 or 2.6. I'm not very familiar with the `CSV` gem, but CSV as a format is losely defined, so I really wouldn't be surprised if over the years various fixes were added to handle corner cases and that it would impact performance. ---------------------------------------- Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in CSV https://bugs.ruby-lang.org/issues/19438#change-101880 * Author: nick.schwaderer (Nicholas Schwaderer) * Status: Open * Priority: Normal * ruby -v: 3.2.0 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- ## Introduction Recently I had been going through some of the old benchmarks in the [Ruby Great Implementation Shootout](https://programmingzen.com/the-great-ruby-shootout-july-2010/) from around 2010. As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit, TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10. Most results were as expected. However there was a benchmark that Ruby 2.6.10 was _consistently_ outperforming all new Rubies on. ## Method After pairing with @eightbitraptor, we discovered that this old benchmark was remarkably similar to an existing benchmark in the `/benchmark` directory, [so_k_nucleotide.yml](https://github.com/ruby/ruby/blob/master/benchmark/so_k_nucleotide.yml). We decided to go with that benchmark. For brevity I have not included the full 150 lines of the benchmark here. I tested this benchmark out with 100 runs using `benchmark-driver` against Ruby 2.7,3.0,3.1,3.2. (I had discovered that 2.7 was even faster than 2.6.). It appears that about half of the regression occured from 2.7 -> 3.0; the other half from 3.0 -> 3.2. One other interesting finding is that each minor version does appear to regress from the last, even if a little. ## Code This is my benchmark running code and harnass. [The full code and data can be found here](https://gist.github.com/Schwad/16edf3d7cc5316af4baf23497f3c6a8f) ```ruby RUNS = 100 results = Hash.new { |h, k| h[k] = [] } RUNS.times do |i| puts i run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple` run.scan(/\d\.\d\.\d/).each_with_index do |version, index| results[version] << run.scan(/\d\.\d\d\d/)[index] end end require 'csv' columns = results.keys outdata = CSV.generate do |csv| csv << columns RUNS.times do |i| csv << columns.map { |c| results[c][i] } end end File.write("output.csv", outdata) ``` ## Data Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this Benchmark  ## Next Steps I am happy to help investigate or learn more about this regression if anyone has any ideas. -- https://bugs.ruby-lang.org/

Issue #19438 has been updated by byroot (Jean Boussier). Subject changed from Ruby 2.7 -> 3.2 Performance Regression in CSV to Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark Oh, I'm sorry, I misread your benchmark and thought it was a CSV based benchmark, I now see it's radically different, please disregard my previous answer. ---------------------------------------- Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark https://bugs.ruby-lang.org/issues/19438#change-101881 * Author: nick.schwaderer (Nicholas Schwaderer) * Status: Open * Priority: Normal * ruby -v: 3.2.0 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- ## Introduction Recently I had been going through some of the old benchmarks in the [Ruby Great Implementation Shootout](https://programmingzen.com/the-great-ruby-shootout-july-2010/) from around 2010. As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit, TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10. Most results were as expected. However there was a benchmark that Ruby 2.6.10 was _consistently_ outperforming all new Rubies on. ## Method After pairing with @eightbitraptor, we discovered that this old benchmark was remarkably similar to an existing benchmark in the `/benchmark` directory, [so_k_nucleotide.yml](https://github.com/ruby/ruby/blob/master/benchmark/so_k_nucleotide.yml). We decided to go with that benchmark. For brevity I have not included the full 150 lines of the benchmark here. I tested this benchmark out with 100 runs using `benchmark-driver` against Ruby 2.7,3.0,3.1,3.2. (I had discovered that 2.7 was even faster than 2.6.). It appears that about half of the regression occured from 2.7 -> 3.0; the other half from 3.0 -> 3.2. One other interesting finding is that each minor version does appear to regress from the last, even if a little. ## Code This is my benchmark running code and harnass. [The full code and data can be found here](https://gist.github.com/Schwad/16edf3d7cc5316af4baf23497f3c6a8f) ```ruby RUNS = 100 results = Hash.new { |h, k| h[k] = [] } RUNS.times do |i| puts i run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple` run.scan(/\d\.\d\.\d/).each_with_index do |version, index| results[version] << run.scan(/\d\.\d\d\d/)[index] end end require 'csv' columns = results.keys outdata = CSV.generate do |csv| csv << columns RUNS.times do |i| csv << columns.map { |c| results[c][i] } end end File.write("output.csv", outdata) ``` ## Data Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this Benchmark  ## Next Steps I am happy to help investigate or learn more about this regression if anyone has any ideas. -- https://bugs.ruby-lang.org/

Issue #19438 has been updated by nickborromeo (Nick Borromeo). I took a look at the benchmark, and I was wondering what exactly it was trying to benchmark. It looks like its benchmarking the ability to sum up the word count of a given substring. If you agree that that is what is being benchmarked, one thing we can do is change the benchmark to use `tally` instead of manually incrementing the count. I ran the benchmark with this change and got the following result. ``` ❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5 Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 1.671 1.728 1.893 2.045 i/s - 1.000 times in 0.598269s 0.578549s 0.528261s 0.489104s Comparison: so_k_nucleotide 3.2.1: 2.0 i/s 3.1.2: 1.9 i/s - 1.08x slower 3.0.5: 1.7 i/s - 1.18x slower 2.7.5: 1.7 i/s - 1.22x slower ``` The method to count the frequency looks something like this ``` def word_tally(seq, length) words = [] n = seq.length - length + 1 (0 ... length).each do |f| (f ... n).step(length) do |i| words << seq[i,length] end end table = words.tally [n,table] end ``` In some of the iterations that I played around with, the slow down seems to be coming from the incrementing of the value of a particular key. * taking the substring * accessing the value of that key from the Hash ---------------------------------------- Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark https://bugs.ruby-lang.org/issues/19438#change-102139 * Author: nick.schwaderer (Nicholas Schwaderer) * Status: Open * Priority: Normal * ruby -v: 3.2.0 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- ## Introduction Recently I had been going through some of the old benchmarks in the [Ruby Great Implementation Shootout](https://programmingzen.com/the-great-ruby-shootout-july-2010/) from around 2010. As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit, TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10. Most results were as expected. However there was a benchmark that Ruby 2.6.10 was _consistently_ outperforming all new Rubies on. ## Method After pairing with @eightbitraptor, we discovered that this old benchmark was remarkably similar to an existing benchmark in the `/benchmark` directory, [so_k_nucleotide.yml](https://github.com/ruby/ruby/blob/master/benchmark/so_k_nucleotide.yml). We decided to go with that benchmark. For brevity I have not included the full 150 lines of the benchmark here. I tested this benchmark out with 100 runs using `benchmark-driver` against Ruby 2.7,3.0,3.1,3.2. (I had discovered that 2.7 was even faster than 2.6.). It appears that about half of the regression occured from 2.7 -> 3.0; the other half from 3.0 -> 3.2. One other interesting finding is that each minor version does appear to regress from the last, even if a little. ## Code This is my benchmark running code and harnass. [The full code and data can be found here](https://gist.github.com/Schwad/16edf3d7cc5316af4baf23497f3c6a8f) ```ruby RUNS = 100 results = Hash.new { |h, k| h[k] = [] } RUNS.times do |i| puts i run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple` run.scan(/\d\.\d\.\d/).each_with_index do |version, index| results[version] << run.scan(/\d\.\d\d\d/)[index] end end require 'csv' columns = results.keys outdata = CSV.generate do |csv| csv << columns RUNS.times do |i| csv << columns.map { |c| results[c][i] } end end File.write("output.csv", outdata) ``` ## Data Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this Benchmark  ## Next Steps I am happy to help investigate or learn more about this regression if anyone has any ideas. -- https://bugs.ruby-lang.org/

Issue #19438 has been updated by byroot (Jean Boussier).
If you agree that that is what is being benchmarked, one thing we can do is change the benchmark to use tally instead of manually incrementing the count.
So this is a benchmark that was popularized by the "benchmark-game", if the goal is to look better on that benchmark (I personally don't care for it) then yes that is a solution. https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/knuc... However I think the interesting part of this issue, would be to figure out why that specific implementation got slower over time. It would be interesting to break down the algorithm in smaller pieces and benchmark them idependently to see which operation exactly got slower. ---------------------------------------- Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark https://bugs.ruby-lang.org/issues/19438#change-102143 * Author: nick.schwaderer (Nicholas Schwaderer) * Status: Open * Priority: Normal * ruby -v: 3.2.0 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- ## Introduction Recently I had been going through some of the old benchmarks in the [Ruby Great Implementation Shootout](https://programmingzen.com/the-great-ruby-shootout-july-2010/) from around 2010. As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit, TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10. Most results were as expected. However there was a benchmark that Ruby 2.6.10 was _consistently_ outperforming all new Rubies on. ## Method After pairing with @eightbitraptor, we discovered that this old benchmark was remarkably similar to an existing benchmark in the `/benchmark` directory, [so_k_nucleotide.yml](https://github.com/ruby/ruby/blob/master/benchmark/so_k_nucleotide.yml). We decided to go with that benchmark. For brevity I have not included the full 150 lines of the benchmark here. I tested this benchmark out with 100 runs using `benchmark-driver` against Ruby 2.7,3.0,3.1,3.2. (I had discovered that 2.7 was even faster than 2.6.). It appears that about half of the regression occured from 2.7 -> 3.0; the other half from 3.0 -> 3.2. One other interesting finding is that each minor version does appear to regress from the last, even if a little. ## Code This is my benchmark running code and harnass. [The full code and data can be found here](https://gist.github.com/Schwad/16edf3d7cc5316af4baf23497f3c6a8f) ```ruby RUNS = 100 results = Hash.new { |h, k| h[k] = [] } RUNS.times do |i| puts i run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple` run.scan(/\d\.\d\.\d/).each_with_index do |version, index| results[version] << run.scan(/\d\.\d\d\d/)[index] end end require 'csv' columns = results.keys outdata = CSV.generate do |csv| csv << columns RUNS.times do |i| csv << columns.map { |c| results[c][i] } end end File.write("output.csv", outdata) ``` ## Data Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this Benchmark  ## Next Steps I am happy to help investigate or learn more about this regression if anyone has any ideas. -- https://bugs.ruby-lang.org/

Issue #19438 has been updated by nickborromeo (Nick Borromeo).
However I think the interesting part of this issue, would be to figure out why that specific implementation got slower over time.
I definitely agree!
It would be interesting to break down the algorithm in smaller pieces and benchmark them idependently to see which operation exactly got slower.
before I did the change to `tally` I also broke it down to smaller steps and ran the benchmark, apologies for the wall of text to follow; a direct quote from a discussion I had with others looking at this --- I may have found something interesting while playing around with the benchmark tool. I wanted to see if we can narrow it down to a specific thing that might be slow. so I got rid of the the `sort_by_freq` and `find_seq` methods for now and focused mostly on the `frecuency` method since stackprof identified that as the _slowest_ one Just so we have a baseline here is the code and the benchmark of the `frecuency` method untouched ``` def frecuency(seq, length) frequencies = Hash.new(0) ns = seq.length + 1 - length for i in (0 ... ns) frequencies[seq[i, length]] += 1 end [ns, frequencies] end ``` ``` Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 7.530 6.524 6.056 6.294 i/s - 1.000 times in 0.132808s 0.153277s 0.165117s 0.158884s Comparison: so_k_nucleotide 2.7.5: 7.5 i/s 3.0.5: 6.5 i/s - 1.15x slower 3.2.1: 6.3 i/s - 1.20x slower 3.1.2: 6.1 i/s - 1.24x slower ``` Now lets look at `String#[]` as the only operation ``` def frecuency(seq, length) frequencies = Hash.new(0) ns = seq.length + 1 - length for i in (0 ... ns) seq[i, length] # frequencies[seq[i, length]] += 1 end [ns, frequencies] end ``` ``` ❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5 Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 14.000 14.225 13.797 14.096 i/s - 1.000 times in 0.071427s 0.070300s 0.072481s 0.070944s Comparison: so_k_nucleotide 3.0.5: 14.2 i/s 3.2.1: 14.1 i/s - 1.01x slower 2.7.5: 14.0 i/s - 1.02x slower 3.1.2: 13.8 i/s - 1.03x slower ``` Now lets look at `String#[]` and `Hash#[]=` ``` def frecuency(seq, length) frequencies = Hash.new(0) ns = seq.length + 1 - length for i in (0 ... ns) frequencies[seq[i, length]] end [ns, frequencies] end ``` ``` Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 12.516 12.248 12.058 13.034 i/s - 1.000 times in 0.079900s 0.081645s 0.082931s 0.076725s Comparison: so_k_nucleotide 3.2.1: 13.0 i/s 2.7.5: 12.5 i/s - 1.04x slower 3.0.5: 12.2 i/s - 1.06x slower 3.1.2: 12.1 i/s - 1.08x slower ``` However the interesting bit is when we assign things to variables or when we increment the counter for a hash value which I think is `Hash#[]=` ``` def frecuency(seq, length) frequencies = Hash.new(0) ns = seq.length + 1 - length for i in (0 ... ns) x = seq[i, length] end [ns, frequencies] end ``` ``` ❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5 Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 13.641 12.170 11.857 12.001 i/s - 1.000 times in 0.073306s 0.082168s 0.084335s 0.083325s Comparison: so_k_nucleotide 2.7.5: 13.6 i/s 3.0.5: 12.2 i/s - 1.12x slower 3.2.1: 12.0 i/s - 1.14x slower 3.1.2: 11.9 i/s - 1.15x slower ``` This is the same as the baseline in the start of the comment ``` ❯ benchmark-driver so_k_nucleotide.yml --rbenv '2.7.5;3.0.5;3.1.2;3.2.1' --repeat-count 5 Calculating ------------------------------------- 2.7.5 3.0.5 3.1.2 3.2.1 so_k_nucleotide 7.626 6.474 6.135 6.239 i/s - 1.000 times in 0.131126s 0.154468s 0.163005s 0.160283s Comparison: so_k_nucleotide 2.7.5: 7.6 i/s 3.0.5: 6.5 i/s - 1.18x slower 3.2.1: 6.2 i/s - 1.22x slower 3.1.2: 6.1 i/s - 1.24x slower ``` I think we might need to look into the assignment methods instead of the access methods for string and hash 🤔 ---------------------------------------- Bug #19438: Ruby 2.7 -> 3.2 Performance Regression in so_k_nucleotide benchmark https://bugs.ruby-lang.org/issues/19438#change-102159 * Author: nick.schwaderer (Nicholas Schwaderer) * Status: Open * Priority: Normal * ruby -v: 3.2.0 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- ## Introduction Recently I had been going through some of the old benchmarks in the [Ruby Great Implementation Shootout](https://programmingzen.com/the-great-ruby-shootout-july-2010/) from around 2010. As an experiment, one night I ran the benchmarks against Ruby 3.2.0, Ruby 3.2.0 --yjit, TruffleRuby, TruffleRuby +GraalVM, and Ruby 2.6.10. Most results were as expected. However there was a benchmark that Ruby 2.6.10 was _consistently_ outperforming all new Rubies on. ## Method After pairing with @eightbitraptor, we discovered that this old benchmark was remarkably similar to an existing benchmark in the `/benchmark` directory, [so_k_nucleotide.yml](https://github.com/ruby/ruby/blob/master/benchmark/so_k_nucleotide.yml). We decided to go with that benchmark. For brevity I have not included the full 150 lines of the benchmark here. I tested this benchmark out with 100 runs using `benchmark-driver` against Ruby 2.7,3.0,3.1,3.2. (I had discovered that 2.7 was even faster than 2.6.). It appears that about half of the regression occured from 2.7 -> 3.0; the other half from 3.0 -> 3.2. One other interesting finding is that each minor version does appear to regress from the last, even if a little. ## Code This is my benchmark running code and harnass. [The full code and data can be found here](https://gist.github.com/Schwad/16edf3d7cc5316af4baf23497f3c6a8f) ```ruby RUNS = 100 results = Hash.new { |h, k| h[k] = [] } RUNS.times do |i| puts i run = `benchmark-driver so_k_nucleotide.yml --chruby '2.7.5;3.0.5;3.1.3;3.2.0' -o simple` run.scan(/\d\.\d\.\d/).each_with_index do |version, index| results[version] << run.scan(/\d\.\d\d\d/)[index] end end require 'csv' columns = results.keys outdata = CSV.generate do |csv| csv << columns RUNS.times do |i| csv << columns.map { |c| results[c][i] } end end File.write("output.csv", outdata) ``` ## Data Ruby 2.7.5 was consistently ~18-20% faster than Ruby 3.2.0 in this Benchmark  ## Next Steps I am happy to help investigate or learn more about this regression if anyone has any ideas. -- https://bugs.ruby-lang.org/
participants (4)
-
byroot (Jean Boussier)
-
byroot (Jean Boussier)
-
nick.schwaderer (Nicholas Schwaderer)
-
nickborromeo (Nick Borromeo)