[ruby-core:121451] [Ruby Bug#21201] Performance regression when defining methods inside `refine` blocks

Issue #21201 has been reported by alpaca-tc (Hiroyuki Ishii). ---------------------------------------- Bug #21201: Performance regression when defining methods inside `refine` blocks https://bugs.ruby-lang.org/issues/21201 * Author: alpaca-tc (Hiroyuki Ishii) * Status: Open * ruby -v: 3.3.7, 3.4.2, 3.5.0 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- After the change introduced in [PR #10037](https://github.com/ruby/ruby/pull/10037), a significant performance regression has been observed when defining methods within refine blocks. The PR correctly fixes a bug where `callcache` invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes [`rb_clear_all_refinement_method_cache()`](https://github.com/ruby/ruby/blob/752a1d785475d1950b33c7d77b289a6a3a753f02/v...) at the time of method definition within `refine`. However, this function performs a full object space traversal via `rb_objspace_each_objects`, making it extremely expensive. In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), `rb_clear_all_refinement_method_cache()` is called 1425 times, 1142 of which originate via `rb_clear_method_cache()`. As a result, the code reload time has increased from **5 seconds** to **15 seconds** after the patch. Since reloading occurs every time the code changes, this degrades development experience severely. ### Reproduction script: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips" end mod = Module.new klass = Class.new Benchmark.ips do |x| x.report(RUBY_VERSION) do mod.send(:refine, klass) do def call_1 = nil def call_2 = nil def call_3 = nil end end x.save! "/tmp/performance_regression_refine.bench" x.compare! end ``` ### Benchmark results: ``` Comparison: 3.2.7: 1500316.7 i/s 3.3.7: 158.0 i/s - 9496.46x slower 3.5.0: 124.6 i/s - 12041.43x slower 3.4.2: 119.5 i/s - 12553.50x slower ``` Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539 I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently. -- https://bugs.ruby-lang.org/

Issue #21201 has been updated by byroot (Jean Boussier).
I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.
The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal. But yes, this performance issue is why at work we disallowed usage of refinement using rubocop. ---------------------------------------- Bug #21201: Performance regression when defining methods inside `refine` blocks https://bugs.ruby-lang.org/issues/21201#change-112451 * Author: alpaca-tc (Hiroyuki Ishii) * Status: Open * ruby -v: 3.3.7, 3.4.2, 3.5.0 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- After the change introduced in [PR #10037](https://github.com/ruby/ruby/pull/10037), a significant performance regression has been observed when defining methods within refine blocks. The PR correctly fixes a bug where `callcache` invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes [`rb_clear_all_refinement_method_cache()`](https://github.com/ruby/ruby/blob/752a1d785475d1950b33c7d77b289a6a3a753f02/v...) at the time of method definition within `refine`. However, this function performs a full object space traversal via `rb_objspace_each_objects`, making it extremely expensive. In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), `rb_clear_all_refinement_method_cache()` is called 1425 times, 1142 of which originate via `rb_clear_method_cache()`. As a result, the code reload time has increased from **5 seconds** to **15 seconds** after the patch. Since reloading occurs every time the code changes, this degrades development experience severely. ### Reproduction script: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips" end mod = Module.new klass = Class.new Benchmark.ips do |x| x.report(RUBY_VERSION) do mod.send(:refine, klass) do def call_1 = nil def call_2 = nil def call_3 = nil end end x.save! "/tmp/performance_regression_refine.bench" x.compare! end ``` ### Benchmark results: ``` Comparison: 3.2.7: 1500316.7 i/s 3.3.7: 158.0 i/s - 9496.46x slower 3.5.0: 124.6 i/s - 12041.43x slower 3.4.2: 119.5 i/s - 12553.50x slower ``` Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539 I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently. -- https://bugs.ruby-lang.org/

Issue #21201 has been updated by tenderlovemaking (Aaron Patterson). byroot (Jean Boussier) wrote in #note-1:
I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.
The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.
But yes, this performance issue is why at work we disallowed usage of refinement using rubocop.
I guess if refined methods are rare enough, maybe it's not worthwhile to allocate "refinement inline cache"? In other words, refined methods are always an IC miss? ---------------------------------------- Bug #21201: Performance regression when defining methods inside `refine` blocks https://bugs.ruby-lang.org/issues/21201#change-112453 * Author: alpaca-tc (Hiroyuki Ishii) * Status: Open * ruby -v: 3.3.7, 3.4.2, 3.5.0 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- After the change introduced in [PR #10037](https://github.com/ruby/ruby/pull/10037), a significant performance regression has been observed when defining methods within refine blocks. The PR correctly fixes a bug where `callcache` invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes [`rb_clear_all_refinement_method_cache()`](https://github.com/ruby/ruby/blob/752a1d785475d1950b33c7d77b289a6a3a753f02/v...) at the time of method definition within `refine`. However, this function performs a full object space traversal via `rb_objspace_each_objects`, making it extremely expensive. In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), `rb_clear_all_refinement_method_cache()` is called 1425 times, 1142 of which originate via `rb_clear_method_cache()`. As a result, the code reload time has increased from **5 seconds** to **15 seconds** after the patch. Since reloading occurs every time the code changes, this degrades development experience severely. ### Reproduction script: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips" end mod = Module.new klass = Class.new Benchmark.ips do |x| x.report(RUBY_VERSION) do mod.send(:refine, klass) do def call_1 = nil def call_2 = nil def call_3 = nil end end x.save! "/tmp/performance_regression_refine.bench" x.compare! end ``` ### Benchmark results: ``` Comparison: 3.2.7: 1500316.7 i/s 3.3.7: 158.0 i/s - 9496.46x slower 3.5.0: 124.6 i/s - 12041.43x slower 3.4.2: 119.5 i/s - 12553.50x slower ``` Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539 I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently. -- https://bugs.ruby-lang.org/

Issue #21201 has been updated by byroot (Jean Boussier).
refined methods are always an IC miss?
I like that idea. ---------------------------------------- Bug #21201: Performance regression when defining methods inside `refine` blocks https://bugs.ruby-lang.org/issues/21201#change-112454 * Author: alpaca-tc (Hiroyuki Ishii) * Status: Open * ruby -v: 3.3.7, 3.4.2, 3.5.0 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- After the change introduced in [PR #10037](https://github.com/ruby/ruby/pull/10037), a significant performance regression has been observed when defining methods within refine blocks. The PR correctly fixes a bug where `callcache` invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes [`rb_clear_all_refinement_method_cache()`](https://github.com/ruby/ruby/blob/752a1d785475d1950b33c7d77b289a6a3a753f02/v...) at the time of method definition within `refine`. However, this function performs a full object space traversal via `rb_objspace_each_objects`, making it extremely expensive. In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), `rb_clear_all_refinement_method_cache()` is called 1425 times, 1142 of which originate via `rb_clear_method_cache()`. As a result, the code reload time has increased from **5 seconds** to **15 seconds** after the patch. Since reloading occurs every time the code changes, this degrades development experience severely. ### Reproduction script: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips" end mod = Module.new klass = Class.new Benchmark.ips do |x| x.report(RUBY_VERSION) do mod.send(:refine, klass) do def call_1 = nil def call_2 = nil def call_3 = nil end end x.save! "/tmp/performance_regression_refine.bench" x.compare! end ``` ### Benchmark results: ``` Comparison: 3.2.7: 1500316.7 i/s 3.3.7: 158.0 i/s - 9496.46x slower 3.5.0: 124.6 i/s - 12041.43x slower 3.4.2: 119.5 i/s - 12553.50x slower ``` Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539 I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently. -- https://bugs.ruby-lang.org/

Issue #21201 has been updated by byroot (Jean Boussier). The downside however is that I think we'd need to lock the VM to lookup methods? So it doesn't help with making Ractor lock less. ---------------------------------------- Bug #21201: Performance regression when defining methods inside `refine` blocks https://bugs.ruby-lang.org/issues/21201#change-112456 * Author: alpaca-tc (Hiroyuki Ishii) * Status: Open * ruby -v: 3.3.7, 3.4.2, 3.5.0 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- After the change introduced in [PR #10037](https://github.com/ruby/ruby/pull/10037), a significant performance regression has been observed when defining methods within refine blocks. The PR correctly fixes a bug where `callcache` invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes [`rb_clear_all_refinement_method_cache()`](https://github.com/ruby/ruby/blob/752a1d785475d1950b33c7d77b289a6a3a753f02/v...) at the time of method definition within `refine`. However, this function performs a full object space traversal via `rb_objspace_each_objects`, making it extremely expensive. In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), `rb_clear_all_refinement_method_cache()` is called 1425 times, 1142 of which originate via `rb_clear_method_cache()`. As a result, the code reload time has increased from **5 seconds** to **15 seconds** after the patch. Since reloading occurs every time the code changes, this degrades development experience severely. ### Reproduction script: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips" end mod = Module.new klass = Class.new Benchmark.ips do |x| x.report(RUBY_VERSION) do mod.send(:refine, klass) do def call_1 = nil def call_2 = nil def call_3 = nil end end x.save! "/tmp/performance_regression_refine.bench" x.compare! end ``` ### Benchmark results: ``` Comparison: 3.2.7: 1500316.7 i/s 3.3.7: 158.0 i/s - 9496.46x slower 3.5.0: 124.6 i/s - 12041.43x slower 3.4.2: 119.5 i/s - 12553.50x slower ``` Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539 I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently. -- https://bugs.ruby-lang.org/

Issue #21201 has been updated by jhawthorn (John Hawthorn). tenderlovemaking (Aaron Patterson) wrote in #note-2:
I guess if refined methods are rare enough, maybe it's not worthwhile to allocate "refinement inline cache"? In other words, refined methods are always an IC miss?
I believe that was the case before Ruby 3.3 ---------------------------------------- Bug #21201: Performance regression when defining methods inside `refine` blocks https://bugs.ruby-lang.org/issues/21201#change-112457 * Author: alpaca-tc (Hiroyuki Ishii) * Status: Open * ruby -v: 3.3.7, 3.4.2, 3.5.0 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- After the change introduced in [PR #10037](https://github.com/ruby/ruby/pull/10037), a significant performance regression has been observed when defining methods within refine blocks. The PR correctly fixes a bug where `callcache` invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes [`rb_clear_all_refinement_method_cache()`](https://github.com/ruby/ruby/blob/752a1d785475d1950b33c7d77b289a6a3a753f02/v...) at the time of method definition within `refine`. However, this function performs a full object space traversal via `rb_objspace_each_objects`, making it extremely expensive. In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), `rb_clear_all_refinement_method_cache()` is called 1425 times, 1142 of which originate via `rb_clear_method_cache()`. As a result, the code reload time has increased from **5 seconds** to **15 seconds** after the patch. Since reloading occurs every time the code changes, this degrades development experience severely. ### Reproduction script: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips" end mod = Module.new klass = Class.new Benchmark.ips do |x| x.report(RUBY_VERSION) do mod.send(:refine, klass) do def call_1 = nil def call_2 = nil def call_3 = nil end end x.save! "/tmp/performance_regression_refine.bench" x.compare! end ``` ### Benchmark results: ``` Comparison: 3.2.7: 1500316.7 i/s 3.3.7: 158.0 i/s - 9496.46x slower 3.5.0: 124.6 i/s - 12041.43x slower 3.4.2: 119.5 i/s - 12553.50x slower ``` Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539 I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently. -- https://bugs.ruby-lang.org/

Issue #21201 has been updated by alpaca-tc (Hiroyuki Ishii). byroot (Jean Boussier) wrote in #note-1:
The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.
I created a pull request based on this approach. https://github.com/ruby/ruby/pull/13077 Indeed, the performance improvement is remarkable — reloading our Rails locally now completes in under 3 seconds. Many thanks to everyone in this thread for your helpful insights and suggestions. ---------------------------------------- Bug #21201: Performance regression when defining methods inside `refine` blocks https://bugs.ruby-lang.org/issues/21201#change-112579 * Author: alpaca-tc (Hiroyuki Ishii) * Status: Open * ruby -v: 3.3.7, 3.4.2, 3.5.0 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- After the change introduced in [PR #10037](https://github.com/ruby/ruby/pull/10037), a significant performance regression has been observed when defining methods within refine blocks. The PR correctly fixes a bug where `callcache` invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes [`rb_clear_all_refinement_method_cache()`](https://github.com/ruby/ruby/blob/752a1d785475d1950b33c7d77b289a6a3a753f02/v...) at the time of method definition within `refine`. However, this function performs a full object space traversal via `rb_objspace_each_objects`, making it extremely expensive. In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), `rb_clear_all_refinement_method_cache()` is called 1425 times, 1142 of which originate via `rb_clear_method_cache()`. As a result, the code reload time has increased from **5 seconds** to **15 seconds** after the patch. Since reloading occurs every time the code changes, this degrades development experience severely. ### Reproduction script: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips" end mod = Module.new klass = Class.new Benchmark.ips do |x| x.report(RUBY_VERSION) do mod.send(:refine, klass) do def call_1 = nil def call_2 = nil def call_3 = nil end end x.save! "/tmp/performance_regression_refine.bench" x.compare! end ``` ### Benchmark results: ``` Comparison: 3.2.7: 1500316.7 i/s 3.3.7: 158.0 i/s - 9496.46x slower 3.5.0: 124.6 i/s - 12041.43x slower 3.4.2: 119.5 i/s - 12553.50x slower ``` Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539 I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently. -- https://bugs.ruby-lang.org/

Issue #21201 has been updated by byroot (Jean Boussier). Your patch look really good. I wonder if it would be possible to do like the `vm->constant_cache` table, have the key be the method name, so that you wouldn't need to clear absolutely everything. But your patch as-is is already a nice improvement I think, the only downside being the one extra lock point for ractors. ---------------------------------------- Bug #21201: Performance regression when defining methods inside `refine` blocks https://bugs.ruby-lang.org/issues/21201#change-112581 * Author: alpaca-tc (Hiroyuki Ishii) * Status: Open * ruby -v: 3.3.7, 3.4.2, 3.5.0 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- After the change introduced in [PR #10037](https://github.com/ruby/ruby/pull/10037), a significant performance regression has been observed when defining methods within refine blocks. The PR correctly fixes a bug where `callcache` invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes [`rb_clear_all_refinement_method_cache()`](https://github.com/ruby/ruby/blob/752a1d785475d1950b33c7d77b289a6a3a753f02/v...) at the time of method definition within `refine`. However, this function performs a full object space traversal via `rb_objspace_each_objects`, making it extremely expensive. In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), `rb_clear_all_refinement_method_cache()` is called 1425 times, 1142 of which originate via `rb_clear_method_cache()`. As a result, the code reload time has increased from **5 seconds** to **15 seconds** after the patch. Since reloading occurs every time the code changes, this degrades development experience severely. ### Reproduction script: ```ruby require "bundler/inline" gemfile(true) do source "https://rubygems.org" gem "benchmark-ips" end mod = Module.new klass = Class.new Benchmark.ips do |x| x.report(RUBY_VERSION) do mod.send(:refine, klass) do def call_1 = nil def call_2 = nil def call_3 = nil end end x.save! "/tmp/performance_regression_refine.bench" x.compare! end ``` ### Benchmark results: ``` Comparison: 3.2.7: 1500316.7 i/s 3.3.7: 158.0 i/s - 9496.46x slower 3.5.0: 124.6 i/s - 12041.43x slower 3.4.2: 119.5 i/s - 12553.50x slower ``` Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539 I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently. -- https://bugs.ruby-lang.org/
participants (4)
-
alpaca-tc (Hiroyuki Ishii)
-
byroot (Jean Boussier)
-
jhawthorn (John Hawthorn)
-
tenderlovemaking (Aaron Patterson)