[ruby-core:124552] [Ruby Bug#21838] Rails seeding degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024)
Issue #21838 has been reported by bluz71 (Dennis B). ---------------------------------------- Bug #21838: Rails seeding degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024) https://bugs.ruby-lang.org/issues/21838 * Author: bluz71 (Dennis B) * Status: Open * ruby -v: ruby 3.4.0dev (2024-09-09T14:15:21Z v3_4_0_preview2~545 079ef92b5e) * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hello, I noticed a performance degradation with Ruby 3.4 when seeding my Rails application, which does image derivation via the Shrine Gem and `libvips` image library. I skipped Ruby 3.4 hoping that 4.0 would magically solve my performance issue. Alas not. My degradation when doing `rails db:reset` (aka database seeding) was about 20% as compared with Ruby 3.3. The last couple of days I looked at this issue more closely and with the help of `git bisect` I believe this commit: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079... is the cause of my regression. That commit implements global allocatable slots and empty pages. The revision before is fast when seeding my Rails application. This revision however slows down my Rails seeding by about 20%, from 30 seconds to 40 seconds consistently. I created a minimal Rails application as the reproducible test case: https://github.com/bluz71/rails_app_ruby_slowdown Steps to reproduce: 1. `git clone https://github.com/bluz71/rails_app_ruby_slowdown` 2. `cd rails_app_ruby_slowdown` 3. `bundle config set force_ruby_platform true` 4. `bundle install` 5. `bundle exec rails db:create` 6. `RUBY_YJIT_ENABLE=0 time bundle exec rails db:reset` Note, `libvips` needs to be installed at the system level, for example `sudo apt install libvips`, libvips is the image processing library used by Shrine which in turn is used in this Rails application when seeding. Use Ruby 3.3 and 3.4 to compare, or use Revision de7ac11a (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/de7...) and Revision 079ef92b (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079...) to compare. I hope this is reproducible, if so, I also hope this performance loss can be minimised in future. Best regards. -- https://bugs.ruby-lang.org/
Issue #21838 has been updated by byroot (Jean Boussier). So I can somewhat repro. On my machine (macOS/M3) about 4.0 is half-way between 3.3 and 3.4. I added: ``` pp GC.stat pp GC.stat_heap ``` At the end of your script, given based on the description GC is likely to play a role in what you are seeing. And indeed: - 3.3 spends 1.6s in GC with ~2k minors but only 13 majors. - 3.4 spends 8.1s in GC with ~2k minors and ~400 majors. - 4.0 spends 3.8s in GC with ~1.9k minors and 137 majors. The time spent in GC seems to account for most of the difference between versions so I suspect the problem isn't so much the change itself, more that in consequence the default GC parameters aren't working as well with your app. But cc @peterzhu2118 just in case. 3.3.4: ``` === Image Processing Only === User CPU: 20.143620000000002s System CPU: 11.150236000000001s Real time: 28.597061999840662s ============================== === Benchmark Results === User CPU: 20.18374s System CPU: 11.297849s Real time: 28.78712600003928s ======================== {:count=>2042, :time=>1677, :marking_time=>582, :sweeping_time=>1095, :heap_allocated_pages=>426, :heap_sorted_length=>618, :heap_allocatable_pages=>192, :heap_available_slots=>370125, :heap_live_slots=>369139, :heap_free_slots=>986, :heap_final_slots=>0, :heap_marked_slots=>345806, :heap_eden_pages=>426, :heap_tomb_pages=>0, :total_allocated_pages=>426, :total_freed_pages=>0, :total_allocated_objects=>13435015, :total_freed_objects=>13065876, :malloc_increase_bytes=>2660112, :malloc_increase_bytes_limit=>16777216, :minor_gc_count=>2029, :major_gc_count=>13, :compact_count=>0, :read_barrier_faults=>0, :total_moved_objects=>0, :remembered_wb_unprotected_objects=>0, :remembered_wb_unprotected_objects_limit=>2963, :old_objects=>344695, :old_objects_limit=>592776, :oldmalloc_increase_bytes=>2660112, :oldmalloc_increase_bytes_limit=>18971452} {0=> {:slot_size=>40, :heap_allocatable_pages=>62, :heap_eden_pages=>132, :heap_eden_slots=>216136, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>132, :total_freed_pages=>0, :force_major_gc_count=>3, :force_incremental_marking_finish_count=>0, :total_allocated_objects=>4953144, :total_freed_objects=>4737427}, 1=> {:slot_size=>80, :heap_allocatable_pages=>22, :heap_eden_pages=>123, :heap_eden_slots=>100671, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>123, :total_freed_pages=>0, :force_major_gc_count=>2, :force_incremental_marking_finish_count=>0, :total_allocated_objects=>3636473, :total_freed_objects=>3536459}, 2=> {:slot_size=>160, :heap_allocatable_pages=>11, :heap_eden_pages=>118, :heap_eden_slots=>48238, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>118, :total_freed_pages=>0, :force_major_gc_count=>5, :force_incremental_marking_finish_count=>1, :total_allocated_objects=>4271349, :total_freed_objects=>4223425}, 3=> {:slot_size=>320, :heap_allocatable_pages=>31, :heap_eden_pages=>19, :heap_eden_slots=>3870, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>19, :total_freed_pages=>0, :force_major_gc_count=>0, :force_incremental_marking_finish_count=>0, :total_allocated_objects=>567972, :total_freed_objects=>564149}, 4=> {:slot_size=>640, :heap_allocatable_pages=>64, :heap_eden_pages=>36, :heap_eden_slots=>3667, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>36, :total_freed_pages=>0, :force_major_gc_count=>0, :force_incremental_marking_finish_count=>0, :total_allocated_objects=>8050, :total_freed_objects=>4416}} real 0m29.475s user 0m22.232s sys 0m13.533s ``` 3.4.6: ``` === Image Processing Only === User CPU: 28.113429999999997s System CPU: 13.080968s Real time: 38.77424399997108s ============================== === Benchmark Results === User CPU: 28.166331000000003s System CPU: 13.227464s Real time: 38.97528999997303s ======================== {count: 2428, time: 8179, marking_time: 6345, sweeping_time: 1834, heap_allocated_pages: 449, heap_empty_pages: 0, heap_allocatable_slots: 0, heap_available_slots: 390466, heap_live_slots: 329216, heap_free_slots: 61250, heap_final_slots: 0, heap_marked_slots: 240286, heap_eden_pages: 449, total_allocated_pages: 449, total_freed_pages: 0, total_allocated_objects: 15283498, total_freed_objects: 14954282, malloc_increase_bytes: 960, malloc_increase_bytes_limit: 16777216, minor_gc_count: 2031, major_gc_count: 397, compact_count: 0, read_barrier_faults: 0, total_moved_objects: 0, remembered_wb_unprotected_objects: 0, remembered_wb_unprotected_objects_limit: 2368, old_objects: 236843, old_objects_limit: 473686, oldmalloc_increase_bytes: 960, oldmalloc_increase_bytes_limit: 16777216} {0 => {slot_size: 40, heap_eden_pages: 146, heap_eden_slots: 239053, total_allocated_pages: 146, force_major_gc_count: 2, force_incremental_marking_finish_count: 1, total_allocated_objects: 5176949, total_freed_objects: 4973920}, 1 => {slot_size: 80, heap_eden_pages: 115, heap_eden_slots: 94118, total_allocated_pages: 115, force_major_gc_count: 312, force_incremental_marking_finish_count: 33, total_allocated_objects: 3689256, total_freed_objects: 3604380}, 2 => {slot_size: 160, heap_eden_pages: 114, heap_eden_slots: 46603, total_allocated_pages: 114, force_major_gc_count: 49, force_incremental_marking_finish_count: 2, total_allocated_objects: 4973174, total_freed_objects: 4936620}, 3 => {slot_size: 320, heap_eden_pages: 31, heap_eden_slots: 6314, total_allocated_pages: 31, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 1440404, total_freed_objects: 1439188}, 4 => {slot_size: 640, heap_eden_pages: 43, heap_eden_slots: 4378, total_allocated_pages: 43, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 7756, total_freed_objects: 7574}} real 0m39.772s user 0m30.376s sys 0m15.641s ``` 4.0.0: ``` === Image Processing Only === User CPU: 23.316253s System CPU: 13.033615000000001s Real time: 33.97317499993369s ============================== === Benchmark Results === User CPU: 23.362938999999997s System CPU: 13.172686s Real time: 34.16143300011754s ======================== {count: 2021, time: 3851, marking_time: 2476, sweeping_time: 1374, heap_allocated_pages: 420, heap_empty_pages: 0, heap_allocatable_slots: 92694, heap_available_slots: 363270, heap_live_slots: 358735, heap_free_slots: 4535, heap_final_slots: 0, heap_marked_slots: 332764, heap_eden_pages: 420, total_allocated_pages: 420, total_freed_pages: 0, total_allocated_objects: 13840724, total_freed_objects: 13481989, malloc_increase_bytes: 2851188, malloc_increase_bytes_limit: 16777216, minor_gc_count: 1884, major_gc_count: 137, compact_count: 0, read_barrier_faults: 0, total_moved_objects: 0, remembered_wb_unprotected_objects: 0, remembered_wb_unprotected_objects_limit: 3324, old_objects: 332470, old_objects_limit: 664940, oldmalloc_increase_bytes: 2851188, oldmalloc_increase_bytes_limit: 16777216} {0 => {slot_size: 40, heap_live_slots: 206502, heap_free_slots: 1454, heap_final_slots: 0, heap_eden_pages: 127, heap_eden_slots: 207956, total_allocated_pages: 149, force_major_gc_count: 0, force_incremental_marking_finish_count: 2, total_allocated_objects: 4977191, total_freed_objects: 4770689}, 1 => {slot_size: 80, heap_live_slots: 100320, heap_free_slots: 347, heap_final_slots: 0, heap_eden_pages: 123, heap_eden_slots: 100667, total_allocated_pages: 1031, force_major_gc_count: 111, force_incremental_marking_finish_count: 0, total_allocated_objects: 2417433, total_freed_objects: 2317113}, 2 => {slot_size: 160, heap_live_slots: 47162, heap_free_slots: 254, heap_final_slots: 0, heap_eden_pages: 116, heap_eden_slots: 47416, total_allocated_pages: 225, force_major_gc_count: 94, force_incremental_marking_finish_count: 0, total_allocated_objects: 5001103, total_freed_objects: 4953941}, 3 => {slot_size: 320, heap_live_slots: 4862, heap_free_slots: 29, heap_final_slots: 0, heap_eden_pages: 24, heap_eden_slots: 4891, total_allocated_pages: 3399, force_major_gc_count: 127, force_incremental_marking_finish_count: 0, total_allocated_objects: 1440733, total_freed_objects: 1435871}, 4 => {slot_size: 640, heap_live_slots: 3149, heap_free_slots: 10, heap_final_slots: 0, heap_eden_pages: 31, heap_eden_slots: 3159, total_allocated_pages: 151, force_major_gc_count: 6, force_incremental_marking_finish_count: 0, total_allocated_objects: 7524, total_freed_objects: 4375}} real 0m35.071s user 0m25.508s sys 0m15.621s ``` ---------------------------------------- Bug #21838: Rails seeing degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024) https://bugs.ruby-lang.org/issues/21838#change-116129 * Author: bluz71 (Dennis B) * Status: Open * ruby -v: ruby 3.4.0dev (2024-09-09T14:15:21Z v3_4_0_preview2~545 079ef92b5e) * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hello, I noticed a performance degradation with Ruby 3.4 when seeding my Rails application, which does image derivation via the Shrine Gem and `libvips` image library. I skipped Ruby 3.4 hoping that 4.0 would magically solve my performance issue. Alas not. My degradation when doing `rails db:reset` (aka database seeding) was about 20% as compared with Ruby 3.3. The last couple of days I looked at this issue more closely and with the help of `git bisect` I believe this commit: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079... is the cause of my regression. That commit implements global allocatable slots and empty pages. The revision before is fast when seeding my Rails application. This revision however slows down my Rails seeding by about 20%, from 30 seconds to 40 seconds consistently. I created a minimal Rails application as the reproducible test case: https://github.com/bluz71/rails_app_ruby_slowdown Steps to reproduce: 1. `git clone https://github.com/bluz71/rails_app_ruby_slowdown` 2. `cd rails_app_ruby_slowdown` 3. `bundle config set force_ruby_platform true` 4. `bundle install` 5. `bundle exec rails db:create` 6. `RUBY_YJIT_ENABLE=0 time bundle exec rails db:reset` Note, `libvips` needs to be installed at the system level, for example `sudo apt install libvips`, libvips is the image processing library used by Shrine which in turn is used in this Rails application when seeding. Use Ruby 3.3 and 3.4 to compare, or use Revision de7ac11a (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/de7...) and Revision 079ef92b (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079...) to compare. I hope this is reproducible, if so, I also hope this performance loss can be minimised in future. Best regards. -- https://bugs.ruby-lang.org/
Issue #21838 has been updated by bluz71 (Dennis B). byroot (Jean Boussier) wrote in #note-2:
So I can somewhat repro. On my machine (macOS/M3) about 4.0 is half-way between 3.3 and 3.4.
Thank you byroot. I am most pleased that it is reproducible, especially 3.3 vs 3.4. Note, if I create a Rails inititializer `config/initializer/gc.rb` that does `GC.disable` and run my test case before and after Revision `079ef92b` the times do equalize to around 36-38 seconds. So indeed this regression is related purely to the GC. Thank you for spotting that. When I tested Ruby 4.0 earlier this week, it was still slower than 3.3 for my Rails seeding operation. Best regards. ---------------------------------------- Bug #21838: Rails seeing degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024) https://bugs.ruby-lang.org/issues/21838#change-116139 * Author: bluz71 (Dennis B) * Status: Open * ruby -v: ruby 3.4.0dev (2024-09-09T14:15:21Z v3_4_0_preview2~545 079ef92b5e) * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hello, I noticed a performance degradation with Ruby 3.4 when seeding my Rails application, which does image derivation via the Shrine Gem and `libvips` image library. I skipped Ruby 3.4 hoping that 4.0 would magically solve my performance issue. Alas not. My degradation when doing `rails db:reset` (aka database seeding) was about 20% as compared with Ruby 3.3. The last couple of days I looked at this issue more closely and with the help of `git bisect` I believe this commit: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079... is the cause of my regression. That commit implements global allocatable slots and empty pages. The revision before is fast when seeding my Rails application. This revision however slows down my Rails seeding by about 20%, from 30 seconds to 40 seconds consistently. I created a minimal Rails application as the reproducible test case: https://github.com/bluz71/rails_app_ruby_slowdown Steps to reproduce: 1. `git clone https://github.com/bluz71/rails_app_ruby_slowdown` 2. `cd rails_app_ruby_slowdown` 3. `bundle config set force_ruby_platform true` 4. `bundle install` 5. `bundle exec rails db:create` 6. `RUBY_YJIT_ENABLE=0 time bundle exec rails db:reset` Note, `libvips` needs to be installed at the system level, for example `sudo apt install libvips`, libvips is the image processing library used by Shrine which in turn is used in this Rails application when seeding. Use Ruby 3.3 and 3.4 to compare, or use Revision de7ac11a (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/de7...) and Revision 079ef92b (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079...) to compare. I hope this is reproducible, if so, I also hope this performance loss can be minimised in future. Best regards. -- https://bugs.ruby-lang.org/
Issue #21838 has been updated by peterzhu2118 (Peter Zhu). Thank you for reporting this issue. I debugged it and have a fix here: https://github.com/ruby/ruby/pull/15882 I distilled the reproduction down to this script: <details> <summary>Script</summary> ```ruby require "bundler/inline" gemfile do source "https://rubygems.org" gem "ruby-vips" gem "benchmark" gem "image_processing" gem "shrine" gem "sqlite3" gem "rails" end require "rails" require "active_record" require "benchmark" require "sqlite3" require "shrine/storage/file_system" ActiveRecord::Base.establish_connection( adapter: "sqlite3", database: ":memory:" ) ActiveRecord::Schema.define do create_table "albums", force: :cascade do |t| t.integer "artist_id", null: false t.text "cover_data" t.datetime "created_at", null: false t.integer "duration", default: 0, null: false t.string "title" t.integer "tracks_count", default: 0, null: false t.datetime "updated_at", null: false t.index ["artist_id"], name: "index_albums_on_artist_id" t.index ["duration"], name: "index_albums_on_duration" t.index ["title", "artist_id"], name: "index_albums_on_title_and_artist_id", unique: true end create_table "artists", force: :cascade do |t| t.integer "albums_count", default: 0, null: false t.datetime "created_at", null: false t.text "description" t.string "name" t.datetime "updated_at", null: false t.string "website" t.string "wikipedia" t.index ["description"], name: "index_artists_on_description" t.index ["name"], name: "index_artists_on_name", unique: true end add_foreign_key "albums", "artists" end Shrine.storages = { cache: Shrine::Storage::FileSystem.new("/tmp", prefix: "uploads/cache"), store: Shrine::Storage::FileSystem.new("/tmp", prefix: "uploads/store"), } Shrine.plugin :activerecord Shrine.plugin :cached_attachment_data Shrine.plugin :restore_cached_data Shrine.plugin :derivatives Shrine.plugin :validation Shrine.plugin :validation_helpers Shrine.plugin :determine_mime_type Shrine.plugin :remote_url, max_size: 1*1024*1024 class CoverUploader < Shrine Attacher.derivatives do |original| vips = ImageProcessing::Vips.source(original) { small: vips.resize_to_limit!(300, 300), medium: vips.resize_to_limit!(500, 500) } end end class ApplicationRecord < ActiveRecord::Base primary_abstract_class end class Album < ApplicationRecord include CoverUploader::Attachment(:cover) # ASSOCIATIONS belongs_to :artist, counter_cache: true, touch: true # VALIDATIONS validates :title, presence: true, uniqueness: {scope: :artist_id} end class Artist < ApplicationRecord # ASSOCIATIONS has_many :albums, dependent: :destroy, strict_loading: true # VALIDATIONS validates :name, presence: true, uniqueness: {case_sensitive: false} # Only allow http:// and https:// URL schemes. validates :website, format: { with: URI::DEFAULT_PARSER.make_regexp(%w[http https]), allow_blank: true } end result = Benchmark.measure do artist_records = 500.times.map do |i| { name: "Artist #{i}", description: "Description for artist #{i}", wikipedia: "https://en.wikipedia.org/wiki/Artist_#{i}", website: "https://artist#{i}.com" } end Artist.insert_all(artist_records) album_records = 1000.times.map do |i| { title: "Album #{i}", artist_id: (i % 500) + 1, } end Album.insert_all(album_records) Shrine.storages[:cache].clear! Shrine.storages[:store].clear! image_benchmark = Benchmark.measure do Album.find_each do |album| cover_path = "/home/peter/Downloads/Cover.jpg" # Local cover upload. album.cover = File.open(cover_path) album.cover_derivatives! # Generate derivatives via libvips. album.save end end puts "\n=== Image Processing Only ===" puts "User CPU: #{image_benchmark.utime}s" puts "System CPU: #{image_benchmark.stime}s" puts "Real time: #{image_benchmark.real}s" puts "==============================\n" Shrine.storages[:cache].clear! end puts "\n=== Benchmark Results ===" puts "User CPU: #{result.utime}s" puts "System CPU: #{result.stime}s" puts "Real time: #{result.real}s" puts "========================\n" puts GC.stat pp GC.stat_heap ``` </details> Before: ``` {count: 2017, time: 918, marking_time: 559, sweeping_time: 359, heap_allocated_pages: 333, heap_empty_pages: 0, heap_allocatable_slots: 5546, heap_available_slots: 287962, heap_live_slots: 287105, heap_free_slots: 857, heap_final_slots: 0, heap_marked_slots: 269412, heap_eden_pages: 333, total_allocated_pages: 333, total_freed_pages: 0, total_allocated_objects: 3459213, total_freed_objects: 3172108, malloc_increase_bytes: 2259851, malloc_increase_bytes_limit: 16777216, minor_gc_count: 1987, major_gc_count: 30, compact_count: 0, read_barrier_faults: 0, total_moved_objects: 0, remembered_wb_unprotected_objects: 0, remembered_wb_unprotected_objects_limit: 2684, old_objects: 269126, old_objects_limit: 536956, oldmalloc_increase_bytes: 2259851, oldmalloc_increase_bytes_limit: 16777216} {0 => {slot_size: 40, heap_live_slots: 167011, heap_free_slots: 1639, heap_final_slots: 0, heap_eden_pages: 103, heap_eden_slots: 168650, total_allocated_pages: 119, force_major_gc_count: 3, force_incremental_marking_finish_count: 2, total_allocated_objects: 2412984, total_freed_objects: 2245973}, 1 => {slot_size: 80, heap_live_slots: 79897, heap_free_slots: 299, heap_final_slots: 0, heap_eden_pages: 98, heap_eden_slots: 80196, total_allocated_pages: 322, force_major_gc_count: 22, force_incremental_marking_finish_count: 0, total_allocated_objects: 589564, total_freed_objects: 509667}, 2 => {slot_size: 160, heap_live_slots: 33658, heap_free_slots: 273, heap_final_slots: 0, heap_eden_pages: 83, heap_eden_slots: 33931, total_allocated_pages: 189, force_major_gc_count: 3, force_incremental_marking_finish_count: 0, total_allocated_objects: 436213, total_freed_objects: 402555}, 3 => {slot_size: 320, heap_live_slots: 3306, heap_free_slots: 158, heap_final_slots: 0, heap_eden_pages: 17, heap_eden_slots: 3464, total_allocated_pages: 52, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 15741, total_freed_objects: 12435}, 4 => {slot_size: 640, heap_live_slots: 3271, heap_free_slots: 88, heap_final_slots: 0, heap_eden_pages: 33, heap_eden_slots: 3359, total_allocated_pages: 94, force_major_gc_count: 1, force_incremental_marking_finish_count: 0, total_allocated_objects: 4749, total_freed_objects: 1478}} ``` After: ``` {count: 2017, time: 541, marking_time: 205, sweeping_time: 336, heap_allocated_pages: 336, heap_empty_pages: 0, heap_allocatable_slots: 18931, heap_available_slots: 290435, heap_live_slots: 287797, heap_free_slots: 2638, heap_final_slots: 0, heap_marked_slots: 270104, heap_eden_pages: 336, total_allocated_pages: 336, total_freed_pages: 0, total_allocated_objects: 3460006, total_freed_objects: 3172209, malloc_increase_bytes: 2259899, malloc_increase_bytes_limit: 16777216, minor_gc_count: 2009, major_gc_count: 8, compact_count: 0, read_barrier_faults: 0, total_moved_objects: 0, remembered_wb_unprotected_objects: 0, remembered_wb_unprotected_objects_limit: 2663, old_objects: 269818, old_objects_limit: 532620, oldmalloc_increase_bytes: 2259899, oldmalloc_increase_bytes_limit: 18971452} {0 => {slot_size: 40, heap_live_slots: 167111, heap_free_slots: 1545, heap_final_slots: 0, heap_eden_pages: 103, heap_eden_slots: 168656, total_allocated_pages: 126, force_major_gc_count: 3, force_incremental_marking_finish_count: 2, total_allocated_objects: 2413543, total_freed_objects: 2246432}, 1 => {slot_size: 80, heap_live_slots: 80426, heap_free_slots: 595, heap_final_slots: 0, heap_eden_pages: 99, heap_eden_slots: 81021, total_allocated_pages: 360, force_major_gc_count: 2, force_incremental_marking_finish_count: 0, total_allocated_objects: 590103, total_freed_objects: 509677}, 2 => {slot_size: 160, heap_live_slots: 33750, heap_free_slots: 183, heap_final_slots: 0, heap_eden_pages: 83, heap_eden_slots: 33933, total_allocated_pages: 200, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 436028, total_freed_objects: 402278}, 3 => {slot_size: 320, heap_live_slots: 3254, heap_free_slots: 5, heap_final_slots: 0, heap_eden_pages: 16, heap_eden_slots: 3259, total_allocated_pages: 33, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 15623, total_freed_objects: 12369}, 4 => {slot_size: 640, heap_live_slots: 3294, heap_free_slots: 272, heap_final_slots: 0, heap_eden_pages: 35, heap_eden_slots: 3566, total_allocated_pages: 47, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 4747, total_freed_objects: 1453}} ``` Note that major GC count has reduced significantly, from 30 to 8. ---------------------------------------- Bug #21838: Rails seeing degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024) https://bugs.ruby-lang.org/issues/21838#change-116144 * Author: bluz71 (Dennis B) * Status: Open * ruby -v: ruby 3.4.0dev (2024-09-09T14:15:21Z v3_4_0_preview2~545 079ef92b5e) * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hello, I noticed a performance degradation with Ruby 3.4 when seeding my Rails application, which does image derivation via the Shrine Gem and `libvips` image library. I skipped Ruby 3.4 hoping that 4.0 would magically solve my performance issue. Alas not. My degradation when doing `rails db:reset` (aka database seeding) was about 20% as compared with Ruby 3.3. The last couple of days I looked at this issue more closely and with the help of `git bisect` I believe this commit: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079... is the cause of my regression. That commit implements global allocatable slots and empty pages. The revision before is fast when seeding my Rails application. This revision however slows down my Rails seeding by about 20%, from 30 seconds to 40 seconds consistently. I created a minimal Rails application as the reproducible test case: https://github.com/bluz71/rails_app_ruby_slowdown Steps to reproduce: 1. `git clone https://github.com/bluz71/rails_app_ruby_slowdown` 2. `cd rails_app_ruby_slowdown` 3. `bundle config set force_ruby_platform true` 4. `bundle install` 5. `bundle exec rails db:create` 6. `RUBY_YJIT_ENABLE=0 time bundle exec rails db:reset` Note, `libvips` needs to be installed at the system level, for example `sudo apt install libvips`, libvips is the image processing library used by Shrine which in turn is used in this Rails application when seeding. Use Ruby 3.3 and 3.4 to compare, or use Revision de7ac11a (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/de7...) and Revision 079ef92b (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079...) to compare. I hope this is reproducible, if so, I also hope this performance loss can be minimised in future. Best regards. -- https://bugs.ruby-lang.org/
Issue #21838 has been updated by bluz71 (Dennis B). peterzhu2118 (Peter Zhu) wrote in #note-4:
Thank you for reporting this issue.
I am glad to help, even in this small way.
I debugged it and have a fix here: https://github.com/ruby/ruby/pull/15882
That is great news. I look forward to seeing this merged and eventually released, hopefully in a `4.0.X` release. Thank you very much Peter & byroot. ---------------------------------------- Bug #21838: Rails seeing degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024) https://bugs.ruby-lang.org/issues/21838#change-116147 * Author: bluz71 (Dennis B) * Status: Open * ruby -v: ruby 3.4.0dev (2024-09-09T14:15:21Z v3_4_0_preview2~545 079ef92b5e) * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hello, I noticed a performance degradation with Ruby 3.4 when seeding my Rails application, which does image derivation via the Shrine Gem and `libvips` image library. I skipped Ruby 3.4 hoping that 4.0 would magically solve my performance issue. Alas not. My degradation when doing `rails db:reset` (aka database seeding) was about 20% as compared with Ruby 3.3. The last couple of days I looked at this issue more closely and with the help of `git bisect` I believe this commit: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079... is the cause of my regression. That commit implements global allocatable slots and empty pages. The revision before is fast when seeding my Rails application. This revision however slows down my Rails seeding by about 20%, from 30 seconds to 40 seconds consistently. I created a minimal Rails application as the reproducible test case: https://github.com/bluz71/rails_app_ruby_slowdown Steps to reproduce: 1. `git clone https://github.com/bluz71/rails_app_ruby_slowdown` 2. `cd rails_app_ruby_slowdown` 3. `bundle config set force_ruby_platform true` 4. `bundle install` 5. `bundle exec rails db:create` 6. `RUBY_YJIT_ENABLE=0 time bundle exec rails db:reset` Note, `libvips` needs to be installed at the system level, for example `sudo apt install libvips`, libvips is the image processing library used by Shrine which in turn is used in this Rails application when seeding. Use Ruby 3.3 and 3.4 to compare, or use Revision de7ac11a (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/de7...) and Revision 079ef92b (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079...) to compare. I hope this is reproducible, if so, I also hope this performance loss can be minimised in future. Best regards. -- https://bugs.ruby-lang.org/
Issue #21838 has been updated by nagachika (Tomoyuki Chikanaga). Backport changed from 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED, 4.0: REQUIRED to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONE, 4.0: REQUIRED ruby_3_4 commit:981097a7390db47e06d78b760ed198bff8eb184b merged revision(s) commit:8a586af33b59cae93a1bee13c39e87dd087a4a6b. ---------------------------------------- Bug #21838: Rails seeing degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024) https://bugs.ruby-lang.org/issues/21838#change-116211 * Author: bluz71 (Dennis B) * Status: Closed * ruby -v: ruby 3.4.0dev (2024-09-09T14:15:21Z v3_4_0_preview2~545 079ef92b5e) * Backport: 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONE, 4.0: REQUIRED ---------------------------------------- Hello, I noticed a performance degradation with Ruby 3.4 when seeding my Rails application, which does image derivation via the Shrine Gem and `libvips` image library. I skipped Ruby 3.4 hoping that 4.0 would magically solve my performance issue. Alas not. My degradation when doing `rails db:reset` (aka database seeding) was about 20% as compared with Ruby 3.3. The last couple of days I looked at this issue more closely and with the help of `git bisect` I believe this commit: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079... is the cause of my regression. That commit implements global allocatable slots and empty pages. The revision before is fast when seeding my Rails application. This revision however slows down my Rails seeding by about 20%, from 30 seconds to 40 seconds consistently. I created a minimal Rails application as the reproducible test case: https://github.com/bluz71/rails_app_ruby_slowdown Steps to reproduce: 1. `git clone https://github.com/bluz71/rails_app_ruby_slowdown` 2. `cd rails_app_ruby_slowdown` 3. `bundle config set force_ruby_platform true` 4. `bundle install` 5. `bundle exec rails db:create` 6. `RUBY_YJIT_ENABLE=0 time bundle exec rails db:reset` Note, `libvips` needs to be installed at the system level, for example `sudo apt install libvips`, libvips is the image processing library used by Shrine which in turn is used in this Rails application when seeding. Use Ruby 3.3 and 3.4 to compare, or use Revision de7ac11a (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/de7...) and Revision 079ef92b (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079...) to compare. I hope this is reproducible, if so, I also hope this performance loss can be minimised in future. Best regards. -- https://bugs.ruby-lang.org/
Issue #21838 has been updated by k0kubun (Takashi Kokubun). Backport changed from 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONE, 4.0: REQUIRED to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONE, 4.0: DONE ruby_4_0 commit:d5d3fb8b80e81426517bf150ebcab83be4ed996d merged revision(s) commit:8a586af33b59cae93a1bee13c39e87dd087a4a6b. ---------------------------------------- Bug #21838: Rails seeing degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024) https://bugs.ruby-lang.org/issues/21838#change-116351 * Author: bluz71 (Dennis B) * Status: Closed * ruby -v: ruby 3.4.0dev (2024-09-09T14:15:21Z v3_4_0_preview2~545 079ef92b5e) * Backport: 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONE, 4.0: DONE ---------------------------------------- Hello, I noticed a performance degradation with Ruby 3.4 when seeding my Rails application, which does image derivation via the Shrine Gem and `libvips` image library. I skipped Ruby 3.4 hoping that 4.0 would magically solve my performance issue. Alas not. My degradation when doing `rails db:reset` (aka database seeding) was about 20% as compared with Ruby 3.3. The last couple of days I looked at this issue more closely and with the help of `git bisect` I believe this commit: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079... is the cause of my regression. That commit implements global allocatable slots and empty pages. The revision before is fast when seeding my Rails application. This revision however slows down my Rails seeding by about 20%, from 30 seconds to 40 seconds consistently. I created a minimal Rails application as the reproducible test case: https://github.com/bluz71/rails_app_ruby_slowdown Steps to reproduce: 1. `git clone https://github.com/bluz71/rails_app_ruby_slowdown` 2. `cd rails_app_ruby_slowdown` 3. `bundle config set force_ruby_platform true` 4. `bundle install` 5. `bundle exec rails db:create` 6. `RUBY_YJIT_ENABLE=0 time bundle exec rails db:reset` Note, `libvips` needs to be installed at the system level, for example `sudo apt install libvips`, libvips is the image processing library used by Shrine which in turn is used in this Rails application when seeding. Use Ruby 3.3 and 3.4 to compare, or use Revision de7ac11a (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/de7...) and Revision 079ef92b (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079...) to compare. I hope this is reproducible, if so, I also hope this performance loss can be minimised in future. Best regards. -- https://bugs.ruby-lang.org/
participants (5)
-
bluz71 (Dennis B) -
byroot (Jean Boussier) -
k0kubun (Takashi Kokubun) -
nagachika (Tomoyuki Chikanaga) -
peterzhu2118 (Peter Zhu)