[ruby-core:112398] [Ruby master Feature#19435] Expose counts for each GC reason in GC.stat

Issue #19435 has been reported by byroot (Jean Boussier).
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435
* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
### Context
We recently tuned the GC settings on our monolith application because we were seeing some very long GC pauses (multiple seconds) during some requests.
Very early on we realized that we could know how often the GC was triggered and how long it was taking, but we had no information as to why, hence no good way to know which specific setting to tune.
As of today, the only way to get this information is to compile Ruby with debug counters, but that's not really accessible for most users, and not very suitable for deploying in production.
So we patched our Ruby to expose counters for each specific reason in `GC.stat`, and this data was extremely valuable. For instance, we discovered that the number one cause of major GC was `shady` objects, which allowed us both to better tune our GC and to drive some targeted patches to Ruby.
### Proposal
We'd like to merge the patch we used on our Ruby build. It exposes 8 new keys in `GC.stat`:
- `:major_gc_nofree_count`
- `:major_gc_oldgen_count`
- `:major_gc_shady_count`
- `:major_gc_newobj_count`
- `:major_gc_malloc_count`
- `:major_gc_oldmalloc_count`
- `:minor_gc_newobj_count`
- `:minor_gc_malloc_count`
Some very uncommon reasons like `force` etc. are ignored, as they're not valuable.
Also note that sometimes multiple conditions can be met to trigger a GC; in such cases we may increment several counters, so the sum of `major_gc_*_count` can be higher than `major_gc_count`.
Proposed patch: https://github.com/ruby/ruby/pull/7250
--
https://bugs.ruby-lang.org/
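A minimal Ruby sketch of how a monitoring tool might consume the proposed counters, assuming they land under the names listed above. `gc_reason_counts` and `major_reason_overlap` are hypothetical helpers, and the sample numbers are made up for illustration. The sketch also shows why the per-reason counters do not partition `major_gc_count`: a single GC can be recorded under several reasons.

```ruby
# Reason counters proposed in this issue (PR #7250). From this sketch's point
# of view they are hypothetical: a given Ruby may not report any of them.
PROPOSED_MAJOR_KEYS = %i[
  major_gc_nofree_count major_gc_oldgen_count major_gc_shady_count
  major_gc_newobj_count major_gc_malloc_count major_gc_oldmalloc_count
].freeze
PROPOSED_MINOR_KEYS = %i[minor_gc_newobj_count minor_gc_malloc_count].freeze

# Returns only the proposed counters that the given stat hash actually contains.
def gc_reason_counts(stat = GC.stat)
  (PROPOSED_MAJOR_KEYS + PROPOSED_MINOR_KEYS).each_with_object({}) do |key, out|
    out[key] = stat[key] if stat.key?(key)
  end
end

# Sum of the major reason counters minus :major_gc_count. A positive result
# means some major GCs were recorded under more than one reason; ignored
# reasons such as `force` push the result the other way.
def major_reason_overlap(stat = GC.stat)
  PROPOSED_MAJOR_KEYS.sum { |key| stat.fetch(key, 0) } - stat.fetch(:major_gc_count, 0)
end

# Made-up numbers: 3 major GCs, one of which met both the oldgen and shady conditions.
sample = { major_gc_count: 3, major_gc_oldgen_count: 2, major_gc_shady_count: 2 }
puts major_reason_overlap(sample) # prints 1
p gc_reason_counts                 # {} on a Ruby without these keys
```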

Issue #19435 has been updated by rubyFeedback (robert heiler).
I have loved introspection ever since I used the language Io many years ago; later on Ruby also gained better introspection (no idea whether that was related to Io or not), so more information about e.g. the GC is great. +1 \o/
The GC is a mystery to me, unfortunately, as I have never properly learned C. In libui, kojix2 keeps some references around to keep the GC from collecting those objects and causing sudden crashes. Without a real understanding of the GC and C, I feel that some parts of Ruby will always remain a mystery, which is another reason why more introspection is nice in general.
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435#change-101840

Issue #19435 has been updated by Eregon (Benoit Daloze).
Do we need the minor/major prefix? Or would it be good enough without it?
Also, these names are fairly cryptic; what do they mean? There should probably be some docs for them.
I think counting calls to `GC.start` would be useful (a high number there means someone is calling `GC.start` repeatedly, which they should not). Maybe for `GC.stress` too.
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435#change-101845

Issue #19435 has been updated by byroot (Jean Boussier).
> Do we need the minor/major prefix?
I believe we do. Generally speaking, what you really want to reduce is major GCs. There are cases where minor GCs can be triggered too often, leading to performance issues (in part because they end up promoting objects to the old generation too quickly), but for the most part it's the major GCs you want to avoid.
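Since it is mainly major GCs that one wants to reduce, a first step is to see how many of each kind a given piece of work triggers. A small sketch using the existing CRuby `GC.stat` keys `:minor_gc_count` and `:major_gc_count`; `gc_runs_during` is a hypothetical helper name.

```ruby
# Hypothetical helper: counts the minor and major GCs that ran while the block
# executed, using the existing GC.stat keys :minor_gc_count and :major_gc_count.
def gc_runs_during
  before = GC.stat
  yield
  after = GC.stat
  {
    minor: after[:minor_gc_count] - before[:minor_gc_count],
    major: after[:major_gc_count] - before[:major_gc_count]
  }
end

# Allocation-heavy work; the exact counts depend on heap settings and Ruby version.
runs = gc_runs_during { 200_000.times.map { |i| "string #{i}" } }
puts "minor GCs: #{runs[:minor]}, major GCs: #{runs[:major]}"
```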
> Also these names are fairly cryptic, what do they mean? Probably there should be some docs for that.
Yes, if these are accepted I'll certainly document them in the `GC.stat` method.
> I think counting calls to `GC.start` would be useful
Our initial patch had those, but I removed them to limit the number of extra keys, and because it's not a cause you should ever see in production, aside from pre-fork memory optimization etc. The same goes, if not more so, for `GC.stress`: there is zero reason for it to trigger GC in production.
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435#change-101848

Issue #19435 has been updated by Eregon (Benoit Daloze). byroot (Jean Boussier) wrote in #note-3:
> Our initial patch had those but I removed it to limit the number of extra keys, and because it's not a cause you should see in production, ever, aside from pre-fork memory optimization etc.
Unfortunately, "it shouldn't" doesn't mean "it doesn't". I remember some horrible hack in oj which disabled and re-enabled the GC around a piece of C code, for instance. So I think it's very valuable to add that, and it fits well with the other keys added here. If it's low, all good; if it's high, it is worth investigating, as some gem is probably misbehaving.
This is also a key I would happily add to TruffleRuby, and I suspect JRuby would add it too (forcing GC on the JVM is a very bad idea: it is very slow and destroys the GC heuristics).
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435#change-101884
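Until something like this is in core, Ruby-level `GC.start` calls can be counted by prepending a module to `GC`'s singleton class. This is a hypothetical sketch (`GCStartCounter` is an illustrative name), and unlike a counter inside the GC it misses GCs requested through other entry points.

```ruby
# Hypothetical pure-Ruby counter for explicit GC.start calls. It only sees
# Ruby-level calls to GC.start; other entry points such as
# ObjectSpace.garbage_collect, or GCs requested from C, are not counted.
module GCStartCounter
  @calls = 0

  class << self
    attr_reader :calls

    def record!
      @calls += 1
    end
  end

  # Prepended to GC's singleton class, so it runs before the original GC.start
  # and then forwards the same arguments to it.
  def start(*args, **kwargs)
    GCStartCounter.record!
    super
  end
end

GC.singleton_class.prepend(GCStartCounter)

GC.start
GC.start(full_mark: false)
puts GCStartCounter.calls # prints 2
```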

Issue #19435 has been updated by ko1 (Koichi Sasada).
We can already measure these kinds of statistics with a C extension (https://github.com/ko1/gc_tracer is one example). Is it so important to have this in core?
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435#change-102330
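For comparison with the C-extension route, part of this information can be sampled from pure Ruby through `GC.latest_gc_info`, whose `:major_by` and `:gc_by` entries describe the most recent GC. The sketch below (`GCReasonSampler` is a hypothetical name) also shows the limitation that motivates cumulative counters: GCs that run between two samples can only be counted as missed, and `:major_by` records a single reason per GC.

```ruby
# Hypothetical sampler built on GC.latest_gc_info, which describes only the
# most recent GC. On CRuby, :major_by is nil for minor GCs and holds a single
# reason for major ones; :gc_by says what triggered the GC (e.g. :newobj, :malloc).
class GCReasonSampler
  attr_reader :tally, :missed

  def initialize
    @tally = Hash.new(0)
    @missed = 0 # GCs that ran between samples and went unrecorded
    @last_count = GC.count
  end

  # Call periodically, e.g. once per request.
  def sample
    count = GC.count
    return if count == @last_count # no GC since the previous sample

    info = GC.latest_gc_info
    @tally[[info[:major_by], info[:gc_by]]] += 1
    @missed += count - @last_count - 1
    @last_count = count
  end
end

sampler = GCReasonSampler.new
200_000.times.map { |i| "string #{i}" }
sampler.sample
p sampler.tally
p sampler.missed
```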

Issue #19435 has been updated by byroot (Jean Boussier).
> Now we can measure this kind of statistics with C-extension
Yes, but unfortunately GC hooks have the adverse effect of disabling the allocation fast path, so I'd rather not go this route.
> Is it so important to have in core?
I think it would be useful for various application performance monitoring tools to be able to alert users on this kind of issues without having to use a C extension.
> matz: stat gets bloated is kind of worrying.
I'm not too concerned about this: it's an advanced API for instrumentation purposes, and the docs clearly state: "The contents of the hash are implementation specific and may be changed in the future."
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435#change-102335
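Because the documentation says the contents of `GC.stat` are implementation specific, code that wants to pick up optional keys (such as the proposed reason counters) has to cope with their absence. A minimal sketch, assuming CRuby's behavior of raising `ArgumentError` from `GC.stat(key)` for an unknown key; `KNOWN_GC_STAT_KEYS` and `optional_gc_stat` are hypothetical names.

```ruby
# Keys this Ruby's GC.stat reports, captured once at load time. GC.stat(key)
# raises ArgumentError for an unknown key, so membership is checked first.
KNOWN_GC_STAT_KEYS = GC.stat.keys.freeze

def optional_gc_stat(key)
  return nil unless KNOWN_GC_STAT_KEYS.include?(key)

  GC.stat(key) # single-key form: no full hash is built on each call
end

p optional_gc_stat(:major_gc_count)       # an Integer on CRuby
p optional_gc_stat(:major_gc_shady_count) # nil unless this Ruby reports the key
```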

Issue #19435 has been updated by Eregon (Benoit Daloze). byroot (Jean Boussier) wrote in #note-6:
>> matz: stat gets bloated is kind of worrying.
> I'm not too concerned about this, it's an advanced API for instrumentation purposes, and the doc clearly state: "The contents of the hash are implementation specific and may be changed in the future.".
Rails calls `GC.stat(:total_allocated_objects)` on every request (https://github.com/rails/rails/blob/2eed4dc0afd1b82b2c12c6f77ab7271e72699168...), so one thing to keep in mind is not to slow down access to a single value of `GC.stat`.
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435#change-102356
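To see why single-key access matters for a per-request caller, a rough micro-benchmark can compare `GC.stat(:key)`, indexing the full `GC.stat` hash, and `GC.stat(hash)`, which fills a caller-supplied hash in place. This is a sketch with an arbitrary iteration count; no particular timings are implied.

```ruby
require "benchmark"

# Rough comparison of three ways to read a single GC.stat value. Timings vary
# by machine and Ruby version; only the relative cost is of interest here.
N = 20_000
reused = {} # filled in place by GC.stat(hash) instead of allocating a new hash

Benchmark.bm(20) do |x|
  x.report("GC.stat(:key)")       { N.times { GC.stat(:total_allocated_objects) } }
  x.report("GC.stat[:key]")       { N.times { GC.stat[:total_allocated_objects] } }
  x.report("GC.stat(hash)[:key]") { N.times { GC.stat(reused)[:total_allocated_objects] } }
end
```

On CRuby the single-key form returns the value without building the hash at all, which is why it is the path worth keeping fast as keys are added.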

Issue #19435 has been updated by byroot (Jean Boussier).
> Rails calls `GC.stat(:total_allocated_objects)` on every request
Yes, but it's actually a bit silly because it only works properly with non-threaded servers (e.g. Unicorn).
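The threading caveat follows from `:total_allocated_objects` being a process-wide counter: a before/after delta taken around a request also counts whatever other threads allocated in that window. A hypothetical sketch (`allocations_during` is an illustrative name) that makes this visible by allocating from a second thread inside the measured block.

```ruby
# Hypothetical per-request allocation counter in the before/after style.
# GC.stat(:total_allocated_objects) is process-wide, so the delta includes
# objects allocated by every thread while the block runs.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

own_only = allocations_during { 100.times { Object.new } }

# Another thread (standing in for a concurrent request) allocates inside the
# measured window, and its objects are attributed to this "request" too.
with_other = allocations_during do
  100.times { Object.new }
  Thread.new { 10_000.times { Object.new } }.join
end

puts "own only: #{own_only}, with another thread: #{with_other}"
```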
> one thing to keep in mind is not slowing down accessing just one value of GC.stat
It's implemented as a chain of `if (key == XXX) return attrs[XXX_offset]` checks, so appending new keys at the end doesn't slow down lookups of the pre-existing ones. It could probably be improved, though.
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435#change-102617
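The argument that appended keys leave existing lookups unaffected can be checked with a toy Ruby model of such a comparison chain. This is an illustration only, not CRuby's C code; `SequentialStatLookup` is a hypothetical class that counts how many key comparisons each lookup makes.

```ruby
# Toy model (not CRuby's implementation) of a lookup written as a chain of
# `if (key == XXX) return attrs[XXX_offset]` checks, counting comparisons.
class SequentialStatLookup
  def initialize(keys)
    @keys = keys.dup
    @values = Array.new(@keys.size, 0)
  end

  # Returns [value, comparisons_made] for +key+.
  def lookup(key)
    @keys.each_with_index do |candidate, offset|
      return [@values[offset], offset + 1] if candidate == key
    end
    raise ArgumentError, "unknown key: #{key}"
  end

  # New keys go at the end of the chain.
  def append_key(key)
    @keys << key
    @values << 0
  end
end

stat = SequentialStatLookup.new(%i[count major_gc_count total_allocated_objects])
puts stat.lookup(:total_allocated_objects).last # prints 3
stat.append_key(:major_gc_shady_count)
puts stat.lookup(:total_allocated_objects).last # prints 3 (unchanged)
puts stat.lookup(:major_gc_shady_count).last    # prints 4
```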

Issue #19435 has been updated by ko1 (Koichi Sasada).
Sorry for the late response.
> Yes, but unfortunately GC hooks have the adverse effect of disallowing allocation fast path, so I'd rather not go this route.
You are correct, and we can ignore the gc_enter/gc_exit events here, so we can avoid this drawback.
>> Is it so important to have in core?
> I think it would be useful for various application performance monitoring tools to be able to alert users on this kind of issues without having to use a C extension.
I agree it would be convenient if Rails itself (or major gems) monitored these kinds of counters. However, if a gem that monitors these counters is provided, I think there is no reason to avoid a C extension. For example, we could also introduce `major_gc_oldgen_time` and so on, but I think that would be too much, and I think this proposal is also too much. A more trivial reason I am against it is that we want to introduce Ractor-local GC, where this kind of memory space would have to be allocated for each Ractor, and I don't want to make it bigger and bigger (but again, this is not an important issue).
----------------------------------------
Feature #19435: Expose counts for each GC reason in GC.stat
https://bugs.ruby-lang.org/issues/19435#change-102998
participants (6)
- byroot (Jean Boussier)
- byroot (Jean Boussier)
- Eregon (Benoit Daloze)
- Eregon (Benoit Daloze)
- ko1 (Koichi Sasada)
- rubyFeedback (robert heiler)