[ruby-core:112926] [Ruby master Misc#19535] Instance variables order is unpredictable on objects with `OBJ_TOO_COMPLEX_SHAPE_ID`

Issue #19535 has been reported by byroot (Jean Boussier). ---------------------------------------- Misc #19535: Instance variables order is unpredictable on objects with `OBJ_TOO_COMPLEX_SHAPE_ID` https://bugs.ruby-lang.org/issues/19535 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal ---------------------------------------- ### Context I've been helping the Mastodon folks in investigating a weird Marshal deserialization bug they randomly experience since they upgraded to Ruby 3.2: https://github.com/mastodon/mastodon/issues/23644 Ultimately the bug comes from a circular dependency issues in the object graph that is serialized when one call `Marshal.dump` on an `ActiveRecord::Base` object. A simplified reproduction to better explain the problem is: ```ruby class Status def normal_order @attributes = { id: 42 } @relations = { self => 1 } self end def inverse_order @relations = nil @attributes = { id: 42 } @relations = { self => 1 } self end def hash @attributes.fetch(:id) end end s = Marshal.load(Marshal.dump(Status.new.normal_order)) s = Marshal.load(Marshal.dump(Status.new.inverse_order)) ``` In short, that `Status` object is both the top level object, and is referenced as a key in a hash, in that same payload. It also defined a custom `#hash` method, that requires some other attribute to be set. It all "works" as long as `@attributes` is dumped before `@relations`. ### Problem The above micro-reproduction uses two different shapes to demonstrate the ordering issues, but in both case the ordering is predictable. However if you generate too many shapes from a single class, it will be marked as `TOO_COMPLEX` and future instance will have their instance variables backed by an `id_table`, which is unordered, and will cause a similar issue. I definitely consider this a bug on the Rails side, and I will do what I can so that Rails doesn't depend on that implicit ordering. However it's unlikely we'll be able to fix older version, and other users may run into this issue when upgrading to Ruby 3.2, so I think it may be worth to try to preserve some sort of predicable ordering, at least for a few more versions. Additionally, debugging it was made particularly difficult, because it would work fine initially, and then break after enough shapes had been generated. Generally speaking I think such semi-predictable behavior is much worse than a fully random behavior (similar to how Go randomize keys order in their maps). ### Historical behavior On Ruby 3.1 and older, the instance variables ordering was defined by the order in which each ivar appeared for the very first time: ```ruby class Foo def set @a = 1 @b = 2 @c = 3 self end def inverse_order @c = 3 @b = 2 @a = 1 self end end p Foo.new.set.instance_variables # => [:@a, :@b, :@c] p Foo.new.inverse_order.instance_variables # => [:@a, :@b, :@c] ``` This means that the order could be different from once execution of the program to another, but would remain stable inside a single process. On 3.2, it's now defined by the order in which each ivar appeared in that specific object instance: ```ruby [:@a, :@b, :@c] [:@c, :@b, :@a] ``` Except, if the object is backed by an `id_table`, in which case it's fully unpredictable. ### Possible changes I discussed this with @tenderlovemaking, and he suggested we could change the `id_table` for an `st_table` so that the ordering could be predictable again, and would behave like objects with a non-complex shape. Another possibility would be to preserve the observable behavior of 3.1 and older. Or of course we could clearly specify that the ordering is random, but if so I think it would be wise to make it always random so that this class of bugs has a much higher chance to be caught early in testing rather than in production. cc @Eregon as I presume this has implications on TruffleRuby as well. -- https://bugs.ruby-lang.org/

Issue #19535 has been updated by Eregon (Benoit Daloze). I chatted with @tenderlovemaking. I think it's best to make the too-complex-shape objects use an ordered hash. TruffleRuby doesn't have a too-complex shape, so that case is not a concern for TruffleRuby (currently at least). But indeed with Shapes it's only natural that `Kernel#instance_variables` returns in the order the Shape has, which is the order the ivar was set on that object and not per class. Having to maintain that on the class just to have the previous per-class ordering seems a significant overhead and complication, so I don't consider that a real solution. Notice that if we would always sort ivars in a Shape in an attempt to deduplicate shapes (I think PyPy does), that would affect the ordering which would then be alphabetical or so. But currently neither TruffleRuby nor CRuby do that, and doing that also makes writing ivars more complex (need to move values around). It's anyway good if user code sets variables in the same order for various instances, it's more predictable for everyone. BTW it's probably quite a bad idea to capture `self` in `initialize` in a `Hash` and define `#hash` and not just the default identity hash, that would break with random ordering too. It's also not thread-safe typically (leaks `self` before it's fully initialized). ---------------------------------------- Misc #19535: Instance variables order is unpredictable on objects with `OBJ_TOO_COMPLEX_SHAPE_ID` https://bugs.ruby-lang.org/issues/19535#change-102448 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal ---------------------------------------- ### Context I've been helping the Mastodon folks in investigating a weird Marshal deserialization bug they randomly experience since they upgraded to Ruby 3.2: https://github.com/mastodon/mastodon/issues/23644 Ultimately the bug comes from a circular dependency issues in the object graph that is serialized when one call `Marshal.dump` on an `ActiveRecord::Base` object. A simplified reproduction to better explain the problem is: ```ruby class Status def normal_order @attributes = { id: 42 } @relations = { self => 1 } self end def inverse_order @relations = nil @attributes = { id: 42 } @relations = { self => 1 } self end def hash @attributes.fetch(:id) end end s = Marshal.load(Marshal.dump(Status.new.normal_order)) s = Marshal.load(Marshal.dump(Status.new.inverse_order)) ``` In short, that `Status` object is both the top level object, and is referenced as a key in a hash, in that same payload. It also defined a custom `#hash` method, that requires some other attribute to be set. It all "works" as long as `@attributes` is dumped before `@relations`. ### Problem The above micro-reproduction uses two different shapes to demonstrate the ordering issues, but in both case the ordering is predictable. However if you generate too many shapes from a single class, it will be marked as `TOO_COMPLEX` and future instance will have their instance variables backed by an `id_table`, which is unordered, and will cause a similar issue. I definitely consider this a bug on the Rails side, and I will do what I can so that Rails doesn't depend on that implicit ordering. However it's unlikely we'll be able to fix older version, and other users may run into this issue when upgrading to Ruby 3.2, so I think it may be worth to try to preserve some sort of predicable ordering, at least for a few more versions. Additionally, debugging it was made particularly difficult, because it would work fine initially, and then break after enough shapes had been generated. Generally speaking I think such semi-predictable behavior is much worse than a fully random behavior (similar to how Go randomize keys order in their maps). ### Historical behavior On Ruby 3.1 and older, the instance variables ordering was defined by the order in which each ivar appeared for the very first time: ```ruby class Foo def set @a = 1 @b = 2 @c = 3 self end def inverse_order @c = 3 @b = 2 @a = 1 self end end p Foo.new.set.instance_variables # => [:@a, :@b, :@c] p Foo.new.inverse_order.instance_variables # => [:@a, :@b, :@c] ``` This means that the order could be different from once execution of the program to another, but would remain stable inside a single process. On 3.2, it's now defined by the order in which each ivar appeared in that specific object instance: ```ruby [:@a, :@b, :@c] [:@c, :@b, :@a] ``` Except, if the object is backed by an `id_table`, in which case it's fully unpredictable. ### Possible changes I discussed this with @tenderlovemaking, and he suggested we could change the `id_table` for an `st_table` so that the ordering could be predictable again, and would behave like objects with a non-complex shape. Another possibility would be to preserve the observable behavior of 3.1 and older. Or of course we could clearly specify that the ordering is random, but if so I think it would be wise to make it always random so that this class of bugs has a much higher chance to be caught early in testing rather than in production. cc @Eregon as I presume this has implications on TruffleRuby as well. -- https://bugs.ruby-lang.org/

Issue #19535 has been updated by Eregon (Benoit Daloze). BTW it'd be nice to have some kind of standard performance warnings for Ruby. For instance there would be one for TOO_COMPLEX Shape, for megamorphic calls, etc. TruffleRuby already has some of these, but an integrated way to enable/disable them seems nice. It's a separate concern from this issue, but it'd be useful to reduce e.g. misusage of ivars which lead to TOO_COMPLEX Shape,. ---------------------------------------- Misc #19535: Instance variables order is unpredictable on objects with `OBJ_TOO_COMPLEX_SHAPE_ID` https://bugs.ruby-lang.org/issues/19535#change-102449 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal ---------------------------------------- ### Context I've been helping the Mastodon folks in investigating a weird Marshal deserialization bug they randomly experience since they upgraded to Ruby 3.2: https://github.com/mastodon/mastodon/issues/23644 Ultimately the bug comes from a circular dependency issues in the object graph that is serialized when one call `Marshal.dump` on an `ActiveRecord::Base` object. A simplified reproduction to better explain the problem is: ```ruby class Status def normal_order @attributes = { id: 42 } @relations = { self => 1 } self end def inverse_order @relations = nil @attributes = { id: 42 } @relations = { self => 1 } self end def hash @attributes.fetch(:id) end end s = Marshal.load(Marshal.dump(Status.new.normal_order)) s = Marshal.load(Marshal.dump(Status.new.inverse_order)) ``` In short, that `Status` object is both the top level object, and is referenced as a key in a hash, in that same payload. It also defined a custom `#hash` method, that requires some other attribute to be set. It all "works" as long as `@attributes` is dumped before `@relations`. ### Problem The above micro-reproduction uses two different shapes to demonstrate the ordering issues, but in both case the ordering is predictable. However if you generate too many shapes from a single class, it will be marked as `TOO_COMPLEX` and future instance will have their instance variables backed by an `id_table`, which is unordered, and will cause a similar issue. I definitely consider this a bug on the Rails side, and I will do what I can so that Rails doesn't depend on that implicit ordering. However it's unlikely we'll be able to fix older version, and other users may run into this issue when upgrading to Ruby 3.2, so I think it may be worth to try to preserve some sort of predicable ordering, at least for a few more versions. Additionally, debugging it was made particularly difficult, because it would work fine initially, and then break after enough shapes had been generated. Generally speaking I think such semi-predictable behavior is much worse than a fully random behavior (similar to how Go randomize keys order in their maps). ### Historical behavior On Ruby 3.1 and older, the instance variables ordering was defined by the order in which each ivar appeared for the very first time: ```ruby class Foo def set @a = 1 @b = 2 @c = 3 self end def inverse_order @c = 3 @b = 2 @a = 1 self end end p Foo.new.set.instance_variables # => [:@a, :@b, :@c] p Foo.new.inverse_order.instance_variables # => [:@a, :@b, :@c] ``` This means that the order could be different from once execution of the program to another, but would remain stable inside a single process. On 3.2, it's now defined by the order in which each ivar appeared in that specific object instance: ```ruby [:@a, :@b, :@c] [:@c, :@b, :@a] ``` Except, if the object is backed by an `id_table`, in which case it's fully unpredictable. ### Possible changes I discussed this with @tenderlovemaking, and he suggested we could change the `id_table` for an `st_table` so that the ordering could be predictable again, and would behave like objects with a non-complex shape. Another possibility would be to preserve the observable behavior of 3.1 and older. Or of course we could clearly specify that the ordering is random, but if so I think it would be wise to make it always random so that this class of bugs has a much higher chance to be caught early in testing rather than in production. cc @Eregon as I presume this has implications on TruffleRuby as well. -- https://bugs.ruby-lang.org/

Issue #19535 has been updated by byroot (Jean Boussier).
I think it's best to make the too-complex-shape objects use an ordered hash
I would also lean on that solution, as it's the one that give a behavior that is consistent with shapes without too much changes.
TruffleRuby doesn't have a too-complex shape,
Interesting.
BTW it's probably quite a bad idea to capture self in initialize in a Hash and define #hash and not just the default identity hash, that would break with random ordering too.
Of course, that was purely for the purpose of reducing the reproduction script as much as possible. Rails isn't doing something that bad :) So since there seem to be an agreement on the solution, I think there's another thing we may need to discuss, potentially in another issue, is whether instance variables order should be specified or not (I don't think it should) and if not, we should probably add a line in the documentation of `Object#instance_variables` to clearly state not to rely on ordering. ---------------------------------------- Misc #19535: Instance variables order is unpredictable on objects with `OBJ_TOO_COMPLEX_SHAPE_ID` https://bugs.ruby-lang.org/issues/19535#change-102451 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal ---------------------------------------- ### Context I've been helping the Mastodon folks in investigating a weird Marshal deserialization bug they randomly experience since they upgraded to Ruby 3.2: https://github.com/mastodon/mastodon/issues/23644 Ultimately the bug comes from a circular dependency issues in the object graph that is serialized when one call `Marshal.dump` on an `ActiveRecord::Base` object. A simplified reproduction to better explain the problem is: ```ruby class Status def normal_order @attributes = { id: 42 } @relations = { self => 1 } self end def inverse_order @relations = nil @attributes = { id: 42 } @relations = { self => 1 } self end def hash @attributes.fetch(:id) end end s = Marshal.load(Marshal.dump(Status.new.normal_order)) s = Marshal.load(Marshal.dump(Status.new.inverse_order)) ``` In short, that `Status` object is both the top level object, and is referenced as a key in a hash, in that same payload. It also defined a custom `#hash` method, that requires some other attribute to be set. It all "works" as long as `@attributes` is dumped before `@relations`. ### Problem The above micro-reproduction uses two different shapes to demonstrate the ordering issues, but in both case the ordering is predictable. However if you generate too many shapes from a single class, it will be marked as `TOO_COMPLEX` and future instance will have their instance variables backed by an `id_table`, which is unordered, and will cause a similar issue. I definitely consider this a bug on the Rails side, and I will do what I can so that Rails doesn't depend on that implicit ordering. However it's unlikely we'll be able to fix older version, and other users may run into this issue when upgrading to Ruby 3.2, so I think it may be worth to try to preserve some sort of predicable ordering, at least for a few more versions. Additionally, debugging it was made particularly difficult, because it would work fine initially, and then break after enough shapes had been generated. Generally speaking I think such semi-predictable behavior is much worse than a fully random behavior (similar to how Go randomize keys order in their maps). ### Historical behavior On Ruby 3.1 and older, the instance variables ordering was defined by the order in which each ivar appeared for the very first time: ```ruby class Foo def set @a = 1 @b = 2 @c = 3 self end def inverse_order @c = 3 @b = 2 @a = 1 self end end p Foo.new.set.instance_variables # => [:@a, :@b, :@c] p Foo.new.inverse_order.instance_variables # => [:@a, :@b, :@c] ``` This means that the order could be different from once execution of the program to another, but would remain stable inside a single process. On 3.2, it's now defined by the order in which each ivar appeared in that specific object instance: ```ruby [:@a, :@b, :@c] [:@c, :@b, :@a] ``` Except, if the object is backed by an `id_table`, in which case it's fully unpredictable. ### Possible changes I discussed this with @tenderlovemaking, and he suggested we could change the `id_table` for an `st_table` so that the ordering could be predictable again, and would behave like objects with a non-complex shape. Another possibility would be to preserve the observable behavior of 3.1 and older. Or of course we could clearly specify that the ordering is random, but if so I think it would be wise to make it always random so that this class of bugs has a much higher chance to be caught early in testing rather than in production. cc @Eregon as I presume this has implications on TruffleRuby as well. -- https://bugs.ruby-lang.org/

Issue #19535 has been updated by byroot (Jean Boussier).
BTW it'd be nice to have some kind of standard performance warnings for Ruby.
Interesting. I guess we could have `Warnings[:performance] = true`, with it disabled by default. I love the idea. ---------------------------------------- Bug #19535: Instance variables order is unpredictable on objects with `OBJ_TOO_COMPLEX_SHAPE_ID` https://bugs.ruby-lang.org/issues/19535#change-102453 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal * ruby -v: 3.2.1 * Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED ---------------------------------------- ### Context I've been helping the Mastodon folks in investigating a weird Marshal deserialization bug they randomly experience since they upgraded to Ruby 3.2: https://github.com/mastodon/mastodon/issues/23644 Ultimately the bug comes from a circular dependency issues in the object graph that is serialized when one call `Marshal.dump` on an `ActiveRecord::Base` object. A simplified reproduction to better explain the problem is: ```ruby class Status def normal_order @attributes = { id: 42 } @relations = { self => 1 } self end def inverse_order @relations = nil @attributes = { id: 42 } @relations = { self => 1 } self end def hash @attributes.fetch(:id) end end s = Marshal.load(Marshal.dump(Status.new.normal_order)) s = Marshal.load(Marshal.dump(Status.new.inverse_order)) ``` In short, that `Status` object is both the top level object, and is referenced as a key in a hash, in that same payload. It also defined a custom `#hash` method, that requires some other attribute to be set. It all "works" as long as `@attributes` is dumped before `@relations`. ### Problem The above micro-reproduction uses two different shapes to demonstrate the ordering issues, but in both case the ordering is predictable. However if you generate too many shapes from a single class, it will be marked as `TOO_COMPLEX` and future instance will have their instance variables backed by an `id_table`, which is unordered, and will cause a similar issue. I definitely consider this a bug on the Rails side, and I will do what I can so that Rails doesn't depend on that implicit ordering. However it's unlikely we'll be able to fix older version, and other users may run into this issue when upgrading to Ruby 3.2, so I think it may be worth to try to preserve some sort of predicable ordering, at least for a few more versions. Additionally, debugging it was made particularly difficult, because it would work fine initially, and then break after enough shapes had been generated. Generally speaking I think such semi-predictable behavior is much worse than a fully random behavior (similar to how Go randomize keys order in their maps). ### Historical behavior On Ruby 3.1 and older, the instance variables ordering was defined by the order in which each ivar appeared for the very first time: ```ruby class Foo def set @a = 1 @b = 2 @c = 3 self end def inverse_order @c = 3 @b = 2 @a = 1 self end end p Foo.new.set.instance_variables # => [:@a, :@b, :@c] p Foo.new.inverse_order.instance_variables # => [:@a, :@b, :@c] ``` This means that the order could be different from once execution of the program to another, but would remain stable inside a single process. On 3.2, it's now defined by the order in which each ivar appeared in that specific object instance: ```ruby [:@a, :@b, :@c] [:@c, :@b, :@a] ``` Except, if the object is backed by an `id_table`, in which case it's fully unpredictable. ### Possible changes I discussed this with @tenderlovemaking, and he suggested we could change the `id_table` for an `st_table` so that the ordering could be predictable again, and would behave like objects with a non-complex shape. Another possibility would be to preserve the observable behavior of 3.1 and older. Or of course we could clearly specify that the ordering is random, but if so I think it would be wise to make it always random so that this class of bugs has a much higher chance to be caught early in testing rather than in production. cc @Eregon as I presume this has implications on TruffleRuby as well. -- https://bugs.ruby-lang.org/

Issue #19535 has been updated by tenderlovemaking (Aaron Patterson). byroot (Jean Boussier) wrote in #note-3:
I think it's best to make the too-complex-shape objects use an ordered hash
I would also lean on that solution, as it's the one that give a behavior that is consistent with shapes without too much changes.
I sent a patch [here](https://github.com/ruby/ruby/pull/7560) that implements this solution. ---------------------------------------- Bug #19535: Instance variables order is unpredictable on objects with `OBJ_TOO_COMPLEX_SHAPE_ID` https://bugs.ruby-lang.org/issues/19535#change-102459 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal * ruby -v: 3.2.1 * Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED ---------------------------------------- ### Context I've been helping the Mastodon folks in investigating a weird Marshal deserialization bug they randomly experience since they upgraded to Ruby 3.2: https://github.com/mastodon/mastodon/issues/23644 Ultimately the bug comes from a circular dependency issues in the object graph that is serialized when one call `Marshal.dump` on an `ActiveRecord::Base` object. A simplified reproduction to better explain the problem is: ```ruby class Status def normal_order @attributes = { id: 42 } @relations = { self => 1 } self end def inverse_order @relations = nil @attributes = { id: 42 } @relations = { self => 1 } self end def hash @attributes.fetch(:id) end end s = Marshal.load(Marshal.dump(Status.new.normal_order)) s = Marshal.load(Marshal.dump(Status.new.inverse_order)) ``` In short, that `Status` object is both the top level object, and is referenced as a key in a hash, in that same payload. It also defined a custom `#hash` method, that requires some other attribute to be set. It all "works" as long as `@attributes` is dumped before `@relations`. ### Problem The above micro-reproduction uses two different shapes to demonstrate the ordering issues, but in both case the ordering is predictable. However if you generate too many shapes from a single class, it will be marked as `TOO_COMPLEX` and future instance will have their instance variables backed by an `id_table`, which is unordered, and will cause a similar issue. I definitely consider this a bug on the Rails side, and I will do what I can so that Rails doesn't depend on that implicit ordering. However it's unlikely we'll be able to fix older version, and other users may run into this issue when upgrading to Ruby 3.2, so I think it may be worth to try to preserve some sort of predicable ordering, at least for a few more versions. Additionally, debugging it was made particularly difficult, because it would work fine initially, and then break after enough shapes had been generated. Generally speaking I think such semi-predictable behavior is much worse than a fully random behavior (similar to how Go randomize keys order in their maps). ### Historical behavior On Ruby 3.1 and older, the instance variables ordering was defined by the order in which each ivar appeared for the very first time: ```ruby class Foo def set @a = 1 @b = 2 @c = 3 self end def inverse_order @c = 3 @b = 2 @a = 1 self end end p Foo.new.set.instance_variables # => [:@a, :@b, :@c] p Foo.new.inverse_order.instance_variables # => [:@a, :@b, :@c] ``` This means that the order could be different from once execution of the program to another, but would remain stable inside a single process. On 3.2, it's now defined by the order in which each ivar appeared in that specific object instance: ```ruby [:@a, :@b, :@c] [:@c, :@b, :@a] ``` Except, if the object is backed by an `id_table`, in which case it's fully unpredictable. ### Possible changes I discussed this with @tenderlovemaking, and he suggested we could change the `id_table` for an `st_table` so that the ordering could be predictable again, and would behave like objects with a non-complex shape. Another possibility would be to preserve the observable behavior of 3.1 and older. Or of course we could clearly specify that the ordering is random, but if so I think it would be wise to make it always random so that this class of bugs has a much higher chance to be caught early in testing rather than in production. cc @Eregon as I presume this has implications on TruffleRuby as well. -- https://bugs.ruby-lang.org/

Issue #19535 has been updated by gstokkink (WGJ Stokkink). Could this get backported to 3.2? Thanks in advance! ---------------------------------------- Bug #19535: Instance variables order is unpredictable on objects with `OBJ_TOO_COMPLEX_SHAPE_ID` https://bugs.ruby-lang.org/issues/19535#change-103629 * Author: byroot (Jean Boussier) * Status: Closed * Priority: Normal * ruby -v: 3.2.1 * Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED ---------------------------------------- ### Context I've been helping the Mastodon folks in investigating a weird Marshal deserialization bug they randomly experience since they upgraded to Ruby 3.2: https://github.com/mastodon/mastodon/issues/23644 Ultimately the bug comes from a circular dependency issues in the object graph that is serialized when one call `Marshal.dump` on an `ActiveRecord::Base` object. A simplified reproduction to better explain the problem is: ```ruby class Status def normal_order @attributes = { id: 42 } @relations = { self => 1 } self end def inverse_order @relations = nil @attributes = { id: 42 } @relations = { self => 1 } self end def hash @attributes.fetch(:id) end end s = Marshal.load(Marshal.dump(Status.new.normal_order)) s = Marshal.load(Marshal.dump(Status.new.inverse_order)) ``` In short, that `Status` object is both the top level object, and is referenced as a key in a hash, in that same payload. It also defined a custom `#hash` method, that requires some other attribute to be set. It all "works" as long as `@attributes` is dumped before `@relations`. ### Problem The above micro-reproduction uses two different shapes to demonstrate the ordering issues, but in both case the ordering is predictable. However if you generate too many shapes from a single class, it will be marked as `TOO_COMPLEX` and future instance will have their instance variables backed by an `id_table`, which is unordered, and will cause a similar issue. I definitely consider this a bug on the Rails side, and I will do what I can so that Rails doesn't depend on that implicit ordering. However it's unlikely we'll be able to fix older version, and other users may run into this issue when upgrading to Ruby 3.2, so I think it may be worth to try to preserve some sort of predicable ordering, at least for a few more versions. Additionally, debugging it was made particularly difficult, because it would work fine initially, and then break after enough shapes had been generated. Generally speaking I think such semi-predictable behavior is much worse than a fully random behavior (similar to how Go randomize keys order in their maps). ### Historical behavior On Ruby 3.1 and older, the instance variables ordering was defined by the order in which each ivar appeared for the very first time: ```ruby class Foo def set @a = 1 @b = 2 @c = 3 self end def inverse_order @c = 3 @b = 2 @a = 1 self end end p Foo.new.set.instance_variables # => [:@a, :@b, :@c] p Foo.new.inverse_order.instance_variables # => [:@a, :@b, :@c] ``` This means that the order could be different from once execution of the program to another, but would remain stable inside a single process. On 3.2, it's now defined by the order in which each ivar appeared in that specific object instance: ```ruby [:@a, :@b, :@c] [:@c, :@b, :@a] ``` Except, if the object is backed by an `id_table`, in which case it's fully unpredictable. ### Possible changes I discussed this with @tenderlovemaking, and he suggested we could change the `id_table` for an `st_table` so that the ordering could be predictable again, and would behave like objects with a non-complex shape. Another possibility would be to preserve the observable behavior of 3.1 and older. Or of course we could clearly specify that the ordering is random, but if so I think it would be wise to make it always random so that this class of bugs has a much higher chance to be caught early in testing rather than in production. cc @Eregon as I presume this has implications on TruffleRuby as well. -- https://bugs.ruby-lang.org/

Issue #19535 has been updated by nagachika (Tomoyuki Chikanaga). Backport changed from 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE ruby_3_2 fa72ba72f8c64fd0fa87c8f68cbc31f2e7b94b00 merged revision(s) 54dbd8bea8a79bfcdefa471c1717c6cd28022f33. ---------------------------------------- Bug #19535: Instance variables order is unpredictable on objects with `OBJ_TOO_COMPLEX_SHAPE_ID` https://bugs.ruby-lang.org/issues/19535#change-103953 * Author: byroot (Jean Boussier) * Status: Closed * Priority: Normal * ruby -v: 3.2.1 * Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE ---------------------------------------- ### Context I've been helping the Mastodon folks in investigating a weird Marshal deserialization bug they randomly experience since they upgraded to Ruby 3.2: https://github.com/mastodon/mastodon/issues/23644 Ultimately the bug comes from a circular dependency issues in the object graph that is serialized when one call `Marshal.dump` on an `ActiveRecord::Base` object. A simplified reproduction to better explain the problem is: ```ruby class Status def normal_order @attributes = { id: 42 } @relations = { self => 1 } self end def inverse_order @relations = nil @attributes = { id: 42 } @relations = { self => 1 } self end def hash @attributes.fetch(:id) end end s = Marshal.load(Marshal.dump(Status.new.normal_order)) s = Marshal.load(Marshal.dump(Status.new.inverse_order)) ``` In short, that `Status` object is both the top level object, and is referenced as a key in a hash, in that same payload. It also defined a custom `#hash` method, that requires some other attribute to be set. It all "works" as long as `@attributes` is dumped before `@relations`. ### Problem The above micro-reproduction uses two different shapes to demonstrate the ordering issues, but in both case the ordering is predictable. However if you generate too many shapes from a single class, it will be marked as `TOO_COMPLEX` and future instance will have their instance variables backed by an `id_table`, which is unordered, and will cause a similar issue. I definitely consider this a bug on the Rails side, and I will do what I can so that Rails doesn't depend on that implicit ordering. However it's unlikely we'll be able to fix older version, and other users may run into this issue when upgrading to Ruby 3.2, so I think it may be worth to try to preserve some sort of predicable ordering, at least for a few more versions. Additionally, debugging it was made particularly difficult, because it would work fine initially, and then break after enough shapes had been generated. Generally speaking I think such semi-predictable behavior is much worse than a fully random behavior (similar to how Go randomize keys order in their maps). ### Historical behavior On Ruby 3.1 and older, the instance variables ordering was defined by the order in which each ivar appeared for the very first time: ```ruby class Foo def set @a = 1 @b = 2 @c = 3 self end def inverse_order @c = 3 @b = 2 @a = 1 self end end p Foo.new.set.instance_variables # => [:@a, :@b, :@c] p Foo.new.inverse_order.instance_variables # => [:@a, :@b, :@c] ``` This means that the order could be different from once execution of the program to another, but would remain stable inside a single process. On 3.2, it's now defined by the order in which each ivar appeared in that specific object instance: ```ruby [:@a, :@b, :@c] [:@c, :@b, :@a] ``` Except, if the object is backed by an `id_table`, in which case it's fully unpredictable. ### Possible changes I discussed this with @tenderlovemaking, and he suggested we could change the `id_table` for an `st_table` so that the ordering could be predictable again, and would behave like objects with a non-complex shape. Another possibility would be to preserve the observable behavior of 3.1 and older. Or of course we could clearly specify that the ordering is random, but if so I think it would be wise to make it always random so that this class of bugs has a much higher chance to be caught early in testing rather than in production. cc @Eregon as I presume this has implications on TruffleRuby as well. -- https://bugs.ruby-lang.org/
participants (5)
-
byroot (Jean Boussier)
-
Eregon (Benoit Daloze)
-
gstokkink (WGJ Stokkink)
-
nagachika (Tomoyuki Chikanaga)
-
tenderlovemaking (Aaron Patterson)