Issue #19535 has been updated by byroot (Jean Boussier).
I think it's best to make the too-complex-shape
objects use an ordered hash
I would also lean on that solution, as it's the one that give a behavior that is
consistent with shapes without too much changes.
TruffleRuby doesn't have a too-complex shape,
Interesting.
BTW it's probably quite a bad idea to capture self
in initialize in a Hash and define #hash and not just the default identity hash, that
would break with random ordering too.
Of course, that was purely for the purpose of reducing the reproduction script as much as
possible. Rails isn't doing something that bad :)
So since there seem to be an agreement on the solution, I think there's another thing
we may need to discuss, potentially in another issue, is whether instance variables order
should be specified or not (I don't think it should) and if not, we should probably
add a line in the documentation of `Object#instance_variables` to clearly state not to
rely on ordering.
----------------------------------------
Misc #19535: Instance variables order is unpredictable on objects with
`OBJ_TOO_COMPLEX_SHAPE_ID`
https://bugs.ruby-lang.org/issues/19535#change-102451
* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
### Context
I've been helping the Mastodon folks in investigating a weird Marshal deserialization
bug they randomly experience since they upgraded to Ruby 3.2:
https://github.com/mastodon/mastodon/issues/23644
Ultimately the bug comes from a circular dependency issues in the object graph that is
serialized when one call `Marshal.dump` on an `ActiveRecord::Base` object.
A simplified reproduction to better explain the problem is:
```ruby
class Status
def normal_order
@attributes = { id: 42 }
@relations = { self => 1 }
self
end
def inverse_order
@relations = nil
@attributes = { id: 42 }
@relations = { self => 1 }
self
end
def hash
@attributes.fetch(:id)
end
end
s = Marshal.load(Marshal.dump(Status.new.normal_order))
s = Marshal.load(Marshal.dump(Status.new.inverse_order))
```
In short, that `Status` object is both the top level object, and is referenced as a key in
a hash, in that same payload. It also defined a custom `#hash` method, that requires some
other attribute to be set.
It all "works" as long as `@attributes` is dumped before `@relations`.
### Problem
The above micro-reproduction uses two different shapes to demonstrate the ordering issues,
but in both case the ordering is predictable.
However if you generate too many shapes from a single class, it will be marked as
`TOO_COMPLEX` and future instance will have their instance variables backed by an
`id_table`, which is unordered, and will cause a similar issue.
I definitely consider this a bug on the Rails side, and I will do what I can so that Rails
doesn't depend on that implicit ordering.
However it's unlikely we'll be able to fix older version, and other users may run
into this issue when upgrading to Ruby 3.2, so I think it may be worth to try to preserve
some sort of predicable ordering, at least for a few more versions.
Additionally, debugging it was made particularly difficult, because it would work fine
initially, and then break after enough shapes had been generated. Generally speaking I
think such semi-predictable behavior is much worse than a fully random behavior (similar
to how Go randomize keys order in their maps).
### Historical behavior
On Ruby 3.1 and older, the instance variables ordering was defined by the order in which
each ivar appeared for the very first time:
```ruby
class Foo
def set
@a = 1
@b = 2
@c = 3
self
end
def inverse_order
@c = 3
@b = 2
@a = 1
self
end
end
p Foo.new.set.instance_variables # => [:@a, :@b, :@c]
p Foo.new.inverse_order.instance_variables # => [:@a, :@b, :@c]
```
This means that the order could be different from once execution of the program to
another, but would remain stable inside a single process.
On 3.2, it's now defined by the order in which each ivar appeared in that specific
object instance:
```ruby
[:@a, :@b, :@c]
[:@c, :@b, :@a]
```
Except, if the object is backed by an `id_table`, in which case it's fully
unpredictable.
### Possible changes
I discussed this with @tenderlovemaking, and he suggested we could change the `id_table`
for an `st_table` so that the ordering could be predictable again, and would behave like
objects with a non-complex shape.
Another possibility would be to preserve the observable behavior of 3.1 and older.
Or of course we could clearly specify that the ordering is random, but if so I think it
would be wise to make it always random so that this class of bugs has a much higher chance
to be caught early in testing rather than in production.
cc @Eregon as I presume this has implications on TruffleRuby as well.
--
https://bugs.ruby-lang.org/