[ruby-core:112433] [Ruby master Bug#19439] Marshal.load doesn't load Regexp instance variables

Issue #19439 has been reported by andrykonchin (Andrew Konchin). ---------------------------------------- Bug #19439: Marshal.load doesn't load Regexp instance variables https://bugs.ruby-lang.org/issues/19439 * Author: andrykonchin (Andrew Konchin) * Status: Open * Priority: Normal * ruby -v: 3.1.3 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Hello, I've noticed this strange behaviour: ```ruby source_object = Regexp.new("a") source_object.instance_variable_set(:@foo, :bar) regexp = Marshal.load(Marshal.dump(source_object)) regexp.instance_variables # => [] ``` I would expect instance variables to be loaded -- https://bugs.ruby-lang.org/

Issue #19439 has been updated by Eregon (Benoit Daloze). As a note, if CRuby plans to make all Regexp immutable & no subclass instances (like TruffleRuby already does), then that's another way to fix this. ---------------------------------------- Bug #19439: Marshal.load doesn't load Regexp instance variables https://bugs.ruby-lang.org/issues/19439#change-101879 * Author: andrykonchin (Andrew Konchin) * Status: Open * Priority: Normal * ruby -v: 3.1.3 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Hello, I've noticed this strange behaviour: ```ruby source_object = Regexp.new("a") source_object.instance_variable_set(:@foo, :bar) regexp = Marshal.load(Marshal.dump(source_object)) regexp.instance_variables # => [] ``` I would expect instance variables to be loaded -- https://bugs.ruby-lang.org/

Issue #19439 has been updated by byroot (Jean Boussier). After a quick check, it's because the IVAR on Regexps is used to store the encoding: ```c case TYPE_REGEXP: { VALUE str = r_bytes(arg); int options = r_byte(arg); int has_encoding = FALSE; st_index_t idx = r_prepare(arg); if (ivp) { r_ivar(str, &has_encoding, arg); *ivp = FALSE; } ``` I'll dig a bit more, but it may be a fundamental limitation of the format. ---------------------------------------- Bug #19439: Marshal.load doesn't load Regexp instance variables https://bugs.ruby-lang.org/issues/19439#change-101890 * Author: andrykonchin (Andrew Konchin) * Status: Open * Priority: Normal * ruby -v: 3.1.3 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Hello, I've noticed this strange behaviour: ```ruby source_object = Regexp.new("a") source_object.instance_variable_set(:@foo, :bar) regexp = Marshal.load(Marshal.dump(source_object)) regexp.instance_variables # => [] ``` I would expect instance variables to be loaded -- https://bugs.ruby-lang.org/

Issue #19439 has been updated by byroot (Jean Boussier). Ah! the explanation is quite funny. So the ivars are loaded fine, however, since the goal is to store the encoding, and that the encoding isn't on the regexp itself, but on the `regexp#source` which is a string, the ivars are assigned on that internal, inacessible string. ```ruby require 'objspace' ObjectSpace.trace_object_allocations_start GC.start gen = GC.count r = Regexp.new("a".b) r.instance_variable_set(:@ivar, []) r2 = Marshal.load(Marshal.dump(r)) puts ObjectSpace.dump_all(output: :string, shapes: false, since: gen) puts '-' * 40 puts ObjectSpace.dump(r2) ``` Filtered output: ``` {"address":"0x10323fd58", "type":"REGEXP", "shape_id":0, "slot_size":40, "class":"0x102fd2a98", "references":["0x10323fdf8"], "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":500, "flags":{"wb_protected":true}} {"address":"0x10323fdf8", "type":"STRING", "shape_id":237, "slot_size":40, "class":"0x102fded98", "frozen":true, "embedded":true, "bytesize":1, "value":"a", "encoding":"US-ASCII", "coderange":"unknown", "references":["0x10323fd80"], "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":56, "flags":{"wb_protected":true}} {"address":"0x10323fd80", "type":"ARRAY", "shape_id":0, "slot_size":40, "class":"0x102fd33f8", "length":0, "embedded":true, "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":40, "flags":{"wb_protected":true}} ---------------------------------------- {"address":"0x10323fd58", "type":"REGEXP", "shape_id":0, "slot_size":40, "class":"0x102fd2a98", "references":["0x10323fdf8"], "file":"<internal:marshal>", "line":35, "method":"load", "generation":6, "memsize":500, "flags":{"wb_protected":true}} ``` If you follow the references you see `Regexp -> String -> Array(length: 1)`. I think this may be fixable, but the fix may be dirty. ---------------------------------------- Bug #19439: Marshal.load doesn't load Regexp instance variables https://bugs.ruby-lang.org/issues/19439#change-101891 * Author: andrykonchin (Andrew Konchin) * Status: Open * Priority: Normal * ruby -v: 3.1.3 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Hello, I've noticed this strange behaviour: ```ruby source_object = Regexp.new("a") source_object.instance_variable_set(:@foo, :bar) regexp = Marshal.load(Marshal.dump(source_object)) regexp.instance_variables # => [] ``` I would expect instance variables to be loaded -- https://bugs.ruby-lang.org/

Issue #19439 has been updated by byroot (Jean Boussier). Ok, I have a patch for it, but it's ugly: https://github.com/ruby/ruby/pull/7323 ---------------------------------------- Bug #19439: Marshal.load doesn't load Regexp instance variables https://bugs.ruby-lang.org/issues/19439#change-101892 * Author: andrykonchin (Andrew Konchin) * Status: Open * Priority: Normal * ruby -v: 3.1.3 * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Hello, I've noticed this strange behaviour: ```ruby source_object = Regexp.new("a") source_object.instance_variable_set(:@foo, :bar) regexp = Marshal.load(Marshal.dump(source_object)) regexp.instance_variables # => [] ``` I would expect instance variables to be loaded -- https://bugs.ruby-lang.org/

Issue #19439 has been updated by andrykonchin (Andrew Konchin). Thank you! ---------------------------------------- Bug #19439: Marshal.load doesn't load Regexp instance variables https://bugs.ruby-lang.org/issues/19439#change-101974 * Author: andrykonchin (Andrew Konchin) * Status: Closed * Priority: Normal * ruby -v: 3.1.3 * Backport: 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: REQUIRED ---------------------------------------- Hello, I've noticed this strange behaviour: ```ruby source_object = Regexp.new("a") source_object.instance_variable_set(:@foo, :bar) regexp = Marshal.load(Marshal.dump(source_object)) regexp.instance_variables # => [] ``` I would expect instance variables to be loaded -- https://bugs.ruby-lang.org/

Issue #19439 has been updated by naruse (Yui NARUSE). Backport changed from 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: REQUIRED to 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: DONE ruby_3_2 4e4a4e42b284d9309a7e51c97058093539e7a843 merged revision(s) d2520b7b76759118071a16e6bca22726a5de9fb4. ---------------------------------------- Bug #19439: Marshal.load doesn't load Regexp instance variables https://bugs.ruby-lang.org/issues/19439#change-102244 * Author: andrykonchin (Andrew Konchin) * Status: Closed * Priority: Normal * ruby -v: 3.1.3 * Backport: 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: DONE ---------------------------------------- Hello, I've noticed this strange behaviour: ```ruby source_object = Regexp.new("a") source_object.instance_variable_set(:@foo, :bar) regexp = Marshal.load(Marshal.dump(source_object)) regexp.instance_variables # => [] ``` I would expect instance variables to be loaded -- https://bugs.ruby-lang.org/

Issue #19439 has been updated by nagachika (Tomoyuki Chikanaga). When I backported d2520b7b76759118071a16e6bca22726a5de9fb4 into ruby_3_1 on my environment, it fails test with crash report. ``` [ 7489/21711] TestMarshal#test_regexp2<internal:marshal>:34: [BUG] unreachable ruby 3.1.4p222 (2023-03-26 revision 19af12ff19) [x86_64-darwin21] -- Crash Report log information -------------------------------------------- See Crash Report log file in one of the following locations: * ~/Library/Logs/DiagnosticReports * /Library/Logs/DiagnosticReports for more details. Don't forget to include the above Crash Report log file in bug reports. ``` C level backtrace ``` -- C level backtrace information ------------------------------------------- /Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_print_backtrace+0xf) [0x10948177b] ../ruby_3_1/vm_dump.c:759 /Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_vm_bugreport) ../ruby_3_1/vm_dump.c:1045 /Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_vm_bugreport) (null):0 /Users/nagachika/opt/ruby-3.1/src/build/ruby(bug_report_end+0x0) [0x10928a748] ../ruby_3_1/error.c:798 /Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_bug_without_die) ../ruby_3_1/error.c:798 /Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_bug+0x6f) [0x1094b6319] ../ruby_3_1/error.c:806 /Users/nagachika/opt/ruby-3.1/src/build/ruby(0x1094b87b2) [0x1094b87b2] /Users/nagachika/opt/ruby-3.1/src/build/ruby(ROBJECT_IV_INDEX_TBL_inline+0x0) [0x10944935f] ../ruby_3_1/variable.c:1648 /Users/nagachika/opt/ruby-3.1/src/build/ruby(obj_ivar_each) ../ruby_3_1/variable.c:1657 /Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_ivar_foreach) ../ruby_3_1/variable.c:1796 /Users/nagachika/opt/ruby-3.1/src/build/ruby(r_entry0+0x0) [0x1092fd845] ../ruby_3_1/marshal.c:1943 /Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object_for) ../ruby_3_1/marshal.c:1945 /Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object0+0x1f) [0x1092fc32b] ../ruby_3_1/marshal.c:1729 /Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object_for) ../ruby_3_1/marshal.c:1769 /Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object0+0x1c) [0x1092f8417] ../ruby_3_1/marshal.c:1729 /Users/nagachika/opt/ruby-3.1/src/build/ruby(r_object) ../ruby_3_1/marshal.c:2197 /Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_marshal_load_with_proc) ../ruby_3_1/marshal.c:2272 /Users/nagachika/opt/ruby-3.1/src/build/ruby(invoke_bf+0x18) [0x109458a6f] ../ruby_3_1/vm_insnhelper.c:5909 /Users/nagachika/opt/ruby-3.1/src/build/ruby(vm_invoke_builtin_delegate) ../ruby_3_1/vm_insnhelper.c:5936 /Users/nagachika/opt/ruby-3.1/src/build/ruby(vm_exec_core) ../ruby_3_1/insns.def:1487 /Users/nagachika/opt/ruby-3.1/src/build/ruby(rb_vm_exec+0xae5) [0x10946a8e5] (snip) ``` I believe there may be additional bugs related to instance variable management that need to be backport. However, this area of code has undergone significant changes with the introduction of Object Shapes in version 3.2/master. While I plan to continue investigating, I have decided to give up backporting this ticket for the time being. ---------------------------------------- Bug #19439: Marshal.load doesn't load Regexp instance variables https://bugs.ruby-lang.org/issues/19439#change-102552 * Author: andrykonchin (Andrew Konchin) * Status: Closed * Priority: Normal * ruby -v: 3.1.3 * Backport: 2.7: WONTFIX, 3.0: WONTFIX, 3.1: REQUIRED, 3.2: DONE ---------------------------------------- Hello, I've noticed this strange behaviour: ```ruby source_object = Regexp.new("a") source_object.instance_variable_set(:@foo, :bar) regexp = Marshal.load(Marshal.dump(source_object)) regexp.instance_variables # => [] ``` I would expect instance variables to be loaded -- https://bugs.ruby-lang.org/
participants (5)
-
andrykonchin (Andrew Konchin)
-
byroot (Jean Boussier)
-
Eregon (Benoit Daloze)
-
nagachika (Tomoyuki Chikanaga)
-
naruse (Yui NARUSE)