
Issue #20009 has been updated by Eregon (Benoit Daloze).


Mmh, but Marshal.dump+load of such non-7-bit modules/classes works on TruffleRuby, although it needs a tiny fix:
```patch
diff --git a/src/main/ruby/truffleruby/core/marshal.rb b/src/main/ruby/truffleruby/core/marshal.rb
index 102468e774..ea7469ea4a 100644
--- a/src/main/ruby/truffleruby/core/marshal.rb
+++ b/src/main/ruby/truffleruby/core/marshal.rb
@@ -786,19 +786,19 @@ module Marshal
     end
 
     def construct_class
-      obj = const_lookup(get_byte_sequence.to_sym, Class)
+      obj = const_lookup(get_byte_sequence.force_encoding(Encoding::UTF_8).to_sym, Class)
       store_unique_object obj
       obj
     end
 
     def construct_module
-      obj = const_lookup(get_byte_sequence.to_sym, Module)
+      obj = const_lookup(get_byte_sequence.force_encoding(Encoding::UTF_8).to_sym, Module)
       store_unique_object obj
       obj
     end
 
     def construct_old_module
-      obj = const_lookup(get_byte_sequence.to_sym)
+      obj = const_lookup(get_byte_sequence.force_encoding(Encoding::UTF_8).to_sym)
       store_unique_object obj
       obj
     end
```

```ruby
class MultibyteぁあぃいClass
end

source_object = MultibyteぁあぃいClass
p Marshal.dump(source_object)
p Marshal.load(Marshal.dump(source_object))
```

```
$ ruby -v marshal_class.rb
truffleruby 25.0.0-dev-a65bde3d, like ruby 3.3.7, Interpreted JVM [x86_64-linux]
"\x04\bc\x1FMultibyte\xE3\x81\x81\xE3\x81\x82\xE3\x81\x83\xE3\x81\x84Class"
MultibyteぁあぃいClass
```

I think at least if no encoding information is present we should assume UTF-8, because it's by far the most common source encoding.
I think there is no value to look up the name in BINARY encoding as currently, such a constant wouldn't even print well.

(FWIW TruffleRuby stores constant names as Java Strings, which means no encoding information. I'm not convinced it's a good idea to e.g. have two constants `É` in e.g. UTF-8 and ISO-8859-1 on the same module, it just seems needless confusion. Having non-7-bit BINARY-encoded constants seems no good either.)

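
To illustrate why the encoding of the looked-up name matters, here is a minimal sketch. It is not the actual Marshal code of either implementation: it only assumes the class name arrives from the stream as raw bytes, which is modelled with `String#b`. The variable names are hypothetical.

```ruby
# Hypothetical illustration, not TruffleRuby's or CRuby's Marshal implementation.
name = "Cクラス"

# Assumption: the name read from the Marshal stream is a BINARY (ASCII-8BIT) string.
raw = name.b

# Symbol created directly from the raw bytes keeps the BINARY encoding.
binary_sym = raw.to_sym

# Symbol created after tagging the same bytes as UTF-8, as in the patch above.
utf8_sym = raw.dup.force_encoding(Encoding::UTF_8).to_sym

p binary_sym.encoding
p utf8_sym.encoding
p binary_sym == name.to_sym  # compares against the UTF-8 symbol of the constant name
p utf8_sym == name.to_sym
```

On CRuby, the two symbols are distinct objects because a symbol's identity includes its encoding, so only the UTF-8 one can match a constant defined in a UTF-8 source file.
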
----------------------------------------
Bug #20009: Marshal.load raises exception when load dumped class include non-ASCII
https://bugs.ruby-lang.org/issues/20009#change-113294

* Author: ippachi (Kazuya Hatanaka)
* Status: Open
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
## Reproduction code

```ruby
class Cクラス; end
Marshal.load(Marshal.dump(Cクラス))
```

## Actual result

```
<internal:marshal>:34:in `load': undefined class/module C\xE3\x82\xAF\xE3\x83\xA9\xE3\x82\xB9 (ArgumentError)
	from marshal.rb:2:in `<main>'
```

## Expected result

Returns `Cクラス`

## Impacted area

An exception is raised in Rails under the following conditions

* minitest is used with default settings
* Parallel execution with parallelize
* test class names contain non-ASCII characters

The default parallelization uses DRb, and Marshal is used inside DRb.

## Other

After trying various things, I thought I could fix it by making `rb_path_to_class` support strings containing non-ASCII characters, but I couldn't find anything more than that.

-- 
https://bugs.ruby-lang.org/