
Issue #20189 has been updated by Eregon (Benoit Daloze).

byroot (Jean Boussier) wrote in #note-5:
> I must admit I'm not very familiar with wide char encodings, but this surprises me a bit. Ruby strings should always have their terminator, so I don't see how expanding a string would change their interpretation.

It's because in UTF-16 if the number of bytes is not a multiple of 2 then it's CR_BROKEN.
Same for UTF-32 if not a multiple of 4.
And since `rb_str_resize()` changes the String#bytesize then that condition can change:

```ruby
irb(main):002:0> "a".force_encoding(Encoding::UTF_16LE).valid_encoding?
=> false
irb(main):003:0> "a\x00".force_encoding(Encoding::UTF_16LE).valid_encoding?
=> true
```

----------------------------------------
Bug #20189: `rb_str_resize` does not clear coderange when expanding
https://bugs.ruby-lang.org/issues/20189#change-106261

* Author: tompng (tomoya ishida)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.4.0dev (2024-01-09T07:07:19Z master db476cc71c) [x86_64-linux]
* Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: REQUIRED
----------------------------------------
Expanding a string in some encodings (UTF-16, UTF-32) can change its coderange to either valid or broken, but `rb_str_resize` does not clear the coderange.
This will cause a bug in C extension libraries that use `rb_str_resize`.

~~~ruby
# Example for stringio
require 'stringio'

s = StringIO.new("\0".encode('UTF-16LE'))
# Resizing without checking validity in between gives the right answer
s.truncate(1); s.truncate(2); s.string.valid_encoding? #=> true
# Checking validity at 1 byte caches the broken coderange, which survives the expansion
s.truncate(1); s.string.valid_encoding?; s.truncate(2); s.string.valid_encoding? #=> false (expect to be true)
~~~

-- 
https://bugs.ruby-lang.org/
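
For reference, the byte-size rule described in the reply can be checked from plain Ruby. This is only a minimal sketch: `valid_as?` is a hypothetical helper introduced for illustration, not part of Ruby or of the proposed fix, and it reinterprets a copy of the bytes so that no cached coderange is involved.

```ruby
# Hypothetical helper for illustration: reinterpret a copy of the raw bytes
# in the given encoding and ask Ruby whether the result is valid.
def valid_as?(bytes, encoding)
  # String#b returns a fresh binary copy, so the coderange is computed anew.
  bytes.b.force_encoding(encoding).valid_encoding?
end

results = {
  utf16_1_byte:  valid_as?("a",             Encoding::UTF_16LE), # not a multiple of 2
  utf16_2_bytes: valid_as?("a\x00",         Encoding::UTF_16LE), # one complete code unit
  utf32_3_bytes: valid_as?("a\x00\x00",     Encoding::UTF_32LE), # not a multiple of 4
  utf32_4_bytes: valid_as?("a\x00\x00\x00", Encoding::UTF_32LE), # one complete code unit
}

results.each { |name, valid| puts "#{name}: #{valid}" }
```

Because validity depends on the byte length, any operation that changes the length has to drop the cached coderange for the next `valid_encoding?` call to be trustworthy.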