[ruby-core:120043] [Ruby master Bug#20919] IO#seek does not clear the character buffer in some cases while transcoding

Issue #20919 has been reported by javanthropus (Jeremy Bopp).

----------------------------------------
Bug #20919: IO#seek does not clear the character buffer in some cases while transcoding
https://bugs.ruby-lang.org/issues/20919

* Author: javanthropus (Jeremy Bopp)
* Status: Open
* ruby -v: ruby 3.4.0dev (2024-11-28T12:38:16Z master 3af1a04741) +PRISM [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When transcoding characters, `IO#seek` only clears the internal character buffer if `IO#getc` is called first:

```ruby
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)

  f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  # Added a call to #getc here
  f.getc
  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL be cleared now
  f.seek(2, :SET)

  f.getc # => '2'.encode('utf-16le')
end
```

-- 
https://bugs.ruby-lang.org/

Issue #20919 has been updated by mjrzasa (Maciek Rząsa).

I've reproduced it without transcoding:

```ruby
require 'tempfile'

Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a')

  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)

  f.getc # => 'a'; should be '2'
end
# => 'a'
```

----------------------------------------
Bug #20919: IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
https://bugs.ruby-lang.org/issues/20919#change-111758

* Author: javanthropus (Jeremy Bopp)
* Status: Open
* ruby -v: ruby 3.4.0dev (2024-11-28T12:38:16Z master 3af1a04741) +PRISM [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When transcoding characters, `IO#seek` and `IO#pos=` only clear the internal character buffer if `IO#getc` is called first:

```ruby
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)

  f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL NOT be cleared
  f.pos = 2

  f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  # Added a call to #getc here
  f.getc
  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL be cleared now
  f.seek(2, :SET)
  # Same behavior for #pos=
  #f.pos = 2

  f.getc # => '2'.encode('utf-16le')
end
```

Issue #20919 has been updated by mjrzasa (Maciek Rząsa).

It works OK with StringIO (unsurprisingly):

```ruby
require 'stringio'

StringIO.open() do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a')

  # Character buffer WILL NOT be cleared
  f.seek(2)

  f.getc
end
# => "1"
```

----------------------------------------
Bug #20919: IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
https://bugs.ruby-lang.org/issues/20919#change-111760

Issue #20919 has been updated by mjrzasa (Maciek Rząsa).

I reran the tests on 3.5.0, and it is indeed related to transcoding:

```ruby
puts "Hello dev-ruby! #{RUBY_VERSION}"
require 'tempfile'

# 1. No encoding on .open, plain character pushed back
Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  f.seek(2, :SET)
  puts f.getc # prints "2"
end

# 2. Encoding on .open, UTF-16LE character pushed back
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc # prints "a"; should be '2'.encode('utf-16le')
end

# 3. No encoding on .open, UTF-16LE character pushed back
Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  f.seek(2, :SET)
  puts f.getc # prints "2"
end

# 4. Encoding on .open, plain character pushed back
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc # prints "a2"; should be '2'.encode('utf-16le')
end
```

```
Hello dev-ruby! 3.5.0
2
a
2
a2
```

So the issue happened when an encoding was set on `.open`. Also, when a non-encoded char was `ungetc`-ed, `getc` returned two characters.
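For anyone who wants to re-check the four combinations above on their own build, here is a minimal sketch. The helper name `getc_after_seek` and its keyword arguments are made up for illustration; they are not part of the thread or of Ruby's API. No expected output is listed, because the result depends on the Ruby version being tested.

```ruby
require 'tempfile'

# Hypothetical helper (an assumption for illustration, not from the thread):
# writes ten digits, pushes one character back with #ungetc, repositions to
# offset 2, and returns whatever #getc yields next.
def getc_after_seek(open_opts: {}, pushback: 'a', prime: false, use_pos: false)
  Tempfile.open('seek-probe', **open_opts) do |f|
    f.write('0123456789')
    f.rewind
    f.getc if prime               # the report says a prior #getc changes the outcome
    f.ungetc(pushback)
    if use_pos
      f.pos = 2                   # the report says #pos= behaves like #seek here
    else
      f.seek(2, :SET)
    end
    f.getc
  end
end

TRANSCODE = { encoding: 'utf-8:utf-16le' }.freeze

CASES = {
  'no encoding, plain char'    => { open_opts: {},        pushback: 'a' },
  'transcoding, UTF-16LE char' => { open_opts: TRANSCODE, pushback: 'a'.encode('utf-16le') },
  'no encoding, UTF-16LE char' => { open_opts: {},        pushback: 'a'.encode('utf-16le') },
  'transcoding, plain char'    => { open_opts: TRANSCODE, pushback: 'a' }
}.freeze

results = CASES.transform_values { |opts| getc_after_seek(**opts) }

puts "ruby #{RUBY_VERSION}"
results.each do |label, ch|
  puts "#{label}: #{ch.inspect} (#{ch.encoding})"
end
```

Passing `prime: true` or `use_pos: true` to the helper covers the other two variations from the original report.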
----------------------------------------
Bug #20919: IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
https://bugs.ruby-lang.org/issues/20919#change-111761

Issue #20919 has been updated by mjrzasa (Maciek Rząsa).

I have a draft of a fix for this one: https://github.com/ruby/ruby/pull/12714

----------------------------------------
Bug #20919: IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
https://bugs.ruby-lang.org/issues/20919#change-111793

Issue #20919 has been updated by mjrzasa (Maciek Rząsa).

I believe the fix is ready for review: https://github.com/ruby/ruby/pull/12714

Some CI jobs were failing (WebAssembly/Cygwin), but the failures do not seem to be related to my changes and they are inconsistent (after rebasing, Cygwin passed and WebAssembly failed).

----------------------------------------
Bug #20919: IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
https://bugs.ruby-lang.org/issues/20919#change-111825

Issue #20919 has been updated by mjrzasa (Maciek Rząsa).

Folks, could I ask for a review (and potential merge) of the fix for this issue? https://github.com/ruby/ruby/pull/12714

----------------------------------------
Bug #20919: IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
https://bugs.ruby-lang.org/issues/20919#change-112713