Issue #18899 has been updated by jeremyevans0 (Jeremy Evans).
After more research, it appears the current behavior is expected. Parsing the single
string with an embedded colon is already handled correctly. However, if the external
encoding is binary (ASCII-8BIT), then the internal encoding is deliberately set to `nil`:
```c
// in rb_io_ext_int_to_encs
if (ext == rb_ascii8bit_encoding()) {
    /* If external is ASCII-8BIT, no transcoding */
    intern = NULL;
}
```
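A quick way to observe this (a minimal sketch, using `IO::NULL` rather than a real file): with the single-string form, the internal encoding comes back `nil` because of the special case above.

```ruby
# Single-string form: a binary external encoding suppresses the
# internal encoding, so no transcoding is set up.
File.open(IO::NULL) do |f|
  f.set_encoding('binary:utf-8')
  p f.external_encoding  # => #<Encoding:ASCII-8BIT>
  p f.internal_encoding  # => nil
end
```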
Basically, the `'binary:utf-8'` encoding doesn't make sense. Providing two
encodings is done to transcode from one encoding to the other. There is no transcoding if
the external encoding is binary. If you want the internal encoding to be UTF-8, then just
use `'utf-8'`.
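As a sketch of that suggestion (my example, not from the report): passing `'utf-8'` alone sets it as the external encoding, so reads come back tagged UTF-8 with no transcoding step involved.

```ruby
# A single non-binary encoding becomes the external encoding;
# the internal encoding stays nil, so reads are tagged UTF-8 directly.
File.open(IO::NULL) do |f|
  f.set_encoding('utf-8')
  p f.external_encoding  # => #<Encoding:UTF-8>
  p f.internal_encoding  # => nil
end
```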
That still leaves us with inconsistent behavior between `'binary:utf-8'` and
`'binary', 'utf-8'`, so I propose making the `'binary', 'utf-8'` behavior the
same as `'binary:utf-8'`. I updated my pull request to do that:
https://github.com/ruby/ruby/pull/6280
An alternative approach would be to remove the above code that treats the external
encoding specially.
----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100263
* Author: javanthropus (Jeremy Bopp)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
`IO#set_encoding` behaves differently when processing a single String argument than it
does when processing 2 arguments (whether Strings or Encodings) in the case where the
external encoding is being set to binary and the internal encoding is being set to any
other encoding.
This script demonstrates the resulting values of the external and internal encodings for
an IO instance given different ways to equivalently call `#set_encoding`:
```ruby
#!/usr/bin/env ruby

def show(io, args)
  printf(
    "args: %-50s external encoding: %-25s internal encoding: %-25s\n",
    args.inspect,
    io.external_encoding.inspect,
    io.internal_encoding.inspect
  )
end

File.open('/dev/null') do |f|
  args = ['binary:utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = ['binary', 'utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = [Encoding.find('binary'), Encoding.find('utf-8')]
  f.set_encoding(*args)
  show(f, args)
end
```
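On Ruby 3.1 (per the behavior described above; exact column widths may vary), the script prints output along these lines, showing the single-string form disagreeing with the two-argument forms:

```
args: ["binary:utf-8"]                             external encoding: #<Encoding:ASCII-8BIT>  internal encoding: nil
args: ["binary", "utf-8"]                          external encoding: #<Encoding:ASCII-8BIT>  internal encoding: #<Encoding:UTF-8>
args: [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>]  external encoding: #<Encoding:ASCII-8BIT>  internal encoding: #<Encoding:UTF-8>
```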
This behavior is the same from Ruby 2.7.0 to 3.1.2.
--
https://bugs.ruby-lang.org/