[ruby-core:111247] [Ruby master Feature#19191] Implicit console input transcoding is more desirable

Issue #19191 has been reported by YO4 (Yoshinao Muramatsu).

----------------------------------------
Feature #19191: Implicit console input transcoding is more desirable
https://bugs.ruby-lang.org/issues/19191

* Author: YO4 (Yoshinao Muramatsu)
* Status: Open
* Priority: Normal
----------------------------------------
In response to Bug #18353, `STDIN.internal_encoding` is set and input is transcoded explicitly on the Windows platform. For example, when STDIN is a console, `[STDIN.external_encoding, STDIN.internal_encoding] # => [#<Encoding:Windows-31J>, #<Encoding:UTF-8>]`.

I feel that `internal_encoding` should be reserved for specific applications, and I don't think setting `internal_encoding` on STDIN was anticipated. Today I found that irb breaks the STDIN encoding:

```
ruby -rirb -e "p [$stdin.external_encoding, $stdin.internal_encoding]; IRB.setup(''); IRB::Irb.new(); p [$stdin.external_encoding, $stdin.internal_encoding]"
[#<Encoding:Windows-31J>, #<Encoding:UTF-8>]
[#<Encoding:UTF-8>, nil]
```
We know that console input is always in the console code page encoding, so we can always convert from the console code page to `io_input_encoding()`.

### Proposal

* When reading from the console on Windows, the input encoding is forced to the console code page and encoding conversion is applied implicitly.
* `set_encoding("UTF-8")` implicitly converts from the console code page to UTF-8.
* `set_encoding("CP437", "UTF-8")` also implicitly converts from the console code page to UTF-8; the external encoding (CP437) is ignored.
* `binmode` and binary input methods are not affected by these rules.
* `set_encoding` and related methods continue to work as before; this specification affects only encoding conversion on read (`NEED_READCONV()` and `make_readconv()`).
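A minimal Ruby model of the proposed read rule, to make the cases concrete. Everything here is an assumption for illustration: `proposed_read_conversion` and `read_console` are hypothetical names, the console code page is passed in as an encoding name instead of being queried from Windows, and transcoding is done with `String#encode` rather than the `make_readconv()` machinery in io.c.

```ruby
# Hypothetical model of the proposed console read rule; not io.c itself.
# console_cp: the console code page as an encoding name (assumed input;
#             real code would query Windows for it).
# ext, int:   the arguments a program passed to IO#set_encoding.
# Returns [from, to] for read-time transcoding, or nil for no conversion.
def proposed_read_conversion(console_cp, ext, int = nil, binmode: false)
  return nil if binmode                 # binmode / binary reads are unaffected
  from = Encoding.find(console_cp)      # console bytes are always in the code page
  to   = Encoding.find(int || ext)      # internal wins; ext is ignored if int is set
  from == to ? nil : [from, to]
end

# Decodes raw console bytes according to the modelled rule.
def read_console(bytes, console_cp, ext, int = nil, binmode: false)
  return bytes.b if binmode             # raw bytes, tagged as BINARY
  conv = proposed_read_conversion(console_cp, ext, int)
  return bytes.dup.force_encoding(console_cp) unless conv
  bytes.dup.force_encoding(conv[0]).encode(conv[1])
end

p proposed_read_conversion("Windows-31J", "UTF-8")
# => [#<Encoding:Windows-31J>, #<Encoding:UTF-8>]
p proposed_read_conversion("Windows-31J", "CP437", "UTF-8")
# => [#<Encoding:Windows-31J>, #<Encoding:UTF-8>]  (CP437 is ignored)
```

The key property of the model is that the source encoding never comes from `set_encoding`: it is always the console code page, and only the destination varies. With `binmode: true` the bytes come back untouched.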
--
https://bugs.ruby-lang.org/

Issue #19191 has been updated by Eregon (Benoit Daloze).

YO4 (Yoshinao Muramatsu) wrote:
> `set_encoding("UTF-8")` implicitly converts from the console code page to UTF-8.
I'm against more inconsistent corner cases like this for `set_encoding`. Probably IRB should be fixed here to inherit the original `$stdin` external and internal encodings?

https://bugs.ruby-lang.org/issues/19191#change-100543
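Eregon's suggestion, that IRB inherit the original `$stdin` encodings, could be approximated by snapshotting the pair before the setup step that changes it and reapplying it afterwards, so the session runs with the inherited encodings. This is a sketch, not IRB code: `with_preserved_encodings` is a hypothetical helper, and an `IO.pipe` reader stands in for a console `$stdin` so it runs on any platform. It uses only the existing `IO#external_encoding`, `IO#internal_encoding`, and `IO#set_encoding`.

```ruby
# Hypothetical helper (not an IRB API): run a block that may call
# io.set_encoding, then restore the encodings the IO had before.
def with_preserved_encodings(io)
  ext = io.external_encoding
  int = io.internal_encoding
  yield io
ensure
  # When int is nil, Ruby falls back to Encoding.default_internal (normally nil).
  io.set_encoding(ext, int)
end

reader, writer = IO.pipe                     # stands in for a console $stdin
reader.set_encoding("Windows-31J", "UTF-8")  # the pair ruby.exe sets up

with_preserved_encodings(reader) do |io|
  io.set_encoding("UTF-8")                   # roughly what irb's setup does
  p [io.external_encoding, io.internal_encoding]
  # => [#<Encoding:UTF-8>, nil]  (assuming Encoding.default_internal is nil)
end
p [reader.external_encoding, reader.internal_encoding]
# => [#<Encoding:Windows-31J>, #<Encoding:UTF-8>]

writer.close
reader.close
```

Whether IRB itself tolerates running with the restored pair is not addressed by this sketch.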

Issue #19191 has been updated by YO4 (Yoshinao Muramatsu).

I agree that the IRB issue should be fixed on the IRB side. My point was that for certain devices, the external encoding on read can be fixed by the device's specification. In that case, `external_encoding` is not used when `internal_encoding` is specified, and if only `external_encoding` is specified, it is treated as a conversion from the device encoding to `external_encoding`. Input from the console would be treated as being in the locale encoding.

https://bugs.ruby-lang.org/issues/19191#change-100723

Issue #19191 has been updated by YO4 (Yoshinao Muramatsu).

I'm not sure whether this is appropriate for this topic, but consider the case where reading UTF-16 from the console is supported in the future.

With explicit transcoding:

```
p [STDIN.external_encoding, STDIN.internal_encoding] # => [#<Encoding:UTF-16LE>, #<Encoding:UTF-8>]
```

With implicit transcoding:

```
p [STDIN.external_encoding, STDIN.internal_encoding] # => [#<Encoding:UTF-8>, nil]
```

And I think console output implicitly uses UTF-16LE as its device encoding.

https://bugs.ruby-lang.org/issues/19191#change-100724
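The explicit configuration in the first snippet can already be exercised today on an ordinary IO. The sketch below assumes a POSIX system and uses an `IO.pipe` as a stand-in for a UTF-16 console, which Ruby does not currently read from; it relies only on the existing `IO#set_encoding`. It also shows that a length-limited `read(n)` returns raw bytes without transcoding, one example of a binary input method that the proposal would leave unaffected.

```ruby
# Current Ruby behaviour on an ordinary pipe (assumed stand-in for a console):
# explicit UTF-16LE -> UTF-8 transcoding on read via IO#set_encoding.
text  = "Ruby \u3042\n"
utf16 = text.encode(Encoding::UTF_16LE)

reader, writer = IO.pipe
writer.binmode
writer.write(utf16)        # raw UTF-16LE bytes, as a UTF-16 console might supply
writer.close

reader.set_encoding("UTF-16LE", "UTF-8")   # the "explicit" configuration
p [reader.external_encoding, reader.internal_encoding]
# => [#<Encoding:UTF-16LE>, #<Encoding:UTF-8>]
decoded = reader.read      # read with no length: transcoded to UTF-8
reader.close

# A length-limited read is a binary read and skips transcoding entirely.
r2, w2 = IO.pipe
w2.binmode
w2.write(utf16)
w2.close
r2.set_encoding("UTF-16LE", "UTF-8")
raw = r2.read(4)           # first 4 bytes, untouched ("R\0u\0")
r2.close

p decoded == text                   # => true
p raw.encoding == Encoding::BINARY  # => true
```

Under the implicit scheme, the same reader would instead report `[#<Encoding:UTF-8>, nil]` while still transcoding from the device encoding; that behaviour does not exist yet, so it cannot be demonstrated here.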

Issue #19191 has been updated by YO4 (Yoshinao Muramatsu).

irb changes `$stdin.{external,internal}_encoding`. As a result, `gets` no longer returns the correct content in irb.

```
C:\>chcp
Active code page: 932

C:\>ruby -e "p [STDIN.external_encoding, STDIN.internal_encoding]"
[#<Encoding:Windows-31J>, #<Encoding:UTF-8>]

C:\>ruby -e "gets.then { p [_1, _1.encoding] }"
あ
["あ\n", #<Encoding:UTF-8>]

C:\>irb
irb(main):001> p [STDIN.external_encoding, STDIN.internal_encoding];
[#<Encoding:UTF-8>, nil]
irb(main):002> gets.then { p [_1, _1.encoding] };
あ
["\x82\xA0\n", #<Encoding:UTF-8>]
irb(main):003>
```

It seems that changing this on the irb side would have a negative impact on tests and elsewhere. I think it is more reliable to handle this on the ruby.exe side.

https://bugs.ruby-lang.org/issues/19191#change-110735
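The irb transcript above can be reproduced without a console. In code page 932, "あ" is the two bytes 0x82 0xA0; once `$stdin` is switched to `[UTF-8, nil]`, those bytes are returned unconverted and merely labelled UTF-8, which is why the string is invalid. The sketch below shows the mismatch and a manual repair with `String#force_encoding` and `String#encode`; it is an illustration of the failure, not a proposed fix for irb.

```ruby
# Bytes the console delivers for "あ\n" under code page 932 (Windows-31J).
raw = "\x82\xA0\n".b

# What irb's gets effectively returns after set_encoding("UTF-8"):
# the same bytes, only relabelled as UTF-8 (no transcoding happened).
mislabelled = raw.dup.force_encoding(Encoding::UTF_8)
p mislabelled                  # => "\x82\xA0\n"
p mislabelled.valid_encoding?  # => false

# What ruby.exe's default [Windows-31J, UTF-8] setup does on read:
# interpret the bytes as Windows-31J, then transcode them to UTF-8.
fixed = raw.dup.force_encoding(Encoding::Windows_31J).encode(Encoding::UTF_8)
p fixed.encoding               # => #<Encoding:UTF-8>
p fixed == "\u3042\n"          # => true
```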

Issue #19191 has been updated by YO4 (Yoshinao Muramatsu).

POC code is here: https://github.com/ruby/ruby/pull/12055

However, for an actual implementation of Unicode input, I recommend the approach larskanis takes in https://github.com/ruby/ruby/pull/11799

https://bugs.ruby-lang.org/issues/19191#change-110736
participants (2)
- Eregon (Benoit Daloze)
- YO4 (Yoshinao Muramatsu)