[ruby-core:111815] [Ruby master Bug#19342] String#encode does not always throw exceptions for invalid source encodings

Issue #19342 has been reported by mathieu451 (Math Ieu).

----------------------------------------
Bug #19342: String#encode does not always throw exceptions for invalid source encodings
https://bugs.ruby-lang.org/issues/19342

* Author: mathieu451 (Math Ieu)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.0.5p211 (2022-11-24 revision ba5cf0f7c5) [amd64-freebsd13]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
Documentation says that String#encode throws Encoding::InvalidByteSequenceError when the string isn't valid in the source encoding, but it does not always do so:

```
"\x99".encode('UTF-8', 'UTF-8')
"\x99".force_encoding('UTF-8').encode('UTF-8')
```

In both cases, it returns a string with invalid encoding. But these do throw an exception:

```
"\x99".encode('ISO8859-1', 'UTF-8')
"\x99".force_encoding('UTF-8').encode('ISO8859-1')
```

I suppose it's debatable whether this should be considered a bug. It's a weird case to ask to convert to/from the same encoding, but it happened to me with a loop that tried to interpret a binary string with multiple encodings:

```
input_string = "\x99".force_encoding('US-ASCII')
want_encoding = 'UTF-8'
%w{ISO8859-1 UTF-8}.each do |try_encoding|
  s = begin
    input_string.encode(want_encoding, try_encoding)
  rescue EncodingError
    next
  end
  process_string s
end
```

I expected to get an Encoding::InvalidByteSequenceError exception during the conversion, but instead I got exceptions later on while trying to work on an invalid string that #encode returned.

-- 
https://bugs.ruby-lang.org/
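One way to get the behaviour the report expected is to validate the bytes against the candidate encoding before calling #encode, so that the same-encoding case fails just like the others. The following is only a sketch of that idea, not anything taken from Ruby itself: `strict_transcode` is a hypothetical helper name, and raising a plain EncodingError for invalid input is an assumption chosen so that the `rescue EncodingError` in the loop above would still catch it.

```ruby
# Hypothetical helper (not part of Ruby): transcode `bytes` from `from` to `to`,
# raising EncodingError when the bytes are not valid in `from`, even when
# `from` and `to` name the same encoding.
def strict_transcode(bytes, to, from)
  # Work on a copy so the caller's string keeps its original encoding tag.
  candidate = bytes.dup.force_encoding(from)

  # String#valid_encoding? checks the bytes against the tagged encoding.
  unless candidate.valid_encoding?
    raise EncodingError, "input is not valid #{from}"
  end

  # May still raise Encoding::UndefinedConversionError (an EncodingError)
  # when a character has no mapping in the destination encoding.
  candidate.encode(to)
end

# Usage mirroring the loop from the report: collect every candidate
# interpretation that is actually valid.
input_string = "\x99".b
results = %w[ISO-8859-1 UTF-8].filter_map do |try_encoding|
  begin
    strict_transcode(input_string, 'UTF-8', try_encoding)
  rescue EncodingError
    nil
  end
end
```

With this input, only the ISO-8859-1 interpretation should survive, because the single byte 0x99 is not a valid UTF-8 sequence.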

Issue #19342 has been updated by duerst (Martin Dürst).

This was discussed in issue 6190. As you already say, it's somehow a weird case. The decision was to make transcoding from an encoding to the same encoding a no-op for performance. There was also some documentation (now in git commit 463633e4a934a00f869086a6ffbf84c6cb8ad630), but it seems to have been lost. That definitely should be fixed. The documentation is now in doc/transcode.rdoc.

----------------------------------------
Bug #19342: String#encode does not always throw exceptions for invalid source encodings
https://bugs.ruby-lang.org/issues/19342#change-101228
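To illustrate the no-op described in this reply, here is a small sketch. It shows that a same-encoding #encode call hands back the invalid bytes without raising, and that String#scrub can be used when repairing the string is acceptable rather than rejecting it. The variable names are illustrative only, and the choice of '?' as the replacement is an assumption.

```ruby
# Bytes that are not valid UTF-8, tagged as UTF-8.
broken = "\x99".dup.force_encoding('UTF-8')

# Source and destination encodings are the same, so no exception is raised
# and the result still carries the invalid byte.
same = broken.encode('UTF-8', 'UTF-8')
same_valid = same.valid_encoding?

# String#scrub replaces invalid byte sequences with the given replacement.
repaired = broken.scrub('?')
repaired_valid = repaired.valid_encoding?
```

Checking `valid_encoding?` after a same-encoding call is therefore the caller's responsibility if invalid input must be detected.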

Issue #19342 has been updated by duerst (Martin Dürst).

Status changed from Open to Closed

I fixed the documentation (which was moved to doc/string/encode.rdoc by @nobu in commit 468ce1488d) in commit 11f28f3268. I think that this issue can therefore be closed.

----------------------------------------
Bug #19342: String#encode does not always throw exceptions for invalid source encodings
https://bugs.ruby-lang.org/issues/19342#change-101234
participants (2)
- duerst
- mathieu451 (Math Ieu)