
Issue #21102 has been reported by toy (Ivan Kuchin).

----------------------------------------
Bug #21102: Unexpected encoding when concatenating ASCII string with ASCII compatible string with non ASCII encoding
https://bugs.ruby-lang.org/issues/21102

* Author: toy (Ivan Kuchin)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
The problem was noticed in code that boils down to:

```ruby
# encoding: UTF-8
str = "something"
p str.encoding # => #<Encoding:UTF-8>
p [nil, str].join.encoding # => #<Encoding:US-ASCII>
```

As `nil.to_s` is an empty string with encoding `US-ASCII` and `"something"` contains only ASCII characters, the result is a string with `US-ASCII` encoding. An even simpler example is `p (nil.to_s + "something").encoding`.

What is confusing is that the resulting encoding depends on the order of the operands and on whether the strings contain only ASCII characters:

```ruby
# encoding: UTF-8
str1 = "something" # ASCII only
str2 = "söméthíng" # contains non-ASCII characters
p (nil.to_s + str1).encoding # => #<Encoding:US-ASCII>
p (nil.to_s + str2).encoding # => #<Encoding:UTF-8>
p (str1 + nil.to_s).encoding # => #<Encoding:UTF-8>
p (str2 + nil.to_s).encoding # => #<Encoding:UTF-8>
```

I would expect it to behave akin to summing integers and floats or rationals, where the result type does not depend on operand order:

```ruby
p 1 + 1.0 # => 2.0
p 1.0 + 1 # => 2.0
p 1 + 1r # => (2/1)
p 1r + 1 # => (2/1)
```

So it is at least surprising to me.

#18579 is probably the most closely related issue, but #14975 and #20594 are also relevant.

-- 
https://bugs.ruby-lang.org/
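
Until the behavior is settled, one possible workaround is to normalize the encoding explicitly after joining. The following is only a minimal sketch: `join_in_encoding` is a hypothetical helper name, not part of Ruby or of this report, and it assumes the joined content can be transcoded to the target encoding.

```ruby
# encoding: UTF-8

# Hypothetical helper (an assumption for illustration, not a Ruby API):
# joins the parts and returns the result in the requested encoding,
# independent of the order of the operands.
def join_in_encoding(parts, encoding = Encoding::UTF_8)
  # Array#join converts nil to an empty string; String#encode then
  # transcodes the joined result to the target encoding if needed.
  parts.join.encode(encoding)
end

str = "something"

p join_in_encoding([nil, str]).encoding         # => #<Encoding:UTF-8>
p join_in_encoding([str, nil]).encoding         # => #<Encoding:UTF-8>
p join_in_encoding([nil, "söméthíng"]).encoding # => #<Encoding:UTF-8>
```

`String#encode` is used here rather than `String#force_encoding` because it transcodes the bytes instead of only relabeling them, so the sketch stays valid for target encodings that are not ASCII compatible.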