[ruby-core:118182] [Ruby master Bug#20526] File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows

Issue #20526 has been reported by kou (Kouhei Sutou).

----------------------------------------
Bug #20526: File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
https://bugs.ruby-lang.org/issues/20526

* Author: kou (Kouhei Sutou)
* Status: Open
* Target version: 3.2
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x64-mingw-ucrt]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
I'm not sure whether this is intentional, but it seems that `encoding: "utf-8"` doesn't change newline conversion while `encoding: "bom|utf-8"` does:

```ruby
File.write("a.txt", "a\r\n")
File.read("a.txt").bytes # => [97, 13, 10]
File.open("a.txt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10]
File.open("a.txt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] XXX: \r\n -> \n
File.open("a.txt", encoding: "bom|utf-8", universal_newline: false) {|f| f.read.bytes} # => [97, 13, 10]
```

Note the `XXX: ` line in the code above. Is this intentional behavior?

--
https://bugs.ruby-lang.org/
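The report shows that `universal_newline: false` restores the original bytes. Another way to keep newline conversion out of the picture might be to open the file in binary mode while still asking for BOM detection. The following is only a sketch under that assumption: the helper name `read_utf8_skipping_bom` is hypothetical, and it has not been checked against the affected Windows builds.

```ruby
require "tmpdir"

# Hypothetical helper: read a UTF-8 file in binary mode, letting Ruby
# strip a leading BOM if present. Binary mode ("b") is assumed to bypass
# any text-mode newline conversion.
def read_utf8_skipping_bom(path)
  File.open(path, "rb:BOM|UTF-8") { |f| f.read }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "bom.txt")
  # Write the exact bytes: UTF-8 BOM, then "a\r\n".
  File.binwrite(path, "\xEF\xBB\xBF" + "a\r\n")
  p read_utf8_skipping_bom(path).bytes
end
```

Because both the write and the read are binary here, the result should not depend on the platform's default text mode.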

Issue #20526 has been updated by nobu (Nobuyoshi Nakada).

This is probably a bug in the push back after the BOM look-ahead.

BTW, on Windows, `File.write` and `File.read` are in text mode by default. That file would be 4 bytes, "a\r\r\n", in binary.

----------------------------------------
Bug #20526: File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
https://bugs.ruby-lang.org/issues/20526#change-108634
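To take text-mode conversion out of the preparation step, the test file can be written and inspected with `File.binwrite` and `File.binread`, which do not translate newlines. A minimal sketch (the helper name `on_disk_bytes` is made up for illustration):

```ruby
require "tmpdir"

# Hypothetical helper: write exactly the given bytes to a temporary file
# and return what is actually stored on disk.
def on_disk_bytes(content)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "a.txt")
    File.binwrite(path, content)   # binary write: no "\n" -> "\r\n" translation
    File.binread(path).bytes       # binary read: no newline conversion
  end
end

p on_disk_bytes("a\r\n")
```

With this preparation the file holds the bytes that were passed in, so any conversion seen later can be attributed to the read side.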

Issue #20526 has been updated by YO4 (Yoshinao Muramatsu).

There is similar strangeness around encoding specifiers.

Preparation:

```ruby
RUBY_VERSION # => "3.3.5"
File.write("a.txt", "a\r\n")
File.binread("a.txt").bytes # => [97, 13, 13, 10]
```

Experiments:

```ruby
File.open("a.txt") {|f| f.read.bytes} # => [97, 13, 10] # expected(msvcrt[_*] newline)
File.open("a.txt", "r:utf-8") {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", "r", encoding: "utf-8") {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10] # XXX: universal newline enabled?
```

Omitting the mode parameter seems to enable universal newline.

```ruby
File.open("a.txt", "rt:utf-8") {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt:bom|utf-8") {|f| f.read.bytes} # => [97, 10] # XXX
File.open("a.txt", "rt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] # XXX
```

XXX: This is odd because universal newline and msvcrt newline appear to be cooperating.

----------------------------------------
Bug #20526: File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
https://bugs.ruby-lang.org/issues/20526#change-111198

* Author: kou (Kouhei Sutou)
* Status: Open
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x64-mingw-ucrt]
* Backport: 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED
----------------------------------------
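The experiments above could be collected into one script so that the same combinations can be compared across Ruby versions and platforms. This is only a sketch: `OPEN_VARIANTS` and `read_bytes_by_variant` are hypothetical names, the file is prepared with `File.binwrite` so that it holds the 4 bytes "a\r\r\n" regardless of platform, and no expected results are listed because they depend on the platform and Ruby build.

```ruby
require "tmpdir"

# Each entry is [positional args, keyword args] passed to File.open after
# the path. These mirror the combinations tried in this thread.
OPEN_VARIANTS = [
  [[], {}],
  [["r:utf-8"], {}],
  [["r"], { encoding: "utf-8" }],
  [[], { encoding: "utf-8" }],
  [["rt:utf-8"], {}],
  [["rt:bom|utf-8"], {}],
  [["rt"], { encoding: "utf-8" }],
  [["rt"], { encoding: "bom|utf-8" }],
].freeze

# Hypothetical helper: read the file once per variant and return
# pairs of [variant, bytes read].
def read_bytes_by_variant(path)
  OPEN_VARIANTS.map do |args, kwargs|
    bytes = File.open(path, *args, **kwargs) { |f| f.read.bytes }
    [[args, kwargs], bytes]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "a.txt")
  # Same on-disk content as in the preparation step above.
  File.binwrite(path, "a\r\r\n")
  read_bytes_by_variant(path).each do |(args, kwargs), bytes|
    puts "#{args.inspect} #{kwargs.inspect} => #{bytes.inspect}"
  end
end
```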
participants (3)
- kou (Kouhei Sutou)
- nobu (Nobuyoshi Nakada)
- YO4 (Yoshinao Muramatsu)