[ruby-core:123453] [Ruby Bug#21634] Combining read(1) with eof? causes dropout of results unexpectedly on Windows.
Issue #21634 has been reported by YO4 (Yoshinao Muramatsu).

----------------------------------------
Bug #21634: Combining read(1) with eof? causes dropout of results unexpectedly on Windows.
https://bugs.ruby-lang.org/issues/21634

* Author: YO4 (Yoshinao Muramatsu)
* Status: Open
* ruby -v: ruby 3.5.0dev (2025-10-03T08:59:54Z master 5b2ec0eb1b) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
On Windows, when reading a file containing the EOF character (\x1A), using read(1) together with IO#eof? causes results to be dropped unexpectedly.

```ruby
irb(main):001> IO.binwrite("txt", "abcd\x1A")
=> 5
irb(main):002> open("txt", "r") { p _1.read(1) until _1.eof? }; # works fine
"a"
"b"
"c"
"d"
"\x1A"
irb(main):003> open("txt", "rt") { p _1.read(1) until _1.eof? }; # has failure
"b"
"d"
irb(main):004>
```

The problem disappeared when I commented out any one of the following lines (though this will break other things).

* previous_mode = set_binary_mode_with_seek_cur(fptr); in io_read()
* flush_before_seek(fptr, false); in set_binary_mode_with_seek_cur()
* io_unread(fptr, discard_rbuf); in flush_before_seek()

Within io_unread(), rbuf.len should have gone 5, 4, 3, ... but instead went 4, 2, (end).
Since the inconsistency already exists at this point, the problem appears to originate elsewhere.

I found this in ruby master, but the same issue is present at least as far back as ruby-1.9.3-p551.

--
https://bugs.ruby-lang.org/
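For anyone who wants to run the comparison outside irb, the transcript above can be wrapped in a small script. This is only a sketch: `read_until_eof` and `compare_modes` are hypothetical helper names, and a temporary directory stands in for the "txt" file in the report. On an affected Windows build the "rt" entry would presumably come back short, as in the transcript.

```ruby
require "tmpdir"

# Hypothetical helper: collect what read(1) returns for as long as eof? is
# false, mirroring `p _1.read(1) until _1.eof?` from the report.
def read_until_eof(path, mode)
  File.open(path, mode) do |f|
    chunks = []
    chunks << f.read(1) until f.eof?
    chunks
  end
end

# Hypothetical helper: write the five-byte test file from the report and
# read it back once per mode.
def compare_modes
  Dir.mktmpdir do |dir|
    path = File.join(dir, "txt")
    IO.binwrite(path, "abcd\x1A")
    { "r" => read_until_eof(path, "r"), "rt" => read_until_eof(path, "rt") }
  end
end

compare_modes.each { |mode, chunks| puts "#{mode}: #{chunks.inspect}" }
```

Returning the chunks instead of printing them inside the loop makes it easy to diff the two modes or to turn the check into a test case.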
Issue #21634 has been updated by YO4 (Yoshinao Muramatsu).

An IO with mode_enc "rt" reads with O_BINARY, but the file is opened with O_TEXT. As a result, fill_cbuf unexpectedly runs with O_TEXT in rb_io_eof.

I made [PR #14810](https://github.com/ruby/ruby/pull/14810).

https://bugs.ruby-lang.org/issues/21634#change-114830
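Since the note above places the trouble in rb_io_eof, one way to probe it from Ruby is a loop that never calls `IO#eof?` and instead stops when `read(1)` returns nil. A minimal sketch, with `read_without_eof` as a hypothetical helper name; whether it returns all five characters for an "rt" stream on an affected Windows build is not confirmed anywhere in this thread.

```ruby
require "tempfile"

# Hypothetical helper: read one byte at a time and stop when read(1)
# returns nil, so IO#eof? is never called.
def read_without_eof(path, mode)
  File.open(path, mode) do |f|
    chunks = []
    while (chunk = f.read(1))
      chunks << chunk
    end
    chunks
  end
end

# Build the same five-byte file as in the report and read it in "rt" mode.
Tempfile.create("txt") do |tmp|
  tmp.binmode
  tmp.write("abcd\x1A")
  tmp.close
  p read_without_eof(tmp.path, "rt")
end
```

Comparing this output with the `until _1.eof?` loop from the report would show whether the dropped characters depend on the eof? call.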
Issue #21634 has been updated by nobu (Nobuyoshi Nakada).

YO4 (Yoshinao Muramatsu) wrote in #note-1:

> The IO that has mode_enc "rt" will read with O_BINARY but opened with O_TEXT. This leads fill_cbuf using O_TEXT at rb_io_eof unexpectedly.
>
> I made [PR #14810](https://github.com/ruby/ruby/pull/14810).

Thank you for the patch.

The behavior of `IO#eof?` seems to change. With a "txt" file whose content is "abcd\x1A\r\n", the current `IO#eof?` returns `true` at "\x1A", and furthermore reading stops there.

```console
.\miniruby.exe -v -e "open('txt', 'rt') {|f| p f.read(4); p f.eof?; p f.read}"
ruby 3.5.0dev (2025-10-11T06:00:21Z master e8f0e1423b) +PRISM [arm64-mswin64_140]
"abcd"
true
""
```

However, with your PR, it seems that "\x1A" is simply not considered EOF.

```console
.\miniruby-new.exe -v -e "open('txt', 'rt') {|f| p f.read(4); p f.eof?; p f.read}"
ruby 3.5.0dev (2025-10-11T06:05:13Z eof-and-fpos 6e568e9cb2) +PRISM [arm64-mswin64_140]
last_commit=Set O_BINARY correctly at rb_io_eof()
"abcd"
false
"\u001A\n"
```
https://bugs.ruby-lang.org/issues/21634#change-114833
Issue #21634 has been updated by YO4 (Yoshinao Muramatsu).

That is interesting behavior I hadn't considered.

My understanding is that 'rt' uses universal newline conversion and that 0x1A is treated as a regular character, on both Windows and other platforms. For example:

```ruby
./miniruby -v -e "open('txt', 'rt') { |f| p f.read(4); p f.eof?; p f.read(1); f.rewind; p f.readline }"
ruby 3.5.0dev (2025-10-10T10:12:35Z master 4bf1475833) +PRISM [x64-mingw-ucrt]
"abcd"
true
nil
"abcd\u001A\n" # => 0x1A is read as regular character
```

On Windows, there is little need to use universal newline conversion alone, but the same applies when using encoding conversion. This might slightly expand the impact.

```ruby
ruby -v -e "open('txt', 'r:CP932:UTF-8') { |f| p f.read(4); p f.eof?; p f.read(1); f.rewind; p f.readline }"
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x64-mingw-ucrt]
"abcd"
true
nil
"abcd\u001A\n"
```

The behavior of IO#readline is as specified, and the existing behavior you pointed out seems to be unintended.

As a future goal, I want to eliminate dependencies on the Microsoft C runtime's read() function, so I want to eliminate any existing unexplained behavior beforehand.

In this issue I was focusing on the file position, but my patch also affected the behavior of IO#eof? at 0x1A.
Unfortunately, since the processing affected by the patch appears to fall outside the usual use cases (e.g. a character read stream used with a binary read method), I am unable to determine whether any scripts exist that would be impacted by the changes in this patch.

To move forward, is there anything I can do? I would appreciate any advice.

https://bugs.ruby-lang.org/issues/21634#change-114839
Issue #21634 has been updated by YO4 (Yoshinao Muramatsu).

@nobu, I had not correctly understood the 'rt' case in #note-2.
When the 'universal_newline: true' attribute is set, Ctrl-Z is not interpreted as EOF, so I believe the behavior you pointed out is correct.

```
ruby -ve "open('eof.txt', 'rt') { |f| p f.read }"
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
"abcd\u001A\n"
```
Additionally, while investigating another case, I observed behavior related to #note-2.
#21687#note-1
Therefore, the C runtime library's eof() appears unreliable.
The current rb_io_eof code originates from #6271, but if io_unread has been improved, we may not need to use eof().
```c
if (rb_w32_fd_is_text(fptr->fd)) {
    return RBOOL(eof(fptr->fd));
}
```

If the above code is unnecessary, we could always rely on `RBOOL(io_fillbuf(fptr) < 0)`.

However, as pointed out in #21687, the file position becomes incorrect after encountering an intermediate EOF, making the results of subsequent operations unreliable. That should be discussed in that issue.

https://bugs.ruby-lang.org/issues/21634#change-115215
Issue #21634 has been updated by hsbt (Hiroshi SHIBATA).

Status changed from Open to Assigned
Assignee set to windows

https://bugs.ruby-lang.org/issues/21634#change-116003
participants (3)
- hsbt (Hiroshi SHIBATA)
- nobu (Nobuyoshi Nakada)
- YO4 (Yoshinao Muramatsu)