Issue #21800 has been updated by Eregon (Benoit Daloze).

byroot (Jean Boussier) wrote in #note-11:
> In principle it makes sense, but in practice that other method returns strings, and weird ones at that:
>
> > Identifies the type of stat. The return string is one of: `"file"`, `"directory"`, `"characterSpecial"`, `"blockSpecial"`, `"fifo"`, `"link"`, `"socket"`, or `"unknown"`.
>
> If we yield symbols instead, it's already not super consistent, but I also think it would be weird to yield camel-cased symbols like `:characterSpecial` or `:blockSpecial`.
I noticed that as well, but those are pretty "special" files, so I don't think using camelCase there is a big deal. Even with these two camelCase names, I think they are much better and clearer than the `DT_` constants (`DT_BLK`, `DT_CHR`, `DT_DIR`, `DT_FIFO`, `DT_LNK`, `DT_REG`, `DT_SOCK`, `DT_UNKNOWN`). We could also use `:character_special` and `:block_special`, and I think that would be fine too. My point is that the `DT_` constants are unreadable/cryptic, so we shouldn't use them, and the names returned by `File::Stat#ftype` are much better.

----------------------------------------

Feature #21800: `Dir.foreach` and `Dir.each_child` to optionally yield `File::Stat` object alongside the children name
https://bugs.ruby-lang.org/issues/21800#change-116134

* Author: byroot (Jean Boussier)
* Status: Open

----------------------------------------

When listing a directory, you very commonly need to know the type of each child, usually because you want to scan recursively. The naive way to do this is to call `stat(2)` for each child, but this is quite costly.

This use case is common enough that `readdir` on most modern platforms exposes `struct dirent.d_type`, which gives the type of the child without an extra syscall. From the `scandir` man page:
> d_type: This field contains a value indicating the file type, making it possible to avoid the expense of calling lstat(2)
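For comparison, the naive approach with today's API looks roughly like this. It is a minimal sketch that uses only existing `Dir.each_child` and `File.lstat`. The method name `count_ruby_files_naive` is hypothetical. It performs one `lstat(2)` per directory entry, and that per-child syscall is exactly the cost that `d_type` would let Ruby skip for type-only checks.

```ruby
# Hypothetical baseline using the current API: one File.lstat (lstat(2))
# call per directory entry, just to learn whether it is a dir or a file.
def count_ruby_files_naive(root)
  count = 0
  queue = [root]
  while (dir = queue.pop)
    Dir.each_child(dir) do |name|
      next if name.start_with?(".") # skip dotfiles and dot-directories

      path = File.join(dir, name)
      stat = File.lstat(path) # the extra syscall d_type could avoid
      if stat.directory?
        queue << path
      elsif stat.file? && name.end_with?(".rb")
        count += 1
      end
    end
  end
  count
end
```

Because `File.lstat` does not follow symlinks, a symlink to a directory is neither `directory?` nor `file?` here, so it is skipped rather than traversed.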
I wrote a quick prototype, and relying on `dirent.d_type` instead of `stat(2)` makes recursively scanning Ruby's repository twice as fast on my machine: https://github.com/ruby/ruby/pull/15667

Given that recursively scanning directories is a common task across many popular Ruby tools (`zeitwerk`, `rubocop`, etc.), I think it would be very valuable to provide this more efficient interface.

In addition, @nobu noticed my prototype and implemented a nicer version of it, in which a `File::Stat` is yielded: https://github.com/ruby/ruby/commit/9acf67057b9bc6f855b2c37e41c1a2f91eae643a

In that version the `File::Stat` is lazy: the actual `stat(2)` call is only made if you access something other than the file type. I think this API is both more efficient and more convenient.

### Proposed API

```ruby
Dir.foreach(path) { |name| }
Dir.foreach(path) { |name, stat| }

Dir.each_child(path) { |name| }
Dir.each_child(path) { |name, stat| }

Dir.new(path).each_child { |name| }
Dir.new(path).each_child { |name, stat| }

Dir.new(path).each { |name| }
Dir.new(path).each { |name, stat| }
```

It's also important to note that the `File::Stat` is expected to be equivalent to an `lstat(2)` call, so that callers can choose whether or not to follow symlinks.

Basic use case:

```ruby
def count_ruby_files(root)
  count = 0
  queue = [root]
  while (dir = queue.pop)
    Dir.each_child(dir) do |name, stat|
      next if name.start_with?(".")

      if stat.directory?
        queue << File.join(dir, name)
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end
  count
end
```
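Here is a small pure-Ruby illustration of the laziness. `LazyStat` is a hypothetical class invented for this sketch. It is not the class from nobu's commit, which is implemented in C. Pure Ruby cannot read `dirent.d_type`, so the caller passes in the type string (spelled the way `File::Stat#ftype` returns it) that `readdir` is assumed to have supplied. Type predicates are answered from that value. The first call to any other `File::Stat` method performs a single `File.lstat`, consistent with the `lstat(2)` semantics the proposal asks for.

```ruby
# Hypothetical sketch: LazyStat is NOT part of Ruby or of the proposed patch.
# File-type queries are answered from a d_type-like value; any other
# attribute triggers one real lstat(2) via File.lstat.
class LazyStat
  # path:  full path of the directory entry
  # ftype: type string as File::Stat#ftype spells it ("file", "directory",
  #        "link", ...), or "unknown" when the filesystem gave no type
  def initialize(path, ftype)
    @path = path
    @ftype = ftype
    @stat = nil
  end

  def ftype
    # An unknown type (like DT_UNKNOWN) needs the real lstat(2) even here.
    @ftype == "unknown" ? real_stat.ftype : @ftype
  end

  def file?
    ftype == "file"
  end

  def directory?
    ftype == "directory"
  end

  def symlink?
    ftype == "link"
  end

  # True once the real lstat(2) has been performed (for demonstration).
  def stat_loaded?
    !@stat.nil?
  end

  # Other File::Stat methods (size, mtime, mode, ...) are forwarded to a
  # File.lstat result that is computed at most once.
  def method_missing(name, *args, &block)
    if File::Stat.method_defined?(name)
      real_stat.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    File::Stat.method_defined?(name) || super
  end

  private

  def real_stat
    @stat ||= File.lstat(@path)
  end
end
```

An `"unknown"` type, which `d_type` can report on filesystems that don't fill it in, forces the real `lstat(2)` even for type queries. This is why an implementation needs a fallback path alongside the fast one.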