Issue #21800 has been updated by Eregon (Benoit Daloze).

I think the spec in https://bugs.ruby-lang.org/issues/21839#note-3 is perfect and IMO good to go as-is:
* I propose:
  * `Dir.scan(path) { |entry_name, entry_type| }`
  * `Dir.scan(path) # => [[entry_name, entry_type], ...]`
* The type is just a symbol, similar to `File::Stat#ftype`
* In case of `DT_UNKNOWN`, Ruby issues an `lstat` to obtain the real type (important for portability).
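
To make the quoted semantics concrete, here is a rough pure-Ruby approximation that runs on current Ruby. `scan_entries` is a made-up name for illustration, not the proposed implementation. It pays one `lstat` per entry, which is exactly the cost a `d_type`-based `Dir.scan` would avoid, and it assumes the type symbols are simply the `File::Stat#ftype` strings converted with `to_sym`, which the proposal does not spell out.

```ruby
# Hypothetical emulation of the proposed Dir.scan semantics.
# `scan_entries` is an illustrative name, not part of Ruby.
def scan_entries(path)
  entries = Dir.each_child(path).map do |name|
    # lstat does not follow symlinks, so a symlink is reported as :link
    # rather than as the type of its target.
    # Assumption: the symbol is the ftype string converted with to_sym.
    [name, File.lstat(File.join(path, name)).ftype.to_sym]
  end

  # Without a block, return the array of [entry_name, entry_type] pairs.
  return entries unless block_given?

  # With a block, yield each pair in turn.
  entries.each { |name, type| yield name, type }
  nil
end
```

With a block, `scan_entries(dir) { |name, type| ... }` yields each pair; without one, it returns the array of pairs, mirroring the two forms quoted above.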
It's simple, doesn't add overhead or complexity to `File::Stat`, and serves its purpose well.
We want to avoid extra syscalls when `struct dirent.d_type` is available, so we should not auto-resolve symlinks.
Auto-resolving symlinks is also a common mistake, e.g. when traversing a directory hierarchy, as the traversal could go "back up" (and loop infinitely).
So `lstat()` is the simplest and safest choice there.
People can always resolve the symlink themselves if that's what they want, but typically they'd want to do some validation first anyway.

----------------------------------------
Feature #21800: `Dir.foreach` and `Dir.each_child` to optionally yield `File::Stat` object alongside the children name
https://bugs.ruby-lang.org/issues/21800#change-116275

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
When listing a directory, it's very common to need to know the type of each child, generally because you want to scan recursively.

The naive way to do this is to call `stat(2)` for each child, but this is quite costly.

This use case is common enough that `readdir` on most modern platforms exposes `struct dirent.d_type`, which makes it possible to know the type of the child without an extra syscall.

From the `scandir` manpage:
d_type: This field contains a value indicating the file type, making it possible to avoid the expense of calling lstat(2)
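
For reference, the naive approach described above looks roughly like this on current Ruby. It is only a sketch: `count_ruby_files_naive` is an illustrative name, and `File.lstat` is used rather than `File.stat` so that symlinked directories are not followed.

```ruby
# Naive baseline: one lstat(2) per directory entry.
# `count_ruby_files_naive` is an illustrative name.
def count_ruby_files_naive(root)
  count = 0
  queue = [root]

  while (dir = queue.pop)
    Dir.each_child(dir) do |name|
      next if name.start_with?(".")

      path = File.join(dir, name)
      # This is the extra syscall per entry that d_type would avoid.
      stat = File.lstat(path)

      if stat.directory?
        queue << path
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end

  count
end
```

Because `lstat` reports a symlink as a link rather than as its target, symlinked directories are neither counted nor descended into, so the traversal cannot loop.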
I wrote a quick prototype, and relying on `dirent.d_type` instead of `stat(2)` makes recursively scanning Ruby's repository twice as fast on my machine: https://github.com/ruby/ruby/pull/15667

Given that recursively scanning directories is a common task across many popular Ruby tools (`zeitwerk`, `rubocop`, etc.), I think it would be very valuable to provide this more efficient interface.

In addition, @nobu noticed my prototype and implemented a nicer version of it, where a `File::Stat` is yielded: https://github.com/ruby/ruby/commit/9acf67057b9bc6f855b2c37e41c1a2f91eae643a

In that case the `File::Stat` is lazy: the actual `stat(2)` call is only emitted if you access something other than the file type.

I think this API is both more efficient and more convenient.

### Proposed API

```ruby
Dir.foreach(path) { |name| }
Dir.foreach(path) { |name, stat| }

Dir.each_child(path) { |name| }
Dir.each_child(path) { |name, stat| }

Dir.new(path).each_child { |name| }
Dir.new(path).each_child { |name, stat| }

Dir.new(path).each { |name| }
Dir.new(path).each { |name, stat| }
```

Also important to note: the `File::Stat` is expected to be equivalent to an `lstat(2)` call, so that the caller can choose whether or not to follow symlinks.

Basic use case:

```ruby
def count_ruby_files(root)
  count = 0
  queue = [root]
  while dir = queue.pop
    Dir.each_child(dir) do |name, stat|
      next if name.start_with?(".")

      if stat.directory?
        queue << File.join(dir, name)
      elsif stat.file?
        count += 1 if name.end_with?(".rb")
      end
    end
  end
  count
end
```

-- 
https://bugs.ruby-lang.org/