[ruby-core:113372] [Ruby master Bug#19621] Resolv::Hosts uses ineffective File.read, making using big hosts file 'impossible'

Issue #19621 has been reported by felix.wolfsteller@betterplace.org (Felix Wolfsteller).

----------------------------------------
Bug #19621: Resolv::Hosts uses ineffective File.read, making using big hosts file 'impossible'
https://bugs.ruby-lang.org/issues/19621

* Author: felix.wolfsteller@betterplace.org (Felix Wolfsteller)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
By default, `Resolv` will read `/etc/hosts` once. Privacy- and security-aware people might use the file to prevent unwanted traffic; developers use it to quickly manipulate address resolution.

`Resolv::Hosts` uses [`IO.read`](https://github.com/betterplace/ruby/blob/9b07d30df8c6bf65c2558c023fd64524059...), which seems to be inefficient when dealing with large amounts of data that should be consumed line by line.

E.g. if you install the `/etc/hosts` additions by [hblock](https://hblock.molinero.dev/hosts) (https://github.com/hectorm/hblock), the first call to resolve an address will likely take **minutes**.

We believe the solution is easy: use the streaming `IO.foreach` (see patch and PR attached).

Benchmarking with an exemplary `/etc/hosts` from xyz, done like this:

```
require 'resolv'
require 'benchmark'

Benchmark.measure do
  Resolv::Hosts.new.lazy_initialize
end
```

With `read`: ...

With `foreach`: ...

-- 
https://bugs.ruby-lang.org/
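
For readers who want to compare the two reading styles named in the report on their own machine, here is a minimal sketch. It assumes a synthetic hosts-style file rather than a real `/etc/hosts`; the helper `write_sample_hosts` and the `.example` hostnames are hypothetical and not part of `resolv`. It only times reading and line iteration, not any parsing.

```ruby
require 'tempfile'
require 'benchmark'

# Hypothetical helper: writes `count` synthetic hosts-style lines to a temp file.
def write_sample_hosts(count)
  file = Tempfile.new('hosts-sample')
  count.times { |i| file.puts("0.0.0.0 blocked-#{i}.example") }
  file.flush
  file
end

sample = write_sample_hosts(10_000)

# Slurp the whole file into one String, then walk over its lines.
slurped = 0
slurp_time = Benchmark.realtime do
  File.read(sample.path).each_line { |_line| slurped += 1 }
end

# Stream the file line by line without holding all of it in memory.
streamed = 0
stream_time = Benchmark.realtime do
  File.foreach(sample.path) { |_line| streamed += 1 }
end

puts format('read + each_line: %.4fs (%d lines)', slurp_time, slurped)
puts format('foreach:          %.4fs (%d lines)', stream_time, streamed)

# Close and delete the temp file.
sample.close!
```

Both variants visit the same lines; only memory use and timing can differ, and the difference depends on file size and platform.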

Issue #19621 has been updated by felix.wolfsteller@betterplace.org (Felix Wolfsteller).

The reading does not seem to be the problem, but the hash operations.

----------------------------------------
Bug #19621: Resolv::Hosts uses ineffective File.read, making using big hosts file 'impossible'
https://bugs.ruby-lang.org/issues/19621#change-102940

* Author: felix.wolfsteller@betterplace.org (Felix Wolfsteller)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
By default on unixoid systems, `Resolv` will read `/etc/hosts` once. Privacy- and security-aware people might use the file to prevent unwanted traffic; developers use it to quickly manipulate address resolution.

`Resolv::Hosts` uses [`IO.read`](https://github.com/betterplace/ruby/blob/9b07d30df8c6bf65c2558c023fd64524059...), which seems to be inefficient when dealing with large amounts of data that should be consumed line by line.

E.g. if you install the `/etc/hosts` additions by [hblock](https://hblock.molinero.dev/hosts) (https://github.com/hectorm/hblock), the first call to resolve an address will likely take **minutes**.

Unfortunately, replacing `.open ... .each` with `IO.foreach` does not help.

Benchmarking with a partial exemplary `/etc/hosts` from above (172751 lines) with this

```ruby
require 'resolv'
require 'benchmark'

Benchmark.measure do
  Resolv::Hosts.new.lazy_initialize
end
```

yields

```
25.622515 8.821095 34.443610 ( 34.495448)
```

Reading all the lines into memory first and then consuming them (`File.readlines`) might improve the situation, but is probably not desirable due to memory concerns.

-- 
https://bugs.ruby-lang.org/
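
To illustrate how hash operations, rather than file reading, can dominate on a blocklist-style file, here is a minimal sketch. It is an assumption for illustration only and is not taken from the `resolv` source: it supposes that every line maps to the same address (as blocklists pointing at `0.0.0.0` do) and compares appending to the per-address array by copying (`+=`) with appending in place (`concat`). The method names `build_with_copy` and `build_in_place` are hypothetical.

```ruby
require 'benchmark'

# Synthetic blocklist-style entries: every hostname maps to the same address.
lines = Array.new(10_000) { |i| "0.0.0.0 blocked-#{i}.example" }

# Variant A: `+=` allocates a new array on every line, so the work per line
# grows with the number of hostnames already stored under that address.
def build_with_copy(lines)
  addr2name = {}
  lines.each do |line|
    addr, *hostnames = line.split(/\s+/)
    next unless addr
    addr2name[addr] ||= []
    addr2name[addr] += hostnames
  end
  addr2name
end

# Variant B: `concat` appends to the existing array in place.
def build_in_place(lines)
  addr2name = {}
  lines.each do |line|
    addr, *hostnames = line.split(/\s+/)
    next unless addr
    (addr2name[addr] ||= []).concat(hostnames)
  end
  addr2name
end

copy_result = nil
in_place_result = nil
copy_time = Benchmark.realtime { copy_result = build_with_copy(lines) }
in_place_time = Benchmark.realtime { in_place_result = build_in_place(lines) }

puts format('copying append:  %.4fs', copy_time)
puts format('in-place append: %.4fs', in_place_time)
```

Both variants build the same table; whether this pattern accounts for the timings reported above would need to be checked against the actual `Resolv::Hosts` code.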