July 2023 - ruby-core - ml.ruby-lang.org

[ruby-core:112326] [Ruby master Feature#19430] Contribution wanted: DNS lookup by c-ares library

by mame (Yusuke Endoh)

Issue #19430 has been reported by mame (Yusuke Endoh).

----------------------------------------
Feature #19430: Contribution wanted: DNS lookup by c-ares library
https://bugs.ruby-lang.org/issues/19430

* Author: mame (Yusuke Endoh)
* Status: Open
* Priority: Normal
----------------------------------------
## Problem

At the present time, Ruby uses `getaddrinfo(3)` to resolve names. Because this function is synchronous, we cannot interrupt the thread performing name resolution until the DNS server returns a response.

We can see this behavior by setting blackhole.webpagetest.org (72.66.115.13) as a DNS server, which swallows all packets, and resolving any name:

```
# cat /etc/resolv.conf
nameserver 72.66.115.13

# ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C^C^C^C
```

As we see, Ctrl+C does not stop ruby.

The current workaround that users can take is to do name resolution in a Ruby thread.

```ruby
Thread.new { Addrinfo.getaddrinfo("www.ruby-lang.org", 80) }.value
```

The thread that calls this code is interruptible. (Note that the newly created thread itself will be stuck until the DNS lookup times out.)

## Proposal

We can solve this problem by using c-ares, which is an asynchronous name resolver, as a backend of `Addrinfo.getaddrinfo`, etc. (@sorah told me about this library, thanks!)

https://c-ares.org/

I have created a PoC patch.

https://github.com/mame/ruby/commit/547806146993bbc25984011d423dcc0f913b211c

By applying this patch, we can interrupt `Addrinfo.getaddrinfo` by Ctrl+C.

```
# cat /etc/resolv.conf
nameserver 72.66.115.13

# ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C-e:1:in `getaddrinfo': Interrupt
	from -e:1:in `<main>'
```

## Discussion

### About c-ares

According to the c-ares site, some major tools including libcurl, Wireshark, and Apache Arrow are already using c-ares. Among language interpreters, node.js seems to be using c-ares.

I am honestly not sure about the compatibility of c-ares with `getaddrinfo(3)`. I guess there is no major incompatibility because I have not experienced any name resolution problem with curl. @akr (who is the author and maintainer of Ruby's socket library) suggested checking whether OS-specific name resolution, e.g., WINS on Windows, NIS on Solaris, etc., is supported. He also said that it may be acceptable even if they are not supported.

Whether to bundle the c-ares source code with ruby would require further discussion. If this proposal is accepted, then c-ares will become a de facto essential dependency for practical use, like gmp, in my opinion. Incidentally, node.js bundles c-ares: https://github.com/nodejs/node/tree/main/deps/cares

### Alternative approaches

Recent glibc provides `getaddrinfo_a(3)`, which performs asynchronous name resolution. However, this function has a fatal problem of being incompatible with `fork(2)`, which is heavily used in the Ruby ecosystem. In fact, the attempt to use `getaddrinfo_a(3)` (#17134) has been reverted because it fails rails tests. (#17220)

Another alternative is to have a worker pthread inside Ruby that calls `getaddrinfo(3)`. Instead of calling `getaddrinfo(3)` directly, `Addrinfo.getaddrinfo` would ask the worker to resolve a name and wait for a response. This method should be able to implement cancellation. (Simply put, this means reimplementing `getaddrinfo_a(3)` on our own, taking `fork(2)` into account.)

This has two advantages: it adds no dependency on external libraries, and it has no compatibility issues with `getaddrinfo(3)`. However, it is considerably more difficult to implement and maintain. An internal pthread may have a non-trivial impact on execution efficiency and memory usage. Also, we may need to implement a mechanism to dynamically change the number of workers depending on the load.

It would be ideal if we could try and evaluate both approaches. But my current impression is that using c-ares is the quickest and best compromise.

## Contribution wanted

I have made it up to the PoC, but don't have much time to complete this. @naruse suggested that I create a ticket asking for contributions. Is anyone interested in this?

* This patch changes `rsock_getaddrinfo` to accept a timeout argument. There are several places where Qnil is passed as a timeout (marked with `// TODO` in the PoC). We need to consider what timeout we should pass.
* This covers only `getaddrinfo`, but we also need to handle `getnameinfo` (and something else if any). There may be some issues I'm not aware of.
* I have not yet tested this PoC seriously. It would be great if we could evaluate it with some real apps.

Also, it would be great to hear from someone who knows more about c-ares.

--
https://bugs.ruby-lang.org/
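The thread-based workaround described under "Problem" can be combined with a deadline, roughly as sketched here. This is only an illustration and is not taken from the PoC patch: `resolve_with_deadline` and its `resolver:` parameter are hypothetical names introduced for this example. As the issue notes, the worker thread itself stays blocked until the underlying lookup returns; only the caller gets control back.

```ruby
require 'socket'

# Hypothetical helper (not part of Ruby or the PoC): run a name lookup in a
# worker thread and give up waiting after `seconds`.
# `resolver` is any callable taking (host, port); it defaults to
# Addrinfo.getaddrinfo and exists so the lookup can be stubbed.
def resolve_with_deadline(host, port, seconds, resolver: Addrinfo.method(:getaddrinfo))
  worker = Thread.new do
    # Errors are re-raised in the caller by Thread#value, so do not print them here.
    Thread.current.report_on_exception = false
    resolver.call(host, port)
  end

  # Thread#join(limit) returns nil if the thread has not finished in time.
  # The worker is NOT cancelled; it keeps waiting for the resolver.
  worker.join(seconds) ? worker.value : nil
end

# Example call (performs a real lookup, so it is left commented out):
#   resolve_with_deadline("www.ruby-lang.org", 80, 3)
```

A `nil` result here means "no answer within the deadline", which the caller has to distinguish from an empty result itself; that choice is an assumption of this sketch.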

[ruby-core:114250] [Ruby master Bug#19778] mkmf.rb pkg_config() interaction with RbConfig::CONFIG["cflags"]

by rhenium (Kazuki Yamaguchi)

Issue #19778 has been reported by rhenium (Kazuki Yamaguchi).

----------------------------------------
Bug #19778: mkmf.rb pkg_config() interaction with RbConfig::CONFIG["cflags"]
https://bugs.ruby-lang.org/issues/19778

* Author: rhenium (Kazuki Yamaguchi)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.3.0dev (2023-07-21T09:38:29Z master 22f9735587) [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
(This was first reported at https://github.com/ruby/openssl/issues/650: The extension's build breaks in a strange way if `RbConfig::CONFIG["*flags"]` contains the path of an OpenSSL installation but `pkg-config` returns the path of a different OpenSSL installation)

Commit commit:097c3e9cbbf23718371f08c24b2d2297b039f63f ("mkmf.rb: -I flags to $INCFLAGS", Ruby 2.2) changed how mkmf's `pkg_config()` handles the result of the `pkg-config` command. It now stores the `-I` flags in $INCFLAGS and others in $CFLAGS.

mkmf generates a Makefile that compiles source files with:

    $(CC) $(INCFLAGS) $(CPPFLAGS) $(CFLAGS)

and links the final library with:

    $(LDSHARED) -o $@ $(OBJS) $(LIBPATH) $(DLDFLAGS) $(LOCAL_LIBS) $(LIBS)

This "new" behavior of `pkg_config()` is problematic when `RbConfig::CONFIG["{C,CPP}FLAGS"]` also provide `-I` flags and `RbConfig::CONFIG["LDFLAGS"]` provides the matching `-L` flags -- for example, if Ruby is compiled with `./configure --with-opt-dir=<dir>`.

This would end up compiling sources with

    [...] -I<from pkg-config> -I<from RbConfig> [...]

and then linking with

    [...] -L<from RbConfig> -L<from pkg-config> [...]

This doesn't seem right. I don't know which should come earlier, but the order should be consistent.

The commit in question clearly describes the change in the commit message, but it doesn't have a linked issue. What is it intended for? Also, what is $INCFLAGS?

On the other hand, `dir_config()` would prepend `-I` flags to $CPPFLAGS and `-L` flags to $LIBPATH, so it doesn't have issues with flags from `RbConfig`, albeit in a different way.

--
https://bugs.ruby-lang.org/
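The ordering mismatch described in the report can be illustrated with a small sketch. It does not reproduce mkmf's actual code: `split_cflags` is a hypothetical helper that mimics only the "`-I` flags go to $INCFLAGS, the rest to $CFLAGS" split described above, and all paths and flag values are made up for the example.

```ruby
require 'shellwords'

# Hypothetical helper (not mkmf's implementation): split `pkg-config --cflags`
# style output into [-I flags, everything else].
def split_cflags(cflags)
  Shellwords.split(cflags).partition { |flag| flag.start_with?("-I") }
end

# Made-up values: pkg-config points at installation "b", RbConfig at "a".
pkg_incflags, pkg_cflags = split_cflags("-I/opt/openssl-b/include -DFOO")
pkg_libpath       = ["-L/opt/openssl-b/lib"]
rbconfig_cppflags = ["-I/opt/openssl-a/include"]
rbconfig_ldflags  = ["-L/opt/openssl-a/lib"]

# Compile line order from the report: $(INCFLAGS) $(CPPFLAGS) $(CFLAGS)
compile_order = pkg_incflags + rbconfig_cppflags + pkg_cflags
# Link line order from the report: -L<from RbConfig> -L<from pkg-config>
link_order = rbconfig_ldflags + pkg_libpath

puts compile_order.grep(/\A-I/).join(" ")  # -I/opt/openssl-b/include -I/opt/openssl-a/include
puts link_order.join(" ")                  # -L/opt/openssl-a/lib -L/opt/openssl-b/lib
```

With these assumed values, the first `-I` directory searched belongs to installation "b" while the first `-L` directory belongs to "a", which is the kind of inconsistency the report describes.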

[ruby-core:111526] [Ruby master Bug#19288] Ractor JSON parsing significantly slower than linear parsing

by maciej.mensfeld (Maciej Mensfeld)

Issue #19288 has been reported by maciej.mensfeld (Maciej Mensfeld).

----------------------------------------
Bug #19288: Ractor JSON parsing significantly slower than linear parsing
https://bugs.ruby-lang.org/issues/19288

* Author: maciej.mensfeld (Maciej Mensfeld)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
A simple benchmark:

```ruby
require 'json'
require 'benchmark'

CONCURRENT = 5
RACTORS = true
ELEMENTS = 100_000

data = CONCURRENT.times.map do
  ELEMENTS.times.map do
    {
      rand => rand,
      rand => rand,
      rand => rand,
      rand => rand
    }.to_json
  end
end

ractors = CONCURRENT.times.map do
  Ractor.new do
    Ractor.receive.each { JSON.parse(_1) }
  end
end

result = Benchmark.measure do
  if RACTORS
    CONCURRENT.times do |i|
      ractors[i].send(data[i], move: false)
    end

    ractors.each(&:take)
  else
    # Linear without any threads
    data.each do |piece|
      piece.each { JSON.parse(_1) }
    end
  end
end

puts result
```

It gives the following results on my 8-core machine:

```shell
# without ractors:
  2.731748   0.003993   2.735741 (  2.736349)
# with ractors
 12.580452   5.089802  17.670254 (  5.209755)
```

I would expect Ractors not to be two times slower on CPU-intensive work.

--
https://bugs.ruby-lang.org/

[ruby-core:114181] [Ruby master Bug#19767] [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex

by rubyFeedback (robert heiler)

Issue #19767 has been reported by rubyFeedback (robert heiler).

----------------------------------------
Bug #19767: [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex
https://bugs.ruby-lang.org/issues/19767

* Author: rubyFeedback (robert heiler)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
To get my knowledge about ruby regexes up-to-date I have been going through this tutorial/book here at:

https://learnbyexample.github.io/Ruby_Regexp/unicode.html

One example they provide is this, with some odd characters:

    'fox:αλεπού'.scan(/\w+/n)

This will match the found word ("fox"), but it also reports the following warning:

    warning: historical binary regexp match /.../n against UTF-8 string

Now: this may be obvious to others, but to me personally I am not sure what a "historical" binary regexp match actually is. I assume it may have meant that this was more used in the past, and may be discouraged now? Or is something else meant? What does "historical" mean in this context?

I may not be the only one who does not fully understand the term historical. Most of ruby's warnings are fairly easy to understand, but this one seems odd. Right now I do not know whether we can use the "n" modifier in a regex - not that I really have a good use case for it (I am using UTF-8 these days, so I don't seem to need ASCII-8BIT anyway), but perhaps the warning could be changed a little.

I have no good alternative suggestion how it can be changed, largely because I do not know what it actually means, e. g. what is "historical" about it (but, even then, I'd actually recommend against using the word "historical" because I don't understand what it means; deprecated is easy to understand, historical does not tell me anything).

Perhaps it could be expressed somewhat differently and we could get rid of the word "historical" there? Either way, it's a tiny issue so I was not even sure whether to report it. But, from the point of view of other warnings, I believe the term "historical" does not tell the user enough about what the issue is here.

    (irb):1: warning: historical binary regexp match /.../n against UTF-8 string
    => ["fox"]

--
https://bugs.ruby-lang.org/

[ruby-core:114038] [Ruby master Bug#19749] Confirm correct behaviour when attaching private method with `#define_method`

by itarato (Peter Arato)

Issue #19749 has been reported by itarato (Peter Arato).

----------------------------------------
Bug #19749: Confirm correct behaviour when attaching private method with `#define_method`
https://bugs.ruby-lang.org/issues/19749

* Author: itarato (Peter Arato)
* Status: Open
* Priority: Normal
* ruby -v: 3.3.0
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
This issue is a special case of https://bugs.ruby-lang.org/issues/19745:

Should private methods added dynamically via `.singleton_class.send(:define_method,...` be accessible publicly?

See the following example:

```ruby
private def bar; end

foo = Object.new
foo.singleton_class.send(:define_method, :bar, method(:bar))

foo.bar # No error.
```

The script above runs fine on the latest Ruby 3.3. Is it correct to ignore the fact that the added method (`method(:bar)`) is private?

This came up during a TruffleRuby investigation (https://github.com/oracle/truffleruby/issues/3134) where the result for the same script is: `private method 'bar' called for #<Object:0xc8> (NoMethodError)`

--
https://bugs.ruby-lang.org/

[ruby-core:111450] [Ruby master Bug#19268] Mingw64 Build Failure

by cfis (Charlie Savage)

Issue #19268 has been reported by cfis (Charlie Savage).

----------------------------------------
Bug #19268: Mingw64 Build Failure
https://bugs.ruby-lang.org/issues/19268

* Author: cfis (Charlie Savage)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x64-mingw-ucrt]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
For both Ruby 3.1.3 and Ruby 3.2.0, building on msys2/ucrt64 fails:

``` c
linking miniruby.exe
/usr/bin/sh: -c: line 1: syntax error near unexpected token `('
/usr/bin/sh: -c: line 1: `/usr/local/ruby/bin/ruby --disable=gems -n -e BEGIN{version=ARGV.shift;mis=ARGV.dup} -e END{abort "UNICODE version mismatch: #{mis}" unless mis.empty?} -e (mis.delete(ARGF.path); ARGF.close) if /ONIG_UNICODE_VERSION_STRING +"#{Regexp.quote(version)}"/o 15.0.0 ./enc/unicode/15.0.0/casefold.h ./enc/unicode/15.0.0/name2ctype.h'
make: *** [uncommon.mk:878: .rbconfig.time] Error 2
```

The fix that works for me is changing:

``` c
-e '(mis.delete(ARGF.path); ARGF.close) if /ONIG_UNICODE_VERSION_STRING +"#{Regexp.quote(version)}"/o' \
```

To

``` c
-e "(mis.delete(ARGF.path); ARGF.close) if /ONIG_UNICODE_VERSION_STRING +\"#{Regexp.quote(version)}\"/o" \
```

--
https://bugs.ruby-lang.org/

[ruby-core:114222] [Ruby master Misc#19772] API Naming for YARP compiler

by jemmai (Jemma Issroff)

Issue #19772 has been reported by jemmai (Jemma Issroff).

----------------------------------------
Misc #19772: API Naming for YARP compiler
https://bugs.ruby-lang.org/issues/19772

* Author: jemmai (Jemma Issroff)
* Status: Open
* Priority: Normal
----------------------------------------
We are working on the YARP compiler, and have [the first PR ready](https://github.com/ruby/ruby/pull/8042) which introduces the YARP compile method. Our only outstanding question before merging it is about naming. How should we expose the public API for YARP's compile method?

Potential suggestions:

1. YARP.compile
2. RubyVM::InstructionSequence.compile(yarp: true)
3. RubyVM::InstructionSequence.compile_yarp
4. Any of the above options, with a name other than yarp (please suggest an alternative)

Regarding option 1, which would mirror `YARP.parse`, is the top level constant `YARP` acceptable?

cc @matz @ko1

--
https://bugs.ruby-lang.org/

[ruby-core:113381] [Ruby master Bug#19624] Backticks - IO object leakage

by pineman

Issue #19624 has been reported by pineman (João Pinheiro).

----------------------------------------
Bug #19624: Backticks - IO object leakage
https://bugs.ruby-lang.org/issues/19624

* Author: pineman (João Pinheiro)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
Hi,

This code works on ruby 3.0.6:

```ruby
`echo`

ObjectSpace.each_object(IO) do |io|
  if ![STDIN, STDOUT, STDERR].include?(io)
    io.close
  end
end
```

but raises `IOError` on 3.2.2:

```
minimal-repro-case.rb:8:in `close': uninitialized stream (IOError)
```

I found it started failing on ruby 3.1.0 and after, on macOS and Linux.

This code is useful for closing unneeded IO objects in forked processes. It looks like backticks are 'leaking' IO objects that wait for GC, which did not happen before 3.1.0.

In ruby 3.1.0, inside `rb_f_backquote`, `rb_gc_force_recycle` was removed in favor of `RB_GC_GUARD`. I wonder if this has something to do with the problem.

Is this code incorrect since ruby 3.1.0 or is it a bug in ruby?

Thanks.

---Files--------------------------------
minimal-repro-case.rb (109 Bytes)

--
https://bugs.ruby-lang.org/
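One defensive way to write the cleanup loop from the report is sketched here. It is only a workaround under stated assumptions and does not answer whether the behavior is a bug: `close_ios` is a hypothetical helper introduced for this example, and it assumes that an `IOError` raised while inspecting or closing an object (such as the `uninitialized stream` error above) can safely be skipped.

```ruby
# Hypothetical helper: close every IO in `ios` except those listed in `keep`,
# skipping objects that raise IOError (for example "uninitialized stream").
# Returns the IO objects that were actually closed.
def close_ios(ios = ObjectSpace.each_object(IO), keep: [STDIN, STDOUT, STDERR])
  ios.each_with_object([]) do |io, closed|
    next if keep.include?(io)

    begin
      next if io.closed?
      io.close
      closed << io
    rescue IOError
      # Not a usable stream; leave it to the GC.
    end
  end
end
```

Passing the collection explicitly (for instance `close_ios([reader, writer])`) keeps the helper easy to try without touching every IO in the process; calling it with no arguments walks `ObjectSpace` as in the original snippet.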

[ruby-core:113926] [Ruby master Bug#19735] Add support for UUID version 7

by nevans (Nicholas Evans)

Issue #19735 has been reported by nevans (Nicholas Evans).

----------------------------------------
Bug #19735: Add support for UUID version 7
https://bugs.ruby-lang.org/issues/19735

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
Although the specification for UUIDv7 is still in draft, the UUIDv7 algorithm has been stable as the RFC progresses to completion.

Version 7 UUIDs can be very useful, because they are lexicographically sortable, which can improve, e.g., database index locality. See section 6.10 of the draft specification for further explanation:

https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-06.html

```ruby
require 'random/formatter'

Random.uuid_v7 # => "0188ca50-fcc0-7881-b5c5-6d55cd8fc373"
Random.uuid_v7 # => "0188ca51-0069-7304-be2e-0c3cd908789b"
Random.uuid_v7 # => "0188ca51-04aa-7b57-a6ec-c49573412a9d"
Random.uuid_v7 # => "0188ca51-0853-7979-ae37-485460e9f4f1"
# or
prng = Random.new
prng.uuid_v7 # => "0188ca51-5e72-7950-a11d-def7ff977c98"
```

PR here: https://github.com/ruby/ruby/pull/7953

--
https://bugs.ruby-lang.org/
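To show why such values sort by creation time, here is a minimal sketch of the basic UUIDv7 bit layout (48-bit Unix timestamp in milliseconds, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). It is not the implementation from the linked PR: `uuid_v7_sketch` and its `time:` and `random:` parameters are hypothetical names chosen for this example, and optional extras such as sub-millisecond precision are left out.

```ruby
require 'securerandom'

# Hypothetical sketch of the UUIDv7 layout, not the proposed Random.uuid_v7.
# `random` must be an Integer carrying 74 bits of randomness.
def uuid_v7_sketch(time: Time.now, random: SecureRandom.random_number(1 << 74))
  ms     = (time.to_r * 1000).floor & ((1 << 48) - 1)  # 48-bit Unix time in ms
  rand_a = random >> 62                                # upper 12 random bits
  rand_b = random & ((1 << 62) - 1)                    # lower 62 random bits

  value = (ms << 80) |        # bits 127..80: timestamp
          (0x7 << 76) |       # bits 79..76:  version 7
          (rand_a << 64) |    # bits 75..64:  rand_a
          (0b10 << 62) |      # bits 63..62:  variant
          rand_b              # bits 61..0:   rand_b

  hex = format("%032x", value)
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end

uuid_v7_sketch(time: Time.at(1), random: 0)
# => "00000000-03e8-7000-8000-000000000000"
```

Because the timestamp occupies the most significant bits, two values generated in different milliseconds compare in time order as plain strings; ordering within the same millisecond is left to the random bits in this sketch.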

[ruby-core:113819] [Ruby master Feature#19720] Warning for non-linear Regexps

by Eregon (Benoit Daloze)

Issue #19720 has been reported by Eregon (Benoit Daloze).

----------------------------------------
Feature #19720: Warning for non-linear Regexps
https://bugs.ruby-lang.org/issues/19720

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
----------------------------------------
I believe the best way to solve ReDoS is to ensure all Regexps used in the process are linear. Using `Regexp.timeout = 5.0` or so does not really prevent ReDoS: given enough requests causing that timeout, the servers will still be very unresponsive.

To this purpose, we should make it easy to identify non-linear Regexps and fix them.

I suggest we either use

1. a performance warning (enabled with `Warning[:performance] = true`, #19538) or
2. a new regexp warning category (enabled with `Warning[:regexp] = true`).

I think we should warn only once per non-linear Regexp, to avoid too many such warnings. We could warn as soon as the Regexp is created, or on first match. Warning on first match might make more sense for Ruby implementations which compile the Regexp lazily (since that is costly during startup), and it also avoids warning for Regexps which are never used (which can be good or bad). OTOH, if the warning is enabled, we could always compile the Regexp eagerly (or at least check whether it's linear), and that would then provide a better way to guarantee that all Regexps created so far are linear.

Because warnings are easily customizable, it is also possible to e.g. `raise/abort` on such a warning, if one wants to ensure their application does not use a non-linear Regexp and so cannot be vulnerable to ReDoS:

```ruby
Warning.extend Module.new {
  def warn(message, category: nil, **)
    raise message if category == :regexp
    super
  end
}
```

A regexp warning category seems better for that as it makes it easy to filter by category; with a performance warning, one would need to match the message, which is less clean.

As a note, TruffleRuby already has a similar warning, as a command-line option:

```
$ truffleruby --experimental-options --warn-truffle-regex-compile-fallback -e 'Gem'
truffleruby-dev/lib/mri/rubygems/version.rb:176: warning: Regexp /\A\s*([0-9]+(?>\.[0-9a-zA-Z]+)*(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?)?\s*\z/ at_start=false encoding=US-ASCII requires backtracking and will not match in linear time
truffleruby-dev/lib/mri/rubygems/requirement.rb:105: warning: Regexp /\A\s*(=|!=|>|<|>=|<=|~>)?\s*([0-9]+(?>\.[0-9a-zA-Z]+)*(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?)\s*\z/ at_start=false encoding=US-ASCII requires backtracking and will not match in linear time
```

So the warning message could be like

`FILE:LINE: warning: Regexp /REGEXP/ requires backtracking and might not match in linear time and might cause ReDoS`

or more concise:

`FILE:LINE: warning: Regexp /REGEXP/ requires backtracking and might cause ReDoS`

--
https://bugs.ruby-lang.org/

