- ruby-core - ml.ruby-lang.org

[ruby-core:117753] [Ruby master Bug#20466] Interpolated regular expressions have different encoding than interpolated strings

by tenderlovemaking (Aaron Patterson)

Issue #20466 has been reported by tenderlovemaking (Aaron Patterson).

----------------------------------------
Bug #20466: Interpolated regular expressions have different encoding than interpolated strings
https://bugs.ruby-lang.org/issues/20466

* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When the encoding is set to US-ASCII, interpolated strings can have different encoding than interpolated regular expressions. I think they should have the same encoding:

```ruby
# encoding: US-ASCII

t0 = '\\xc1'
t1 = "#{t0}"
re = /#{t0}/

p [t0.encoding, t1.encoding, re.encoding]
```

Output is:

```
$ ./miniruby -v test.rb
ruby 3.4.0dev (2024-05-02T15:27:18Z master 7c0cf71049) [arm64-darwin23]
[#<Encoding:US-ASCII>, #<Encoding:US-ASCII>, #<Encoding:BINARY (ASCII-8BIT)>]
```

--
https://bugs.ruby-lang.org/

3 days, 2 hours


[ruby-core:117748] [Ruby master Bug#20463] Ruby fails to generate warning for require syslog

by tyatin (Serg Tyatin)

Issue #20463 has been reported by tyatin (Serg Tyatin).

----------------------------------------
Bug #20463: Ruby fails to generate warning for require syslog
https://bugs.ruby-lang.org/issues/20463

* Author: tyatin (Serg Tyatin)
* Status: Open
* ruby -v: 3.3.1
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
```
docker run -it ruby:3.3.1 bash
touch Gemfile
bundle
bundle exec ruby -e "require '/usr/local/lib/ruby/3.3.0/x86_64-linux/syslog'"
/usr/local/lib/ruby/3.3.0/bundled_gems.rb:130:in `<': comparison of String with nil failed (ArgumentError)

    msg = " #{RUBY_VERSION < SINCE[gem] ? "will no longer be" : "is not"} part of the default gems since Ruby #{SINCE[gem]}."
                             ^^^^^^^^^^
	from /usr/local/lib/ruby/3.3.0/bundled_gems.rb:130:in `build_message'
	from /usr/local/lib/ruby/3.3.0/bundled_gems.rb:126:in `warning?'
	from /usr/local/lib/ruby/3.3.0/bundled_gems.rb:71:in `block (2 levels) in replace_require'
	from -e:1:in `<main>'
```

The same happens on an M1 Mac.

--
https://bugs.ruby-lang.org/
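The backtrace shows `RUBY_VERSION < SINCE[gem]` being evaluated while `SINCE[gem]` is `nil`. The following is a minimal sketch of that failure mode and of one possible guard. It is not the code in `bundled_gems.rb`: `SINCE_SKETCH`, its contents, and `build_message_sketch` are hypothetical names introduced only for illustration, and treating the full path as a key with no entry is an assumption drawn from the backtrace.

```ruby
# Hypothetical stand-in for the SINCE table; the real table's contents
# are not reproduced here.
SINCE_SKETCH = { "syslog" => "3.4.0" }.freeze

# Mirrors the comparison from the backtrace, but returns nil instead of
# raising when the looked-up name has no entry in the table.
def build_message_sketch(name, ruby_version = RUBY_VERSION)
  since = SINCE_SKETCH[name]
  return nil if since.nil?

  verb = ruby_version < since ? "will no longer be" : "is not"
  "#{name} #{verb} part of the default gems since Ruby #{since}."
end

# Unguarded comparison: a String compared with nil raises ArgumentError,
# which is the error class reported in the issue.
unguarded_error =
  begin
    RUBY_VERSION < SINCE_SKETCH["/usr/local/lib/ruby/3.3.0/x86_64-linux/syslog"]
  rescue ArgumentError => e
    e
  end

puts unguarded_error.class
puts build_message_sketch("syslog", "3.3.1")
```

Whether a `nil` lookup should be skipped silently or the path should be normalized to a gem name first is a design question for the maintainers; the sketch only shows that a `nil` check is enough to avoid the exception.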

3 days, 8 hours


[ruby-core:117746] [Ruby master Bug#20462] Native threads are no longer reused

by tenderlovemaking (Aaron Patterson)

Issue #20462 has been reported by tenderlovemaking (Aaron Patterson).

----------------------------------------
Bug #20462: Native threads are no longer reused
https://bugs.ruby-lang.org/issues/20462

* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ruby used to reuse native threads in order to amortize the cost of making a pthread. For example this program:

```ruby
ntids = 1000.times.map { Thread.new { Thread.current.native_thread_id }.value }
p ntids.uniq.length
```

With Ruby 3.2.0, this would return 1. With Ruby 3.3.x, it returns 1000. It means we cannot amortize the cost of a pthread for short lived threads.

I was able to bisect this to commit be1bbd5b7d40ad863ab35097765d3754726bbd54. But the change is big so I don't know how to fix it.

--
https://bugs.ruby-lang.org/
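Until the interpreter reuses native threads again, application code can amortize the cost itself by keeping a few Ruby threads alive and feeding them work. The sketch below is one way to do that with a `Queue`; it is not part of the report, `POOL_SIZE` is an arbitrary illustrative value, and it assumes each Ruby thread stays on a single native thread for its lifetime (which may not hold if M:N scheduling is enabled).

```ruby
POOL_SIZE = 4
jobs = Queue.new
results = Queue.new

# Long-lived workers: each one is created once and handles many jobs.
workers = POOL_SIZE.times.map do
  Thread.new do
    # Queue#pop returns nil once the queue is closed and drained.
    while (job = jobs.pop)
      results << job.call
    end
  end
end

# The same 1000 short tasks as in the report, submitted to the pool
# instead of each getting a fresh Thread.
1000.times { jobs << -> { Thread.current.native_thread_id } }
jobs.close
workers.each(&:join)

ntids = Array.new(results.size) { results.pop }
p ntids.uniq.length
```

Under the stated assumption, the number of distinct native thread ids is bounded by `POOL_SIZE` rather than by the number of tasks.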

3 days, 21 hours


[ruby-core:117680] [Ruby master Bug#20451] Bad Ruby 3.1.5 backport causes fiddle to fail to build

by Bo98 (Bo Anderson)

Issue #20451 has been reported by Bo98 (Bo Anderson).

----------------------------------------
Bug #20451: Bad Ruby 3.1.5 backport causes fiddle to fail to build
https://bugs.ruby-lang.org/issues/20451

* Author: Bo98 (Bo Anderson)
* Status: Open
* ruby -v: 3.1.5p252
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ruby 3.1.5 shipped with the following backport: https://github.com/ruby/ruby/commit/84f2aabd272a54e79979795d2d405090704a1d07

However this backport directly breaks the build:

```
closure.c:279:60: error: use of undeclared identifier 'data'
    result = ffi_prep_closure(pcl, cif, callback, (void *)(data->self));
                                                           ^
```

The original commit (https://github.com/ruby/fiddle/commit/2530496602) was updating the second branch to match the change in the first branch a couple of lines up. However, that change in the other branch does not exist in Ruby 3.1.

The commit in question requires a previous commit of https://github.com/ruby/fiddle/commit/81a8a56239973ab7559229830a449d201955b….

The backport should either be reverted or another commit should also be backported. Note that these commits were in a series of many commits made to fix an upstream issue https://github.com/ruby/fiddle/issues/102 so I cannot vouch whether or not the two commits are sufficient to fix the originally reported issue.

--
https://bugs.ruby-lang.org/

3 days, 22 hours


[ruby-core:117658] [Ruby master Feature#20448] Make coverage event hooking C API public

by ms-tob (Matt S)

Issue #20448 has been reported by ms-tob (Matt S).

----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448

* Author: ms-tob (Matt S)
* Status: Open
----------------------------------------
# Abstract

Gathering code coverage information is a well-known goal within software engineering. It is most commonly used to assess code coverage during automated testing. A lesser known use-case is coverage-guided fuzz testing, which will be the primary use-case presented in this issue. This issue exists to request that Ruby coverage event hooking be made part of its official, public C API.

# Background

Ruby currently provides a number of avenues for hooking events *or* gathering coverage information:

1. The [Coverage](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html) module
2. The [TracePoint](https://ruby-doc.org/3.3.0/TracePoint.html) module
3. The [rb_add_event_hook](https://ruby-doc.org/3.3.0/extension_rdoc.html#label-Hoo… extension function

Unfortunately, none of these pieces of functionality solve this issue's specific use-case. The `Coverage` module is not a great fit for real-time coverage analysis with an unknown start and stop point. Coverage-guided fuzz testing requires this. The `TracePoint` module and `rb_add_event_hook` are not able to hook branch and line coverage events. Coverage-guided fuzz testing typically tracks branch events.

# Proposal

The ultimate goal is to enable Ruby C extensions to process coverage events in real-time. I did some cursory investigation into the Ruby C internals to determine what it would take to achieve this, but I'm by no means an expert, so my list may be incomplete. The good news is that much of this functionality already exists, but it's part of the private, internal-only C API.

1. Make `RUBY_EVENT_COVERAGE_LINE` and `RUBY_EVENT_COVERAGE_BRANCH` public: https://github.com/ruby/ruby/blob/v3_3_0/vm_core.h#L2182-L2184
   a. This would be an addition to the current public event types: https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/internal/event.h#L32-…
2. Allow initializing global coverage state so that coverage tracking can be fully enabled
   a. Currently, if `Coverage.setup` or `Coverage.start` is not called, then coverage events cannot be hooked. I do not fully understand why this is, but I believe it has something to do with `rb_get_coverages` and `rb_set_coverages`. If calls to `rb_get_coverages` return `NULL` (https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L641-L647, https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L864-L868), then coverage hooking will not be enabled. I believe the `Coverage` module initializes that state via a `rb_set_coverages` call here: https://github.com/ruby/ruby/blob/v3_3_0/ext/coverage/coverage.c#L112-L120.
   b. So, to achieve this goal, a C extension would need to be able to call `rb_set_coverages` or somehow initialize the global coverage state.

I've actually been able to achieve this functionality by calling undocumented features and defining `RUBY_EVENT_COVERAGE_BRANCH`:

```c
#include <ruby.h>
#include <ruby/debug.h>

#define RUBY_EVENT_COVERAGE_BRANCH 0x020000

// ...

rb_event_flag_t events = RUBY_EVENT_COVERAGE_BRANCH;
rb_event_hook_flag_t flags = (
    RUBY_EVENT_HOOK_FLAG_SAFE | RUBY_EVENT_HOOK_FLAG_RAW_ARG
);
rb_add_event_hook2(
    (rb_event_hook_func_t) event_hook_branch,
    events,
    counter_hash,
    flags
);
```

If I call `Coverage.setup(branches: true)`, and add this event hook, then branch hooking works as expected. `rb_add_event_hook2` will still respect the `RUBY_EVENT_COVERAGE_BRANCH` value if it's passed. But it would be better if I could rely on official functionality rather than undocumented features.

The above two points would be requirements for this functionality, but there's an additional nice-to-have:

3. Extend the public `tracearg` functionality to include additional coverage information
   a. Currently, `tracearg` offers information like `rb_tracearg_lineno` and `rb_tracearg_path`. It would be helpful if it also provided additional coverage information like `coverage.c`'s column information and a unique identifier for each branch. Currently, I can only use `(path, lineno)` as a unique identifier for a branch because that's what's offered by the public API, but more information like column number would be helpful for uniquely identifying branches. Since there can be multiple `if` statements on a single line, this can provide ambiguous identification for a branch event.

# Use cases

This use-case was born out of a new coverage-guided Ruby fuzzer: https://github.com/trailofbits/ruzzy. You can read more about its implementation details here: https://blog.trailofbits.com/2024/03/29/introducing-ruzzy-a-coverage-guided…. You can also find the Ruby C extension code behind its implementation here: https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L220-L….

So, the primary use-case here is enabling real-time, coverage-guided fuzz testing of Ruby code. However, as mentioned in the abstract, gathering code coverage information is useful in many domains. For example, it could enable new workflows in standard unit/integration test coverage. It could also enable gathering coverage information in real-time as an application is running. I see this as the most generalized form of gathering code coverage information, and something like the `Coverage` module as a specialized implementation. Another example, https://bugs.ruby-lang.org/issues/20282 may be solved by this more generalized solution.

We are tracking this request downstream here: https://github.com/trailofbits/ruzzy/issues/9

# Discussion

Fuzz testing is another tool in a tester's toolbelt. It is an increasingly common way to improve software's robustness. Go has it built in to the standard library, Python has Atheris, Java has Jazzer, JavaScript has Jazzer.js, etc. OSS-Fuzz has helped identify and fix over 10,000 vulnerabilities and 36,000 bugs [using fuzzing](https://google.github.io/oss-fuzz/#trophies). Ruby deserves a good fuzzer, and improving coverage gathering would help achieve that goal.

The `Coverage` module, `TracePoint` module, and `rb_add_event_hook` function seem like they could fulfill this goal. However, after deeper investigation, none of them fit the exact requirements for this use-case.

# See also

- https://bugs.ruby-lang.org/issues/20282
- https://github.com/google/atheris
- https://security.googleblog.com/2020/12/how-atheris-python-fuzzer-works.html
- https://github.com/CodeIntelligenceTesting/jazzer/
- https://www.code-intelligence.com/blog/java-fuzzing-with-jazzer
- https://go.dev/doc/security/fuzz/

--
https://bugs.ruby-lang.org/
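To illustrate the ambiguity described in point 3, here is a small Ruby-level sketch using the existing `Coverage` module. It is not the C API being requested; it only shows that `Coverage`'s branch keys already carry column information that distinguishes two `if` expressions on the same line. `SAMPLE_SOURCE` and the temporary file name are made up for the example.

```ruby
require "coverage"
require "tempfile"

# Hypothetical sample: two `if` expressions share line 2.
SAMPLE_SOURCE = <<~RUBY
  def classify(x)
    a = if x > 0 then :pos else :neg end; b = if x > 5 then :big else :small end
    [a, b]
  end
  classify(1)
RUBY

# Coverage only tracks files loaded after it has been started.
Coverage.start(branches: true)

sample = Tempfile.new(["coverage_sample", ".rb"])
sample.write(SAMPLE_SOURCE)
sample.flush
load sample.path

result = Coverage.result
_path, data = result.find { |path, _| File.basename(path) == File.basename(sample.path) }
branch_keys = data[:branches].keys

# Each key is [type, id, first_lineno, first_column, last_lineno, last_column].
branch_keys.each do |type, _id, first_line, first_col, last_line, last_col|
  puts "#{type} at #{first_line}:#{first_col}-#{last_line}:#{last_col}"
end

sample.close!
```

Both branches report the same starting line, so `(path, lineno)` alone cannot tell them apart, while the column values can.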

4 days, 7 hours


[ruby-core:117714] [Ruby master Bug#20457] Final `return` is eliminated from the AST

by tenderlovemaking (Aaron Patterson)

Issue #20457 has been reported by tenderlovemaking (Aaron Patterson).

----------------------------------------
Bug #20457: Final `return` is eliminated from the AST
https://bugs.ruby-lang.org/issues/20457

* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Given the following code:

```ruby
def foo
  a = 1
  return a
end
```

If you parse this with RubyVM::AbstractSyntaxTree, the AST will be missing the `return` node. Of course the `return` node isn't necessary for compilation, but would be required for building an LSP for example.

Here's a full program to demonstrate:

```ruby
ast = RubyVM::AbstractSyntaxTree.parse DATA.read
pp ast

# Output is like this:
#
# (SCOPE@1:0-4:3
#  tbl: []
#  args: nil
#  body:
#    (DEFN@1:0-4:3
#     mid: :foo
#     body:
#       (SCOPE@1:0-4:3
#        tbl: [:a]
#        args:
#          (ARGS@1:7-1:7
#           pre_num: 0
#           pre_init: nil
#           opt: nil
#           first_post: nil
#           post_num: 0
#           post_init: nil
#           rest: nil
#           kw: nil
#           kwrest: nil
#           block: nil)
#        body:
#          (BLOCK@2:2-3:10
#           (LASGN@2:2-2:7 :a (INTEGER@2:6-2:7 1))
#           (LVAR@3:9-3:10 :a)))))

__END__
def foo
  a = 1
  return a
end
```

Btw, I'm happy to write failing tests for this type of stuff I'm just not sure where to put it! :)

--
https://bugs.ruby-lang.org/
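As a possible starting point for such a test, the sketch below walks the tree and reports whether a `RETURN` node is present. It assumes CRuby (where `RubyVM::AbstractSyntaxTree` is available); `AST_SAMPLE` and `node_types` are illustrative names, not part of the report.

```ruby
AST_SAMPLE = <<~RUBY
  def foo
    a = 1
    return a
  end
RUBY

# Collects the type of every AST node reachable from `node`.
# Children can be nodes, arrays, symbols, integers, or nil, so anything
# that is not a node or an array is ignored.
def node_types(node, acc = [])
  case node
  when RubyVM::AbstractSyntaxTree::Node
    acc << node.type
    node.children.each { |child| node_types(child, acc) }
  when Array
    node.each { |child| node_types(child, acc) }
  end
  acc
end

types = node_types(RubyVM::AbstractSyntaxTree.parse(AST_SAMPLE))
puts "RETURN node present: #{types.include?(:RETURN)}"
```

A regression test could assert that `types` includes `:RETURN` once the behaviour is changed; the sketch deliberately does not assert either way, since the result depends on the Ruby version.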

4 days, 22 hours


[ruby-core:117706] [Ruby master Bug#20454] IRB echoes excessive input in dumb terminal

by zonuexe (Kenta USAMI)

Issue #20454 has been reported by zonuexe (Kenta USAMI).

----------------------------------------
Bug #20454: IRB echoes excessive input in dumb terminal
https://bugs.ruby-lang.org/issues/20454

* Author: zonuexe (Kenta USAMI)
* Status: Open
* ruby -v: ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin23]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When IRB is started on a terminal with the environment variable TERM=dumb, excessive output is generated as shown below.

A simple terminal such as Emacs shell-mode or comint-mode is assumed, but it should be possible to reproduce the following with a rich terminal such as iTerm.

Type 1[RET], 12[RET], 123[RET] on your keyboard in this order.

% TERM=dumb irb
irb(main):001> 1
irb(main):001> 1=> 1
irb(main):002> 12
irb(main):002> 1irb(main):002> 12=> 12
irb(main):003> 123
irb(main):003> 1irb(main):003> 12irb(main):003> 123=> 123

As arton-san says, you can avoid the problem by not using ReadlineInputMethod.
https://twitter.com/arton/status/1783008921000804630

--
https://bugs.ruby-lang.org/

5 days, 3 hours


[ruby-core:117733] [Ruby master Bug#20461] Unreadable pipe included in the readable IO of IO.select

by yamam (Masanari Yamamoto)

Issue #20461 has been reported by yamam (Masanari Yamamoto).

----------------------------------------
Bug #20461: Unreadable pipe included in the readable IO of IO.select
https://bugs.ruby-lang.org/issues/20461

* Author: yamam (Masanari Yamamoto)
* Status: Open
* ruby -v: ruby 3.4.0dev (2024-04-27T12:55:28Z master c844968b72) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When executing the following script, pipe_r is not supposed to be readable because no writing is done to pipe_w, but pipe_r is included in the return value rs of IO.select. Since it is not possible to read from pipe_r, the IO::EAGAINWaitReadable exception is raised.

```ruby
pipe_r, pipe_w = IO.pipe
1000.times do |i|
  File.popen(['seq', '1', '1000']) do |popen_r|
    j = 0
    while true
      rs, ws, = IO.select([popen_r, pipe_r])
      if rs.include?(popen_r)
        unless popen_r.gets
          break
        end
      end
      if rs.include?(pipe_r)
        puts "IO.select BUG pipe_r is not readable! i = #{i} j = #{j}"
        p pipe_r.read_nonblock(1)
      end
      j += 1
    end
  end
end
```

```
$ ruby select.rb
IO.select BUG pipe_r is not readable! i = 0 j = 20
<internal:io>:63:in `read_nonblock': Resource temporarily unavailable - read would block (IO::EAGAINWaitReadable)
	from select.rb:14:in `block (2 levels) in <main>'
	from select.rb:3:in `popen'
	from select.rb:3:in `block in <main>'
	from select.rb:2:in `times'
	from select.rb:2:in `<main>'
[1]    73732 exit 1     ruby select.rb
```

--
https://bugs.ruby-lang.org/
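Independently of whether `IO.select` should have reported the pipe, callers can make the read itself tolerant of a spurious wakeup. The sketch below is a defensive pattern rather than a fix for the reported behaviour: it uses the `exception: false` option of `read_nonblock`, which returns `:wait_readable` instead of raising `IO::EAGAINWaitReadable`. The variable names `first_read` and `second_read` are illustrative.

```ruby
pipe_r, pipe_w = IO.pipe

# Nothing has been written to pipe_w yet, so a non-blocking read cannot
# succeed; with exception: false it reports that as a return value.
first_read = pipe_r.read_nonblock(1, exception: false)

# Once a byte is available, the same call returns the data.
pipe_w.write("x")
second_read = pipe_r.read_nonblock(1, exception: false)

p first_read   # :wait_readable
p second_read  # "x"
```

In a select loop, a `:wait_readable` result can simply be treated as "go back to `IO.select`".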

5 days, 14 hours


[ruby-core:117738] [Ruby master Feature#15438] Threads can't switch faster than TIME_QUANTUM_(NSEC|USEC|MSEC)

by jhawthorn (John Hawthorn)

Issue #15438 has been updated by jhawthorn (John Hawthorn).

Status changed from Open to Closed

I think this is something we should improve more (I would like even faster switching times), but it does seem possible as of Ruby 3.3 to have threads switch faster than 100ms.

```
def test_switching(priority = 0)
  done = false
  started = false
  busy = Thread.new do
    Thread.current.priority = priority
    until done
      started = true
    end
  end
  Thread.pass until started

  times = []
  while times.length < 10
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    Thread.pass
    after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    times << (after - before)
  end
  done = true
  busy.join
  times
end

puts RUBY_VERSION
(-3).upto(3) do |priority|
  print "Priority: #{priority}"
  times = test_switching(priority)
  times = times.sort[1..-2] # drop fastest and slowest
  average = (times.sum / times.length)
  puts " average: #{average}"
end
```

```
$ ruby test_measure_switching_times.rb
3.2.2
Priority: -3 average: 0.10010529025021242
Priority: -2 average: 0.10010091188087245
Priority: -1 average: 0.10010025675364886
Priority: 0 average: 0.10010364262052462
Priority: 1 average: 0.20017941037440323
Priority: 2 average: 0.4003785701279412
Priority: 3 average: 0.8007820281236491
```

```
$ ruby test_measure_switching_times.rb
3.3.1
Priority: -3 average: 0.02040818374371156
Priority: -2 average: 0.03023905074587674
Priority: -1 average: 0.05040335562443943
Priority: 0 average: 0.10088755612378009
Priority: 1 average: 0.20198591962616774
Priority: 2 average: 0.4033456226279668
Priority: 3 average: 0.8073137137507729
```

----------------------------------------
Feature #15438: Threads can't switch faster than TIME_QUANTUM_(NSEC|USEC|MSEC)
https://bugs.ruby-lang.org/issues/15438#change-108145

* Author: sylvain.joyeux (Sylvain Joyeux)
* Status: Closed
----------------------------------------
Thread#priority can be set to negative values, which when looking at the code is meant to reduce the time allocated to the thread. However, as far as I could understand in the codebase, the quantum of time is definitely hard-coded to 100ms (TIME_QUANTUM_...). This means that the "lower allocated time" would only work for threads that would often yield one way or the other (sleep, blocking calls, ...)

My projects would definitely benefit from a faster switching period. I was wondering how best to implement this ability? I thought of the following:

1. globally using an environment variable
2. globally using an API
3. trying to adapt dynamically, using the highest needed period
4. lowering the period when a priority lower than 0 is set, leaving it at the lower period.

Obviously (3) would seem to be the best, but I'm not sure I would be able to get it right in a decent amount of time. (4) seems to be a good trade-off between simplicity and performance (nothing changes if you never use priorities lower than 0, and if you do, you basically get what you wanted).

What do you think?

---Files--------------------------------
0001-dynamically-modify-the-timer-thread-period-to-accoun.patch (3.12 KB)
0001-2.6-fix-handling-of-negative-priorities.patch (8.43 KB)

--
https://bugs.ruby-lang.org/

6 days, 3 hours


[ruby-core:117735] [Ruby master Feature#18583] Pattern-matching: API for custom unpacking strategies?

by ntl (Nathan Ladd)

Issue #18583 has been updated by ntl (Nathan Ladd).

Could the match operator, `=~`, be used as a general complement to `===`?

Example (following Victor's original sketch):

```ruby
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(obj)
    @regexp.match?(obj)
  end

  def =~(obj)
    match_data = @regexp.match(obj)
    match_data
  end
end

case "some string"
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  some_named_capture = match_data[:some_named_capture]
  puts "Match: #{some_named_capture}"
end
```

This would add `=~` to the pattern matching protocol that's currently comprised of `===`, `deconstruct` and `deconstruct_keys`. It would make `===` significantly more useful, and regular expressions provide a great example of why: when matching a string to a regular expression pattern, the string is already in lexical scope, but the match data is novel and only comes into existence upon a successful match:

```
subject = "some string"

case subject
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  # Capturing the match data variable instead of the original string doesn't make the original string inaccessible:
  puts "Match subject: #{subject.inspect}"
end
```

I also suspect this could be embedded into the pattern syntax itself, and could allow for some interesting possibilities. One example that leaps to mind is reifying primitive data parsed from JSON into a data structure:

```ruby
require "json"

SomeStruct = Struct.new(:some_attr, :some_other_attr) do
  def self.===(data)
    data.is_a?(Hash) && data.key?(:some_attr) && data.key?(:some_other_attr)
  end

  def self.=~(data)
    new(**data)
  end
end

# Parse JSON into raw (primitive) data; symbolize_names keeps the keys
# consistent with the symbol lookups in SomeStruct.===
some_data = JSON.parse(<<JSON, symbolize_names: true)
{ "some_attr": "some value", "some_other_attr": "some other value" }
JSON

# Reify data structure from raw data
case some_data
in SomeStruct => some_struct
  puts some_struct.inspect
end
```

----------------------------------------
Feature #18583: Pattern-matching: API for custom unpacking strategies?
https://bugs.ruby-lang.org/issues/18583#change-108142

* Author: zverok (Victor Shepelev)
* Status: Open
----------------------------------------
I started to think about it when discussing https://github.com/ruby/strscan/pull/30.
The thing is, usage of StringScanner for many complicated parsers invokes some kind of branching.

In pseudocode, the "ideal API" would allow to write something like this:

```ruby
case <what next matches>
in /regexp1/ => value_that_matched
  # use value_that_matched
in /regexp2/ => value_that_matched
  # use value_that_matched
# ...
```

It seems "intuitively" that there *should* be some way of implementing it, but we fall short. We can do some StringScanner-specific matcher object which defines its own `#===` and use it with pinning:

```ruby
case scanner
in ^(Matcher.new(/regexp1/)) => value_that_matched
# ...
```

But there is no API to tell how the match result will be unpacked, just the whole `StringScanner` will be put into `value_that_matched`.

So, I thought that maybe it would be possible to define some kind of API for pattern-like objects, the method with signature like `try_match_pattern(value)`, which by default is implemented like `return value if self === value`, but can be redefined to return something different, like part of the object, or object transformed somehow.

This will open some interesting (if maybe uncanny) possibilities: not just slicing out the necessary part, but something like

```ruby
value => ^(type_caster(Integer)) => int_value
```

So... Just a discussion topic!

--
https://bugs.ruby-lang.org/
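The `try_match_pattern` idea can be emulated in plain Ruby today, outside of `case/in`, to get a feel for the semantics. The sketch below is only an illustration of the proposal: the `TryMatchPattern` module, the `NO_MATCH` sentinel, `TypeMatcher`, `CapturingMatcher`, and `match_first` are all hypothetical names, and none of this hooks into Ruby's actual pattern matching. A sentinel is used instead of `nil` so that `nil` and `false` remain matchable values, which is a design choice of this sketch rather than part of the proposal.

```ruby
module TryMatchPattern
  # Returned when the pattern does not match.
  NO_MATCH = Object.new.freeze

  # Default from the proposal: hand back the value itself when self === value.
  def try_match_pattern(value)
    self === value ? value : NO_MATCH
  end
end

# Uses the default unpacking: the matched value is returned unchanged.
class TypeMatcher
  include TryMatchPattern

  def initialize(klass)
    @klass = klass
  end

  def ===(obj)
    @klass === obj
  end
end

# Redefines the unpacking: yields the MatchData instead of the string.
class CapturingMatcher
  include TryMatchPattern

  def initialize(regexp)
    @regexp = regexp
  end

  def try_match_pattern(value)
    return TryMatchPattern::NO_MATCH unless value.is_a?(String)

    @regexp.match(value) || TryMatchPattern::NO_MATCH
  end
end

# Stand-in for `case value in pattern => binding`: returns the unpacked
# result of the first pattern that matches, or nil if none does.
def match_first(value, patterns)
  patterns.each do |pattern|
    unpacked = pattern.try_match_pattern(value)
    return unpacked unless unpacked.equal?(TryMatchPattern::NO_MATCH)
  end
  nil
end

patterns = [TypeMatcher.new(Integer), CapturingMatcher.new(/(?<word>some) string/)]
int_value = match_first(42, patterns)
match_data = match_first("some string", patterns)

p int_value
p match_data[:word]
```

With language support, the pinned expression in `in ^(matcher) => name` would presumably bind `name` to the unpacked result in the same way `match_first` returns it here.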

1 week

