Issue #20466 has been reported by tenderlovemaking (Aaron Patterson).
----------------------------------------
Bug #20466: Interpolated regular expressions have different encoding than interpolated strings
https://bugs.ruby-lang.org/issues/20466
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When the source encoding is set to US-ASCII, interpolated strings can have a different encoding than interpolated regular expressions. I think they should have the same encoding:
```ruby
# encoding: US-ASCII
t0 = '\\xc1'
t1 = "#{t0}"
re = /#{t0}/
p [t0.encoding, t1.encoding, re.encoding]
```
Output is:
```
$ ./miniruby -v test.rb
ruby 3.4.0dev (2024-05-02T15:27:18Z master 7c0cf71049) [arm64-darwin23]
[#<Encoding:US-ASCII>, #<Encoding:US-ASCII>, #<Encoding:BINARY (ASCII-8BIT)>]
```
--
https://bugs.ruby-lang.org/
Issue #20463 has been reported by tyatin (Serg Tyatin).
----------------------------------------
Bug #20463: Ruby fails to generate warning for require syslog
https://bugs.ruby-lang.org/issues/20463
* Author: tyatin (Serg Tyatin)
* Status: Open
* ruby -v: 3.3.1
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
```
docker run -it ruby:3.3.1 bash
touch Gemfile
bundle
bundle exec ruby -e "require '/usr/local/lib/ruby/3.3.0/x86_64-linux/syslog'"
/usr/local/lib/ruby/3.3.0/bundled_gems.rb:130:in `<': comparison of String with nil failed (ArgumentError)
msg = " #{RUBY_VERSION < SINCE[gem] ? "will no longer be" : "is not"} part of the default gems since Ruby #{SINCE[gem]}."
^^^^^^^^^^
from /usr/local/lib/ruby/3.3.0/bundled_gems.rb:130:in `build_message'
from /usr/local/lib/ruby/3.3.0/bundled_gems.rb:126:in `warning?'
from /usr/local/lib/ruby/3.3.0/bundled_gems.rb:71:in `block (2 levels) in replace_require'
from -e:1:in `<main>'
```
The same happens on an M1 Mac.
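The backtrace points at a `String < nil` comparison: `SINCE[gem]` appears to have no entry for the absolute path being required. A minimal sketch of that failure and of a nil guard follows. The `SINCE` table and `build_message` here are stand-ins written for illustration, not the actual `bundled_gems.rb` code.

```ruby
# Stand-in for the table in bundled_gems.rb; the entry below is illustrative.
SINCE = { "syslog" => "3.4.0" }.freeze

# Hypothetical helper: returns nil instead of raising when the gem is unknown.
def build_message(gem, ruby_version = RUBY_VERSION)
  since = SINCE[gem]
  return nil if since.nil? # guard against comparing a String with nil

  verb = ruby_version < since ? "will no longer be" : "is not"
  "#{gem} #{verb} part of the default gems since Ruby #{since}."
end

# Without the guard, an unknown key reproduces the reported error class.
begin
  RUBY_VERSION < SINCE["/usr/local/lib/ruby/3.3.0/x86_64-linux/syslog"]
rescue ArgumentError => e
  puts e.message
end

p build_message("syslog", "3.3.1")
p build_message("/usr/local/lib/ruby/3.3.0/x86_64-linux/syslog")
```

Whether the right fix is a guard like this or normalizing the required path to a gem name before the lookup is a question for the maintainers.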
Issue #20462 has been reported by tenderlovemaking (Aaron Patterson).
----------------------------------------
Bug #20462: Native threads are no longer reused
https://bugs.ruby-lang.org/issues/20462
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ruby used to reuse native threads in order to amortize the cost of making a pthread.
For example this program:
```ruby
ntids = 1000.times.map {
  Thread.new {
    Thread.current.native_thread_id
  }.value
}
p ntids.uniq.length
```
With Ruby 3.2.0, this prints 1. With Ruby 3.3.x, it prints 1000. This means we can no longer amortize the cost of a pthread across short-lived threads.
I was able to bisect this to commit be1bbd5b7d40ad863ab35097765d3754726bbd54. But the change is big so I don't know how to fix it.
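Until native thread reuse is restored, one application-level way to amortize the cost is to keep a few long-lived Ruby threads and feed them jobs. This is only a sketch of that workaround, not a fix for the regression. `POOL_SIZE` and the variable names are illustrative, and it assumes each Ruby thread stays on one native thread for its lifetime (the default outside M:N scheduling).

```ruby
# A small fixed pool: jobs are pushed onto a Queue and run by long-lived workers.
POOL_SIZE = 4 # illustrative value
jobs = Queue.new
results = Queue.new

workers = POOL_SIZE.times.map do
  Thread.new do
    # Queue#pop returns nil once the queue is closed and drained.
    while (job = jobs.pop)
      results << job.call
    end
  end
end

1000.times { jobs << -> { Thread.current.native_thread_id } }
jobs.close
workers.each(&:join)

ntids = Array.new(results.size) { results.pop }
# Expected to be at most POOL_SIZE under the assumption stated above.
p ntids.uniq.length
```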
Issue #20451 has been reported by Bo98 (Bo Anderson).
----------------------------------------
Bug #20451: Bad Ruby 3.1.5 backport causes fiddle to fail to build
https://bugs.ruby-lang.org/issues/20451
* Author: Bo98 (Bo Anderson)
* Status: Open
* ruby -v: 3.1.5p252
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ruby 3.1.5 shipped with the following backport: https://github.com/ruby/ruby/commit/84f2aabd272a54e79979795d2d405090704a1d07
However this backport directly breaks the build:
```
closure.c:279:60: error: use of undeclared identifier 'data'
result = ffi_prep_closure(pcl, cif, callback, (void *)(data->self));
^
```
The original commit (https://github.com/ruby/fiddle/commit/2530496602) was updating the second branch to match the change in the first branch a couple lines up. However that change in the other branch does not exist in Ruby 3.1. The commit in question requires a previous commit of https://github.com/ruby/fiddle/commit/81a8a56239973ab7559229830a449d201955b….
The backport should either be reverted, or another commit should also be backported. Note that these commits were part of a series of many commits made to fix an upstream issue, https://github.com/ruby/fiddle/issues/102, so I cannot vouch for whether the two commits are sufficient to fix the originally reported issue.
Issue #20448 has been reported by ms-tob (Matt S).
----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448
* Author: ms-tob (Matt S)
* Status: Open
----------------------------------------
# Abstract
Gathering code coverage information is a well-known goal within software engineering. It is most commonly used to assess code coverage during automated testing. A lesser known use-case is coverage-guided fuzz testing, which will be the primary use-case presented in this issue. This issue exists to request that Ruby coverage event hooking be made part of its official, public C API.
# Background
Ruby currently provides a number of avenues for hooking events *or* gathering coverage information:
1. The [Coverage](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html) module
2. The [TracePoint](https://ruby-doc.org/3.3.0/TracePoint.html) module
3. The [rb_add_event_hook](https://ruby-doc.org/3.3.0/extension_rdoc.html#label-Hoo… extension function
Unfortunately, none of these pieces of functionality solve this issue's specific use-case. The `Coverage` module is not a great fit for real-time coverage analysis with an unknown start and stop point. Coverage-guided fuzz testing requires this. The `TracePoint` module and `rb_add_event_hook` are not able to hook branch and line coverage events. Coverage-guided fuzz testing typically tracks branch events.
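For reference, the snapshot-style workflow that the `Coverage` module offers looks roughly like the sketch below: coverage is started, code is loaded, and a single result is collected at the end. The temporary file and its contents are illustrative only.

```ruby
require "coverage"
require "tempfile"

# Throwaway target file; the name and contents are illustrative.
target = Tempfile.new(["coverage_target", ".rb"])
target.write(<<~RUBY)
  def sign(n)
    if n < 0
      :negative
    else
      :non_negative
    end
  end
  sign(1)
RUBY
target.close

Coverage.start(branches: true) # only files loaded after this point are tracked
load target.path
result = Coverage.result       # stops coverage and returns one final snapshot

# Look the file up by basename in case the recorded path was normalized.
_path, data = result.find { |path, _| File.basename(path) == File.basename(target.path) }
branch_counts = data[:branches].values.flat_map(&:values)
p data[:branches]
p branch_counts.sort
```

The counts only become available when the result is requested, which is the mismatch with a fuzzer that needs a callback per branch event.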
# Proposal
The ultimate goal is to enable Ruby C extensions to process coverage events in real-time. I did some cursory investigation into the Ruby C internals to determine what it would take to achieve this, but I'm by no means an expert, so my list may be incomplete.
The good news is that much of this functionality already exists, but it's part of the private, internal-only C API.
1. Make `RUBY_EVENT_COVERAGE_LINE` and `RUBY_EVENT_COVERAGE_BRANCH` public: https://github.com/ruby/ruby/blob/v3_3_0/vm_core.h#L2182-L2184
a. This would be an addition to the current public event types: https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/internal/event.h#L32-…
2. Allow initializing global coverage state so that coverage tracking can be fully enabled
a. Currently, if `Coverage.setup` or `Coverage.start` is not called, then coverage events cannot be hooked. I do not fully understand why this is, but I believe it has something to do with `rb_get_coverages` and `rb_set_coverages`. If calls to `rb_get_coverages` return `NULL` (https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L641-L647, https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L864-L868), then coverage hooking will not be enabled. I believe the `Coverage` module initializes that state via a `rb_set_coverages` call here: https://github.com/ruby/ruby/blob/v3_3_0/ext/coverage/coverage.c#L112-L120.
b. So, to achieve this goal, a C extension would need to be able to call `rb_set_coverages` or somehow initialize the global coverage state.
I've actually been able to achieve this functionality by calling undocumented features and defining `RUBY_EVENT_COVERAGE_BRANCH`:
```c
#include <ruby.h>
#include <ruby/debug.h>
#define RUBY_EVENT_COVERAGE_BRANCH 0x020000
// ...
rb_event_flag_t events = RUBY_EVENT_COVERAGE_BRANCH;
rb_event_hook_flag_t flags = (
    RUBY_EVENT_HOOK_FLAG_SAFE | RUBY_EVENT_HOOK_FLAG_RAW_ARG
);
rb_add_event_hook2(
    (rb_event_hook_func_t) event_hook_branch,
    events,
    counter_hash,
    flags
);
```
If I call `Coverage.setup(branches: true)` and add this event hook, then branch hooking works as expected. `rb_add_event_hook2` will still respect the `RUBY_EVENT_COVERAGE_BRANCH` value if it's passed. But it would be better if I could rely on official functionality rather than undocumented features.
The above two points would be requirements for this functionality, but there's an additional nice-to-have:
3. Extend the public `tracearg` functionality to include additional coverage information
a. Currently, `tracearg` offers information like `rb_tracearg_lineno` and `rb_tracearg_path`. It would be helpful if it also provided additional coverage information like `coverage.c`'s column information and a unique identifier for each branch. Currently, I can only use `(path, lineno)` as a unique identifier for a branch because that's what's offered by the public API. Since there can be multiple `if` statements on a single line, this identification is ambiguous; more information, such as the column number, would help to uniquely identify branches.
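To illustrate the ambiguity, the following sketch loads a one-line file containing two ternaries and inspects the branch keys reported by the `Coverage` module. Both branches share a line number and differ only by column. The file contents are illustrative.

```ruby
require "coverage"
require "tempfile"

# One source line with two independent branches.
target = Tempfile.new(["one_line_branches", ".rb"])
target.write("x = 1; a = (x > 0 ? 1 : 2); b = (x > 5 ? 3 : 4)\n")
target.close

Coverage.start(branches: true)
load target.path
result = Coverage.result

_path, data = result.find { |path, _| File.basename(path) == File.basename(target.path) }

# Each key is [type, id, first_lineno, first_column, last_lineno, last_column].
keys = data[:branches].keys
lines = keys.map { |key| key[2] }
columns = keys.map { |key| key[3] }
p keys
```

A hook that only sees `(path, lineno)` would merge these two branches into one.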
# Use cases
This use-case was born out of a new coverage-guided Ruby fuzzer: https://github.com/trailofbits/ruzzy. You can read more about its implementation details here: https://blog.trailofbits.com/2024/03/29/introducing-ruzzy-a-coverage-guided…. You can also find the Ruby C extension code behind its implementation here: https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L220-L….
So, the primary use-case here is enabling real-time, coverage-guided fuzz testing of Ruby code. However, as mentioned in the abstract, gathering code coverage information is useful in many domains. For example, it could enable new workflows in standard unit/integration test coverage. It could also enable gathering coverage information in real-time as an application is running. I see this as the most generalized form of gathering code coverage information, and something like the `Coverage` module as a specialized implementation. Another example, https://bugs.ruby-lang.org/issues/20282 may be solved by this more generalized solution.
We are tracking this request downstream here: https://github.com/trailofbits/ruzzy/issues/9
# Discussion
Fuzz testing is another tool in a tester's toolbelt. It is an increasingly common way to improve software's robustness. Go has it built into the standard library, Python has Atheris, Java has Jazzer, JavaScript has Jazzer.js, etc. OSS-Fuzz has helped identify and fix over 10,000 vulnerabilities and 36,000 bugs [using fuzzing](https://google.github.io/oss-fuzz/#trophies). Ruby deserves a good fuzzer, and improving coverage gathering would help achieve that goal.
The `Coverage` module, `TracePoint` module, and `rb_add_event_hook` function seem like they could fulfill this goal. However, after deeper investigation, none of them fit the exact requirements for this use-case.
# See also
- https://bugs.ruby-lang.org/issues/20282
- https://github.com/google/atheris
- https://security.googleblog.com/2020/12/how-atheris-python-fuzzer-works.html
- https://github.com/CodeIntelligenceTesting/jazzer/
- https://www.code-intelligence.com/blog/java-fuzzing-with-jazzer
- https://go.dev/doc/security/fuzz/
Issue #20457 has been reported by tenderlovemaking (Aaron Patterson).
----------------------------------------
Bug #20457: Final `return` is eliminated from the AST
https://bugs.ruby-lang.org/issues/20457
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Given the following code:
```ruby
def foo
  a = 1
  return a
end
```
If you parse this with RubyVM::AbstractSyntaxTree, the AST will be missing the `return` node. Of course the `return` node isn't necessary for compilation, but would be required for building an LSP for example.
Here's a full program to demonstrate:
```ruby
ast = RubyVM::AbstractSyntaxTree.parse DATA.read
pp ast
# Output is like this:
#
# (SCOPE@1:0-4:3
# tbl: []
# args: nil
# body:
# (DEFN@1:0-4:3
# mid: :foo
# body:
# (SCOPE@1:0-4:3
# tbl: [:a]
# args: (ARGS@1:7-1:7 pre_num: 0 pre_init: nil opt: nil first_post: nil post_num: 0 post_init: nil rest: nil kw: nil kwrest: nil block: nil)
# body: (BLOCK@2:2-3:10 (LASGN@2:2-2:7 :a (INTEGER@2:6-2:7 1)) (LVAR@3:9-3:10 :a)))))
__END__
def foo
  a = 1
  return a
end
```
Btw, I'm happy to write failing tests for this type of stuff; I'm just not sure where to put them! :)
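A small helper can make the check mechanical: walk the tree and collect node types, then look for `:RETURN`. This is a sketch only. `node_types` is a hypothetical helper, and whether `:RETURN` appears depends on the Ruby version being run.

```ruby
# Recursively collect every node type in a RubyVM::AbstractSyntaxTree.
def node_types(node, acc = [])
  return acc unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)

  acc << node.type
  node.children.each { |child| node_types(child, acc) }
  acc
end

source = <<~RUBY
  def foo
    a = 1
    return a
  end
RUBY

types = node_types(RubyVM::AbstractSyntaxTree.parse(source))
p types
puts "RETURN node present: #{types.include?(:RETURN)}"
```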
Issue #20454 has been reported by zonuexe (Kenta USAMI).
----------------------------------------
Bug #20454: IRB echoes excessive input in dumb terminal
https://bugs.ruby-lang.org/issues/20454
* Author: zonuexe (Kenta USAMI)
* Status: Open
* ruby -v: ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin23]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When IRB is started on a terminal with the environment variable TERM=dumb, excessive output is generated as shown below.
A simple terminal such as Emacs shell-mode or comint-mode is assumed, but it should be possible to reproduce the following with a rich terminal such as iTerm.
Type 1[RET], 12[RET], 123[RET] on your keyboard in this order.
```
% TERM=dumb irb
irb(main):001> 1
irb(main):001> 1=> 1
irb(main):002> 12
irb(main):002> 1irb(main):002> 12=> 12
irb(main):003> 123
irb(main):003> 1irb(main):003> 12irb(main):003> 123=> 123
```
As arton-san says, you can avoid the problem by not using ReadlineInputMethod.
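One way to apply that workaround is from `~/.irbrc`. The sketch below assumes that turning off both `IRB.conf[:USE_SINGLELINE]` and `IRB.conf[:USE_MULTILINE]` makes IRB fall back to its plain stdio input method; I have not verified it against every IRB version.

```ruby
# ~/.irbrc (IRB is already loaded when this file is evaluated)
if ENV["TERM"] == "dumb" && defined?(IRB)
  IRB.conf[:USE_SINGLELINE] = false # do not use the Readline-based input method
  IRB.conf[:USE_MULTILINE] = false  # do not use the Reline-based input method
end
```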
https://twitter.com/arton/status/1783008921000804630
Issue #20461 has been reported by yamam (Masanari Yamamoto).
----------------------------------------
Bug #20461: Unreadable pipe included in the readable IO of IO.select
https://bugs.ruby-lang.org/issues/20461
* Author: yamam (Masanari Yamamoto)
* Status: Open
* ruby -v: ruby 3.4.0dev (2024-04-27T12:55:28Z master c844968b72) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When executing the following script, pipe_r is not supposed to be readable because no writing is done to pipe_w, but pipe_r is included in the return value rs of IO.select. Since it is not possible to read from pipe_r, the IO::EAGAINWaitReadable exception is raised.
```ruby
pipe_r, pipe_w = IO.pipe
1000.times do |i|
  File.popen(['seq', '1', '1000']) do |popen_r|
    j = 0
    while true
      rs, ws, = IO.select([popen_r, pipe_r])
      if rs.include?(popen_r)
        unless popen_r.gets
          break
        end
      end
      if rs.include?(pipe_r)
        puts "IO.select BUG pipe_r is not readable! i = #{i} j = #{j}"
        p pipe_r.read_nonblock(1)
      end
      j += 1
    end
  end
end
```
```
$ ruby select.rb
IO.select BUG pipe_r is not readable! i = 0 j = 20
<internal:io>:63:in `read_nonblock': Resource temporarily unavailable - read would block (IO::EAGAINWaitReadable)
from select.rb:14:in `block (2 levels) in <main>'
from select.rb:3:in `popen'
from select.rb:3:in `block in <main>'
from select.rb:2:in `times'
from select.rb:2:in `<main>'
[1] 73732 exit 1 ruby select.rb
```
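Independently of the bug, callers can tolerate a spurious readiness report by reading with `exception: false`, which returns `:wait_readable` instead of raising. A minimal sketch on an idle pipe:

```ruby
pipe_r, pipe_w = IO.pipe

# Nothing has been written yet; with a zero timeout, select returns nil on timeout.
ready = IO.select([pipe_r], nil, nil, 0)
p ready

# With exception: false, a would-block read returns :wait_readable instead of raising.
first = pipe_r.read_nonblock(1, exception: false)
p first

pipe_w.write("x")
second = pipe_r.read_nonblock(1, exception: false)
p second
```

This only hides the symptom in the reader; it does not explain why `IO.select` reported the pipe as readable.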
Issue #15438 has been updated by jhawthorn (John Hawthorn).
Status changed from Open to Closed
I think this is something we should improve more (I would like even faster switching times), but it does seem possible as of Ruby 3.3 to have threads switch faster than 100ms.
```ruby
def test_switching(priority = 0)
  done = false
  started = false
  busy = Thread.new do
    Thread.current.priority = priority
    until done
      started = true
    end
  end
  Thread.pass until started
  times = []
  while times.length < 10
    before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    Thread.pass
    after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    times << (after - before)
  end
  done = true
  busy.join
  times
end
puts RUBY_VERSION
(-3).upto(3) do |priority|
  print "Priority: #{priority}"
  times = test_switching(priority)
  times = times.sort[1..-2] # drop fastest and slowest
  average = (times.sum / times.length)
  puts " average: #{average}"
end
```
```
$ ruby test_measure_switching_times.rb
3.2.2
Priority: -3 average: 0.10010529025021242
Priority: -2 average: 0.10010091188087245
Priority: -1 average: 0.10010025675364886
Priority: 0 average: 0.10010364262052462
Priority: 1 average: 0.20017941037440323
Priority: 2 average: 0.4003785701279412
Priority: 3 average: 0.8007820281236491
```
```
$ ruby test_measure_switching_times.rb
3.3.1
Priority: -3 average: 0.02040818374371156
Priority: -2 average: 0.03023905074587674
Priority: -1 average: 0.05040335562443943
Priority: 0 average: 0.10088755612378009
Priority: 1 average: 0.20198591962616774
Priority: 2 average: 0.4033456226279668
Priority: 3 average: 0.8073137137507729
```
----------------------------------------
Feature #15438: Threads can't switch faster than TIME_QUANTUM_(NSEC|USEC|MSEC)
https://bugs.ruby-lang.org/issues/15438#change-108145
* Author: sylvain.joyeux (Sylvain Joyeux)
* Status: Closed
----------------------------------------
Thread#priority can be set to negative values, which when looking at the code is meant to reduce the time allocated to the thread. However, as far as I could understand in the codebase, the quantum of time is definitely hard-coded to 100ms (TIME_QUANTUM_...). This means that the "lower allocated time" would only work for threads that would often yield one way or the other (sleep, blocking calls, ...)
My projects would definitely benefit from a faster switching period. I was wondering how best to implement this ability ?
I thought of the following:
1. globally using an environment variable
2. globally using an API
3. trying to adapt dynamically, using the highest needed period
4. lowering the period when a priority lower than 0 is set, leaving it at the lower period.
Obviously (3) would seem to be the best, but I'm not sure I would be able to get it right in a decent amount of time. (4) seems to be a good trade-off between simplicity and performance (nothing changes if you never use priorities lower than 0, and if you do, you basically get what you wanted).
What do you think ?
---Files--------------------------------
0001-dynamically-modify-the-timer-thread-period-to-accoun.patch (3.12 KB)
0001-2.6-fix-handling-of-negative-priorities.patch (8.43 KB)
Issue #18583 has been updated by ntl (Nathan Ladd).
Could the match operator, `=~`, be used as a general complement to `===`?
Example (following Victor's original sketch):
``` ruby
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end
  def ===(obj)
    @regexp.match?(obj)
  end
  def =~(obj)
    match_data = @regexp.match(obj)
    match_data
  end
end
case "some string"
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  some_named_capture = match_data[:some_named_capture]
  puts "Match: #{some_named_capture}"
end
```
This would add `=~` to the pattern matching protocol that's currently comprised of `===`, `deconstruct` and `deconstruct_keys`. It would make `===` significantly more useful, and regular expressions provide a great example of why: when matching a string to a regular expression pattern, the string is already in lexical scope, but the match data is novel and only comes into existence upon a successful match:
``` ruby
subject = "some string"
case subject
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  # Capturing the match data variable instead of the original string doesn't make the original string inaccessible:
  puts "Match subject: #{subject.inspect}"
end
```
I also suspect this could be embedded into the pattern syntax itself, and could allow for some interesting possibilities. One example that leaps to mind is reifying primitive data parsed from JSON into a data structure:
``` ruby
require "json"

SomeStruct = Struct.new(:some_attr, :some_other_attr) do
  def self.===(data)
    data.is_a?(Hash) && data.key?(:some_attr) && data.key?(:some_other_attr)
  end
  def self.=~(data)
    new(**data)
  end
end
# Parse JSON into raw (primitive) data; symbolize keys so the checks above apply
some_data = JSON.parse(<<JSON, symbolize_names: true)
{
  "some_attr": "some value",
  "some_other_attr": "some other value"
}
JSON
# Reify data structure from raw data
case some_data
in SomeStruct => some_struct
  puts some_struct.inspect
end
```
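For comparison, current Ruby already exposes the match data of a plain `Regexp` pattern through the `$~` special variable, because `Regexp#===` sets it during the match. The sketch below shows that existing behavior; it is not the proposed `=~` protocol.

```ruby
subject = "some string"

captured =
  case subject
  in /(?<some_named_capture>some) string/
    # Regexp#=== ran during the match, so $~ refers to that MatchData.
    $~[:some_named_capture]
  end

puts "Match: #{captured}"
```

This only works for `Regexp` itself; a custom matcher object has no comparable channel for handing back its result, which is the gap the proposal addresses.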
----------------------------------------
Feature #18583: Pattern-matching: API for custom unpacking strategies?
https://bugs.ruby-lang.org/issues/18583#change-108142
* Author: zverok (Victor Shepelev)
* Status: Open
----------------------------------------
I started to think about it when discussing https://github.com/ruby/strscan/pull/30.
The thing is, usage of StringScanner for many complicated parsers invokes some kind of branching.
In pseudocode, the "ideal API" would allow writing something like this:
```ruby
case <what next matches>
in /regexp1/ => value_that_matched
# use value_that_matched
in /regexp2/ => value_that_matched
# use value_that_matched
# ...
```
Intuitively, it seems that there *should* be some way of implementing this, but we fall short. We can write a StringScanner-specific matcher object which defines its own `#===` and use it with pinning:
```ruby
case scanner
in ^(Matcher.new(/regexp1/)) => value_that_matched
# ...
```
But there is no API to tell how the match result will be unpacked; the whole `StringScanner` is simply put into `value_that_matched`.
So, I thought that maybe it would be possible to define some kind of API for pattern-like objects, the method with signature like `try_match_pattern(value)`, which by default is implemented like `return value if self === value`, but can be redefined to return something different, like part of the object, or object transformed somehow.
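The idea can be sketched outside of pattern-matching syntax. Everything here is hypothetical: `TryMatchPattern` and `CaptureMatcher` are illustrative names, and Ruby does not call `try_match_pattern` itself.

```ruby
# Hypothetical protocol from the proposal; Ruby does not call this method itself.
module TryMatchPattern
  # Default behavior: yield the value itself when `===` accepts it.
  def try_match_pattern(value)
    value if self === value
  end
end

# A pattern-like object that unpacks to MatchData rather than the input.
class CaptureMatcher
  include TryMatchPattern

  def initialize(regexp)
    @regexp = regexp
  end

  def ===(value)
    @regexp.match?(value)
  end

  def try_match_pattern(value)
    @regexp.match(value) # nil when there is no match
  end
end

Integer.extend(TryMatchPattern)
p Integer.try_match_pattern(42)   # default: the value itself
p Integer.try_match_pattern("42") # default: nil when === rejects it

matcher = CaptureMatcher.new(/(?<word>some) string/)
p matcher.try_match_pattern("some string")&.[](:word)
```

Under the proposal, `in pattern => name` would bind `name` to the return value of such a method instead of the matched object.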
This will open some interesting (if maybe uncanny) possibilities: not just slicing out the necessary part, but something like
```ruby
value => ^(type_caster(Integer)) => int_value
```
So... Just a discussion topic!