Issue #19729 has been reported by eightbitraptor (Matthew Valentine-House).
----------------------------------------
Feature #19729: Store object age in a bitmap
https://bugs.ruby-lang.org/issues/19729
* Author: eightbitraptor (Matthew Valentine-House)
* Status: Open
* Priority: Normal
----------------------------------------
## Github PR
[Store object age in a bitmap #7938](https://github.com/ruby/ruby/pull/7938)
## Abstract
Ruby currently uses 2 bits of the flags in every object to track how many GC
events the object has survived. Objects are created at age 0 and can currently
grow to age 3, at which point they are considered "old".
Similar to the work carried out for [bitmap marking in Ruby
2.0](https://bugs.ruby-lang.org/issues/5839), which moved the mark bit from the
flags to a bitmap attached to each heap page, this PR moves the age bits out of
the object flags and stores them in a bitmap.
## Description
This PR creates a new bitmap `age_bits` on each heap page. The size of the
bitmap is controlled by `HEAP_PAGE_BITMAP_LIMIT`, which roughly indicates how
many objects the bitmap needs to cover, and by `RVALUE_AGE_BITS_SIZE`, which
controls how many bits are used per object to store the age.
We also introduce functions `RVALUE_AGE_SET`, `RVALUE_AGE_GET`,
`RVALUE_AGE_RESET` and `RVALUE_AGE_INC` to manipulate the age of an object
pointed to by a VALUE.
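To make the layout concrete, here is a minimal, self-contained sketch of a per-page age bitmap with get/set/reset/increment helpers. It is not the code from the PR: `struct page`, `PAGE_SLOTS`, `AGE_BITS_PER_OBJ` (standing in for `RVALUE_AGE_BITS_SIZE`) and the `age_*` functions are hypothetical, and the helpers take a slot index, whereas the real functions take a `VALUE` and would first have to derive its heap page and slot index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is an illustration, not gc.c. */
typedef uint64_t bits_t;
#define BITS_BITLENGTH   64
#define AGE_BITS_PER_OBJ 2                        /* 2 bits: ages 0..3 */
#define AGE_MASK         ((bits_t)((1u << AGE_BITS_PER_OBJ) - 1))
#define AGES_PER_WORD    (BITS_BITLENGTH / AGE_BITS_PER_OBJ)
#define PAGE_SLOTS       1639                     /* assumed slots per page */
#define AGE_WORDS        ((PAGE_SLOTS + AGES_PER_WORD - 1) / AGES_PER_WORD)

struct page {
    bits_t age_bits[AGE_WORDS];                   /* one 2-bit age per slot */
};

/* Which word holds the slot's age, and at which bit offset. */
static size_t age_index(size_t slot) { return slot / AGES_PER_WORD; }
static unsigned age_offset(size_t slot) {
    return (unsigned)(slot % AGES_PER_WORD) * AGE_BITS_PER_OBJ;
}

static unsigned age_get(const struct page *p, size_t slot) {
    return (unsigned)((p->age_bits[age_index(slot)] >> age_offset(slot)) & AGE_MASK);
}

static void age_set(struct page *p, size_t slot, unsigned age) {
    assert(age <= AGE_MASK);
    bits_t *word = &p->age_bits[age_index(slot)];
    *word &= ~(AGE_MASK << age_offset(slot));     /* clear the old age */
    *word |= (bits_t)age << age_offset(slot);     /* store the new age */
}

static void age_reset(struct page *p, size_t slot) { age_set(p, slot, 0); }

/* Increment, saturating at the maximum age (3 with 2 bits). */
static void age_inc(struct page *p, size_t slot) {
    unsigned age = age_get(p, slot);
    if (age < AGE_MASK) age_set(p, slot, age + 1);
}
```

With 2 bits per object, each 64-bit word holds the ages of 32 slots, so reading or updating an age is a shift and mask on the page's bitmap and never touches the object header.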
## Impact
**Benefits:**
* Improved CoW performance, because GC no longer has to mutate the object header
for objects that age.
* Allow configuration of the age at which objects are considered old. Because
the number of bits used is configurable, we can support arbitrary numbers of
GC events before an object is considered old. We can use this to change
major/minor GC timings for workloads with unusually high/low object churn.
* Free a flag in each object that can be repurposed. Object flags are a precious
resource; we should prefer to store data outside the flags where possible.
* Remove GC-related concerns from the object structure. This is important for
initiatives like a generic GC interface and MMTk, in order to keep GC-related
code as isolated as possible.
**Concerns:**
* Slightly increased RSS, because each heap page now carries an extra 2 bits
per slot for the `age_bits` bitmap. On my machine `sizeof(struct heap_page)`
has increased from 1312 to 1728 bytes, i.e. 416 bytes per page.
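As a back-of-the-envelope check of that 416-byte increase, the sketch below assumes 64 KiB heap pages, 40-byte base slots and 64-bit bitmap words, and assumes the bitmap word count is derived from them as shown. None of these values is stated in the issue, and the macro names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values (not stated in the issue): 64 KiB heap pages, 40-byte
 * base slots and 64-bit bitmap words, typical of 64-bit builds. */
#define HEAP_PAGE_SIZE 65536u
#define BASE_SLOT_SIZE 40u
#define BITS_BITLENGTH 64u
#define AGE_BIT_COUNT  2u
#define CEILDIV(a, b)  (((a) + (b) - 1) / (b))

/* Words needed for a 1-bit-per-slot bitmap (assumed to be how
 * HEAP_PAGE_BITMAP_LIMIT is derived). */
static size_t bitmap_limit(void) {
    return CEILDIV(CEILDIV(HEAP_PAGE_SIZE, BASE_SLOT_SIZE), BITS_BITLENGTH);
}

/* Bytes added per page by an age bitmap with AGE_BIT_COUNT bits per slot. */
static size_t age_bitmap_bytes(void) {
    return bitmap_limit() * AGE_BIT_COUNT * (BITS_BITLENGTH / 8u);
}
```

Under these assumptions a page has 1639 slots, a 1-bit bitmap needs 26 words, and a 2-bit age bitmap needs 52 words, or 416 bytes, which is exactly 1728 - 1312.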
## Benchmarking
Railsbench and Optcarrot were benchmarked using `yjit-bench`. The results show a
small memory increase (about 0.1 MiB RSS) with comparable performance.
```
master: ruby 3.3.0dev (2023-06-08T11:22:43Z master 3fe09eba9d) [x86_64-linux]
mvh-rvalue-age-bitmap: ruby 3.3.0dev (2023-06-13T06:59:40Z master c74f42a4fb) [x86_64-linux]
----------  -----------  ----------  ---------  --------------------------  ----------  ---------  -----------------------------  ----------------------------
bench       master (ms)  stddev (%)  RSS (MiB)  mvh-rvalue-age-bitmap (ms)  stddev (%)  RSS (MiB)  mvh-rvalue-age-bitmap 1st itr  master/mvh-rvalue-age-bitmap
railsbench       2154.3         0.5      101.4                      2124.9         0.5      101.5                           1.01                          1.01
optcarrot        5372.2         0.6       55.0                      5282.3         0.6       55.1                           1.02                          1.02
----------  -----------  ----------  ---------  --------------------------  ----------  ---------  -----------------------------  ----------------------------
Legend:
- mvh-rvalue-age-bitmap 1st itr: ratio of master/mvh-rvalue-age-bitmap time for the first benchmarking iteration.
- master/mvh-rvalue-age-bitmap: ratio of master/mvh-rvalue-age-bitmap time. Higher is better for mvh-rvalue-age-bitmap. Above 1 represents a speedup.
```
## Notes
**FL_PROMOTED**
We still use one of the two age bits. The original `FL_PROMOTED0` has been
renamed to `FL_PROMOTED` and repurposed to indicate an object's old/young
status.
We need this because correctly tracking references from old to young objects
relies on a write barrier, which is triggered whenever a field on an object is
written to. Because this code is a very hot path, it needs to be fast. Looking up
the heap page and then calculating the old/young status from the age bits
would slow down this part of the code too much.
Instead, we set `FL_PROMOTED` whenever the object crosses the threshold into the
old generation, and unset it if that object is ever demoted back into the young
generation. This way the write barrier can quickly tell whether the object is
old and whether to add it to the remembered set.
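A minimal sketch of why the flag matters: with `FL_PROMOTED` in the header, the write barrier's old/young check is a single bit test. The `obj` struct, the flag's bit position and the fixed-size remembered set are hypothetical simplifications, not Ruby's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object header; only the flags word matters here. */
typedef struct obj {
    uint32_t flags;
} obj;

#define FL_PROMOTED (1u << 5)   /* assumed bit position, not Ruby's */
#define REMSET_CAP  16

/* Toy remembered set: old objects that may now point at young ones. */
static obj *remset[REMSET_CAP];
static size_t remset_len;

/* One bit test on the header: no heap page lookup, no age decoding. */
static bool obj_is_old(const obj *o) {
    return (o->flags & FL_PROMOTED) != 0;
}

static void remember(obj *o) {
    for (size_t i = 0; i < remset_len; i++) {
        if (remset[i] == o) return;          /* already remembered */
    }
    assert(remset_len < REMSET_CAP);         /* toy capacity limit */
    remset[remset_len++] = o;
}

/* Runs on every store of `child` into a field of `parent`. An old -> young
 * edge must be recorded, because a minor GC does not traverse old objects
 * other than those in the remembered set. */
static void write_barrier(obj *parent, obj *child) {
    if (obj_is_old(parent) && !obj_is_old(child)) {
        remember(parent);
    }
}
```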
**rb_age_reset**
I expose a function from gc.c to reset the object age. It is only used from
one place: `Init_VM` in `vm.c`. During boot we create a hidden class for the
VM's "Frozen Core". This is created using `rb_class_new` as a T_CLASS, then it
has its class path set, and then its flags are overwritten, forcing the type to
T_ICLASS.
Classes are allocated with age 2, and `rb_set_class_path` allocates, which may
trigger GC. If this happens, the class immediately becomes old and has its
`FL_PROMOTED` bit set. That bit is then wiped out when the flags are
overwritten to force the type to T_ICLASS, leaving the `FL_PROMOTED` bit and
the object age out of sync.
I don't know why this code uses `rb_class_new` rather than
`rb_include_class_new`, but I will follow up this PR with an investigation.
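The loss of sync can be modelled in a few lines. In this hypothetical sketch, `struct model_obj`, `OLD_AGE`, the flag and type-mask values, and all helper names are assumptions: the age lives outside the header (as it would in the bitmap), `FL_PROMOTED` lives in the header, overwriting the whole flags word breaks the invariant, and resetting the age restores it. The sketch's reset also clears `FL_PROMOTED`, which is a choice made here, not a description of the PR's function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OLD_AGE     3u
#define FL_PROMOTED (1u << 5)   /* assumed bit position */
#define T_MASK      0x1fu       /* assumed type bits */

struct model_obj {
    uint32_t flags;   /* type bits + FL_PROMOTED */
    unsigned age;     /* stands in for this object's age_bits entry */
};

/* The invariant the write barrier relies on. */
static bool in_sync(const struct model_obj *o) {
    return ((o->flags & FL_PROMOTED) != 0) == (o->age >= OLD_AGE);
}

/* Surviving one GC: bump the age and promote when it reaches OLD_AGE. */
static void survive_gc(struct model_obj *o) {
    if (o->age < OLD_AGE && ++o->age == OLD_AGE) o->flags |= FL_PROMOTED;
}

/* Overwriting the whole flags word to force a new type drops
 * FL_PROMOTED but leaves the separately stored age untouched. */
static void overwrite_flags(struct model_obj *o, uint32_t new_type) {
    o->flags = new_type & T_MASK;
}

/* Make the object young again so flag and age agree. */
static void age_reset(struct model_obj *o) {
    o->age = 0;
    o->flags &= ~FL_PROMOTED;
}
```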
--
https://bugs.ruby-lang.org/
Issue #19722 has been reported by mame (Yusuke Endoh).
----------------------------------------
Misc #19722: DevMeeting-2023-07-13
https://bugs.ruby-lang.org/issues/19722
* Author: mame (Yusuke Endoh)
* Status: Open
* Priority: Normal
----------------------------------------
# The next dev meeting
**Date: 2023/07/13 13:00-17:00** (JST)
Log: *TBD*
- Dev meeting *IS NOT* a decision-making place. All decisions should be made on the bug tracker.
- Dev meeting is a place where we can ask Matz, nobu, nurse and other developers directly.
- Matz is a very busy person. Take this opportunity to ask him. If you cannot attend, other attendees can ask on your behalf (if they can understand your issue).
- We will record the discussion in English, either in the log file or on each ticket.
- All activities are best-effort (keep in mind that most of us are volunteer developers).
- The date, time and place of the meeting are scheduled according to when/where we can reserve Matz's time.
- *DO NOT* discuss them on this ticket, please.
# Call for agenda items
If you have a ticket that you want matz and committers to discuss, please post it into this ticket in the following format:
```
* [Ticket ref] Ticket title (your name)
* Comment (A summary of the ticket, why you put this ticket here, what point should be discussed, etc.)
```
Example:
```
* [Feature #14609] `Kernel#p` without args shows the receiver (ko1)
* I feel this feature is very useful and some people say :+1: so let's discuss this feature.
```
- It is recommended to add a comment by 2023/07/10. We hold a preparatory meeting to create an agenda a few days before the dev meeting.
- The format is strict. We'll use [this script to automatically create a markdown-style agenda](https://gist.github.com/mame/b0390509ce1491b43610b9ebb665eb86). We may ignore a comment that does not follow the format.
- Your comment is mandatory. We cannot read every discussion on a ticket in the limited time available, so we appreciate a short summary and any update since the previous discussion.
Issue #19762 has been reported by nobu (Nobuyoshi Nakada).
----------------------------------------
Bug #19762: rubyspec refers wrongly ruby version instead of gem versions
https://bugs.ruby-lang.org/issues/19762
* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: Eregon (Benoit Daloze)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
```shell-session
$ git grep ruby_version_is spec/ruby/library/ | grep -F -v -e /rbconfig/ -e /objectspace/ -e /coverage/ -e /fiber/ | wc -l
193
```
Apart from those excluded with `grep -v` above, the libraries under this directory are standard or bundled gems with their own versions, so `RUBY_VERSION` is unrelated to them.
Specs under this directory must use `version_is` instead of `ruby_version_is`, for example:
```
spec/ruby/library/bigdecimal/remainder_spec.rb:57: version_is BigDecimal::VERSION, ""..."3.1.4" do
spec/ruby/library/datetime/to_time_spec.rb:22: version_is(date_version, '3.2.3') do
spec/ruby/library/logger/device/close_spec.rb:18: version_is Logger::VERSION, ""..."1.4.0" do
spec/ruby/library/logger/device/close_spec.rb:25: version_is Logger::VERSION, "1.4.0" do
spec/ruby/library/logger/device/write_spec.rb:38: version_is Logger::VERSION, ""..."1.4.0" do
spec/ruby/library/logger/device/write_spec.rb:45: version_is Logger::VERSION, "1.4.0" do
spec/ruby/library/matrix/unitary_spec.rb:17: version_is((Matrix::const_defined?(:VERSION) ? Matrix::VERSION : "0.1.0"), "0.3.0") do
spec/ruby/library/openssl/config/freeze_spec.rb:6:version_is(OpenSSL::VERSION, ""..."2.2") do
spec/ruby/library/stringio/initialize_spec.rb:299: version_is(stringio_version, "0.0.3"..."0.1.1")
spec/ruby/library/time/to_datetime_spec.rb:17: version_is(date_version, '3.2.3') do
```
Issue #19763 has been reported by itarato (Peter Arato).
----------------------------------------
Bug #19763: Inconsistent error message for String#index vs String#rindex
https://bugs.ruby-lang.org/issues/19763
* Author: itarato (Peter Arato)
* Status: Open
* Priority: Normal
* ruby -v: 3.3.0
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
`String#index` and `String#rindex` yield different error messages when used with a `Regexp` argument:
```ruby
'abc'.force_encoding("ISO-2022-JP").index(/わ/)
```
Prints: `incompatible character encodings: ISO-2022-JP and UTF-8 (Encoding::CompatibilityError)`
```ruby
'abc'.force_encoding("ISO-2022-JP").rindex(/わ/)
```
Prints: `incompatible encoding regexp match (UTF-8 regexp with ISO-2022-JP string) (Encoding::CompatibilityError)`
Looking at a few other use cases (e.g. `String#byteindex`, `String#byterindex`), Ruby seems to use the message `incompatible encoding regexp match` when the encoding is checked between a String and a Regexp instance. Is this a bug in `String#index`?
Issue #19057 has been updated by nobu (Nobuyoshi Nakada).
k0kubun (Takashi Kokubun) wrote in #note-20:
> It looks like @byroot sent a patch for this at https://yhbt.net/kgio-public/B5843088-6EC1-4D1B-A45A-2699CA75DD7F@gmail.com….
`rb_convert_type(io, T_FILE, "IO", "to_io")` can be replaced with `rb_io_get_io(io)`.
----------------------------------------
Feature #19057: Hide implementation of `rb_io_t`.
https://bugs.ruby-lang.org/issues/19057#change-103781
* Author: ioquatix (Samuel Williams)
* Status: Assigned
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
----------------------------------------
In order to make improvements to the IO implementation like <https://bugs.ruby-lang.org/issues/18455>, we need to add new fields to `struct rb_io_t`.
By the way, type names ending in `_t` are reserved by POSIX, so I'm also trying to rename the internal implementation to drop `_t` where possible during this conversion.
Anyway, we should try to hide the implementation of `struct rb_io`. Ideally, we don't expose any of it, but the problem is backwards compatibility.
So, to remain backwards compatible, we should expose some fields of `struct rb_io`. The most commonly used ones are `fd` and `mode`, but several others are also in common use.
There are many fields which should not be exposed because they are implementation details.
## Current proposal
The current proposed change <https://github.com/ruby/ruby/pull/6511> creates two structs:
```c
// include/ruby/io.h
#ifndef RB_IO_T
struct rb_io {
int fd;
// ... public fields ...
};
#else
struct rb_io;
#endif
// internal/io.h
#define RB_IO_T
struct rb_io {
int fd;
// ... public fields ...
// ... private fields ...
};
```
However, we are not 100% confident this is safe according to the C specification. My experience is not wide enough to say whether it is safe in practice, but it looks okay to me, and @Eregon and @tenderlovemaking have both given some kind of approval.
That being said, maybe it's not safe.
There are two alternatives:
## Hide all details
We can make the public `struct rb_io` completely invisible.
```c
// include/ruby/io.h
#define RB_IO_HIDDEN
struct rb_io;
int rb_ioptr_descriptor(struct rb_io *ioptr); // accessor for previously visible state.
// internal/io.h
struct rb_io {
// ... all fields ...
};
```
This would only be forwards compatible, and code would need to feature detect like this:
```c
#ifdef RB_IO_HIDDEN
#define RB_IOPTR_DESCRIPTOR rb_ioptr_descriptor
#else
// Older headers: the struct is visible and there is no accessor, so read the field directly.
#define RB_IOPTR_DESCRIPTOR(ioptr) ((ioptr)->fd)
#endif
```
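A single-file sketch of the "Hide all details" pattern, using the proposal's names `struct rb_io` and `rb_ioptr_descriptor` but assuming the struct contents shown (only `fd` and `mode`) and a hypothetical `demo_io_new` constructor. Both halves sit in one file purely for illustration; in the proposal the declaration would live in `include/ruby/io.h` and the definition in `internal/io.h`, so extension code would only ever see the incomplete type.

```c
#include <assert.h>
#include <stdlib.h>

/* What the public header would expose: an incomplete type plus accessors. */
struct rb_io;
int rb_ioptr_descriptor(struct rb_io *ioptr);

/* What only the implementation would see (fields assumed for the sketch). */
struct rb_io {
    int fd;
    int mode;
    /* ... private fields, free to change between releases ... */
};

int rb_ioptr_descriptor(struct rb_io *ioptr) {
    return ioptr->fd;
}

/* Hypothetical constructor so the example can create an instance; code that
 * only sees the incomplete type cannot allocate it, since its size is unknown. */
struct rb_io *demo_io_new(int fd, int mode) {
    struct rb_io *io = malloc(sizeof *io);
    if (io != NULL) {
        io->fd = fd;
        io->mode = mode;
    }
    return io;
}
```

Because extension code never sees the field layout, private fields can change without touching extension code; the cost is a function call for every access that used to be a plain field read.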
## Nested public interface
Alternatively, we can nest the public fields into the private struct:
```c
// include/ruby/io.h
struct rb_io_public {
int fd;
// ... public fields ...
};
// internal/io.h
#define RB_IO_T
struct rb_io {
struct rb_io_public public;
// ... private fields ...
};
```
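The nested layout leans on a guarantee the C standard does give: a pointer to a structure, suitably converted, points to its initial member, and vice versa (C11 §6.7.2.1p15). The sketch below keeps the public part as the first member, checks that at compile time, and converts in both directions; the `private_state` field and the `io_public`/`io_from_public` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* "Public header": the fields extensions may touch. */
struct rb_io_public {
    int fd;
    int mode;
};

/* "Internal header": the public part must be the FIRST member. */
struct rb_io {
    struct rb_io_public public;
    int private_state;   /* hypothetical private field */
};

_Static_assert(offsetof(struct rb_io, public) == 0,
               "public part must be the first member");

/* Full struct -> public view: always well defined. */
static struct rb_io_public *io_public(struct rb_io *io) {
    return &io->public;
}

/* Public view -> full struct: valid only because `public` is the initial
 * member and `pub` really points into a struct rb_io. */
static struct rb_io *io_from_public(struct rb_io_public *pub) {
    return (struct rb_io *)pub;
}
```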
## Considerations
I personally think the "Hide all details" implementation is the best, but it's also the least compatible. It is also what we are ultimately aiming for; whether we take an intermediate "compatibility step" is up to us.
I think "Nested public interface" is messy and introduces more complexity, but it might be slightly better defined than the "Current proposal" which might create undefined behaviour. That being said, all the tests are passing.
Issue #19057 has been updated by k0kubun (Takashi Kokubun).
> I've noticed the same thing while trying to install kgio with the latest ruby HEAD.
It looks like @byroot sent a patch for this at https://yhbt.net/kgio-public/B5843088-6EC1-4D1B-A45A-2699CA75DD7F@gmail.com….
Neither patch has been released yet. Unless @normalperson cuts a timely release of kgio and raindrops, we will not be able to use Unicorn on Ruby 3.3 because of this feature.
----------------------------------------
Feature #19057: Hide implementation of `rb_io_t`.
https://bugs.ruby-lang.org/issues/19057#change-103779
* Author: ioquatix (Samuel Williams)
* Status: Assigned
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
Issue #19756 has been reported by postmodern (Hal Brodigan).
----------------------------------------
Bug #19756: URI::HTTP.build does not accept a host of `_gateway`, but `URI.parse` will.
https://bugs.ruby-lang.org/issues/19756
* Author: postmodern (Hal Brodigan)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
I noticed a difference in behavior between `URI::HTTP.build` and `URI.parse`: `URI::HTTP.build` will not accept a `host:` value of `_gateway`, but `URI.parse` will.
## Steps To Reproduce
```ruby
URI::HTTP.build(host: "_gateway")
```
vs.
```ruby
URI.parse("http://_gateway")
```
### Expected Results
Both raise the same exception, or return the same URI object.
### Actual Results
```
URI::HTTP.build(host: "_gateway")
/usr/share/ruby/uri/generic.rb:601:in `check_host': bad component(expected host component): _gateway (URI::InvalidComponentError)
from /usr/share/ruby/uri/generic.rb:640:in `host='
from /usr/share/ruby/uri/generic.rb:673:in `hostname='
from /usr/share/ruby/uri/generic.rb:190:in `initialize'
from /usr/share/ruby/uri/generic.rb:136:in `new'
from /usr/share/ruby/uri/generic.rb:136:in `build'
from /usr/share/ruby/uri/http.rb:61:in `build'
from (irb):2:in `<main>'
from /usr/local/share/gems/gems/irb-1.7.0/exe/irb:9:in `<top (required)>'
from /usr/local/bin/irb:25:in `load'
from /usr/local/bin/irb:25:in `<main>'
```
```
URI.parse("https://_gateway")
# => #<URI::HTTPS https://_gateway>
```
## Additional Information
```
$ gem list uri
uri (default: 0.12.1)
```
Issue #19757 has been reported by nobu (Nobuyoshi Nakada).
----------------------------------------
Feature #19757: Add new C API to create a subclass of `Data`
https://bugs.ruby-lang.org/issues/19757
* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
----------------------------------------
I propose a C API, `rb_data_define`, which creates a subclass of `Data`.
```C
/**
* Defines an anonymous data class.
*
* @endinternal
*
* @param[in] super Superclass of the defining class. Must be a
* descendant of ::rb_cData, or 0 as ::rb_cData.
* @param[in] ... Arbitrary number of `const char*`, terminated by
* NULL. Each one is the name of a field.
* @exception rb_eArgError Duplicated field name.
* @return The defined class.
*/
VALUE rb_data_define(VALUE super, ...);
```
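The proposal only gives the prototype. The sketch below is a hypothetical helper, not Ruby's implementation, showing how a NULL-terminated `const char *` list like the one `rb_data_define` takes can be walked with `<stdarg.h>`, including the duplicate-name case the doc comment says raises `rb_eArgError` (here it returns an error code instead). A caller of the proposed API would presumably write something like `rb_data_define(0, "x", "y", (const char *)NULL)` for the equivalent of `Data.define(:x, :y)`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies a NULL-terminated list of field names into
 * `out`. Returns the number of fields, -1 on a duplicate name, or -2 if
 * more than `cap` names are given. */
static int collect_field_names(const char **out, size_t cap, ...) {
    va_list ap;
    size_t n = 0;
    int result;
    const char *name;

    va_start(ap, cap);
    while ((name = va_arg(ap, const char *)) != NULL) {
        for (size_t i = 0; i < n; i++) {
            if (strcmp(out[i], name) == 0) { result = -1; goto done; }
        }
        if (n == cap) { result = -2; goto done; }
        out[n++] = name;
    }
    result = (int)n;
done:
    va_end(ap);
    return result;
}
```

The terminator is written as `(const char *)NULL` because a bare `NULL` may expand to an integer constant, which is not guaranteed to be passed through `...` the same way as a pointer.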