Issue #8973 has been updated by vo.x (Vit Ondruch).
Actually, the real intention here is to get rid of `${arch}` in the paths. `--enable-multiarch` used to make that possible, because of how it is implemented:
~~~
rubyarchprefix=${multiarch+'${archlibdir}/${RUBY_BASE_NAME}'}${multiarch-'${rubylibprefix}/${arch}'}
~~~
As you can see, when `multiarch` is not defined, `${arch}` is mandatory. With `multiarch` enabled, it is enough to modify `${archlibdir}`, and whatever value it holds is used as-is. It does not impose any expectations.
BTW, out of curiosity: assuming that upstream expects Ruby to be installed via `./configure && make && make install`, what is the reason to bother with `${arch}` for non-multiarch configurations? Does anybody really use a single installation directory with multiple architectures?
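To make the two expansions concrete, here is a small Ruby model of how the shell picks a value. It is only an illustration under stated assumptions: `nil` stands for an unset `multiarch`, the method name and the example paths are hypothetical, and `ruby_base_name` simply defaults to `"ruby"` here.

```ruby
# Hypothetical model of the configure line above, not code from Ruby's build.
# In POSIX sh, ${multiarch+X} yields X when multiarch is set (even to ""),
# and ${multiarch-Y} yields Y only when it is unset, so exactly one survives.
# Here nil plays the role of "unset".
def rubyarchprefix(multiarch:, archlibdir:, rubylibprefix:, arch:, ruby_base_name: "ruby")
  if multiarch.nil?
    "#{rubylibprefix}/#{arch}"         # non-multiarch: ${arch} is always appended
  else
    "#{archlibdir}/#{ruby_base_name}"  # multiarch: archlibdir is used as-is
  end
end

paths = { archlibdir: "/usr/lib64", rubylibprefix: "/usr/lib64/ruby", arch: "x86_64-linux" }
rubyarchprefix(multiarch: nil, **paths)   # => "/usr/lib64/ruby/x86_64-linux"
rubyarchprefix(multiarch: "yes", **paths) # => "/usr/lib64/ruby"
```

Only the multiarch branch lets a distribution drop `${arch}` simply by choosing `archlibdir`.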
----------------------------------------
Bug #8973: Allow to configure archlibdir for multiarch
https://bugs.ruby-lang.org/issues/8973#change-101092
* Author: vo.x (Vit Ondruch)
* Status: Feedback
* Priority: Normal
* Assignee: nobu (Nobuyoshi Nakada)
* ruby -v: ruby 2.1.0dev (2013-09-22 trunk 43011) [x86_64-linux]
----------------------------------------
Since r39347, it is impossible to configure the placement of libruby.so when the build is configured with "--with-multiarch". That is probably OK for Debian, but it breaks Fedora :/ The attached patch allows configuring archlibdir, but I feel that it is suboptimal, since "--with-rubyarchprefix" should probably be the parameter that influences the placement of arch-specific libraries. Is there any chance that this patch could be accepted, or better, that rubyarchprefix could be respected for every arch-specific library, including libruby.so? Thanks.
---Files--------------------------------
ruby-2.1.0-Enable-configuration-of-archlibdir.patch (479 Bytes)
--
https://bugs.ruby-lang.org/
Issue #19302 has been reported by noraj (Alexandre ZANNI).
----------------------------------------
Bug #19302: Non-destructive String#insert
https://bugs.ruby-lang.org/issues/19302
* Author: noraj (Alexandre ZANNI)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
It would be nice to have a non-destructive version of String#insert to be able to work with frozen literals.
## Current behavior
There is only a destructive version of `String#insert`, which raises a `FrozenError` if the string is frozen.
```ruby
irb(main):007:0> a = 'foobar'.freeze
irb(main):008:0> b = a.insert(3,'baz')
(irb):8:in `insert': can't modify frozen String: "foobar" (FrozenError)
from (irb):8:in `<main>'
from /home/noraj/.asdf/installs/ruby/3.2.0/lib/ruby/gems/3.2.0/gems/irb-1.6.2/exe/irb:11:in `<top (required)>'
from /home/noraj/.asdf/installs/ruby/3.2.0/bin/irb:25:in `load'
from /home/noraj/.asdf/installs/ruby/3.2.0/bin/irb:25:in `<main>'
```
This can happen pretty quickly when you have `# frozen_string_literal: true` in all your files.
## Idea of implementation
```ruby
class String
  def insert_nd(idx, str2) # returns a new String; self is left untouched
    self[0...idx] + str2 + self[idx..]
  end
end
```
Note: this is a draft; it doesn't handle negative indices the same way `insert` does.
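A variant of the idea that sidesteps the negative-index caveat is to copy first and then call the existing destructive method. This is a hedged sketch, not an existing Ruby API: the module and method names are hypothetical, and it relies on `String#dup` returning an unfrozen copy of a frozen string.

```ruby
# Hypothetical helper (not part of Ruby): a non-destructive insert built on
# String#insert, so index handling (including negative indices) is identical.
module NonDestructiveInsert
  module_function

  def insert(str, index, other)
    # dup (unlike clone) does not copy the frozen state, so this works
    # on frozen literals and leaves the original string untouched.
    str.dup.insert(index, other)
  end
end

a = 'foobar'.freeze
NonDestructiveInsert.insert(a, 3, 'baz')  # => "foobazbar"
NonDestructiveInsert.insert(a, -1, 'baz') # => "foobarbaz"
a                                         # => "foobar" (still frozen)
```

The extra allocation is the same one the `+` based draft already makes, so the main difference is that the semantics are inherited from `insert` rather than re-implemented.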
## Idea of naming
Ideally the current `String#insert` would have been named `String#insert!` so that the non-destructive version could be `String#insert`, but renaming it now would be a breaking change.
A more viable option would be to name it `insert_nd` (nd for non-destructive), but that may not follow Ruby's naming conventions.
Another idea to avoid confusion would be to avoid using `insert` and rather use a synonym like _place_, _slip_, _slot_, _lodge_, etc.
Issue #18949 has been updated by Eregon (Benoit Daloze).
@ko1 did https://github.com/ruby/ruby/pull/6935, great, thank you!
----------------------------------------
Feature #18949: Deprecate and remove replicate and dummy encodings
https://bugs.ruby-lang.org/issues/18949#change-101090
* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: Eregon (Benoit Daloze)
----------------------------------------
Ruby has a lot of accidental complexity.
Sometimes it becomes clear some features bring a lot of complexity and yet provide little value or are used very rarely.
Also most Ruby users do not even know about these features.
Replicate and dummy encodings seem to clearly fall into this category: almost nobody uses them, yet they add significant complexity and significant performance overhead.
Notably, the existence of those means the number of encodings in a Ruby runtime is actually variable and not fixed.
That means extra synchronization, hashtable lookups, indirections, function calls, etc.
## Replicate Encodings
Replicate encodings are created using `Encoding#replicate(name)`.
It almost sounds like an alias but in fact it is more than that and creates a new Encoding object, which can be used by a String:
```ruby
e = Encoding::US_ASCII.replicate('MY-US-ASCII')
s = "abc".force_encoding(e)
p s.encoding # => e
p s.encoding.name # => 'MY-US-ASCII'
```
This seems completely useless.
There is an obvious first step here which is to change `Encoding#replicate` to return the receiver, and just install an alias for it.
That avoids creating more encoding instances needlessly.
I think we should also deprecate and remove this method, though: it is never a good idea to have a global mutable map like this.
If someone wants extra aliases for encodings, they can easily do so with their own Hash: `{ alias_name => encoding }.fetch(name) { Encoding.find(name) }`.
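For example, an application-level alias table could look like the following. The constant, the helper name and the `"MY-US-ASCII"` alias are illustrative assumptions; only `Hash#fetch` and `Encoding.find` are existing APIs.

```ruby
# Hypothetical application-level alias table instead of Encoding#replicate.
ENCODING_ALIASES = {
  "MY-US-ASCII" => Encoding::US_ASCII,
}.freeze

def find_encoding(name)
  # Prefer the local alias; otherwise fall back to Ruby's own lookup,
  # which raises ArgumentError for unknown names.
  ENCODING_ALIASES.fetch(name) { Encoding.find(name) }
end

s = "abc".dup.force_encoding(find_encoding("MY-US-ASCII"))
s.encoding # => #<Encoding:US-ASCII>
```

No new Encoding object is created, so the set of encodings known to the VM stays fixed.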
## Dummy Encodings
Dummy encodings are not real encodings. They are artificial encodings designed to look like encodings, but don't function as encodings in Ruby.
From the docs:
```
enc.dummy? -> true or false
------------------------------------------------------------------------
Returns true for dummy encodings. A dummy encoding is an encoding for
which character handling is not properly implemented. It is used for
stateful encodings.
```
I wonder why we have those half-implemented encodings in core; it sounds to me like unfinished work that should not have been merged.
The "codepoints" of dummy encodings are just "bytes" and so they behave the same as `Encoding::BINARY`, with the exception of the UTF-16 and UTF-32 dummy encodings.
### UTF-16 and UTF-32 dummy encodings
These two are special dummy encodings.
They scan the first 2 or 4 bytes of the String, and if those bytes are a byte-order mark (BOM),
the "actual" encoding is resolved to UTF-16BE/UTF-16LE or UTF-32BE/UTF-32LE.
Otherwise, `Encoding::BINARY` is returned.
This logic is done by `get_actual_encoding()`.
Oddly, this check is not done when the String is created; it is done *every time* the encoding of that String is accessed (and the result is not stored on the String).
That adds needless overhead and makes the semantics really unreliable.
Do we really want a String which automagically changes between UTF-16LE and UTF-16BE based on mutating its bytes? I think nobody wants that:
```ruby
s = "\xFF\xFEa\x00b\x00c\x00d\x00".force_encoding("UTF-16")
p s # => "\uFEFFabcd"
s.setbyte 0, 254
s.setbyte 1, 255
p s # => "\uFEFF\u6100\u6200\u6300\u6400"
```
I think the path is clear: we should deprecate and then remove Encoding::UTF_16 and Encoding::UTF_32 (dummy encodings).
And then we no longer need `get_actual_encoding()` and the overhead it adds to every String method.
We could also keep those constants and make them refer to the native-endian UTF-16/32.
But that could cause confusing errors, as we would be changing their meaning.
We could add `Encoding::UTF_16NE` / `Encoding::UTF_16_NATIVE_ENDIAN` if that is useful.
Another possibility would be to resolve these encodings on String creation, like:
```
"\xFF\xFE".force_encoding("UTF-16").encoding # => UTF-16LE
String.new("\xFF\xFE", encoding: Encoding::UTF_16).encoding # => UTF-16LE
"ab".force_encoding("UTF-16").encoding # exception, not a BOM
String.new("ab", encoding: Encoding::UTF_16).encoding # exception, not a BOM
```
I think it is unnecessary to keep such complexity though.
A class method on String or Encoding, e.g. `Encoding.find_from_bom(string)`, would be much clearer and more efficient (no need to special-case those encodings in String.new, #force_encoding, etc.).
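A rough sketch of such a lookup, written as a plain Ruby module because `Encoding.find_from_bom` does not exist today. The module name and the choice to return `nil` when no BOM is present are assumptions (the dummy `UTF-16` encoding resolves to `BINARY` in that case), and unlike the dummy encodings it checks UTF-32 BOMs before UTF-16 ones.

```ruby
# Hypothetical stand-in for the proposed Encoding.find_from_bom(string).
module BomSniffer
  # UTF-32 BOMs are checked first because the UTF-32LE BOM (FF FE 00 00)
  # starts with the UTF-16LE BOM (FF FE).
  BOMS = [
    ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
    ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE],
    ["\xFE\xFF".b,         Encoding::UTF_16BE],
    ["\xFF\xFE".b,         Encoding::UTF_16LE],
  ].freeze

  # Returns the concrete encoding implied by a leading BOM, or nil.
  def self.find_from_bom(string)
    bytes = string.b
    BOMS.each do |bom, encoding|
      return encoding if bytes.start_with?(bom)
    end
    nil
  end
end

s = "\xFF\xFEa\x00b\x00".b
enc = BomSniffer.find_from_bom(s) # => #<Encoding:UTF-16LE>
s.force_encoding(enc) if enc      # resolved once, not on every access
```

Checking UTF-32 first means a UTF-16LE string whose first character after the BOM is U+0000 is reported as UTF-32LE; that ambiguity is inherent to BOM sniffing.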
FWIW JRuby seems to use `getActualEncoding()` only in 2 places (scanForCodeRange, inspect), which is an indication those dummy UTF encodings are barely used if ever. Similarly, TruffleRuby only has 4 usages of `GetActualEncodingNode`.
### Existing dummy encodings
```
> Encoding.list.select(&:dummy?)
[#<Encoding:UTF-16 (dummy)>, #<Encoding:UTF-32 (dummy)>,
#<Encoding:IBM037 (dummy)>, #<Encoding:UTF-7 (dummy)>,
#<Encoding:ISO-2022-JP (dummy)>, #<Encoding:ISO-2022-JP-2 (dummy)>, #<Encoding:ISO-2022-JP-KDDI (dummy)>,
#<Encoding:CP50220 (dummy)>, #<Encoding:CP50221 (dummy)>]
```
So besides the UTF-16/UTF-32 dummy encodings, there are only 7.
Does anyone use one of these 7 dummy encodings?
Interestingly, these encodings are exactly the ones that are not ASCII-compatible, with the exception of UTF-16BE/UTF-16LE/UTF-32BE/UTF-32LE (non-dummy).
As a note, UTF-{16,32}{BE,LE} are ASCII-compatible in codepoints but not in bytes, and Ruby uses the bytes definition of ASCII-compatible.
There is potential to simplify encoding compatibility rules and encoding compatibility checks based on that.
This means that if we removed dummy encodings, all encodings except UTF-{16,32}{BE,LE} would be ASCII-compatible, which would significantly simplify many string operations that currently need to handle dummy encodings specially.
Unicode encodings like UTF-{16,32}{BE,LE} already have special behavior for some Ruby methods, so those are already handled specially in some places (they are the only encodings with minLength > 1).
```
> Encoding.list.reject(&:ascii_compatible?)
[#<Encoding:UTF-16BE>, #<Encoding:UTF-16LE>,
#<Encoding:UTF-32BE>, #<Encoding:UTF-32LE>,
#<Encoding:UTF-16 (dummy)>, #<Encoding:UTF-32 (dummy)>,
#<Encoding:IBM037 (dummy)>, #<Encoding:UTF-7 (dummy)>,
#<Encoding:ISO-2022-JP (dummy)>, #<Encoding:ISO-2022-JP-2 (dummy)>, #<Encoding:ISO-2022-JP-KDDI (dummy)>,
#<Encoding:CP50220 (dummy)>, #<Encoding:CP50221 (dummy)>]
```
What can we do with such a dummy non-ASCII-compatible encoding?
Almost nothing useful:
```ruby
> s = "abc".encode("IBM037")
=> "\x81\x82\x83"
> s.bytes
=> [129, 130, 131]
> s.codepoints
=> [129, 130, 131]
> s == "abc"
=> false
> "été".encode("IBM037")
=> "\x51\xA3\x51"
```
So about the only thing that works with them is `String#encode`.
I think we could preserve that functionality, if actually used (does anyone use one of these 7 dummy encodings?), through:
```ruby
> "été".encode("IBM037")
=> "\x51\xA3\x51" (.encoding == BINARY)
> "\x51\xA3\x51".encode("UTF-8", "IBM037") # encode from IBM037 to UTF-8
=> "été" (.encoding == UTF-8)
```
That way there is no need for those to be Encoding instances, we would only need the conversion tables.
It is even better if we can remove them, so the notion of "dummy encodings" can disappear completely and nobody needs to understand or implement them.
### rb_define_dummy_encoding(name)
The C-API has `rb_define_dummy_encoding(const char *name)`.
This creates a new Encoding instance with `dummy?=true`, and it is also non-ASCII-compatible.
There seems to be no purpose to this besides storing the metadata of an encoding which does not exist in Ruby.
This seems a really expensive and complex way to handle that from the VM's point of view (because it dynamically creates an Encoding and adds it to lists, maps, etc.).
A simple replacement would be to mark the String as BINARY and save the encoding name as an instance variable of that String.
Ruby cannot interpret anything about that String anyway; to Ruby it is just raw bytes.
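The replacement could look roughly like this at the Ruby level (a C extension would use the equivalent ivar C API). The helper names, the `@declared_encoding_name` instance variable and the `"X-LEGACY-CHARSET"` name are made up for illustration.

```ruby
# Hypothetical illustration: keep the bytes BINARY and remember the
# external encoding name on the String itself, instead of defining a
# dummy Encoding.
def tag_with_encoding_name(bytes, name)
  str = bytes.b # copy of the bytes, tagged ASCII-8BIT (BINARY)
  str.instance_variable_set(:@declared_encoding_name, name)
  str
end

def declared_encoding_name(str)
  str.instance_variable_get(:@declared_encoding_name)
end

data = tag_with_encoding_name("\x81\x82\x83".b, "X-LEGACY-CHARSET")
data.encoding == Encoding::BINARY # => true
declared_encoding_name(data)      # => "X-LEGACY-CHARSET"
```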
## Summary
I suggest we deprecate replicate and dummy encodings in Ruby 3.2.
And then we remove them in the next version.
This will significantly simplify string-related methods, and the behavior exposed to Ruby users.
It will also significantly speed up encoding lookup in CRuby (and other Ruby implementations).
With a fixed number of encodings we can ensure all encoding indices fit in 7 bits, and `ENCODING_GET` can be simply `RB_ENCODING_GET_INLINED`.
`get_actual_encoding()` will be gone and its overhead as well.
`rb_enc_from_index()` would be just `return global_enc_table->list[index].enc;`, instead of the expensive behavior currently with `GLOBAL_ENC_TABLE_EVAL` which takes a lock and more when there are multiple Ractors.
Many checks in these methods would be removed as well.
Yet another improvement would be to load all encodings eagerly: in my experience that is small and fast (what is slow and big is the conversion tables), and it would simplify `must_encindex()` further.
These changes would affect most String methods, which use
```
STR_ENC_GET->get_encoding which does:
get_actual_encoding->rb_enc_from_index and possibly ->enc_from_index
ENCODING_GET->RB_ENCODING_GET_INLINED and possibly ->rb_enc_get_index->enc_get_index_str->rb_attr_get
```
Some of these details are mentioned in https://github.com/ruby/ruby/pull/6095#discussion_r915149708.
The overhead is so large that it is worth handling some hardcoded encoding indices directly in String methods.
This feels wrong: getting the encoding from a String should be simple, straightforward and fast.
Further optimizations will be unlocked as the encoding list becomes fixed and immutable.
For example, the name-to-Encoding map is then immutable and could use perfect hashing.
Inline caching those lookups also becomes easier, as the map cannot change.
Also that map would no longer need synchronization, etc.
## To Decide
Each item is independent. I think 1 and 2 are very important; 3 is less so but would be nice.
1. Deprecate and then remove `Encoding#replicate` and `rb_define_dummy_encoding()`. With that there is a fixed number of encodings, a lot of simplifications and many optimizations become available. They are used respectively in only 1 gem and 5 gems, see https://bugs.ruby-lang.org/issues/18949#note-4
2. Deprecate and then remove the dummy UTF-16 and UTF-32 encodings. This removes the need for `get_actual_encoding()` which is expensive. This functionality seems rarely used in practice, and it only works when such strings have a BOM, which is very rare.
3. Deprecate and then remove other dummy encodings, so there are no more dummy "half-implemented" encodings and all encodings are ASCII-compatible in terms of codepoints.
Issue #13671 has been updated by Eregon (Benoit Daloze).
@duerst Could you take a look at this? It's not fixed yet in 3.2.0.
----------------------------------------
Bug #13671: Regexp with lookbehind and case-insensitivity raises RegexpError only on strings with certain characters
https://bugs.ruby-lang.org/issues/13671#change-101088
* Author: dschweisguth (Dave Schweisguth)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 2.4.1
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
Here is a test program:
~~~ ruby
def test(description)
  begin
    yield
    puts "#{description} is OK"
  rescue RegexpError
    puts "#{description} raises RegexpError"
  end
end

test("ass, case-insensitive, special") { /(?<!ass)/i =~ '✨' }
test("bss, case-insensitive, special") { /(?<!bss)/i =~ '✨' }
test("as, case-insensitive, special") { /(?<!as)/i =~ '✨' }
test("ss, case-insensitive, special") { /(?<!ss)/i =~ '✨' }
test("ass, case-sensitive, special") { /(?<!ass)/ =~ '✨' }
test("ass, case-insensitive, regular") { /(?<!ass)/i =~ 'x' }
~~~
Running the test program with Ruby 2.4.1 (macOS) gives:
~~~
ass, case-insensitive, special raises RegexpError
bss, case-insensitive, special raises RegexpError
as, case-insensitive, special is OK
ss, case-insensitive, special is OK
ass, case-sensitive, special is OK
ass, case-insensitive, regular is OK
~~~
The RegexpError is "invalid pattern in look-behind: /(?<!ass)/i (RegexpError)"
Side note: in the real code in which I found this error I was able to work around the error by using (?i) after the lookbehind instead of //i.
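The workaround from the side note can be sketched as follows, assuming the goal is a case-sensitive lookbehind followed by a case-insensitive match. The `x` after `(?i)` is only a placeholder token, and the trade-off is that the lookbehind no longer ignores case.

```ruby
# Lookbehind stays case-sensitive; the inline (?i) turns on
# case-insensitivity only for the rest of the pattern.
WORKAROUND = /(?<!ass)(?i)x/

WORKAROUND =~ '✨'   # => nil (nothing to match, and no RegexpError)
WORKAROUND =~ '✨X'  # => 1
WORKAROUND =~ 'assX' # => nil (rejected by the lookbehind)
WORKAROUND =~ 'ASSx' # => 3 (the lookbehind does not fold case)
```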
Running the test program with Ruby 2.3.4 does not report any RegexpErrors.
I think this is a regression, although I might be wrong and it might be saving me from an incorrect result with certain strings.
---Files--------------------------------
test.rb (531 Bytes)
Issue #8973 has been updated by vo.x (Vit Ondruch).
nobu (Nobuyoshi Nakada) wrote in #note-9:
> @vo.x With your patch and `--with-multiarch --with-archlibdir='${libdir}'`, not only libruby.so but also all standard library extensions are placed under `${libdir}` without `${arch}`.
> Is this intentional?
On Fedora, we don't use `${arch}` anywhere. We do not expect to mix, e.g., aarch64 with x86_64 on one filesystem. We only mix 32- and 64-bit systems, i.e. i686 together with x86_64, and in that case the 32-bit code is stored in /usr/lib while the 64-bit code is in /usr/lib64.
Strictly speaking, we still have trouble with headers, but we use a small platform-independent wrapper for them. `${arch}` could also be useful for some user-installed gems, but that is a different topic.
----------------------------------------
Bug #8973: Allow to configure archlibdir for multiarch
https://bugs.ruby-lang.org/issues/8973#change-101087
Issue #18598 has been updated by Eregon (Benoit Daloze).
shugo (Shugo Maeda) wrote in #note-4:
> > * Do not use String and e.g. use an Array of byte values or a C extension
>
> I wouldn't like to implement regular expressions on Array.
>
> > * Use Ropes or similar implemented in Ruby, which would avoid extra copying and might not need to use byte offsets at all
>
> I prefer String for the reasons stated above.
The typical approach is to flatten (or convert) the Rope to String before matching (whether the Rope is in Ruby or from the VM).
I think that is good enough for a text editor.
----------------------------------------
Feature #18598: Add String#bytesplice
https://bugs.ruby-lang.org/issues/18598#change-101085
* Author: shugo (Shugo Maeda)
* Status: Closed
* Priority: Normal
----------------------------------------
I withdrew the proposal of String#bytesplice in #13110 because it may cause problems if the specified offset does not land on a character boundary.
But how about raising IndexError in such cases?
```
# encoding: utf-8
s = "あいうえおかきくけこ"
s.bytesplice(9, 6, "xx")
p s #=> "あいうxxかきくけこ"
s.bytesplice(2, 3, "x") #=> offset 2 does not land on character boundary (IndexError)
s.bytesplice(3, 4, "x") #=> offset 7 does not land on character boundary (IndexError)
```
## Pull request
https://github.com/ruby/ruby/pull/5584
## Spec
```
bytesplice(index, length, str) -> string
bytesplice(range, str) -> string
```
Replaces some or all of the content of +self+ with +str+, and returns +str+.
The portion of the string affected is determined using the same criteria as String#byteslice, except that +length+ cannot be omitted.
If the replacement string is not the same length as the text it is replacing, the string will be adjusted accordingly.
The forms that take an Integer will raise an IndexError if the value is out of range; the Range form will raise a RangeError.
If the beginning or ending offset does not land on a character (codepoint) boundary, an IndexError will be raised.
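To make the proposed error behavior concrete, here is a pure-Ruby emulation of it. It is a hypothetical helper, not the implementation in the pull request; it assumes a valid UTF-8 receiver and non-negative integer arguments, and relies on the fact that a byte prefix ending inside a multibyte character is not valid UTF-8.

```ruby
# Hypothetical emulation of the proposed bytesplice(index, length, str).
def bytesplice_sketch!(target, index, length, replacement)
  raise IndexError, "index #{index} out of string" if index > target.bytesize

  finish = [index + length, target.bytesize].min
  [index, finish].each do |offset|
    # A byte prefix that stops inside a multibyte character is invalid UTF-8.
    next if target.byteslice(0, offset).valid_encoding?

    raise IndexError, "offset #{offset} does not land on character boundary"
  end

  tail = target.byteslice(finish, target.bytesize - finish)
  target.replace(target.byteslice(0, index) + replacement + tail)
  replacement # the spec above says the method returns +str+
end

buffer = "あいうえおかきくけこ".dup
bytesplice_sketch!(buffer, 9, 6, "xx")
buffer # => "あいうxxかきくけこ"
```

Both offsets are validated before the buffer is touched, so a failed call leaves the string unchanged.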
## Motivation
In the text editor [Textbringer](https://github.com/shugo/textbringer/pull/31/files), the content of a buffer is represented by a String whose encoding is ASCII-8BIT, and `force_encoding(Encoding::UTF_8)` is called when necessary.
This is because the point (cursor position) and marks are represented by byte offsets for performance, and currently there is no way to modify UTF-8 strings using byte offsets.
If String#bytesplice is introduced, the content of a text buffer can be represented by a UTF-8 string, and force_encoding can be removed: https://github.com/shugo/textbringer/pull/31/files
Issue #16671 has been updated by nobu (Nobuyoshi Nakada).
What we tend to write frequently are the safe navigation operator and indented here-docs.
So I would be happy if it were 2.3 or later.
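For reference, a minimal illustration of the two features mentioned, the safe navigation operator `&.` and the squiggly `<<~` heredoc, both added in Ruby 2.3; the variable names are arbitrary.

```ruby
# Neither construct parses on Ruby 2.2, so build scripts could only use
# them once BASERUBY is 2.3 or later.
config = nil
arch = config&.fetch(:arch) # safe navigation: nil instead of NoMethodError

summary = <<~TEXT
  arch: #{arch.inspect}
  indentation is stripped
TEXT
# summary == "arch: nil\nindentation is stripped\n"
```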
----------------------------------------
Misc #16671: BASERUBY version policy
https://bugs.ruby-lang.org/issues/16671#change-101068
* Author: ko1 (Koichi Sasada)
* Status: Assigned
* Priority: Normal
* Assignee: hsbt (Hiroshi SHIBATA)
----------------------------------------
Ruby 2.7 (MRI) requires Ruby 2.2 or later (== BASERUBY) to build from the repository.
A tarball does not need any installed Ruby.
To build the latest Ruby from the repository, you need to build Ruby 2.2 or later from a tarball first.
Can we define a policy for updating the BASERUBY version?
# Background
Starting with Ruby 2.7, the build uses `ISeq#to_a` (with a specific output format), which requires Ruby 2.2 or later.
The oldest version used by the RubyCI machines was Ruby 2.2, so I decided to make that update.
I should have filed a ticket about this version bump beforehand, sorry.
---
related: https://bugs.ruby-lang.org/issues/16668
Issue #16671 has been updated by hsbt (Hiroshi SHIBATA).
How about the Ruby version provided by the stable Ubuntu release?
* https://packages.ubuntu.com/search?keywords=ruby
* bionic (18.04LTS): 2.5.1, standard EOL date is 2023/04
* focal (20.04LTS): 2.7, standard EOL date is 2025/04
* jammy (22.04LTS): 3.0, standard EOL date is 2027/04
So we can raise the BASERUBY version for Ruby 3.3 to Ruby 2.7 this year.
----------------------------------------
Misc #16671: BASERUBY version policy
https://bugs.ruby-lang.org/issues/16671#change-101061