Issue #19155 has been reported by colorbox (box color).
----------------------------------------
Misc #19155: documentation of Pathname#join with absolute path
https://bugs.ruby-lang.org/issues/19155
* Author: colorbox (box color)
* Status: Open
* Priority: Normal
----------------------------------------
Pathname#join ignores previous directory name before absolute path
Is this intentional?
```irb
irb(main):002:0> require 'pathname'
=> true
irb(main):003:0> Pathname('/foo').join('bar', 'baz')
=> #<Pathname:/foo/bar/baz>
irb(main):004:0> Pathname('/foo').join('bar', '/baz')
=> #<Pathname:/baz>
irb(main):005:0>
➜✗ ruby -v
ruby 3.0.4p208 (2022-04-12 revision 3fa771dded) [x86_64-darwin19]
```
I found that this behavior is intentional from test code but I cannot found reason.
https://github.com/ruby/pathname/blob/master/test/pathname/test_pathname.rb…
There was no description of this behavior in the documentation.
--
https://bugs.ruby-lang.org/
Issue #4040 has been updated by jeremyevans0 (Jeremy Evans).
I submitted a pull request to fix this: https://github.com/ruby/ruby/pull/6816
----------------------------------------
Bug #4040: SystemStackError with Hash[*a] for Large _a_
https://bugs.ruby-lang.org/issues/4040#change-100285
* Author: runpaint (Run Paint Run Run)
* Status: Assigned
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* ruby -v: ruby 1.9.3dev (2010-11-09 trunk 29737) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
=begin
I've been hesitating over whether to file a ticket about this, so please feel free to close if I've made the wrong choice.
I often use Hash[*array.flatten] in IRB to convert arrays of arrays into hashes. Today I noticed that if the array is big enough, this would raise a SystemStackError. Puzzled, I looked deeper. I assumed I was hitting the maximum number of arguments a method's argc can hold, but realised that the minimum size of the array needed to trigger this exception differed depending on whether I used IRB or not. So, presumably this is indeed exhausting the stack...
In IRB, the following is the minimal reproduction of this problem:
Hash[*130648.times.map{ 1 }]; true
I haven't looked for the minimum value needed with `ruby -e`, but the following reproduces:
ruby -e 'Hash[*1380888.times.map{ 1 }]'
I suppose this isn't technically a bug, but maybe it offers another argument for either #666 or an extension of #3131.
=end
--
https://bugs.ruby-lang.org/
Issue #19030 has been updated by hsbt (Hiroshi SHIBATA).
I and @shugo have been migrated all of mailing-lists to google groups and mailmanlists.net.
```
ruby-core - Ruby developers
ruby-dev - Ruby developers (Japanese)
ruby-doc - Ruby documentation
ruby-list - Ruby users (Japanese)
ruby-talk - Ruby users
```
Unfortunately, above lists couldn't migrate to google groups. So, I did choose to stay mailman provided by mailmanlists.net.
mailmanlists.net provide the official archive built by [hyperkitty](https://gitlab.com/mailman/hyperkitty), (demo site is [here](https://lists.mailman3.org/mailman3/lists/)). I'm considering to move it from http://blade.nagaokaut.ac.jp/.
And now, we will get the error notification from `notifications(a)rubytalk.discoursemail.com` when we send a message to `ruby-talk`. I shared this status to @sam.saffron .
I'll close this ticket after resolving discourse integration. Thanks all.
----------------------------------------
Misc #19030: [ANN] Migrate lists.ruby-lang.org to Google Groups
https://bugs.ruby-lang.org/issues/19030#change-100284
* Author: hsbt (Hiroshi SHIBATA)
* Status: Assigned
* Priority: Normal
* Assignee: hsbt (Hiroshi SHIBATA)
----------------------------------------
Our mailing-list server that is `lists.ruby-lang.org` is too old. And it's difficult to replace new server on AWS because building mail-service on AWS has a lot of limitations. I and @shugo decided to migrate lists.ruby-lang.org to Google Groups.
* In Nov-Dec 2022, we migrate the current list member to Google Groups of our google workspace.
* I hope to migrate to the last list-id, But I'm not sure we can do that.
* What will be used as an archive viewer has yet to be TBD status.
* blade is still down.
* I prefer plain text viewer like blade instead of google groups. Should we build it?
I will update this plan in this thread.
--
https://bugs.ruby-lang.org/
Issue #19090 has been updated by k0kubun (Takashi Kokubun).
I get what you're saying. My position on this issue is:
* `CGI` is not a good place either unless you're writing a CGI application. ERB also has `ERB::Escape` now, and I'd say "embedded Ruby escape" is a better module name than "CGI" for the purpose.
* ERB 4.0.0 was shipped with a new file `erb/util.rb`, which allows you to load only a couple of escape methods, not loading an extra template engine.
* The way we defined the method was designed by @jeremyevans0 for Erubi. Loading the ERB template engine would be the last thing the Erubi maintainer would like to do. So, thanks to his idea, it's possible to load only escape methods.
* `ERB::Util.html_escape` is monkey-patched by Rails and it's been the only official `html_safe`-compatible HTML escape method for years (while it's been using Erubis or Erubi). It's the only Rails-official way to do it, and moving it to somewhere else would be unnecessarily disruptive.
----------------------------------------
Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML
https://bugs.ruby-lang.org/issues/19090#change-100283
* Author: k0kubun (Takashi Kokubun)
* Status: Closed
* Priority: Normal
----------------------------------------
## Proposal
Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing to be escaped.
More specifically, stop calling this `rb_str_dup` https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped.
## Background
My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original `gsub`-based implementation always returns a new object. As a result, even while many people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it ended up having a compromised performance.
## Motivation
The motivation is to improve performance. By just doing so, escaping a pre-allocated `"string"` becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f.
The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of `'"&<>` characters. So we should stop wasting that to optimize that case.
[Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with `gem-codesearch` today.
The only reason to maintain the current behavior would be to allow using a return value of `CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal.
## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`?
Two reasons:
* The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't exist.
* Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use `Primitive` to address that, but this is a default gem and we can't use that.
* We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd spoil the readability for maintaining an invalid use case.
## Why not a new method?
It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it already, I can't think of a good name for it. And again, not calling it `escapeHTML` or `escape_html` would spoil the readability for no valid reason.
--
https://bugs.ruby-lang.org/
Issue #19090 has been updated by Eregon (Benoit Daloze).
Right, I forgot CGI is a default gem too.
I think it seems cleaner for other template engines (e.g. haml, slim, etc) to depend (as in require "cgi") on CGI vs depending on ERB, i.e. CGI feels smaller/like a subset of ERB. In other words, it doesn't seem ideal for other template engines to depend on the ERB template engine just for escaping (although practically speaking there is no issue, just from a design perspective).
----------------------------------------
Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML
https://bugs.ruby-lang.org/issues/19090#change-100282
* Author: k0kubun (Takashi Kokubun)
* Status: Closed
* Priority: Normal
----------------------------------------
## Proposal
Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing to be escaped.
More specifically, stop calling this `rb_str_dup` https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped.
## Background
My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original `gsub`-based implementation always returns a new object. As a result, even while many people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it ended up having a compromised performance.
## Motivation
The motivation is to improve performance. By just doing so, escaping a pre-allocated `"string"` becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f.
The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of `'"&<>` characters. So we should stop wasting that to optimize that case.
[Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with `gem-codesearch` today.
The only reason to maintain the current behavior would be to allow using a return value of `CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal.
## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`?
Two reasons:
* The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't exist.
* Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use `Primitive` to address that, but this is a default gem and we can't use that.
* We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd spoil the readability for maintaining an invalid use case.
## Why not a new method?
It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it already, I can't think of a good name for it. And again, not calling it `escapeHTML` or `escape_html` would spoil the readability for no valid reason.
--
https://bugs.ruby-lang.org/
Issue #18899 has been updated by javanthropus (Jeremy Bopp).
Please also see #18995 for another example of the intricate implementation behaving unexpectedly. During my own investigation, I discovered that using `"-"` for the internal encoding name is silently ignored. According to the comments in the code, `"-"` is used to indicate no conversion, but it's completely undocumented for the method. If you use `"-"` for the external encoding name, you get similarly divergent behavior as reported for this issue if you pass `"-:utf-8"` vs. `"-"`, `"utf-8"`.
----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100280
* Author: javanthropus (Jeremy Bopp)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
`IO#set_encoding` behaves differently when processing a single String argument than it does when processing 2 arguments (whether Strings or Encodings) in the case where the external encoding is being set to binary and the internal encoding is being set to any other encoding.
This script demonstrates the resulting values of the external and internal encodings for an IO instance given different ways to equivalently call `#set_encoding`:
```ruby
#!/usr/bin/env ruby
def show(io, args)
printf(
"args: %-50s external encoding: %-25s internal encoding: %-25s\n",
args.inspect,
io.external_encoding.inspect,
io.internal_encoding.inspect
)
end
File.open('/dev/null') do |f|
args = ['binary:utf-8']
f.set_encoding(*args)
show(f, args)
args = ['binary', 'utf-8']
f.set_encoding(*args)
show(f, args)
args = [Encoding.find('binary'), Encoding.find('utf-8')]
f.set_encoding(*args)
show(f, args)
end
```
This behavior is the same from Ruby 2.7.0 to 3.1.2.
--
https://bugs.ruby-lang.org/
Issue #19090 has been updated by k0kubun (Takashi Kokubun).
I filed a PR for truffleruby https://github.com/ruby/erb/pull/39.
> I think it is important to keep the optimized HTML escape in core/stdlib (e.g., in CGI)
I didn't get what you mean in this part. CGI and ERB are both default gems. There's no distinction in its "coreness" between them.
----------------------------------------
Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML
https://bugs.ruby-lang.org/issues/19090#change-100279
* Author: k0kubun (Takashi Kokubun)
* Status: Closed
* Priority: Normal
----------------------------------------
## Proposal
Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing to be escaped.
More specifically, stop calling this `rb_str_dup` https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped.
## Background
My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original `gsub`-based implementation always returns a new object. As a result, even while many people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it ended up having a compromised performance.
## Motivation
The motivation is to improve performance. By just doing so, escaping a pre-allocated `"string"` becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f.
The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of `'"&<>` characters. So we should stop wasting that to optimize that case.
[Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with `gem-codesearch` today.
The only reason to maintain the current behavior would be to allow using a return value of `CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal.
## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`?
Two reasons:
* The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't exist.
* Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use `Primitive` to address that, but this is a default gem and we can't use that.
* We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd spoil the readability for maintaining an invalid use case.
## Why not a new method?
It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it already, I can't think of a good name for it. And again, not calling it `escapeHTML` or `escape_html` would spoil the readability for no valid reason.
--
https://bugs.ruby-lang.org/
Issue #19102 has been updated by k0kubun (Takashi Kokubun).
Would you mind reviewing https://github.com/ruby/erb/pull/39? I skipped this ticket's change for truffleruby in the PR.
----------------------------------------
Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines
https://bugs.ruby-lang.org/issues/19102#change-100278
* Author: k0kubun (Takashi Kokubun)
* Status: Closed
* Priority: Normal
----------------------------------------
## Proposal
Change the behavior of `ERB::Util.html_escape` in the following two parts:
1. Skip converting an argument with `#to_s` if the argument is already a `T_STRING`.
2. Do not allocate and return a new String when nothing needs to be escaped.
## Background
The current `ERB::Util.html_escape` is implemented as `CGI.escapeHTML(s.to_s)`. So the performance is almost equal to `CGI.escapeHTML` except for the `to_s` call. Because it's common to embed non-String expressions in template engines, a template engine typically calls `to_s` to convert non-String expressions to a String before escaping it, which is why the difference exists. Proposal (1) is useful for optimizing the case that something that's already a `String` is embedded. We ignore the extreme case that `String#to_s` is weirdly monkey-patched.
As to proposal (2), my original implementation of `CGI.escapeHTML` https://github.com/ruby/ruby/pull/1164 was not calling `rb_str_dup` for that case. However, because [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility with the old `gsub`-based implementation, we added the unneeded `rb_str_dup` call and the performance for that case has been compromised. This behavior is completely unnecessary for template engines. On the other hand, because `ERB::Util.html_escape` is a helper for ERB, we should not need to consider any backward compatibility that is not relevant to ERB or any template engines. So proposal (2) should be possible in `ERB::Util.html_escape` unlike `CGI.escapeHTML`.
## Benchmark
Implementation: https://github.com/ruby/erb/pull/27
```rb
require 'benchmark/ips'
require 'erb'
class << ERB::Util
def html_escape_old(s)
CGI.escapeHTML(s.to_s)
end
end
Benchmark.ips do |x|
s = 'hello world'
x.report('before') { ERB::Util.html_escape_old(s) }
x.report('after') { ERB::Util.html_escape(s) }
x.compare!
end
```
```
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
before 1.066M i/100ms
after 1.879M i/100ms
Calculating -------------------------------------
before 10.615M (± 0.3%) i/s - 53.320M in 5.023083s
after 18.742M (± 0.4%) i/s - 93.929M in 5.011847s
Comparison:
after: 18741747.6 i/s
before: 10615137.1 i/s - 1.77x (± 0.00) slower
```
--
https://bugs.ruby-lang.org/
Issue #19090 has been updated by Eregon (Benoit Daloze).
I did not expect `rb_str_dup()` is so costly on CRuby, I guess the allocation is slow and of course CRuby can't escape-analyze it.
I think it is important to keep the optimized HTML escape in core/stdlib (e.g., in CGI), as the fastest way is implementation-dependent.
See https://bugs.ruby-lang.org/issues/19102#note-4
I think we should add an optional argument to avoid copying the string when unchanged, that's easy and avoids yet another place/method with that escaping logic.
----------------------------------------
Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML
https://bugs.ruby-lang.org/issues/19090#change-100276
* Author: k0kubun (Takashi Kokubun)
* Status: Closed
* Priority: Normal
----------------------------------------
## Proposal
Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing to be escaped.
More specifically, stop calling this `rb_str_dup` https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped.
## Background
My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original `gsub`-based implementation always returns a new object. As a result, even while many people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it ended up having a compromised performance.
## Motivation
The motivation is to improve performance. By just doing so, escaping a pre-allocated `"string"` becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f.
The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of `'"&<>` characters. So we should stop wasting that to optimize that case.
[Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with `gem-codesearch` today.
The only reason to maintain the current behavior would be to allow using a return value of `CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal.
## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`?
Two reasons:
* The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't exist.
* Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use `Primitive` to address that, but this is a default gem and we can't use that.
* We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd spoil the readability for maintaining an invalid use case.
## Why not a new method?
It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it already, I can't think of a good name for it. And again, not calling it `escapeHTML` or `escape_html` would spoil the readability for no valid reason.
--
https://bugs.ruby-lang.org/
Issue #19102 has been updated by Eregon (Benoit Daloze).
I think it is unfortunate to add a C extension for ERB for that, ERB was always pure-Ruby and that was nice.
Also the C extension is slower on TruffleRuby, the Regexp is actually JIT-compiled and can use vectorization, unlike that C code. Also part of it is RSTRING_PTR() basically forces a copy from managed memory (byte[]) to native memory (char*) on TruffleRuby.
```
truffleruby 23.0.0-dev-57e53f8a, like ruby 3.1.2, GraalVM CE Native [x86_64-linux]
CGI.escapeHTML 31.985M (± 1.2%) i/s - 160.093M in 5.006001s
ERB::Util.html_escape 7.427M (± 3.3%) i/s - 37.162M in 5.009721s
```
and CRuby 3.1 is:
```
ERB::Util.html_escape 14.551M (± 0.8%) i/s - 73.308M in 5.038335s
CGI.escapeHTML 10.065M (± 0.6%) i/s - 51.054M in 5.072629s
```
Given those results, could you build the C extension only for CRuby?
I think it would also be much nicer to keep the optimized HTML escape in CGI which is in stdlib, so it can be used by all templates engines.
In #19090 I did not expect `rb_str_dup()` is so costly on CRuby, I guess the allocation is slow and of course CRuby can't escape-analyze it.
A new method in CGI sounds best, or probably nicer an optional argument whether to always return a copy.
It seems tricky to change the existing method to return a mutable string as-is, I guess the person who found #11858 actually in that incompatibility.
Ruby users seem to assume in general if a core/stdlib method returns a string either it's frozen or they are free to modify it, but here it would actually also mutate the original String passed to `CGI.escapeHTML`.
`String#to_s` really sounds like something every Ruby JIT/VM should be able to trivially optimize.
So that I think should be an insignificant cost, even on CRuby (and if it isn't it should be easy to fix).
----------------------------------------
Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines
https://bugs.ruby-lang.org/issues/19102#change-100275
* Author: k0kubun (Takashi Kokubun)
* Status: Closed
* Priority: Normal
----------------------------------------
## Proposal
Change the behavior of `ERB::Util.html_escape` in the following two parts:
1. Skip converting an argument with `#to_s` if the argument is already a `T_STRING`.
2. Do not allocate and return a new String when nothing needs to be escaped.
## Background
The current `ERB::Util.html_escape` is implemented as `CGI.escapeHTML(s.to_s)`. So the performance is almost equal to `CGI.escapeHTML` except for the `to_s` call. Because it's common to embed non-String expressions in template engines, a template engine typically calls `to_s` to convert non-String expressions to a String before escaping it, which is why the difference exists. Proposal (1) is useful for optimizing the case that something that's already a `String` is embedded. We ignore the extreme case that `String#to_s` is weirdly monkey-patched.
As to proposal (2), my original implementation of `CGI.escapeHTML` https://github.com/ruby/ruby/pull/1164 was not calling `rb_str_dup` for that case. However, because [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility with the old `gsub`-based implementation, we added the unneeded `rb_str_dup` call and the performance for that case has been compromised. This behavior is completely unnecessary for template engines. On the other hand, because `ERB::Util.html_escape` is a helper for ERB, we should not need to consider any backward compatibility that is not relevant to ERB or any template engines. So proposal (2) should be possible in `ERB::Util.html_escape` unlike `CGI.escapeHTML`.
## Benchmark
Implementation: https://github.com/ruby/erb/pull/27
```rb
require 'benchmark/ips'
require 'erb'
class << ERB::Util
def html_escape_old(s)
CGI.escapeHTML(s.to_s)
end
end
Benchmark.ips do |x|
s = 'hello world'
x.report('before') { ERB::Util.html_escape_old(s) }
x.report('after') { ERB::Util.html_escape(s) }
x.compare!
end
```
```
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
before 1.066M i/100ms
after 1.879M i/100ms
Calculating -------------------------------------
before 10.615M (± 0.3%) i/s - 53.320M in 5.023083s
after 18.742M (± 0.4%) i/s - 93.929M in 5.011847s
Comparison:
after: 18741747.6 i/s
before: 10615137.1 i/s - 1.77x (± 0.00) slower
```
--
https://bugs.ruby-lang.org/