Issue #19090 has been updated by k0kubun (Takashi Kokubun).
I get what you're saying. My position on this issue is:
* `CGI` is not a good place either unless you're writing a CGI application. ERB also
has `ERB::Escape` now, and I'd say "embedded Ruby escape" is a better module
name than "CGI" for the purpose.
* ERB 4.0.0 was shipped with a new file `erb/util.rb`, which allows you to load only a
couple of escape methods, not loading an extra template engine.
* The way we defined the method was designed by @jeremyevans0 for Erubi. Loading the ERB
template engine would be the last thing the Erubi maintainer would like to do. So, thanks
to his idea, it's possible to load only escape methods.
* `ERB::Util.html_escape` is monkey-patched by Rails and it's been the only official
`html_safe`-compatible HTML escape method for years (while it's been using Erubis or
Erubi). It's the only Rails-official way to do it, and moving it to somewhere else
would be unnecessarily disruptive.
----------------------------------------
Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML
https://bugs.ruby-lang.org/issues/19090#change-100283
* Author: k0kubun (Takashi Kokubun)
* Status: Closed
* Priority: Normal
----------------------------------------
## Proposal
Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing
to be escaped.
More specifically, stop calling this `rb_str_dup`
https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that
nothing needs to be escaped.
## Background
My original implementation
https://github.com/ruby/ruby/pull/1164 was not calling it. The
reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument
object for non-escaped cases is a backward incompatibility because the original
`gsub`-based implementation always returns a new object. As a result, even while many
people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it
ended up having a compromised performance.
## Motivation
The motivation is to improve performance. By just doing so, escaping a pre-allocated
`"string"` becomes 1.34x faster on my machine
https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f.
The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the
result is just embedded in another string, the allocated new object will be just wasted.
It's pretty common that an embedded string fragment doesn't contain any of
`'"&<>` characters. So we should stop wasting that to optimize that
case.
[Bug #11858] wasn't really a use case but just "I think this is backward
incompatibility" based on frozen Hello World. Unlike user input, you usually
don't need to escape your own string literal. It feels like the ticket addressed a
problem that doesn't exist in actual applications. It should have cited existing code
that could be broken by that, and I can't find such code with `gem-codesearch` today.
The only reason to maintain the current behavior would be to allow using a return value of
`CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped
value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse.
Relying on the behavior never makes sense as an "optimization" either because it
makes all other cases (the result is not used as a string buffer) suboptimal.
## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`?
Two reasons:
* The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code
using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't
exist.
* Passing keyword arguments to a C extension is unfortunately slow, and it defeats the
optimization purpose. In core classes, we could use `Primitive` to address that, but this
is a default gem and we can't use that.
* We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd
spoil the readability for maintaining an invalid use case.
## Why not a new method?
It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it
already, I can't think of a good name for it. And again, not calling it `escapeHTML`
or `escape_html` would spoil the readability for no valid reason.
--
https://bugs.ruby-lang.org/