[ruby-core:111024] [Ruby master Feature#19102] Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines

Issue #19102 has been updated by k0kubun (Takashi Kokubun). Would you mind reviewing https://github.com/ruby/erb/pull/39? I skipped this ticket's change for truffleruby in the PR. ---------------------------------------- Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines https://bugs.ruby-lang.org/issues/19102#change-100278 * Author: k0kubun (Takashi Kokubun) * Status: Closed * Priority: Normal ---------------------------------------- ## Proposal Change the behavior of `ERB::Util.html_escape` in the following two parts: 1. Skip converting an argument with `#to_s` if the argument is already a `T_STRING`. 2. Do not allocate and return a new String when nothing needs to be escaped. ## Background The current `ERB::Util.html_escape` is implemented as `CGI.escapeHTML(s.to_s)`. So the performance is almost equal to `CGI.escapeHTML` except for the `to_s` call. Because it's common to embed non-String expressions in template engines, a template engine typically calls `to_s` to convert non-String expressions to a String before escaping it, which is why the difference exists. Proposal (1) is useful for optimizing the case that something that's already a `String` is embedded. We ignore the extreme case that `String#to_s` is weirdly monkey-patched. As to proposal (2), my original implementation of `CGI.escapeHTML` https://github.com/ruby/ruby/pull/1164 was not calling `rb_str_dup` for that case. However, because [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility with the old `gsub`-based implementation, we added the unneeded `rb_str_dup` call and the performance for that case has been compromised. This behavior is completely unnecessary for template engines. On the other hand, because `ERB::Util.html_escape` is a helper for ERB, we should not need to consider any backward compatibility that is not relevant to ERB or any template engines. So proposal (2) should be possible in `ERB::Util.html_escape` unlike `CGI.escapeHTML`. ## Benchmark Implementation: https://github.com/ruby/erb/pull/27 ```rb require 'benchmark/ips' require 'erb' class << ERB::Util def html_escape_old(s) CGI.escapeHTML(s.to_s) end end Benchmark.ips do |x| s = 'hello world' x.report('before') { ERB::Util.html_escape_old(s) } x.report('after') { ERB::Util.html_escape(s) } x.compare! end ``` ``` ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux] Warming up -------------------------------------- before 1.066M i/100ms after 1.879M i/100ms Calculating ------------------------------------- before 10.615M (± 0.3%) i/s - 53.320M in 5.023083s after 18.742M (± 0.4%) i/s - 93.929M in 5.011847s Comparison: after: 18741747.6 i/s before: 10615137.1 i/s - 1.77x (± 0.00) slower ``` -- https://bugs.ruby-lang.org/
participants (1)
-
k0kubun (Takashi Kokubun)