- ruby-core - ml.ruby-lang.org

[ruby-core:111035] [Ruby master Misc#19155] documentation of Pathname#join with absolute path

by colorbox (box color)

Issue #19155 has been reported by colorbox (box color). ---------------------------------------- Misc #19155: documentation of Pathname#join with absolute path https://bugs.ruby-lang.org/issues/19155 * Author: colorbox (box color) * Status: Open * Priority: Normal ---------------------------------------- Pathname#join ignores previous directory name before absolute path Is this intentional? ```irb irb(main):002:0> require 'pathname' => true irb(main):003:0> Pathname('/foo').join('bar', 'baz') => #<Pathname:/foo/bar/baz> irb(main):004:0> Pathname('/foo').join('bar', '/baz') => #<Pathname:/baz> irb(main):005:0> ➜✗ ruby -v ruby 3.0.4p208 (2022-04-12 revision 3fa771dded) [x86_64-darwin19] ``` I found that this behavior is intentional from test code but I cannot found reason. https://github.com/ruby/pathname/blob/master/test/pathname/test_pathname.rb… There was no description of this behavior in the documentation. -- https://bugs.ruby-lang.org/

1 year, 5 months

2
1
0 0

[ruby-core:111032] [Ruby master Bug#4040] SystemStackError with Hash[*a] for Large _a_

by jeremyevans0 (Jeremy Evans)

Issue #4040 has been updated by jeremyevans0 (Jeremy Evans). I submitted a pull request to fix this: https://github.com/ruby/ruby/pull/6816 ---------------------------------------- Bug #4040: SystemStackError with Hash[*a] for Large _a_ https://bugs.ruby-lang.org/issues/4040#change-100285 * Author: runpaint (Run Paint Run Run) * Status: Assigned * Priority: Normal * Assignee: ko1 (Koichi Sasada) * ruby -v: ruby 1.9.3dev (2010-11-09 trunk 29737) [x86_64-linux] * Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN ---------------------------------------- =begin I've been hesitating over whether to file a ticket about this, so please feel free to close if I've made the wrong choice. I often use Hash[*array.flatten] in IRB to convert arrays of arrays into hashes. Today I noticed that if the array is big enough, this would raise a SystemStackError. Puzzled, I looked deeper. I assumed I was hitting the maximum number of arguments a method's argc can hold, but realised that the minimum size of the array needed to trigger this exception differed depending on whether I used IRB or not. So, presumably this is indeed exhausting the stack... In IRB, the following is the minimal reproduction of this problem: Hash[*130648.times.map{ 1 }]; true I haven't looked for the minimum value needed with `ruby -e`, but the following reproduces: ruby -e 'Hash[*1380888.times.map{ 1 }]' I suppose this isn't technically a bug, but maybe it offers another argument for either #666 or an extension of #3131. =end -- https://bugs.ruby-lang.org/

1 year, 5 months

1
0
0 0

[ruby-core:111031] [Ruby master Misc#19030] [ANN] Migrate lists.ruby-lang.org to Google Groups

by hsbt (Hiroshi SHIBATA)

Issue #19030 has been updated by hsbt (Hiroshi SHIBATA). I and @shugo have been migrated all of mailing-lists to google groups and mailmanlists.net. ``` ruby-core - Ruby developers ruby-dev - Ruby developers (Japanese) ruby-doc - Ruby documentation ruby-list - Ruby users (Japanese) ruby-talk - Ruby users ``` Unfortunately, above lists couldn't migrate to google groups. So, I did choose to stay mailman provided by mailmanlists.net. mailmanlists.net provide the official archive built by [hyperkitty](https://gitlab.com/mailman/hyperkitty), (demo site is [here](https://lists.mailman3.org/mailman3/lists/)). I'm considering to move it from http://blade.nagaokaut.ac.jp/. And now, we will get the error notification from `notifications(a)rubytalk.discoursemail.com` when we send a message to `ruby-talk`. I shared this status to @sam.saffron . I'll close this ticket after resolving discourse integration. Thanks all. ---------------------------------------- Misc #19030: [ANN] Migrate lists.ruby-lang.org to Google Groups https://bugs.ruby-lang.org/issues/19030#change-100284 * Author: hsbt (Hiroshi SHIBATA) * Status: Assigned * Priority: Normal * Assignee: hsbt (Hiroshi SHIBATA) ---------------------------------------- Our mailing-list server that is `lists.ruby-lang.org` is too old. And it's difficult to replace new server on AWS because building mail-service on AWS has a lot of limitations. I and @shugo decided to migrate lists.ruby-lang.org to Google Groups. * In Nov-Dec 2022, we migrate the current list member to Google Groups of our google workspace. * I hope to migrate to the last list-id, But I'm not sure we can do that. * What will be used as an archive viewer has yet to be TBD status. * blade is still down. * I prefer plain text viewer like blade instead of google groups. Should we build it? I will update this plan in this thread. -- https://bugs.ruby-lang.org/

1 year, 5 months

1
0
0 0

[ruby-core:111030] [Ruby master Feature#19090] Do not duplicate an unescaped string in CGI.escapeHTML

by k0kubun (Takashi Kokubun)

Issue #19090 has been updated by k0kubun (Takashi Kokubun). I get what you're saying. My position on this issue is: * `CGI` is not a good place either unless you're writing a CGI application. ERB also has `ERB::Escape` now, and I'd say "embedded Ruby escape" is a better module name than "CGI" for the purpose. * ERB 4.0.0 was shipped with a new file `erb/util.rb`, which allows you to load only a couple of escape methods, not loading an extra template engine. * The way we defined the method was designed by @jeremyevans0 for Erubi. Loading the ERB template engine would be the last thing the Erubi maintainer would like to do. So, thanks to his idea, it's possible to load only escape methods. * `ERB::Util.html_escape` is monkey-patched by Rails and it's been the only official `html_safe`-compatible HTML escape method for years (while it's been using Erubis or Erubi). It's the only Rails-official way to do it, and moving it to somewhere else would be unnecessarily disruptive. ---------------------------------------- Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML https://bugs.ruby-lang.org/issues/19090#change-100283 * Author: k0kubun (Takashi Kokubun) * Status: Closed * Priority: Normal ---------------------------------------- ## Proposal Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing to be escaped. More specifically, stop calling this `rb_str_dup` https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped. ## Background My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original `gsub`-based implementation always returns a new object. As a result, even while many people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it ended up having a compromised performance. ## Motivation The motivation is to improve performance. By just doing so, escaping a pre-allocated `"string"` becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f. The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of `'"&<>` characters. So we should stop wasting that to optimize that case. [Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with `gem-codesearch` today. The only reason to maintain the current behavior would be to allow using a return value of `CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal. ## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`? Two reasons: * The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't exist. * Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use `Primitive` to address that, but this is a default gem and we can't use that. * We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd spoil the readability for maintaining an invalid use case. ## Why not a new method? It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it already, I can't think of a good name for it. And again, not calling it `escapeHTML` or `escape_html` would spoil the readability for no valid reason. -- https://bugs.ruby-lang.org/

1 year, 5 months

1
0
0 0

[ruby-core:111029] [Ruby master Feature#19090] Do not duplicate an unescaped string in CGI.escapeHTML

by Eregon (Benoit Daloze)

Issue #19090 has been updated by Eregon (Benoit Daloze). Right, I forgot CGI is a default gem too. I think it seems cleaner for other template engines (e.g. haml, slim, etc) to depend (as in require "cgi") on CGI vs depending on ERB, i.e. CGI feels smaller/like a subset of ERB. In other words, it doesn't seem ideal for other template engines to depend on the ERB template engine just for escaping (although practically speaking there is no issue, just from a design perspective). ---------------------------------------- Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML https://bugs.ruby-lang.org/issues/19090#change-100282 * Author: k0kubun (Takashi Kokubun) * Status: Closed * Priority: Normal ---------------------------------------- ## Proposal Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing to be escaped. More specifically, stop calling this `rb_str_dup` https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped. ## Background My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original `gsub`-based implementation always returns a new object. As a result, even while many people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it ended up having a compromised performance. ## Motivation The motivation is to improve performance. By just doing so, escaping a pre-allocated `"string"` becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f. The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of `'"&<>` characters. So we should stop wasting that to optimize that case. [Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with `gem-codesearch` today. The only reason to maintain the current behavior would be to allow using a return value of `CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal. ## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`? Two reasons: * The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't exist. * Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use `Primitive` to address that, but this is a default gem and we can't use that. * We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd spoil the readability for maintaining an invalid use case. ## Why not a new method? It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it already, I can't think of a good name for it. And again, not calling it `escapeHTML` or `escape_html` would spoil the readability for no valid reason. -- https://bugs.ruby-lang.org/

1 year, 5 months

1
0
0 0

[ruby-core:111026] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding

by javanthropus (Jeremy Bopp)

Issue #18899 has been updated by javanthropus (Jeremy Bopp). Please also see #18995 for another example of the intricate implementation behaving unexpectedly. During my own investigation, I discovered that using `"-"` for the internal encoding name is silently ignored. According to the comments in the code, `"-"` is used to indicate no conversion, but it's completely undocumented for the method. If you use `"-"` for the external encoding name, you get similarly divergent behavior as reported for this issue if you pass `"-:utf-8"` vs. `"-"`, `"utf-8"`. ---------------------------------------- Bug #18899: Inconsistent argument handling in IO#set_encoding https://bugs.ruby-lang.org/issues/18899#change-100280 * Author: javanthropus (Jeremy Bopp) * Status: Open * Priority: Normal * ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN ---------------------------------------- `IO#set_encoding` behaves differently when processing a single String argument than it does when processing 2 arguments (whether Strings or Encodings) in the case where the external encoding is being set to binary and the internal encoding is being set to any other encoding. This script demonstrates the resulting values of the external and internal encodings for an IO instance given different ways to equivalently call `#set_encoding`: ```ruby #!/usr/bin/env ruby def show(io, args) printf( "args: %-50s external encoding: %-25s internal encoding: %-25s\n", args.inspect, io.external_encoding.inspect, io.internal_encoding.inspect ) end File.open('/dev/null') do |f| args = ['binary:utf-8'] f.set_encoding(*args) show(f, args) args = ['binary', 'utf-8'] f.set_encoding(*args) show(f, args) args = [Encoding.find('binary'), Encoding.find('utf-8')] f.set_encoding(*args) show(f, args) end ``` This behavior is the same from Ruby 2.7.0 to 3.1.2. -- https://bugs.ruby-lang.org/

1 year, 5 months

1
0
0 0

[ruby-core:111025] [Ruby master Feature#19090] Do not duplicate an unescaped string in CGI.escapeHTML

by k0kubun (Takashi Kokubun)

Issue #19090 has been updated by k0kubun (Takashi Kokubun). I filed a PR for truffleruby https://github.com/ruby/erb/pull/39. > I think it is important to keep the optimized HTML escape in core/stdlib (e.g., in CGI) I didn't get what you mean in this part. CGI and ERB are both default gems. There's no distinction in its "coreness" between them. ---------------------------------------- Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML https://bugs.ruby-lang.org/issues/19090#change-100279 * Author: k0kubun (Takashi Kokubun) * Status: Closed * Priority: Normal ---------------------------------------- ## Proposal Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing to be escaped. More specifically, stop calling this `rb_str_dup` https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped. ## Background My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original `gsub`-based implementation always returns a new object. As a result, even while many people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it ended up having a compromised performance. ## Motivation The motivation is to improve performance. By just doing so, escaping a pre-allocated `"string"` becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f. The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of `'"&<>` characters. So we should stop wasting that to optimize that case. [Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with `gem-codesearch` today. The only reason to maintain the current behavior would be to allow using a return value of `CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal. ## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`? Two reasons: * The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't exist. * Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use `Primitive` to address that, but this is a default gem and we can't use that. * We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd spoil the readability for maintaining an invalid use case. ## Why not a new method? It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it already, I can't think of a good name for it. And again, not calling it `escapeHTML` or `escape_html` would spoil the readability for no valid reason. -- https://bugs.ruby-lang.org/

1 year, 5 months

1
0
0 0

[ruby-core:111024] [Ruby master Feature#19102] Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines

by k0kubun (Takashi Kokubun)

Issue #19102 has been updated by k0kubun (Takashi Kokubun). Would you mind reviewing https://github.com/ruby/erb/pull/39? I skipped this ticket's change for truffleruby in the PR. ---------------------------------------- Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines https://bugs.ruby-lang.org/issues/19102#change-100278 * Author: k0kubun (Takashi Kokubun) * Status: Closed * Priority: Normal ---------------------------------------- ## Proposal Change the behavior of `ERB::Util.html_escape` in the following two parts: 1. Skip converting an argument with `#to_s` if the argument is already a `T_STRING`. 2. Do not allocate and return a new String when nothing needs to be escaped. ## Background The current `ERB::Util.html_escape` is implemented as `CGI.escapeHTML(s.to_s)`. So the performance is almost equal to `CGI.escapeHTML` except for the `to_s` call. Because it's common to embed non-String expressions in template engines, a template engine typically calls `to_s` to convert non-String expressions to a String before escaping it, which is why the difference exists. Proposal (1) is useful for optimizing the case that something that's already a `String` is embedded. We ignore the extreme case that `String#to_s` is weirdly monkey-patched. As to proposal (2), my original implementation of `CGI.escapeHTML` https://github.com/ruby/ruby/pull/1164 was not calling `rb_str_dup` for that case. However, because [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility with the old `gsub`-based implementation, we added the unneeded `rb_str_dup` call and the performance for that case has been compromised. This behavior is completely unnecessary for template engines. On the other hand, because `ERB::Util.html_escape` is a helper for ERB, we should not need to consider any backward compatibility that is not relevant to ERB or any template engines. So proposal (2) should be possible in `ERB::Util.html_escape` unlike `CGI.escapeHTML`. ## Benchmark Implementation: https://github.com/ruby/erb/pull/27 ```rb require 'benchmark/ips' require 'erb' class << ERB::Util def html_escape_old(s) CGI.escapeHTML(s.to_s) end end Benchmark.ips do |x| s = 'hello world' x.report('before') { ERB::Util.html_escape_old(s) } x.report('after') { ERB::Util.html_escape(s) } x.compare! end ``` ``` ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux] Warming up -------------------------------------- before 1.066M i/100ms after 1.879M i/100ms Calculating ------------------------------------- before 10.615M (± 0.3%) i/s - 53.320M in 5.023083s after 18.742M (± 0.4%) i/s - 93.929M in 5.011847s Comparison: after: 18741747.6 i/s before: 10615137.1 i/s - 1.77x (± 0.00) slower ``` -- https://bugs.ruby-lang.org/

1 year, 5 months

1
0
0 0

[ruby-core:111021] [Ruby master Feature#19090] Do not duplicate an unescaped string in CGI.escapeHTML

by Eregon (Benoit Daloze)

Issue #19090 has been updated by Eregon (Benoit Daloze). I did not expect `rb_str_dup()` is so costly on CRuby, I guess the allocation is slow and of course CRuby can't escape-analyze it. I think it is important to keep the optimized HTML escape in core/stdlib (e.g., in CGI), as the fastest way is implementation-dependent. See https://bugs.ruby-lang.org/issues/19102#note-4 I think we should add an optional argument to avoid copying the string when unchanged, that's easy and avoids yet another place/method with that escaping logic. ---------------------------------------- Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML https://bugs.ruby-lang.org/issues/19090#change-100276 * Author: k0kubun (Takashi Kokubun) * Status: Closed * Priority: Normal ---------------------------------------- ## Proposal Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing to be escaped. More specifically, stop calling this `rb_str_dup` https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped. ## Background My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original `gsub`-based implementation always returns a new object. As a result, even while many people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it ended up having a compromised performance. ## Motivation The motivation is to improve performance. By just doing so, escaping a pre-allocated `"string"` becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f. The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of `'"&<>` characters. So we should stop wasting that to optimize that case. [Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with `gem-codesearch` today. The only reason to maintain the current behavior would be to allow using a return value of `CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal. ## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`? Two reasons: * The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't exist. * Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use `Primitive` to address that, but this is a default gem and we can't use that. * We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd spoil the readability for maintaining an invalid use case. ## Why not a new method? It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it already, I can't think of a good name for it. And again, not calling it `escapeHTML` or `escape_html` would spoil the readability for no valid reason. -- https://bugs.ruby-lang.org/

1 year, 5 months

1
0
0 0

[ruby-core:111020] [Ruby master Feature#19102] Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines

by Eregon (Benoit Daloze)

Issue #19102 has been updated by Eregon (Benoit Daloze). I think it is unfortunate to add a C extension for ERB for that, ERB was always pure-Ruby and that was nice. Also the C extension is slower on TruffleRuby, the Regexp is actually JIT-compiled and can use vectorization, unlike that C code. Also part of it is RSTRING_PTR() basically forces a copy from managed memory (byte[]) to native memory (char*) on TruffleRuby. ``` truffleruby 23.0.0-dev-57e53f8a, like ruby 3.1.2, GraalVM CE Native [x86_64-linux] CGI.escapeHTML 31.985M (± 1.2%) i/s - 160.093M in 5.006001s ERB::Util.html_escape 7.427M (± 3.3%) i/s - 37.162M in 5.009721s ``` and CRuby 3.1 is: ``` ERB::Util.html_escape 14.551M (± 0.8%) i/s - 73.308M in 5.038335s CGI.escapeHTML 10.065M (± 0.6%) i/s - 51.054M in 5.072629s ``` Given those results, could you build the C extension only for CRuby? I think it would also be much nicer to keep the optimized HTML escape in CGI which is in stdlib, so it can be used by all templates engines. In #19090 I did not expect `rb_str_dup()` is so costly on CRuby, I guess the allocation is slow and of course CRuby can't escape-analyze it. A new method in CGI sounds best, or probably nicer an optional argument whether to always return a copy. It seems tricky to change the existing method to return a mutable string as-is, I guess the person who found #11858 actually in that incompatibility. Ruby users seem to assume in general if a core/stdlib method returns a string either it's frozen or they are free to modify it, but here it would actually also mutate the original String passed to `CGI.escapeHTML`. `String#to_s` really sounds like something every Ruby JIT/VM should be able to trivially optimize. So that I think should be an insignificant cost, even on CRuby (and if it isn't it should be easy to fix). ---------------------------------------- Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines https://bugs.ruby-lang.org/issues/19102#change-100275 * Author: k0kubun (Takashi Kokubun) * Status: Closed * Priority: Normal ---------------------------------------- ## Proposal Change the behavior of `ERB::Util.html_escape` in the following two parts: 1. Skip converting an argument with `#to_s` if the argument is already a `T_STRING`. 2. Do not allocate and return a new String when nothing needs to be escaped. ## Background The current `ERB::Util.html_escape` is implemented as `CGI.escapeHTML(s.to_s)`. So the performance is almost equal to `CGI.escapeHTML` except for the `to_s` call. Because it's common to embed non-String expressions in template engines, a template engine typically calls `to_s` to convert non-String expressions to a String before escaping it, which is why the difference exists. Proposal (1) is useful for optimizing the case that something that's already a `String` is embedded. We ignore the extreme case that `String#to_s` is weirdly monkey-patched. As to proposal (2), my original implementation of `CGI.escapeHTML` https://github.com/ruby/ruby/pull/1164 was not calling `rb_str_dup` for that case. However, because [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility with the old `gsub`-based implementation, we added the unneeded `rb_str_dup` call and the performance for that case has been compromised. This behavior is completely unnecessary for template engines. On the other hand, because `ERB::Util.html_escape` is a helper for ERB, we should not need to consider any backward compatibility that is not relevant to ERB or any template engines. So proposal (2) should be possible in `ERB::Util.html_escape` unlike `CGI.escapeHTML`. ## Benchmark Implementation: https://github.com/ruby/erb/pull/27 ```rb require 'benchmark/ips' require 'erb' class << ERB::Util def html_escape_old(s) CGI.escapeHTML(s.to_s) end end Benchmark.ips do |x| s = 'hello world' x.report('before') { ERB::Util.html_escape_old(s) } x.report('after') { ERB::Util.html_escape(s) } x.compare! end ``` ``` ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux] Warming up -------------------------------------- before 1.066M i/100ms after 1.879M i/100ms Calculating ------------------------------------- before 10.615M (± 0.3%) i/s - 53.320M in 5.023083s after 18.742M (± 0.4%) i/s - 93.929M in 5.011847s Comparison: after: 18741747.6 i/s before: 10615137.1 i/s - 1.77x (± 0.00) slower ``` -- https://bugs.ruby-lang.org/

1 year, 5 months

1
0
0 0

2024

2023

2022

ruby-core

2024

2023

2022

ruby-core ----- 2024 ----- May 2024 April 2024 March 2024 February 2024 January 2024 ----- 2023 ----- December 2023 November 2023 October 2023 September 2023 August 2023 July 2023 June 2023 May 2023 April 2023 March 2023 February 2023 January 2023 ----- 2022 ----- December 2022 November 2022

ruby-core