
Issue #20519 has been updated by kddnewton (Kevin Newton).

Hi @brightbits! I've investigated this one at length, and can give some context.

As you already discovered, Onigmo stretches well beyond regular expressions. It also provides all of the encoding support within CRuby, stretching all of the way into the parser. This has led most other Ruby implementations to have to vendor Onigmo in order to match behavior 1:1. For example TruffleRuby uses it as a fallback (https://github.com/oracle/truffleruby/blob/master/lib/cext/include/ruby/onig...), Artichoke uses it as a fallback (https://github.com/artichoke/artichoke/blob/77434156f30188a6e27f321b9b0f8437...), Natalie uses it as its regexp engine (https://github.com/natalie-lang/natalie/blob/556e8c195423daddf1c5aba49bb67dd...), etc.

For these reasons replacing Onigmo entirely _may_ be possible, but it would certainly be an extremely long and arduous process because of concerns about backward compatibility.

That being said, there are things that could be done. The various options would be:

* What you already mentioned about handling subsets of regular expressions and splitting them up/enhancing them with additional APIs. You could do this today with ISEQ translation. (Check out https://github.com/k0kubun/ruby-jit-challenge for an intro to how this could work.)
* You could interpret the Onigmo bytecode in Ruby directly and attempt to work with YJIT to get performance up. Check out a couple of links here: https://speakerdeck.com/makenowjust/rubykaigi-2024-make-your-own-regex-engin... and https://github.com/Shopify/onigmo.
* You could rewrite it entirely in Ruby (https://github.com/kddnewton/exreg). The only real way this matches up with performance would be having its own JIT. Certainly possible, but difficult.

----------------------------------------
Misc #20519: Porting regexp to pure ruby?
https://bugs.ruby-lang.org/issues/20519#change-108738

* Author: brightbits (Michael Baldry)
* Status: Feedback
----------------------------------------
Would there be any benefit in porting Regexp from Onigmo to a pure Ruby implementation that could benefit from YJIT? Compiling a pattern could mean translating it to a Ruby method, which YJIT could then optimize easily.

Has this been explored, or has any work been done around this kind of thing, before I look into it more?

Many thanks

--
https://bugs.ruby-lang.org/
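
For readers following the thread, the idea in the original question (translating a pattern into Ruby code that YJIT can then optimize) can be sketched roughly as follows. This is only an illustration under heavy assumptions: `TinyRegexpCompiler` and its methods are hypothetical names, not part of Ruby, Onigmo, or exreg, and the supported "pattern" language is just literal characters plus `.` for any single character. It ignores encodings, quantifiers, groups, and everything else that makes replacing Onigmo hard.

```ruby
# Hypothetical sketch, not an existing API: turns a tiny pattern subset
# (literal characters and "." for any character) into generated Ruby code.
module TinyRegexpCompiler
  # Builds Ruby source for a lambda that tries to match at offset `pos`.
  # The generated lambda returns the end offset on success, nil otherwise.
  def self.source_for(pattern)
    checks = pattern.each_char.with_index.map do |char, offset|
      if char == "."
        # "." accepts any character, so only require that one exists.
        "  return nil if pos + #{offset} >= str.length"
      else
        # Out-of-range reads give nil, which never equals the literal.
        "  return nil unless str[pos + #{offset}] == #{char.inspect}"
      end
    end

    [
      "lambda do |str, pos|",
      *checks,
      "  pos + #{pattern.length}",
      "end"
    ].join("\n")
  end

  # Evaluates the generated source into a real lambda, which a JIT
  # such as YJIT is then free to compile like any other Ruby code.
  def self.compile(pattern)
    eval(source_for(pattern))
  end

  # Tries every start offset; returns [start, finish] or nil.
  def self.search(matcher, str)
    (0..str.length).each do |pos|
      finish = matcher.call(str, pos)
      return [pos, finish] if finish
    end
    nil
  end
end

matcher = TinyRegexpCompiler.compile("a.c")
p TinyRegexpCompiler.search(matcher, "xxabcxx") # prints [2, 5]
```

Calling `TinyRegexpCompiler.source_for("a.c")` shows the unrolled checks that get evaluated. Whether code generated this way could approach Onigmo's performance is exactly the open question discussed in the reply above.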