
Issue #20578 has been reported by tompng (tomoya ishida).

----------------------------------------
Bug #20578: Tokenizing string literal that have newline and invalid escape is wrong
https://bugs.ruby-lang.org/issues/20578

* Author: tompng (tomoya ishida)
* Status: Open
* ruby -v: ruby 3.4.0dev (2024-06-13T09:49:46Z master 8b843b0dc7) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When a string literal contains a newline (`\n`) followed by an invalid escape sequence, `Ripper.tokenize` returns the tokens in the wrong order.

~~~ruby
require "ripper"

Ripper.tokenize "\"hello\\x world"
# => ["\"", "hello\\x", " world"] # looks good
Ripper.tokenize "\"\nhello\\x world"
# => ["\"", "\n world", "hello\\x"] # order is reversed
~~~

These invalid escapes are also tokenized incorrectly:

~~~ruby
Ripper.tokenize("\"\n\\Cxx\"") #=> ["\"", "\nx", "\\Cx", "\""]
Ripper.tokenize("\"\n\\Mxx\"") #=> ["\"", "\nx", "\\Mx", "\""]
Ripper.tokenize("\"\n\\c\\cx\"") #=> ["\"", "\nx", "\\c\\c", "\""]
Ripper.tokenize("\"\n\\ux\"") #=> ["\"", "\nx", "\""]
Ripper.tokenize("\"\n\\xx\"") #=> ["\"", "\nx", "\\x", "\""]
~~~

These kinds of literals are affected as well:

~~~ruby
Ripper.tokenize("<<A\n\n\\xyz") #=> ["<<A", "\n", "\nyz", "\\x"]
Ripper.tokenize("%(\n\\xyz)") #=> ["%(", "\nyz", "\\x", ")"]
Ripper.tokenize("%Q(\n\\xyz)") #=> ["%Q(", "\nyz", "\\x", ")"]
Ripper.tokenize(":\"\n\\xyz\"") #=> [":\"", "\nyz", "\\x", "\""]
~~~

I encountered this while typing a valid string literal into IRB:

~~~ruby
irb(main):001> "
irb(main):002> \x█
~~~

Another invalid escape sequence disappears from the tokenize result entirely:

~~~ruby
Ripper.tokenize('"\u{123') # => ["\""]
~~~

--
https://bugs.ruby-lang.org/
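
The reordered and missing tokens reported above can be detected mechanically. The following is only a minimal sketch: it assumes that joining the tokens in the order returned should reproduce the source text, and `tokens_round_trip?` is a hypothetical helper, not part of Ripper. Which inputs it flags depends on the Ruby version it runs on.

```ruby
require "ripper"

# Hypothetical helper (not part of Ripper): true when the tokens, joined in
# the order Ripper.tokenize returns them, reproduce the source exactly.
# Assumption: a correct tokenizer neither reorders nor drops source text.
def tokens_round_trip?(source)
  Ripper.tokenize(source).join == source
end

# Inputs taken from the report above.
samples = [
  "\"hello\\x world",
  "\"\nhello\\x world",
  "\"\n\\ux\"",
  "<<A\n\n\\xyz",
  "%(\n\\xyz)",
  '"\u{123',
]

samples.each do |src|
  status = tokens_round_trip?(src) ? "ok" : "MISMATCH"
  puts "#{status}: #{src.inspect}"
end
```

A `MISMATCH` line marks an input whose tokens were reordered or dropped. A check like this only compares the concatenated text, so it says nothing about whether the token boundaries themselves are right.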