Issue #20024 has been updated by yui-knk (Kaneko Yuichiro).
# SyntaxError includes multiple errors
A single `SyntaxError` can contain multiple errors. In the example below, two errors are
merged into one `SyntaxError`, so we need to consider how to handle such cases.
```ruby
begin
  eval <<~CODE
    def m
      retry
  CODE
rescue SyntaxError => e
  puts e.message
end
# (eval at test.rb:2):2: Invalid retry without rescue
# retry
# ^~~~~
# (eval at test.rb:2):3: syntax error, unexpected end-of-input, expecting `end' or dummy end
```
To give `SyntaxError` users rich information, we need to avoid losing any of it.
Therefore the following are not options for this problem:
* Merge multiple errors into one `SyntaxError` subclass, because this is misleading if
the errors are of different types.
* Use only a single error and ignore the following errors, because information is lost.
# irb use case and error-tolerant parser
Regarding the [irb use
case](https://github.com/ruby/irb/blob/f86d9dbe2fc05ed62332069a27f4aacc59ba…,
it categorizes an error that is recoverable by adding tokens to the end as
`recoverable_error` and an error that is recoverable by deleting tokens as
`unrecoverable_error`, so that irb can decide whether to ask for more input.
When irb was created, error-tolerant parsers did not exist, so irb had to work out
how to recover the input from `SyntaxError#message`. However, recovering from errors is
the parser's responsibility.
irb can categorize syntax errors if an error-tolerant parser provides information about
how it recovered from them.
If there is only a single error and its recovery requires only token insertions at the
end of the input, it is a `recoverable_error`.
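The rule above can be sketched roughly as follows. This is only an illustration: `Recovery`, `ReportedError`, `classify`, and the `:valid` result are hypothetical names, and the error values are built by hand rather than produced by a real parser.

```ruby
# Hypothetical shapes for what an error-tolerant parser might report.
Recovery = Struct.new(:type, :offset)            # type is :insert or :delete
ReportedError = Struct.new(:message, :recoveries)

# A single error whose recovery only inserts tokens at the end of the input
# is treated as recoverable; everything else is unrecoverable.
def classify(errors, source)
  return :valid if errors.empty? # :valid is an assumption for the no-error case

  error = errors.first
  insert_at_end_only = !error.recoveries.empty? &&
    error.recoveries.all? { |r| r.type == :insert && r.offset == source.length }

  (errors.size == 1 && insert_at_end_only) ? :recoverable_error : :unrecoverable_error
end

source = "def m\n"
# Hand-built errors standing in for parser output.
missing_end = ReportedError.new("unexpected end-of-input", [Recovery.new(:insert, source.length)])
stray_token = ReportedError.new("unexpected token", [Recovery.new(:delete, 0)])

p classify([missing_end], source)
p classify([stray_token], source)
p classify([missing_end, stray_token], source)
```

With this shape, irb would not need to inspect `SyntaxError#message` at all to decide whether to wait for more input.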
By the way, does the parser always have to raise `SyntaxError` for invalid input? For
this use case, it is better not to raise `SyntaxError` and instead let users ask the
parser, after parsing, whether there were syntax errors.
# Proposal
Considering these points, my proposal is to define a parser interface.
* The parser can run in a mode that does not raise `SyntaxError`
* The parser provides a method to get syntax errors
* Each syntax error includes
  * message: "syntax error, unexpected end-of-input, expecting `end' or dummy end"
  * location: (1,0)-(1,1), first/last & line/column
  * operations for recovery:
    * insert / delete
    * location of recovery
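As a rough sketch of what such an interface could look like from the caller's side: every name here (`Location`, `RecoveryOperation`, `SyntaxErrorEntry`, `ParseResult`, `syntax_errors`, `valid?`) is hypothetical and does not exist in Ruby, and the result is built by hand using the message and location from the list above.

```ruby
# Hypothetical types sketching the proposed interface.
Location = Struct.new(:first_line, :first_column, :last_line, :last_column) do
  def to_s
    "(#{first_line},#{first_column})-(#{last_line},#{last_column})"
  end
end
RecoveryOperation = Struct.new(:type, :location) # type is :insert or :delete
SyntaxErrorEntry = Struct.new(:message, :location, :operations)

# Stand-in for the result of a parser run that does not raise SyntaxError:
# the caller asks for the errors after parsing.
class ParseResult
  attr_reader :syntax_errors

  def initialize(syntax_errors)
    @syntax_errors = syntax_errors
  end

  def valid?
    @syntax_errors.empty?
  end
end

# Hand-built result; a real parser would fill this in.
loc = Location.new(1, 0, 1, 1)
result = ParseResult.new([
  SyntaxErrorEntry.new(
    "syntax error, unexpected end-of-input, expecting `end' or dummy end",
    loc,
    [RecoveryOperation.new(:insert, loc)]
  )
])

result.syntax_errors.each do |error|
  puts "#{error.location}: #{error.message}"
  error.operations.each { |op| puts "  recover by #{op.type} at #{op.location}" }
end
```

Keeping each error as a separate entry means that multiple errors do not have to be merged into one message, and no information is lost.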
----------------------------------------
Feature #20024: SyntaxError subclasses
https://bugs.ruby-lang.org/issues/20024#change-105468
* Author: kddnewton (Kevin Newton)
* Status: Open
* Priority: Normal
----------------------------------------
There are many places around the Ruby ecosystem that handle syntax errors in different
ways. Some provide highlighting, others provide recovery of some form, still more provide
LSP metadata. In order to provide richer information, most of them switch on the
message of the error being returned, as in:
https://github.com/ruby/irb/blob/f86d9dbe2fc05ed62332069a27f4aacc59ba9634/l…
Within ruby/spec, specific error messages are required for these kinds of messages in
order to support this implicit interface that syntax errors have a hidden type, which is
only expressed through their message. For example:
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/…
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/…
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/…
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/…
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/…
It's not clear from these specs or from the parser itself which error messages are
permanent/guaranteed versus which are changeable. Either way, relying on the error message
itself as opposed to the type of the error is brittle at best.
I would like to suggest instead we implement subclasses on `SyntaxError` that would allow
tools that depend on specific syntax errors to rescue those subclasses instead of parsing
the message. In addition to alleviating the need to parse error messages with regex, this
would also allow for the possibility that the error messages could change in the future
without breaking external tooling.
Allowing these to change would allow them to be potentially enhanced or changed by other
tools - for example by providing recovery information or translating them.
This is particularly important for Prism since we are getting down to individual spec
failures and some of the failures are related to the fact that we have messages like
`"Numbered parameter is already used in outer scope"` where the spec requires
`/numbered parameter is already used in/`. Even this case-sensitivity is causing failures,
which seems like we're testing the wrong thing.
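To make the difference concrete, here is a small sketch comparing the two approaches. The subclass name `NumberedParameterAlreadyUsedError` is purely hypothetical; the message and the pattern are the two quoted above.

```ruby
# Hypothetical subclass; Ruby does not define this class today.
class NumberedParameterAlreadyUsedError < SyntaxError; end

# The parser's message and the pattern the spec requires.
message = "Numbered parameter is already used in outer scope"
spec_pattern = /numbered parameter is already used in/

# Matching on the message is case-sensitive, so the capitalized message fails.
matched_by_message = spec_pattern.match?(message)

# Rescuing by class does not depend on the wording of the message at all.
matched_by_class =
  begin
    raise NumberedParameterAlreadyUsedError, message
  rescue NumberedParameterAlreadyUsedError
    true
  rescue SyntaxError
    false
  end

puts "matched by message: #{matched_by_message}" # prints "matched by message: false"
puts "matched by class:   #{matched_by_class}"   # prints "matched by class:   true"
```

Because the subclass still inherits from `SyntaxError`, existing code that rescues `SyntaxError` would keep working.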
--
https://bugs.ruby-lang.org/