Issue #21795 has been updated by Eregon (Benoit Daloze). mame (Yusuke Endoh) wrote in #note-9:
When new syntax is introduced to the Ruby master branch, the built-in `prism.c` is updated immediately. In this scenario, if we attempt to retrieve `#ast` using the node definitions from a released prism gem, I am concerned that we will not get a correct AST due to the node definition mismatch.
AFAIK the parser and node definitions always match. If the node definitions need changes, they would be updated at the same time than `prism.c`. If updating node definitions is somehow forgotten, then it needs to be fixed anyway (regardless of this issue), but it will only result in e.g. not having a new node field yet, not a big deal. As such there will never be "an incorrect AST". The scenario you mention would *only* be an issue when all of these are the case: * Using ruby-master and not a release * Using `#ast` on a file which uses new syntax not in the latest prism release (very rare already) * Using Bundler (just using RubyGems would pick the prism default gem which has the latest syntax changes) * In Gemfile, depending on a release version of prism and not using `bundle install --prefer-local` (which would pick the prism default gem) This seems such a rare case, and there are solutions like `bundle install --prefer-local` for those cases, or releasing prism (e.g. if new syntax is being adopted quickly and widely). In such a case, it wouldn't return an incorrect AST, it would raise a SyntaxError for the new syntax being used and not being recognized (or a `Prism::CurrentVersionError` if using an old Prism release). Isn't that good enough? I think it's worth highlighting that users of Ruby releases wouldn't have this problem at all, they would just need to depend on a recent enough `prism`, which is already a requirement and is fine for the many existing usages of Prism. --- If we do want to raise when using an older Prism, one idea here is: `Method#ast` would do `require "prism"` and after that `require` it would check that `Prism::VERSION >= EMBEDDED_PRISM_VERSION` (i.e. the version of Prism used by the interpreter to parse). If not, raise an exception. It's similar to the Prism ABI idea but simpler and doesn't require to maintain such an ABI version manually. One change we would need is to bump to the next version immediately in ruby/prism whenever doing a release, so e.g. on master the prism gem would be reported as 1.10.0 (or 1.10.0.dev to be more explicit), and not as 1.9.0 (which is the current latest release). That way the last release would be considered incompatible since there might be syntax changes since then. --- matz (Yukihiro Matsumoto) wrote in #note-11:
I'm not sure `ast` is the right name. The nodes returned by Prism retain concrete information such as positions, whitespace, and comments, making them closer to a Concrete Syntax Tree than an Abstract Syntax Tree. A name like `node` or `syntax_tree` might be more accurate.
The `parser` and `ast` gems and `RubyVM::AbstractSyntaxTree` call them "AST" and they also have positions. I'm not sure what is meant by whitespace, Prism itself doesn't return objects for whitespace (even in the lexer). Regarding comments, those would be ignored for these new methods as they would return a `Prism::Node`, not a `Prism::ParseResult`, but even then I think it's a minor detail. Prism is also not a pure Concrete Syntax Tree as e.g. postfix-if and regular `if` are both `Prism::IfNode`. And it's clearly used successfully as an abstract syntax tree in Ruby implementations. I think `ast` is the name most Rubyists would expect. `syntax_tree` sounds fine to me too if `ast` is not acceptable. `node` sounds rather ambiguous to me. ---------------------------------------- Feature #21795: Methods for retrieving ASTs https://bugs.ruby-lang.org/issues/21795#change-116748 * Author: kddnewton (Kevin Newton) * Status: Open ---------------------------------------- I would like to propose a handful of methods for retrieving ASTs from various objects that correspond to locations in code. This includes: * Proc#ast * Method#ast * UnboundMethod#ast * Thread::Backtrace::Location#ast * TracePoint#ast (on call/return events) The purpose of this is to make tooling easier to write and maintain. Specifically, this would be able to be used in irb, power_assert, error_highlight, and various other tools both in core and not that make use of source code. There have been many previous discussions of retrieving node_id, source_location, source, etc. All of these use cases are covered by returning the AST for some entity. In this case node_id becomes an implementation detail, invisible to the user. Source location can be derived from the information on the AST itself. Similarly, source can be derived from the AST. Internally, I do not think we have to store any more information than we already do (since we have node_id for the first four of these, it becomes rather trivial). For TracePoint we can have a larger discussion about it, but I think it should not be too much work. In terms of implementation, the only caveat I would put is that if the ISEQ were compiled through the old parser/compiler, this should return `nil`, as the node ids do not match up and we do not want to further propagate the RubyVM::AST API. The reason I am opening up this ticket with 5 different methods requested in it is to get approval first for the direction, then I can open individual tickets or just PRs for each method. I believe this feature would ease the maintenance burden of many core libraries, and unify otherwise disparate efforts to achieve the same thing. -- https://bugs.ruby-lang.org/