[ruby-core:124311] [Ruby Feature#21795] Methods for retrieving ASTs

kddnewton (Kevin Newton)

19 Dec 2025

2:09 a.m.

Issue #21795 has been reported by kddnewton (Kevin Newton).

----------------------------------------
Feature #21795: Methods for retrieving ASTs
https://bugs.ruby-lang.org/issues/21795

* Author: kddnewton (Kevin Newton)
* Status: Open
----------------------------------------
I would like to propose a handful of methods for retrieving ASTs from various objects that correspond to locations in code. This includes:

* Proc#ast
* Method#ast
* UnboundMethod#ast
* Thread::Backtrace::Location#ast
* TracePoint#ast (on call/return events)

The purpose of this is to make tooling easier to write and maintain. Specifically, this would be able to be used in irb, power_assert, error_highlight, and various other tools both in core and not that make use of source code.

There have been many previous discussions of retrieving node_id, source_location, source, etc. All of these use cases are covered by returning the AST for some entity. In this case node_id becomes an implementation detail, invisible to the user. Source location can be derived from the information on the AST itself. Similarly, source can be derived from the AST.

Internally, I do not think we have to store any more information than we already do (since we have node_id for the first four of these, it becomes rather trivial). For TracePoint we can have a larger discussion about it, but I think it should not be too much work.

In terms of implementation, the only caveat I would put is that if the ISEQ were compiled through the old parser/compiler, this should return `nil`, as the node ids do not match up and we do not want to further propagate the RubyVM::AST API.

The reason I am opening up this ticket with 5 different methods requested in it is to get approval first for the direction, then I can open individual tickets or just PRs for each method.

I believe this feature would ease the maintenance burden of many core libraries, and unify otherwise disparate efforts to achieve the same thing.

--
https://bugs.ruby-lang.org/


mame (Yusuke Endoh)

19 Dec

9:52 a.m.

New subject: [ruby-core:124321] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by mame (Yusuke Endoh).

I anticipated that we would consider this eventually, but incorporating it into the core presents significant challenges. Here are two major issues regarding feasibility. (Based on chats with @ko1, @tompng, and @yui-knk, though these are my personal views.)

## The Implementation Approach

CRuby currently discards source code and ASTs after ISeq generation. The proposed `#ast` method would have to re-read and re-parse the source, which causes two problems:

1. If the file is modified after loading, `#ast` may return the wrong node.
2. It does not work for `eval` strings.

`error_highlight` accepts this fragility because it displays just "hints". But I don't think that is acceptable for a built-in method. At least, we must avoid returning an incorrect node, and clarify when failures occur.

I propose two approaches:

1. Keep loaded source in memory (e.g., `RubyVM.keep_script_lines = true` by default). This supports `eval` but increases memory usage.
2. Validate source hash. Store a hash in the ISeq and check it to ensure the file hasn't changed.

## The Parser Switching Problem

What is the node definition returned by `#ast`?

As noted in #21618, built-in Prism is not exposed as a Ruby API. If `Gemfile.lock` specifies an older version of the prism gem, even `require "prism"` won't provide the expected definition.

IMO, it would be good to have a node definition that does not depend on the prism gem (maybe `Ruby::Node`?). I am not sure how much effort is needed for this. We would also need to consider where to place what in the ruby/prism and ruby/ruby repositories for development.

We also need to decide if `#ast` should return `RubyVM::AST::Node` when `--parser=parse.y` is specified.
----------------------------------------
Feature #21795: Methods for retrieving ASTs
https://bugs.ruby-lang.org/issues/21795#change-115824
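The second approach above (validating a source hash) can be sketched in plain Ruby, outside the VM. This is only an illustration under stated assumptions: `SourceSnapshot` and its methods are hypothetical names, CRuby does not store such a hash on the ISeq today, and SHA-256 is an arbitrary choice of digest.

```ruby
require "digest"
require "tempfile"

# Hypothetical stand-in for what an ISeq would carry: the path it was
# loaded from and a digest of the source as it was at load time.
SourceSnapshot = Struct.new(:path, :digest) do
  # Record the digest of the file as it is right now ("load time").
  def self.capture(path)
    new(path, Digest::SHA256.file(path).hexdigest)
  end

  # Return the source only if the file still matches what was loaded;
  # otherwise return nil instead of risking a lookup in changed code.
  def source_if_unchanged
    return nil unless File.file?(path)

    source = File.binread(path)
    Digest::SHA256.hexdigest(source) == digest ? source : nil
  end
end

file = Tempfile.new(["example", ".rb"])
file.write("def greet = :hello\n")
file.flush

snapshot = SourceSnapshot.capture(file.path)
unchanged = snapshot.source_if_unchanged # file untouched: the source is returned

File.write(file.path, "def greet = :goodbye\n")
changed = snapshot.source_if_unchanged   # file edited after "load": nil
```

Returning `nil` on a mismatch is one possible failure mode; raising an exception would make the failure more explicit, which is the trade-off the thread goes on to discuss.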

Eregon (Benoit Daloze)

7 Jan

11:41 a.m.

New subject: [ruby-core:124427] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by Eregon (Benoit Daloze).

One idea to make it work with `--parser=parse.y` until the universal parser supports the Prism API (#21825) would be to:

* Get the `RubyVM::AST::Node` of that object, then extract the start/end line & start/end columns. Or do the same internally without needing a `RubyVM::AST::Node`, it's just converting from parse.y node_id to "bounds".
* With those values, use something similar to [Prism.node_for](https://github.com/ruby/prism/pull/3808) to find a node based on those bounds.
* There should be a single node matching those bounds because we are only looking for specific nodes.

----------------------------------------
Feature #21795: Methods for retrieving ASTs
https://bugs.ruby-lang.org/issues/21795#change-115976
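The bounds-based lookup described in this message can be sketched as follows. To keep the example runnable without depending on a particular prism version, it uses a plain `Struct` as a stand-in for syntax tree nodes; `Node`, `bounds`, and `node_for` here are hypothetical names for illustration and are not Prism's API.

```ruby
# Stand-in for a syntax tree node: a kind, source bounds, and children.
# Real Prism nodes carry similar data, but this Struct is NOT Prism's API.
Node = Struct.new(:kind, :start_line, :start_column, :end_line, :end_column, :children) do
  def bounds
    [start_line, start_column, end_line, end_column]
  end
end

# Depth-first search for a node of one of the wanted kinds whose bounds
# match exactly. Returns nil when nothing matches.
def node_for(root, bounds, kinds)
  stack = [root]
  until stack.empty?
    node = stack.pop
    return node if kinds.include?(node.kind) && node.bounds == bounds

    stack.concat(node.children)
  end
  nil
end

# Hand-built tree for the one-line source `def foo; bar; end`.
call = Node.new(:call, 1, 9, 1, 12, [])
statements = Node.new(:statements, 1, 9, 1, 12, [call])
definition = Node.new(:def, 1, 0, 1, 17, [statements])
program = Node.new(:program, 1, 0, 1, 17, [definition])

# :statements and :call share the same bounds; restricting the search to
# the kind we expect removes the ambiguity.
found = node_for(program, [1, 9, 1, 12], [:call])
puts found.kind
```

The design point is that bounds alone can match several nested nodes, so the lookup also needs to know which kinds of node each entry point (a method, a proc, a backtrace location) can correspond to.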

kddnewton (Kevin Newton)

3:24 p.m.

New subject: [ruby-core:124437] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by kddnewton (Kevin Newton).

Thanks @mame for the detailed reply! I appreciate your thoughtfulness here.

With regard to the implementation approach problem, I love your solution of keeping a source hash on the iseq. I think that makes a lot of sense, and could be used in error highlight today as well. That could potentially even be used by other tools in the case of code reloading. I think we _could_ potentially store the code for eval, but I would be tempted to say let us not change anything for now and return `nil` or raise an error in that case. (In the same way we would need to return `nil` or raise an error for C methods.)

For the parser switching problem, I think I would like to introduce a Prism ABI version (alongside the Prism gem version). I would update this version whenever a structural change is made (field added/renamed/removed/etc.). Then, if we could store the Prism ABI version on the ISEQ as well, we could require prism and check if the ABI version matches before attempting to re-parse. We could be clear through the error message that the Prism ABI version is a mismatch and therefore we cannot re-parse.

I am not sure if we should return RubyVM::AST nodes in the case the ISEQ was compiled with parse.y/compile.c, but I am okay with it if that's the direction you would like to go.

----------------------------------------
Feature #21795: Methods for retrieving ASTs
https://bugs.ruby-lang.org/issues/21795#change-115986

Eregon (Benoit Daloze)

16 Jan

9:14 a.m.

New subject: [ruby-core:124578] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by Eregon (Benoit Daloze).

Rails' `_callable_to_source_string` would be a good use case for this, see https://github.com/rails/rails/pull/56624

----------------------------------------
Feature #21795: Methods for retrieving ASTs
https://bugs.ruby-lang.org/issues/21795#change-116154

mame (Yusuke Endoh)

13 Feb

5:15 a.m.

New subject: [ruby-core:124809] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by mame (Yusuke Endoh).

Eregon (Benoit Daloze) wrote in #note-2:

...

...
> > As noted in #21618, built-in Prism is not exposed as a Ruby API. If `Gemfile.lock` specifies an older version of prism gem, even `require "prism"` won't provide the expected definition.

> This is basically a solved problem, as discussed there. In that case, `Prism.parse(foo, version: "current")` fails with a clear exception explaining one needs to use a newer prism gem.

I believe #21618 primarily discusses released Ruby versions. My concern is specifically about the behavior on the master branch.

When new syntax is introduced to the Ruby master branch, the built-in `prism.c` is updated immediately. In this scenario, if we attempt to retrieve `#ast` using the node definitions from a released prism gem, I am concerned that we will not get a correct AST due to the node definition mismatch.

kddnewton (Kevin Newton) wrote in #note-4:

...

> For the parser switching problem, I think I would like to introduce a Prism ABI version (alongside the Prism gem version). I would update this version whenever a structural change is made (field added/renamed/removed/etc.). Then, if we could store the Prism ABI version on the ISEQ as well, we could require prism and check if the ABI version matches before attempting to re-parse. We could be clear through the error message that the Prism ABI version is a mismatch and therefore we cannot re-parse.

While this is certainly a feasible solution, I don't feel it is the optimal one. I acknowledge the engineering challenges involved, but ideally, I believe having a built-in node definition (like `Ruby::Node`) within Ruby core itself would be the simplest and best approach.

----------------------------------------
Feature #21795: Methods for retrieving ASTs
https://bugs.ruby-lang.org/issues/21795#change-116429

kddnewton (Kevin Newton)

14 Feb

5:58 p.m.

New subject: [ruby-core:124822] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by kddnewton (Kevin Newton).

Would a Ruby::Node be the same thing as a Prism::Node? As in, would it basically be a Ruby API that duplicates the Prism interface?

I'm not sure about how to maintain it. For example, if we add more features to Prism's Ruby API (for example the work we've been doing on the translation layers to Ripper recently) would we also duplicate it to the various live branches of the Ruby::Node API? Or would it just be a trimmed down version?

Either way, I'm not sure when I would recommend using Ruby::Node, because it seems like it would always be an out-of-date version of Prism::Node.

----------------------------------------
Feature #21795: Methods for retrieving ASTs
https://bugs.ruby-lang.org/issues/21795#change-116443

matz (Yukihiro Matsumoto)

17 Mar

7:43 a.m.

New subject: [ruby-core:125037] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by matz (Yukihiro Matsumoto).

I have two concerns before we move forward.

On the name AST

I'm not sure `ast` is the right name. The nodes returned by Prism retain concrete information such as positions, whitespace, and comments, making them closer to a Concrete Syntax Tree than an Abstract Syntax Tree. A name like `node` or `syntax_tree` might be more accurate.

On the ABI version approach.

Embedding a Prism ABI version in the ISeq sounds reasonable at first, but I'm worried it would make these methods reliably broken during active development on master: any time prism.c is updated ahead of a released gem, callers would get nil or an exception as a matter of course. That's a poor developer experience for people working on master. This concern points back to the suggestion from @mame: perhaps we need the built-in Prism to be exposed as a gem-independent API first, before we can ship these methods in a stable way.

I'm positive about the overall direction. I just want to make sure we resolve these two points before committing to the API shape.

Matz.

----------------------------------------
Feature #21795: Methods for retrieving ASTs
https://bugs.ruby-lang.org/issues/21795#change-116734

Eregon (Benoit Daloze)

9:46 a.m.

New subject: [ruby-core:125049] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by Eregon (Benoit Daloze).

mame (Yusuke Endoh) wrote in #note-9:

...

> When new syntax is introduced to the Ruby master branch, the built-in `prism.c` is updated immediately. In this scenario, if we attempt to retrieve `#ast` using the node definitions from a released prism gem, I am concerned that we will not get a correct AST due to the node definition mismatch.

AFAIK the parser and node definitions always match. If the node definitions need changes, they would be updated at the same time as `prism.c`. If updating node definitions is somehow forgotten, then it needs to be fixed anyway (regardless of this issue), but it will only result in e.g. not having a new node field yet, not a big deal. As such there will never be "an incorrect AST".

The scenario you mention would *only* be an issue when all of these are the case:

* Using ruby-master and not a release
* Using `#ast` on a file which uses new syntax not in the latest prism release (very rare already)
* Using Bundler (just using RubyGems would pick the prism default gem which has the latest syntax changes)
* In Gemfile, depending on a release version of prism and not using `bundle install --prefer-local` (which would pick the prism default gem)

This seems such a rare case, and there are solutions like `bundle install --prefer-local` for those cases, or releasing prism (e.g. if new syntax is being adopted quickly and widely). In such a case, it wouldn't return an incorrect AST, it would raise a SyntaxError for the new syntax being used and not being recognized (or a `Prism::CurrentVersionError` if using an old Prism release). Isn't that good enough?

I think it's worth highlighting that users of Ruby releases wouldn't have this problem at all, they would just need to depend on a recent enough `prism`, which is already a requirement and is fine for the many existing usages of Prism.

---

If we do want to raise when using an older Prism, one idea here is: `Method#ast` would do `require "prism"` and after that `require` it would check that `Prism::VERSION >= EMBEDDED_PRISM_VERSION` (i.e. the version of Prism used by the interpreter to parse). If not, raise an exception. It's similar to the Prism ABI idea but simpler and doesn't require maintaining such an ABI version manually.

One change we would need is to bump to the next version immediately in ruby/prism whenever doing a release, so e.g. on master the prism gem would be reported as 1.10.0 (or 1.10.0.dev to be more explicit), and not as 1.9.0 (which is the current latest release). That way the last release would be considered incompatible since there might be syntax changes since then.

---

matz (Yukihiro Matsumoto) wrote in #note-11:

...

> I'm not sure `ast` is the right name. The nodes returned by Prism retain concrete information such as positions, whitespace, and comments, making them closer to a Concrete Syntax Tree than an Abstract Syntax Tree. A name like `node` or `syntax_tree` might be more accurate.

The `parser` and `ast` gems and `RubyVM::AbstractSyntaxTree` call them "AST" and they also have positions. I'm not sure what is meant by whitespace, Prism itself doesn't return objects for whitespace (even in the lexer). Regarding comments, those would be ignored for these new methods as they would return a `Prism::Node`, not a `Prism::ParseResult`, but even then I think it's a minor detail.

Prism is also not a pure Concrete Syntax Tree as e.g. postfix-if and regular `if` are both `Prism::IfNode`. And it's clearly used successfully as an abstract syntax tree in Ruby implementations.

I think `ast` is the name most Rubyists would expect. `syntax_tree` sounds fine to me too if `ast` is not acceptable. `node` sounds rather ambiguous to me.

----------------------------------------
Feature #21795: Methods for retrieving ASTs
https://bugs.ruby-lang.org/issues/21795#change-116748
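The version guard suggested in this message (`Prism::VERSION >= EMBEDDED_PRISM_VERSION`) can be sketched with RubyGems' `Gem::Version`, which orders multi-digit segments and `.dev` prereleases correctly. `prism_compatible?` is a hypothetical helper, and the version strings are passed in as plain arguments because the thread does not indicate that an `EMBEDDED_PRISM_VERSION` constant exists today.

```ruby
require "rubygems"

# Hypothetical guard: `embedded` is the Prism version the interpreter was
# built with, `loaded` is the version of the prism gem that was required.
def prism_compatible?(embedded, loaded)
  Gem::Version.new(loaded) >= Gem::Version.new(embedded)
end

# Plain String comparison is lexicographic and misorders "1.10.0" and "1.9.0",
# so the check needs a real version comparison.
puts "1.10.0" >= "1.9.0"                  # false
puts prism_compatible?("1.9.0", "1.10.0") # true

# With master reporting "1.10.0.dev", the last release "1.9.0" is rejected,
# while the eventual "1.10.0" release is accepted.
puts prism_compatible?("1.10.0.dev", "1.9.0")  # false
puts prism_compatible?("1.10.0.dev", "1.10.0") # true
```

This relies on the bump-after-release convention described above: without it, master and the last release would report the same version and the guard could not tell them apart.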

Eregon (Benoit Daloze)

5 Apr

2:46 p.m.

New subject: [ruby-core:125198] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by Eregon (Benoit Daloze).

I thought more about `node_id` and I found at least one case where it is problematic with different versions of Prism. The problem is the `node_id` from the bytecode is computed by the builtin Prism parser used by `prism_compile.c`, while the usage of e.g. `Prism.find` might use a more recent Prism.

Here is a reproduction showing the problem:

```ruby
if false
  # A code snippet which generates a different number of nodes on Prism 1.2.0 and 1.9.0
  case 1
  in 2
    A.print message:
  in 3
    A.print message:
  end
end

def a
end

def b
end

require "prism"
p Prism.find(method(:b))
```

ruby master is fine:

```
$ ruby -v find_check.rb
ruby 4.1.0dev (2026-03-27T16:16:27Z revert-source_loca.. f510d4103e) +PRISM [x86_64-linux]
@ DefNode (location: (14,0)-(15,3))
├── flags: newline
├── name: :b
```

But on Ruby 3.4.5 it's broken:

```
# Use latest prism, there is no prism release with Prism.find yet:
$ cd prism
$ bundle exec rake compile
$ bundle exec ruby find_check.rb
@ DefNode (location: (11,0)-(12,3))
├── flags: newline
├── name: :a
```

This returns method `a` and not `b`! And if I change `p Prism.find(method(:b))` to `p Prism.find(method(:a))`, then ruby-master is correct, but 3.4.5 returns an `IfNode`. IOW, 3.4.5 returns the node before the correct one, which can be any node.

I think this illustrates well that `node_id` is brittle, it depends on the number of nodes before the node of interest. OTOH, start line/column + end line/column (or equivalently, start & end offsets) is far more robust, because it represents an actual position in the source file, independent of the Prism version.

The only confusion there would be if there are nodes with exactly the same start & end offsets, and they can't be differentiated based on the input. That's not a problem for the 5 methods proposed in this issue:

This relates to the discussion [here](https://bugs.ruby-lang.org/issues/6012#note-47) with @mame about whether `node_id` is better, but from this finding it's clear `node_id` is worse if the version isn't fixed. I'll quote here the relevant part about `source_location` being able to locate the right node:

...

...
However, `source_location` is not an appropriate key to look up the AST subtree corresponding to a Ruby object.

...

I believe it is though, with the knowledge of what kind of node we are looking for. For example in `def foo; bar; end`, `bar` in the AST is covered exactly by both a `StatementsNode` and a `CallNode`. If we are using `Thread::Backtrace::Location#ast` we'd want the location of the call to `bar`, so we know we want the `CallNode`, not the `StatementsNode` and there is no ambiguity. I believe the same holds for all 5 methods proposed in #21795. My intuition there is all nodes listed in https://bugs.ruby-lang.org/issues/21795#note-2 cannot have the exact same `source_location` (e.g. we cannot have code with the same starting and ending position that is two of `DefNode`, `LambdaNode`, `ForNode`, `Call*Node`, `Index*Node`, `YieldNode`). Do you have a counter-example where this wouldn't hold?

No counter-example has been found.
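The disambiguation argument quoted above can be tried out with the `prism` gem. What follows is a minimal sketch, assuming `prism` is available (it ships with recent CRuby releases). `nodes_at` is a hypothetical helper written for this illustration; it is not part of Prism and not the proposed `#ast` implementation.

```ruby
require "prism"

# Hypothetical helper: collect every node whose byte range is exactly
# [start_offset, end_offset). More than one node can share a range.
def nodes_at(node, start_offset, end_offset, found = [])
  loc = node.location
  found << node if loc.start_offset == start_offset && loc.end_offset == end_offset
  node.compact_child_nodes.each do |child|
    nodes_at(child, start_offset, end_offset, found)
  end
  found
end

source = "def foo; bar; end"
root = Prism.parse(source).value

# Byte range of `bar` in the source.
start_offset = source.index("bar")
end_offset = start_offset + "bar".length

# Per the discussion, several nodes may cover `bar` exactly
# (for example the body's StatementsNode as well as the CallNode).
candidates = nodes_at(root, start_offset, end_offset)

# Knowing which kind of node we want removes the ambiguity: a backtrace
# location points at a call, so only CallNode candidates are kept.
calls = candidates.grep(Prism::CallNode)

puts candidates.map { |n| n.class.name }.inspect
puts calls.map(&:name).inspect
```

The design choice here is that the range alone is not treated as a unique key; the expected node kind acts as the second half of the key.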

Eregon (Benoit Daloze)

2:51 p.m.

New subject: [ruby-core:125199] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by Eregon (Benoit Daloze).

The implications of that are, assuming this proposal is implemented based on start line/column + end line/column (or equivalently, start & end offsets):

* No need for `node_id` to implement this feature
* This feature works with both `--parser=prism` and `--parser=parse.y`
* This feature works for Ruby implementations which do not have `node_id` (e.g. TruffleRuby)
* The Prism version is not a concern, the start/end offsets of such nodes are stable enough

This makes this proposal simpler, more portable and safer, so I suggest we go this way.
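To make the difference between the two lookup keys concrete, here is a toy model in plain Ruby. It does not use Prism: `Node`, `build_tree`, `find_by_id` and `find_by_range` are all hypothetical names, and the offsets are made up. It assumes ids are handed out sequentially in pre-order, and that the parser loaded at lookup time emits one more node early in the file than the parser that compiled the code.

```ruby
# Toy node: a kind, an optional name, and a [start, end) byte range.
Node = Struct.new(:kind, :name, :start_offset, :end_offset, :children)

# Flatten a tree in pre-order; a node's index in this list plays the role
# of a sequential node id.
def preorder(node, out = [])
  out << node
  node.children.each { |child| preorder(child, out) }
  out
end

def find_by_id(tree, id)
  preorder(tree)[id]
end

def find_by_range(tree, kind, start_offset, end_offset)
  preorder(tree).find do |n|
    n.kind == kind && n.start_offset == start_offset && n.end_offset == end_offset
  end
end

# Same source, two imaginary parser versions. With extra_node: true the
# parser emits one additional node inside the leading `if`.
def build_tree(extra_node:)
  inner = [Node.new(:call, :print, 10, 25, [])]
  inner << Node.new(:implicit, nil, 18, 25, []) if extra_node
  Node.new(:program, nil, 0, 58, [
    Node.new(:if, nil, 0, 30, inner),
    Node.new(:def, :a, 32, 44, []),
    Node.new(:def, :b, 46, 58, []),
  ])
end

compiled = build_tree(extra_node: false) # parser that produced the bytecode
loaded = build_tree(extra_node: true)    # parser used for the lookup

# What was recorded at compile time for method `b`.
target = find_by_range(compiled, :def, 46, 58)
recorded_id = preorder(compiled).index(target)

by_id = find_by_id(loaded, recorded_id)         # shifted by the extra node
by_range = find_by_range(loaded, :def, 46, 58)  # unaffected by the extra node

puts "by id:    #{by_id.kind} #{by_id.name}"
puts "by range: #{by_range.kind} #{by_range.name}"
```

In this model the id lookup lands on `def a` while the range lookup still finds `def b`, which mirrors the off-by-one result in the reproduction from the earlier message.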

mame (Yusuke Endoh)

14 Apr

6:48 a.m.

New subject: [ruby-core:125256] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by mame (Yusuke Endoh).

As matz pointed out in #note-11, the ABI versioning approach would leave master in a routinely broken state. As a maintainer of error_highlight, I cannot accept this. Not being able to verify error_highlight's behavior against code using new syntax until the next Prism release would be a serious problem for me.

My position is that the underlying assumption itself is untenable: that the parser and node definitions used by the interpreter, and those used by `#ast`, may legitimately differ. Eregon's example in #note-13 is presented as evidence of `node_id`'s fragility, but I read it instead as a signal that this assumption should be reconsidered.

The correct fix, I believe, is to make such divergence structurally impossible. Concretely, this means either integrating the Prism repository into ruby/ruby, or keeping the Prism repository separate but synchronizing the node definitions themselves into Ruby core. When I mentioned `Ruby::Node` in #note-9, I had the latter in mind.

To add a personal note, I find it structurally unnatural that the parser, the component that defines the language's syntax, is primarily developed outside ruby/ruby. Don't get me wrong, I have great respect for Kevin and the Prism team's work. But I believe that unless we take one of the two forms above, it will be difficult to settle the design of `#ast`.

baweaver (Brandon Weaver)

7:12 a.m.

New subject: [ruby-core:125257] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by baweaver (Brandon Weaver). matz (Yukihiro Matsumoto) wrote in #note-11:

...

I'm not sure ast is the right name. The nodes returned by Prism retain concrete information such as positions, whitespace, and comments, making them closer to a Concrete Syntax Tree than an Abstract Syntax Tree. A name like node or syntax_tree might be more accurate.

Might I recommend `to_ast`? It would be more in line with common Ruby coercion patterns, and would very quickly indicate that it is a coercion method from `Object` to an AST variation. I fear that `node` would be overloaded with various graph and tree-like algorithms. Likewise `to_syntax_tree` may be workable, but I'm still more partial towards `to_ast`.

Earlopain (Earlopain _)

10:16 a.m.

New subject: [ruby-core:125259] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by Earlopain (Earlopain _).

...

Concretely, this means either integrating the Prism repository into ruby/ruby

I don't think that would be a very good solution. Prism is not only the parser used by CRuby. It has bindings to other languages (rust, javascript, java), as well as the various translators for previous ruby syntax parser gems. Then there's integration with other runtimes like jruby and truffleruby that also lives in prism. These are all equally tied to the prism version, same as CRuby. As an example, the java integration with jruby/truffleruby has seen much activity recently. It is good that this is not happening in ruby/ruby since it's not relevant there and would also make it more difficult for them.

...

To add a personal note, I find it structurally unnatural that the parser, the component that defines the language's syntax, is primarily developed outside ruby/ruby

The C library does not have much connection to ruby (if you ignore the one big point that it is a parser for the syntax) and can exist without it. It's one of the main reasons why it sees such big adoption. It's true that they are tightly integrated and don't make sense in isolation but it's not necessary to drive development exclusively in ruby/prism. For me it is a preference since I have no permissions on ruby/ruby but it is also more manageable since it's a smaller project overall. Ruby also does not run all the tests, either because they rely on external gems like `parser` or because they just aren't synced (intentionally). But especially changes that tweak behaviour in some way tend to be done in ruby/ruby first and later synced back instead of the other way around (failing syntax tests, prism_compile.c changes, etc.)

-----

Anyways, I don't think moving prism entirely into ruby/ruby would really change anything. The main problem is the mismatch between gem and standard version. You can move development into ruby/ruby but the code is already synced anyways and users can still make use of prism the gem, which gives you the same problem. As long as the prism version that ruby ships with is not used (or any other of the proposed solutions), you do not gain much from it. And solving it that way doesn't require such a drastic change. In the end you need to integrate _something_ but there's nothing stopping you from doing that today already. It would only really work if there is no prism gem that users can cause a mismatch with, so to me it sounds more like you are arguing for `Ruby::Node` instead.

...

Either way, I'm not sure when I would recommend using Ruby::Node, because it seems like it would always be an out-of-date version of Prism::Node.

It would fill the gap where `ripper` is currently used. It's always exactly what ruby uses and it's clear that there are use-cases for it, especially in connection with `node_id`. Not many would need it, prism the gem is plenty for the majority of cases, but it would indeed be very helpful for usage and exposure in ruby itself. It goes back to https://bugs.ruby-lang.org/issues/21618 where I was happy with how it is handled when using the gem but with ruby internals it is not always good enough.

---

More generally about the `node_id` mismatch from @eregon. I haven't checked all the cases but it looks like they are all the result of one bug or another in prism where it misparsed the input (some examples are also syntactically invalid in earlier prism versions, so I don't think they should be part of the list). I'm not saying that `node_id` should be considered stable or anything. Just practically it rarely happens, even less so as prism continues to mature, and never on code that people actually write. Of course, `node_id` cannot change for any other reason or risk breaking things.

Eregon (Benoit Daloze)

9 p.m.

New subject: [ruby-core:125264] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by Eregon (Benoit Daloze).

@mame What do you think about my idea to use start line/column + end line/column (or equivalently, start & end offsets)? AFAIK it solves all problems around this area, it's reliable, works across Prism versions, etc. We could even use `RubyVM::AbstractSyntaxTree` to get this line & column data on older Ruby versions, so it would work there too.

Adding `Ruby::Node` seems not great to me, notably because it wouldn't be usable for gems needing to support anything older than Ruby 4.1. Also the `Ruby::Node` API would change without any control from the gem to say e.g. which major version of Prism it wants (well, it could use `required_ruby_version` but that seems very inconvenient for this purpose). A dependency on the `prism` gem lets Bundler resolve the version compatible with the various usages (or error if there isn't one).

matz (Yukihiro Matsumoto)

17 Apr

5:38 a.m.

New subject: [ruby-core:125293] [Ruby Feature#21795] Methods for retrieving ASTs

Issue #21795 has been updated by matz (Yukihiro Matsumoto).

Thanks for the analysis in #13, especially the finding about node_id fragility across Prism versions.

But I don't think the offset-based approach solves the real problem. It makes the identifier more robust, but the node returned is still produced by whichever Prism happens to be loaded, which may differ from the Prism that built the ISeq. Offsets only guarantee finding a node at that position, not that its meaning matches the bytecode.

The real issue is one point: the parser that interpreted the running Ruby program and the parser that returns the syntax tree must be the same. Everything else is minor.

There are several ways to achieve this. Exposing the built-in Prism as a gem-independent API (mame in #1), checking an ABI version (kddnewton in #4), or carrying node_id definitions in master, etc. Any of these is fine. I leave the choice to the implementers. But I'm against adding the #ast methods until this identity is guaranteed.

I remain positive on the overall direction. Let's land this prerequisite first, then proceed with the methods.

Matz.
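One way to read the identity requirement is as a guard placed in front of any lookup. The sketch below is purely illustrative and assumes the interpreter could record which parser version compiled a piece of code. `CompiledUnit`, `parser_version` and `lookup_ast` are hypothetical names; nothing here is an existing Ruby or Prism API, and the thread leaves the actual mechanism to the implementers.

```ruby
# Hypothetical record of what the interpreter knew when it compiled some code.
CompiledUnit = Struct.new(:parser_version, :start_offset, :end_offset)

# Runs the lookup block only when the parser that compiled the unit and the
# parser that would build the tree are the same version; returns nil otherwise.
def lookup_ast(unit, loaded_parser_version)
  return nil unless unit.parser_version == loaded_parser_version

  yield unit.start_offset, unit.end_offset
end

unit = CompiledUnit.new("1.2.0", 46, 58)

# Same parser on both sides: the lookup proceeds.
same = lookup_ast(unit, "1.2.0") { |s, e| [:node_between, s, e] }

# Different parser loaded: refuse rather than return a possibly wrong node.
different = lookup_ast(unit, "1.9.0") { |s, e| [:node_between, s, e] }

puts same.inspect
puts different.inspect
```

Returning `nil` on mismatch follows the convention the original proposal already suggests for ISEQs compiled by the old parser.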
