[ruby-core:112505] [Ruby master Bug#19452] `Thread::Backtrace::Location` should have column information if possible.

Issue #19452 has been reported by ioquatix (Samuel Williams). ---------------------------------------- Bug #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by rubyFeedback (robert heiler). Perhaps this could be discussed as "one item" in an upcoming dev meeting, e. g. the extensions ioquatix suggested in the last some issues raised here. Since I love introspection I am all in favour of what can be useful for us, when writing ruby code, and wanting to get more useful information. (Note that I have not thought through if I may need this, or not; I am trusting that ioquatix is convinced it may be useful. I have had very little exposure to Threads actually, and none at all with Threads::Backtrace, so to me this is a bit more of a niche topic. Guess I have to update my ruby knowledge with zverok's new documentation. :D) ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-101959 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by mame (Yusuke Endoh). @ioquatix Please, please write a use case in every proposal. First of all, I think the POC itself is very naive for daily use. Consider the following method call. For Thread::Bactrace::Location of the call to foo, `#first_column` will point before the receiver, not before the method name. And `#last_column` will point after all arguments. ``` very_very_long_receiver.foo(many_many_arguments) ^ ^ | | +--- #first_column #last_columns ---+ ``` Without a use case, it would be impossible to discuss whether and how it is useful. This is a heavier feature than you might think to introduce for the sake of "better than nothing". --- `ErrorHighlight.spot` is a bit more intelligent: it cuts out the method name part. ``` very_very_long_receiver.foo(many_many_arguments) ^^^^ | +--- ErrorHighlight.spot ``` To achieve this, ErrorHighlight uses `RubyVM::AST.of`: it reparses the source code, identify the node in the method call, and draw an underline after its receiver and before its arguments. Because of using `RubyVM::AST.of`, ErrorHighlight works only with CRuby. That is unfortunate. Currently @kddeisz is working on a new Ruby parser project called [yarp](https://github.com/Shopify/yarp). As far as I know, its goal is to be a common parser for all major Ruby implementations including CRuby and TruffleRuby. If the goal is accomplished, I will use yarp for ErrorHighlight, which will (hopefully) allow ErrorHighlight to work on Ruby interpreters rather than CRuby. Even in that case, it is still necessary to map `Thread::Backtrace::Location` to the AST node. We may use `Thread::Backtrace::Location#first_lineno`, etc. to identify the AST node corresponding to the `Thread::Backtrace::Location`. But this should be considered after yarp actually becomes a CRuby parser. (Implementation note: currently, `Thread::Backtrace::Location` has the ID of the AST node, which is a number assigned internally to the nodes in the AST in (approximate) pre-order. `RubyVM::AST.of` re-parses the source code to recover the full AST, and then identifies the AST node by using the node ID. This is the method suggested by @ko1. The four values, first_lineno / first_column / last_lineno / last_column, could be another way to identify nodes. But this is slightly less accurate: if there is another node in the exact same code range, it is not uniquely identifiable. I don't think this is a problem for ErrorHighlight, though.) @eregon @kddeisz Any opinions? ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102173 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by Eregon (Benoit Daloze). YARP for sure will be useful for this. However I think it is somewhat independent of whether AST nodes/bytecodes/etc keep track of column information or not. TruffleRuby already tracks the column information, so it would be trivial to provide that (except that the current parser doesn't keep column info, but YARP does). Regarding finding out the actually method call name and the `.` before, maybe YARP could/should remember those two locations? WDYT @kddeisz? That would avoid the need to reparse, at the cost of having to store 2 extra uint32 "byte offsets" per call node. I like the proposed API because it's straightforward. I think it's also enough to identity which call it is, e.g. for `foo(1).bar(2).baz(3)`, i.e. it's always the rightmost call inside `first_column...last_columns`. But that's indeed not as clear or obvious as underlining the operator + method name alone. ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102174 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by mame (Yusuke Endoh). @eregon Thank you for your comment. Eregon (Benoit Daloze) wrote in #note-5:
That would avoid the need to reparse, at the cost of having to store 2 extra uint32 "byte offsets" per call node.
I think there is a confusion of what the word "reparse" means here. 1. When CRuby compiles a AST to a bytecode, it copies just node_id and lineno to each instruction but discards the column information. (@ko1 is not willing to copy column information due to memory consumption concerns.) So `RubyVM::AST.of` needs to "reparse" the whole source code to recover the whole AST. 2. CRuby's `RubyVM::AST` does not keep the column information of the method name because it converts the name to a symbol. It needs to "reparse" (or "re-tokenize"?) the code to identify where the method name is. Here I am talking about 1, which will not change even if YARP is introduced. If YARP's AST keeps the method name column information, 2 will be unnecessary, which is good.
I like the proposed API because it's straightforward. I think it's also enough to identity which call it is, e.g. for `foo(1).bar(2).baz(3)`, i.e. it's always the rightmost call inside `first_column...last_columns`. But that's indeed not as clear or obvious as underlining the operator + method name alone.
This is somewhat a matter of taste, but since this is a user interface, I believe it is important to provide pinpoint information at a glance. So I think it is worthwhile to elaborate it, even if it is somewhat tedious. ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102175 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by mame (Yusuke Endoh). mame (Yusuke Endoh) wrote in #note-6:
This is somewhat a matter of taste, but since this is a user interface, I believe it is important to provide pinpoint information at a glance. So I think it is worthwhile to elaborate it, even if it is somewhat tedious.
Sorry, I'm getting off topic. I think it is possible to use four values of first_lineno, etc. as information to connect `Thread::Backtrace::Location` and AST node. (@ko1 is not so keen on it, though.) However, we would need to clearly agree that this is a use case of `Thread::Backtrace::Location#first_column`. If we change the return value arbitrarily, it would not be able to match it up with the AST node. Anyway, I think this should be discussed after YARP is completed and incorporated into CRuby. ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102176 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by kddeisz (Kevin Newton). YARP keeps column information around for this. In the call node there's column information at each of the points below: ``` ruby very_very_long_receiver.foo(many_many_arguments) ^ ^^ ^^ ^ ``` This may be too much at the end of the day, but for now it's working out nicely. In terms of memory consumption, I would think it would be 1-to-1 if we dropped node_id and replaced it with column ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102295 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by Eregon (Benoit Daloze). mame (Yusuke Endoh) wrote in #note-6:
I think there is a confusion of what the word "reparse" means here.
I meant it as `1.` as well. I think copying/keeping the column information is the most reliable way to have this information. Indeed it should be encoded efficiently to avoid too much of a memory increase. I think we should measure, e.g. by how much % does it increase the memory of a Ruby program and a Rails app? I suspect it's not much higher than keeping the node id. In fact if we assumed e.g. no call expression is longer than 2^16 bytes long, we could fit 2 column offsets (from the start of the call) in 32-bit. Or even 4 column offsets if we need more, and then have some fallback encoding if longer than 256 (likely rare) or so. I don't know what CRuby uses currently to keep line information though.
I think it is possible to use four values of first_lineno, etc. as information to connect Thread::Backtrace::Location and AST node.
I think this makes sense, although it's even better if we don't need to reparse to find out column information. It's probably more efficient to internally have the first_byteoffset and last_byteoffset, that's just two 32-bit integers and we can derive {first,last}_{lineno,column} from it. It's also how YARP records locations. ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102307 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by Eregon (Benoit Daloze).
Anyway, I think this should be discussed after YARP is completed and incorporated into CRuby.
I'm OK with that, I don't need these methods urgently. It will be nice to have public APIs for this kind of functionality vs experimental RubyVM:: stuff. ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102308 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by ufuk (Ufuk Kayserilioglu). If memory increase is a concern, could we not store line/column information as varints (e.g. like [how Protocol Buffers encode integers](https://protobuf.dev/programming-guides/encoding/#varints)) in the bytecode, so that smaller numbers (which would be way more common, especially for column numbers) take up less memory to begin with? Since access to this data would probably not be on any hot-path, the fact that they'd have to be converted to regular ints when needed should be a very small concern. ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102311 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by mame (Yusuke Endoh). Thank you all! kddeisz (Kevin Newton) wrote in #note-8:
In terms of memory consumption, I would think it would be 1-to-1 if we dropped node_id and replaced it with column
What concerns me slightly is that it makes it impossible to uniquely identify nodes that have the same column information, such as NODE_SCOPE or its child node. Perhaps it is not a very big deal for ErrorHighlight, though. Eregon (Benoit Daloze) wrote in #note-9:
I think we should measure, e.g. by how much % does it increase the memory of a Ruby program and a Rails app? I suspect it's not much higher than keeping the node id.
When adding node_id, I evaluated the memory usage with `rails s`. (See #17930.) It increased about 3 % (97 MB -> 100 MB). Not very small to ignore. ufuk (Ufuk Kayserilioglu) wrote in #note-11:
could we not store line/column information as varints
That makes sense. Although some efforts have already been made in compression, I guess that there would be room for improvement. ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102326 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by ko1 (Koichi Sasada). Let me clear why column information is better than `node_id` My understanding: * Better than `node_id` * B1. We can get one column information (maybe most wider information, for example: `a.b.c.d.func(...)` for `func()`. * B2. We can get (maybe) corresponding node by column information. not exact same with `node_id` but similar. * B3. Column information is not magical than `node_id` * B4. Can detect file modification on reparsing (not all modification though) because we can not get corresponding node by column information. * Worse than `node_id` * W1. More memory consumption * W2. Can not get exact corresponding node with column information because some nodes can have same column information. * W3. Can not get corresponding node on reparsing when the script was modified, even if spacing or commenting (<=> B4) * W4. (personal concern) if we introduce it, I'm afraid that people ask to hold more and more information than the column information * Both worse * S1. Need to reparse a script and analyze the script to get more information like method name, parameters and so on. ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-102329 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/

Issue #19452 has been updated by janosch-x (Janosch Müller). `TracePoint` also lacks column information AFAICT so it might be worthwhile to consider it in search of a consistent solution. ---------------------------------------- Feature #19452: `Thread::Backtrace::Location` should have column information if possible. https://bugs.ruby-lang.org/issues/19452#change-103245 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal ---------------------------------------- I discussed this with @mame and it would be pretty useful if we could also get the column information from exception backtrace location, even if it was slow. A POC: ```ruby class Thread::Backtrace::Location if defined?(RubyVM::AbstractSyntaxTree) def first_column RubyVM::AbstractSyntaxTree.of(self, keep_script_lines: true).first_column end else def first_column raise NotImplementedError end end end ``` It would be good to have a standard interface, so we follow the same interface as https://bugs.ruby-lang.org/issues/19451 and vice versa where it makes sense. I'll investigate it. -- https://bugs.ruby-lang.org/
participants (8)
-
Eregon (Benoit Daloze)
-
ioquatix (Samuel Williams)
-
janosch-x
-
kddeisz (Kevin Newton)
-
ko1 (Koichi Sasada)
-
mame (Yusuke Endoh)
-
rubyFeedback (robert heiler)
-
ufuk (Ufuk Kayserilioglu)