[ruby-core:124206] [Ruby Bug#21783] {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters
Issue #21783 has been reported by Eregon (Benoit Daloze). ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by kddnewton (Kevin Newton). I think this is a documentation issue, as both parsers/compilers operate in terms of bytes. Changing this to characters would likely be a noticeable difference in speed, and quite a bit of code change. (Either both parsers/compilers would have to do this work initially, as that's where the numbers come from, or the source_location function would have to re-parse the source, which is not possible in some cases.) All of that is to say, please do not change this, it will be a ton of work for minimal benefit. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115695 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). Updating the docs is one solution, so at least it's consistent between docs and behavior. I think as a Ruby-facing API it's weird that it operates in terms of bytes (and `source_location` does not have a `byte` prefix to indicate that). I think most programmers when they hear `line 4 column 6` they expect the 6th character on the 4th line, not the character starting at the 6th byte (actually hard to find in an editor, most editors don't show "byte columns", in fact it's not even possible to place the cursor at some byte positions, every programmer always think in characters when looking at source code). For example, one might expect that highlighting with `^` based on the return values from `source_location` works, but it doesn't: ```ruby def underline(callable) file, start_line, start_column, end_line, end_column = callable.source_location raise unless start_line == end_line source = File.readlines(file)[start_line-1] puts source puts ' '*start_column + '^'*(end_column-start_column) end my_proc = proc { ascii-only } underline my_proc my_proc = proc { il était une fois un été } underline my_proc ``` gives ``` $ ruby underline.rb my_proc = proc { ascii-only } ^^^^^^^^^^^^^^ my_proc = proc { il était une fois un été } ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ```
Either both parsers/compilers would have to do this work initially, as that's where the numbers come from, or the source_location function would have to re-parse the source, which is not possible in some cases.
This is a good point, I didn't realize that. I think it would still be worth it to change the parsers/compilers to compute the proper character column for literal lambdas, blocks and methods, and probably wouldn't be very expensive given most source files are ASCII-only and potentially the parsers could even use the knowledge that a given line is ASCII-only so it would still be as fast even if the file contains a few non-ASCII characters. If columns would e.g. appear in error messages, I think everyone would expect them to be character columns, not byte columns. For example gcc shows character columns, as one would expect: ```c int main() { /* été */ notexist } ``` ``` gcc test.c test.c: In function ‘main’: test.c:2:15: error: ‘notexist’ undeclared (first use in this function) 2 | /* été */ notexist | ^~~~~~~~ ``` Note it's 2:15 (i.e. character columns), not 2:17 (byte columns). The highlighting also needs to use character columns of course. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115699 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). From https://bugs.ruby-lang.org/issues/6012#note-25 @matz said adding column was OK, but not byte offsets. I'm not sure what were his reasons, but maybe it's that byte offsets are too low-level for `source_location`? If so, I would think byte columns are also too low level and it should be character columns instead. From a user POV character columns seem better and more expected. OTOH, I understand the reservation from @kddnewton and I share it as a Ruby implementer, it's much simpler to return byte columns. For example in TruffleRuby we currently save location information by having `int32_t start_offset; int32_t length;` in every Truffle AST node, i.e. byte offset and byte length. Returning byte columns from that is easy and only requires the "newline offsets" array, and not the actual source code. To return character columns, TruffleRuby would need to read from the beginning of the line to the byte offset to find how many characters that is, and keep the source code in memory (currently TruffleRuby does keep it in memory, but it might not in the future). I have also seen this in the context of [adding Prism.node_for](https://github.com/ruby/prism/pull/3808) and for that usage having byte columns is actually easier than character columns, OTOH it's not hard to convert from character columns to byte columns in that case and I already wrote the logic for that (because I expected `source_location` would return character columns, even before reading the docs). It is of course possible to convert from character column to byte column and vice versa, but it requires access to the source code, which is not always available (e.g. `eval`). ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115700 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by kddnewton (Kevin Newton). Honestly if we're interpreting column as something visual like you're implying, we're also going to run into issues with grapheme clusters and east asian width and all the other implications for whatever "character" actually means. I think we would also have to return the encoding of the source file inside that array in order for it to make any sense. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115701 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by matz (Yukihiro Matsumoto). I'd like to cancel `source_location` to have column information in 4.0, due to this concern. In my personal opinion, I am leaning toward byte index, though. Matz. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115704 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). matz (Yukihiro Matsumoto) wrote in #note-8:
I'd like to cancel `source_location` to have column information in 4.0, due to this concern.
Matz.
Thank you for the quick reply, I think that would be the worst outcome though, https://bugs.ruby-lang.org/issues/6012 was already opened 14 years ago and I have seen multiple users needing this in the last years. It is one of the features I'm most looking forward to in Ruby 4.0 (in fact it's one of two features that really interests me in 4.0: that and Ractor improvements, the rest looks rather unexciting to me). IOW, I would much rather have byte columns in 4.0 than no columns at all. If we delay this, we'll implicitly tell people that using RubyVM::AbstractSyntaxTree is the only way to get column information, and that's bad because it only works on CRuby and it's not a proper API. Alternative Ruby implementations might have to define their own API to get column information, vs just using the one we have agreed on in #6012 (I'd much rather not get there).
In my personal opinion, I am leaning toward byte index, though.
Let's go with byte columns then? I can make a PR to document that. It seems you and @kddnewton agree on that, and I'm basically hesitating which is best, but definitely better byte columns than columns. I see it has already been reverted though :/ https://github.com/ruby/ruby/commit/065c48cdf11a1c4cece84db44ed8624d294f8fd5 ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115711 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by byroot (Jean Boussier). Maybe I'm totally off, but I expect this data to be used to extract the source code, e.g show a snippet of code in an error message, or something akin to that, hence byte offsets seem actually more convenient? (and performant). But yes, it definitely need to be explicitly documented. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115712 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). I made a PR to re-add `{Method,UnboundMethod,Proc}#source_location` and fix all known issues: https://github.com/ruby/ruby/pull/15580 @matz Would it be OK to merge it? 🙏 For context I opened this issue because I was surprised at the semantics and it was documented in character columns. If the documentation stated the columns are in bytes I would have thought "somewhat unexpected for me, but I can deal with it, moving on". So for the next person, if they look at the docs it should be clear now with this PR. I would have never expected this issue to revert the feature, this is certainly *not* what I want. I opened this issue to show the doc & implementation inconsistency, explain my expectations, and discuss what fix makes sense. I think @kddnewton makes a good point about grapheme clusters and east asian width, where even character width is not enough. And it's probably not reasonable to ask parsers to handle those cases either. So I now think byte columns is a good choice, as long as it's properly documented. I think this is a case of we shouldn't let perfection get in the way of usefulness, from the saying [Perfect is the enemy of good](https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good). --- BTW using a variant of my C example from above with different compilers shows some variety: ```c int main() { char* s = "🎉🎉🎉"; oops } ``` ``` $ gcc test.c test.c: In function ‘main’: test.c:2:25: error: ‘oops’ undeclared (first use in this function) 2 | char* s = "🎉🎉🎉"; oops | ^~~~ $ clang test.c test.c:2:31: error: use of undeclared identifier 'oops' 2 | char* s = "🎉🎉🎉"; oops | ^ ``` `gcc` shows column 25 which corresponds to nothing, `' char* s = "🎉🎉🎉"; '` is 21 characters and 30 bytes. `clang` shows the column in bytes, so there is clearly some variety there, and at least `clang` chose to show byte columns. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115713 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). byroot (Jean Boussier) wrote in #note-10:
Maybe I'm totally off, but I expect this data to be used to extract the source code, e.g show a snippet of code in an error message, or something akin to that, hence byte offsets seem actually more convenient? (and performant).
Yes, for that case which might indeed be the most common one, it's more convenient to have the columns in bytes. And I suspect many of these cases would likely use `Prism` and specifically [Prism.node_for](https://github.com/ruby/prism/pull/3808) to get more information, such as showing the method call to which the block was given, highlight some specific part of the block or method, etc. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115714 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by mame (Yusuke Endoh). Eregon (Benoit Daloze) wrote in #note-12:
And I suspect many of these cases would likely use `Prism` and specifically [Prism.node_for](https://github.com/ruby/prism/pull/3808) to get more information, such as showing the method call to which the block was given, highlight some specific part of the block or method, etc.
I agree. But then, when would `Proc#source_location` be useful? ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115716 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). mame (Yusuke Endoh) wrote in #note-13:
I agree. But then, when would `Proc#source_location` be useful?
There are cases where it's useful to show just the block, i.e. `{ block body }`. And also to be able to use `Prism.node_for` one of course needs `Proc#source_location`. Maybe you're saying `Proc#source_location` should always include the method call? I think it's good as-is and does not need changing (a little bit discussed in #21784). ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115717 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by byroot (Jean Boussier). Isn't `Proc.source_location` very useful for things akin to `power_assert`? If it was easy to extract a proc's source code I'd certainly integrate it in testing framework to improve failure rendering. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115719 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). Yes, it'd be useful for e.g. `assert_raise(SomeException) { ... }` in test-unit, `expect { ... }.to` in RSpec, `-> { ... }.should raise_error(...)` in MSpec, etc. For example `-> { 13 - "10" }.should raise_error(ArgumentError)` in MSpec currently gives: ``` 1) Integer#- fixnum raises a TypeError when given a non-Integer ERROR Expected ArgumentError but got: TypeError (String can't be coerced into Integer) /home/eregon/code/rubyspec/core/integer/minus_spec.rb:21:in 'Integer#-' ``` But it could show something like: ``` 1) Integer#- fixnum raises a TypeError when given a non-Integer ERROR Expected ArgumentError but got: TypeError (String can't be coerced into Integer) For expression `13 - "10"` /home/eregon/code/rubyspec/core/integer/minus_spec.rb:21:in 'Integer#-' ``` ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115725 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by mame (Yusuke Endoh).
And also to be able to use `Prism.node_for` one of course needs `Proc#source_location`.
Huh? `Proc` already has its `node_id` (via its ISeq), so `Prism.node_for` is implementable without `source_location`. In fact, by doing the following, you can obtain the Prism node for Proc. ```ruby f = -> {} require "prism" node_id = RubyVM::InstructionSequence.of(-> {}).to_a[4][:node_id] pp Prism.parse_file(__FILE__).value.breadth_first_search { |node| node.node_id == node_id } #=> @ LambdaNode (location: (4,41)-(4,46)) # ├── flags: ∅ # ├── locals: [] # ├── operator_loc: (4,41)-(4,43) = "->" # ├── opening_loc: (4,44)-(4,45) = "{" # ├── closing_loc: (4,45)-(4,46) = "}" # ├── parameters: ∅ # └── body: ∅ ``` Note that I don't say users should write `RubyVM::InstructionSequence.of(-> {}).to_a[4][:node_id]` manually. It should be hidden within `Prism.node_for`. This is what `RubyVM::AbstractSyntaxTree.of` does actually. I want to say that `source_location` is unnecessary to implement `Prism.node_for`. Regarding to power_assert, I talked with @ktsj, and he said that `Proc#source_location` is not more useful than `Kernel#caller_locations`. Since power_assert pinpoints expressions within a block at a finer granularity, rough block's location information isn't particulary useful. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115734 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by byroot (Jean Boussier).
Regarding to power_assert
So it's great power_assert has much more advanced parsing capabilities, but I for one would use the column information for things like Rails' `assert_difference -> { User.count }, +1`, as it would be quite trivial with column info in `source_location`. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115735 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by mame (Yusuke Endoh). I have the following hypothesis: When people want column information, what they really need is an AST. #21005 was precisely such an example. In some cases, column information alone may suffice, but obtaining the AST inevitably yields column information. #21784 discusses whether Proc#source_location should include the position of `->` or not, but if the user has the AST, they can choose to include or exclude the range containing `->` as desired. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115737 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). mame (Yusuke Endoh) wrote in #note-17:
Huh? `Proc` already has its `node_id` (via its ISeq), so `Prism.node_for` is implementable without `source_location`. I want to say that `source_location` is unnecessary to implement `Prism.node_for`.
No, `node_id` is CRuby-specific and this code to extract it is a hack. `source_location` with column information has the proper information, is a proper API, is portable and avoids that hack. I think it would be unacceptable to have any functionality in Prism depending on `RubyVM` for example, we need to find general solutions, not CRuby-specific hacks. FWIW I also consider the concept of `node_id` CRuby-specific, at least currently. TruffleRuby doesn't have `node_id` for instance, but it has columns and that's a general concept.
I have the following hypothesis: When people want column information, what they really need is an AST.
Yes, probably true for a lot of cases. But such column information (or byte offsets from the start of line/source, any of these works) is also necessary to locate the right node in the AST, just the `start_line` like currently is insufficient (e.g. there can be multiple blocks on a single line). For other cases it's unnecessary to have the AST, e.g. to print the source of a method, e.g. for RDoc or similar, i.e. the `method_source` use case (700 millions downloads, https://rubygems.org/gems/method_source).
Since power_assert pinpoints expressions within a block at a finer granularity, rough block's location information isn't particulary useful.
Then `power_assert` is likely incorrect/broken if there are multiple `assert { ... }` block on a single line. It could identify the correct block with column information. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115754 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). Indeed, `power_assert` is broken with 2 blocks on the same line: ```ruby require 'test/unit' class FooTest < Test::Unit::TestCase def test_foo assert { 3.times.to_a.include?(2) }; assert { 3.times.to_a.include?(3) } end end ``` gives ``` ============================================================================================================================== Failure: test_foo(FooTest): assert { 3.times.to_a.include?(2) }; assert { 3.times.to_a.include?(3) } | | | | | false | [0, 1, 2] #<Enumerator: 3:times> pa.rb:5:in 'FooTest#test_foo' 2: 3: class FooTest < Test::Unit::TestCase 4: def test_foo => 5: assert { 3.times.to_a.include?(2) }; assert { 3.times.to_a.include?(3) } 6: end 7: end ============================================================================================================================== ``` So `power_assert` shows the wrong block and shows confusing information (`[0, 1, 2].include?(2)` is true not false). Column information from `Proc#source_location` would fix this by allowing power_assert to find the correct block. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115755 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by mame (Yusuke Endoh). Eregon (Benoit Daloze) wrote in #note-20:
FWIW I also consider the concept of `node_id` CRuby-specific, at least currently.
Since `Prism::Node#node_id` exists, I don't think it's CRuby-specific anymore. It's true that the part retrieving the `node_id` from a Proc is CRuby-specific, and that design requires some consideration. A simple solution would be for Ruby to provide `Proc#node_id` or `Method#node_id`, but we need to discuss. `source_location` does not uniquely identify a node. For example, code `x` is represented as an AST with three nodes: `ProgramNode`, `StatementsNode`, and `CallNode`. They all have the same `source_location`, so cannot be distinguished. Whether TruffleRuby implements `node_id` is up to you, but I believe this is the consistnt approach in CRuby.
Since power_assert pinpoints expressions within a block at a finer granularity, rough block's location information isn't particulary useful.
Then `power_assert` is likely incorrect/broken if there are multiple `assert { ... }` block on a single line.
@ktsj said that `assert { condition ? expected == actual : expected == actual }` cannot be distinguished at present. `Proc#source_location` is not helpful to resolve this. I think that what is needed is an API like `RubyVM::AST.node_id_for_tracepoint` to retrieve the `node_id` from a TracePoint. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115763 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). mame (Yusuke Endoh) wrote in #note-22:
`source_location` does not uniquely identify a node. For example, code `x` is represented as an AST with three nodes: `ProgramNode`, `StatementsNode`, and `CallNode`. They all have the same `source_location`, so cannot be distinguished.
True but that's not a problem for `Prism.node_for(Method | UnboundMethod | Proc)` because it returns `DefNode | LambdaNode | CallNode | ForNode`, i.e. we choose the relevant node to return even if multiple nodes would have the same source location. And in general the user could choose that e.g. they want the outermost node, or a node of a specific node class. mame (Yusuke Endoh) wrote in #note-22:
[@ktsj (Kazuki Tsujimoto)](/users/3007) said that `assert { condition ? expected == actual : expected == actual }` cannot be distinguished at present.
That seems a separate and unrelated problem to having `assert { ... }; assert { ... }`. (BTW no multi-line block seems currently a big limitation for `power_assert`) For `assert { ... }; assert { ... }`, I'm convinced source_location with column information is helpful and a good portable solution. --- Is there any chance to merge https://github.com/ruby/ruby/pull/15580 before the 4.0.0 release? If not, it will establish a precedent of "asking clarification in an issue can result in removing a feature that has been on master for almost a year". I think that's a bad precedent to set, i.e. to remove features implemented for a long time at the last minute with little evidence of a problem and not even discussing before reverting. It's disrespectful of gems & users which have adapted to the changes. Especially after discussing the feature for years, it feels like a betrayal to remove it now. I think it might drive people away from Ruby. Concretely I don't see much the point to discuss features on this tracker if they can be removed in such a way. The only good out of that I could see is if somehow that results in adding `Ruby::CodeLocation` or so with an API [like this](https://bugs.ruby-lang.org/issues/6012#note-19) in 4.1. Or even in 4.0.X given it's a compatible change and CRuby doesn't follow SemVer anyway, so the 3rd component is a mix of SemVer minor/patch. That would then be more flexible and e.g. expose columns, offsets, etc by having various method, and remove the need for things like `RubyVM::InstructionSequence#script_lines` by having a proper replacement. In general the goal here is to avoid `RubyVM` since that's CRuby-only, experimental, unstable and shouldn't be used anyway. The documentation of `RubyVM` makes it clear it's only `debugging, prototyping, and research` and clearly it has been abused over and over, because there is no proper API for some things like column information. After years of discussions to make some progress in this area with `source_location` column information, it's disheartening to see it reverted at the last minute. ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115782 * Author: Eregon (Benoit Daloze) * Status: Open * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
Issue #21783 has been updated by Eregon (Benoit Daloze). Assignee set to Eregon (Benoit Daloze) Target version set to 4.1 Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED ---------------------------------------- Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters https://bugs.ruby-lang.org/issues/21783#change-115914 * Author: Eregon (Benoit Daloze) * Status: Closed * Assignee: Eregon (Benoit Daloze) * Target version: 4.1 * ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] * Backport: 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED ---------------------------------------- The documentation says: ``` = Proc.source_location (from ruby core) ------------------------------------------------------------------------ prc.source_location -> [String, Integer, Integer, Integer, Integer] ------------------------------------------------------------------------ Returns the location where the Proc was defined. The returned Array contains: (1) the Ruby source filename (2) the line number where the definition starts (3) the column number where the definition starts (4) the line number where the definition ends (5) the column number where the definitions ends This method will return nil if the Proc was not defined in Ruby (i.e. native). ``` So it talks about column numbers, so it should be a number of characters and not of bytes. But currently it's a number of bytes: ``` $ ruby --parser=prism -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux] ["-e", 1, 0, 1, 14] $ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location' ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux] ["-e", 1, 0, 1, 14] ``` The last number should be 12 because `"def été; end".size` is 12 characters. This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters. -- https://bugs.ruby-lang.org/
participants (5)
-
byroot (Jean Boussier) -
Eregon (Benoit Daloze) -
kddnewton (Kevin Newton) -
mame (Yusuke Endoh) -
matz (Yukihiro Matsumoto)