[ruby-core:112744] [Ruby master Bug#19485] Unexpected behavior in squiggly heredocs

Issue #19485 has been reported by jemmai (Jemma Issroff).

----------------------------------------
Bug #19485: Unexpected behavior in squiggly heredocs
https://bugs.ruby-lang.org/issues/19485

* Author: jemmai (Jemma Issroff)
* Status: Open
* Priority: Normal
* ruby -v: 3.2.1
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
Based on [the squiggly heredoc documentation](https://ruby-doc.org/3.2.1/syntax/literals_rdoc.html), I found the following to be unexpected behavior. Explicitly, the documentation specifies, "The indentation of the least-indented line will be removed from each line of the content."

After running:

```ruby
File.write("test.rb", "p <<~EOF\n\ta\n b\nEOF\n")
```

and then `ruby test.rb`, I get the following output:

```
"\ta\nb\n"
```

The least-indented line above is ` b`, however, no leading whitespace is removed from the line containing `\ta`.

For another example:

```ruby
File.write("test.rb", "p <<~EOF\n\tA\n \tB\nEOF\n")
```

`ruby test.rb` gives:

```
"A\nB\n"
```

In this case, the `\t` was removed from the line containing `A`, but more whitespace than that (` \t`) was removed from the line containing `B`.

After seeing the first example, I assumed that the documentation was out of date, and that I should fix it to read that `\t` would never be converted into space characters in order to remove leading whitespace. But after the second example, it seems like this is a bug in removing leading whitespace.

Can someone please explain what the rules should be on squiggly heredocs? I can implement a fix to adhere to the rules, or can update the documentation, I am just unsure of what the rules should be because the above two examples reflect unexpected behavior in two distinct ways.

--
https://bugs.ruby-lang.org/

Issue #19485 has been updated by Dan0042 (Daniel DeLorme).

I think what's happening here is that tabs are not converted directly to 8 spaces, but to "move ahead to next multiple of 8 chars". So in that sense "\t" and " \t" are equivalent. It's the same behavior as `10.times{ |i| print " "*i,"\t",i,"\n" }`

----------------------------------------
Bug #19485: Unexpected behavior in squiggly heredocs
https://bugs.ruby-lang.org/issues/19485#change-102379

* Author: jemmai (Jemma Issroff)
* Status: Open
* Priority: Normal
* ruby -v: 3.2.1
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
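The tab-stop reading Dan0042 describes can be illustrated with a small sketch. The helper name `indent_width` and the constant `TAB_WIDTH` are made up for this illustration and are not part of Ruby; the sketch only assumes that a tab advances to the next multiple of 8 columns, as suggested in the message above.

```ruby
# Hypothetical helper, not part of Ruby: measure leading whitespace in
# columns, treating each tab as an advance to the next multiple of TAB_WIDTH.
TAB_WIDTH = 8

def indent_width(line)
  col = 0
  line.each_char do |ch|
    case ch
    when " "  then col += 1                                 # a space is one column
    when "\t" then col = (col / TAB_WIDTH + 1) * TAB_WIDTH  # jump to the next tab stop
    else break                                              # first non-blank character ends the indent
    end
  end
  col
end

p indent_width("\tA")   # => 8
p indent_width(" \tB")  # => 8
p indent_width(" b")    # => 1
```

Under this counting, `\t` and ` \t` both end at column 8, which would explain why the second example in the report strips both lines completely.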

Issue #19485 has been updated by nobu (Nobuyoshi Nakada).

Status changed from Open to Assigned
Assignee set to core
Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED

My draft is:
Note that the "indentation" is counted like as each horizontal tabs are expanded to spaces up to the next tab stop column (per 8 columns), and each indentation to be removed is the longest tabs and spaces sequence where the next column does not exceed the least-indentation.
Does this make sense?

----------------------------------------
Bug #19485: Unexpected behavior in squiggly heredocs
https://bugs.ruby-lang.org/issues/19485#change-102404

* Author: jemmai (Jemma Issroff)
* Status: Assigned
* Priority: Normal
* Assignee: core
* ruby -v: 3.2.1
* Backport: 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
----------------------------------------

Issue #19485 has been updated by sawa (Tsuyoshi Sawada). nobu (Nobuyoshi Nakada) wrote in #note-2:
My [draft] is:
Note that the "indentation" is counted like as each horizontal tabs are expanded to spaces up to the next tab stop column (per 8 columns), and each indentation to be removed is the longest tabs and spaces sequence where the next column does not exceed the least-indentation.
I find the sentence too long and a little too difficult to parse/understand. What about something like this:
For the purpose of measuring indentation, a horizontal tab is regarded as a sequence of one to eight spaces such that the column position corresponding to the end of the horizontal tab is a multiple of eight. The amount to be removed is counted in terms of number of spaces. If the boundary appears in the middle of a tab character, that tab character is not removed.
----------------------------------------
Bug #19485: Unexpected behavior in squiggly heredocs
https://bugs.ruby-lang.org/issues/19485#change-102406

* Author: jemmai (Jemma Issroff)
* Status: Assigned
* Priority: Normal
* Assignee: core
* ruby -v: 3.2.1
* Backport: 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
----------------------------------------
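To check a wording like this against the two outputs in the report, the rule can be sketched in plain Ruby. This is only an illustration, not the parser's actual code: the names `advance`, `indent_columns`, `strip_indent`, `squiggly_dedent` and `TAB_STOP` are invented here, a tab that straddles the boundary is kept (which is what the first reported output shows), and whitespace-only lines are assumed to be ignored when finding the least-indented line.

```ruby
# Hypothetical sketch of the dedent rule; not the parser's actual code.
TAB_STOP = 8

# Column reached after consuming one whitespace character at column +col+.
def advance(col, ch)
  ch == "\t" ? (col / TAB_STOP + 1) * TAB_STOP : col + 1
end

# Width in columns of a line's leading spaces and tabs.
def indent_columns(line)
  line[/\A[ \t]*/].each_char.reduce(0) { |col, ch| advance(col, ch) }
end

# Strip leading whitespace up to +limit+ columns; a tab that would cross
# the limit is left in place.
def strip_indent(line, limit)
  col = 0
  i = 0
  while i < line.length && (line[i] == " " || line[i] == "\t")
    nxt = advance(col, line[i])
    break if nxt > limit
    col = nxt
    i += 1
  end
  line[i..]
end

# Assumption: whitespace-only lines do not count toward the minimum.
def squiggly_dedent(text)
  lines = text.lines
  limit = lines.reject { |l| l.strip.empty? }.map { |l| indent_columns(l) }.min || 0
  lines.map { |l| strip_indent(l, limit) }.join
end

p squiggly_dedent("\ta\n b\n")    # => "\ta\nb\n"
p squiggly_dedent("\tA\n \tB\n")  # => "A\nB\n"
```

Both calls reproduce the outputs quoted in the issue description: in the first, the limit is one column and the tab would overshoot it, so it stays; in the second, both lines measure eight columns and are stripped entirely.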

Issue #19485 has been updated by ioquatix (Samuel Williams).

I don't think it's a good idea to assume a tab is 8 spaces.

Regarding indentation, it might be a nice simplification to only consider the first line in the squiggly heredoc. That's what I've done in the past - it's predictable and easy to explain. i.e. ``` x = <<~FOO 1 2 3 FOO ``` At most 4 spaces is removed from each line. The first line determines this.

Anyway, maybe it's irrelevant to this discussion. But that's how I've implemented it in my own language/interpreter in the past.

----------------------------------------
Bug #19485: Unexpected behavior in squiggly heredocs
https://bugs.ruby-lang.org/issues/19485#change-102408

* Author: jemmai (Jemma Issroff)
* Status: Assigned
* Priority: Normal
* Assignee: core
* ruby -v: 3.2.1
* Backport: 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
----------------------------------------
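For comparison, the first-line rule ioquatix mentions might look roughly like the sketch below. This is not how Ruby's `<<~` behaves, and it is not taken from ioquatix's interpreter; the name `first_line_dedent` and the sample input (including its indentation) are assumptions made for illustration, and only space indentation is handled.

```ruby
# Hypothetical sketch of a "first line decides" dedent; Ruby's <<~ does
# not work this way. Only leading spaces are considered.
def first_line_dedent(text)
  lines = text.lines
  return text if lines.empty?

  # The first line's leading spaces set the maximum amount to strip.
  limit = lines.first[/\A */].length

  # Remove at most +limit+ leading spaces from every line.
  lines.map { |l| l.sub(/\A {0,#{limit}}/, "") }.join
end

p first_line_dedent("    1\n  2\n      3\n")  # => "1\n2\n  3\n"
```

With a four-space first line, at most four spaces come off each line, so a less-indented line is simply stripped to column zero and a more-indented line keeps its extra indentation.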

Issue #19485 has been updated by Eregon (Benoit Daloze).

Another condition could be only accept tabs in squiggly heredoc if they prefix all lines of the squiggly heredoc? (otherwise SyntaxError, including for the 2 cases in the description)

(I wish tabs would just not be accepted as indentation for Ruby, but well that's probably a pointless discussion, even though it seems 99% of the community agrees there)

----------------------------------------
Bug #19485: Unexpected behavior in squiggly heredocs
https://bugs.ruby-lang.org/issues/19485#change-102417

* Author: jemmai (Jemma Issroff)
* Status: Assigned
* Priority: Normal
* Assignee: core
* ruby -v: 3.2.1
* Backport: 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
----------------------------------------

Issue #19485 has been updated by jemmai (Jemma Issroff). sawa (Tsuyoshi Sawada) wrote in #note-3:
For the purpose of measuring an indentation, a horizontal tab is regarded as a sequence of one to eight spaces such that the column position corresponding to its end is a multiple of eight. The amount to be removed is counted in terms of the number of spaces. If the boundary appears in the middle of a tab, that tab is not removed.
This documentation is very clear to me, and explains both cases I've mentioned in a way that is easy to understand.

----------------------------------------
Bug #19485: Unexpected behavior in squiggly heredocs
https://bugs.ruby-lang.org/issues/19485#change-102419

* Author: jemmai (Jemma Issroff)
* Status: Assigned
* Priority: Normal
* Assignee: core
* ruby -v: 3.2.1
* Backport: 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
----------------------------------------

Issue #19485 has been updated by naruse (Yui NARUSE).

Backport changed from 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED to 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: DONE

ruby_3_2 b93e2223300bc54dfa387ffb9fa3d48ecbe670f0 merged revision(s) e7342e76dfd26237c604e42f9a59a1eaa578c94e.

----------------------------------------
Bug #19485: Unexpected behavior in squiggly heredocs
https://bugs.ruby-lang.org/issues/19485#change-102513

* Author: jemmai (Jemma Issroff)
* Status: Closed
* Priority: Normal
* Assignee: core
* ruby -v: 3.2.1
* Backport: 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: DONE
----------------------------------------

Issue #19485 has been updated by nagachika (Tomoyuki Chikanaga).

Backport changed from 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: DONE to 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: DONE, 3.2: DONE

ruby_3_1 19af12ff195aba64bdca7a83f564f2c0e46061c0 merged revision(s) e7342e76dfd26237c604e42f9a59a1eaa578c94e.

----------------------------------------
Bug #19485: Unexpected behavior in squiggly heredocs
https://bugs.ruby-lang.org/issues/19485#change-102551

* Author: jemmai (Jemma Issroff)
* Status: Closed
* Priority: Normal
* Assignee: core
* ruby -v: 3.2.1
* Backport: 2.7: UNKNOWN, 3.0: REQUIRED, 3.1: DONE, 3.2: DONE
----------------------------------------
participants (8)
- Dan0042 (Daniel DeLorme)
- Eregon (Benoit Daloze)
- ioquatix (Samuel Williams)
- jemmai (Jemma Issroff)
- nagachika (Tomoyuki Chikanaga)
- naruse (Yui NARUSE)
- nobu (Nobuyoshi Nakada)
- sawa (Tsuyoshi Sawada)