[ruby-core:112638] [Ruby master Bug#19470] Frequent small range-reads from and then writes to a large array are very slow

Issue #19470 has been reported by giner (Stanislav German-Evtushenko).
----------------------------------------
Bug #19470: Frequent small range-reads from and then writes to a large array are very slow
https://bugs.ruby-lang.org/issues/19470

* Author: giner (Stanislav German-Evtushenko)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
Writing to a large array gets very slow when it is done after range-reading more than 3 items. In that case the original array gets marked as shared, which triggers CoW on any small change afterwards. This leads to a significant performance impact and high memory utilization when we need to range-read from and write to the same array many times. While the issue can be avoided by reading <= 3 elements at a time, the main problem is that this behaviour is not obvious and hard to catch in non-trivial projects.

```ruby
times = []
arr = [0] * 100000

times.push 0
100000.times do
  time_start = Time.now
  arr[5] = 100 # takes 0.01662315899999512
  times[-1] += Time.now - time_start
end

times.push 0
100000.times do
  arr[0..2]
  time_start = Time.now
  arr[5] = 100 # takes 0.01826406799999659
  times[-1] += Time.now - time_start
end

times.push 0
100000.times do
  arr[0..3]
  time_start = Time.now
  arr[5] = 100 # takes 7.757753919000069
  times[-1] += Time.now - time_start
end

times.push 0
100000.times do
  arr.dup
  time_start = Time.now
  arr[5] = 100 # takes 7.626929300999957
  times[-1] += Time.now - time_start
end

times.push 0
100000.times do
  arr.clone
  time_start = Time.now
  arr[5] = 100 # takes 8.216933763000046
  times[-1] += Time.now - time_start
end

p times
```

-- https://bugs.ruby-lang.org/
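As the report notes, the slowdown disappears when at most 3 elements are read at a time. A minimal workaround sketch (not part of the original report; the element-by-element read is an assumption about the simplest way to avoid the shared-root marking):

```ruby
arr = [0] * 100_000

# Range slice: per the report above, arr[0..3] returns a new array that
# shares arr's backing buffer, so the next in-place write to arr triggers
# CoW of the whole 100_000-slot buffer.
shared_slice = arr[0..3]
arr[5] = 100               # slow: forces a full copy first

# Element-by-element read: builds a fresh 4-element array, arr is never
# marked as shared, and subsequent writes stay cheap.
fresh_copy = (0..3).map { |i| arr[i] }
arr[5] = 100               # fast: no copy needed
```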

Issue #19470 has been updated by mame (Yusuke Endoh).

We, the core team, are aware of this potential problem with the memory-sharing technique. We could be wrong, but we currently believe its advantage (a performance improvement in many typical and major cases) outweighs its disadvantage (potential performance degradation in some rare cases). The question is whether the problem is really rare. Do you have any evidence that it actually occurs frequently?

https://bugs.ruby-lang.org/issues/19470#change-102086

Issue #19470 has been updated by ioquatix (Samuel Williams).

It's interesting to me that merely reading items can mark the array as shared; that seems like non-trivial/non-obvious behaviour. What's the rationale for it?

https://bugs.ruby-lang.org/issues/19470#change-102087

Issue #19470 has been updated by ioquatix (Samuel Williams).

Sorry, I read the example more closely: we are taking a small slice, which marks the original array as shared. Both the original array and the slice then point at the same memory allocation, and on mutation a copy is made. I think @mame is right that in general this is a good optimisation, but I see the problem: there is a degenerate case here. I wonder if we can propose some way to avoid it when mutation is frequent, or when the slice should definitely be a copy rather than invoke CoW. Maybe a percentage-based approach, or maybe we can detect CoW triggering frequent full copies and stop sharing from that point on.

https://bugs.ruby-lang.org/issues/19470#change-102088
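For illustration only, here is a Ruby-level sketch of the percentage-based idea mentioned above. The real decision would have to live in the C implementation of array slicing, and the threshold below is an arbitrary assumption, not a proposed value:

```ruby
# Hypothetical heuristic: only fall back to the interpreter's CoW sharing
# when the slice covers a large fraction of the source array; for small
# slices, build an independent copy so the source is never marked shared.
SHARE_RATIO = 0.5 # arbitrary threshold, purely for illustration

def slice_without_small_sharing(ary, range)
  first  = range.first
  length = range.size
  if length >= ary.length * SHARE_RATIO
    ary[range]                                # large slice: sharing/CoW is a win
  else
    Array.new(length) { |i| ary[first + i] }  # small slice: eager copy, no sharing
  end
end

arr = [0] * 100_000
slice_without_small_sharing(arr, 0..3) # => [0, 0, 0, 0], arr stays unshared
arr[5] = 100                           # no CoW of the whole buffer
```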

Issue #19470 has been updated by Hanmac (Hans Mackowiak).

I think maybe the GC could optimize something here?

```ruby
100000.times do
  arr[0..3]  #=> GC notices that the result is unused?
  GC.stress  # would the GC delete the shared copy?
  time_start = Time.now
  arr[5] = 100 # takes ?
  times[-1] += Time.now - time_start
end
```

If the GC could recognize that the slice of the array isn't assigned anywhere and removed it, is the structure of the array object smart enough to know there aren't any shared pieces anymore, or would we need a shared count for this instead? If the main array could notice that it isn't shared anymore, because all the arrays that shared it are gone, would it be smart enough not to make a copy?

https://bugs.ruby-lang.org/issues/19470#change-102093

Issue #19470 has been updated by giner (Stanislav German-Evtushenko).
> Do you have any evidence that this problem actually occurs frequently?
I cannot say whether this affects many projects. I've encountered it twice so far, both times while optimizing code to reduce memory allocations (by writing to the same array instead of allocating new ones). In both cases it took quite a while to figure out why, instead of becoming faster and using less memory, the code did the opposite: it became slower and used more memory.

https://bugs.ruby-lang.org/issues/19470#change-102100

Issue #19470 has been updated by headius (Charles Nutter).

This looks like a pretty typical effect of slicing, so I'm not sure there's something to "fix" here. JRuby has similar behavior. Improving the sharing heuristic might help, but it's hard to know where to draw the line; should a slice of 5 elements trigger a copy? Ten elements? Perhaps only share when the slice size is greater than some percentage of the total size?

Perhaps we should make COW less hidden? Add something like `Array#fresh_slice` to opt into non-COW slicing when you know you're still mutating the original?

https://bugs.ruby-lang.org/issues/19470#change-102119

Issue #19470 has been updated by giner (Stanislav German-Evtushenko).
> Perhaps we should make COW less hidden? Add something like Array#fresh_slice to opt into non-COW slicing when you know you're still mutating the original?
Sounds like a good compromise.

https://bugs.ruby-lang.org/issues/19470#change-102166
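To make the `Array#fresh_slice` idea above a bit more concrete, here is a rough user-level emulation. The method name comes from the comment above; it does not exist in Ruby today, and a real implementation would live in C:

```ruby
class Array
  # Hypothetical method from the discussion: like #[] with a Range, but it
  # always returns an independent copy, so the receiver is never marked as
  # a shared root and later in-place writes never trigger CoW.
  def fresh_slice(range)
    range.map { |i| self[i] }
  end
end

arr = [0] * 100_000
arr.fresh_slice(0..3) # => [0, 0, 0, 0]
arr[5] = 100          # stays cheap: arr was never marked shared
```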
participants (5)

- giner (Stanislav German-Evtushenko)
- Hanmac (Hans Mackowiak)
- headius (Charles Nutter)
- ioquatix (Samuel Williams)
- mame (Yusuke Endoh)