[ruby-core:112866] [Ruby master Feature#19528] `JSON.load` defaults are surprising (`create_additions: true`)

Issue #19528 has been reported by byroot (Jean Boussier). ---------------------------------------- Feature #19528: `JSON.load` defaults are surprising (`create_additions: true`) https://bugs.ruby-lang.org/issues/19528 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal ---------------------------------------- I'm not sure if it was actually intended, but there's some tacit naming convention for serializers in Ruby to use `load` and `dump` as methods, likely inspired from `Marshal` and `YAML`. Because of this it's extremely common to see code that uses `JSON.load` expecting a simple, no surprise, and safe JSON parsing. However that's `JSON.parse`. `JSON.load` has this very surprising behavior (albeit perfectly documented), of de-serializing more complex types: ```ruby
JSON.load('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }') => "Hello"
It's particularly weird because aside from the `String` extension that is eagerly defined, for other types you have to `require "json/add/core"`.
Seasoned Ruby developers know about this of course, and [it is banned by various linters](https://www.rubydoc.info/gems/rubocop/RuboCop/Cop/Security/JSONLoad), but it keeps popping regularly in gems security releases and such.
### Proposal
Assuming entirely removing this feature is not an option, I think `json 2.x` should warn when this feature is actually being used, and `json 3.x` should disable it by default and require users to explicitly use `JSON.load(str, create_additions: true)` to keep the old behavior.
--
https://bugs.ruby-lang.org/

Issue #19528 has been updated by mame (Yusuke Endoh). We discussed this at the dev meeting. First of all, json upstream is https://github.com/flori/json. No one at the dev meeting has its ownership. So the following is just an opinion. None of the attendees of the dev meeting were against making this safe by default. @naruse suggested the following migration path. * Calling `JSON.load` without `create_additions` keyword should be warned as "`Use JSON.parse instead of JSON.load`" * If a user does not need object deserialization feature (we guess this is a major case), `JSON.parse` is good enough. * Otherwise, they should opt-in the feature by adding `create_additions: true`. * After some period, we can change the default to `create_additions: false`. @matsuda said the name `create_additions` is incomprehensible and he would like another good name. @akr suggested `JSON.unsafe_load`. ---------------------------------------- Feature #19528: `JSON.load` defaults are surprising (`create_additions: true`) https://bugs.ruby-lang.org/issues/19528#change-102783 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal ---------------------------------------- I'm not sure if it was actually intended, but there's some tacit naming convention for serializers in Ruby to use `load` and `dump` as methods, likely inspired from `Marshal` and `YAML`. Because of this it's extremely common to see code that uses `JSON.load` expecting a simple, no surprise, and safe JSON parsing. However that's `JSON.parse`. `JSON.load` has this very surprising behavior (albeit perfectly documented), of de-serializing more complex types: ```ruby
JSON.load('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }') => "Hello"
It's particularly weird because aside from the `String` extension that is eagerly defined, for other types you have to `require "json/add/core"`.
Seasoned Ruby developers know about this of course, and [it is banned by various linters](https://www.rubydoc.info/gems/rubocop/RuboCop/Cop/Security/JSONLoad), but it keeps popping regularly in [gems security releases](https://discuss.rubyonrails.org/t/cve-2023-27531-possible-deserialization-of-untrusted-data-vulnerability-in-kredis-json/82467) and such.
### Proposal
Assuming entirely removing this feature is not an option, I think `json 2.x` should warn when this feature is actually being used, and `json 3.x` should disable it by default and require users to explicitly use `JSON.load(str, create_additions: true)` to keep the old behavior.
--
https://bugs.ruby-lang.org/

Issue #19528 has been updated by duerst (Martin Dürst). I have some very vague recollection that we had a similar issue quite a long time ago. I seem to remember that @tenderlovemaking was somewhat involved. It may have been YAML, or JSON. It may have been the load method or another method. What I remember of the discussion then was that we devised a plan to gradually move from unsafe being the default to safe being the default. I'm sorry I currently don't have the time to try and find the relevant issue. ---------------------------------------- Feature #19528: `JSON.load` defaults are surprising (`create_additions: true`) https://bugs.ruby-lang.org/issues/19528#change-102801 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal ---------------------------------------- I'm not sure if it was actually intended, but there's some tacit naming convention for serializers in Ruby to use `load` and `dump` as methods, likely inspired from `Marshal` and `YAML`. Because of this it's extremely common to see code that uses `JSON.load` expecting a simple, no surprise, and safe JSON parsing. However that's `JSON.parse`. `JSON.load` has this very surprising behavior (albeit perfectly documented), of de-serializing more complex types: ```ruby
JSON.load('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }') => "Hello"
It's particularly weird because aside from the `String` extension that is eagerly defined, for other types you have to `require "json/add/core"`.
Seasoned Ruby developers know about this of course, and [it is banned by various linters](https://www.rubydoc.info/gems/rubocop/RuboCop/Cop/Security/JSONLoad), but it keeps popping regularly in [gems security releases](https://discuss.rubyonrails.org/t/cve-2023-27531-possible-deserialization-of-untrusted-data-vulnerability-in-kredis-json/82467) and such.
### Proposal
Assuming entirely removing this feature is not an option, I think `json 2.x` should warn when this feature is actually being used, and `json 3.x` should disable it by default and require users to explicitly use `JSON.load(str, create_additions: true)` to keep the old behavior.
--
https://bugs.ruby-lang.org/

Issue #19528 has been updated by Eregon (Benoit Daloze). @duerst Yes, that's for psych where YAML.load became safe. @mame One issue with the above is it would make usages of `JSON.load` which don't intend to deserialize objects have to change to `JSON.parse`, which feels a bit suboptimal. I would think most usages of `JSON.load` actually intend no object deserialization, so it'd be great if we don't need to change those and they are not warned. OTOH I can see changing to `JSON.parse` would work nicely and safely on older JSON versions, so that makes sense from that POV. Maybe we could warn only if object deserialization is actually used? --- IMO there are so many problems with the `json` gem, notably this is nonsense: ```
JSON.dump(Object.new) => "\"#<Object:0x00007fed9189d6c8>\"" JSON.load JSON.dump(Object.new) => "#<Object:0x00007fed9177cf28>" # returns a String but an Object was given? WTF? It should have failed earlier, on dump.
And then we had many problems when it comes to releasing `json`.
And the performance of it is abysmal.
Maybe it's time to redo json from scracth, e.g. for version 2. And have a repository at ruby/json.
----------------------------------------
Feature #19528: `JSON.load` defaults are surprising (`create_additions: true`)
https://bugs.ruby-lang.org/issues/19528#change-102805
* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
I'm not sure if it was actually intended, but there's some tacit naming convention for serializers in Ruby to use `load` and `dump` as methods, likely inspired from `Marshal` and `YAML`.
Because of this it's extremely common to see code that uses `JSON.load` expecting a simple, no surprise, and safe JSON parsing.
However that's `JSON.parse`.
`JSON.load` has this very surprising behavior (albeit perfectly documented), of de-serializing more complex types:
```ruby
>> JSON.load('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }')
=> "Hello"
It's particularly weird because aside from the `String` extension that is eagerly defined, for other types you have to `require "json/add/core"`. Seasoned Ruby developers know about this of course, and [it is banned by various linters](https://www.rubydoc.info/gems/rubocop/RuboCop/Cop/Security/JSONLoad), but it keeps popping regularly in [gems security releases](https://discuss.rubyonrails.org/t/cve-2023-27531-possible-deserialization-of...) and such. ### Proposal Assuming entirely removing this feature is not an option, I think `json 2.x` should warn when this feature is actually being used, and `json 3.x` should disable it by default and require users to explicitly use `JSON.load(str, create_additions: true)` to keep the old behavior. -- https://bugs.ruby-lang.org/

Issue #19528 has been updated by mame (Yusuke Endoh). Eregon (Benoit Daloze) wrote in #note-4:
Maybe we could warn only if object deserialization is actually used?
This idea came up at the dev meeting, but if we change it to safe by default in the future, it will not be a migration path. In any case, I think this should be discussed at https://github.com/flori/json. ---------------------------------------- Feature #19528: `JSON.load` defaults are surprising (`create_additions: true`) https://bugs.ruby-lang.org/issues/19528#change-102810 * Author: byroot (Jean Boussier) * Status: Open * Priority: Normal ---------------------------------------- I'm not sure if it was actually intended, but there's some tacit naming convention for serializers in Ruby to use `load` and `dump` as methods, likely inspired from `Marshal` and `YAML`. Because of this it's extremely common to see code that uses `JSON.load` expecting a simple, no surprise, and safe JSON parsing. However that's `JSON.parse`. `JSON.load` has this very surprising behavior (albeit perfectly documented), of de-serializing more complex types: ```ruby
JSON.load('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }') => "Hello"
It's particularly weird because aside from the `String` extension that is eagerly defined, for other types you have to `require "json/add/core"`.
Seasoned Ruby developers know about this of course, and [it is banned by various linters](https://www.rubydoc.info/gems/rubocop/RuboCop/Cop/Security/JSONLoad), but it keeps popping regularly in [gems security releases](https://discuss.rubyonrails.org/t/cve-2023-27531-possible-deserialization-of-untrusted-data-vulnerability-in-kredis-json/82467) and such.
### Proposal
Assuming entirely removing this feature is not an option, I think `json 2.x` should warn when this feature is actually being used, and `json 3.x` should disable it by default and require users to explicitly use `JSON.load(str, create_additions: true)` to keep the old behavior.
--
https://bugs.ruby-lang.org/
participants (4)
-
byroot (Jean Boussier)
-
duerst
-
Eregon (Benoit Daloze)
-
mame (Yusuke Endoh)