[ruby-core:121772] [Ruby Bug#21294] URI.extract is extracting invalid URIs with a mishmash of IPv6 notation with IPv4 address

Issue #21294 has been reported by Keeyan (Keeyan Nejad).

----------------------------------------
Bug #21294: URI.extract is extracting invalid URIs with a mishmash of IPv6 notation with IPv4 address
https://bugs.ruby-lang.org/issues/21294

* Author: Keeyan (Keeyan Nejad)
* Status: Open
* ruby -v: ruby 3.4.3 (2025-04-14 revision d0b7e5b6a0) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
The following is not a valid URI: `http://[127.0.0.1]`. So `URI.extract` should not extract it. It does extract it, though.

So if you have code which extracts all URIs and then parses them, like the following, an error will be raised:

``` ruby
require 'uri'

URI.extract("Fake URL: http://[127.0.0.1]" , :http).each do |uri| # => ['http://[127.0.0.1]']
  URI.parse(uri) # => raises URI::InvalidURIError
end
```

```
/home/keeyan/.local/share/mise/installs/ruby/3.4.3/lib/ruby/3.4.0/uri/rfc3986_parser.rb:130:in 'URI::RFC3986_Parser#split': bad URI (is not URI?): "http://[127.0.0.1]" (URI::InvalidURIError)
	from /home/keeyan/.local/share/mise/installs/ruby/3.4.3/lib/ruby/3.4.0/uri/rfc3986_parser.rb:135:in 'URI::RFC3986_Parser#parse'
	from /home/keeyan/.local/share/mise/installs/ruby/3.4.3/lib/ruby/3.4.0/uri/common.rb:212:in 'URI.parse'
	from test.rb:4:in 'block in <main>'
	from test.rb:3:in 'Array#each'
	from test.rb:3:in '<main>'
```

Instead, I believe `URI.extract` should return an empty array.

--
https://bugs.ruby-lang.org/

Issue #21294 has been updated by mame (Yusuke Endoh).

`URI.extract` is obsolete. You can confirm this by running the code in `$VERBOSE` mode:

```
$ ruby -w -ruri -e 'URI.extract("Fake URL: http://[127.0.0.1]" , :http)'
-e:1: warning: URI.extract is obsolete
/home/mame/.rbenv/versions/ruby-dev/lib/ruby/3.5.0+0/uri/common.rb:268: warning: URI::RFC3986_PARSER.extract is obsolete. Use URI::RFC2396_PARSER.extract explicitly.
```

If you still need this functionality, you should use `URI::RFC2396_PARSER.extract` along with `URI::RFC2396_PARSER.parse`. `URI::RFC2396_PARSER.parse` can successfully parse `http://[127.0.0.1]`. However, please note that this behavior is based on an older RFC.

```ruby
require 'uri'

URI::RFC2396_PARSER.extract("Fake URL: http://[127.0.0.1]" , :http).each do |uri| # => ['http://[127.0.0.1]']
  p URI::RFC2396_PARSER.parse(uri) # => #<URI::HTTP http://[127.0.0.1]>
end
```

----------------------------------------
Bug #21294: URI.extract is extracting invalid URIs with a mishmash of IPv6 notation with IPv4 address
https://bugs.ruby-lang.org/issues/21294#change-112831
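For code that still needs to pull URIs out of free text but only wants the ones the default (RFC 3986) parser accepts, one possible approach is to extract candidates with the legacy parser and then keep only those that `URI.parse` does not reject. This is a minimal sketch, not something proposed in the thread: the helper name `extract_strict_uris` and the constant `LEGACY_PARSER` are hypothetical, and the fallback to `URI::RFC2396_Parser.new` is an assumption for Ruby versions where `URI::RFC2396_PARSER` is not defined.

```ruby
require 'uri'

# Hypothetical constant: use URI::RFC2396_PARSER when this Ruby provides it,
# otherwise (assumption) build a legacy parser instance directly.
LEGACY_PARSER =
  if defined?(URI::RFC2396_PARSER)
    URI::RFC2396_PARSER
  else
    URI::RFC2396_Parser.new
  end

# Hypothetical helper: extract candidate URIs with the RFC 2396 parser,
# then drop every candidate that the default URI.parse rejects.
def extract_strict_uris(text, schemes = %w[http https])
  LEGACY_PARSER.extract(text, schemes).select do |candidate|
    begin
      URI.parse(candidate)
      true
    rescue URI::InvalidURIError
      false
    end
  end
end

p extract_strict_uris("Fake URL: http://[127.0.0.1]")
```

With a filter like this, a string such as `http://[127.0.0.1]` is still found by the legacy extractor but is discarded before it reaches code that expects `URI.parse` to succeed.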

Issue #21294 has been updated by Keeyan (Keeyan Nejad).

Ah, thank you @mame! I wasn't aware it was obsolete. We can use `URI::RFC2396_PARSER` for our cases.

Do you happen to know why `extract` is not being included in the newer parsers? I had a look at the relevant PRs in the URI repo, but couldn't find anything explaining the reasoning.

----------------------------------------
Bug #21294: URI.extract is extracting invalid URIs with a mishmash of IPv6 notation with IPv4 address
https://bugs.ruby-lang.org/issues/21294#change-112835
participants (2)
- Keeyan (Keeyan Nejad)
- mame (Yusuke Endoh)