
Issue #19756 has been updated by austin (Austin Ziegler).

Dan0042 (Daniel DeLorme) wrote in #note-4:
> shugo (Shugo Maeda) wrote in #note-3:
> > is there any use case to use them with URI::HTTP.build?
>
> I assume the purpose of `URI::HTTP.build` is the same as `URI.parse` but with a hash instead of a string. While writing a crawler I have seen HTTP hostnames with an underscore, that would fail because of URI restrictions, which I had to monkey patch in order to accept the underscore. Since `http://not_std.example.com` is possible and present in the wild, I think it should be possible to build a URI::HTTP object to represent it, either with `.parse` or `.build`. "Be liberal in what you accept."
>
> BTW the same error is raised for `URI::Generic::build(host: "_dmarc.example.com")` which seems to me like it should be a valid way of storing a DMARC domain.

RFC1123 and related RFCs suggest that network reachable hostnames *may not* have underscores, although they are permitted in informational DNS records. Strictly disallowing underscores from `URI::HTTP.build` seems to be correct (I do not know of any hostnames with underscores in them). On the other hand, allowing them in `URI::Generic` may be permissible, although I would probably want something to flag that I’m explicitly allowing underscores (`_dmarc.example.com` would IMO be the DNS equivalent of `dmarc://example.com` in terms of a URI as it refers to the DMARC configuration record *for* `example.com` emails).

https://stackoverflow.com/questions/10959757/the-use-of-the-underscore-in-ho... suggests that there are widespread systems that Do This Wrong, but whether they can be reached over the network is an entirely different issue.

----------------------------------------
Bug #19756: URI::HTTP.build does not accept a host of `_gateway`, but `URI.parse` will.
https://bugs.ruby-lang.org/issues/19756#change-103771

* Author: postmodern (Hal Brodigan)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
I noticed a difference in behavior between `URI::HTTP.build` and `URI.parse`. `URI::HTTP.build` will not accept `host:` value of `_gateway`, but `URI.parse` will.

## Steps To Reproduce

```ruby
URI::HTTP.build(host: "_gateway")
```

vs.

```ruby
URI.parse("http://_gateway")
```

### Expected Results

Both raise the same exception, or return the same URI object.

### Actual Results

```
URI::HTTP.build(host: "_gateway")
/usr/share/ruby/uri/generic.rb:601:in `check_host': bad component(expected host component): _gateway (URI::InvalidComponentError)
	from /usr/share/ruby/uri/generic.rb:640:in `host='
	from /usr/share/ruby/uri/generic.rb:673:in `hostname='
	from /usr/share/ruby/uri/generic.rb:190:in `initialize'
	from /usr/share/ruby/uri/generic.rb:136:in `new'
	from /usr/share/ruby/uri/generic.rb:136:in `build'
	from /usr/share/ruby/uri/http.rb:61:in `build'
	from (irb):2:in `<main>'
	from /usr/local/share/gems/gems/irb-1.7.0/exe/irb:9:in `<top (required)>'
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
```

```
URI.parse("https://_gateway")
# => #<URI::HTTPS https://_gateway>
```

## Additional Information

```
$ gem list uri

uri (default: 0.12.1)
```

-- 
https://bugs.ruby-lang.org/
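For anyone who wants to check the discrepancy against their own Ruby, here is a minimal sketch that runs both calls from the report and prints what each one does. The `uri_outcome` helper is hypothetical and not part of the `uri` library. The script only reports the outcome rather than assuming it, since the result of `URI::HTTP.build` may depend on the installed `uri` version.

```ruby
require "uri"

# Hypothetical helper (not part of the uri library): runs the block and
# returns either the resulting URI as a string or the class name of the
# URI::Error that was raised.
def uri_outcome
  yield.to_s
rescue URI::Error => e
  e.class.name
end

# The two calls compared in the report, using the same scheme for both.
parsed = uri_outcome { URI.parse("http://_gateway") }
built  = uri_outcome { URI::HTTP.build(host: "_gateway") }

puts "URI.parse:       #{parsed}"
puts "URI::HTTP.build: #{built}"
puts "same outcome?    #{parsed == built}"
```

With the `uri` 0.12.1 shown in the report, the second line would be expected to name `URI::InvalidComponentError`, matching the backtrace above.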