URI is a module providing classes to handle Uniform Resource Identifiers (RFC2396).
Features
- Uniform way of handling URIs.
- Flexibility to introduce custom URI schemes.
- Flexibility to have an alternate URI::Parser (or just different patterns and regexps).
Basic example
require 'uri'

uri = URI("http://foo.com/posts?id=30&limit=5#time=1305298413")
#=> #<URI::HTTP http://foo.com/posts?id=30&limit=5#time=1305298413>

uri.scheme    #=> "http"
uri.host      #=> "foo.com"
uri.path      #=> "/posts"
uri.query     #=> "id=30&limit=5"
uri.fragment  #=> "time=1305298413"

uri.to_s #=> "http://foo.com/posts?id=30&limit=5#time=1305298413"
Adding custom URIs
module URI
  class RSYNC < Generic
    DEFAULT_PORT = 873
  end
  register_scheme 'RSYNC', RSYNC
end
#=> URI::RSYNC

URI.scheme_list
#=> {"FILE"=>URI::File, "FTP"=>URI::FTP, "HTTP"=>URI::HTTP,
#    "HTTPS"=>URI::HTTPS, "LDAP"=>URI::LDAP, "LDAPS"=>URI::LDAPS,
#    "MAILTO"=>URI::MailTo, "RSYNC"=>URI::RSYNC}

uri = URI("rsync://rsync.foo.com")
#=> #<URI::RSYNC rsync://rsync.foo.com>
RFC References
A good place to view an RFC spec is www.ietf.org/rfc.html.
Here is a list of all related RFC’s:
Class tree
- URI::Generic (in uri/generic.rb)
  - URI::File (in uri/file.rb)
  - URI::FTP (in uri/ftp.rb)
  - URI::HTTP (in uri/http.rb)
    - URI::HTTPS (in uri/https.rb)
  - URI::LDAP (in uri/ldap.rb)
    - URI::LDAPS (in uri/ldaps.rb)
  - URI::MailTo (in uri/mailto.rb)
- URI::Parser (in uri/common.rb)
- URI::REGEXP (in uri/common.rb)
  - URI::REGEXP::PATTERN (in uri/common.rb)
- URI::Util (in uri/common.rb)
- URI::Error (in uri/common.rb)
  - URI::InvalidURIError (in uri/common.rb)
  - URI::InvalidComponentError (in uri/common.rb)
  - URI::BadURIError (in uri/common.rb)
Copyright Info
- Author: Akira Yamada <akira@ruby-lang.org>
- Documentation: Akira Yamada <akira@ruby-lang.org>, Dmitry V. Sabanin <sdmitry@lrn.ru>, Vincent Batts <vbatts@hashbangbash.com>
- License: Copyright © 2001 akira yamada <akira@ruby-lang.org>. You can redistribute it and/or modify it under the same term as Ruby.
The default parser instance.
The default parser instance for RFC 2396.
The default parser instance for RFC 3986.
curl https://encoding.spec.whatwg.org/encodings.json|
ruby -rjson -e 'H={}
h={
  "shift_jis"=>"Windows-31J",
  "euc-jp"=>"cp51932",
  "iso-2022-jp"=>"cp50221",
  "x-mac-cyrillic"=>"macCyrillic",
}
JSON($<.read).map{|x|x["encodings"]}.flatten.each{|x|
  Encoding.find(n=h.fetch(n=x["name"].downcase,n))rescue next
  x["labels"].each{|y|H[y]=n}
}
puts "{"
H.each{|k,v|puts %[ #{k.dump}=>#{v.dump},]}
puts "}"
'
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 463
def self._decode_uri_component(regexp, str, enc)
  raise ArgumentError, "invalid %-encoding (#{str})" if /%(?!\h\h)/.match?(str)
  str.b.gsub(regexp, TBLDECWWWCOMP_).force_encoding(enc)
end
Returns a string decoding characters matching regexp from the given URL-encoded string str.
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 441
def self.decode_uri_component(str, enc=Encoding::UTF_8)
  _decode_uri_component(/%\h\h/, str, enc)
end
Like URI.decode_www_form_component, except that '+' is preserved.
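The difference between the two decoders can be seen in a minimal sketch like the following. It assumes a Ruby version that ships URI.decode_uri_component (Ruby 3.2 or later, to the best of my knowledge); the sample string and variable names are illustrative only.

```ruby
require 'uri'

encoded = 'a+b%20c' # illustrative input, not taken from the reference text

# decode_www_form_component treats '+' as an encoded space.
form = URI.decode_www_form_component(encoded) # => "a b c"

# decode_uri_component leaves '+' untouched and decodes only %XX sequences.
comp = URI.decode_uri_component(encoded)      # => "a+b c"

# A '%' that is not followed by two hex digits is rejected by both methods.
begin
  URI.decode_uri_component('100%')
rescue ArgumentError => e
  error = e.message                           # => "invalid %-encoding (100%)"
end
```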
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 620
def self.decode_www_form(str, enc=Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false)
  raise ArgumentError, "the input of #{self.name}.#{__method__} must be ASCII only string" unless str.ascii_only?
  ary = []
  return ary if str.empty?
  enc = Encoding.find(enc)
  str.b.each_line(separator) do |string|
    string.chomp!(separator)
    key, sep, val = string.partition('=')
    if isindex
      if sep.empty?
        val = key
        key = +''
      end
      isindex = false
    end

    if use__charset_ and key == '_charset_' and e = get_encoding(val)
      enc = e
      use__charset_ = false
    end

    key.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    if val
      val.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    else
      val = +''
    end

    ary << [key, val]
  end
  ary.each do |k, v|
    k.force_encoding(enc)
    k.scrub!
    v.force_encoding(enc)
    v.scrub!
  end
  ary
end
Returns name/value pairs derived from the given string str, which must be an ASCII string.
The method may be used to decode the body of Net::HTTPResponse object res for which res['Content-Type'] is 'application/x-www-form-urlencoded'.
The returned data is an array of 2-element subarrays; each subarray is a name/value pair (both are strings). Each returned string has encoding enc, and has had invalid characters removed via String#scrub.
A simple example:
URI.decode_www_form('foo=0&bar=1&baz') # => [["foo", "0"], ["bar", "1"], ["baz", ""]]
The returned strings have certain conversions, similar to those performed in URI.decode_www_form_component:
URI.decode_www_form('f%23o=%2F&b-r=%24&b+z=%40') # => [["f#o", "/"], ["b-r", "$"], ["b z", "@"]]
The given string may contain consecutive separators:
URI.decode_www_form('foo=0&&bar=1&&baz=2') # => [["foo", "0"], ["", ""], ["bar", "1"], ["", ""], ["baz", "2"]]
A different separator may be specified:
URI.decode_www_form('foo=0--bar=1--baz', separator: '--') # => [["foo", "0"], ["bar", "1"], ["baz", ""]]
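Because the result is an array of pairs, repeated names survive decoding. The sketch below (sample data and variable names are illustrative assumptions) groups the pairs by name and shows that non-ASCII input is rejected, as the ascii_only? check in the method source suggests.

```ruby
require 'uri'

body = 'name=Ada+Lovelace&lang=ruby&lang=c&debug' # illustrative form body

pairs = URI.decode_www_form(body)
# => [["name", "Ada Lovelace"], ["lang", "ruby"], ["lang", "c"], ["debug", ""]]

# Group repeated names so that no value is lost
# (Array#to_h would keep only the last value for "lang").
grouped = pairs.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(name, value), h|
  h[name] << value
end
# => {"name"=>["Ada Lovelace"], "lang"=>["ruby", "c"], "debug"=>[""]}

# Input containing non-ASCII characters raises ArgumentError.
begin
  URI.decode_www_form("name=\u00e9")
rescue ArgumentError
  rejected = true
end
```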
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 430
def self.decode_www_form_component(str, enc=Encoding::UTF_8)
  _decode_uri_component(/\+|%\h\h/, str, enc)
end
Returns a string decoded from the given URL-encoded string str.
The given string is first encoded as Encoding::ASCII_8BIT (using String#b), then decoded (as below), and finally force-encoded to the given encoding enc.
The returned string:
- Preserves:
  - Characters '*', '.', '-', and '_'.
  - Characters in ranges 'a'..'z', 'A'..'Z', and '0'..'9'.
  Example:
  URI.decode_www_form_component('*.-_azAZ09') # => "*.-_azAZ09"
- Converts:
  - Character '+' to character ' '.
  - Each “percent notation” to an ASCII character.
  Example:
  URI.decode_www_form_component('Here+are+some+punctuation+characters%3A+%2C%3B%3F%3A') # => "Here are some punctuation characters: ,;?:"
Related: URI.decode_uri_component (preserves '+').
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 447
def self._encode_uri_component(regexp, table, str, enc)
  str = str.to_s.dup
  if str.encoding != Encoding::ASCII_8BIT
    if enc && enc != Encoding::ASCII_8BIT
      str.encode!(Encoding::UTF_8, invalid: :replace, undef: :replace)
      str.encode!(enc, fallback: ->(x){"&##{x.ord};"})
    end
    str.force_encoding(Encoding::ASCII_8BIT)
  end
  str.gsub!(regexp, table)
  str.force_encoding(Encoding::US_ASCII)
end
Returns a string derived from the given string str with URI-encoded characters matching regexp according to table.
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 436
def self.encode_uri_component(str, enc=nil)
  _encode_uri_component(/[^*\-.0-9A-Z_a-z]/, TBLENCURICOMP_, str, enc)
end
Like URI.encode_www_form_component, except that ' ' (space) is encoded as '%20' (instead of '+').
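A short side-by-side sketch of the two encoders follows. It assumes a Ruby version that provides URI.encode_uri_component and URI.decode_uri_component (Ruby 3.2 or later, as far as I know); the input string is illustrative.

```ruby
require 'uri'

text = 'a b&c=d' # illustrative input

# Form encoding writes a space as '+'.
www = URI.encode_www_form_component(text)  # => "a+b%26c%3Dd"

# Component encoding writes a space as '%20'.
comp = URI.encode_uri_component(text)      # => "a%20b%26c%3Dd"

# Each encoder is undone by its matching decoder.
www_back  = URI.decode_www_form_component(www)
comp_back = URI.decode_uri_component(comp)
```

The '%20' form is the safer choice for text placed in a path segment, where a literal '+' is not interpreted as a space.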
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 567
def self.encode_www_form(enum, enc=nil)
  enum.map do |k,v|
    if v.nil?
      encode_www_form_component(k, enc)
    elsif v.respond_to?(:to_ary)
      v.to_ary.map do |w|
        str = encode_www_form_component(k, enc)
        unless w.nil?
          str << '='
          str << encode_www_form_component(w, enc)
        end
      end.join('&')
    else
      str = encode_www_form_component(k, enc)
      str << '='
      str << encode_www_form_component(v, enc)
    end
  end.join('&')
end
Returns a URL-encoded string derived from the given Enumerable enum.
The result is suitable for use as form data for an HTTP request whose Content-Type is 'application/x-www-form-urlencoded'.
The returned string consists of the elements of enum, each converted to one or more URL-encoded strings, and all joined with character '&'.
Simple examples:
URI.encode_www_form([['foo', 0], ['bar', 1], ['baz', 2]]) # => "foo=0&bar=1&baz=2"
URI.encode_www_form({foo: 0, bar: 1, baz: 2}) # => "foo=0&bar=1&baz=2"
The returned string is formed using method URI.encode_www_form_component, which converts certain characters:
URI.encode_www_form('f#o': '/', 'b-r': '$', 'b z': '@') # => "f%23o=%2F&b-r=%24&b+z=%40"
When enum is Array-like, each element ele is converted to a field:
- If ele is an array of two or more elements, the field is formed from its first two elements (and any additional elements are ignored):
  name = URI.encode_www_form_component(ele[0], enc)
  value = URI.encode_www_form_component(ele[1], enc)
  "#{name}=#{value}"
  Examples:
  URI.encode_www_form([%w[foo bar], %w[baz bat bah]]) # => "foo=bar&baz=bat"
  URI.encode_www_form([['foo', 0], ['bar', :baz, 'bat']]) # => "foo=0&bar=baz"
- If ele is an array of one element, the field is formed from ele[0]:
  URI.encode_www_form_component(ele[0])
  Example:
  URI.encode_www_form([['foo'], [:bar], [0]]) # => "foo&bar&0"
- Otherwise the field is formed from ele:
  URI.encode_www_form_component(ele)
  Example:
  URI.encode_www_form(['foo', :bar, 0]) # => "foo&bar&0"
The elements of an Array-like enum may be a mixture:
URI.encode_www_form([['foo', 0], ['bar', 1, 2], ['baz'], :bat]) # => "foo=0&bar=1&baz&bat"
When enum is Hash-like, each key/value pair is converted to one or more fields:
- If value is Array-convertible, each element ele in value is paired with key to form a field:
  name = URI.encode_www_form_component(key, enc)
  value = URI.encode_www_form_component(ele, enc)
  "#{name}=#{value}"
  Example:
  URI.encode_www_form({foo: [:bar, 1], baz: [:bat, :bam, 2]}) # => "foo=bar&foo=1&baz=bat&baz=bam&baz=2"
- Otherwise, key and value are paired to form a field:
  name = URI.encode_www_form_component(key, enc)
  value = URI.encode_www_form_component(value, enc)
  "#{name}=#{value}"
  Example:
  URI.encode_www_form({foo: 0, bar: 1, baz: 2}) # => "foo=0&bar=1&baz=2"
The elements of a Hash-like enum may be a mixture:
URI.encode_www_form({foo: [0, 1], bar: 2}) # => "foo=0&foo=1&bar=2"
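A common use is building the query part of a URI and reading it back. The following is a minimal sketch under that assumption; the host, path, and parameter names are made up for illustration.

```ruby
require 'uri'

params = { q: 'ruby uri', tag: ['a&b', 'c'] } # illustrative parameters

query = URI.encode_www_form(params)
# => "q=ruby+uri&tag=a%26b&tag=c"

# Attach the encoded string to a URI object.
uri = URI('http://example.com/search')
uri.query = query
url = uri.to_s
# => "http://example.com/search?q=ruby+uri&tag=a%26b&tag=c"

# URI.decode_www_form reverses the encoding, one pair per field.
decoded = URI.decode_www_form(uri.query)
# => [["q", "ruby uri"], ["tag", "a&b"], ["tag", "c"]]
```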
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 397
def self.encode_www_form_component(str, enc=nil)
  _encode_uri_component(/[^*\-.0-9A-Z_a-z]/, TBLENCWWWCOMP_, str, enc)
end
Returns a URL-encoded string derived from the given string str.
The returned string:
- Preserves:
  - Characters '*', '.', '-', and '_'.
  - Characters in ranges 'a'..'z', 'A'..'Z', and '0'..'9'.
  Example:
  URI.encode_www_form_component('*.-_azAZ09') # => "*.-_azAZ09"
- Converts:
  - Character ' ' to character '+'.
  - Any other character to “percent notation”; each byte b of the character is written as '%%%02X' % b.
  Example:
  URI.encode_www_form_component('Here are some punctuation characters: ,;?:') # => "Here+are+some+punctuation+characters%3A+%2C%3B%3F%3A"
Encoding:
- If str has encoding Encoding::ASCII_8BIT, argument enc is ignored.
- Otherwise, if enc is given and is not Encoding::ASCII_8BIT, str is converted first to Encoding::UTF_8 (with suitable character replacements), and then to encoding enc.
In either case, the returned string has forced encoding Encoding::US_ASCII.
Related: URI.encode_uri_component (encodes ' ' as '%20').
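The effect of the enc argument can be illustrated with a small sketch. The sample strings are assumptions chosen for illustration; the last case relies on the fallback lambda visible in the _encode_uri_component source, which substitutes a numeric character reference for characters the target encoding cannot represent.

```ruby
require 'uri'

# Without enc, the UTF-8 bytes of the string are percent-encoded.
utf8 = URI.encode_www_form_component("caf\u00e9")                           # => "caf%C3%A9"

# With enc, the string is transcoded first, so "é" becomes a single byte.
latin1 = URI.encode_www_form_component("caf\u00e9", Encoding::ISO_8859_1)   # => "caf%E9"

# For a binary (ASCII-8BIT) string, enc is ignored and the bytes are used as-is.
binary = URI.encode_www_form_component("caf\xE9".b, Encoding::UTF_8)        # => "caf%E9"

# A character missing from the target encoding falls back to "&#N;",
# which is then percent-encoded like any other text.
fallback = URI.encode_www_form_component("\u3042", Encoding::ISO_8859_1)    # => "%26%2312354%3B"

result_encoding = utf8.encoding                                             # => Encoding::US_ASCII
```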
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 187
def self.for(scheme, *arguments, default: Generic)
  const_name = Schemes.escape(scheme)

  uri_class = INITIAL_SCHEMES[const_name]
  uri_class ||= Schemes.find(const_name)
  uri_class ||= default

  return uri_class.new(scheme, *arguments)
end
Returns a new object constructed from the given scheme, arguments, and default:
- The new object is an instance of URI.scheme_list[scheme.upcase], or of default if no class is registered for that scheme.
- The object is initialized by calling the class initializer using scheme and arguments. See URI::Generic.new.
Examples:
values = ['john.doe', 'www.example.com', '123', nil, '/forum/questions/', nil, 'tag=networking&order=newest', 'top']
URI.for('https', *values)
# => #<URI::HTTPS https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
URI.for('foo', *values, default: URI::HTTP)
# => #<URI::HTTP foo://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 273
def self.join(*str)
  DEFAULT_PARSER.join(*str)
end
Merges the given URI strings str per RFC 2396.
Each string in str is converted to an RFC3986 URI before being merged.
Examples:
URI.join("http://example.com/","main.rbx")
# => #<URI::HTTP http://example.com/main.rbx>

URI.join('http://example.com', 'foo')
# => #<URI::HTTP http://example.com/foo>

URI.join('http://example.com', '/foo', '/bar')
# => #<URI::HTTP http://example.com/bar>

URI.join('http://example.com', '/foo', 'bar')
# => #<URI::HTTP http://example.com/bar>

URI.join('http://example.com', '/foo/', 'bar')
# => #<URI::HTTP http://example.com/foo/bar>
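Relative references containing '..', a query, or a full URI are resolved the same way. The sketch below uses made-up URLs and variable names to illustrate typical cases.

```ruby
require 'uri'

# '..' climbs one directory relative to the base path.
up = URI.join('http://example.com/a/b/c', '../d').to_s
# => "http://example.com/a/d"

# A relative reference keeps its own query string.
with_query = URI.join('http://example.com/a/b/', 'c?x=1').to_s
# => "http://example.com/a/b/c?x=1"

# An absolute URI replaces everything that came before it.
absolute = URI.join('http://example.com/a/b', 'https://other.example/z').to_s
# => "https://other.example/z"
```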
# File tmp/rubies/ruby-master/lib/open-uri.rb, line 26
def self.open(name, *rest, &block)
  if name.respond_to?(:open)
    name.open(*rest, &block)
  elsif name.respond_to?(:to_str) &&
        %r{\A[A-Za-z][A-Za-z0-9+\-\.]*://} =~ name &&
        (uri = URI.parse(name)).respond_to?(:open)
    uri.open(*rest, &block)
  else
    super
  end
end
Allows the opening of various resources including URIs. Example:
require "open-uri"
URI.open("http://example.com") { |f| f.read }
If the first argument responds to the open method, open is called on it with the rest of the arguments.
If the first argument is a string that begins with (protocol)://, it is parsed by URI.parse. If the parsed object responds to the open method, open is called on it with the rest of the arguments.
Otherwise, Kernel#open is called.
OpenURI::OpenRead#open provides URI::HTTP#open, URI::HTTPS#open and URI::FTP#open, Kernel#open.
We can accept URIs and strings that begin with http://, https:// and ftp://. In these cases, the opened file object is extended by OpenURI::Meta.
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 246
def self.parse(uri)
  PARSER.parse(uri)
end
Returns a new URI object constructed from the given string uri:
URI.parse('https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top')
# => #<URI::HTTPS https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
URI.parse('http://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top')
# => #<URI::HTTP http://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
If string uri may contain invalid URI characters, it is recommended to escape it first with URI::RFC2396_PARSER.escape.
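The escaping advice can be sketched as follows. The input string is illustrative, and the fallback to URI::DEFAULT_PARSER is an assumption made for older Ruby versions in which the URI::RFC2396_PARSER constant is not defined.

```ruby
require 'uri'

raw = 'http://example.com/my docs/report 1.txt' # illustrative input with spaces

# Parsing the raw string fails because a space is not a valid URI character.
begin
  URI.parse(raw)
rescue URI::InvalidURIError
  rejected = true
end

# Assumption: on Rubies without URI::RFC2396_PARSER, DEFAULT_PARSER offers the same escape method.
parser = defined?(URI::RFC2396_PARSER) ? URI::RFC2396_PARSER : URI::DEFAULT_PARSER

escaped = parser.escape(raw)
# => "http://example.com/my%20docs/report%201.txt"

uri = URI.parse(escaped)
path = uri.path
# => "/my%20docs/report%201.txt"
```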
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 29
def self.parser=(parser = RFC3986_PARSER)
  remove_const(:Parser) if defined?(::URI::Parser)
  const_set("Parser", parser.class)

  remove_const(:PARSER) if defined?(::URI::PARSER)
  const_set("PARSER", parser)

  remove_const(:REGEXP) if defined?(::URI::REGEXP)
  remove_const(:PATTERN) if defined?(::URI::PATTERN)
  if Parser == RFC2396_Parser
    const_set("REGEXP", URI::RFC2396_REGEXP)
    const_set("PATTERN", URI::RFC2396_REGEXP::PATTERN)
  end

  Parser.new.regexp.each_pair do |sym, str|
    remove_const(sym) if const_defined?(sym, false)
    const_set(sym, str)
  end
end
Sets the default parser instance.
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 143
def self.register_scheme(scheme, klass)
  Schemes.register(scheme, klass)
end
Registers the given klass as the class to be instantiated when parsing a URI with the given scheme:
URI.register_scheme('MS_SEARCH', URI::Generic) # => URI::Generic
URI.scheme_list['MS_SEARCH'] # => URI::Generic
Note that after calling String#upcase on scheme, it must be a valid constant name.
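Registration is usually paired with a subclass of URI::Generic. The following sketch assumes a Ruby version that provides URI.register_scheme; the Gemini class and its port are hypothetical and exist only for illustration.

```ruby
require 'uri'

module URI
  # Hypothetical scheme class, defined here only for illustration.
  class Gemini < Generic
    DEFAULT_PORT = 1965
  end
  register_scheme 'GEMINI', Gemini
end

# Parsing a URI with the registered scheme now returns the custom class.
uri = URI.parse('gemini://example.org/docs')
klass = uri.class   # => URI::Gemini
port  = uri.port    # => 1965 (taken from DEFAULT_PORT)
host  = uri.host    # => "example.org"
```

Because the scheme is upcased and used as a constant name, a scheme such as 'GEMINI' works directly.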
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 161
def self.scheme_list
  Schemes.list
end
Returns a hash of the defined schemes:
URI.scheme_list # => {"MAILTO"=>URI::MailTo, "LDAPS"=>URI::LDAPS, "WS"=>URI::WS, "HTTP"=>URI::HTTP, "HTTPS"=>URI::HTTPS, "LDAP"=>URI::LDAP, "FILE"=>URI::File, "FTP"=>URI::FTP}
Related: URI.register_scheme.
# File tmp/rubies/ruby-master/lib/uri/common.rb, line 232
def self.split(uri)
  PARSER.split(uri)
end
Returns a 9-element array representing the parts of the URI formed from the string uri; each array element is a string or nil:
names = %w[scheme userinfo host port registry path opaque query fragment]
values = URI.split('https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top')
names.zip(values)
# => [["scheme", "https"],
#     ["userinfo", "john.doe"],
#     ["host", "www.example.com"],
#     ["port", "123"],
#     ["registry", nil],
#     ["path", "/forum/questions/"],
#     ["opaque", nil],
#     ["query", "tag=networking&order=newest"],
#     ["fragment", "top"]]