public class ParsedIRI extends Object implements Cloneable, Serializable
Aside from some minor deviations noted below, an instance of this class represents an IRI reference as defined by RFC 3987: Internationalized Resource Identifiers (IRI): IRI Syntax. This class provides constructors for creating IRI instances from their components or by parsing their string forms, methods for accessing the various components of an instance, and methods for normalizing, resolving, and relativizing IRI instances. Instances of this class are immutable.
An IRI instance has the following seven components, which in string form have the syntax
[scheme:][//[user-info@]host[:port]][path][?query][#fragment]
In a given instance any particular component is either undefined or defined with a distinct value.
Undefined string components are represented by null, while undefined integer components are represented by -1. A string component may be defined to have the empty string as its value; this is not equivalent to that component being undefined.
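The difference between an undefined component and an empty one can be illustrated without the library itself. The following is a minimal sketch, not the ParsedIRI implementation: it splits a reference with the generic regular expression from RFC 3986, Appendix B, and the class name `IriComponentsSketch` and its methods are hypothetical. The port helper is deliberately simplified and assumes the host is not an IPv6 literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IriComponentsSketch {
    // Generic component-splitting expression from RFC 3986, Appendix B.
    private static final Pattern PARTS = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; null marks an undefined component. */
    static String[] split(String iri) {
        Matcher m = PARTS.matcher(iri);
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot split: " + iri);
        }
        // A group that did not take part in the match is reported as null,
        // while a group that matched nothing is reported as the empty string.
        return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
    }

    /** Returns the port of an authority string, or -1 if undefined (IPv6 literals are not handled). */
    static int port(String authority) {
        if (authority == null) {
            return -1;
        }
        int colon = authority.lastIndexOf(':');
        boolean inUserInfo = colon < authority.lastIndexOf('@');
        if (colon < 0 || inUserInfo || colon == authority.length() - 1) {
            return -1;
        }
        return Integer.parseInt(authority.substring(colon + 1));
    }

    public static void main(String[] args) {
        String[] p = split("http://user@example.org:8080/a/b?#frag");
        System.out.println("query is empty, not undefined: " + "".equals(p[3]));
        System.out.println("port: " + port(p[1]));
        System.out.println("query of /a/b is undefined: " + (split("/a/b")[3] == null));
    }
}
```

In this sketch a trailing "?" yields an empty query string, whereas a reference with no "?" yields null, which mirrors the distinction described above.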
Whether a particular component is or is not defined in an instance depends upon the type of the IRI being represented. An absolute IRI has a scheme component. An opaque IRI has a scheme, a scheme-specific part, and possibly a fragment, but has no other components. A hierarchical IRI always has a path (though it may be empty) and a scheme-specific-part (which at least contains the path), and may have any of the other components.
IRIs are meant to replace URIs in identifying resources for protocols, formats, and software components that use a UCS-based character repertoire.
An Internationalized Resource Identifier (IRI) is a complement to the Uniform Resource Identifier (URI). An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646). A mapping from IRIs to URIs is defined using toASCIIString(), which means that IRIs can be used instead of URIs, where appropriate, to identify resources. While all URIs are also IRIs, the normalize() method can be used to convert a URI back into a normalized IRI.
A URI is a uniform resource identifier while a URL is a uniform resource locator. Hence every URL is a URI, abstractly speaking, but not every URI is a URL. This is because there is another subcategory of URIs, uniform resource names (URNs), which name resources but do not specify how to locate them. URIs with the mailto, news, and isbn schemes are examples of URNs.
jar: This implementation treats the first colon as part of the scheme if the scheme starts with "jar:". For example the IRI jar:http://www.foo.com/bar/jar.jar!/baz/entry.txt is parsed with the scheme jar:http and the path /bar/jar.jar!/baz/entry.txt.
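The effect of this deviation on scheme extraction can be sketched in a few lines. This is an illustration under assumptions, not the library's parser: the class `JarSchemeSketch` and its helper are hypothetical, and the prefix check is case-sensitive because the text above does not say otherwise.

```java
class JarSchemeSketch {
    /** Returns the scheme of the given reference, or null if it has none. */
    static String scheme(String iri) {
        int colon = iri.indexOf(':');
        // A colon only ends a scheme if no '/', '?' or '#' comes before it.
        if (colon <= 0 || containsDelimiter(iri, 0, colon)) {
            return null;
        }
        if (iri.startsWith("jar:")) {
            // Deviation: keep the first colon and end the scheme at the second one.
            int second = iri.indexOf(':', colon + 1);
            if (second > colon + 1 && !containsDelimiter(iri, colon + 1, second)) {
                return iri.substring(0, second);
            }
        }
        return iri.substring(0, colon);
    }

    private static boolean containsDelimiter(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            if ("/?#".indexOf(s.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(scheme("jar:http://www.foo.com/bar/jar.jar!/baz/entry.txt"));
        System.out.println(scheme("http://www.foo.com/bar"));
    }
}
```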
Constructor and Description |
---|
ParsedIRI(String iri): Constructs a ParsedIRI by parsing the given string. |
ParsedIRI(String scheme, String userInfo, String host, int port, String path, String query, String fragment): Constructs a hierarchical IRI from the given components. |
Modifier and Type | Method and Description |
---|---|
static ParsedIRI | create(String str): Creates a ParsedIRI by parsing the given string. |
boolean | equals(Object obj): Tests this IRI for simple string comparison with another object. |
String | getFragment(): Returns the raw fragment component of this IRI after the hash. |
String | getHost(): Returns the host component of this IRI. |
String | getPath(): Returns the raw path component of this IRI. |
int | getPort(): Returns the port number of this IRI. |
String | getQuery(): Returns the raw query component of this IRI after the first question mark. |
String | getScheme(): Returns the scheme component of this IRI. |
String | getUserInfo(): Returns the raw user-information component of this IRI. |
int | hashCode() |
boolean | isAbsolute(): Tells whether or not this IRI is absolute. |
boolean | isOpaque(): Tells whether or not this IRI is opaque. |
ParsedIRI | normalize(): Normalizes this IRI's components. |
ParsedIRI | relativize(ParsedIRI absolute): Relativizes the given IRI against this ParsedIRI. |
String | relativize(String iri): Relativizes the given IRI against this ParsedIRI. |
ParsedIRI | resolve(ParsedIRI relative): Resolves the given IRI against this ParsedIRI. |
String | resolve(String iri): Resolves the given IRI against this ParsedIRI. |
String | toASCIIString(): Returns the content of this IRI as a US-ASCII string. |
String | toString(): Returns the content of this IRI as a string. |
public ParsedIRI(String iri) throws URISyntaxException
iri - The string to be parsed into an IRI
NullPointerException - If iri is null
URISyntaxException - If the given string violates RFC 3987, as augmented by the above deviations
public ParsedIRI(String scheme, String userInfo, String host, int port, String path, String query, String fragment)
This constructor first builds an IRI string from the given components according to the rules specified in RFC 3987.
scheme - Scheme name
userInfo - User name and authorization information
host - Host name
port - Port number
path - Path
query - Query
fragment - Fragment
public static ParsedIRI create(String str)
This convenience factory method works as if by invoking the ParsedIRI(String) constructor; any URISyntaxException thrown by the constructor is caught and the code point at the error position is percent-encoded. This process is repeated until a syntactically valid IRI is formed or an IllegalArgumentException is thrown.
This method is provided for use in situations where it is known that the given string is an IRI, even if it is not completely syntactically valid, for example IRI constants declared within a program. The constructors, which throw URISyntaxException directly, should be used in situations where an IRI is being constructed from user input or from some other source that may be prone to errors.
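The retry loop described above can be sketched with the JDK's java.net.URI standing in for ParsedIRI, so that the example runs without the library on the classpath. This is an assumption-laden illustration: the class `LenientCreateSketch`, the retry bound, and the use of URISyntaxException.getIndex() to locate the offending code point are choices made for this sketch, and ParsedIRI may accept or reject different input than java.net.URI does.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

class LenientCreateSketch {
    /** Parses str, percent-encoding the code point at each reported error position and retrying. */
    static URI create(String str) {
        String candidate = str;
        // Bound the retries so that input which cannot be repaired fails instead of looping forever.
        for (int attempt = 0; attempt <= str.length(); attempt++) {
            try {
                return new URI(candidate);
            } catch (URISyntaxException e) {
                int index = e.getIndex();
                if (index < 0 || index >= candidate.length()) {
                    throw new IllegalArgumentException(e.getMessage(), e);
                }
                candidate = encodeAt(candidate, index);
            }
        }
        throw new IllegalArgumentException("Could not convert into a valid identifier: " + str);
    }

    /** Replaces the code point at the given index with the percent-encoding of its UTF-8 bytes. */
    static String encodeAt(String s, int index) {
        int codePoint = s.codePointAt(index);
        StringBuilder sb = new StringBuilder(s.substring(0, index));
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.append(s.substring(index + Character.charCount(codePoint))).toString();
    }

    public static void main(String[] args) {
        System.out.println(create("http://example.org/a b c"));
    }
}
```

As in the method described above, a null argument results in a NullPointerException and unrepairable input results in an IllegalArgumentException.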
str - The string to be parsed into an IRI
NullPointerException - If str is null
IllegalArgumentException - If the given string could not be converted into an IRI
public boolean equals(Object obj)
If two IRI strings are identical, then it is safe to conclude that they are equivalent. However, even if the IRI strings are not identical the IRIs might still be equivalent. Further comparison can be made using the normalize() forms.
public String toString()
If this IRI was created by invoking one of the constructors in this class then a string equivalent to the original input string, or to the string computed from the originally-given components, as appropriate, is returned. Otherwise this IRI was created by normalization, resolution, or relativization, and so a string is constructed from this IRI's components according to the rules specified in RFC 3987.
public String toASCIIString()
If this IRI only contains US-ASCII characters then an invocation of this method will return the same value as an invocation of the toString method. Otherwise this method works as if by encoding the host via RFC 3490 and all other components by percent-encoding their UTF-8 values.
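The two encodings named above can be tried out with JDK classes alone. The sketch below is not the ParsedIRI implementation: the class `AsciiFormSketch` and its method names are hypothetical, java.net.IDN is used as a readily available implementation of RFC 3490, and only non-ASCII code points are percent-encoded.

```java
import java.net.IDN;
import java.nio.charset.StandardCharsets;

class AsciiFormSketch {
    /** Converts an internationalized host name to its ASCII-compatible form (RFC 3490). */
    static String hostToAscii(String host) {
        return IDN.toASCII(host);
    }

    /** Percent-encodes the UTF-8 bytes of every non-ASCII code point in the component. */
    static String percentEncodeNonAscii(String component) {
        StringBuilder sb = new StringBuilder(component.length());
        for (int i = 0; i < component.length(); ) {
            int codePoint = component.codePointAt(i);
            if (codePoint < 0x80) {
                sb.appendCodePoint(codePoint); // US-ASCII is copied unchanged
            } else {
                byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    sb.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += Character.charCount(codePoint);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "b\u00fccher" is "bücher"; "caf\u00e9" is "café".
        System.out.println(hostToAscii("b\u00fccher.example"));   // xn--bcher-kva.example
        System.out.println(percentEncodeNonAscii("/caf\u00e9"));  // /caf%C3%A9
    }
}
```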
public boolean isAbsolute()
true if, and only if, this IRI has a scheme component
public boolean isOpaque()
An IRI is opaque if, and only if, it is absolute and its path part does not begin with a slash character ('/'). An opaque IRI has a scheme, a path, and possibly a query or fragment; all other components (userInfo, host, and port) are undefined.
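The absolute/opaque distinction can be observed with java.net.URI, which draws a comparable line between hierarchical and opaque identifiers. This is only an analogy under assumptions: the class `OpaqueSketch` is hypothetical, and ParsedIRI may classify edge cases (for example an absolute IRI with an empty path) differently from java.net.URI.

```java
import java.net.URI;

class OpaqueSketch {
    /** Describes a reference as "relative", "opaque", or "hierarchical" according to java.net.URI. */
    static String classify(String reference) {
        URI uri = URI.create(reference);
        if (!uri.isAbsolute()) {
            return "relative";
        }
        return uri.isOpaque() ? "opaque" : "hierarchical";
    }

    public static void main(String[] args) {
        System.out.println(classify("mailto:user@example.org"));  // opaque
        System.out.println(classify("http://example.org/a"));     // hierarchical
        System.out.println(classify("/a/b"));                     // relative
    }
}
```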
true if, and only if, this IRI is absolute and its path does not start with a slash
public String getScheme()
The scheme component of an IRI, if defined, only contains characters in the alphanum category and in the string "-.+", unless the scheme starts with "jar:", in which case it may also contain one colon. A scheme always starts with an alpha character.
The scheme component of an IRI cannot contain escaped octets.
null if the scheme is undefined
public String getUserInfo()
null if the user information is undefined
public String getHost()
null if the host is undefined
public int getPort()
The port component of an IRI, if defined, is a non-negative integer.
-1 if the port is undefined
public String getPath()
The path component of this IRI (never null)
public String getQuery()
The query component of an IRI, if defined, only contains legal IRI characters.
null if the IRI does not contain a question mark
public String getFragment()
The fragment component of an IRI, if defined, only contains legal IRI characters and does not contain a hash.
null if the IRI does not contain a hash
public ParsedIRI normalize()
Because IRIs exist to identify resources, presumably they should be considered equivalent when they identify the same resource. However, this definition of equivalence is not of much practical use, as there is no way for an implementation to compare two resources unless it has full knowledge or control of them. Therefore, IRI normalization is designed to minimize false negatives while strictly avoiding false positives.
Case Normalization: the hexadecimal digits within a percent-encoding triplet (e.g., "%3a" versus "%3A") are case-insensitive and are normalized to use uppercase letters for the digits A - F. The scheme and host are case-insensitive and are normalized to lowercase.
Character Normalization: the Unicode Standard defines various equivalences between sequences of characters for various purposes. Unicode Standard Annex #15 defines various Normalization Forms for these equivalences, which are applied to the IRI components.
Percent-Encoding Normalization: any percent-encoded octet sequence that corresponds to an unreserved character is decoded, anywhere in the IRI.
Path Segment Normalization: unnecessary "." and ".." segments are removed from the path component of a hierarchical IRI. Each "." segment is simply removed. A ".." segment is removed only if it is preceded by a non-".." segment or the start of the path.
HTTP(S) Scheme Normalization: if the port is the scheme's default port number or is not given, it is set to undefined. An empty path is replaced with "/".
File Scheme Normalization: if the host is "localhost" or empty, it is set to undefined.
Internationalized Domain Name Normalization: the host component is converted to Unicode.
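Two of the steps above, percent-encoding normalization and path segment normalization, are small enough to sketch directly. The code below is an illustration rather than the library's implementation: the class `NormalizationSketch` is hypothetical, the percent-encoding step only decodes triplets for ASCII unreserved characters (letters, digits, "-", ".", "_", "~") and leaves percent-encoded non-ASCII sequences untouched, and the path step follows the remove_dot_segments procedure of RFC 3986, section 5.2.4, which may differ from ParsedIRI for relative paths.

```java
class NormalizationSketch {
    private static final String HEX = "0123456789ABCDEF";

    /** Uppercases the hex digits of each %XX triplet and decodes triplets of ASCII unreserved characters. */
    static String normalizePercentEncoding(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int hi = (c == '%' && i + 2 < s.length()) ? hex(s.charAt(i + 1)) : -1;
            int lo = hi >= 0 ? hex(s.charAt(i + 2)) : -1;
            if (lo < 0) {
                sb.append(c); // not the start of a well-formed triplet: copy unchanged
                continue;
            }
            char decoded = (char) (hi * 16 + lo);
            if (isUnreserved(decoded)) {
                sb.append(decoded);
            } else {
                sb.append('%').append(HEX.charAt(hi)).append(HEX.charAt(lo));
            }
            i += 2; // skip the two hex digits just consumed
        }
        return sb.toString();
    }

    /** Removes "." and ".." segments following RFC 3986, section 5.2.4. */
    static String removeDotSegments(String path) {
        StringBuilder in = new StringBuilder(path);
        StringBuilder out = new StringBuilder();
        while (in.length() > 0) {
            String s = in.toString();
            if (s.startsWith("../")) {
                in.delete(0, 3);
            } else if (s.startsWith("./") || s.startsWith("/./")) {
                in.delete(0, 2);
            } else if (s.equals("/.")) {
                in.replace(0, 2, "/");
            } else if (s.startsWith("/../")) {
                in.delete(0, 3);
                removeLastSegment(out);
            } else if (s.equals("/..")) {
                in.replace(0, 3, "/");
                removeLastSegment(out);
            } else if (s.equals(".") || s.equals("..")) {
                in.setLength(0);
            } else {
                // Move the first segment, including its leading slash if any, to the output.
                int next = s.indexOf('/', 1);
                if (next < 0) {
                    next = s.length();
                }
                out.append(s, 0, next);
                in.delete(0, next);
            }
        }
        return out.toString();
    }

    private static void removeLastSegment(StringBuilder out) {
        int slash = out.lastIndexOf("/");
        out.setLength(slash < 0 ? 0 : slash);
    }

    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    private static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || "-._~".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(normalizePercentEncoding("%7euser/%3a%2f%41")); // ~user/%3A%2FA
        System.out.println(removeDotSegments("/a/b/../c/./d"));            // /a/c/d
    }
}
```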
public String resolve(String iri)
iri - The IRI to be resolved against this ParsedIRI
NullPointerException - If iri is null
See also: resolve(ParsedIRI)
public ParsedIRI resolve(ParsedIRI relative)
Resolution is the process of resolving one IRI against another, base IRI. The resulting IRI is constructed from components of both IRIs in the manner specified by RFC 3986, taking components from the base IRI for those not specified in the original. For hierarchical IRIs, the path of the original is resolved against the path of the base and then normalized.
If the given IRI is already absolute, or if this IRI is opaque, then the given IRI is returned.
If the given IRI's fragment component is defined, its path component is empty, and its scheme, authority, and query components are undefined, then an IRI with the given fragment but with all other components equal to those of this IRI is returned. This allows an IRI representing a standalone fragment reference, such as "#foo", to be usefully resolved against a base IRI.
Otherwise this method constructs a new hierarchical IRI in a manner consistent with RFC 3987.
The result of this method is absolute if, and only if, either this IRI is absolute or the given IRI is absolute.
relative - The IRI to be resolved against this ParsedIRI
NullPointerException - If relative is null
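The resolution cases described above can be exercised with java.net.URI, whose resolve method is documented in very similar terms. This is a sketch by analogy, assuming that ParsedIRI behaves the same way for these simple inputs; the class `ResolveSketch` and the example identifiers are hypothetical.

```java
import java.net.URI;

class ResolveSketch {
    /** Resolves the reference against the base and returns the result in string form. */
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(URI.create(reference)).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/a/b/c";
        System.out.println(resolve(base, "../d"));                    // http://example.org/a/d
        System.out.println(resolve(base, "#foo"));                    // http://example.org/a/b/c#foo
        System.out.println(resolve(base, "mailto:user@example.org")); // mailto:user@example.org
    }
}
```

The first call shows path resolution followed by normalization, the second the standalone fragment case, and the third an already absolute reference being returned as given.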
public String relativize(String iri)
iri - The IRI to be relativized against this ParsedIRI
NullPointerException - If iri is null
See also: relativize(ParsedIRI)
public ParsedIRI relativize(ParsedIRI absolute)
Relativization is the inverse of resolution. This operation is often useful when constructing a document containing IRIs that must be made relative to the base IRI of the document wherever possible.
The relativization of the given IRI against this IRI is computed as follows:
If either this IRI or the given IRI is opaque, or if the scheme and authority components of the two IRIs are not identical, or if the path of this IRI is not a prefix of the path of the given IRI, then the given IRI is returned.
Otherwise a new relative hierarchical IRI is constructed with query and fragment components taken from the given IRI and with a path component computed by removing this IRI's path from the beginning of the given IRI's path.
absolute - The IRI to be relativized against this ParsedIRI
NullPointerException - If absolute is null
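The two relativization rules above can likewise be tried with java.net.URI, which documents the same prefix-based procedure. As before this is an analogy under the assumption that ParsedIRI agrees for these simple inputs; the class `RelativizeSketch` and the example identifiers are hypothetical.

```java
import java.net.URI;

class RelativizeSketch {
    /** Relativizes the target against the base and returns the result in string form. */
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/";
        // The base path is a prefix of the target path, so it is stripped.
        System.out.println(relativize(base, "http://example.org/docs/guide/intro.html?x=1#top"));
        // The authorities differ, so the target is returned unchanged.
        System.out.println(relativize(base, "http://other.example/docs/guide"));
    }
}
```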
Copyright © 2015-2020 Eclipse Foundation. All Rights Reserved.