URL is an abbreviation of "Uniform Resource Locator". URLs are used as references to documents that are located on the Internet, on intranets, on local filesystems etc., or they may even refer to application-internal resources. In the following subsections, different kinds of URLs together with their handling within w3browse are described. More about the gory details of URLs (and URIs) can be found in [URL/URI/IRI].
A generic URL consists of two parts: a scheme and a scheme-specific part. The scheme determines how the rest of the URL is to be interpreted. The syntax is:
The rest-of-url part may consist only of a
restricted set of characters such as letters, digits and a few graphic
symbols, all taken from the US-ASCII character set. Other characters have to
be escaped, that is, each of them is replaced by a sequence of
characters that represents it, e.g. a space is replaced by
%20. Unescaping reverses this process.
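Escaping and unescaping can be tried out with a minimal sketch. It uses Python's standard urllib.parse module and not w3browse itself; the file name is a made-up example.

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing spaces and parentheses.
raw = "my report (final).txt"

# Escape every character outside the unreserved set ("/" is kept by default).
escaped = quote(raw)
print(escaped)

# Unescaping reverses the process.
restored = unquote(escaped)
print(restored)

# A reserved character such as "/" is escaped only when it is
# explicitly removed from the set of safe characters.
component = quote("a/b", safe="")
print(component)
```

The last call illustrates why reserved characters sometimes need escaping: within a single component, an unescaped slash would otherwise be read as a separator.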
Scheme-specific parts that consist of different components, such as those of
server-based URLs (see below), may require certain
reserved characters to be escaped too if they are to be used within a
Any URL that does not fit into another category is treated as a generic URL, e.g.
Note that some older web-browsers do not recognize URL-escaped characters
Many URLs are used to access resources that are provided by servers located on a network. The schemes of such server-based URLs are usually named after the protocol that is used for the transport and share a common syntax:
Most parts of such a URL are optional; the shortest useful form is
The following schemes that are supported by w3browse fall into the category of server-based URLs:
The URL part
is sometimes called netloc (network location) and is used to
specify the address of a server. The mandatory subpart
allows the DNS name or IP address of a host to be specified. The optional
:port may specify a different port number in
case the desired service is not available on the scheme-specific default
port. The leading optional subpart
specifies a user-id and a password for login-based schemes such as
ftps. When used together with other
schemes, w3browse generates an appropriate HTTP
Authorization: header while connecting to such a server.
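The subparts of the netloc can be pulled apart as in the following sketch, which relies on Python's urllib.parse rather than on w3browse. The URL, user-id and password are invented, and the use of the HTTP "Basic" scheme for the Authorization: header is an assumption; the text above does not say which scheme w3browse actually generates.

```python
import base64
from urllib.parse import urlsplit

# Hypothetical server-based URL with user-id, password, host and port.
url = "ftps://alice:s3cret@ftp.example.org:990/pub/readme.txt"
parts = urlsplit(url)

print(parts.username)  # user-id from the leading optional subpart
print(parts.password)  # password from the leading optional subpart
print(parts.hostname)  # mandatory host subpart
print(parts.port)      # optional port, None if absent


def basic_authorization(username, password):
    """Build an HTTP Authorization: header line (Basic scheme assumed)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


header = basic_authorization(parts.username, parts.password)
print(header)
```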
/directory/basename of a URL
is commonly known as path and may be regarded as a hierarchy of
documents on the server, but note that this hierarchy is not necessarily
on or part of a filesystem. It is up to the server to decide what kind of
action to perform when a certain path (and query) is requested. The subparts
directory is actually a sequence of
names that are separated by slashes (
The so-called query part
?query of a
URL often has the form
where named parameters are used to transfer values, e.g. those entered into an HTML form, back to a server, e.g. in order to perform a search.
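As an illustration, the following sketch builds and decodes a query, assuming the common convention of name=value pairs joined by "&". It uses Python's urllib.parse; the parameter names and the server address are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form values that a search page might send back to a server.
params = {"q": "uniform resource locator", "lang": "en"}

# Encode the named parameters; spaces become "+" in this encoding.
query = urlencode(params)
url = "http://www.example.org/search?" + query
print(url)

# Decoding yields a list of values per name, since names may repeat.
decoded = parse_qs(query)
print(decoded)
```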
The last URL part
#fragment is not sent to a
server, instead it is used by the requestor to identify or address a part of
the retrieved document. The exact interpretation of the fragment
identifier depends on the content-type of that document.
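A requestor typically strips the fragment before contacting the server and keeps it for later use. The sketch below shows this with Python's urllib.parse; the URL is a made-up example.

```python
from urllib.parse import urldefrag

# Hypothetical URL that addresses a part of a document.
full_url = "http://www.example.org/manual.html#urls"

# Split off the fragment: only request_url would be sent to the server.
request_url, fragment = urldefrag(full_url)
print(request_url)
print(fragment)
```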
URLs of type file are used to access files on the local
filesystem. The syntax of such URLs is the same as for server-based URLs, but
because there is no server involved, the netloc part can be left empty or may
be set to the value
localhost. The following three forms are all
valid and are normalized to the last one by w3browse:
file://localhost/usr/share/doc/
file:///usr/share/doc/
file:/usr/share/doc/
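The normalization of these three forms could be sketched as follows. This is not w3browse's actual code; normalize_file_url is a hypothetical helper built on Python's urllib.parse, and rejecting other netloc values is an assumption.

```python
from urllib.parse import urlsplit


def normalize_file_url(url):
    """Reduce a file URL to the short form file:/path (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: " + url)
    # An empty netloc and "localhost" both denote the local filesystem.
    if parts.netloc not in ("", "localhost"):
        raise ValueError("file URL refers to another host: " + url)
    return "file:" + parts.path


forms = [
    "file://localhost/usr/share/doc/",
    "file:///usr/share/doc/",
    "file:/usr/share/doc/",
]
normalized = [normalize_file_url(form) for form in forms]
print(normalized)
```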
The part following the
file: prefix is really a
filename or directory, so all variants of them are
also valid here, but the path components have to be escaped if they contain
special characters, e.g.
The shortest useful form is
refers to the root of the local filesystem.
Mailto URLs are used to denote e-mail addresses and have the following general format (as implemented in w3browse):
Some or all components of the query part may be omitted, but the
e-mail-address should be given because it is the primary
To: header field of an e-mail message. The
cc= parts may be repeated multiple times. w3browse
invokes its e-mail composer
automatically when following such a link, e.g.
mailto:aleks_at_aksware_dot_de
mailto:support(at)aksware(dot)de?subject=w3browse
mailto:
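How such a URL breaks down into header fields can be shown with a small sketch. parse_mailto is a hypothetical helper based on Python's urllib.parse, not the parser used by w3browse; it handles only the subject= and cc= parameters mentioned here, and the addresses are invented.

```python
from urllib.parse import urlsplit, parse_qs, unquote


def parse_mailto(url):
    """Split a mailto URL into To:, Subject: and Cc: values (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URL: " + url)
    fields = parse_qs(parts.query)
    return {
        "to": unquote(parts.path),                  # primary To: address, may be empty
        "subject": fields.get("subject", [""])[0],  # first subject= value, if any
        "cc": fields.get("cc", []),                 # cc= may be repeated
    }


message = parse_mailto(
    "mailto:someone@example.org?subject=w3browse&cc=a@example.org&cc=b@example.org"
)
print(message)

# A bare "mailto:" is accepted too and yields empty fields.
empty = parse_mailto("mailto:")
print(empty)
```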
The parameter MailDir of the dialog "Open URL Window" and further settings that have been made within the "e-Mail Application" for that environment are in effect when the e-mail composer is invoked.
These kinds of URLs are special to w3browse and are used to refer to certain application-internal resources. The syntax of so-called internal URLs is similar to that of server-based URLs:
But in this case, the
netloc part identifies a
certain subsystem of the application, e.g.
Another set of special URLs is introduced by the prefix
about: and shares its syntax with generic
URLs. These URLs are used to provide some shortcuts to other internal
resources and applications:
A detailed description of all kinds of internal applications together with ways of how to access them is given in chapter "Internal Applications".
wstps are currently not
implemented natively in w3browse, but they may be used in
connection with proxy-servers and gateways, because in this case a URL is
just handed over to the peer, which then has to perform the dirty work.