Form Submission

During form submission, the values of all currently active, so-called successful form elements contained in a particular form are sent to a server. Submission is initiated by activating a submit or image button. All information about a particular form can be displayed by opening the FormInfo window.

More about the whole process can be found in the following subsections:

Basic Form Parameters

A form is a container of form elements. It defines the basic parameters used to submit the values of those elements that belong to the form and are suitable for submission. Most parameters can be specified on the originating element of the form; others come from other sources, such as the document that contains the form.

The following parameters are relevant for form submission:

Method
Specifies the request method by which the form data is transmitted to the server. Two methods are available: GET and POST. If no method is specified, GET is used.
Action
Defines the target URL of the receiving server process for the form data. The default value for this parameter is the URL of the document that contains the form.
Enctype
Specifies the encoding type, which determines how the form data values are encoded and packed for transmission to the server. The default is application/x-www-form-urlencoded. For the GET method, this value is ignored and the form data is always URL-encoded.
Charset
This value is derived from the character encoding of the document that contains the form. During form submission, the form data values are transcoded to this character encoding if possible; otherwise, utf-8 is used instead.
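
The Charset fallback can be illustrated with a minimal sketch. How w3browse actually decides whether transcoding is "possible" is not stated here; the sketch assumes that it means every name and value can be represented in the document's character encoding. The function name pick_submission_charset is hypothetical.

```python
def pick_submission_charset(pairs, document_charset):
    """Return document_charset if all names and values can be encoded in it,
    otherwise fall back to utf-8 (assumed interpretation of "if possible")."""
    try:
        for name, value in pairs:
            name.encode(document_charset)
            value.encode(document_charset)
    except (UnicodeEncodeError, LookupError):
        # Either a character cannot be represented in the document charset,
        # or the charset label itself is unknown to Python.
        return "utf-8"
    return document_charset


latin_only = [("city", "Zürich")]
mixed = [("city", "Zürich"), ("note", "日本語")]

print(pick_submission_charset(latin_only, "iso-8859-1"))  # iso-8859-1
print(pick_submission_charset(mixed, "iso-8859-1"))       # utf-8
```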

Request Methods

The request method specifies how the form data, once it has been processed, is transmitted to the server for further processing. Two common methods are available: GET and POST.

With GET requests, the form data is simply appended as a URL-encoded query part to the target URL of the receiver; see also section "Server-based URLs". In contrast to many other web browsers, w3browse can also append the form data to a query part that is already present in the target URL. In either case, the character encoding (charset) of the form data values cannot be transmitted to the server in a standard way, so the receiving process has to guess it or determine it by other means if needed.
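
As an illustration of appending form data to a target URL that may already carry a query part, here is a small sketch using Python's standard urllib.parse module. The function name build_get_url and the choice of "&" as the joining character are assumptions; the exact joining rules of w3browse are not described in this section.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_get_url(action, pairs, charset="utf-8"):
    """Append URL-encoded form data to the query part of the action URL."""
    parts = urlsplit(action)
    form_query = urlencode(pairs, encoding=charset)
    # Keep an existing query part and join the form data with "&" (assumption).
    query = "&".join(q for q in (parts.query, form_query) if q)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


pairs = [("q", "form data"), ("city", "Zürich")]
print(build_get_url("http://example.org/search?lang=en", pairs))
# http://example.org/search?lang=en&q=form+data&city=Z%C3%BCrich
print(build_get_url("http://example.org/search", pairs, charset="iso-8859-1"))
# http://example.org/search?q=form+data&city=Z%FCrich
```

The two results differ only in how "ü" is percent-encoded, which shows why the receiver cannot tell the character encoding from the URL alone.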

With POST requests, the form data is sent in the body of the request, so data in any format and of any size can be transmitted. The character encoding of the transmitted form data values can also be specified for the benefit of the receiving process. Several common form data encodings can be used with POST requests; they are described in the next section.

Form Data Encodings

Before the form data can be sent to the server, it needs to be collected, transcoded, encoded and packed according to the basic form parameters. Transforming the form element values into the form data output that is finally transmitted takes several steps:

  1. First, the names and values of all form elements that are suitable to be submitted according to the encoding type are collected in the order in which they appear in the document.
  2. Then, the names and values of the first step are transcoded to the target character encoding as specified by the Charset parameter. Other characters such as the delimiters that are used in later steps may also need to be transcoded, e.g. when transcoding to utf-16.
  3. Depending on the target encoding type, the transcoded names and values of the previous step may need to be additionally encoded, e.g. URL-encoded. This encoding step may be necessary in order to escape certain reserved characters that are used in the next step.
  4. Finally, the names and values as processed so far are packed together according to the encoding type. This step performs the actual formatting of the form data output.
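
The four steps above can be sketched as a small pipeline, shown here for the URL encoding type. This is only an illustration: the dictionary layout of the form elements, the "successful" flag and all function names are hypothetical, and spaces are escaped as %20 rather than "+" for simplicity.

```python
from urllib.parse import quote_from_bytes


def collect(elements):
    # Step 1: keep the submittable elements in document order.
    return [(e["name"], e["value"]) for e in elements if e.get("successful", True)]


def transcode(pairs, charset):
    # Step 2: convert names and values to the target character encoding.
    return [(n.encode(charset), v.encode(charset)) for n, v in pairs]


def escape(pairs):
    # Step 3: percent-encode reserved characters such as "&" and "=".
    return [(quote_from_bytes(n, safe=""), quote_from_bytes(v, safe="")) for n, v in pairs]


def pack(pairs):
    # Step 4: format the output for the chosen encoding type.
    return "&".join(f"{n}={v}" for n, v in pairs)


elements = [
    {"name": "user", "value": "a b"},
    {"name": "skip", "value": "x", "successful": False},
    {"name": "q", "value": "1&2=3"},
]
print(pack(escape(transcode(collect(elements), "utf-8"))))
# user=a%20b&q=1%262%3D3
```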

The following encoding types are implemented in w3browse and are described in detail together with their handling in further subsections:

application/x-www-form-urlencoded
This is the conventional, well-known URL encoding method of composing the form data output. This encoding type can be used with both request methods and is the default in the absence of other information or when the specified encoding type is unknown to the application.
text/plain
The text/plain encoding produces a human-readable representation of the form data and is used only in very special cases.
multipart/form-data
The multipart form-data encoding is the most powerful encoding type and is mainly used in situations where large amounts of data are to be submitted, e.g. for files.

URL Encoding

The names and values of all suitable form elements are first transcoded to the target character encoding and then URL-encoded. Finally, the resulting names and values are packed as follows:

name1=value1&name2=value2&...

The name and the value of a name-value pair are separated by a "=" character, while adjacent name-value pairs are separated by "&" characters.

For POST requests, the target character encoding is always transmitted to the server in the charset parameter of the Content-Type: header field, e.g.

Content-Type: application/x-www-form-urlencoded; charset=utf-8

Note that some old web server software has trouble with such header field contents. Note also that file selection elements are ignored completely for this encoding type.
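
A URL-encoded POST request with the charset parameter can be sketched as follows. The request object is only constructed, not sent. The function name build_urlencoded_post and the example URL are hypothetical; urlencode and Request are part of the Python standard library.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_urlencoded_post(action, pairs, charset="utf-8"):
    """Build (but do not send) a POST request with a URL-encoded body."""
    # urlencode transcodes to the charset and percent-encodes in one step;
    # the result contains ASCII characters only.
    body = urlencode(pairs, encoding=charset).encode("ascii")
    headers = {"Content-Type": f"application/x-www-form-urlencoded; charset={charset}"}
    return Request(action, data=body, headers=headers, method="POST")


req = build_urlencoded_post(
    "http://example.org/submit",
    [("name", "Zoë"), ("msg", "a&b")],
    charset="iso-8859-1",
)
print(req.get_header("Content-type"))
# application/x-www-form-urlencoded; charset=iso-8859-1
print(req.data)
# b'name=Zo%EB&msg=a%26b'
```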

Plain Text

The names and values of all suitable form elements are packed as follows:

name1=value1
name2=value2
...

The name and the value of a name-value pair are separated by a "=" character, while each name-value pair is terminated by CRLF, which causes all name-value pairs to appear on separate lines. Note that the contents of textarea form elements can also span multiple lines, which can lead to ambiguities when the output of such a form submission is parsed on the server.

Finally, the whole form data output is transcoded to the target character encoding before it is submitted. The character encoding used is always transmitted to the server in the charset parameter of the Content-Type: header field, e.g.

Content-Type: text/plain; charset=utf-8

Note that file selection elements are ignored completely for this encoding type.
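
The ambiguity caused by multi-line values can be demonstrated with a short sketch of the plain text packing described above. The function name pack_text_plain and the field names are hypothetical.

```python
def pack_text_plain(pairs, charset="utf-8"):
    """Pack name-value pairs as name=value lines terminated by CRLF,
    then transcode the whole output to the target charset."""
    text = "".join(f"{name}={value}\r\n" for name, value in pairs)
    return text.encode(charset)


# One textarea whose contents span two lines ...
one_field = pack_text_plain([("comment", "hello\r\nadmin=yes")])
# ... and two separate single-line fields.
two_fields = pack_text_plain([("comment", "hello"), ("admin", "yes")])

print(one_field)                # b'comment=hello\r\nadmin=yes\r\n'
print(one_field == two_fields)  # True
```

Both submissions produce identical output, so a server-side parser cannot tell them apart.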

Multipart Form-Data

This is the most powerful encoding type, but also the most wasteful because of its overhead. The contents of all form elements including file selection elements (see next section) can be transmitted to the server, and it is further possible to attach character encoding information to each name-value pair separately.

The form data output that is to be transmitted to the server has the format of a multipart MIME message and looks like this:

--boundary
Content-Disposition: form-data; name="name1"
Content-Type: text/plain; charset=utf-8

value1
--boundary
Content-Disposition: form-data; name="name2"
Content-Type: text/plain; charset=utf-8

value2
--boundary
...
--boundary--

For each name-value pair that is to be submitted, a message part is created that starts with a boundary line and ends at either the next part or the terminating boundary line (the last line of the example). The names and values are transcoded before they are inserted into a part. The value of the charset parameter of the Content-Type: header field reflects the character encoding used. If a value is empty, the corresponding Content-Type: line is simply omitted, because it would not provide any useful information to the server in this case.

The Content-Type: header field of the form data output that is finally transmitted to the server looks like this:

Content-Type: multipart/form-data; boundary="boundary"

Note that the boundary value (with or without the prefix and/or suffix "--") must not occur anywhere within the message contents; it is therefore recommended to choose a sufficiently long random string. The same applies to other boundary strings that may be used in subparts.
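
The multipart format for simple name-value pairs can be sketched as follows. This is an illustration under stated assumptions, not the w3browse implementation: the function name pack_multipart is hypothetical, file selection elements are not handled, and names are inserted without escaping quote characters. The random boundary is generated with the standard secrets module and regenerated if it happens to occur in the data.

```python
import secrets


def pack_multipart(pairs, charset="utf-8", boundary=None):
    """Return (Content-Type header value, body bytes) for name-value pairs."""
    encoded = [(n.encode(charset), v.encode(charset)) for n, v in pairs]
    if boundary is None:
        # Choose a long random string that does not occur in any name or value.
        boundary = secrets.token_hex(16)
        while any(boundary.encode("ascii") in n + v for n, v in encoded):
            boundary = secrets.token_hex(16)
    delimiter = b"--" + boundary.encode("ascii")

    lines = []
    for name, value in encoded:
        lines.append(delimiter)
        lines.append(b'Content-Disposition: form-data; name="' + name + b'"')
        if value:
            # The Content-Type line is omitted for empty values.
            lines.append(b"Content-Type: text/plain; charset=" + charset.encode("ascii"))
        lines.append(b"")  # empty line separates part header and part body
        lines.append(value)
    lines.append(delimiter + b"--")  # terminating boundary line

    body = b"\r\n".join(lines) + b"\r\n"
    return f'multipart/form-data; boundary="{boundary}"', body


content_type, body = pack_multipart([("name1", "value1"), ("name2", "")])
print(content_type)
print(body.decode("utf-8"))
```

Passing a fixed boundary is useful only for testing; in real use the random default should be kept.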

Submission of Files

The processing of file selection elements is more complex because, first, the contents of a file are sent instead of the element value, and second, one or more files or even none at all may be sent:

The ability to send othername as a replacement for basename and the transmission of the last modification date of a file are extensions that are specific to w3browse; most if not all other web browsers are not capable of doing that. Even the ability to submit more than one file per file selection element is not widely supported.