On the World Wide Web, a query string is a part of a uniform resource locator (URL) containing data that does not fit into a hierarchical path structure. The query string commonly includes fields added by a web browser or other client application, for example as part of an HTML form. [1]

A web server can handle a Hypertext Transfer Protocol (HTTP) request either by reading a file from its file system based on the URL path or by handling the request using logic that is specific to the type of resource. In cases where special logic is invoked, the query string will be available to that logic for use in its processing, along with the path component of the URL.

Structure

A typical URL containing a query string is as follows:

http://example.com/over/there?name=ferret

When a server receives a request for such a page, it may run a program, passing the query string, which in this case is name=ferret, unchanged to the program. The first question mark is used as a separator and is not part of the query string. [2] [3]
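The split at the first question mark can be illustrated with Python's standard `urllib.parse.urlsplit`, which returns the query component without the leading `?`. This is a minimal sketch; the helper name `query_of` is hypothetical, and a real server would take the request target from its HTTP layer rather than from a full URL string.

```python
from urllib.parse import urlsplit


def query_of(url: str) -> str:
    """Return the query component of a URL, without the leading '?'."""
    # urlsplit cuts the URL at the first '?' (and at '#' for a fragment).
    return urlsplit(url).query


print(query_of("http://example.com/over/there?name=ferret"))    # name=ferret
# Only the first '?' separates; any later '?' stays inside the query.
print(query_of("http://example.com/over/there?name=ferret?x"))  # name=ferret?x
```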

Web frameworks can provide methods for parsing multiple parameters in the query string, separated by some delimiter. [4] In the example below, multiple query parameters are separated by the ampersand, "&":

http://example.com/path/to/page?name=ferret&color=purple
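As one example of such library support, Python's standard `urllib.parse.parse_qsl` splits a query on `&` and returns the field-value pairs in their original order. The sketch below only shows that one library's behavior; other frameworks expose different APIs.

```python
from urllib.parse import parse_qsl, urlsplit

url = "http://example.com/path/to/page?name=ferret&color=purple"
# parse_qsl splits the query on '&', then each piece on its first '='.
pairs = parse_qsl(urlsplit(url).query)
print(pairs)  # [('name', 'ferret'), ('color', 'purple')]
```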

The exact structure of the query string is not standardized. Methods used to parse the query string may differ between websites.

A link in a web page may have a URL that contains a query string. HTML defines three ways a user can generate the query string:

  • an HTML form via the <form>...</form> element
  • a server-side image map via the ismap attribute on the <img> element with an <img ismap> construction
  • an indexed search via the now deprecated <isindex> element

Web forms

One of the original uses of query strings is to carry the content of an HTML form, also known as a web form. In particular, when a form containing the fields field1, field2, field3 is submitted, the content of the fields is encoded as a query string as follows:

field1=value1&field2=value2&field3=value3...

  • The query string is composed of a series of field-value pairs.
  • Within each pair, the field name and value are separated by an equals sign, "=".
  • The series of pairs is separated by the ampersand, "&" (or semicolon, ";", for URLs embedded in HTML and not generated by a <form>...</form>).

While there is no definitive standard, most web frameworks allow multiple values to be associated with a single field (e.g. field1=value1&field1=value2&field2=value3). [5] [6]
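Python's standard `urllib.parse.parse_qs` is one library that follows this convention: it collects every value of a repeated field into a list. This is shown only as an illustration of one implementation, not as the behavior of every framework.

```python
from urllib.parse import parse_qs

query = "field1=value1&field1=value2&field2=value3"
# parse_qs keeps every value of a repeated field, in order, as a list.
fields = parse_qs(query)
print(fields)  # {'field1': ['value1', 'value2'], 'field2': ['value3']}
```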

For each field of the form, the query string contains a pair field=value. Web forms may include fields that are not visible to the user; these fields are included in the query string when the form is submitted.

This convention is a W3C recommendation. [4] W3C recommends that all web servers support semicolon separators in addition to ampersand separators [7] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity-escape ampersands.
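Since Python 3.10, `urllib.parse.parse_qsl` accepts only a single `separator` (default `&`), so a server that wants to honour both separators needs its own split. The helper `parse_pairs` below is a hypothetical sketch of that, not a standard API; it treats `&` and `;` as equivalent and decodes `+` as a space.

```python
import re
from urllib.parse import unquote_plus


def parse_pairs(query: str) -> list:
    """Split a query on '&' or ';' into (name, value) pairs (hypothetical helper)."""
    pairs = []
    for part in re.split(r"[&;]", query):
        if not part:
            continue  # skip empty segments, e.g. from a trailing separator
        name, _, value = part.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(parse_pairs("name=ferret;color=purple&size=small"))
# [('name', 'ferret'), ('color', 'purple'), ('size', 'small')]
```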

The form content is encoded in the URL's query string only when the form submission method is GET. The same encoding is used by default when the submission method is POST, but the result is sent as the HTTP request body rather than being included in a modified URL. [1]
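The difference between the two methods can be sketched with Python's `urllib.request.Request`, which builds a request object without sending it. The action URL and field values are hypothetical; the point is that the same encoded string ends up either after the `?` or in the body.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form action and field values.
action = "http://example.com/cgi-bin/test.cgi"
encoded = urlencode({"name": "ferret", "color": "purple"})

# GET: the encoded fields become the URL's query string.
get_req = Request(action + "?" + encoded)
# POST: the same encoded fields travel in the request body instead.
post_req = Request(action, data=encoded.encode("ascii"),
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Neither request is sent here; passing `data` is what makes `Request` choose POST.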

Indexed search

Before forms were added to HTML, browsers rendered the <isindex> element as a single-line text-input control. The text entered into this control was sent to the server as a query string added to a GET request for the base URL or another URL specified by the action attribute. [8] This was intended to let web servers use the provided text as query criteria so that they could return a list of matching pages. [9]

When the text entered into the indexed search control is submitted, it is encoded as a query string as follows:

argument1+argument2+argument3...

  • The query string is composed of a series of arguments.
  • The series is separated by the plus sign, "+".
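A minimal sketch of this encoding follows, assuming that the arguments are the whitespace-separated words of the input and that each word is percent-encoded so that a literal "+" cannot be confused with the separator. The function name `isindex_query` is hypothetical.

```python
from urllib.parse import quote


def isindex_query(text: str) -> str:
    """Join whitespace-separated words with '+' (hypothetical helper)."""
    # Each word is percent-encoded with no safe characters, so a literal
    # '+' inside a word becomes %2B and only the separators remain '+'.
    return "+".join(quote(word, safe="") for word in text.split())


print(isindex_query("ferret purple small"))  # ferret+purple+small
```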

Though the <isindex> element is deprecated and most browsers no longer support or render it, some vestiges of indexed search still exist. For example, this is the source of the special handling of the plus sign, "+", within browser URL percent-encoding (which today, with the deprecation of indexed search, is all but redundant with %20). Also, some web servers supporting CGI (e.g., Apache) will process the query string into command-line arguments if it does not contain an equals sign, "=" (as per section 4.4 of CGI 1.1). Some CGI scripts still depend on this behavior for URLs embedded in HTML.
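The CGI 1.1 rule can be sketched as follows: a query without an unencoded "=" is treated as a search string, split into words on "+", and each word is percent-decoded. This is a simplified illustration with a hypothetical function name; it does not reproduce any particular server's implementation or its system-specific argument handling.

```python
from urllib.parse import unquote


def cgi_search_args(query: str) -> list:
    """Command-line words derived from a search-style query (sketch).

    Returns an empty list when the query contains '=', i.e. an ordinary
    form query that is not turned into arguments.
    """
    if "=" in query:
        return []
    # Split on '+' first, then decode, so '%2B' survives as a literal '+'.
    return [unquote(word) for word in query.split("+") if word]


print(cgi_search_args("ferret+purple%20cat"))  # ['ferret', 'purple cat']
print(cgi_search_args("name=ferret"))          # []
```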

URL encoding

Main article: Percent-encoding

Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document. In HTML forms, the character = is used to separate a name from a value. The URI generic syntax uses URL encoding to deal with this problem, while HTML forms make some additional substitutions rather than applying percent-encoding for all such characters. SPACE is encoded as "+" or "%20". [10]

HTML5 specifies the following transformation for submitting HTML forms with the "get" method to a web server: [1]

  • Characters that cannot be converted to the correct charset are replaced by HTML numeric character references [11]
  • SPACE is encoded as "+" or "%20"
  • Letters (A–Z and a–z), numbers (0–9) and the characters "*", "-", "." and "_" are left as-is
  • All other characters are encoded as %HH hex representation, with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)
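The rules in this list can be sketched as a small encoder for one name or value. It assumes UTF-8 as the target encoding, which can represent every character, so the numeric-character-reference step never applies here; it always chooses "+" for SPACE. The function `form_encode` is hypothetical.

```python
from string import ascii_letters, digits

# Characters the HTML5 GET-form rules leave unencoded.
AS_IS = frozenset(ascii_letters + digits + "*-._")


def form_encode(text: str) -> str:
    """Encode one form name or value for a GET submission (UTF-8 assumed)."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch == " ":
            out.append("+")             # SPACE becomes '+'
        elif ch in AS_IS:
            out.append(ch)              # letters, digits, * - . _ unchanged
        else:
            out.append(f"%{byte:02X}")  # every other octet becomes %HH
    return "".join(out)


print(form_encode("was it clear (already)?"))  # was+it+clear+%28already%29%3F
print(form_encode("~ é"))                       # %7E+%C3%A9
```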

The octet corresponding to the tilde ("~") is permitted in query strings by RFC 3986 but is required to be percent-encoded in HTML forms as "%7E".

The encoding of SPACE as "+" and the selection of "as-is" characters distinguish this encoding from RFC 3986.
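The difference is visible in Python's standard `urllib.parse`: `quote` follows the RFC 3986 unreserved set (since Python 3.7 it leaves "~" unencoded), and `quote_plus` adds the "+" for SPACE but keeps the same as-is set. Under the HTML form rules, the same input "a b~*" would instead become "a+b%7E*".

```python
from urllib.parse import quote, quote_plus

text = "a b~*"
# quote() leaves RFC 3986 unreserved characters, including '~', unencoded.
print(quote(text, safe=""))  # a%20b~%2A
# quote_plus() turns SPACE into '+', but still keeps '~' and encodes '*',
# so it is not an exact match for the HTML form rules.
print(quote_plus(text))      # a+b~%2A
```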

Example

If a form is embedded in an HTML page as follows:

<form action="cgi-bin/test.cgi" method="get">
  <input type="text" name="first" />
  <input type="text" name="second" />
  <input type="submit" />
</form>

and the user enters the strings "this is a field" and "was it clear (already)?" in the two text fields and presses the submit button, the program test.cgi (the program specified by the action attribute of the form element in the above example) will receive the following query string: first=this+is+a+field&second=was+it+clear+%28already%29%3F.
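The query string in this example can be reproduced with Python's standard `urllib.parse.urlencode`, which encodes each value with `quote_plus`. That function differs from the HTML5 rules only for characters such as "~" and "*", neither of which appears in these inputs, so the output matches.

```python
from urllib.parse import urlencode

# The two text fields from the example form, in document order.
query = urlencode([("first", "this is a field"),
                   ("second", "was it clear (already)?")])
print(query)  # first=this+is+a+field&second=was+it+clear+%28already%29%3F
```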

If the form is processed on the server by a CGI script, the script typically receives the query string as an environment variable named QUERY_STRING.
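A minimal CGI script in Python might read and decode that variable as sketched below. The function names are hypothetical, and the response body is only a placeholder; decoding is separated from output so it can be tried without a web server.

```python
#!/usr/bin/env python3
import os
from urllib.parse import parse_qs


def read_query(environ=os.environ) -> dict:
    """Decode QUERY_STRING as set by the web server (empty if absent)."""
    return parse_qs(environ.get("QUERY_STRING", ""))


def main() -> None:
    params = read_query()
    first = params.get("first", [""])[0]
    print("Content-Type: text/plain")
    print()  # a blank line ends the CGI response headers
    print(f"first = {first}")


if __name__ == "__main__":
    main()
```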

Tracking

A program receiving a query string can ignore part or all of it. If the requested URL corresponds to a file and not to a program, the whole query string is ignored. However, regardless of whether the query string is used or not, the whole URL is stored in the server log files.

These facts allow query strings to be used to track users in a manner similar to that provided by HTTP cookies. For this to work, every time the user downloads a page, a unique identifier must be chosen and added as a query string to the URLs of all links the page contains. As soon as the user follows one of these links, the corresponding URL is requested from the server. This way, the download of this page is linked to the previous one.

For example, when a web page containing the following is requested:

 <a href="foo.html">see my page!</a>
 <a href="bar.html">mine is better</a>

a unique string, such as e0a72cb2a2c7, is chosen, and the page is modified as follows:

 <a href="foo.html?e0a72cb2a2c7">see my page!</a>
 <a href="bar.html?e0a72cb2a2c7">mine is better</a>

The addition of the query string does not change the way the page is shown to the user. When the user follows, for example, the first link, the browser requests the page foo.html?e0a72cb2a2c7 from the server, which ignores what follows ? and sends the page foo.html as expected, adding the query string to its links as well.

This way, any subsequent page request from this user will carry the same query string e0a72cb2a2c7, making it possible to establish that these pages have been viewed by the same user. Query strings are often used in association with web beacons.
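The mechanism can be sketched in a few functions: reuse the identifier carried by an incoming request (or create a new one), serve static files by path alone, and append the identifier to every link in the outgoing page. All names here are hypothetical, and rewriting links with a regular expression is a simplification; a real implementation would use an HTML parser and handle links that already carry a query.

```python
import re
import secrets
from urllib.parse import urlsplit

# href="..." values with no existing '?' or '#' (a simplification of real HTML).
HREF_RE = re.compile(r'href="([^"?#]+)"', re.IGNORECASE)


def visitor_id_for(request_target: str) -> str:
    """Reuse the identifier in the query, or mint a new 12-hex-digit one."""
    return urlsplit(request_target).query or secrets.token_hex(6)


def static_file_for(request_target: str) -> str:
    """A static file is looked up by path alone; the query string is ignored."""
    return urlsplit(request_target).path


def tag_links(html: str, visitor_id: str) -> str:
    """Append the identifier as a query string to every plain link."""
    return HREF_RE.sub(lambda m: f'href="{m.group(1)}?{visitor_id}"', html)


page = '<a href="foo.html">see my page!</a>\n<a href="bar.html">mine is better</a>'
vid = visitor_id_for("/foo.html?e0a72cb2a2c7")
print(static_file_for("/foo.html?e0a72cb2a2c7"))  # /foo.html
print(tag_links(page, vid))
```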

The main differences between query strings used for tracking and HTTP cookies are that:

  1. Query strings form part of the URL, and are also included if the user saves or sends the URL to another user; cookies can be maintained across browsing sessions, but are not saved with the URL.
  2. If the user arrives at the same web server by two (or more) independent paths, it will be assigned two different query strings, while the stored cookies are the same.
  3. The user can disable cookies, in which case using cookies for tracking does not work. However, using query strings for tracking should work in all situations.
  4. Different query strings passed by different visits to the page mean that the pages are never served from the browser (or proxy, if present) cache, thereby increasing the load on the web server and slowing down the user experience.

Compatibility issues

According to the HTTP specification:

Various ad hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets. [12]

If the URL is too long, the web server fails with the 414 Request-URI Too Long HTTP status code.

The common workaround for these problems is to use POST instead of GET and store the parameters in the request body. The length limits on request bodies are typically much higher than those on URL length. For example, the limit on POST size, by default, is 2 MB on IIS 4.0 and 128 KB on IIS 5.0. The limit is configurable on Apache 2 using the LimitRequestBody directive, which specifies the number of bytes from 0 (meaning unlimited) to 2147483647 (2 GB) that are allowed in a request body. [13]
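A client-side version of this workaround can be sketched as: build the GET request line, and if it would exceed the HTTP specification's recommended minimum of 8000, send the same encoded parameters as a POST body instead. Using 8000 as a hard client limit is an assumption (servers may accept more or less), and `plan_request` is a hypothetical helper that assumes the base URL has no query of its own.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode, urlsplit

# Assumed client-side budget, taken from the recommended minimum
# request-line length that HTTP/1.1 recipients should support.
MAX_REQUEST_LINE = 8000


def plan_request(url: str, params: dict) -> Tuple[str, str, Optional[bytes]]:
    """Return (method, url, body), switching to POST when GET would be too long."""
    query = urlencode(params)
    get_url = f"{url}?{query}"
    target = urlsplit(get_url)
    # A GET request line looks like: GET /path?query HTTP/1.1
    request_line = f"GET {target.path or '/'}?{target.query} HTTP/1.1"
    if len(request_line.encode("utf-8")) <= MAX_REQUEST_LINE:
        return "GET", get_url, None
    # Same application/x-www-form-urlencoded data, carried in the body instead.
    return "POST", url, query.encode("ascii")
```

For example, `plan_request("http://example.com/search", {"q": "ferret"})` stays a GET, while a 9000-character value falls back to POST.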

See also

  • Common Gateway Interface (CGI)
  • HTTP cookie
  • HyperText Transfer Protocol (HTTP)
  • Semantic URLs
  • URI scheme
  • UTM parameters
  • Web beacon

References

  1. Form submission algorithm, HTML5, W3C Recommendation, 28 October 2014.
  2. Berners-Lee, T. (W3C/MIT); Fielding, R. (Day Software); Masinter, L. (Adobe Systems) (January 2005). "RFC 3986". "Syntax Components" (section 3).
  3. Berners-Lee, T. (W3C/MIT); Fielding, R. (Day Software); Masinter, L. (Adobe Systems) (January 2005). "RFC 3986". "Query" (section 3.4).
  4. Forms in HTML documents. W3.org. Retrieved on 2013-09-08.
  5. ServletRequest (Java EE 6). Docs.oracle.com (2011-02-10). Retrieved on 2013-09-08.
  6. uri – Authoritative position of duplicate HTTP GET query keys. Stack Overflow (2013-06-09). Retrieved on 2013-09-08.
  7. Performance, Implementation, and Design Notes. W3.org. Retrieved on 2013-09-08.
  8. "<isindex>". HTML (HyperText Markup Language).
  9. "HTML/Elements/isindex". W3C Wiki.
  10. "HTML URL Encoding Reference". W3Schools. Retrieved on 2013-05-01.
  11. The application/x-www-form-urlencoded encoding algorithm, HTML5, W3C Recommendation, 28 October 2014.
  12. HTTP/1.1 Message Syntax and Routing. ietf.org. Retrieved on 2014-07-31.
  13. core – Apache HTTP Server. Httpd.apache.org. Retrieved on 2013-09-08.