A Uniform Resource Locator (URL), colloquially termed a web address, [1] is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), [1] although many people use the two terms interchangeably. [3] [a] URLs occur most commonly to reference web pages (http), but are also used for file transfer (ftp), email (mailto), database access (JDBC), and many other applications.

Most web browsers display the URL of a web page above the page in an address bar. A typical URL could have the form http://www.example.com/index.html, which indicates a protocol (http), a hostname (www.example.com), and a file name (index.html).


Uniform Resource Locators were defined in RFC 1738 in 1994 by Tim Berners-Lee, the inventor of the World Wide Web, and the URI working group of the Internet Engineering Task Force (IETF), [6] as an outcome of collaboration started at the IETF Living Documents birds of a feather session in 1992. [7] [8]

The format combines the pre-existing system of domain names (created in 1985) with file path syntax, where slashes are used to separate directories and filenames. Conventions already existed where server names could be prefixed to complete file paths, preceded by a double slash (//). [9]

Berners-Lee later expressed regret at the use of dots to separate the parts of the domain name within URIs, wishing he had used slashes throughout, [9] and also said that, given the colon following the first component of a URI, the two slashes before the domain name were unnecessary. [10]

An early (1993) draft of the HTML Specification [11] referred to "Universal" Resource Locators. This was dropped some time between June 1994 (RFC 1630) and October 1994 (draft-ietf-uri-url-08.txt). [12]


Every HTTP URL conforms to the syntax of a generic URI. A generic URI is of the form:

 scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment]

It comprises:

  • The scheme, consisting of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (+), period (.), or hyphen (-). Although schemes are case-insensitive, the canonical form is lowercase, and documents that specify schemes must do so with lowercase letters. It is followed by a colon (:). Examples of popular schemes include http(s), ftp, mailto, file, data, and irc. URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), although non-registered schemes are used in practice. [b]
  • Two slashes (//): This is required by some schemes and not required by some others. When the authority component (explained below) is missing, the path component cannot begin with two slashes. [14]
  • An authority part, comprising:
    • An optional authentication section of a user name and password, separated by a colon, followed by an at symbol (@)
    • A "host", consisting of either a registered name (including but not limited to a hostname), or an IP address. IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed in brackets ([ ]). [15] [c]
    • An optional port number, separated from the hostname by a colon
  • A path, which contains data, usually organized in hierarchical form, that appears as a sequence of segments separated by slashes. Such a sequence may resemble or map exactly to a file system path, but does not always imply a relationship to one. [17] The path must begin with a single slash (/) if an authority part is present, and may also do so if one is not, but must not begin with a double slash. The path is always defined, though the defined path may be empty (zero length) and therefore have no trailing slash.
  • An optional query, separated from the preceding part by a question mark (?), containing a query string of non-hierarchical data. Its syntax is not well defined, but by convention is most often a sequence of attribute-value pairs separated by a delimiter:
      Query delimiter      Example
      Ampersand (&)        key1=value1&key2=value2
      Semicolon (;) [d]    key1=value1;key2=value2
  • An optional fragment, separated from the preceding part by a hash (#). The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element.
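
As a rough illustration of the components listed above, the following sketch splits a URL with Python's standard urllib.parse module and parses query strings using both delimiter conventions. The URL, user name, password, and port are hypothetical values chosen only to exercise every component; the separator argument of parse_qsl is assumed to be available, which holds for recent Python 3 releases.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL chosen so that every optional component is present.
url = "https://user:secret@www.example.com:8042/over/there?key1=value1&key2=value2#section-2"

parts = urlsplit(url)

# Map the names used in the generic syntax onto the parsed attributes.
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

# Ampersand-delimited pairs are what parse_qsl expects by default.
amp_pairs = parse_qsl(parts.query)

# Semicolon-delimited pairs need the separator to be named explicitly.
semi_pairs = parse_qsl("key1=value1;key2=value2", separator=";")

for name, value in components.items():
    print(f"{name:9} {value!r}")
print(amp_pairs)
print(semi_pairs)
```

Both query strings yield the same list of attribute-value pairs, which reflects that the delimiter is a convention rather than part of the generic syntax.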

A web browser will usually dereference a URL by performing an HTTP request to the specified host, by default on port number 80. URLs using the https scheme require that requests and responses be made over a secure connection to the website.
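
The default-port behaviour can be sketched as a small helper. The function name effective_port and the table DEFAULT_PORTS are hypothetical; port 80 for http comes from the text above, while 443 for https is the well-known default and is added here as an assumption.

```python
from urllib.parse import urlsplit

# 80 is stated in the text; 443 for https is the well-known default (assumption).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port if the URL has one, else the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # Schemes missing from the table yield None rather than a guess.
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://www.example.com/index.html"))
print(effective_port("http://www.example.com:8080/index.html"))
print(effective_port("https://www.example.com/"))
```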

Internationalized URL

Internet users are distributed throughout the world using a wide variety of languages and alphabets and expect to be able to create URLs in their own local alphabets. An Internationalized Resource Identifier (IRI) is a form of URL that includes Unicode characters. All modern browsers support IRIs. The parts of the URL requiring special treatment for different alphabets are the domain name and path. [19] [20]

The domain name in the IRI is known as an Internationalized Domain Name (IDN). Web and Internet software automatically convert the domain name into Punycode usable by the Domain Name System; for example, the Chinese URL http://例子.卷筒纸 becomes http://xn--fsqu00a.xn--3lr804guic/. The xn-- prefix indicates that the label was not originally ASCII. [21]
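
The conversion of a domain name can be approximated with Python's built-in idna codec. This is only a sketch: the codec implements the older IDNA 2003 rules, so it may not match what a current browser does for every name, and the variable names are illustrative.

```python
# Host name taken from the example in the text.
host = "例子.卷筒纸"

# Each non-ASCII label is converted to Punycode and prefixed with "xn--".
ascii_host = host.encode("idna").decode("ascii")

# Decoding with the same codec recovers the Unicode form.
restored = ascii_host.encode("ascii").decode("idna")

print(ascii_host)
print(restored)
```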

The URL path can be specified by the user in the local writing system. If not already encoded, it is converted to UTF-8, and any characters not part of the basic URL character set are escaped as hexadecimal using percent-encoding; for example, the Japanese URL http://example.com/引き割り.html becomes http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html. The target computer decodes the address and displays the page. [19]
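
The percent-encoding step can be reproduced with urllib.parse.quote, which encodes to UTF-8 by default and leaves slashes untouched. The sketch below uses the path from the example in the text; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# Path from the example in the text.
path = "/引き割り.html"

# quote() converts to UTF-8 and escapes each byte outside the safe set as %XX.
encoded = quote(path)

# unquote() reverses the process, as the receiving side would.
decoded = unquote(encoded)

print("http://example.com" + encoded)
print(decoded)
```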

Protocol-relative URLs

Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example, //example.com will use the protocol of the current page, either HTTP or HTTPS. [22] [23]
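
How such a reference inherits the scheme of the current page can be sketched with urllib.parse.urljoin. The page addresses and the file name app.js below are hypothetical.

```python
from urllib.parse import urljoin

# Protocol-relative reference; the file name is a hypothetical example.
reference = "//example.com/app.js"

# The same reference resolved against pages served over HTTP and HTTPS.
from_http = urljoin("http://www.example.org/index.html", reference)
from_https = urljoin("https://www.example.org/index.html", reference)

print(from_http)
print(from_https)
```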

See also

  • CURIE (Compact URI)
  • Use of slashes in networking
  • Fragment identifier
  • Internationalized resource identifier (IRI)
  • Semantic URL
  • Typosquatting
  • URL normalization


  a. A URL implies the means to access an indicated resource, which is not true of every URI. [4] [3] Thus http://www.example.com is a URL, while www.example.com is not. [5]
  b. The procedures for registering new URI schemes were originally defined in 1999 by RFC 2717, and are now defined by RFC 7595, published in June 2015. [13]
  c. For URIs relating to resources on the World Wide Web, some web browsers allow .0 portions of dot-decimal notation to be dropped or raw integer IP addresses to be used. [16]
  d. Historic RFC 1866 (obsoleted by RFC 2854) encourages CGI authors to support ';' in addition to '&'. [18]


