A Uniform Resource Locator (URL), colloquially termed a web address, [1] is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), [1] although many people use the two terms interchangeably. [3] [a] URLs occur most commonly to reference web pages (http), but are also used for file transfer (ftp), email (mailto), database access (JDBC), and many other applications.
Most web browsers display the URL of a web page above the page in an address bar. A typical URL could have the form http://www.example.com/index.html, which indicates a protocol (http), a hostname (www.example.com), and a file name (index.html).
History
Uniform Resource Locators were defined in RFC 1738 in 1994 by Tim Berners-Lee, the inventor of the World Wide Web, and the URI working group of the Internet Engineering Task Force (IETF), [6] as an outcome of collaboration started at the IETF Living Documents Birds of a Feather session in 1992. [7] [8]
The format combines the pre-existing system of domain names (created in 1985) with file path syntax, where slashes are used to separate directories and filenames. Conventions already existed where server names could be prefixed to complete file paths, preceded by a double slash (//). [9]
Berners-Lee later expressed regret at the use of dots to separate the parts of the domain name within URIs, wishing he had used slashes throughout, [9] and also said that, given the colon following the first component of a URI, the two slashes before the domain name were unnecessary. [10]
An early (1993) draft of the HTML Specification [11] referred to “Universal” Resource Locators. This name was dropped some time between June 1994 (RFC 1630) and October 1994 (draft-ietf-uri-url-08.txt). [12]
Syntax
Every HTTP URL conforms to the syntax of a generic URI. A generic URI is of the form:
scheme: [ // [ user [ : password ] @ ] host [ : port ]] [ / path ] [? query ] [# fragment ]
It comprises:
- The scheme, consisting of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (+), period (.), or hyphen (-). Although schemes are case-insensitive, the canonical form is lowercase, and documents that specify schemes must do so with lowercase letters. It is followed by a colon (:). Examples of popular schemes include http(s), ftp, mailto, file, data, and irc. URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), although non-registered schemes are used in practice. [b]
- Two slashes (//): This is required by some schemes and not required by some others. When the authority component (explained below) is absent, the path component cannot begin with two slashes. [14]
- An authority part, comprising:
  - An optional authentication section of a user name and password, separated by a colon, followed by an at symbol (@)
  - A “host”, consisting of either a registered name (including but not limited to a hostname), or an IP address. IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed in brackets ([ ]). [15] [c]
  - An optional port number, separated from the hostname by a colon
- A path, which contains data, usually organized in hierarchical form, that appears as a sequence of segments separated by slashes. Such a sequence may resemble or map exactly to a file system path, but does not always imply a relation to one. [17] The path must begin with a single slash (/) if an authority part is present, and may also begin with one if an authority part is absent, but must not begin with a double slash. The path is always defined, but the defined path may be empty (zero length) and therefore have no trailing slash.
- An optional query, separated from the preceding part by a question mark (?), containing a query string of non-hierarchical data. Its syntax is not well defined, but by convention is most often a sequence of attribute-value pairs separated by a delimiter.
- An optional fragment, separated from the preceding part by a hash (#). The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element.

Query delimiter | Example |
---|---|
Ampersand (&) | key1=value1&key2=value2 |
Semicolon (;) [d] | key1=value1;key2=value2 |
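The components described above can be picked apart with a standard-library parser. The following is a minimal sketch using Python's urllib.parse. The URL itself (user name, password, port 8080, path, and fragment) is a made-up example, and the `separator` argument to `parse_qsl` assumes a reasonably recent Python release (3.10 or later is a safe assumption).

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical URL that exercises every component of the generic form.
url = ("http://user:secret@www.example.com:8080"
       "/docs/index.html?key1=value1&key2=value2#section-2")

parts = urlsplit(url)

components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}
for name, value in components.items():
    print(f"{name:9} {value!r}")

# The query string is conventionally a list of attribute-value pairs.
# Ampersand is the default delimiter; semicolon must be requested explicitly.
amp_pairs = parse_qsl(parts.query)
semi_pairs = parse_qsl("key1=value1;key2=value2", separator=";")
print(amp_pairs == semi_pairs)

# IPv6 hosts are written in brackets; the parser strips them from .hostname.
ipv6 = urlsplit("http://[2001:db8::1]:8080/")
print(ipv6.hostname, ipv6.port)
```

Note that the parser returns the host without brackets and the port as an integer, so code that reassembles a URL has to restore the brackets for IPv6 literals itself.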
A web browser will usually dereference a URL by performing an HTTP request to the specified host, by default on port number 80. URLs using the https scheme require that requests and responses be made over a secure connection to the website.
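How a client might choose the port to connect to can be sketched as follows. This is an illustration only: `DEFAULT_PORTS` and `effective_port` are hypothetical names, port 80 for http comes from the text above, and port 443 for https is the standard default for HTTP over TLS.

```python
from urllib.parse import urlsplit

# Default ports per scheme (80 is stated in the text; 443 is the usual
# https default). The table name is an assumption for this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port of a URL, or the scheme default if omitted."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # urlsplit lowercases the scheme, so the lookup is case-insensitive.
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://www.example.com/index.html"))   # 80
print(effective_port("https://www.example.com/index.html"))  # 443
print(effective_port("http://www.example.com:8080/"))        # 8080
```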
Internationalized URL
Internet users are distributed throughout the world, use a wide variety of languages and alphabets, and expect to be able to create URLs in their own local alphabets. An Internationalized Resource Identifier (IRI) is a form of URL that includes Unicode characters. All modern browsers support IRIs. The parts of the URL requiring special treatment for different alphabets are the domain name and path. [19] [20]
The domain name in the IRI is known as an Internationalized Domain Name (IDN). Web and Internet software automatically converts the domain name into punycode usable by the Domain Name System; for example, the Chinese URL http://例子.卷筒纸 becomes http://xn--fsqu00a.xn--3lr804guic/. The xn-- prefix indicates that the label was not originally ASCII. [21]
The URL path name can also be specified by the user in the local writing system. If not already encoded, it is converted to UTF-8, and any characters not part of the basic URL character set are escaped as hexadecimal using percent-encoding; for example, the Japanese URL http://example.com/引き割り.html becomes http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html. The target computer decodes the address and displays the page. [19]
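Both conversions described in this section can be approximated with Python's standard library. The sketch below is a simplification under stated assumptions: `to_ascii_url` is a hypothetical helper, it handles only the host and path (user information and non-ASCII query strings are ignored), and it relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(iri):
    """Convert the host to punycode and percent-encode the path as UTF-8."""
    parts = urlsplit(iri)
    # Each dot-separated label of the host is converted separately.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Keep "/" as the segment separator and leave existing %XX escapes alone.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


print(to_ascii_url("http://例子.卷筒纸"))
print(to_ascii_url("http://example.com/引き割り.html"))
# second line prints:
# http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html
```

Decoding goes the other way with `bytes.decode("idna")` for the host and `urllib.parse.unquote` for the path.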
Protocol-relative URLs
Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example, //example.com will use the protocol of the current page, either HTTP or HTTPS. [22] [23]
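The resolution rule can be illustrated with Python's `urllib.parse.urljoin`, which resolves a reference against a base URL. In this sketch the page addresses and the `/lib.js` path are made-up examples.

```python
from urllib.parse import urljoin

# The same protocol-relative reference, resolved from two hypothetical pages.
link = "//example.com/lib.js"

secure = urljoin("https://www.example.org/index.html", link)
plain = urljoin("http://www.example.org/index.html", link)

print(secure)  # https://example.com/lib.js
print(plain)   # http://example.com/lib.js
```

Only the scheme is inherited from the current page; the host and path come entirely from the reference.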
See also
- CURIE (Compact URI)
- Use of slashes in networking
- Fragment identifier
- Internationalized resource identifier (IRI)
- Semantic URL
- Typosquatting
- URL normalization
Notes
- A URL implies a means of access to the indicated resource, which is not true of every URI. [4] [3] Thus http://www.example.com is a URL, while www.example.com is not. [5]
- The procedures for registering new URI schemes were originally defined in 1999 by RFC 2717, and are now defined by RFC 7595, published in June 2015. [13]
- For URIs relating to resources on the World Wide Web, some web browsers allow .0 portions of dot-decimal notation to be dropped or raw integer IP addresses to be used. [16]
- Historic RFC 1866 (obsoleted by RFC 2854) encourages CGI authors to support ‘;’ in addition to ‘&’. [18]
Citations
- W3C (2009).
- RFC 3986 (2005).
- W3C / IETF Joint URI Planning Interest Group (2002).
- RFC 2396 (1998).
- Miessler, Daniel. “The Difference Between URLs and URIs”.
- W3C (1994).
- IETF (1992).
- Berners-Lee (1994).
- Berners-Lee (2000).
- BBC News (2009).
- Berners-Lee, Tim; Connolly, Daniel (March 1993). Hypertext Markup Language (draft RFCxxx) (Technical report). p. 28.
- Berners-Lee, T; Masinter, L; McCahill, M (October 1994). Uniform Resource Locators (URL) (Technical report). Cited in Ang, CS; Martin, DC (January 1995). Constituent Component Interface ++ (Technical report). UCSF Library and Center for Knowledge Management.
- IETF (2015).
- RFC 3986 (2005), §3.
- RFC 3986 (2005), §3.2.2.
- Lawrence (2014).
- RFC 2396 (1998), §3.3.
- RFC 1866 (1995), §8.2.1.
- W3C (2008).
- W3C (2014).
- IANA (2003).
- JD Glaser (2013). Secure Development for Mobile Apps: How to Design and Code Secure Mobile Applications with PHP and JavaScript. CRC Press. p. 193. Retrieved 12 October 2015.
- Steven M. Schafer (2011). HTML, XHTML, and CSS Bible. John Wiley & Sons. p. 124. Retrieved 12 October 2015.
References
- “Berners-Lee ‘sorry’ for slashes”. BBC News. 2009-10-14. Retrieved 2010-02-14.
- “Living Documents BoF Minutes”. World Wide Web Consortium. March 18, 1992. Retrieved 2011-12-26.
- Berners-Lee, Tim (March 21, 1994). “Uniform Resource Locators (URL): A Syntax for the Expression of Access Information of Objects on the Network”. World Wide Web Consortium. Retrieved 13 September 2015.
- Berners-Lee, Tim; Masinter, Larry; McCahill, Mark (December 1994). “Uniform Resource Locators (URL)”. Internet Engineering Task Force. Retrieved 31 August 2015.
- Berners-Lee, Tim (2015) [2000]. “Why the //, #, etc?”. Frequently asked questions. World Wide Web Consortium. Retrieved 2010-02-03.
- Connolly, Dan; Sperberg-McQueen, CM, eds. (May 21, 2009). “Web addresses in HTML 5”. World Wide Web Consortium. Retrieved 13 September 2015.
- Internet Assigned Numbers Authority (14 February 2003). “Completion of IANA Selection of IDNA Prefix”. IETF-Announce mailing list. Retrieved 3 September 2015.
- Berners-Lee, Tim; Fielding, Roy; Masinter, Larry (August 1998). “Uniform Resource Identifiers (URI): Generic Syntax”. Internet Engineering Task Force. Retrieved 31 August 2015.
- Hansen, T.; Hardie, T. (June 2015). Thaler, D., ed. “Guidelines and Registration Procedures for URI Schemes”. Internet Engineering Task Force. ISSN 2070-1721.
- Mealling, M.; Denenberg, R., eds. (August 2002). “Report from the Joint W3C / IETF URI Planning Interest Group: Uniform Resource Identifiers (URIs), URLs, and Uniform Resource Names (URNs): Clarifications and Recommendations”. World Wide Web Consortium. Retrieved 13 September 2015.
- Berners-Lee, Tim; Fielding, Roy; Masinter, Larry (January 2005). “Uniform Resource Identifiers (URI): Generic Syntax”. Internet Engineering Task Force. Retrieved 31 August 2015.
- “An Introduction to Multilingual Web Addresses”. May 9, 2008. Retrieved 11 January 2015.
- Phillip, A. (2014). “What is Happening with ‘International URLs’”. World Wide Web Consortium. Retrieved 11 January 2015.