RosettaCodeData/Task/URL-parser/00-TASK.txt

56 lines
2.5 KiB
Plaintext

URLs are strings with a simple syntax:
scheme://[username:password@]domain[:port]/path?query_string#fragment_id
;Task:
Parse a well-formed URL to retrieve the relevant information:   '''scheme''', '''domain''', '''path''', ...
Note:   this task has nothing to do with [[URL encoding]] or [[URL decoding]].
According to the standards, the characters:
:::: &nbsp; <big><big> ! * ' ( ) ; : @ & = + $ , / ? % # [ ] </big></big>
only need to be percent-encoded &nbsp; ('''%''') &nbsp; in case of possible confusion.
Also note that the '''path''', '''query''' and '''fragment''' are case sensitive, even if the '''scheme''' and '''domain''' are not.
The way the returned information is provided (set of variables, array, structured, record, object,...)
is language-dependent and left to the programmer, but the code should be clear enough to reuse.
Extra credit is given for clear error diagnostics.
* &nbsp; Here is the official standard: &nbsp; &nbsp; https://tools.ietf.org/html/rfc3986,
* &nbsp; and here is a simpler &nbsp; BNF: &nbsp; &nbsp; http://www.w3.org/Addressing/URL/5_URI_BNF.html.
;Test cases:
According to T. Berners-Lee
'''<nowiki>foo://example.com:8042/over/there?name=ferret#nose</nowiki>''' &nbsp; &nbsp; should parse into:
::* &nbsp; scheme = foo
::* &nbsp; domain = example.com
::* &nbsp; port = :8042
::* &nbsp; path = over/there
::* &nbsp; query = name=ferret
::* &nbsp; fragment = nose
'''<nowiki>urn:example:animal:ferret:nose</nowiki>''' &nbsp; &nbsp; should parse into:
::* &nbsp; scheme = urn
::* &nbsp; path = example:animal:ferret:nose
'''other URLs that must be parsed include:'''
:* &nbsp; <nowiki> jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true </nowiki>
:* &nbsp; <nowiki> ftp://ftp.is.co.za/rfc/rfc1808.txt </nowiki>
:* &nbsp; <nowiki> http://www.ietf.org/rfc/rfc2396.txt#header1 </nowiki>
:* &nbsp; <nowiki> ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two </nowiki>
:* &nbsp; <nowiki> mailto:John.Doe@example.com </nowiki>
:* &nbsp; <nowiki> news:comp.infosystems.www.servers.unix </nowiki>
:* &nbsp; <nowiki> tel:+1-816-555-1212 </nowiki>
:* &nbsp; <nowiki> telnet://192.0.2.16:80/ </nowiki>
:* &nbsp; <nowiki> urn:oasis:names:specification:docbook:dtd:xml:4.1.2 </nowiki>
<br><br>