python3.2  3.2.2
urllib.parse Namespace Reference

Classes

class  _ResultMixinStr
class  _ResultMixinBytes
class  _NetlocResultMixinBase
class  _NetlocResultMixinStr
class  _NetlocResultMixinBytes
class  DefragResult
class  SplitResult
class  ParseResult
class  DefragResultBytes
class  SplitResultBytes
class  ParseResultBytes
class  Quoter

Functions

def clear_cache
def _noop
def _encode_result
def _decode_args
def _coerce_args
def _fix_result_transcoding
def urlparse
def _splitparams
def _splitnetloc
def urlsplit
def urlunparse
def urlunsplit
def urljoin
def urldefrag
def unquote_to_bytes
def unquote
def parse_qs
def parse_qsl
def unquote_plus
def quote
def quote_plus
def quote_from_bytes
def urlencode
def to_bytes
def unwrap
def splittype
def splithost
def splituser
def splitpasswd
def splitport
def splitnport
def splitquery
def splittag
def splitattr
def splitvalue

Variables

list __all__
list uses_relative
list uses_netloc
list non_hierarchical
list uses_params
list uses_query
list uses_fragment
tuple scheme_chars
int MAX_CACHE_SIZE = 20
dictionary _parse_cache = {}
string _implicit_encoding = 'ascii'
string _implicit_errors = 'strict'
tuple _DefragResultBase = namedtuple('DefragResult', 'url fragment')
tuple _SplitResultBase = namedtuple('SplitResult', 'scheme netloc path query fragment')
tuple _ParseResultBase = namedtuple('ParseResult', 'scheme netloc path params query fragment')
 ResultBase = _NetlocResultMixinStr
tuple _ALWAYS_SAFE
tuple _ALWAYS_SAFE_BYTES = bytes(_ALWAYS_SAFE)
dictionary _safe_quoters = {}
 _typeprog = None
 _hostprog = None
 _userprog = None
 _passwdprog = None
 _portprog = None
 _nportprog = None
 _queryprog = None
 _tagprog = None
 _valueprog = None

Detailed Description

Parse (absolute and relative) URLs.

The urlparse module is based upon the following RFC specifications.

RFC 3986 (STD66): "Uniform Resource Identifiers" by T. Berners-Lee, R. Fielding
and L. Masinter, January 2005.

RFC 2732: "Format for Literal IPv6 Addresses in URL's" by R. Hinden, B. Carpenter
and L. Masinter, December 1999.

RFC 2396: "Uniform Resource Identifiers (URI): Generic Syntax" by T.
Berners-Lee, R. Fielding, and L. Masinter, August 1998.

RFC 2368: "The mailto URL scheme" by P. Hoffman, L. Masinter, J. Zawinski, July 1998.

RFC 1808: "Relative Uniform Resource Locators" by R. Fielding, UC Irvine, June
1995.

RFC 1738: "Uniform Resource Locators (URL)" by T. Berners-Lee, L. Masinter, M.
McCahill, December 1994.

RFC 3986 is considered the current standard, and any future changes to the
urlparse module should conform to it.  The urlparse module is
currently not entirely compliant with this RFC: because of de facto
parsing scenarios and for backward compatibility, some
parsing quirks from older RFCs are retained. The test cases in
test_urlparse.py provide a good indicator of parsing behavior.

Function Documentation

def urllib.parse._coerce_args (   args) [private]

Definition at line 94 of file parse.py.

def _coerce_args(*args):
    # Invokes decode if necessary to create str args
    # and returns the coerced inputs along with
    # an appropriate result coercion function
    #   - noop for str inputs
    #   - encoding function otherwise
    str_input = isinstance(args[0], str)
    for arg in args[1:]:
        # We special-case the empty string to support the
        # "scheme=''" default argument to some functions
        if arg and isinstance(arg, str) != str_input:
            raise TypeError("Cannot mix str and non-str arguments")
    if str_input:
        return args + (_noop,)
    return _decode_args(args) + (_encode_result,)

# Result objects are more helpful than simple tuples
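
The effect of this coercion is easiest to observe through the public functions that call it. The following is a small sketch, assuming a current Python 3 interpreter in which urlsplit and urljoin still pass their arguments through this helper; the example URLs are made up.

```python
from urllib.parse import urljoin, urlsplit

# str in, str out
text_result = urlsplit('http://example.com/a?x=1')

# bytes in, bytes out: the input is decoded as ASCII internally and
# the result is encoded back to bytes before it is returned
bytes_result = urlsplit(b'http://example.com/a?x=1')

# Mixing str and bytes arguments in one call is rejected
try:
    urljoin('http://example.com/a', b'b')
    mixed_error = None
except TypeError as exc:
    mixed_error = str(exc)

print(text_result.path, bytes_result.path, mixed_error)
```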

def urllib.parse._decode_args (   args,
  encoding = _implicit_encoding,
  errors = _implicit_errors 
) [private]

Definition at line 91 of file parse.py.

def _decode_args(args, encoding=_implicit_encoding,
                 errors=_implicit_errors):
    return tuple(x.decode(encoding, errors) if x else '' for x in args)

def urllib.parse._encode_result (   obj,
  encoding = _implicit_encoding,
  errors = _implicit_errors 
) [private]

Definition at line 87 of file parse.py.

def _encode_result(obj, encoding=_implicit_encoding,
                   errors=_implicit_errors):
    return obj.encode(encoding, errors)

def urllib.parse._fix_result_transcoding ( ) [private]

Definition at line 266 of file parse.py.

def _fix_result_transcoding():
    _result_pairs = (
        (DefragResult, DefragResultBytes),
        (SplitResult, SplitResultBytes),
        (ParseResult, ParseResultBytes),
    )
    for _decoded, _encoded in _result_pairs:
        _decoded._encoded_counterpart = _encoded
        _encoded._decoded_counterpart = _decoded

_fix_result_transcoding()

def urllib.parse._noop (   obj) [private]

Definition at line 83 of file parse.py.

def _noop(obj):
    return obj

def urllib.parse._splitnetloc (   url,
  start = 0 
) [private]

Definition at line 304 of file parse.py.

def _splitnetloc(url, start=0):
    delim = len(url)   # position of end of domain part of url, default is end
    for c in '/?#':    # look for delimiters; the order is NOT important
        wdelim = url.find(c, start)        # find first of this delim
        if wdelim >= 0:                    # if found
            delim = min(delim, wdelim)     # use earliest delim position
    return url[start:delim], url[delim:]   # return (domain, rest)
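
To illustrate why the order of the delimiters does not matter, here is a standalone sketch of the same scan. The name split_netloc and the sample URL are hypothetical; the logic mirrors the listing above rather than calling the private helper.

```python
def split_netloc(url, start=0):
    """Return (netloc, rest), cutting at the earliest of '/', '?' or '#'."""
    end = len(url)
    for delimiter in '/?#':
        position = url.find(delimiter, start)
        if position >= 0:
            # Keep the smallest index seen so far, whichever delimiter it is
            end = min(end, position)
    return url[start:end], url[end:]


# start=2 skips the leading '//' of a scheme-relative URL.
# The '?' comes before the first '/', so the cut happens at '?'.
netloc, rest = split_netloc('//example.com?x=1/y#top', 2)
print(netloc, rest)
```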

def urllib.parse._splitparams (   url) [private]

Definition at line 295 of file parse.py.

def _splitparams(url):
    if '/'  in url:
        i = url.find(';', url.rfind('/'))
        if i < 0:
            return url, ''
    else:
        i = url.find(';')
    return url[:i], url[i+1:]
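
The helper only looks for ';' after the last '/', so parameters are taken from the final path segment only. A brief sketch through the public urlparse function, assuming a made-up URL with semicolons in two segments:

```python
from urllib.parse import urlparse

# ';x' belongs to an earlier segment and stays in the path;
# only ';type=d' on the last segment is split off as params.
parsed = urlparse('http://example.com/a;x/b;type=d?q=1')
print(parsed.path, parsed.params, parsed.query)
```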

def urllib.parse.clear_cache ( )
Clear the parse cache and the quoters cache.

Definition at line 68 of file parse.py.

def clear_cache():
    """Clear the parse cache and the quoters cache."""
    _parse_cache.clear()
    _safe_quoters.clear()


# Helpers for bytes handling
# For 3.2, we deliberately require applications that
# handle improperly quoted URLs to do their own
# decoding and encoding. If valid use cases are
# presented, we may relax this by using latin-1
# decoding internally for 3.3

def urllib.parse.parse_qs (   qs,
  keep_blank_values = False,
  strict_parsing = False,
  encoding = 'utf-8',
  errors = 'replace' 
)
Parse a query given as a string argument.

    Arguments:

    qs: percent-encoded query string to be parsed

    keep_blank_values: flag indicating whether blank values in
        percent-encoded queries should be treated as blank strings.
        A true value indicates that blanks should be retained as
        blank strings.  The default false value indicates that
        blank values are to be ignored and treated as if they were
        not included.

    strict_parsing: flag indicating what to do with parsing errors.
        If false (the default), errors are silently ignored.
        If true, errors raise a ValueError exception.

    encoding and errors: specify how to decode percent-encoded sequences
        into Unicode characters, as accepted by the bytes.decode() method.

Definition at line 533 of file parse.py.

def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
             encoding='utf-8', errors='replace'):
    """Parse a query given as a string argument.

        Arguments:

        qs: percent-encoded query string to be parsed

        keep_blank_values: flag indicating whether blank values in
            percent-encoded queries should be treated as blank strings.
            A true value indicates that blanks should be retained as
            blank strings.  The default false value indicates that
            blank values are to be ignored and treated as if they were
            not included.

        strict_parsing: flag indicating what to do with parsing errors.
            If false (the default), errors are silently ignored.
            If true, errors raise a ValueError exception.

        encoding and errors: specify how to decode percent-encoded sequences
            into Unicode characters, as accepted by the bytes.decode() method.
    """
    dict = {}
    pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
                      encoding=encoding, errors=errors)
    for name, value in pairs:
        if name in dict:
            dict[name].append(value)
        else:
            dict[name] = [value]
    return dict
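
As the listing shows, the result maps each name to a list of values, so repeated names accumulate. A short usage sketch with made-up query strings, showing that and the effect of keep_blank_values:

```python
from urllib.parse import parse_qs

# Repeated names collect their values in order of appearance
repeated = parse_qs('tag=a&tag=b&page=2')

# By default a blank value is dropped as if the field were absent
dropped = parse_qs('tag=&page=2')

# keep_blank_values=True retains it as an empty string
kept = parse_qs('tag=&page=2', keep_blank_values=True)

print(repeated, dropped, kept)
```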

def urllib.parse.parse_qsl (   qs,
  keep_blank_values = False,
  strict_parsing = False,
  encoding = 'utf-8',
  errors = 'replace' 
)
Parse a query given as a string argument.

Arguments:

qs: percent-encoded query string to be parsed

keep_blank_values: flag indicating whether blank values in
    percent-encoded queries should be treated as blank strings.  A
    true value indicates that blanks should be retained as blank
    strings.  The default false value indicates that blank values
    are to be ignored and treated as if they were  not included.

strict_parsing: flag indicating what to do with parsing errors. If
    false (the default), errors are silently ignored. If true,
    errors raise a ValueError exception.

encoding and errors: specify how to decode percent-encoded sequences
    into Unicode characters, as accepted by the bytes.decode() method.

Returns a list of (name, value) tuples.

Definition at line 565 of file parse.py.

def parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
              encoding='utf-8', errors='replace'):
    """Parse a query given as a string argument.

    Arguments:

    qs: percent-encoded query string to be parsed

    keep_blank_values: flag indicating whether blank values in
        percent-encoded queries should be treated as blank strings.  A
        true value indicates that blanks should be retained as blank
        strings.  The default false value indicates that blank values
        are to be ignored and treated as if they were  not included.

    strict_parsing: flag indicating what to do with parsing errors. If
        false (the default), errors are silently ignored. If true,
        errors raise a ValueError exception.

    encoding and errors: specify how to decode percent-encoded sequences
        into Unicode characters, as accepted by the bytes.decode() method.

    Returns a list of (name, value) tuples.
    """
    qs, _coerce_result = _coerce_args(qs)
    pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
    r = []
    for name_value in pairs:
        if not name_value and not strict_parsing:
            continue
        nv = name_value.split('=', 1)
        if len(nv) != 2:
            if strict_parsing:
                raise ValueError("bad query field: %r" % (name_value,))
            # Handle case of a control-name with no equal sign
            if keep_blank_values:
                nv.append('')
            else:
                continue
        if len(nv[1]) or keep_blank_values:
            name = nv[0].replace('+', ' ')
            name = unquote(name, encoding=encoding, errors=errors)
            name = _coerce_result(name)
            value = nv[1].replace('+', ' ')
            value = unquote(value, encoding=encoding, errors=errors)
            value = _coerce_result(value)
            r.append((name, value))
    return r
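
A short usage sketch of the behaviour described above: '+' and percent escapes are decoded, and a field without '=' is either skipped or reported depending on strict_parsing. The query strings are made up, and the example uses only '&' as a separator, since handling of ';' may differ in later Python releases.

```python
from urllib.parse import parse_qsl

# '+' becomes a space and %21 becomes '!'
pairs = parse_qsl('q=hello+world%21&lang=en')

# 'broken' has no '=': silently skipped by default
lenient = parse_qsl('q=1&broken')

# With strict_parsing=True the same input raises ValueError
try:
    parse_qsl('q=1&broken', strict_parsing=True)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(pairs, lenient, strict_error)
```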

def urllib.parse.quote (   string,
  safe = '/',
  encoding = None,
  errors = None 
)
quote('abc def') -> 'abc%20def'

Each part of a URL, e.g. the path info, the query, etc., has a
different set of reserved characters that must be quoted.

RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
the following reserved characters.

reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
              "$" | ","

Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.

By default, the quote function is intended for quoting the path
section of a URL.  Thus, it will not encode '/'.  This character
is reserved, but in typical usage the quote function is being
called on a path where the existing slash characters are used as
reserved characters.

string and safe may be either str or bytes objects. encoding and
errors must not be specified if string is a bytes object.

The optional encoding and errors parameters specify how to deal with
non-ASCII characters, as accepted by the str.encode method.
By default, encoding='utf-8' (characters are encoded with UTF-8), and
errors='strict' (unsupported characters raise a UnicodeEncodeError).

Definition at line 650 of file parse.py.

def quote(string, safe='/', encoding=None, errors=None):
    """quote('abc def') -> 'abc%20def'

    Each part of a URL, e.g. the path info, the query, etc., has a
    different set of reserved characters that must be quoted.

    RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
    the following reserved characters.

    reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                  "$" | ","

    Each of these characters is reserved in some component of a URL,
    but not necessarily in all of them.

    By default, the quote function is intended for quoting the path
    section of a URL.  Thus, it will not encode '/'.  This character
    is reserved, but in typical usage the quote function is being
    called on a path where the existing slash characters are used as
    reserved characters.

    string and safe may be either str or bytes objects. encoding and
    errors must not be specified if string is a bytes object.

    The optional encoding and errors parameters specify how to deal with
    non-ASCII characters, as accepted by the str.encode method.
    By default, encoding='utf-8' (characters are encoded with UTF-8), and
    errors='strict' (unsupported characters raise a UnicodeEncodeError).
    """
    if isinstance(string, str):
        if not string:
            return string
        if encoding is None:
            encoding = 'utf-8'
        if errors is None:
            errors = 'strict'
        string = string.encode(encoding, errors)
    else:
        if encoding is not None:
            raise TypeError("quote() doesn't support 'encoding' for bytes")
        if errors is not None:
            raise TypeError("quote() doesn't support 'errors' for bytes")
    return quote_from_bytes(string, safe)
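
The interaction between safe, encoding and the input type can be sketched as follows. The sample strings are made up; the calls rely only on the signature shown above.

```python
from urllib.parse import quote

# Default safe='/' leaves path separators alone
path = quote('/docs/my file.txt')

# safe='' quotes the slash too, e.g. for a single path segment
segment = quote('a/b', safe='')

# Non-ASCII text is encoded (UTF-8 by default) before quoting
utf8 = quote('é')
latin1 = quote('é', encoding='latin-1')

# bytes input is quoted as-is; passing encoding with it is an error
try:
    quote(b'abc', encoding='utf-8')
    bytes_error = None
except TypeError as exc:
    bytes_error = str(exc)

print(path, segment, utf8, latin1, bytes_error)
```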

def urllib.parse.quote_from_bytes (   bs,
  safe = '/' 
)
Like quote(), but accepts a bytes object rather than a str, and does
not perform string-to-bytes encoding.  It always returns an ASCII string.
quote_from_bytes(b'abc def\xab') -> 'abc%20def%AB'

Definition at line 711 of file parse.py.

def quote_from_bytes(bs, safe='/'):
    """Like quote(), but accepts a bytes object rather than a str, and does
    not perform string-to-bytes encoding.  It always returns an ASCII string.
    quote_from_bytes(b'abc def\xab') -> 'abc%20def%AB'
    """
    if not isinstance(bs, (bytes, bytearray)):
        raise TypeError("quote_from_bytes() expected bytes")
    if not bs:
        return ''
    if isinstance(safe, str):
        # Normalize 'safe' by converting to bytes and removing non-ASCII chars
        safe = safe.encode('ascii', 'ignore')
    else:
        safe = bytes([c for c in safe if c < 128])
    if not bs.rstrip(_ALWAYS_SAFE_BYTES + safe):
        return bs.decode()
    try:
        quoter = _safe_quoters[safe]
    except KeyError:
        _safe_quoters[safe] = quoter = Quoter(safe).__getitem__
    return ''.join([quoter(char) for char in bs])
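
A minimal usage sketch of the bytes-only entry point, using the docstring's own example plus a made-up input for the safe parameter and the type check:

```python
from urllib.parse import quote_from_bytes

# Raw bytes are quoted byte by byte; the result is always a str
quoted = quote_from_bytes(b'abc def\xab')

# As with quote(), '/' is safe unless safe is overridden
slash_escaped = quote_from_bytes(b'a/b', safe='')

# A str argument is rejected rather than implicitly encoded
try:
    quote_from_bytes('abc def')
    str_error = None
except TypeError as exc:
    str_error = str(exc)

print(quoted, slash_escaped, str_error)
```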

def urllib.parse.quote_plus (   string,
  safe = '',
  encoding = None,
  errors = None 
)
Like quote(), but also replace ' ' with '+', as required for quoting
HTML form values. Plus signs in the original string are escaped unless
they are included in safe. Unlike quote(), safe does not default to '/'.

Definition at line 694 of file parse.py.

def quote_plus(string, safe='', encoding=None, errors=None):
    """Like quote(), but also replace ' ' with '+', as required for quoting
    HTML form values. Plus signs in the original string are escaped unless
    they are included in safe. Unlike quote(), safe does not default to '/'.
    """
    # Check if ' ' in string, where string may either be a str or bytes.  If
    # there are no spaces, the regular quote will produce the right answer.
    if ((isinstance(string, str) and ' ' not in string) or
        (isinstance(string, bytes) and b' ' not in string)):
        return quote(string, safe, encoding, errors)
    if isinstance(safe, str):
        space = ' '
    else:
        space = b' '
    string = quote(string, safe + space, encoding, errors)
    return string.replace(' ', '+')
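
The differences from quote() are easiest to see side by side. A small sketch with a made-up input string:

```python
from urllib.parse import quote, quote_plus

text = 'a b+c/d'

# quote(): space becomes %20, '/' is kept, '+' is escaped
as_path = quote(text)

# quote_plus(): space becomes '+', and both '/' and '+' are escaped
as_form = quote_plus(text)

# A literal '+' survives only if it is listed in safe
plus_kept = quote_plus('a+b', safe='+')

print(as_path, as_form, plus_kept)
```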

def urllib.parse.splitattr (   url)
splitattr('/path;attr1=value1;attr2=value2;...') ->
    '/path', ['attr1=value1', 'attr2=value2', ...].

Definition at line 961 of file parse.py.

def splitattr(url):
    """splitattr('/path;attr1=value1;attr2=value2;...') ->
        '/path', ['attr1=value1', 'attr2=value2', ...]."""
    words = url.split(';')
    return words[0], words[1:]

def urllib.parse.splithost (   url)
splithost('//host[:port]/path') --> 'host[:port]', '/path'.

Definition at line 862 of file parse.py.

def splithost(url):
    """splithost('//host[:port]/path') --> 'host[:port]', '/path'."""
    global _hostprog
    if _hostprog is None:
        import re
        _hostprog = re.compile('^//([^/?]*)(.*)$')

    match = _hostprog.match(url)
    if match:
        host_port = match.group(1)
        path = match.group(2)
        if path and not path.startswith('/'):
            path = '/' + path
        return host_port, path
    return None, url

def urllib.parse.splitnport (   host,
  defport = -1 
)
Split host and port, returning numeric port.
Return given default port if no ':' found; defaults to -1.
Return numerical port if a valid number is found after ':'.
Return None if ':' but not a valid number.

Definition at line 916 of file parse.py.

def splitnport(host, defport=-1):
    """Split host and port, returning numeric port.
    Return given default port if no ':' found; defaults to -1.
    Return numerical port if a valid number is found after ':'.
    Return None if ':' but not a valid number."""
    global _nportprog
    if _nportprog is None:
        import re
        _nportprog = re.compile('^(.*):(.*)$')

    match = _nportprog.match(host)
    if match:
        host, port = match.group(1, 2)
        try:
            if not port: raise ValueError("no digits")
            nport = int(port)
        except ValueError:
            nport = None
        return host, nport
    return host, defport
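
The three return cases described in the docstring can be sketched with a standalone function. The name split_nport and the host strings are hypothetical; the code mirrors the listing above instead of calling the library helper, which may not be available in later Python releases.

```python
import re

_NPORT_RE = re.compile(r'^(.*):(.*)$')


def split_nport(host, defport=-1):
    """Return (host, port) with port as int, None, or the given default."""
    match = _NPORT_RE.match(host)
    if not match:
        # No ':' at all: fall back to the caller's default
        return host, defport
    name, port = match.group(1, 2)
    try:
        # int('') also raises ValueError, covering the 'host:' case
        nport = int(port)
    except ValueError:
        nport = None
    return name, nport


numeric = split_nport('example.com:8080')
invalid = split_nport('example.com:http')
missing = split_nport('example.com')
print(numeric, invalid, missing)
```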

def urllib.parse.splitpasswd (   user)
splitpasswd('user:passwd') -> 'user', 'passwd'.

Definition at line 891 of file parse.py.

def splitpasswd(user):
    """splitpasswd('user:passwd') -> 'user', 'passwd'."""
    global _passwdprog
    if _passwdprog is None:
        import re
        _passwdprog = re.compile('^([^:]*):(.*)$',re.S)

    match = _passwdprog.match(user)
    if match: return match.group(1, 2)
    return user, None

# splittag('/path#tag') --> '/path', 'tag'

def urllib.parse.splitport (   host)
splitport('host:port') --> 'host', 'port'.

Definition at line 904 of file parse.py.

def splitport(host):
    """splitport('host:port') --> 'host', 'port'."""
    global _portprog
    if _portprog is None:
        import re
        _portprog = re.compile('^(.*):([0-9]+)$')

    match = _portprog.match(host)
    if match: return match.group(1, 2)
    return host, None

def urllib.parse.splitquery (   url)
splitquery('/path?query') --> '/path', 'query'.

Definition at line 938 of file parse.py.

def splitquery(url):
    """splitquery('/path?query') --> '/path', 'query'."""
    global _queryprog
    if _queryprog is None:
        import re
        _queryprog = re.compile('^(.*)\?([^?]*)$')

    match = _queryprog.match(url)
    if match: return match.group(1, 2)
    return url, None

def urllib.parse.splittag (   url)
splittag('/path#tag') --> '/path', 'tag'.

Definition at line 950 of file parse.py.

def splittag(url):
    """splittag('/path#tag') --> '/path', 'tag'."""
    global _tagprog
    if _tagprog is None:
        import re
        _tagprog = re.compile('^(.*)#([^#]*)$')

    match = _tagprog.match(url)
    if match: return match.group(1, 2)
    return url, None

def urllib.parse.splittype (   url)
splittype('type:opaquestring') --> 'type', 'opaquestring'.

Definition at line 848 of file parse.py.

def splittype(url):
    """splittype('type:opaquestring') --> 'type', 'opaquestring'."""
    global _typeprog
    if _typeprog is None:
        import re
        _typeprog = re.compile('^([^/:]+):')

    match = _typeprog.match(url)
    if match:
        scheme = match.group(1)
        return scheme.lower(), url[len(scheme) + 1:]
    return None, url

def urllib.parse.splituser (   host)
splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'.

Definition at line 879 of file parse.py.

def splituser(host):
    """splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'."""
    global _userprog
    if _userprog is None:
        import re
        _userprog = re.compile('^(.*)@(.*)$')

    match = _userprog.match(host)
    if match: return match.group(1, 2)
    return None, host

def urllib.parse.splitvalue (   attr)
splitvalue('attr=value') --> 'attr', 'value'.

Definition at line 968 of file parse.py.

def splitvalue(attr):
    """splitvalue('attr=value') --> 'attr', 'value'."""
    global _valueprog
    if _valueprog is None:
        import re
        _valueprog = re.compile('^([^=]*)=(.*)$')

    match = _valueprog.match(attr)
    if match: return match.group(1, 2)
    return attr, None

def urllib.parse.to_bytes (   url)
to_bytes(u"URL") --> 'URL'.

Definition at line 826 of file parse.py.

def to_bytes(url):
    """to_bytes(u"URL") --> 'URL'."""
    # Most URL schemes require ASCII. If that changes, the conversion
    # can be relaxed.
    # XXX get rid of to_bytes()
    if isinstance(url, str):
        try:
            url = url.encode("ASCII").decode()
        except UnicodeError:
            raise UnicodeError("URL " + repr(url) +
                               " contains non-ASCII characters")
    return url

def urllib.parse.unquote (   string,
  encoding = 'utf-8',
  errors = 'replace' 
)
Replace %xx escapes by their single-character equivalent. The optional
encoding and errors parameters specify how to decode percent-encoded
sequences into Unicode characters, as accepted by the bytes.decode()
method.
By default, percent-encoded sequences are decoded with UTF-8, and invalid
sequences are replaced by a placeholder character.

unquote('abc%20def') -> 'abc def'.

Definition at line 488 of file parse.py.

def unquote(string, encoding='utf-8', errors='replace'):
    """Replace %xx escapes by their single-character equivalent. The optional
    encoding and errors parameters specify how to decode percent-encoded
    sequences into Unicode characters, as accepted by the bytes.decode()
    method.
    By default, percent-encoded sequences are decoded with UTF-8, and invalid
    sequences are replaced by a placeholder character.

    unquote('abc%20def') -> 'abc def'.
    """
    if string == '':
        return string
    res = string.split('%')
    if len(res) == 1:
        return string
    if encoding is None:
        encoding = 'utf-8'
    if errors is None:
        errors = 'replace'
    # pct_sequence: contiguous sequence of percent-encoded bytes, decoded
    pct_sequence = b''
    string = res[0]
    for item in res[1:]:
        try:
            if not item:
                raise ValueError
            pct_sequence += bytes.fromhex(item[:2])
            rest = item[2:]
            if not rest:
                # This segment was just a single percent-encoded character.
                # May be part of a sequence of code units, so delay decoding.
                # (Stored in pct_sequence).
                continue
        except ValueError:
            rest = '%' + item
        # Encountered non-percent-encoded characters. Flush the current
        # pct_sequence.
        string += pct_sequence.decode(encoding, errors) + rest
        pct_sequence = b''
    if pct_sequence:
        # Flush the final pct_sequence
        string += pct_sequence.decode(encoding, errors)
    return string
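
The delayed decoding in the listing is what lets a multi-byte UTF-8 sequence such as %C3%A9 decode to a single character. A short sketch with made-up inputs, also showing the errors='replace' default and the handling of a malformed escape:

```python
from urllib.parse import unquote

simple = unquote('abc%20def')

# Two escaped bytes form one UTF-8 character
multibyte = unquote('caf%C3%A9')

# The same character as a single Latin-1 byte needs encoding='latin-1'
latin1 = unquote('caf%E9', encoding='latin-1')

# Under the UTF-8 default, the lone byte E9 is invalid and is replaced
# by U+FFFD because errors='replace'
replaced = unquote('caf%E9')

# '%zz' is not a valid escape and is left untouched
literal = unquote('100%zz')

print(simple, multibyte, latin1, repr(replaced), literal)
```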

def urllib.parse.unquote_plus (   string,
  encoding = 'utf-8',
  errors = 'replace' 
)
Like unquote(), but also replace plus signs by spaces, as required for
unquoting HTML form values.

unquote_plus('%7e/abc+def') -> '~/abc def'

Definition at line 612 of file parse.py.

def unquote_plus(string, encoding='utf-8', errors='replace'):
    """Like unquote(), but also replace plus signs by spaces, as required for
    unquoting HTML form values.

    unquote_plus('%7e/abc+def') -> '~/abc def'
    """
    string = string.replace('+', ' ')
    return unquote(string, encoding, errors)
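
Because the plus signs are replaced before the percent escapes are decoded, an escaped %2B still comes out as a literal '+'. A brief sketch, using the docstring's example and one made-up input:

```python
from urllib.parse import unquote, unquote_plus

form_value = unquote_plus('%7e/abc+def')

# unquote() leaves '+' alone
plain = unquote('%7e/abc+def')

# '+' becomes a space first, then %2B decodes to '+'
escaped_plus = unquote_plus('a%2Bb+c')

print(form_value, plain, escaped_plus)
```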

def urllib.parse.unquote_to_bytes (   string)
unquote_to_bytes('abc%20def') -> b'abc def'.

Definition at line 467 of file parse.py.

def unquote_to_bytes(string):
    """unquote_to_bytes('abc%20def') -> b'abc def'."""
    # Note: strings are encoded as UTF-8. This is only an issue if it contains
    # unescaped non-ASCII characters, which URIs should not.
    if not string:
        # Is it a string-like object?
        string.split
        return b''
    if isinstance(string, str):
        string = string.encode('utf-8')
    res = string.split(b'%')
    if len(res) == 1:
        return string
    string = res[0]
    for item in res[1:]:
        try:
            string += bytes([int(item[:2], 16)]) + item[2:]
        except ValueError:
            string += b'%' + item
    return string
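
A short usage sketch of the bytes-returning variant. Apart from the docstring's example, the inputs are made up; they show that escapes become raw bytes without any character decoding, and that a str input is first encoded as UTF-8.

```python
from urllib.parse import unquote_to_bytes

simple = unquote_to_bytes('abc%20def')

# Escaped bytes are returned raw, not decoded to text
raw = unquote_to_bytes('%C3%A9')

# Unescaped non-ASCII characters in a str are encoded as UTF-8
encoded = unquote_to_bytes('é')

# A malformed escape is passed through unchanged
malformed = unquote_to_bytes('%zz')

print(simple, raw, encoded, malformed)
```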

def urllib.parse.unwrap (   url)
unwrap('<URL:type://host/path>') --> 'type://host/path'.

Definition at line 839 of file parse.py.

def unwrap(url):
    """unwrap('<URL:type://host/path>') --> 'type://host/path'."""
    url = str(url).strip()
    if url[:1] == '<' and url[-1:] == '>':
        url = url[1:-1].strip()
    if url[:4] == 'URL:': url = url[4:].strip()
    return url

def urllib.parse.urldefrag (   url)
Removes any existing fragment from URL.

Returns a tuple of the defragmented URL and the fragment.  If
the URL contained no fragments, the second element is the
empty string.

Definition at line 451 of file parse.py.

def urldefrag(url):
    """Removes any existing fragment from URL.

    Returns a tuple of the defragmented URL and the fragment.  If
    the URL contained no fragments, the second element is the
    empty string.
    """
    url, _coerce_result = _coerce_args(url)
    if '#' in url:
        s, n, p, a, q, frag = urlparse(url)
        defrag = urlunparse((s, n, p, a, q, ''))
    else:
        frag = ''
        defrag = url
    return _coerce_result(DefragResult(defrag, frag))
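
The returned DefragResult can be used either as a named tuple or unpacked positionally. A small sketch with made-up URLs:

```python
from urllib.parse import urldefrag

# Named access on the result
result = urldefrag('http://example.com/a?x=1#section')

# Positional unpacking; no '#' means an empty fragment
url, fragment = urldefrag('http://example.com/a')

# bytes input yields bytes fields
bytes_result = urldefrag(b'http://example.com/#f')

print(result.url, result.fragment, repr(fragment), bytes_result.fragment)
```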

def urllib.parse.urlencode (   query,
  doseq = False,
  safe = '',
  encoding = None,
  errors = None 
)
Encode a sequence of two-element tuples or dictionary into a URL query string.

If any values in the query arg are sequences and doseq is true, each
sequence element is converted to a separate parameter.

If the query arg is a sequence of two-element tuples, the order of the
parameters in the output will match the order of parameters in the
input.

The keys and values of the query arg may each be either a str or a bytes
object. When a key or value is a str, the safe, encoding and errors
parameters are passed to quote_plus for encoding.
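
A short usage sketch of the cases described above; the parameter names and values are made up for illustration:

```python
from urllib.parse import urlencode

# A sequence of pairs keeps its order; non-str values go through str()
ordered = urlencode([('q', 'hello world'), ('page', 2)])

# doseq=True expands a sequence value into one parameter per element
expanded = urlencode({'tag': ['a', 'b']}, doseq=True)

# bytes keys and values are quoted directly, without text encoding
raw = urlencode({b'k': b'\xff'})

print(ordered, expanded, raw)
```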

Definition at line 733 of file parse.py.

00733 
00734 def urlencode(query, doseq=False, safe='', encoding=None, errors=None):
00735     """Encode a sequence of two-element tuples or dictionary into a URL query string.
00736 
00737     If any values in the query arg are sequences and doseq is true, each
00738     sequence element is converted to a separate parameter.
00739 
00740     If the query arg is a sequence of two-element tuples, the order of the
00741     parameters in the output will match the order of parameters in the
00742     input.
00743 
00744     The keys and values of the query arg may each be either a str or a bytes
00745     object. When a component is a str, the safe, encoding and errors parameters
00746     are passed to quote_plus; for a bytes component only safe is passed.
00747     """
00748 
00749     if hasattr(query, "items"):
00750         query = query.items()
00751     else:
00752         # It's a bother at times that strings and string-like objects are
00753         # sequences.
00754         try:
00755             # non-sequence items should not work with len()
00756             # non-empty strings will fail this
00757             if len(query) and not isinstance(query[0], tuple):
00758                 raise TypeError
00759             # Zero-length sequences of all types will get here and succeed,
00760             # but that's a minor nit.  Since the original implementation
00761             # allowed empty dicts that type of behavior probably should be
00762             # preserved for consistency
00763         except TypeError:
00764             ty, va, tb = sys.exc_info()
00765             raise TypeError("not a valid non-string sequence "
00766                             "or mapping object").with_traceback(tb)
00767 
00768     l = []
00769     if not doseq:
00770         for k, v in query:
00771             if isinstance(k, bytes):
00772                 k = quote_plus(k, safe)
00773             else:
00774                 k = quote_plus(str(k), safe, encoding, errors)
00775 
00776             if isinstance(v, bytes):
00777                 v = quote_plus(v, safe)
00778             else:
00779                 v = quote_plus(str(v), safe, encoding, errors)
00780             l.append(k + '=' + v)
00781     else:
00782         for k, v in query:
00783             if isinstance(k, bytes):
00784                 k = quote_plus(k, safe)
00785             else:
00786                 k = quote_plus(str(k), safe, encoding, errors)
00787 
00788             if isinstance(v, bytes):
00789                 v = quote_plus(v, safe)
00790                 l.append(k + '=' + v)
00791             elif isinstance(v, str):
00792                 v = quote_plus(v, safe, encoding, errors)
00793                 l.append(k + '=' + v)
00794             else:
00795                 try:
00796                     # Is this a sufficient test for sequence-ness?
00797                     x = len(v)
00798                 except TypeError:
00799                     # not a sequence
00800                     v = quote_plus(str(v), safe, encoding, errors)
00801                     l.append(k + '=' + v)
00802                 else:
00803                     # loop over the sequence
00804                     for elt in v:
00805                         if isinstance(elt, bytes):
00806                             elt = quote_plus(elt, safe)
00807                         else:
00808                             elt = quote_plus(str(elt), safe, encoding, errors)
00809                         l.append(k + '=' + elt)
00810     return '&'.join(l)
00811 
00812 # Utilities to parse URLs (most of these return None for missing parts):
00813 # unwrap('<URL:type://host/path>') --> 'type://host/path'
00814 # splittype('type:opaquestring') --> 'type', 'opaquestring'
00815 # splithost('//host[:port]/path') --> 'host[:port]', '/path'
00816 # splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'
00817 # splitpasswd('user:passwd') -> 'user', 'passwd'
00818 # splitport('host:port') --> 'host', 'port'
00819 # splitquery('/path?query') --> '/path', 'query'
00820 # splittag('/path#tag') --> '/path', 'tag'
00821 # splitattr('/path;attr1=value1;attr2=value2;...') ->
00822 #   '/path', ['attr1=value1', 'attr2=value2', ...]
00823 # splitvalue('attr=value') --> 'attr', 'value'
00824 # urllib.parse.unquote('abc%20def') -> 'abc def'
00825 # quote('abc def') -> 'abc%20def'
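
The branches of urlencode shown above can be exercised with a short sketch. The parameter names and values (q, lang, tag) are made up for illustration only.

```python
from urllib.parse import urlencode

# A sequence of two-element tuples keeps its order; quote_plus turns spaces into '+'.
ordered = urlencode([("q", "url parsing"), ("lang", "en")])

# With doseq=True, each element of a sequence value becomes its own parameter.
repeated = urlencode({"tag": ["a", "b"]}, doseq=True)

# Without doseq, the whole list goes through str() and is quoted as a single value.
flattened = urlencode({"tag": ["a", "b"]})

# A non-empty plain string is rejected instead of being treated as a sequence.
try:
    urlencode("q=1")
    rejected = False
except TypeError:
    rejected = True

print(ordered)
print(repeated)
print(flattened)
print(rejected)
```

Passing a list of pairs rather than a dictionary is the safer choice when parameter order matters to the receiving server.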

def urllib.parse.urljoin(base, url, allow_fragments=True)
Join a base URL and a possibly relative URL to form an absolute
interpretation of the latter.

Definition at line 398 of file parse.py.

00398 
00399 def urljoin(base, url, allow_fragments=True):
00400     """Join a base URL and a possibly relative URL to form an absolute
00401     interpretation of the latter."""
00402     if not base:
00403         return url
00404     if not url:
00405         return base
00406     base, url, _coerce_result = _coerce_args(base, url)
00407     bscheme, bnetloc, bpath, bparams, bquery, bfragment = \
00408             urlparse(base, '', allow_fragments)
00409     scheme, netloc, path, params, query, fragment = \
00410             urlparse(url, bscheme, allow_fragments)
00411     if scheme != bscheme or scheme not in uses_relative:
00412         return _coerce_result(url)
00413     if scheme in uses_netloc:
00414         if netloc:
00415             return _coerce_result(urlunparse((scheme, netloc, path,
00416                                               params, query, fragment)))
00417         netloc = bnetloc
00418     if path[:1] == '/':
00419         return _coerce_result(urlunparse((scheme, netloc, path,
00420                                           params, query, fragment)))
00421     if not path and not params:
00422         path = bpath
00423         params = bparams
00424         if not query:
00425             query = bquery
00426         return _coerce_result(urlunparse((scheme, netloc, path,
00427                                           params, query, fragment)))
00428     segments = bpath.split('/')[:-1] + path.split('/')
00429     # XXX The stuff below is bogus in various ways...
00430     if segments[-1] == '.':
00431         segments[-1] = ''
00432     while '.' in segments:
00433         segments.remove('.')
00434     while 1:
00435         i = 1
00436         n = len(segments) - 1
00437         while i < n:
00438             if (segments[i] == '..'
00439                 and segments[i-1] not in ('', '..')):
00440                 del segments[i-1:i+1]
00441                 break
00442             i = i+1
00443         else:
00444             break
00445     if segments == ['', '..']:
00446         segments[-1] = ''
00447     elif len(segments) >= 2 and segments[-1] == '..':
00448         segments[-2:] = ['']
00449     return _coerce_result(urlunparse((scheme, netloc, '/'.join(segments),
00450                                       params, query, fragment)))
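
To make the resolution rules in the listing easier to follow, here is a small sketch that joins several kinds of reference against one base. The base URL and the relative references are hypothetical examples.

```python
from urllib.parse import urljoin

base = "http://example.com/a/b/c.html"

sibling = urljoin(base, "d.html")                     # replaces the last path segment
parent = urljoin(base, "../e.html")                   # '..' removes one directory level
rooted = urljoin(base, "/root.html")                  # leading '/' keeps only scheme and netloc
query_only = urljoin(base, "?page=2")                 # empty path reuses the base path
fragment_only = urljoin(base, "#top")                 # empty path and query reuse the base's
net_path = urljoin(base, "//cdn.example.com/lib.js")  # a netloc in url replaces the base netloc
absolute = urljoin(base, "https://other.example/x")   # different scheme: url returned as is
empty = urljoin(base, "")                             # empty url returns the base

for value in (sibling, parent, rooted, query_only,
              fragment_only, net_path, absolute, empty):
    print(value)
```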

def urllib.parse.urlparse(url, scheme='', allow_fragments=True)
Parse a URL into 6 components:
<scheme>://<netloc>/<path>;<params>?<query>#<fragment>
Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
Note that we don't break the components up in smaller bits
(e.g. netloc is a single string) and we don't expand % escapes.

Definition at line 279 of file parse.py.

00279 
00280 def urlparse(url, scheme='', allow_fragments=True):
00281     """Parse a URL into 6 components:
00282     <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
00283     Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
00284     Note that we don't break the components up in smaller bits
00285     (e.g. netloc is a single string) and we don't expand % escapes."""
00286     url, scheme, _coerce_result = _coerce_args(url, scheme)
00287     tuple = urlsplit(url, scheme, allow_fragments)
00288     scheme, netloc, url, query, fragment = tuple
00289     if scheme in uses_params and ';' in url:
00290         url, params = _splitparams(url)
00291     else:
00292         params = ''
00293     result = ParseResult(scheme, netloc, url, params, query, fragment)
00294     return _coerce_result(result)
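
A minimal sketch of the six-way split follows, assuming an illustrative URL that contains every component. It also shows that percent escapes are left untouched and that the scheme argument only acts as a default.

```python
from urllib.parse import urlparse

parts = urlparse("http://www.example.com:8080/a%20b/file.html;type=a?x=1&y=2#frag")

# The scheme argument is used only when the URL itself carries no scheme.
defaulted = urlparse("//example.com/index.html", scheme="https")

print(parts)
print(defaulted.scheme, defaulted.netloc, defaulted.path)
```

The netloc stays a single string here. The result object also exposes hostname and port attributes for callers that need the pieces separately.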

def urllib.parse.urlsplit(url, scheme='', allow_fragments=True)
Parse a URL into 5 components:
<scheme>://<netloc>/<path>?<query>#<fragment>
Return a 5-tuple: (scheme, netloc, path, query, fragment).
Note that we don't break the components up in smaller bits
(e.g. netloc is a single string) and we don't expand % escapes.

Definition at line 312 of file parse.py.

00312 
00313 def urlsplit(url, scheme='', allow_fragments=True):
00314     """Parse a URL into 5 components:
00315     <scheme>://<netloc>/<path>?<query>#<fragment>
00316     Return a 5-tuple: (scheme, netloc, path, query, fragment).
00317     Note that we don't break the components up in smaller bits
00318     (e.g. netloc is a single string) and we don't expand % escapes."""
00319     url, scheme, _coerce_result = _coerce_args(url, scheme)
00320     allow_fragments = bool(allow_fragments)
00321     key = url, scheme, allow_fragments, type(url), type(scheme)
00322     cached = _parse_cache.get(key, None)
00323     if cached:
00324         return _coerce_result(cached)
00325     if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
00326         clear_cache()
00327     netloc = query = fragment = ''
00328     i = url.find(':')
00329     if i > 0:
00330         if url[:i] == 'http': # optimize the common case
00331             scheme = url[:i].lower()
00332             url = url[i+1:]
00333             if url[:2] == '//':
00334                 netloc, url = _splitnetloc(url, 2)
00335                 if (('[' in netloc and ']' not in netloc) or
00336                         (']' in netloc and '[' not in netloc)):
00337                     raise ValueError("Invalid IPv6 URL")
00338             if allow_fragments and '#' in url:
00339                 url, fragment = url.split('#', 1)
00340             if '?' in url:
00341                 url, query = url.split('?', 1)
00342             v = SplitResult(scheme, netloc, url, query, fragment)
00343             _parse_cache[key] = v
00344             return _coerce_result(v)
00345         for c in url[:i]:
00346             if c not in scheme_chars:
00347                 break
00348         else:
00349             try:
00350                 # make sure "url" is not actually a port number (in which case
00351                 # "scheme" is really part of the path)
00352                 _testportnum = int(url[i+1:])
00353             except ValueError:
00354                 scheme, url = url[:i].lower(), url[i+1:]
00355 
00356     if url[:2] == '//':
00357         netloc, url = _splitnetloc(url, 2)
00358         if (('[' in netloc and ']' not in netloc) or
00359                 (']' in netloc and '[' not in netloc)):
00360             raise ValueError("Invalid IPv6 URL")
00361     if allow_fragments and scheme in uses_fragment and '#' in url:
00362         url, fragment = url.split('#', 1)
00363     if scheme in uses_query and '?' in url:
00364         url, query = url.split('?', 1)
00365     v = SplitResult(scheme, netloc, url, query, fragment)
00366     _parse_cache[key] = v
00367     return _coerce_result(v)
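
The sketch below illustrates three behaviours visible in the listing: params stay inside the path (unlike urlparse), allow_fragments=False leaves '#' in the path, and an unbalanced bracket in the netloc raises ValueError. The URLs are invented for the example.

```python
from urllib.parse import urlsplit

# Five components: the ';type=a' part is not split out of the path.
parts = urlsplit("http://www.example.com/path;type=a?x=1#frag")

# With allow_fragments=False the '#frag' text remains part of the path.
no_frag = urlsplit("http://www.example.com/path#frag", allow_fragments=False)

# A '[' without a matching ']' in the netloc is rejected.
try:
    urlsplit("http://[::1/path")
    ipv6_error = None
except ValueError as exc:
    ipv6_error = str(exc)

print(parts)
print(no_frag)
print(ipv6_error)
```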

def urllib.parse.urlunparse(components)
Put a parsed URL back together again.  This may result in a
slightly different, but equivalent URL, if the URL that was parsed
originally had redundant delimiters, e.g. a ? with an empty query
(the draft states that these are equivalent).

Definition at line 368 of file parse.py.

00368 
00369 def urlunparse(components):
00370     """Put a parsed URL back together again.  This may result in a
00371     slightly different, but equivalent URL, if the URL that was parsed
00372     originally had redundant delimiters, e.g. a ? with an empty query
00373     (the draft states that these are equivalent)."""
00374     scheme, netloc, url, params, query, fragment, _coerce_result = (
00375                                                   _coerce_args(*components))
00376     if params:
00377         url = "%s;%s" % (url, params)
00378     return _coerce_result(urlunsplit((scheme, netloc, url, query, fragment)))
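
As a hedged illustration of the "slightly different, but equivalent" remark, this sketch rebuilds a URL from six components and then round-trips a URL that ends in a bare '?'. The component values are placeholders.

```python
from urllib.parse import urlparse, urlunparse

# Non-empty params are re-attached to the path with ';'.
rebuilt = urlunparse(("http", "example.com", "/path/file.html", "type=a", "x=1", "frag"))

# A redundant '?' with an empty query is dropped on the round trip.
normalized = urlunparse(urlparse("http://example.com/path?"))

print(rebuilt)
print(normalized)
```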

def urllib.parse.urlunsplit(components)
Combine the elements of a tuple as returned by urlsplit() into a
complete URL as a string. The components argument can be any five-item
iterable. This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example, a ?
with an empty query; the RFC states that these are equivalent).

Definition at line 379 of file parse.py.

00379 
00380 def urlunsplit(components):
00381     """Combine the elements of a tuple as returned by urlsplit() into a
00382     complete URL as a string. The components argument can be any five-item iterable.
00383     This may result in a slightly different, but equivalent URL, if the URL that
00384     was parsed originally had unnecessary delimiters (for example, a ? with an
00385     empty query; the RFC states that these are equivalent)."""
00386     scheme, netloc, url, query, fragment, _coerce_result = (
00387                                           _coerce_args(*components))
00388     if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
00389         if url and url[:1] != '/': url = '/' + url
00390         url = '//' + (netloc or '') + url
00391     if scheme:
00392         url = scheme + ':' + url
00393     if query:
00394         url = url + '?' + query
00395     if fragment:
00396         url = url + '#' + fragment
00397     return _coerce_result(url)
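
The following sketch shows how the assembly steps in the listing play out for three hypothetical inputs: a relative path combined with a netloc, a scheme that has no netloc, and a file URL with an empty netloc.

```python
from urllib.parse import urlunsplit

# Any five-item iterable is accepted; a path without a leading '/' gets one
# when a netloc is present.
with_slash = urlunsplit(["https", "example.com", "docs/index.html", "v=3", "top"])

# 'mailto' is not in uses_netloc, so no '//' is inserted.
no_netloc = urlunsplit(("mailto", "", "user@example.com", "", ""))

# 'file' is in uses_netloc, so '//' is added even though the netloc is empty.
file_url = urlunsplit(("file", "", "/etc/hosts", "", ""))

print(with_slash)
print(no_netloc)
print(file_url)
```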


Variable Documentation

list urllib.parse.__all__

Initial value:
["urlparse", "urlunparse", "urljoin", "urldefrag",
 "urlsplit", "urlunsplit", "urlencode", "parse_qs",
 "parse_qsl", "quote", "quote_plus", "quote_from_bytes",
 "unquote", "unquote_plus", "unquote_to_bytes"]

Definition at line 33 of file parse.py.

tuple urllib.parse._ALWAYS_SAFE

Initial value:
frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
          b'abcdefghijklmnopqrstuvwxyz'
          b'0123456789'
          b'_.-')

Definition at line 621 of file parse.py.

tuple urllib.parse._ALWAYS_SAFE_BYTES = bytes(_ALWAYS_SAFE)

Definition at line 625 of file parse.py.

tuple urllib.parse._DefragResultBase = namedtuple('DefragResult', 'url fragment')

Definition at line 218 of file parse.py.

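urllib.parse._hostprog = None

Definition at line 861 of file parse.py.

string urllib.parse._implicit_encoding = 'ascii'

Definition at line 80 of file parse.py.

string urllib.parse._implicit_errors = 'strict'

Definition at line 81 of file parse.py.

urllib.parse._nportprog = None

Definition at line 915 of file parse.py.

dictionary urllib.parse._parse_cache = {}

Definition at line 66 of file parse.py.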

tuple urllib.parse._ParseResultBase = namedtuple('ParseResult', 'scheme netloc path params query fragment')

Definition at line 220 of file parse.py.

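urllib.parse._passwdprog = None

Definition at line 890 of file parse.py.

urllib.parse._portprog = None

Definition at line 903 of file parse.py.

urllib.parse._queryprog = None

Definition at line 937 of file parse.py.

dictionary urllib.parse._safe_quoters = {}

Definition at line 626 of file parse.py.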

tuple urllib.parse._SplitResultBase = namedtuple('SplitResult', 'scheme netloc path query fragment')

Definition at line 219 of file parse.py.

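urllib.parse._tagprog = None

Definition at line 949 of file parse.py.

urllib.parse._typeprog = None

Definition at line 847 of file parse.py.

urllib.parse._userprog = None

Definition at line 878 of file parse.py.

urllib.parse._valueprog = None

Definition at line 967 of file parse.py.

int urllib.parse.MAX_CACHE_SIZE = 20

Definition at line 65 of file parse.py.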

list urllib.parse.non_hierarchical

Initial value:
['gopher', 'hdl', 'mailto', 'news',
 'telnet', 'wais', 'imap', 'snews', 'sip', 'sips']

Definition at line 47 of file parse.py.

urllib.parse.ResultBase = _NetlocResultMixinStr

Definition at line 225 of file parse.py.

tuple urllib.parse.scheme_chars

Initial value:
('abcdefghijklmnopqrstuvwxyz'
 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
 '0123456789'
 '+-.')

Definition at line 59 of file parse.py.

list urllib.parse.uses_fragment

Initial value:
['ftp', 'hdl', 'http', 'gopher', 'news',
 'nntp', 'wais', 'https', 'shttp', 'snews',
 'file', 'prospero', '']

Definition at line 54 of file parse.py.

list urllib.parse.uses_netloc

Initial value:
['ftp', 'http', 'gopher', 'nntp', 'telnet',
 'imap', 'wais', 'file', 'mms', 'https', 'shttp',
 'snews', 'prospero', 'rtsp', 'rtspu', 'rsync', '',
 'svn', 'svn+ssh', 'sftp', 'nfs', 'git', 'git+ssh']

Definition at line 43 of file parse.py.

list urllib.parse.uses_params

Initial value:
['ftp', 'hdl', 'prospero', 'http', 'imap',
 'https', 'shttp', 'rtsp', 'rtspu', 'sip', 'sips',
 'mms', '', 'sftp']

Definition at line 49 of file parse.py.

list urllib.parse.uses_query

Initial value:
['http', 'wais', 'imap', 'https', 'shttp', 'mms',
 'gopher', 'rtsp', 'rtspu', 'sip', 'sips', '']

Definition at line 52 of file parse.py.

list urllib.parse.uses_relative

Initial value:
['ftp', 'http', 'gopher', 'nntp', 'imap',
 'wais', 'file', 'https', 'shttp', 'mms',
 'prospero', 'rtsp', 'rtspu', '', 'sftp',
 'svn', 'svn+ssh']

Definition at line 39 of file parse.py.