python3.2  3.2.2
encodings Namespace Reference

Namespaces

namespace  aliases
namespace  ascii
namespace  base64_codec
namespace  big5
namespace  big5hkscs
namespace  bz2_codec
namespace  charmap
namespace  cp037
namespace  cp1006
namespace  cp1026
namespace  cp1140
namespace  cp1250
namespace  cp1251
namespace  cp1252
namespace  cp1253
namespace  cp1254
namespace  cp1255
namespace  cp1256
namespace  cp1257
namespace  cp1258
namespace  cp424
namespace  cp437
namespace  cp500
namespace  cp720
namespace  cp737
namespace  cp775
namespace  cp850
namespace  cp852
namespace  cp855
namespace  cp856
namespace  cp857
namespace  cp858
namespace  cp860
namespace  cp861
namespace  cp862
namespace  cp863
namespace  cp864
namespace  cp865
namespace  cp866
namespace  cp869
namespace  cp874
namespace  cp875
namespace  cp932
namespace  cp949
namespace  cp950
namespace  euc_jis_2004
namespace  euc_jisx0213
namespace  euc_kr
namespace  gb18030
namespace  gb2312
namespace  gbk
namespace  hex_codec
namespace  hp_roman8
namespace  hz
namespace  idna
namespace  iso2022_kr
namespace  iso8859_1
namespace  iso8859_10
namespace  iso8859_11
namespace  iso8859_13
namespace  iso8859_14
namespace  iso8859_15
namespace  iso8859_16
namespace  iso8859_2
namespace  iso8859_3
namespace  iso8859_4
namespace  iso8859_5
namespace  iso8859_6
namespace  iso8859_7
namespace  iso8859_8
namespace  iso8859_9
namespace  johab
namespace  koi8_r
namespace  koi8_u
namespace  latin_1
namespace  mac_arabic
namespace  mac_centeuro
namespace  mac_croatian
namespace  mac_cyrillic
namespace  mac_farsi
namespace  mac_greek
namespace  mac_iceland
namespace  mac_latin2
namespace  mac_roman
namespace  mac_romanian
namespace  mac_turkish
namespace  mbcs
namespace  palmos
namespace  ptcp154
namespace  punycode
namespace  quopri_codec
namespace  raw_unicode_escape
namespace  rot_13
namespace  shift_jis
namespace  shift_jis_2004
namespace  shift_jisx0213
namespace  tis_620
namespace  undefined
namespace  unicode_escape
namespace  unicode_internal
namespace  utf_16
namespace  utf_16_be
namespace  utf_16_le
namespace  utf_32
namespace  utf_32_be
namespace  utf_32_le
namespace  utf_7
namespace  utf_8
namespace  utf_8_sig
namespace  uu_codec
namespace  zlib_codec

Classes

class  CodecRegistryError

Functions

def normalize_encoding
def search_function

Variables

dictionary _cache = {}
string _unknown = '--unknown--'
list _import_tail = ['*']
 _aliases = aliases.aliases

Detailed Description

Standard "encodings" Package

    Standard Python encoding modules are stored in this package
    directory.

    Codec modules must have names corresponding to normalized encoding
    names as defined in the normalize_encoding() function below, e.g.
    'utf-8' must be implemented by the module 'utf_8.py'.

    Each codec module must export the following interface:

    * getregentry() -> codecs.CodecInfo object
    The getregentry() API must return a CodecInfo object with encoder, decoder,
    incrementalencoder, incrementaldecoder, streamwriter and streamreader
    attributes which adhere to the Python Codec Interface Standard.

    In addition, a module may optionally also define the following
    APIs which are then used by the package's codec search function:

    * getaliases() -> sequence of encoding name strings to use as aliases

    Alias names returned by getaliases() must be normalized encoding
    names as defined by normalize_encoding().
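
As an illustration of the interface described above, here is a minimal sketch of a codec module. It is modeled on the way the standard utf_8 module delegates to the helper functions in codecs; the module name demo_utf8 and the alias demo_u8 are hypothetical and do not exist in the package.

```python
"""Hypothetical codec module 'demo_utf8.py' (name and alias are assumptions)."""
import codecs


def encode(input, errors="strict"):
    # Stateless encoder: returns (bytes, number of characters consumed).
    return codecs.utf_8_encode(input, errors)


def decode(input, errors="strict"):
    # Stateless decoder: final=True, so truncated input is reported as an error.
    return codecs.utf_8_decode(input, errors, True)


class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.utf_8_encode(input, self.errors)[0]


class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    # The base class buffers incomplete byte sequences between calls.
    _buffer_decode = codecs.utf_8_decode


class StreamWriter(codecs.StreamWriter):
    encode = codecs.utf_8_encode


class StreamReader(codecs.StreamReader):
    decode = codecs.utf_8_decode


def getregentry():
    # Required: return a CodecInfo carrying all six codec attributes.
    return codecs.CodecInfo(
        name="demo_utf8",
        encode=encode,
        decode=decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )


def getaliases():
    # Optional: alias names must already be in normalized form.
    return ["demo_u8"]
```

For the package's search function to find such a module, it would have to be importable as encodings.demo_utf8; that placement is part of the assumption here.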

Written by Marc-Andre Lemburg (mal@lemburg.com).

(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.

Class Documentation

class encodings::CodecRegistryError

Definition at line 39 of file __init__.py.


Function Documentation

def encodings.normalize_encoding(encoding)
Normalize an encoding name.

    Normalization works as follows: all non-alphanumeric
    characters except the dot used for Python package names are
    collapsed and replaced with a single underscore, e.g. '  -;#'
    becomes '_'. Leading and trailing underscores are removed.

    Note that encoding names should be ASCII only; if they do use
    non-ASCII characters, these must be Latin-1 compatible.

Definition at line 42 of file __init__.py.

def normalize_encoding(encoding):

    """ Normalize an encoding name.

        Normalization works as follows: all non-alphanumeric
        characters except the dot used for Python package names are
        collapsed and replaced with a single underscore, e.g. '  -;#'
        becomes '_'. Leading and trailing underscores are removed.

        Note that encoding names should be ASCII only; if they do use
        non-ASCII characters, these must be Latin-1 compatible.

    """
    if isinstance(encoding, bytes):
        encoding = str(encoding, "ascii")
    chars = []
    punct = False
    for c in encoding:
        if c.isalnum() or c == '.':
            if punct and chars:
                chars.append('_')
            chars.append(c)
            punct = False
        else:
            punct = True
    return ''.join(chars)
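
The normalization rule can also be stated compactly with a regular expression. The sketch below is an assumed equivalent for ASCII-only names, not the package's implementation; the function name normalize_sketch is hypothetical.

```python
import re


def normalize_sketch(encoding):
    """Collapse runs of characters other than ASCII letters, digits and '.'
    into one underscore, then drop leading and trailing underscores."""
    if isinstance(encoding, bytes):
        encoding = encoding.decode("ascii")
    collapsed = re.sub(r"[^0-9A-Za-z.]+", "_", encoding)
    return collapsed.strip("_")


print(normalize_sketch("utf-8"))         # utf_8
print(normalize_sketch("ISO 8859-1"))    # ISO_8859_1
print(normalize_sketch("--latin--1--"))  # latin_1
print(repr(normalize_sketch("  -;#")))   # ''
```

Note that a name consisting only of punctuation normalizes to the empty string, since the single underscore it collapses to is then removed as a leading and trailing underscore. Letter case is left unchanged.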

def encodings.search_function(encoding)

Definition at line 69 of file __init__.py.

def search_function(encoding):

    # Cache lookup
    entry = _cache.get(encoding, _unknown)
    if entry is not _unknown:
        return entry

    # Import the module:
    #
    # First try to find an alias for the normalized encoding
    # name and lookup the module using the aliased name, then try to
    # lookup the module using the standard import scheme, i.e. first
    # try in the encodings package, then at top-level.
    #
    norm_encoding = normalize_encoding(encoding)
    aliased_encoding = _aliases.get(norm_encoding) or \
                       _aliases.get(norm_encoding.replace('.', '_'))
    if aliased_encoding is not None:
        modnames = [aliased_encoding,
                    norm_encoding]
    else:
        modnames = [norm_encoding]
    for modname in modnames:
        if not modname or '.' in modname:
            continue
        try:
            # Import is absolute to prevent the possibly malicious import of a
            # module with side-effects that is not in the 'encodings' package.
            mod = __import__('encodings.' + modname, fromlist=_import_tail,
                             level=0)
        except ImportError:
            pass
        else:
            break
    else:
        mod = None

    try:
        getregentry = mod.getregentry
    except AttributeError:
        # Not a codec module
        mod = None

    if mod is None:
        # Cache misses
        _cache[encoding] = None
        return None

    # Now ask the module for the registry entry
    entry = getregentry()
    if not isinstance(entry, codecs.CodecInfo):
        if not 4 <= len(entry) <= 7:
            raise CodecRegistryError('module "%s" (%s) failed to register'
                                     % (mod.__name__, mod.__file__))
        if not hasattr(entry[0], '__call__') or \
           not hasattr(entry[1], '__call__') or \
           (entry[2] is not None and not hasattr(entry[2], '__call__')) or \
           (entry[3] is not None and not hasattr(entry[3], '__call__')) or \
           (len(entry) > 4 and entry[4] is not None and not hasattr(entry[4], '__call__')) or \
           (len(entry) > 5 and entry[5] is not None and not hasattr(entry[5], '__call__')):
            raise CodecRegistryError('incompatible codecs in module "%s" (%s)'
                                     % (mod.__name__, mod.__file__))
        if len(entry)<7 or entry[6] is None:
            entry += (None,)*(6-len(entry)) + (mod.__name__.split(".", 1)[1],)
        entry = codecs.CodecInfo(*entry)

    # Cache the codec registry entry
    _cache[encoding] = entry

    # Register its aliases (without overwriting previously registered
    # aliases)
    try:
        codecaliases = mod.getaliases()
    except AttributeError:
        pass
    else:
        for alias in codecaliases:
            if alias not in _aliases:
                _aliases[alias] = modname

    # Return the registry entry
    return entry

# Register the search_function in the Python codec registry
codecs.register(search_function)
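
Because the search function is registered with the codec registry, it is normally reached through codecs.lookup() rather than called directly. The following usage sketch shows both paths; the encoding name no_such_codec_xyz is a made-up name assumed not to exist.

```python
import codecs
import encodings

# Resolved via normalization: 'utf-8' maps to the module 'utf_8'.
info = codecs.lookup("utf-8")

# Resolved via the alias table: 'latin-1' maps to the module 'latin_1'.
latin = codecs.lookup("latin-1")

# Calling the search function directly returns None for an unknown name...
missing = encodings.search_function("no_such_codec_xyz")

# ...which the codec registry turns into a LookupError.
try:
    codecs.lookup("no_such_codec_xyz")
    lookup_failed = False
except LookupError:
    lookup_failed = True

print(info.name, latin.name, missing, lookup_failed)
```

The name attribute reported for a codec comes from the CodecInfo object returned by the module's getregentry(), so it can differ from the spelling passed to lookup().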


Variable Documentation

encodings._aliases = aliases.aliases

Definition at line 37 of file __init__.py.

dictionary encodings._cache = {}

Definition at line 34 of file __init__.py.

list encodings._import_tail = ['*']

Definition at line 36 of file __init__.py.

string encodings._unknown = '--unknown--'

Definition at line 35 of file __init__.py.
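
The _cache and _unknown variables work together in search_function(): failed lookups are cached as None, so a separate sentinel is needed to tell "cached as None" apart from "not cached yet". The sketch below illustrates that pattern in isolation. All names in it are hypothetical, and it uses a fresh object() as the sentinel where the package uses the string '--unknown--'.

```python
_MISSING = object()   # hypothetical sentinel, plays the role of _unknown
_demo_cache = {}      # hypothetical cache, plays the role of _cache
calls = []            # records how often the slow path runs


def slow_lookup(name):
    # Stand-in for importing a codec module; returns None on failure.
    calls.append(name)
    return {"utf_8": "UTF-8 codec"}.get(name)


def cached_lookup(name):
    entry = _demo_cache.get(name, _MISSING)
    if entry is not _MISSING:
        # Hit, including a cached None from an earlier failed lookup.
        return entry
    entry = slow_lookup(name)
    _demo_cache[name] = entry
    return entry


cached_lookup("bogus")
cached_lookup("bogus")
print(calls)  # ['bogus']
```

With a plain _demo_cache.get(name) the second call could not distinguish the cached None from a missing key and would repeat the slow lookup.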