plone3  3.1.7
plone.i18n.normalizer.base Namespace Reference

Functions

def mapUnicode
def baseNormalize

Variables

dictionary mapping
string whitespace = ''
 allowed = string.ascii_letters+string.digits+string.punctuation+whitespace

Function Documentation

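def plone.i18n.normalizer.base.baseNormalize(text)
Normalizes Unicode characters to their base ASCII letters. The output is an
ASCII-encoded string (or single character) containing only ASCII letters,
digits, punctuation and whitespace characters. Case is preserved. Input that
is not a string is returned as its repr(), and a character that has neither a
custom mapping nor a Unicode decomposition is replaced by the hexadecimal form
of its code point.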

  >>> baseNormalize(123)
  '123'

  >>> baseNormalize(u'\u0fff')
  'fff'

  >>> baseNormalize(u"foo\N{LATIN CAPITAL LETTER I WITH CARON}")
  'fooI'

Definition at line 39 of file base.py.

def baseNormalize(text):
    """
    This method is used for normalization of unicode characters to the base ASCII
    letters. Output is ASCII encoded string (or char) with only ASCII letters,
    digits, punctuation and whitespace characters. Case is preserved.

      >>> baseNormalize(123)
      '123'

      >>> baseNormalize(u'\u0fff')
      'fff'

      >>> baseNormalize(u"foo\N{LATIN CAPITAL LETTER I WITH CARON}")
      'fooI'
    """
    if not isinstance(text, basestring):
        # This most surely ends up in something the user does not expect
        # to see. But at least it does not break.
        return repr(text)

    text = text.strip()

    res = u''
    for ch in text:
        if ch in allowed:
            # ASCII chars, digits etc. stay untouched
            res += ch
        else:
            ordinal = ord(ch)
            if mapping.has_key(ordinal):
                # try to apply custom mappings
                res += mapping.get(ordinal)
            elif decomposition(ch):
                normalized = normalize('NFKD', ch).strip()
                # string may contain non-letter chars too. Remove them
                # string may result to more than one char
                res += ''.join([c for c in normalized if c in allowed])
            else:
                # hex string instead of unknown char
                res += "%x" % ordinal

    return res.encode('ascii')
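The listing above is Python 2 code. As a rough illustration of its four branches (allowed character, custom mapping, Unicode decomposition, hexadecimal fallback), here is a minimal Python 3 sketch. It is not the library's code: the name base_normalize, the three-entry mapping and the use of string.whitespace for the whitespace set are assumptions made for this example, and the sketch returns str rather than an encoded byte string.

```python
import string
from unicodedata import decomposition, normalize

# Assumptions for illustration only: a tiny stand-in for the module's
# `mapping` table, and string.whitespace as the allowed whitespace set.
mapping = {223: 's', 228: 'ae', 246: 'oe'}
allowed = (string.ascii_letters + string.digits
           + string.punctuation + string.whitespace)


def base_normalize(text):
    """Python 3 sketch of the branching in baseNormalize; returns str."""
    if not isinstance(text, str):
        # Non-string input falls back to its repr, as in the listing.
        return repr(text)

    parts = []
    for ch in text.strip():
        if ch in allowed:
            # Allowed ASCII characters stay untouched.
            parts.append(ch)
        elif ord(ch) in mapping:
            # Custom mappings take precedence over decomposition.
            parts.append(mapping[ord(ch)])
        elif decomposition(ch):
            # Keep only the allowed characters of the NFKD form.
            normalized = normalize('NFKD', ch).strip()
            parts.append(''.join(c for c in normalized if c in allowed))
        else:
            # No mapping and no decomposition: hexadecimal code point.
            parts.append('%x' % ord(ch))
    return ''.join(parts)


print(base_normalize(123))
print(base_normalize('\u0fff'))
print(base_normalize('foo\N{LATIN CAPITAL LETTER I WITH CARON}'))
print(base_normalize('K\u00f6ln'))
```

The first three calls mirror the doctests in the docstring; the last one exercises the custom-mapping branch of the stand-in table.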

def plone.i18n.normalizer.base.mapUnicode(text, mapping = {})
Replaces each character whose code point is a key in mapping with the mapped
string, then applies baseNormalize to the result.

Definition at line 22 of file base.py.

def mapUnicode(text, mapping={}):
    """
    This method is used for replacement of special characters found in a mapping
    before baseNormalize is applied.
    """
    res = u''
    for ch in text:
        ordinal = ord(ch)
        if mapping.has_key(ordinal):
            # try to apply custom mappings
            res += mapping.get(ordinal)
        else:
            # else leave untouched
            res += ch
    # always apply base normalization
    return baseNormalize(res)
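The replacement loop in mapUnicode can be tried in isolation. The following Python 3 sketch covers only that first step and leaves out the final baseNormalize call; the names map_chars and german, and the German transliteration table, are hypothetical and chosen just for this example. In Python 3 the same substitution can also be expressed with str.translate, which accepts a dictionary keyed by code point.

```python
def map_chars(text, mapping=None):
    """Replace every character whose code point is a key in `mapping`."""
    mapping = mapping or {}
    # Characters without an entry are left untouched.
    return ''.join(mapping.get(ord(ch), ch) for ch in text)


# Hypothetical language-specific table: code point -> replacement string.
german = {228: 'ae', 246: 'oe', 252: 'ue', 223: 'ss'}

word = 'Gr\u00fc\u00dfe'
print(map_chars(word, german))
# str.translate performs the same code-point lookup.
print(word.translate(german))
```

Because the custom table is applied before the base normalization, a caller can override the module-level defaults for a particular language.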


Variable Documentation

plone.i18n.normalizer.base.allowed = string.ascii_letters+string.digits+string.punctuation+whitespace

Definition at line 20 of file base.py.

dictionary plone.i18n.normalizer.base.mapping

Initial value:
{
138 : 's', 140 : 'O', 142 : 'z', 154 : 's', 156 : 'o', 158 : 'z', 159 : 'Y',
192 : 'A', 193 : 'A', 194 : 'A', 195 : 'a', 196 : 'A', 197 : 'Aa', 198 : 'E',
199 : 'C', 200 : 'E', 201 : 'E', 202 : 'E', 203 : 'E', 204 : 'I', 205 : 'I',
206 : 'I', 207 : 'I', 208 : 'Th', 209 : 'N', 210 : 'O', 211 : 'O', 212 : 'O',
213 : 'O', 214 : 'O', 215 : 'x', 216 : 'O', 217 : 'U', 218 : 'U', 219 : 'U',
220 : 'U', 222 : 'th', 221 : 'Y', 223 : 's', 224 : 'a', 225 : 'a', 226 : 'a',
227 : 'a', 228 : 'ae', 229 : 'aa', 230 : 'ae', 231 : 'c', 232 : 'e', 233 : 'e',
234 : 'e', 235 : 'e', 236 : 'i', 237 : 'i', 238 : 'i', 239 : 'i', 240 : 'th',
241 : 'n', 242 : 'o', 243 : 'o', 244 : 'o', 245 : 'o', 246 : 'oe', 248 : 'oe',
249 : 'u', 250 : 'u', 251 : 'u', 252 : 'u', 253 : 'y', 254 : 'Th', 255 : 'y' }

Definition at line 5 of file base.py.

string plone.i18n.normalizer.base.whitespace = ''

Definition at line 19 of file base.py.
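Since baseNormalize consults mapping before it tries Unicode decomposition, the table changes the result for characters that would otherwise decompose differently or not at all. The Python 3 sketch below compares a few entries copied from the table with what plain NFKD decomposition leaves behind; the names mapping_subset and nfkd_ascii are hypothetical helpers for this comparison, not part of the module.

```python
import unicodedata

# A few entries copied from the `mapping` table (code point -> replacement).
mapping_subset = {197: 'Aa', 223: 's', 228: 'ae', 246: 'oe'}


def nfkd_ascii(ch):
    """ASCII characters that remain after NFKD decomposition alone."""
    decomposed = unicodedata.normalize('NFKD', ch)
    return ''.join(c for c in decomposed if ord(c) < 128)


for code_point, replacement in sorted(mapping_subset.items()):
    ch = chr(code_point)
    print('U+%04X mapping=%r nfkd=%r'
          % (code_point, replacement, nfkd_ascii(ch)))
```

Code point 223 has no decomposition at all, so without its table entry it would reach the hexadecimal fallback and come out as 'df'.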