
plone3  3.1.7
plone.app.portlets.portlets.feedparser._BaseHTMLProcessor Class Reference


Public Member Functions

def __init__
def reset
def feed
def normalize_attrs
def unknown_starttag
def unknown_endtag
def handle_charref
def handle_entityref
def handle_data
def handle_comment
def handle_pi
def handle_decl
def output

Public Attributes

 encoding
 pieces

Static Public Attributes

list elements_no_end_tag

Private Member Functions

def _shorttag_replace
def _scan_name

Static Private Attributes

_new_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9:]*\s*').match

Detailed Description

Definition at line 1413 of file feedparser.py.


Constructor & Destructor Documentation

def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.__init__(self, encoding)

Definition at line 1417 of file feedparser.py.

    def __init__(self, encoding):
        self.encoding = encoding
        if _debug: sys.stderr.write('entering BaseHTMLProcessor, encoding=%s\n' % self.encoding)
        sgmllib.SGMLParser.__init__(self)
        



Member Function Documentation

def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor._scan_name(self, i, declstartpos) [private]

Definition at line 1507 of file feedparser.py.

    def _scan_name(self, i, declstartpos):
        rawdata = self.rawdata
        n = len(rawdata)
        if i == n:
            return None, -1
        m = self._new_declname_match(rawdata, i)
        if m:
            s = m.group()
            name = s.strip()
            if (i + len(s)) == n:
                return None, -1  # end of buffer
            return name.lower(), m.end()
        else:
            self.handle_data(rawdata)
#            self.updatepos(declstartpos, i)
            return None, -1
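
The name-scanning logic can be tried outside the parser. The following is a minimal sketch, assuming Python 3 and a free function scan_name (a hypothetical name, not part of feedparser) that takes the buffer as an argument instead of reading self.rawdata. The pattern is the one listed for _new_declname_match; the handle_data fallback is left out.

```python
import re

# Same pattern as _new_declname_match. The bound .match method is stored so
# that it can be called as match(text, start_position).
_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9:]*\s*').match


def scan_name(rawdata, i):
    """Hypothetical standalone version of _scan_name.

    Returns (lowercased_name, end_position), or (None, -1) when no complete
    name can be read starting at index i.
    """
    n = len(rawdata)
    if i == n:
        return None, -1
    m = _declname_match(rawdata, i)
    if m:
        s = m.group()
        if i + len(s) == n:
            # The name runs to the end of the buffer, so it may be incomplete.
            return None, -1
        return s.strip().lower(), m.end()
    return None, -1


print(scan_name('<!DOCTYPE html>', 2))
print(scan_name('<!DOCTYPE', 2))
```

The second call returns the "not found" pair because the candidate name touches the end of the buffer, which is how the method signals that more input is needed.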


def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor._shorttag_replace(self, match) [private]

Definition at line 1426 of file feedparser.py.

    def _shorttag_replace(self, match):
        tag = match.group(1)
        if tag in self.elements_no_end_tag:
            return '<' + tag + ' />'
        else:
            return '<' + tag + '></' + tag + '>'
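
To see what the replacement does in practice, here is a small sketch, assuming Python 3. The names ELEMENTS_NO_END_TAG, shorttag_replace and expand_short_tags are hypothetical stand-ins for the class attribute and method; the regular expression is the one that feed() passes to re.sub, and the tag list is the initial value of elements_no_end_tag shown on this page.

```python
import re

# Copy of the elements_no_end_tag initial value documented for this class.
ELEMENTS_NO_END_TAG = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
                       'img', 'input', 'isindex', 'link', 'meta', 'param']


def shorttag_replace(match):
    """Hypothetical free-function version of _shorttag_replace."""
    tag = match.group(1)
    if tag in ELEMENTS_NO_END_TAG:
        # Void elements keep the self-closing form.
        return '<' + tag + ' />'
    # Every other element becomes an explicit open/close pair.
    return '<' + tag + '></' + tag + '>'


def expand_short_tags(data):
    # Same pattern that feed() uses for attribute-less short tags.
    return re.sub(r'<([^<\s]+?)\s*/>', shorttag_replace, data)


print(expand_short_tags('<p>a<br/>b<span/></p>'))
```

Note that the pattern only matches short tags without attributes, since the captured name may not contain whitespace.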
        


def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.feed(self, data)

Definition at line 1433 of file feedparser.py.

    def feed(self, data):
        data = re.compile(r'<!((?!DOCTYPE|--|\[))', re.IGNORECASE).sub(r'&lt;!\1', data)
        #data = re.sub(r'<(\S+?)\s*?/>', self._shorttag_replace, data) # bug [ 1399464 ] Bad regexp for _shorttag_replace
        data = re.sub(r'<([^<\s]+?)\s*/>', self._shorttag_replace, data)
        data = data.replace('&#39;', "'")
        data = data.replace('&#34;', '"')
        if self.encoding and type(data) == type(u''):
            data = data.encode(self.encoding)
        sgmllib.SGMLParser.feed(self, data)
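
The text clean-up that feed() performs before handing data to the parser can be illustrated in isolation. This is a sketch under the assumption of Python 3; preprocess is a hypothetical helper name, and the short-tag expansion, the encoding step and the call into sgmllib are deliberately omitted.

```python
import re

# "<!" is escaped unless it begins a DOCTYPE, a comment, or a marked section.
_STRAY_DECL = re.compile(r'<!((?!DOCTYPE|--|\[))', re.IGNORECASE)


def preprocess(data):
    """Hypothetical helper mirroring part of feed()'s preprocessing."""
    data = _STRAY_DECL.sub(r'&lt;!\1', data)
    # Numeric references for quote characters are replaced by the literal
    # characters before parsing.
    data = data.replace('&#39;', "'")
    data = data.replace('&#34;', '"')
    return data


print(preprocess('<!oops> <!-- kept --> <!doctype html>'))
print(preprocess('it&#39;s &#34;quoted&#34;'))
```

Escaping the stray "<!" keeps the parser from treating arbitrary text as a declaration.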


def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.handle_charref(self, ref)

Definition at line 1472 of file feedparser.py.

    def handle_charref(self, ref):
        # called for each character reference, e.g. for '&#160;', ref will be '160'
        # Reconstruct the original character reference.
        self.pieces.append('&#%(ref)s;' % locals())
        

def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.handle_comment(self, text)

Definition at line 1489 of file feedparser.py.

    def handle_comment(self, text):
        # called for each HTML comment, e.g. <!-- insert Javascript code here -->
        # Reconstruct the original comment.
        self.pieces.append('<!--%(text)s-->' % locals())
        

def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.handle_data(self, text)

Reimplemented in plone.app.portlets.portlets.feedparser._HTMLSanitizer.

Definition at line 1482 of file feedparser.py.

    def handle_data(self, text):
        # called for each block of plain text, i.e. outside of any tag and
        # not containing any character or entity references
        # Store the original text verbatim.
        if _debug: sys.stderr.write('_BaseHTMLProcessor, handle_text, text=%s\n' % text)
        self.pieces.append(text)
        


def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.handle_decl(self, text)

Reimplemented in plone.app.portlets.portlets.feedparser._HTMLSanitizer.

Definition at line 1499 of file feedparser.py.

    def handle_decl(self, text):
        # called for the DOCTYPE, if present, e.g.
        # <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        #     "http://www.w3.org/TR/html4/loose.dtd">
        # Reconstruct original DOCTYPE
        self.pieces.append('<!%(text)s>' % locals())
        

def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.handle_entityref(self, ref)

Definition at line 1477 of file feedparser.py.

    def handle_entityref(self, ref):
        # called for each entity reference, e.g. for '&copy;', ref will be 'copy'
        # Reconstruct the original entity reference.
        self.pieces.append('&%(ref)s;' % locals())

def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.handle_pi(self, text)

Reimplemented in plone.app.portlets.portlets.feedparser._HTMLSanitizer.

Definition at line 1494 of file feedparser.py.

    def handle_pi(self, text):
        # called for each processing instruction, e.g. <?instruction>
        # Reconstruct original processing instruction.
        self.pieces.append('<?%(text)s>' % locals())

def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.normalize_attrs(self, attrs)

Definition at line 1443 of file feedparser.py.

    def normalize_attrs(self, attrs):
        # utility method to be called by descendants
        attrs = [(k.lower(), v) for k, v in attrs]
        attrs = [(k, k in ('rel', 'type') and v.lower() or v) for k, v in attrs]
        return attrs
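
The effect of the normalization is easiest to see on a concrete attribute list. The sketch below assumes Python 3 and rewrites the method as a free function; the "and/or" idiom of the original is replaced by a conditional expression, which is intended to behave the same way for string values.

```python
def normalize_attrs(attrs):
    """Standalone sketch of normalize_attrs for a list of (name, value) pairs."""
    # Attribute names are always lowercased.
    attrs = [(k.lower(), v) for k, v in attrs]
    # Values are lowercased only for 'rel' and 'type'; others are left alone.
    return [(k, v.lower() if k in ('rel', 'type') else v) for k, v in attrs]


print(normalize_attrs([('REL', 'Alternate'),
                       ('Type', 'Application/Atom+XML'),
                       ('HREF', 'http://Example.org/Feed')]))
```

Values such as URLs keep their case, since only the rel and type attributes are treated as case-insensitive here.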


def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.output(self)

Return processed HTML as a single string

Definition at line 1524 of file feedparser.py.

    def output(self):
        '''Return processed HTML as a single string'''
        return ''.join([str(p) for p in self.pieces])
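
The handlers on this page all follow one pattern: each parser event appends a reconstructed piece of markup to pieces, and output() joins them. Since sgmllib is not available in Python 3, the following is only an analogue built on html.parser.HTMLParser; the class name PieceCollector is hypothetical, and it does not reproduce the preprocessing in feed() or the special handling of elements_no_end_tag.

```python
from html.parser import HTMLParser


class PieceCollector(HTMLParser):
    """Hypothetical Python 3 analogue that rebuilds markup from parser events."""

    def __init__(self):
        # convert_charrefs=False keeps the charref/entityref callbacks active.
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        strattrs = ''.join(' %s="%s"' % (k, v) for k, v in attrs)
        self.pieces.append('<%s%s>' % (tag, strattrs))

    def handle_endtag(self, tag):
        self.pieces.append('</%s>' % tag)

    def handle_charref(self, name):
        # For '&#160;', name is '160'.
        self.pieces.append('&#%s;' % name)

    def handle_entityref(self, name):
        # For '&copy;', name is 'copy'.
        self.pieces.append('&%s;' % name)

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_comment(self, data):
        self.pieces.append('<!--%s-->' % data)

    def output(self):
        """Return the collected pieces as a single string."""
        return ''.join(self.pieces)


parser = PieceCollector()
parser.feed('<p class="x">&copy; 2008&#160;<!-- note --></p>')
parser.close()
print(parser.output())
```

Subclasses such as a sanitizer can then override individual handlers to drop or rewrite pieces, which is the role the "Reimplemented in" notes on this page point to.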


def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.reset(self)

Reimplemented in plone.app.portlets.portlets.feedparser._HTMLSanitizer.

Definition at line 1422 of file feedparser.py.

    def reset(self):
        self.pieces = []
        sgmllib.SGMLParser.reset(self)


def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.unknown_endtag(self, tag)

Reimplemented in plone.app.portlets.portlets.feedparser._HTMLSanitizer.

Definition at line 1466 of file feedparser.py.

    def unknown_endtag(self, tag):
        # called for each end tag, e.g. for </pre>, tag will be 'pre'
        # Reconstruct the original end tag.
        if tag not in self.elements_no_end_tag:
            self.pieces.append("</%(tag)s>" % locals())

def plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.unknown_starttag(self, tag, attrs)

Reimplemented in plone.app.portlets.portlets.feedparser._HTMLSanitizer, and plone.app.portlets.portlets.feedparser._RelativeURIResolver.

Definition at line 1449 of file feedparser.py.

    def unknown_starttag(self, tag, attrs):
        # called for each start tag
        # attrs is a list of (attr, value) tuples
        # e.g. for <pre class='screen'>, tag='pre', attrs=[('class', 'screen')]
        if _debug: sys.stderr.write('_BaseHTMLProcessor, unknown_starttag, tag=%s\n' % tag)
        uattrs = []
        # thanks to Kevin Marks for this breathtaking hack to deal with (valid) high-bit attribute values in UTF-8 feeds
        for key, value in attrs:
            if type(value) != type(u''):
                value = unicode(value, self.encoding)
            uattrs.append((unicode(key, self.encoding), value))
        strattrs = u''.join([u' %s="%s"' % (key, value) for key, value in uattrs]).encode(self.encoding)
        if tag in self.elements_no_end_tag:
            self.pieces.append('<%(tag)s%(strattrs)s />' % locals())
        else:
            self.pieces.append('<%(tag)s%(strattrs)s>' % locals())
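
The start-tag reconstruction can be sketched without the Python 2 byte/unicode conversions. The example below assumes Python 3, where attribute names and values are already text; render_starttag and ELEMENTS_NO_END_TAG are hypothetical names, and the tag list copies the initial value of elements_no_end_tag shown on this page. As in the listing above, attribute values are written out without any escaping.

```python
# Copy of the elements_no_end_tag initial value documented for this class.
ELEMENTS_NO_END_TAG = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
                       'img', 'input', 'isindex', 'link', 'meta', 'param']


def render_starttag(tag, attrs):
    """Hypothetical helper: rebuild a start tag from (name, value) pairs."""
    strattrs = ''.join(' %s="%s"' % (key, value) for key, value in attrs)
    if tag in ELEMENTS_NO_END_TAG:
        # Void elements are emitted in self-closing form.
        return '<%s%s />' % (tag, strattrs)
    return '<%s%s>' % (tag, strattrs)


print(render_starttag('pre', [('class', 'screen')]))
print(render_starttag('img', [('src', 'a.png'), ('alt', '')]))
```

Because void elements are closed here, unknown_endtag skips the same tags so that no stray end tag is emitted for them.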



Member Data Documentation

plone.app.portlets.portlets.feedparser._BaseHTMLProcessor._new_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9:]*\s*').match [static, private]

Definition at line 1506 of file feedparser.py.

list plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.elements_no_end_tag [static]

Initial value:
['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
      'img', 'input', 'isindex', 'link', 'meta', 'param']

Definition at line 1414 of file feedparser.py.

plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.encoding

Definition at line 1418 of file feedparser.py.

plone.app.portlets.portlets.feedparser._BaseHTMLProcessor.pieces

Definition at line 1423 of file feedparser.py.


The documentation for this class was generated from the following file: