
plone3  3.1.7
kss.core.BeautifulSoup.BeautifulStoneSoup Class Reference


Public Member Functions

def __init__
def __getattr__
def isSelfClosingTag
def reset
def popTag
def pushTag
def endData
def unknown_starttag
def unknown_endtag
def handle_data
def handle_pi
def handle_comment
def handle_charref
def handle_entityref
def handle_decl
def parse_declaration
def setup
def replaceWith
def extract
def insert
def findNext
def findAllNext
def findNextSibling
def findNextSiblings
def findPrevious
def findAllPrevious
def findPreviousSibling
def findPreviousSiblings
def findParent
def findParents
def nextGenerator
def nextSiblingGenerator
def previousGenerator
def previousSiblingGenerator
def parentGenerator
def substituteEncoding
def toEncoding

Public Attributes

 parseOnlyThese
 fromEncoding
 smartQuotesTo
 convertHTMLEntities
 convertXMLEntities
 HTML_ENTITIES
 XML_ENTITIES
 instanceSelfClosingTags
 markup
 markupMassage
 originalEncoding
 hidden
 currentData
 currentTag
 tagStack
 quoteStack
 previous
 literal
 parent
 next
 previousSibling
 nextSibling

Static Public Attributes

dictionary SELF_CLOSING_TAGS = {}
dictionary NESTABLE_TAGS = {}
dictionary RESET_NESTING_TAGS = {}
dictionary QUOTE_TAGS = {}
list MARKUP_MASSAGE
string ROOT_TAG_NAME = u'[document]'
string HTML_ENTITIES = "html"
string XML_ENTITIES = "xml"
list ALL_ENTITIES = [HTML_ENTITIES, XML_ENTITIES]
 fetchNextSiblings = findNextSiblings
 fetchPrevious = findAllPrevious
 fetchPreviousSiblings = findPreviousSiblings
 fetchParents = findParents

Private Member Functions

def _feed
def _popToTag
def _smartPop
def _toStringSubclass

Detailed Description

This class contains the basic parser and search code. It defines
a parser that knows nothing about tag behavior except for the
following:
   
  You can't close a tag without closing all the tags it encloses.
  That is, "<foo><bar></foo>" actually means
  "<foo><bar></bar></foo>".

[Another possible explanation is "<foo><bar /></foo>", but since
this class defines no SELF_CLOSING_TAGS, it will never use that
explanation.]

This class is useful for parsing XML or made-up markup languages,
or when BeautifulSoup makes an assumption counter to what you were
expecting.
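
The closing rule can be illustrated outside the parser. The sketch below is not this class's implementation: `close_implicitly` and its token tuples are hypothetical names invented for the illustration, and it is written for Python 3 rather than the Python 2 of BeautifulSoup.py. It keeps a stack of open tag names and, when an end tag arrives, first closes everything that was opened inside that tag.

```python
def close_implicitly(tokens):
    """Return markup with the implied end tags written out.

    `tokens` is a list of ("start", name), ("end", name) or ("data", text)
    tuples. This token format is an assumption made for the sketch.
    """
    stack = []  # names of the currently open tags, innermost last
    out = []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            out.append("<%s>" % value)
        elif kind == "end":
            if value not in stack:
                continue  # stray end tag: this sketch simply ignores it
            # Close everything opened inside the tag, then the tag itself.
            while stack:
                name = stack.pop()
                out.append("</%s>" % name)
                if name == value:
                    break
        else:
            out.append(value)
    # At end of input, close whatever is still open.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)


print(close_implicitly([("start", "foo"), ("start", "bar"), ("end", "foo")]))
# prints <foo><bar></bar></foo>
```

Because no tag is ever treated as self-closing here, "<bar>" is always closed with an explicit "</bar>", which matches the interpretation described above.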

Definition at line 885 of file BeautifulSoup.py.


Constructor & Destructor Documentation

def kss.core.BeautifulSoup.BeautifulStoneSoup.__init__ (   self,
  markup = "",
  parseOnlyThese = None,
  fromEncoding = None,
  markupMassage = True,
  smartQuotesTo = XML_ENTITIES,
  convertEntities = None,
  selfClosingTags = None 
)
The Soup object is initialized as the 'root tag', and the
provided markup (which can be a string or a file-like object)
is fed into the underlying parser. 

sgmllib will process most bad HTML, and the BeautifulSoup
class has some tricks for dealing with some HTML that kills
sgmllib, but Beautiful Soup can nonetheless choke or lose data
if your data uses self-closing tags or declarations
incorrectly.

By default, Beautiful Soup uses regexes to sanitize input,
avoiding the vast majority of these problems. If the problems
don't apply to you, pass in False for markupMassage, and
you'll get better performance.

The default parser massage techniques fix the two most common
instances of invalid HTML that choke sgmllib:

 <br/> (No space between name of closing tag and tag close)
 <! --Comment--> (Extraneous whitespace in declaration)

You can pass in a custom list of (RE object, replace method)
tuples to get Beautiful Soup to scrub your input the way you
want.
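
A massage list of this shape might look like the following sketch. The two patterns are written for this illustration (in Python 3) to mimic the two fixes named above; they are assumptions, not a copy of MARKUP_MASSAGE, and `EXAMPLE_MASSAGE` and `massage` are hypothetical names.

```python
import re

# Hypothetical massage list: (compiled regex, replacement function) pairs.
EXAMPLE_MASSAGE = [
    # "<br/>" -> "<br />": put a space before the closing "/>".
    (re.compile(r"(<[^<>\s/]+)/>"), lambda m: m.group(1) + " />"),
    # "<! --Comment-->" -> "<!--Comment-->": drop whitespace after "<!".
    (re.compile(r"<!\s+([^<>]*)>"), lambda m: "<!" + m.group(1) + ">"),
]


def massage(markup, fixes=EXAMPLE_MASSAGE):
    """Apply each (pattern, replacement) pair in order, as _feed does."""
    for pattern, replacement in fixes:
        markup = pattern.sub(replacement, markup)
    return markup


print(massage("<br/>"))            # prints <br />
print(massage("<! --Comment-->"))  # prints <!--Comment-->
```

Per the description above, a list like this would be passed as the markupMassage argument in place of True.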

Definition at line 922 of file BeautifulSoup.py.

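00921     def __init__(self, markup="", parseOnlyThese=None, fromEncoding=None,
00922                  markupMassage=True, smartQuotesTo=XML_ENTITIES,
00923                  convertEntities=None, selfClosingTags=None):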
00924         """The Soup object is initialized as the 'root tag', and the
00925         provided markup (which can be a string or a file-like object)
00926         is fed into the underlying parser. 
00927 
00928         sgmllib will process most bad HTML, and the BeautifulSoup
00929         class has some tricks for dealing with some HTML that kills
00930         sgmllib, but Beautiful Soup can nonetheless choke or lose data
00931         if your data uses self-closing tags or declarations
00932         incorrectly.
00933 
00934         By default, Beautiful Soup uses regexes to sanitize input,
00935         avoiding the vast majority of these problems. If the problems
00936         don't apply to you, pass in False for markupMassage, and
00937         you'll get better performance.
00938 
00939         The default parser massage techniques fix the two most common
00940         instances of invalid HTML that choke sgmllib:
00941 
00942          <br/> (No space between name of closing tag and tag close)
00943          <! --Comment--> (Extraneous whitespace in declaration)
00944 
00945         You can pass in a custom list of (RE object, replace method)
00946         tuples to get Beautiful Soup to scrub your input the way you
00947         want."""
00948 
00949         self.parseOnlyThese = parseOnlyThese
00950         self.fromEncoding = fromEncoding
00951         self.smartQuotesTo = smartQuotesTo
00952 
00953         if convertEntities:
00954             # It doesn't make sense to convert encoded characters to
00955             # entities even while you're converting entities to Unicode.
00956             # Just convert it all to Unicode.
00957             self.smartQuotesTo = None
00958 
00959         if isList(convertEntities):
00960             self.convertHTMLEntities = self.HTML_ENTITIES in convertEntities
00961             self.convertXMLEntities = self.XML_ENTITIES in convertEntities
00962         else:
00963             self.convertHTMLEntities = self.HTML_ENTITIES == convertEntities
00964             self.convertXMLEntities = self.XML_ENTITIES == convertEntities
00965 
00966         self.instanceSelfClosingTags = buildTagMap(None, selfClosingTags)
00967         SGMLParser.__init__(self)
00968             
00969         if hasattr(markup, 'read'):        # It's a file-type object.
00970             markup = markup.read()
00971         self.markup = markup
00972         self.markupMassage = markupMassage
00973         try:
00974             self._feed()
00975         except StopParsing:
00976             pass
00977         self.markup = None                 # The markup can now be GCed


Member Function Documentation

def kss.core.BeautifulSoup.BeautifulStoneSoup.__getattr__ (   self,
  methodName 
)
This method routes method call requests to either the SGMLParser
superclass or the Tag superclass, depending on the method name.
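
The routing rule is a prefix test on the attribute name. As a minimal Python 3 sketch (the function `route` is hypothetical and only reports which superclass would be consulted, it does not perform the lookup):

```python
def route(method_name):
    """Say which superclass would receive a lookup for `method_name`."""
    # Parser callbacks such as start_a, end_a and do_br go to SGMLParser.
    if method_name.startswith(("start_", "end_", "do_")):
        return "SGMLParser"
    # Anything else that is not a double-underscore name goes to Tag.
    if not method_name.startswith("__"):
        return "Tag"
    # Double-underscore names are refused outright.
    raise AttributeError(method_name)


print(route("start_a"))  # prints SGMLParser
print(route("title"))    # prints Tag
```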

Definition at line 1005 of file BeautifulSoup.py.

01005 
01006     def __getattr__(self, methodName):
01007         """This method routes method call requests to either the SGMLParser
01008         superclass or the Tag superclass, depending on the method name."""
01009         #print "__getattr__ called on %s.%s" % (self.__class__, methodName)
01010 
01011         if methodName.find('start_') == 0 or methodName.find('end_') == 0 \
01012                or methodName.find('do_') == 0:
01013             return SGMLParser.__getattr__(self, methodName)
01014         elif methodName.find('__') != 0:
01015             return Tag.__getattr__(self, methodName)
01016         else:
01017             raise AttributeError

def kss.core.BeautifulSoup.BeautifulStoneSoup._feed (   self,
  inDocumentEncoding = None 
) [private]

Definition at line 978 of file BeautifulSoup.py.

00978 
00979     def _feed(self, inDocumentEncoding=None):
00980         # Convert the document to Unicode.
00981         markup = self.markup
00982         if isinstance(markup, unicode):
00983             if not hasattr(self, 'originalEncoding'):
00984                 self.originalEncoding = None
00985         else:
00986             dammit = UnicodeDammit\
00987                      (markup, [self.fromEncoding, inDocumentEncoding],
00988                       smartQuotesTo=self.smartQuotesTo)
00989             markup = dammit.unicode
00990             self.originalEncoding = dammit.originalEncoding
00991         if markup:
00992             if self.markupMassage:
00993                 if not isList(self.markupMassage):
00994                     self.markupMassage = self.MARKUP_MASSAGE            
00995                 for fix, m in self.markupMassage:
00996                     markup = fix.sub(m, markup)
00997         self.reset()
00998 
00999         SGMLParser.feed(self, markup or "")
01000         SGMLParser.close(self)
01001         # Close out any unfinished strings and close all the open tags.
01002         self.endData()
01003         while self.currentTag.name != self.ROOT_TAG_NAME:
01004             self.popTag()

def kss.core.BeautifulSoup.BeautifulStoneSoup._popToTag (   self,
  name,
  inclusivePop = True 
) [private]
Pops the tag stack up to and including the most recent
instance of the given tag. If inclusivePop is false, pops the tag
stack up to but *not* including the most recent instance of
the given tag.
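
The inclusive and exclusive cases can be seen on a plain list of tag names. This is a Python 3 sketch under stated assumptions: `pop_to_tag` is a hypothetical stand-in that works on names rather than Tag objects, and it returns the popped names instead of the current tag.

```python
def pop_to_tag(stack, name, inclusive=True):
    """Pop `stack` (tag names, root first) back to the latest `name`.

    Returns the popped names, innermost first. Index 0 is treated as the
    root and is never searched, mirroring the loop bounds in the listing.
    """
    num_pops = 0
    for i in range(len(stack) - 1, 0, -1):
        if stack[i] == name:
            num_pops = len(stack) - i
            break
    if not inclusive:
        num_pops -= 1
    # max() guards the "not found, not inclusive" case, where num_pops is -1.
    return [stack.pop() for _ in range(max(num_pops, 0))]


stack = ["[document]", "p", "b", "i"]
print(pop_to_tag(stack, "p"))  # prints ['i', 'b', 'p']
print(stack)                   # prints ['[document]']
```

With inclusive=False the same call would stop one pop earlier and leave "p" on the stack.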

Definition at line 1078 of file BeautifulSoup.py.

01078 
01079     def _popToTag(self, name, inclusivePop=True):
01080         """Pops the tag stack up to and including the most recent
01081         instance of the given tag. If inclusivePop is false, pops the tag
01082         stack up to but *not* including the most recent instance of
01083         the given tag."""
01084         #print "Popping to %s" % name
01085         if name == self.ROOT_TAG_NAME:
01086             return            
01087 
01088         numPops = 0
01089         mostRecentTag = None
01090         for i in range(len(self.tagStack)-1, 0, -1):
01091             if name == self.tagStack[i].name:
01092                 numPops = len(self.tagStack)-i
01093                 break
01094         if not inclusivePop:
01095             numPops = numPops - 1
01096 
01097         for i in range(0, numPops):
01098             mostRecentTag = self.popTag()
01099         return mostRecentTag    

def kss.core.BeautifulSoup.BeautifulStoneSoup._smartPop (   self,
  name 
) [private]
We need to pop up to the previous tag of this type, unless
one of this tag's nesting reset triggers comes between this
tag and the previous tag of this type, OR unless this tag is a
generic nesting trigger and another generic nesting trigger
comes between this tag and the previous tag of this type.

Examples:
 <p>Foo<b>Bar<p> should pop to 'p', not 'b'.
 <p>Foo<table>Bar<p> should pop to 'table', not 'p'.
 <p>Foo<table><tr>Bar<p> should pop to 'tr', not 'p'.

 <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
 <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
 <td><tr><td> *<td>* should pop to 'tr', not the first 'td'
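
The examples can be checked against a small model of the decision. The sketch below (Python 3) follows the structure of the loop in the listing but works on tag names only. The two tables are illustrative assumptions chosen to cover the examples; this class itself leaves NESTABLE_TAGS and RESET_NESTING_TAGS empty, and `smart_pop_target` is a hypothetical name.

```python
# Assumed tables for the sketch only.
NESTABLE = {"li": ["ul", "ol"], "tr": ["table"], "td": ["tr"]}
RESET_NESTING = {"p", "table", "tr", "td", "ul", "ol", "li"}


def smart_pop_target(stack, name):
    """Return (pop_to, inclusive) for a new `name` tag, or (None, True)."""
    triggers = NESTABLE.get(name)
    is_nestable = triggers is not None
    is_reset = name in RESET_NESTING
    for open_name in reversed(stack[1:]):  # innermost first, root skipped
        if open_name == name and not is_nestable:
            # Non-nestable tag: pop back to, and including, its last use.
            return name, True
        if (triggers is not None and open_name in triggers) or (
                triggers is None and is_reset
                and open_name in RESET_NESTING):
            # A reset trigger sits in between: pop up to but not past it.
            return open_name, False
    return None, True


print(smart_pop_target(["[document]", "p", "b"], "p"))
# prints ('p', True)
print(smart_pop_target(["[document]", "li", "ul", "li"], "li"))
# prints ('ul', False)
```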

Definition at line 1100 of file BeautifulSoup.py.

01100 
01101     def _smartPop(self, name):
01102 
01103         """We need to pop up to the previous tag of this type, unless
01104         one of this tag's nesting reset triggers comes between this
01105         tag and the previous tag of this type, OR unless this tag is a
01106         generic nesting trigger and another generic nesting trigger
01107         comes between this tag and the previous tag of this type.
01108 
01109         Examples:
01110          <p>Foo<b>Bar<p> should pop to 'p', not 'b'.
01111          <p>Foo<table>Bar<p> should pop to 'table', not 'p'.
01112          <p>Foo<table><tr>Bar<p> should pop to 'tr', not 'p'.
01113          <p>Foo<b>Bar<p> should pop to 'p', not 'b'.
01114 
01115          <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
01116          <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
01117          <td><tr><td> *<td>* should pop to 'tr', not the first 'td'
01118         """
01119 
01120         nestingResetTriggers = self.NESTABLE_TAGS.get(name)
01121         isNestable = nestingResetTriggers != None
01122         isResetNesting = self.RESET_NESTING_TAGS.has_key(name)
01123         popTo = None
01124         inclusive = True
01125         for i in range(len(self.tagStack)-1, 0, -1):
01126             p = self.tagStack[i]
01127             if (not p or p.name == name) and not isNestable:
01128                 #Non-nestable tags get popped to the top or to their
01129                 #last occurrence.
01130                 popTo = name
01131                 break
01132             if (nestingResetTriggers != None
01133                 and p.name in nestingResetTriggers) \
01134                 or (nestingResetTriggers == None and isResetNesting
01135                     and self.RESET_NESTING_TAGS.has_key(p.name)):
01136                 
01137                 #If we encounter one of the nesting reset triggers
01138                 #peculiar to this tag, or we encounter another tag
01139                 #that causes nesting to reset, pop up to but not
01140                 #including that tag.
01141                 popTo = p.name
01142                 inclusive = False
01143                 break
01144             p = p.parent
01145         if popTo:
01146             self._popToTag(popTo, inclusive)

def kss.core.BeautifulSoup.BeautifulStoneSoup._toStringSubclass (   self,
  text,
  subclass 
) [private]
Adds a certain piece of text to the tree as a NavigableString
subclass.

Definition at line 1200 of file BeautifulSoup.py.

01200 
01201     def _toStringSubclass(self, text, subclass):
01202         """Adds a certain piece of text to the tree as a NavigableString
01203         subclass."""
01204         self.endData()
01205         self.handle_data(text)
01206         self.endData(subclass)

def kss.core.BeautifulSoup.BeautifulStoneSoup.endData (   self,
  containerClass = NavigableString 
)

Definition at line 1055 of file BeautifulSoup.py.

01055 
01056     def endData(self, containerClass=NavigableString):
01057         if self.currentData:
01058             currentData = ''.join(self.currentData)
01059             if currentData.endswith('<') and self.convertHTMLEntities:
01060                 currentData = currentData[:-1] + '&lt;'
01061             if not currentData.strip():
01062                 if '\n' in currentData:
01063                     currentData = '\n'
01064                 else:
01065                     currentData = ' '
01066             self.currentData = []
01067             if self.parseOnlyThese and len(self.tagStack) <= 1 and \
01068                    (not self.parseOnlyThese.text or \
01069                     not self.parseOnlyThese.search(currentData)):
01070                 return
01071             o = containerClass(currentData)
01072             o.setup(self.currentTag, self.previous)
01073             if self.previous:
01074                 self.previous.next = o
01075             self.previous = o
01076             self.currentTag.contents.append(o)
01077 
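
The whitespace handling in this listing can be isolated as follows. This is a Python 3 sketch, not the method itself: `normalize_text` is a hypothetical helper that assumes a non-empty buffer and leaves out the entity and parseOnlyThese checks.

```python
def normalize_text(chunks):
    """Join buffered chunks; shrink whitespace-only text to one character."""
    text = "".join(chunks)
    if not text.strip():
        # Keep a single newline if there was one, otherwise a single space.
        text = "\n" if "\n" in text else " "
    return text


print(repr(normalize_text(["  ", "\n", "  "])))  # prints '\n'
print(repr(normalize_text(["   "])))             # prints ' '
print(repr(normalize_text([" a", "b "])))        # prints ' ab '
```

Text that contains any non-whitespace character is stored unchanged, including its leading and trailing spaces.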

def kss.core.BeautifulSoup.PageElement.extract (   self) [inherited]
Destructively rips this element out of the tree.

Definition at line 102 of file BeautifulSoup.py.

00102 
00103     def extract(self):
00104         """Destructively rips this element out of the tree."""        
00105         if self.parent:
00106             try:
00107                 self.parent.contents.remove(self)
00108             except ValueError:
00109                 pass
00110 
00111         #Find the two elements that would be next to each other if
00112         #this element (and any children) hadn't been parsed. Connect
00113         #the two.        
00114         lastChild = self._lastRecursiveChild()
00115         nextElement = lastChild.next
00116 
00117         if self.previous:
00118             self.previous.next = nextElement
00119         if nextElement:
00120             nextElement.previous = self.previous
00121         self.previous = None
00122         lastChild.next = None
00123 
00124         self.parent = None        
00125         if self.previousSibling:
00126             self.previousSibling.nextSibling = self.nextSibling
00127         if self.nextSibling:
00128             self.nextSibling.previousSibling = self.previousSibling
00129         self.previousSibling = self.nextSibling = None       
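
The pointer surgery above is easier to follow on a tiny tree. The following Python 3 sketch is an assumption-laden model, not the library's classes: `Node`, `link` and `last_recursive_child` are hypothetical, and `last_recursive_child` is only a guess at what _lastRecursiveChild does, since that method is not shown in this section.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.contents = []
        self.parent = None
        self.next = self.previous = None  # document order
        self.next_sibling = self.previous_sibling = None


def link(root):
    """Set parent, sibling and document-order pointers from `contents`."""
    order = []

    def walk(node):
        order.append(node)
        for i, child in enumerate(node.contents):
            child.parent = node
            child.previous_sibling = node.contents[i - 1] if i else None
            child.next_sibling = (node.contents[i + 1]
                                  if i + 1 < len(node.contents) else None)
            walk(child)

    walk(root)
    for before, after in zip(order, order[1:]):
        before.next, after.previous = after, before


def last_recursive_child(node):
    """Assumed behaviour: the last node, in document order, under `node`."""
    while node.contents:
        node = node.contents[-1]
    return node


def extract(node):
    """Detach `node` and its subtree, reconnecting its neighbours."""
    if node.parent:
        try:
            node.parent.contents.remove(node)
        except ValueError:
            pass
    last = last_recursive_child(node)
    after = last.next
    if node.previous:
        node.previous.next = after
    if after:
        after.previous = node.previous
    node.previous = None
    last.next = None
    node.parent = None
    if node.previous_sibling:
        node.previous_sibling.next_sibling = node.next_sibling
    if node.next_sibling:
        node.next_sibling.previous_sibling = node.previous_sibling
    node.previous_sibling = node.next_sibling = None
    return node


root, a, b, b1, c = (Node(n) for n in ("root", "a", "b", "b1", "c"))
root.contents = [a, b, c]
b.contents = [b1]
link(root)
extract(b)
print(a.next.name, c.previous.name)  # prints c a
```

The extracted subtree keeps its own internal links (b still contains b1), while a and c become neighbours both as siblings and in document order.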

def kss.core.BeautifulSoup.PageElement.findAllNext (   self,
  name = None,
  attrs = {},
  text = None,
  limit = None,
  kwargs 
) [inherited]
Returns all items that match the given criteria and appear
after this Tag in the document.

Definition at line 203 of file BeautifulSoup.py.

00203     def findAllNext(self, name=None, attrs={}, text=None, limit=None,
00204                     **kwargs):
00205         """Returns all items that match the given criteria and appear
00206         after this Tag in the document."""
00207         return self._findAll(name, attrs, text, limit, self.nextGenerator, **kwargs)

def kss.core.BeautifulSoup.PageElement.findAllPrevious (   self,
  name = None,
  attrs = {},
  text = None,
  limit = None,
  kwargs 
) [inherited]
Returns all items that match the given criteria and appear
before this Tag in the document.

Definition at line 228 of file BeautifulSoup.py.

00228     def findAllPrevious(self, name=None, attrs={}, text=None, limit=None,
00229                         **kwargs):
00230         """Returns all items that match the given criteria and appear
00231         before this Tag in the document."""
00232         return self._findAll(name, attrs, text, limit, self.previousGenerator,
00233                              **kwargs)

def kss.core.BeautifulSoup.PageElement.findNext (   self,
  name = None,
  attrs = {},
  text = None,
  kwargs 
) [inherited]
Returns the first item that matches the given criteria and
appears after this Tag in the document.

Definition at line 197 of file BeautifulSoup.py.

00197 
00198     def findNext(self, name=None, attrs={}, text=None, **kwargs):
00199         """Returns the first item that matches the given criteria and
00200         appears after this Tag in the document."""
00201         return self._findOne(self.findAllNext, name, attrs, text, **kwargs)

def kss.core.BeautifulSoup.PageElement.findNextSibling (   self,
  name = None,
  attrs = {},
  text = None,
  kwargs 
) [inherited]
Returns the closest sibling to this Tag that matches the
given criteria and appears after this Tag in the document.

Definition at line 208 of file BeautifulSoup.py.

00208 
00209     def findNextSibling(self, name=None, attrs={}, text=None, **kwargs):
00210         """Returns the closest sibling to this Tag that matches the
00211         given criteria and appears after this Tag in the document."""
00212         return self._findOne(self.findNextSiblings, name, attrs, text,
00213                              **kwargs)

def kss.core.BeautifulSoup.PageElement.findNextSiblings (   self,
  name = None,
  attrs = {},
  text = None,
  limit = None,
  kwargs 
) [inherited]
Returns the siblings of this Tag that match the given
criteria and appear after this Tag in the document.

Definition at line 215 of file BeautifulSoup.py.

00215     def findNextSiblings(self, name=None, attrs={}, text=None, limit=None,
00216                          **kwargs):
00217         """Returns the siblings of this Tag that match the given
00218         criteria and appear after this Tag in the document."""
00219         return self._findAll(name, attrs, text, limit,
00220                              self.nextSiblingGenerator, **kwargs)

def kss.core.BeautifulSoup.PageElement.findParent (   self,
  name = None,
  attrs = {},
  kwargs 
) [inherited]
Returns the closest parent of this Tag that matches the given
criteria.

Definition at line 249 of file BeautifulSoup.py.

00249 
00250     def findParent(self, name=None, attrs={}, **kwargs):
00251         """Returns the closest parent of this Tag that matches the given
00252         criteria."""
00253         # NOTE: We can't use _findOne because findParents takes a different
00254         # set of arguments.
00255         r = None
00256         l = self.findParents(name, attrs, 1)
00257         if l:
00258             r = l[0]
00259         return r

def kss.core.BeautifulSoup.PageElement.findParents (   self,
  name = None,
  attrs = {},
  limit = None,
  kwargs 
) [inherited]
Returns the parents of this Tag that match the given
criteria.

Definition at line 260 of file BeautifulSoup.py.

00260 
00261     def findParents(self, name=None, attrs={}, limit=None, **kwargs):
00262         """Returns the parents of this Tag that match the given
00263         criteria."""
00264 
00265         return self._findAll(name, attrs, None, limit, self.parentGenerator,
00266                              **kwargs)

def kss.core.BeautifulSoup.PageElement.findPrevious (   self,
  name = None,
  attrs = {},
  text = None,
  kwargs 
) [inherited]
Returns the first item that matches the given criteria and
appears before this Tag in the document.

Definition at line 222 of file BeautifulSoup.py.

00222 
00223     def findPrevious(self, name=None, attrs={}, text=None, **kwargs):
00224         """Returns the first item that matches the given criteria and
00225         appears before this Tag in the document."""
00226         return self._findOne(self.findAllPrevious, name, attrs, text, **kwargs)

def kss.core.BeautifulSoup.PageElement.findPreviousSibling (   self,
  name = None,
  attrs = {},
  text = None,
  kwargs 
) [inherited]
Returns the closest sibling to this Tag that matches the
given criteria and appears before this Tag in the document.

Definition at line 235 of file BeautifulSoup.py.

00235 
00236     def findPreviousSibling(self, name=None, attrs={}, text=None, **kwargs):
00237         """Returns the closest sibling to this Tag that matches the
00238         given criteria and appears before this Tag in the document."""
00239         return self._findOne(self.findPreviousSiblings, name, attrs, text,
00240                              **kwargs)

def kss.core.BeautifulSoup.PageElement.findPreviousSiblings (   self,
  name = None,
  attrs = {},
  text = None,
  limit = None,
  kwargs 
) [inherited]
Returns the siblings of this Tag that match the given
criteria and appear before this Tag in the document.

Definition at line 242 of file BeautifulSoup.py.

00242     def findPreviousSiblings(self, name=None, attrs={}, text=None,
00243                              limit=None, **kwargs):
00244         """Returns the siblings of this Tag that match the given
00245         criteria and appear before this Tag in the document."""
00246         return self._findAll(name, attrs, text, limit,
00247                              self.previousSiblingGenerator, **kwargs)

def kss.core.BeautifulSoup.BeautifulStoneSoup.handle_charref (   self,
  ref 
)
Handle character references as data.

Definition at line 1219 of file BeautifulSoup.py.

01219 
01220     def handle_charref(self, ref):
01221         "Handle character references as data."
01222         if ref[0] == 'x':
01223             data = unichr(int(ref[1:],16))
01224         else:
01225             data = unichr(int(ref))
01226         
01227         if u'\x80' <= data <= u'\x9F':
01228             data = UnicodeDammit.subMSChar(chr(ord(data)), self.smartQuotesTo)
01229         elif not self.convertHTMLEntities and not self.convertXMLEntities:
01230             data = '&#%s;' % ref
01231 
01232         self.handle_data(data)
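
The numeric decoding step can be tried in isolation. This Python 3 sketch uses `chr` where the Python 2 listing uses `unichr`; `decode_charref` is a hypothetical helper and it leaves out the Windows-1252 substitution and the "keep as reference" branch.

```python
def decode_charref(ref):
    """Turn the body of "&#...;" into the character it names."""
    if ref[0] == "x":
        # Hexadecimal form, e.g. "x41" from "&#x41;".
        return chr(int(ref[1:], 16))
    # Decimal form, e.g. "65" from "&#65;".
    return chr(int(ref))


print(decode_charref("65"))   # prints A
print(decode_charref("x41"))  # prints A
```

Like the listing, the sketch only recognises a lowercase "x" as the hexadecimal marker.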

def kss.core.BeautifulSoup.BeautifulStoneSoup.handle_comment (   self,
  text 
)
Handle comments as Comment objects.

Definition at line 1215 of file BeautifulSoup.py.

01215 
01216     def handle_comment(self, text):
01217         "Handle comments as Comment objects."
01218         self._toStringSubclass(text, Comment)

def kss.core.BeautifulSoup.BeautifulStoneSoup.handle_data (   self,
  data 
)

Definition at line 1190 of file BeautifulSoup.py.

01190 
01191     def handle_data(self, data):
01192         if self.convertHTMLEntities:
01193             if data[0] == '&':
01194                 data = self.BARE_AMPERSAND.sub("&amp;",data)
01195             else:
01196                 data = data.replace('&','&amp;') \
01197                            .replace('<','&lt;') \
01198                            .replace('>','&gt;')
01199         self.currentData.append(data)

def kss.core.BeautifulSoup.BeautifulStoneSoup.handle_decl (   self,
  data 
)
Handle DOCTYPEs and the like as Declaration objects.

Definition at line 1251 of file BeautifulSoup.py.

01251 
01252     def handle_decl(self, data):
01253         "Handle DOCTYPEs and the like as Declaration objects."
01254         self._toStringSubclass(data, Declaration)

def kss.core.BeautifulSoup.BeautifulStoneSoup.handle_entityref (   self,
  ref 
)
Handle entity references as data, possibly converting known
HTML entity references to the corresponding Unicode
characters.

Definition at line 1233 of file BeautifulSoup.py.

01233 
01234     def handle_entityref(self, ref):
01235         """Handle entity references as data, possibly converting known
01236         HTML entity references to the corresponding Unicode
01237         characters."""
01238         replaceWithXMLEntity = self.convertXMLEntities and \
01239                                self.XML_ENTITIES_TO_CHARS.has_key(ref)
01240         if self.convertHTMLEntities or replaceWithXMLEntity:
01241             try:
01242                 data = unichr(name2codepoint[ref])
01243             except KeyError:
01244                 if replaceWithXMLEntity:
01245                     data = self.XML_ENTITIES_TO_CHARS.get(ref)
01246                 else:
01247                     data="&amp;%s" % ref
01248         else:
01249             data = '&%s;' % ref
01250         self.handle_data(data)
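
The branches above can be exercised with the standard library's entity table. This is a Python 3 sketch: `resolve_entity` is a hypothetical function, and the contents of `XML_CHARS` are an assumption standing in for XML_ENTITIES_TO_CHARS, which is not shown in this section.

```python
from html.entities import name2codepoint

# Assumed stand-in for XML_ENTITIES_TO_CHARS.
XML_CHARS = {"apos": "'", "quot": '"', "amp": "&", "lt": "<", "gt": ">"}


def resolve_entity(ref, convert_html=False, convert_xml=False):
    """Return the text that would be handed on for the entity `ref`."""
    use_xml = convert_xml and ref in XML_CHARS
    if convert_html or use_xml:
        try:
            return chr(name2codepoint[ref])
        except KeyError:
            if use_xml:
                return XML_CHARS[ref]
            # Unknown entity: escape the ampersand, as the listing does.
            return "&amp;%s" % ref
    # No conversion requested: pass the reference through unchanged.
    return "&%s;" % ref


print(resolve_entity("eacute"))                      # prints &eacute;
print(resolve_entity("bogus", convert_html=True))    # prints &amp;bogus
```

Note that the unknown-entity branch mirrors the listing and does not restore the trailing semicolon.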
        

def kss.core.BeautifulSoup.BeautifulStoneSoup.handle_pi (   self,
  text 
)
Handle a processing instruction as a ProcessingInstruction
object, possibly one with a %SOUP-ENCODING% slot into which an
encoding will be plugged later.

Definition at line 1207 of file BeautifulSoup.py.

01207 
01208     def handle_pi(self, text):
01209         """Handle a processing instruction as a ProcessingInstruction
01210         object, possibly one with a %SOUP-ENCODING% slot into which an
01211         encoding will be plugged later."""
01212         if text[:3] == "xml":
01213             text = "xml version='1.0' encoding='%SOUP-ENCODING%'"
01214         self._toStringSubclass(text, ProcessingInstruction)

def kss.core.BeautifulSoup.PageElement.insert (   self,
  position,
  newChild 
) [inherited]

Definition at line 137 of file BeautifulSoup.py.

00137 
00138     def insert(self, position, newChild):
00139         if (isinstance(newChild, basestring)
00140             or isinstance(newChild, unicode)) \
00141             and not isinstance(newChild, NavigableString):
00142             newChild = NavigableString(newChild)        
00143 
00144         position =  min(position, len(self.contents))
00145         if hasattr(newChild, 'parent') and newChild.parent != None:
00146             # We're 'inserting' an element that's already one
00147             # of this object's children. 
00148             if newChild.parent == self:
00149                 index = self.find(newChild)
00150                 if index and index < position:
00151                     # Furthermore we're moving it further down the
00152                     # list of this object's children. That means that
00153                     # when we extract this element, our target index
00154                     # will jump down one.
00155                     position = position - 1
00156             newChild.extract()
00157             
00158         newChild.parent = self
00159         previousChild = None
00160         if position == 0:
00161             newChild.previousSibling = None
00162             newChild.previous = self
00163         else:
00164             previousChild = self.contents[position-1]
00165             newChild.previousSibling = previousChild
00166             newChild.previousSibling.nextSibling = newChild
00167             newChild.previous = previousChild._lastRecursiveChild()
00168         if newChild.previous:
00169             newChild.previous.next = newChild        
00170 
00171         newChildsLastElement = newChild._lastRecursiveChild()
00172 
00173         if position >= len(self.contents):
00174             newChild.nextSibling = None
00175             
00176             parent = self
00177             parentsNextSibling = None
00178             while not parentsNextSibling:
00179                 parentsNextSibling = parent.nextSibling
00180                 parent = parent.parent
00181                 if not parent: # This is the last element in the document.
00182                     break
00183             if parentsNextSibling:
00184                 newChildsLastElement.next = parentsNextSibling
00185             else:
00186                 newChildsLastElement.next = None
00187         else:
00188             nextChild = self.contents[position]            
00189             newChild.nextSibling = nextChild            
00190             if newChild.nextSibling:
00191                 newChild.nextSibling.previousSibling = newChild
00192             newChildsLastElement.next = nextChild
00193 
00194         if newChildsLastElement.next:
00195             newChildsLastElement.next.previous = newChildsLastElement
00196         self.contents.insert(position, newChild)

def kss.core.BeautifulSoup.BeautifulStoneSoup.isSelfClosingTag (   self,
  name 
)
Returns true iff the given string is the name of a
self-closing tag according to this parser.

Definition at line 1018 of file BeautifulSoup.py.

01018 
01019     def isSelfClosingTag(self, name):
01020         """Returns true iff the given string is the name of a
01021         self-closing tag according to this parser."""
01022         return self.SELF_CLOSING_TAGS.has_key(name) \
01023                or self.instanceSelfClosingTags.has_key(name)
            

def kss.core.BeautifulSoup.PageElement.nextGenerator (   self) [inherited]

Definition at line 302 of file BeautifulSoup.py.

00302 
00303     def nextGenerator(self):
00304         i = self
00305         while i:
00306             i = i.next
00307             yield i
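
The generators in this group share one shape: advance first, then yield. A consequence is that the starting element is never yielded and the final value produced is the terminator (None). A minimal Python 3 sketch, with a hypothetical `Item` class standing in for page elements:

```python
class Item:
    def __init__(self, name, next_item=None):
        self.name = name
        self.next = next_item


def next_generator(start):
    """Same loop as the listing: step to .next, then yield it."""
    i = start
    while i:
        i = i.next
        yield i


c = Item("c")
b = Item("b", c)
a = Item("a", b)
print([getattr(x, "name", None) for x in next_generator(a)])
# prints ['b', 'c', None]
```

Callers that iterate over such a generator therefore need to stop at, or skip, the trailing None.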

def kss.core.BeautifulSoup.PageElement.nextSiblingGenerator (   self) [inherited]

Definition at line 308 of file BeautifulSoup.py.

00308 
00309     def nextSiblingGenerator(self):
00310         i = self
00311         while i:
00312             i = i.nextSibling
00313             yield i

def kss.core.BeautifulSoup.PageElement.parentGenerator (   self) [inherited]

Definition at line 326 of file BeautifulSoup.py.

00326 
00327     def parentGenerator(self):
00328         i = self
00329         while i:
00330             i = i.parent
00331             yield i

def kss.core.BeautifulSoup.BeautifulStoneSoup.parse_declaration (   self,
  i 
)
Treat a bogus SGML declaration as raw data. Treat a CDATA
declaration as a CData object.

Definition at line 1255 of file BeautifulSoup.py.

    def parse_declaration(self, i):
        """Treat a bogus SGML declaration as raw data. Treat a CDATA
        declaration as a CData object."""
        j = None
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1:
                k = len(self.rawdata)
            data = self.rawdata[i+9:k]
            j = k+3
            self._toStringSubclass(data, CData)
        else:
            try:
                j = SGMLParser.parse_declaration(self, i)
            except SGMLParseError:
                toHandle = self.rawdata[i:]
                self.handle_data(toHandle)
                j = i + len(toHandle)
        return j
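
The CDATA branch is plain index arithmetic: skip the nine-character `<![CDATA[` prefix, take everything up to `]]>`, and resume three characters past the terminator. A standalone Python 3 sketch of that slicing is given below. The function name `split_cdata` and the sample strings are assumptions for illustration. The real method hands the payload to _toStringSubclass rather than returning it.

```python
CDATA_PREFIX = "<![CDATA["
CDATA_SUFFIX = "]]>"


def split_cdata(rawdata, i):
    """Return (payload, resume_index) for a CDATA section starting at i."""
    if rawdata[i:i + len(CDATA_PREFIX)] != CDATA_PREFIX:
        raise ValueError("no CDATA section at index %d" % i)
    k = rawdata.find(CDATA_SUFFIX, i)
    if k == -1:
        # Unterminated section: treat the rest of the input as the payload.
        k = len(rawdata)
    payload = rawdata[i + len(CDATA_PREFIX):k]
    return payload, k + len(CDATA_SUFFIX)


raw = "<a><![CDATA[x < y]]></a>"
payload, resume = split_cdata(raw, 3)
print(repr(payload), repr(raw[resume:]))

open_payload, open_resume = split_cdata("<![CDATA[abc", 0)
print(repr(open_payload), open_resume)
```

In the unterminated case the resume index lands three characters past the end of the input, mirroring `j = k+3` in the listing.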

Reimplemented in kss.core.BeautifulSoup.BeautifulSOAP.

Definition at line 1034 of file BeautifulSoup.py.

    def popTag(self):
        tag = self.tagStack.pop()
        # Tags with just one string-owning child get the child as a
        # 'string' property, so that soup.tag.string is shorthand for
        # soup.tag.contents[0]
        if len(self.currentTag.contents) == 1 and \
           isinstance(self.currentTag.contents[0], NavigableString):
            self.currentTag.string = self.currentTag.contents[0]

        #print "Pop", tag.name
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        return self.currentTag

Definition at line 314 of file BeautifulSoup.py.

    def previousGenerator(self):
        i = self
        while i:
            i = i.previous
            yield i

Definition at line 320 of file BeautifulSoup.py.

    def previousSiblingGenerator(self):
        i = self
        while i:
            i = i.previousSibling
            yield i

Definition at line 1048 of file BeautifulSoup.py.

    def pushTag(self, tag):
        #print "Push", tag.name
        if self.currentTag:
            self.currentTag.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]
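
Taken together, pushTag and popTag keep tagStack as the path of currently open tags, with currentTag always pointing at the top of the stack. A simplified Python 3 sketch of that bookkeeping follows. `Node` and `Builder` are hypothetical classes, and the `string` shortcut that popTag sets is left out.

```python
class Node:
    """Hypothetical element: a name plus a list of children."""

    def __init__(self, name):
        self.name = name
        self.contents = []


class Builder:
    def __init__(self):
        self.tagStack = []
        self.currentTag = None
        self.pushTag(Node("[document]"))  # the root is pushed first, as in reset()

    def pushTag(self, tag):
        # Attach the new tag to the innermost open tag, then open it.
        if self.currentTag is not None:
            self.currentTag.contents.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]

    def popTag(self):
        # Close the innermost tag; its parent becomes current again.
        self.tagStack.pop()
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        return self.currentTag


builder = Builder()
root = builder.currentTag
p, em = Node("p"), Node("em")
builder.pushTag(p)
builder.pushTag(em)
after_pop = builder.popTag()
print(after_pop.name, [t.name for t in builder.tagStack])
```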

def kss.core.BeautifulSoup.PageElement.replaceWith (   self,
  replaceWith 
) [inherited]

Definition at line 88 of file BeautifulSoup.py.

    def replaceWith(self, replaceWith):
        oldParent = self.parent
        myIndex = self.parent.contents.index(self)
        if hasattr(replaceWith, 'parent') and replaceWith.parent == self.parent:
            # We're replacing this element with one of its siblings.
            index = self.parent.contents.index(replaceWith)
            if index and index < myIndex:
                # Furthermore, it comes before this element. That
                # means that when we extract it, the index of this
                # element will change.
                myIndex = myIndex - 1
        self.extract()
        oldParent.insert(myIndex, replaceWith)
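
The index adjustment matters only when the replacement is an earlier sibling: removing it shifts every later element one slot to the left. The Python 3 sketch below reproduces that reasoning on a plain list. It is a simplification under stated assumptions, not the library's code path: `replace_in_list` is a hypothetical helper, and the real method delegates the removal and insertion to extract() and insert().

```python
def replace_in_list(contents, old, new):
    """Replace `old` with `new` in place; `new` may already be in the list."""
    my_index = contents.index(old)
    if new in contents:
        # `new` is a sibling. If it sits before `old`, removing it
        # moves `old` one position to the left.
        if contents.index(new) < my_index:
            my_index -= 1
        contents.remove(new)
    contents.remove(old)
    contents.insert(my_index, new)
    return contents


earlier = replace_in_list(["a", "b", "c"], old="c", new="a")
later = replace_in_list(["a", "b", "c"], old="a", new="c")
outsider = replace_in_list(["a", "b", "c"], old="b", new="x")
print(earlier, later, outsider)
```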
        

Definition at line 1024 of file BeautifulSoup.py.

    def reset(self):
        Tag.__init__(self, self, self.ROOT_TAG_NAME)
        self.hidden = 1
        SGMLParser.reset(self)
        self.currentData = []
        self.currentTag = None
        self.tagStack = []
        self.quoteStack = []
        self.pushTag(self)

def kss.core.BeautifulSoup.PageElement.setup (   self,
  parent = None,
  previous = None 
) [inherited]
Sets up the initial relations between this element and
other elements.

Definition at line 76 of file BeautifulSoup.py.

    def setup(self, parent=None, previous=None):
        """Sets up the initial relations between this element and
        other elements."""
        self.parent = parent
        self.previous = previous
        self.next = None
        self.previousSibling = None
        self.nextSibling = None
        if self.parent and self.parent.contents:
            self.previousSibling = self.parent.contents[-1]
            self.previousSibling.nextSibling = self
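
Note that setup links the new element to the parent's current last child but does not add the element to parent.contents. Appending it is left to the caller. A small Python 3 sketch of that division of labour follows, using a hypothetical `Element` class in place of PageElement.

```python
class Element:
    """Hypothetical element carrying the same link attributes as the listing."""

    def __init__(self, name):
        self.name = name
        self.contents = []
        self.parent = self.previous = self.next = None
        self.previousSibling = self.nextSibling = None

    def setup(self, parent=None, previous=None):
        self.parent = parent
        self.previous = previous
        self.next = None
        self.previousSibling = None
        self.nextSibling = None
        if self.parent is not None and self.parent.contents:
            # Link to the parent's last child in both directions.
            self.previousSibling = self.parent.contents[-1]
            self.previousSibling.nextSibling = self


root, first, second = Element("root"), Element("first"), Element("second")
first.setup(parent=root)
root.contents.append(first)      # the caller appends, not setup()
second.setup(parent=root, previous=first)
root.contents.append(second)
print(second.previousSibling.name, first.nextSibling.name)
```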

def kss.core.BeautifulSoup.PageElement.substituteEncoding (   self,
  str,
  encoding = None 
) [inherited]

Definition at line 333 of file BeautifulSoup.py.

    def substituteEncoding(self, str, encoding=None):
        encoding = encoding or "utf-8"
        return str.replace("%SOUP-ENCODING%", encoding)
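
The method is a plain placeholder substitution with "utf-8" as the fallback. A standalone Python 3 sketch is shown below. The function name `substitute_encoding` and the sample declaration string are assumptions for illustration.

```python
PLACEHOLDER = "%SOUP-ENCODING%"


def substitute_encoding(text, encoding=None):
    # A missing or empty encoding falls back to utf-8, as in the listing.
    encoding = encoding or "utf-8"
    return text.replace(PLACEHOLDER, encoding)


declaration = '<?xml version="1.0" encoding="%SOUP-ENCODING%"?>'
print(substitute_encoding(declaration))
print(substitute_encoding(declaration, "latin-1"))
```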

def kss.core.BeautifulSoup.PageElement.toEncoding (   self,
  s,
  encoding = None 
) [inherited]
Encodes an object to a string in some encoding, or to Unicode.

Definition at line 337 of file BeautifulSoup.py.

    def toEncoding(self, s, encoding=None):
        """Encodes an object to a string in some encoding, or to Unicode."""
        if isinstance(s, unicode):
            if encoding:
                s = s.encode(encoding)
        elif isinstance(s, str):
            if encoding:
                s = s.encode(encoding)
            else:
                s = unicode(s)
        else:
            if encoding:
                s = self.toEncoding(str(s), encoding)
            else:
                s = unicode(s)
        return s
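
The listing is Python 2, where `unicode` is the text type and `str` holds bytes. A rough Python 3 analogue is sketched below, with `str` as text and `bytes` as the encoded form. The function name `to_encoding` is hypothetical. Decoding byte input as ASCII is an assumption that mirrors Python 2's implicit default conversion, not something the listing states.

```python
def to_encoding(s, encoding=None):
    """Return `s` as bytes in `encoding`, or as text when no encoding is given."""
    if isinstance(s, bytes):
        # Assumption: byte input is ASCII, like Python 2's implicit conversion.
        s = s.decode("ascii")
    elif not isinstance(s, str):
        # Any other object is first turned into its text form.
        s = str(s)
    return s.encode(encoding) if encoding else s


print(to_encoding("caf\u00e9", "utf-8"))
print(to_encoding(42), to_encoding(42, "ascii"))
print(to_encoding(b"abc"))
```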

Definition at line 1177 of file BeautifulSoup.py.

    def unknown_endtag(self, name):
        #print "End tag %s" % name
        if self.quoteStack and self.quoteStack[-1] != name:
            #This is not a real end tag.
            #print "</%s> is not real!" % name
            self.currentData.append('</%s>' % name)
            return
        self.endData()
        self._popToTag(name)
        if self.quoteStack and self.quoteStack[-1] == name:
            self.quoteStack.pop()
            self.literal = (len(self.quoteStack) > 0)

def kss.core.BeautifulSoup.BeautifulStoneSoup.unknown_starttag (   self,
  name,
  attrs,
  selfClosing = 0 
)

Definition at line 1147 of file BeautifulSoup.py.

    def unknown_starttag(self, name, attrs, selfClosing=0):
        #print "Start tag %s: %s" % (name, attrs)
        if self.quoteStack:
            #This is not a real tag.
            #print "<%s> is not real!" % name
            attrs = ''.join(map(lambda(x, y): ' %s="%s"' % (x, y), attrs))
            self.currentData.append('<%s%s>' % (name, attrs))
            return
        self.endData()

        if not self.isSelfClosingTag(name) and not selfClosing:
            self._smartPop(name)

        if self.parseOnlyThese and len(self.tagStack) <= 1 \
               and (self.parseOnlyThese.text or not self.parseOnlyThese.searchTag(name, attrs)):
            return

        tag = Tag(self, name, attrs, self.currentTag, self.previous)
        if self.previous:
            self.previous.next = tag
        self.previous = tag
        self.pushTag(tag)
        if selfClosing or self.isSelfClosingTag(name):
            self.popTag()
        if name in self.QUOTE_TAGS:
            #print "Beginning quote (%s)" % name
            self.quoteStack.append(name)
            self.literal = 1
        return tag
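
Both tag handlers consult quoteStack: while a tag named in QUOTE_TAGS is open, any start tag and any non-matching end tag is written back out as literal text instead of becoming part of the tree. The Python 3 sketch below isolates that state machine. `QuoteTracker`, its `events` list, and the contents of QUOTE_TAGS are assumptions for illustration. Attribute handling and the tree construction are omitted.

```python
class QuoteTracker:
    QUOTE_TAGS = {"script": None}  # hypothetical contents

    def __init__(self):
        self.quoteStack = []
        self.events = []

    def start(self, name):
        if self.quoteStack:
            # Inside a quoted tag: the markup is kept as text.
            self.events.append(("text", "<%s>" % name))
            return
        self.events.append(("open", name))
        if name in self.QUOTE_TAGS:
            self.quoteStack.append(name)

    def end(self, name):
        if self.quoteStack and self.quoteStack[-1] != name:
            # Not the end of the quoted tag: keep it as text.
            self.events.append(("text", "</%s>" % name))
            return
        self.events.append(("close", name))
        if self.quoteStack and self.quoteStack[-1] == name:
            self.quoteStack.pop()


tracker = QuoteTracker()
tracker.start("script")
tracker.start("b")
tracker.end("b")
tracker.end("script")
tracker.start("p")
for event in tracker.events:
    print(event)
```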


Member Data Documentation

Definition at line 918 of file BeautifulSoup.py.

Definition at line 959 of file BeautifulSoup.py.

Definition at line 960 of file BeautifulSoup.py.

Definition at line 1028 of file BeautifulSoup.py.

Definition at line 1029 of file BeautifulSoup.py.

Definition at line 220 of file BeautifulSoup.py.

Definition at line 266 of file BeautifulSoup.py.

Definition at line 233 of file BeautifulSoup.py.

Definition at line 247 of file BeautifulSoup.py.

Definition at line 949 of file BeautifulSoup.py.

Definition at line 1026 of file BeautifulSoup.py.

Definition at line 916 of file BeautifulSoup.py.

Definition at line 962 of file BeautifulSoup.py.

Definition at line 965 of file BeautifulSoup.py.

Definition at line 1174 of file BeautifulSoup.py.

Definition at line 970 of file BeautifulSoup.py.

Initial value:
[(re.compile('(<[^<>]*)/>'),
                       lambda x: x.group(1) + ' />'),
                      (re.compile('<!\s+([^<>]*)>'),
                       lambda x: '<!' + x.group(1) + '>')
                      ]
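
Each entry in this initial value pairs a compiled pattern with a replacement function. The first inserts a space before `/>`, and the second removes whitespace after `<!`. A Python 3 sketch of how such pairs could be applied is shown below, assuming each pair is run in list order with `pattern.sub`. The helper name `massage` is hypothetical, and raw strings are used for the patterns.

```python
import re

MARKUP_MASSAGE = [
    # '<br/>' becomes '<br />'
    (re.compile(r'(<[^<>]*)/>'), lambda m: m.group(1) + ' />'),
    # '<! DOCTYPE ...>' becomes '<!DOCTYPE ...>'
    (re.compile(r'<!\s+([^<>]*)>'), lambda m: '<!' + m.group(1) + '>'),
]


def massage(markup):
    """Apply every (pattern, replacement) pair in list order."""
    for pattern, fix in MARKUP_MASSAGE:
        markup = pattern.sub(fix, markup)
    return markup


print(massage('<br/>'))
print(massage('<! DOCTYPE html>'))
```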

Definition at line 908 of file BeautifulSoup.py.

Definition at line 971 of file BeautifulSoup.py.

Definition at line 81 of file BeautifulSoup.py.

Definition at line 83 of file BeautifulSoup.py.

Reimplemented in kss.core.BeautifulSoup.BeautifulSoup.

Definition at line 983 of file BeautifulSoup.py.

Definition at line 79 of file BeautifulSoup.py.

Definition at line 948 of file BeautifulSoup.py.

Reimplemented from kss.core.BeautifulSoup.PageElement.

Definition at line 1074 of file BeautifulSoup.py.

Definition at line 82 of file BeautifulSoup.py.

Reimplemented in kss.core.BeautifulSoup.BeautifulSoup.

Definition at line 906 of file BeautifulSoup.py.

Definition at line 1031 of file BeautifulSoup.py.

Definition at line 914 of file BeautifulSoup.py.

Reimplemented in kss.core.BeautifulSoup.BeautifulSoup.

Definition at line 903 of file BeautifulSoup.py.

Definition at line 950 of file BeautifulSoup.py.

Definition at line 1030 of file BeautifulSoup.py.

Definition at line 917 of file BeautifulSoup.py.

Definition at line 963 of file BeautifulSoup.py.


The documentation for this class was generated from the following file: