plone3  3.1.7
kss.core.BeautifulSoup Namespace Reference

Classes

class  PageElement
class  NavigableString
class  CData
class  ProcessingInstruction
class  Comment
class  Declaration
class  Tag
class  SoupStrainer
class  ResultSet
class  BeautifulStoneSoup
class  BeautifulSoup
class  StopParsing
class  ICantBelieveItsBeautifulSoup
class  MinimalSoup
class  BeautifulSOAP
class  RobustXMLParser
class  RobustHTMLParser
class  RobustWackAssHTMLParser
class  RobustInsanelyWackAssHTMLParser
class  SimplifyingSOAPParser
class  UnicodeDammit

Functions

def isList
def isString
def buildTagMap

Variables

string __author__ = "Leonard Richardson (crummy.com)"
list __contributors__
string __version__ = "3.0.3"
string __copyright__ = "Copyright (c) 2004-2006 Leonard Richardson"
string __license__ = "PSF"
string DEFAULT_OUTPUT_ENCODING = "utf-8"
 chardet = None
tuple soup = BeautifulSoup(sys.stdin.read())

Detailed Description

Beautiful Soup
Elixir and Tonic
"The Screen-Scraper's Friend"
http://www.crummy.com/software/BeautifulSoup/

Beautiful Soup parses a (possibly invalid) XML or HTML document into a
tree representation. It provides methods and Pythonic idioms that make
it easy to navigate, search, and modify the tree.

A well-structured XML/HTML document yields a well-behaved data
structure. An ill-structured XML/HTML document yields a
correspondingly ill-behaved data structure. If your document is only
locally well-structured, you can use this library to find and process
the well-structured part of it.

Beautiful Soup works with Python 2.2 and up. It has no external
dependencies, but you'll have more success at converting data to UTF-8
if you also install these three packages:

* chardet, for auto-detecting character encodings
  http://chardet.feedparser.org/
* cjkcodecs and iconv_codec, which add more encodings to the ones supported
  by stock Python.
  http://cjkpython.i18n.org/

Beautiful Soup defines classes for two main parsing strategies:
    
 * BeautifulStoneSoup, for parsing XML, SGML, or your domain-specific
   language that kind of looks like XML.

 * BeautifulSoup, for parsing run-of-the-mill HTML code, be it valid
   or invalid. This class has web browser-like heuristics for
   obtaining a sensible parse tree in the face of common HTML errors.

Beautiful Soup also defines a class (UnicodeDammit) for autodetecting
the encoding of an HTML or XML document, and converting it to
Unicode. Much of this code is taken from Mark Pilgrim's Universal Feed
Parser.
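
The general idea of converting a byte string of unknown encoding to Unicode can be sketched as follows. This is a simplified illustration written for Python 3, not UnicodeDammit's actual code: the function name to_unicode, the candidate list, and the fallback behaviour are all assumptions made for the example, and the real class does more (for instance, it can use chardet when that package is installed).

```python
def to_unicode(data, candidate_encodings=("utf-8", "windows-1252")):
    """Try each candidate encoding in order and return (text, encoding).

    Hypothetical helper: the name and the default candidates are
    illustrative and are not part of Beautiful Soup.
    """
    for encoding in candidate_encodings:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # No candidate worked: decode leniently and report no encoding.
    return data.decode("utf-8", errors="replace"), None


utf8_result = to_unicode("caf\u00e9".encode("utf-8"))
latin_result = to_unicode("caf\u00e9".encode("windows-1252"))
print(utf8_result)   # ('café', 'utf-8')
print(latin_result)  # ('café', 'windows-1252')
```

Trying the strictest encoding first matters here: UTF-8 rejects most byte sequences that are not valid UTF-8, while a single-byte encoding such as windows-1252 accepts almost anything.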

For more than you ever wanted to know about Beautiful Soup, see the
documentation:
http://www.crummy.com/software/BeautifulSoup/documentation.html

Class Documentation

class kss.core.BeautifulSoup.StopParsing

Definition at line 1422 of file BeautifulSoup.py.


Function Documentation

def kss.core.BeautifulSoup.buildTagMap(default, *args)
Turns a list of maps, lists, or scalars into a single map.
Used to build the SELF_CLOSING_TAGS, NESTABLE_TAGS, and
NESTING_RESET_TAGS maps out of lists and partial maps.

Definition at line 864 of file BeautifulSoup.py.

def buildTagMap(default, *args):
    """Turns a list of maps, lists, or scalars into a single map.
    Used to build the SELF_CLOSING_TAGS, NESTABLE_TAGS, and
    NESTING_RESET_TAGS maps out of lists and partial maps."""
    built = {}
    for portion in args:
        if hasattr(portion, 'items'):
            #It's a map. Merge it.
            for k,v in portion.items():
                built[k] = v
        elif isList(portion):
            #It's a list. Map each item to the default.
            for k in portion:
                built[k] = default
        else:
            #It's a scalar. Map it to the default.
            built[portion] = default
    return built
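
The merging behaviour described above can be tried in isolation with a small standalone sketch. It is written for Python 3 and does not import Beautiful Soup; the name build_tag_map, the isinstance check that stands in for isList, and the tag names are assumptions chosen for illustration.

```python
def build_tag_map(default, *args):
    """Standalone sketch of the map/list/scalar merge shown above."""
    built = {}
    for portion in args:
        if hasattr(portion, "items"):
            # A map: merge its key/value pairs as they are.
            built.update(portion)
        elif isinstance(portion, (list, tuple, set, frozenset)):
            # A list-like: every item maps to the default.
            for key in portion:
                built[key] = default
        else:
            # A scalar: it maps to the default.
            built[portion] = default
    return built


# Illustrative inputs only; these are not the library's real tag tables.
tag_map = build_tag_map(None, ["br", "hr"], {"li": ["ul", "ol"]}, "img")
print(tag_map)
# {'br': None, 'hr': None, 'li': ['ul', 'ol'], 'img': None}
```

Later arguments overwrite earlier ones for the same key, so a partial map passed last can refine entries that an earlier list set to the default.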

def kss.core.BeautifulSoup.isList(l)
Convenience method that works with all 2.x versions of Python
to determine whether or not something is listlike.

Definition at line 850 of file BeautifulSoup.py.

def isList(l):
    """Convenience method that works with all 2.x versions of Python
    to determine whether or not something is listlike."""
    return hasattr(l, '__iter__') \
           or (type(l) in (types.ListType, types.TupleType))
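
The check above relies on Python 2 behaviour: there, str and unicode objects have no __iter__ attribute, so strings are not reported as listlike. On Python 3, str does define __iter__ and types.ListType no longer exists, so a port would need to exclude strings explicitly. A minimal sketch of such a port, assuming Python 3 and using the hypothetical name is_list:

```python
def is_list(value):
    """Python 3 adaptation: iterable, but not a text or byte string.

    Hypothetical port for illustration; not part of Beautiful Soup.
    """
    if isinstance(value, (str, bytes, bytearray)):
        return False
    return hasattr(value, "__iter__")


print(is_list(["a", "b"]))  # True
print(is_list(("a", "b")))  # True
print(is_list("ab"))        # False
print(is_list(5))           # False
```

Like the original, this sketch also reports dictionaries as listlike, which is why buildTagMap tests for an items attribute first.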

def kss.core.BeautifulSoup.isString(s)
Convenience method that works with all 2.x versions of Python
to determine whether or not something is stringlike.

Definition at line 856 of file BeautifulSoup.py.

def isString(s):
    """Convenience method that works with all 2.x versions of Python
    to determine whether or not something is stringlike."""
    try:
        return isinstance(s, unicode) or isinstance(s, basestring)
    except NameError:
        return isinstance(s, str)
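
The try/except NameError pattern above guards against interpreters where unicode or basestring is not defined. The same pattern can be exercised on Python 3, where neither name exists and the fallback branch is always taken. A minimal sketch, assuming Python 3 and using the hypothetical name is_string:

```python
def is_string(s):
    """Same name-probing pattern as above, runnable on Python 3."""
    try:
        # Python 2 only: both names raise NameError on Python 3.
        return isinstance(s, (unicode, basestring))
    except NameError:
        # Fallback when the Python 2 string types are unavailable.
        return isinstance(s, str)


print(is_string("abc"))   # True
print(is_string(42))      # False
print(is_string(b"abc"))  # False on Python 3: bytes is not str
```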


Variable Documentation

string kss.core.BeautifulSoup.__author__ = "Leonard Richardson (crummy.com)"

Definition at line 46 of file BeautifulSoup.py.

list kss.core.BeautifulSoup.__contributors__

Initial value:
["Sam Ruby (intertwingly.net)",
 "the unwitting Mark Pilgrim (diveintomark.org)",
 "http://www.crummy.com/software/BeautifulSoup/AUTHORS.html"]

Definition at line 47 of file BeautifulSoup.py.

string kss.core.BeautifulSoup.__copyright__ = "Copyright (c) 2004-2006 Leonard Richardson"

Definition at line 51 of file BeautifulSoup.py.

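string kss.core.BeautifulSoup.__license__ = "PSF"

Definition at line 52 of file BeautifulSoup.py.

string kss.core.BeautifulSoup.__version__ = "3.0.3"

Definition at line 50 of file BeautifulSoup.py.

kss.core.BeautifulSoup.chardet = None

Definition at line 1542 of file BeautifulSoup.py.

string kss.core.BeautifulSoup.DEFAULT_OUTPUT_ENCODING = "utf-8"

Definition at line 68 of file BeautifulSoup.py.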

tuple kss.core.BeautifulSoup.soup = BeautifulSoup(sys.stdin.read())

Definition at line 1811 of file BeautifulSoup.py.