
plone3  3.1.7
Classes | Functions | Variables
plone.app.portlets.portlets.feedparser Namespace Reference

Classes

class  ThingsNobodyCaresAboutButMe
class  CharacterEncodingOverride
class  CharacterEncodingUnknown
class  NonXMLContentType
class  UndeclaredNamespace
class  FeedParserDict
class  _FeedParserMixin
class  _StrictFeedParser
class  _BaseHTMLProcessor
class  _LooseFeedParser
class  _RelativeURIResolver
class  _HTMLSanitizer
class  _FeedURLHandler

Functions

def _xmlescape
def dict
def zopeCompatibilityHack
def _ebcdic_to_ascii
def _urljoin
def _resolveRelativeURIs
def _sanitizeHTML
def _open_resource
def registerDateHandler
def _parse_date_iso8601
def _parse_date_onblog
def _parse_date_nate
def _parse_date_mssql
def _parse_date_greek
def _parse_date_hungarian
def _parse_date_w3dtf
def _parse_date_rfc822
def _parse_date
def _getCharacterEncoding
def _toUTF8
def _stripDoctype
def parse

Variables

string __version__ = "4.1"
string __license__
string __author__ = "Mark Pilgrim <http://diveintomark.org/>"
list __contributors__
int _debug = 0
string USER_AGENT = "UniversalFeedParser/%s +http://feedparser.org/"
string ACCEPT_HEADER = "application/atom+xml,application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1"
list PREFERRED_XML_PARSERS = ["drv_libxml2"]
int TIDY_MARKUP = 0
list PREFERRED_TIDY_INTERFACES = ["uTidy", "mxTidy"]
 gzip = None
 zlib = None
int _XML_AVAILABLE = 1
 base64 = binascii = None
 chardet = None
dictionary SUPPORTED_VERSIONS
 UserDict = dict
 _ebcdic_to_ascii_map = None
tuple _urifixer = re.compile('^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)')
list _date_handlers = []
list _iso8601_tmpl
list _iso8601_re
list _iso8601_matches = [re.compile(regex).match for regex in _iso8601_re]
string _korean_year = u'\ub144'
string _korean_month = u'\uc6d4'
string _korean_day = u'\uc77c'
string _korean_am = u'\uc624\uc804'
string _korean_pm = u'\uc624\ud6c4'
 _korean_onblog_date_re = \
 _korean_nate_date_re = \
 _mssql_date_re = \
 _greek_months = \
 _greek_wdays = \
 _greek_date_format_re = \
 _hungarian_months = \
 _hungarian_date_format_re = \
dictionary _additional_timezones = {'AT': -400, 'ET': -500, 'CT': -600, 'MT': -700, 'PT': -800}
list urls = sys.argv[1:]
tuple result = parse(url)

Class Documentation

class plone::app::portlets::portlets::feedparser::UndeclaredNamespace

Definition at line 136 of file feedparser.py.


Function Documentation

def plone.app.portlets.portlets.feedparser._ebcdic_to_ascii (   s ) [private]

Definition at line 257 of file feedparser.py.

00257 
00258 def _ebcdic_to_ascii(s):
00259     global _ebcdic_to_ascii_map
00260     if not _ebcdic_to_ascii_map:
00261         emap = (
00262             0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
00263             16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
00264             128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
00265             144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
00266             32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
00267             38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
00268             45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
00269             186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
00270             195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,201,
00271             202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
00272             209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
00273             216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
00274             123,65,66,67,68,69,70,71,72,73,232,233,234,235,236,237,
00275             125,74,75,76,77,78,79,80,81,82,238,239,240,241,242,243,
00276             92,159,83,84,85,86,87,88,89,90,244,245,246,247,248,249,
00277             48,49,50,51,52,53,54,55,56,57,250,251,252,253,254,255
00278             )
00279         import string
00280         _ebcdic_to_ascii_map = string.maketrans( \
00281             ''.join(map(chr, range(256))), ''.join(map(chr, emap)))
00282     return s.translate(_ebcdic_to_ascii_map)
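In Python 3 the same translation-table approach still works, but string.maketrans is gone and byte strings carry their own maketrans/translate. A minimal sketch, assuming the stdlib cp500 codec matches the hand-written table above (it agrees on letters, digits, and common punctuation):

```python
# Python 3 sketch of the translation-table technique shown above.
# Assumption: the stdlib cp500 (EBCDIC International) codec matches
# the hand-written 256-entry table.
_ebcdic_to_ascii_map = None

def ebcdic_to_ascii(data):
    """Translate EBCDIC-encoded bytes to their ASCII/Latin-1 equivalents."""
    global _ebcdic_to_ascii_map
    if _ebcdic_to_ascii_map is None:
        # Build the 256-byte map lazily, mirroring the original's caching:
        # each EBCDIC byte value decodes to a character whose Latin-1
        # code point is the translated byte.
        _ebcdic_to_ascii_map = bytes.maketrans(
            bytes(range(256)),
            bytes(range(256)).decode('cp500').encode('latin-1'))
    return data.translate(_ebcdic_to_ascii_map)
```

As in the original, the table is built once and cached at module level.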


def plone.app.portlets.portlets.feedparser._getCharacterEncoding (   http_headers,
  xml_data 
) [private]
Get the character encoding of the XML document

http_headers is a dictionary
xml_data is a raw string (not Unicode)

This is so much trickier than it sounds, it's not even funny.
According to RFC 3023 ('XML Media Types'), if the HTTP Content-Type
is application/xml, application/*+xml,
application/xml-external-parsed-entity, or application/xml-dtd,
the encoding given in the charset parameter of the HTTP Content-Type
takes precedence over the encoding given in the XML prefix within the
document, and defaults to 'utf-8' if neither are specified.  But, if
the HTTP Content-Type is text/xml, text/*+xml, or
text/xml-external-parsed-entity, the encoding given in the XML prefix
within the document is ALWAYS IGNORED and only the encoding given in
the charset parameter of the HTTP Content-Type header should be
respected, and it defaults to 'us-ascii' if not specified.

Furthermore, discussion on the atom-syntax mailing list with the
author of RFC 3023 leads me to the conclusion that any document
served with a Content-Type of text/* and no charset parameter
must be treated as us-ascii.  (We now do this.)  And also that it
must always be flagged as non-well-formed.  (We now do this too.)

If Content-Type is unspecified (input was local file or non-HTTP source)
or unrecognized (server just got it totally wrong), then go by the
encoding given in the XML prefix of the document and default to
'iso-8859-1' as per the HTTP specification (RFC 2616).

Then, assuming we didn't find a character encoding in the HTTP headers
(and the HTTP Content-type allowed us to look in the body), we need
to sniff the first few bytes of the XML data and try to determine
whether the encoding is ASCII-compatible.  Section F of the XML
specification shows the way here:
http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info

If the sniffed encoding is not ASCII-compatible, we need to make it
ASCII compatible so that we can sniff further into the XML declaration
to find the encoding attribute, which will tell us the true encoding.

Of course, none of this guarantees that we will be able to parse the
feed in the declared character encoding (assuming it was declared
correctly, which many are not).  CJKCodecs and iconv_codec help a lot;
you should definitely install them if you can.
http://cjkpython.i18n.org/
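The Content-Type precedence rules described above can be condensed into a small pure function. This is an illustrative sketch only (the names are made up, and the BOM sniffing and well-formedness flagging are omitted):

```python
# Illustrative sketch of the RFC 3023 precedence rules described above.
# Only the Content-Type branching is modeled; all names are hypothetical.
APPLICATION_XML_TYPES = ('application/xml', 'application/xml-dtd',
                         'application/xml-external-parsed-entity')
TEXT_XML_TYPES = ('text/xml', 'text/xml-external-parsed-entity')

def pick_encoding(content_type, http_charset, xml_charset):
    """Return the character encoding the parser should trust."""
    if content_type in APPLICATION_XML_TYPES or \
       (content_type.startswith('application/') and content_type.endswith('+xml')):
        # application/* XML: HTTP charset wins, then the XML declaration.
        return http_charset or xml_charset or 'utf-8'
    if content_type in TEXT_XML_TYPES or \
       (content_type.startswith('text/') and content_type.endswith('+xml')):
        # text/* XML: the XML declaration is ALWAYS IGNORED.
        return http_charset or 'us-ascii'
    if content_type.startswith('text/'):
        # text/* with no charset must be treated as us-ascii.
        return http_charset or 'us-ascii'
    # Unspecified or unrecognized Content-Type: trust the XML
    # declaration and fall back per RFC 2616.
    return xml_charset or 'iso-8859-1'
```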

Definition at line 2242 of file feedparser.py.

02242 
02243 def _getCharacterEncoding(http_headers, xml_data):
02244     '''Get the character encoding of the XML document
02245 
02246     http_headers is a dictionary
02247     xml_data is a raw string (not Unicode)
02248     
02249     This is so much trickier than it sounds, it's not even funny.
02250     According to RFC 3023 ('XML Media Types'), if the HTTP Content-Type
02251     is application/xml, application/*+xml,
02252     application/xml-external-parsed-entity, or application/xml-dtd,
02253     the encoding given in the charset parameter of the HTTP Content-Type
02254     takes precedence over the encoding given in the XML prefix within the
02255     document, and defaults to 'utf-8' if neither are specified.  But, if
02256     the HTTP Content-Type is text/xml, text/*+xml, or
02257     text/xml-external-parsed-entity, the encoding given in the XML prefix
02258     within the document is ALWAYS IGNORED and only the encoding given in
02259     the charset parameter of the HTTP Content-Type header should be
02260     respected, and it defaults to 'us-ascii' if not specified.
02261 
02262     Furthermore, discussion on the atom-syntax mailing list with the
02263     author of RFC 3023 leads me to the conclusion that any document
02264     served with a Content-Type of text/* and no charset parameter
02265     must be treated as us-ascii.  (We now do this.)  And also that it
02266     must always be flagged as non-well-formed.  (We now do this too.)
02267     
02268     If Content-Type is unspecified (input was local file or non-HTTP source)
02269     or unrecognized (server just got it totally wrong), then go by the
02270     encoding given in the XML prefix of the document and default to
02271     'iso-8859-1' as per the HTTP specification (RFC 2616).
02272     
02273     Then, assuming we didn't find a character encoding in the HTTP headers
02274     (and the HTTP Content-type allowed us to look in the body), we need
02275     to sniff the first few bytes of the XML data and try to determine
02276     whether the encoding is ASCII-compatible.  Section F of the XML
02277     specification shows the way here:
02278     http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info
02279 
02280     If the sniffed encoding is not ASCII-compatible, we need to make it
02281     ASCII compatible so that we can sniff further into the XML declaration
02282     to find the encoding attribute, which will tell us the true encoding.
02283 
02284     Of course, none of this guarantees that we will be able to parse the
02285     feed in the declared character encoding (assuming it was declared
02286     correctly, which many are not).  CJKCodecs and iconv_codec help a lot;
02287     you should definitely install them if you can.
02288     http://cjkpython.i18n.org/
02289     '''
02290 
02291     def _parseHTTPContentType(content_type):
02292         '''takes HTTP Content-Type header and returns (content type, charset)
02293 
02294         If no charset is specified, returns (content type, '')
02295         If no content type is specified, returns ('', '')
02296         Both return parameters are guaranteed to be lowercase strings
02297         '''
02298         content_type = content_type or ''
02299         content_type, params = cgi.parse_header(content_type)
02300         return content_type, params.get('charset', '').replace("'", '')
02301 
02302     sniffed_xml_encoding = ''
02303     xml_encoding = ''
02304     true_encoding = ''
02305     http_content_type, http_encoding = _parseHTTPContentType(http_headers.get('content-type'))
02306     # Must sniff for non-ASCII-compatible character encodings before
02307     # searching for XML declaration.  This heuristic is defined in
02308     # section F of the XML specification:
02309     # http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info
02310     try:
02311         if xml_data[:4] == '\x4c\x6f\xa7\x94':
02312             # EBCDIC
02313             xml_data = _ebcdic_to_ascii(xml_data)
02314         elif xml_data[:4] == '\x00\x3c\x00\x3f':
02315             # UTF-16BE
02316             sniffed_xml_encoding = 'utf-16be'
02317             xml_data = unicode(xml_data, 'utf-16be').encode('utf-8')
02318         elif (len(xml_data) >= 4) and (xml_data[:2] == '\xfe\xff') and (xml_data[2:4] != '\x00\x00'):
02319             # UTF-16BE with BOM
02320             sniffed_xml_encoding = 'utf-16be'
02321             xml_data = unicode(xml_data[2:], 'utf-16be').encode('utf-8')
02322         elif xml_data[:4] == '\x3c\x00\x3f\x00':
02323             # UTF-16LE
02324             sniffed_xml_encoding = 'utf-16le'
02325             xml_data = unicode(xml_data, 'utf-16le').encode('utf-8')
02326         elif (len(xml_data) >= 4) and (xml_data[:2] == '\xff\xfe') and (xml_data[2:4] != '\x00\x00'):
02327             # UTF-16LE with BOM
02328             sniffed_xml_encoding = 'utf-16le'
02329             xml_data = unicode(xml_data[2:], 'utf-16le').encode('utf-8')
02330         elif xml_data[:4] == '\x00\x00\x00\x3c':
02331             # UTF-32BE
02332             sniffed_xml_encoding = 'utf-32be'
02333             xml_data = unicode(xml_data, 'utf-32be').encode('utf-8')
02334         elif xml_data[:4] == '\x3c\x00\x00\x00':
02335             # UTF-32LE
02336             sniffed_xml_encoding = 'utf-32le'
02337             xml_data = unicode(xml_data, 'utf-32le').encode('utf-8')
02338         elif xml_data[:4] == '\x00\x00\xfe\xff':
02339             # UTF-32BE with BOM
02340             sniffed_xml_encoding = 'utf-32be'
02341             xml_data = unicode(xml_data[4:], 'utf-32be').encode('utf-8')
02342         elif xml_data[:4] == '\xff\xfe\x00\x00':
02343             # UTF-32LE with BOM
02344             sniffed_xml_encoding = 'utf-32le'
02345             xml_data = unicode(xml_data[4:], 'utf-32le').encode('utf-8')
02346         elif xml_data[:3] == '\xef\xbb\xbf':
02347             # UTF-8 with BOM
02348             sniffed_xml_encoding = 'utf-8'
02349             xml_data = unicode(xml_data[3:], 'utf-8').encode('utf-8')
02350         else:
02351             # ASCII-compatible
02352             pass
02353         xml_encoding_match = re.compile('^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
02354     except:
02355         xml_encoding_match = None
02356     if xml_encoding_match:
02357         xml_encoding = xml_encoding_match.groups()[0].lower()
02358         if sniffed_xml_encoding and (xml_encoding in ('iso-10646-ucs-2', 'ucs-2', 'csunicode', 'iso-10646-ucs-4', 'ucs-4', 'csucs4', 'utf-16', 'utf-32', 'utf_16', 'utf_32', 'utf16', 'u16')):
02359             xml_encoding = sniffed_xml_encoding
02360     acceptable_content_type = 0
02361     application_content_types = ('application/xml', 'application/xml-dtd', 'application/xml-external-parsed-entity')
02362     text_content_types = ('text/xml', 'text/xml-external-parsed-entity')
02363     if (http_content_type in application_content_types) or \
02364        (http_content_type.startswith('application/') and http_content_type.endswith('+xml')):
02365         acceptable_content_type = 1
02366         true_encoding = http_encoding or xml_encoding or 'utf-8'
02367     elif (http_content_type in text_content_types) or \
02368          (http_content_type.startswith('text/')) and http_content_type.endswith('+xml'):
02369         acceptable_content_type = 1
02370         true_encoding = http_encoding or 'us-ascii'
02371     elif http_content_type.startswith('text/'):
02372         true_encoding = http_encoding or 'us-ascii'
02373     elif http_headers and (not http_headers.has_key('content-type')):
02374         true_encoding = xml_encoding or 'iso-8859-1'
02375     else:
02376         true_encoding = xml_encoding or 'utf-8'
02377     return true_encoding, http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type
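The byte-pattern heuristic at the top of the listing can be isolated into a Python 3 sketch; the UTF-32 and EBCDIC branches are omitted for brevity, and the function name is illustrative:

```python
# Sketch of the Section-F sniffing heuristic from the listing above,
# ported to Python 3 bytes.  BOM checks come before the bare '<?'
# checks, just as in the original.
def sniff_xml_encoding(data):
    """Guess an encoding from the first bytes of an XML document."""
    if data[:3] == b'\xef\xbb\xbf':
        return 'utf-8'                     # UTF-8 with BOM
    if data[:2] == b'\xfe\xff' and data[2:4] != b'\x00\x00':
        return 'utf-16be'                  # UTF-16BE with BOM
    if data[:2] == b'\xff\xfe' and data[2:4] != b'\x00\x00':
        return 'utf-16le'                  # UTF-16LE with BOM
    if data[:4] == b'\x00\x3c\x00\x3f':
        return 'utf-16be'                  # '<?' in UTF-16BE, no BOM
    if data[:4] == b'\x3c\x00\x3f\x00':
        return 'utf-16le'                  # '<?' in UTF-16LE, no BOM
    return ''  # ASCII-compatible; read the XML declaration instead
```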
    


def plone.app.portlets.portlets.feedparser._open_resource (   url_file_stream_or_string,
  etag,
  modified,
  agent,
  referrer,
  handlers 
) [private]
URL, filename, or string --> stream

This function lets you define parsers that take any input source
(URL, pathname to local or network file, or actual data as a string)
and deal with it in a uniform manner.  Returned object is guaranteed
to have all the basic stdio read methods (read, readline, readlines).
Just .close() the object when you're done with it.

If the etag argument is supplied, it will be used as the value of an
If-None-Match request header.

If the modified argument is supplied, it must be a tuple of 9 integers
as returned by gmtime() in the standard Python time module. This MUST
be in GMT (Greenwich Mean Time). The formatted date/time will be used
as the value of an If-Modified-Since request header.

If the agent argument is supplied, it will be used as the value of a
User-Agent request header.

If the referrer argument is supplied, it will be used as the value of a
Referer[sic] request header.

If handlers is supplied, it is a list of handlers used to build a
urllib2 opener.
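The modified tuple is turned into an If-Modified-Since header by hand because time.strftime's %a and %b directives follow the current locale while RFC 2616 requires English names. A standalone Python 3 sketch of that formatting step:

```python
import time

# Locale-independent RFC 1123 formatting of the `modified` 9-tuple,
# mirroring the approach in the listing below: hard-coded English
# names instead of time.strftime.
_WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
_MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
           'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def format_if_modified_since(modified):
    """Format a time.gmtime()-style 9-tuple as an HTTP date string."""
    return '%s, %02d %s %04d %02d:%02d:%02d GMT' % (
        _WEEKDAYS[modified[6]], modified[2], _MONTHS[modified[1] - 1],
        modified[0], modified[3], modified[4], modified[5])
```

For example, `format_if_modified_since(time.gmtime(0))` yields 'Thu, 01 Jan 1970 00:00:00 GMT'.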

Definition at line 1743 of file feedparser.py.

01743 
01744 def _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers):
01745     """URL, filename, or string --> stream
01746 
01747     This function lets you define parsers that take any input source
01748     (URL, pathname to local or network file, or actual data as a string)
01749     and deal with it in a uniform manner.  Returned object is guaranteed
01750     to have all the basic stdio read methods (read, readline, readlines).
01751     Just .close() the object when you're done with it.
01752 
01753     If the etag argument is supplied, it will be used as the value of an
01754     If-None-Match request header.
01755 
01756     If the modified argument is supplied, it must be a tuple of 9 integers
01757     as returned by gmtime() in the standard Python time module. This MUST
01758     be in GMT (Greenwich Mean Time). The formatted date/time will be used
01759     as the value of an If-Modified-Since request header.
01760 
01761     If the agent argument is supplied, it will be used as the value of a
01762     User-Agent request header.
01763 
01764     If the referrer argument is supplied, it will be used as the value of a
01765     Referer[sic] request header.
01766 
01767     If handlers is supplied, it is a list of handlers used to build a
01768     urllib2 opener.
01769     """
01770 
01771     if hasattr(url_file_stream_or_string, 'read'):
01772         return url_file_stream_or_string
01773 
01774     if url_file_stream_or_string == '-':
01775         return sys.stdin
01776 
01777     if urlparse.urlparse(url_file_stream_or_string)[0] in ('http', 'https', 'ftp'):
01778         if not agent:
01779             agent = USER_AGENT
01780         # test for inline user:password for basic auth
01781         auth = None
01782         if base64:
01783             urltype, rest = urllib.splittype(url_file_stream_or_string)
01784             realhost, rest = urllib.splithost(rest)
01785             if realhost:
01786                 user_passwd, realhost = urllib.splituser(realhost)
01787                 if user_passwd:
01788                     url_file_stream_or_string = '%s://%s%s' % (urltype, realhost, rest)
01789                     auth = base64.encodestring(user_passwd).strip()
01790         # try to open with urllib2 (to use optional headers)
01791         request = urllib2.Request(url_file_stream_or_string)
01792         request.add_header('User-Agent', agent)
01793         if etag:
01794             request.add_header('If-None-Match', etag)
01795         if modified:
01796             # format into an RFC 1123-compliant timestamp. We can't use
01797             # time.strftime() since the %a and %b directives can be affected
01798             # by the current locale, but RFC 2616 states that dates must be
01799             # in English.
01800             short_weekdays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
01801             months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
01802             request.add_header('If-Modified-Since', '%s, %02d %s %04d %02d:%02d:%02d GMT' % (short_weekdays[modified[6]], modified[2], months[modified[1] - 1], modified[0], modified[3], modified[4], modified[5]))
01803         if referrer:
01804             request.add_header('Referer', referrer)
01805         if gzip and zlib:
01806             request.add_header('Accept-encoding', 'gzip, deflate')
01807         elif gzip:
01808             request.add_header('Accept-encoding', 'gzip')
01809         elif zlib:
01810             request.add_header('Accept-encoding', 'deflate')
01811         else:
01812             request.add_header('Accept-encoding', '')
01813         if auth:
01814             request.add_header('Authorization', 'Basic %s' % auth)
01815         if ACCEPT_HEADER:
01816             request.add_header('Accept', ACCEPT_HEADER)
01817         request.add_header('A-IM', 'feed') # RFC 3229 support
01818         opener = apply(urllib2.build_opener, tuple([_FeedURLHandler()] + handlers))
01819         opener.addheaders = [] # RMK - must clear so we only send our custom User-Agent
01820         try:
01821             return opener.open(request)
01822         finally:
01823             opener.close() # JohnD
01824     
01825     # try to open with native open function (if url_file_stream_or_string is a filename)
01826     try:
01827         return open(url_file_stream_or_string)
01828     except:
01829         pass
01830 
01831     # treat url_file_stream_or_string as string
01832     return _StringIO(str(url_file_stream_or_string))


def plone.app.portlets.portlets.feedparser._parse_date (   dateString ) [private]
Parses a variety of date formats into a 9-tuple in GMT

Definition at line 2226 of file feedparser.py.

02226 
02227 def _parse_date(dateString):
02228     '''Parses a variety of date formats into a 9-tuple in GMT'''
02229     for handler in _date_handlers:
02230         try:
02231             date9tuple = handler(dateString)
02232             if not date9tuple: continue
02233             if len(date9tuple) != 9:
02234                 if _debug: sys.stderr.write('date handler function must return 9-tuple\n')
02235                 raise ValueError
02236             map(int, date9tuple)
02237             return date9tuple
02238         except Exception, e:
02239             if _debug: sys.stderr.write('%s raised %s\n' % (handler.__name__, repr(e)))
02240             pass
02241     return None
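The registry pattern, registerDateHandler plus this try-each-handler loop, can be sketched in Python 3 (the example strptime handler is illustrative, not one of the module's handlers):

```python
import time

# Sketch of the handler-registry pattern above: handlers are tried in
# order and the first one returning a valid 9-tuple wins; any handler
# that raises or returns a malformed value is skipped.
_date_handlers = []

def register_date_handler(func):
    """Prepend a new date handler, as registerDateHandler does."""
    _date_handlers.insert(0, func)

def parse_date(date_string):
    for handler in _date_handlers:
        try:
            date9tuple = handler(date_string)
            if not date9tuple:
                continue
            if len(date9tuple) != 9:
                raise ValueError('date handler must return a 9-tuple')
            return tuple(map(int, date9tuple))  # validate every field
        except Exception:
            continue
    return None

# Example handler: ISO dates like '2004-01-05', via time.strptime.
register_date_handler(lambda s: time.strptime(s, '%Y-%m-%d'))
```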


def plone.app.portlets.portlets.feedparser._parse_date_greek (   dateString ) [private]
Parse a string according to a Greek 8-bit date format.

Definition at line 2044 of file feedparser.py.

02044 
02045 def _parse_date_greek(dateString):
02046     '''Parse a string according to a Greek 8-bit date format.'''
02047     m = _greek_date_format_re.match(dateString)
02048     if not m: return
02049     try:
02050         wday = _greek_wdays[m.group(1)]
02051         month = _greek_months[m.group(3)]
02052     except:
02053         return
02054     rfc822date = '%(wday)s, %(day)s %(month)s %(year)s %(hour)s:%(minute)s:%(second)s %(zonediff)s' % \
02055                  {'wday': wday, 'day': m.group(2), 'month': month, 'year': m.group(4),\
02056                   'hour': m.group(5), 'minute': m.group(6), 'second': m.group(7),\
02057                   'zonediff': m.group(8)}
02058     if _debug: sys.stderr.write('Greek date parsed as: %s\n' % rfc822date)
02059     return _parse_date_rfc822(rfc822date)
02060 registerDateHandler(_parse_date_greek)
02061 
# Unicode strings for Hungarian date strings


def plone.app.portlets.portlets.feedparser._parse_date_hungarian (   dateString ) [private]
Parse a string according to a Hungarian 8-bit date format.

Definition at line 2081 of file feedparser.py.

02081 
02082 def _parse_date_hungarian(dateString):
02083     '''Parse a string according to a Hungarian 8-bit date format.'''
02084     m = _hungarian_date_format_re.match(dateString)
02085     if not m: return
02086     try:
02087         month = _hungarian_months[m.group(2)]
02088         day = m.group(3)
02089         if len(day) == 1:
02090             day = '0' + day
02091         hour = m.group(4)
02092         if len(hour) == 1:
02093             hour = '0' + hour
02094     except:
02095         return
02096     w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s%(zonediff)s' % \
02097                 {'year': m.group(1), 'month': month, 'day': day,\
02098                  'hour': hour, 'minute': m.group(5),\
02099                  'zonediff': m.group(6)}
02100     if _debug: sys.stderr.write('Hungarian date parsed as: %s\n' % w3dtfdate)
02101     return _parse_date_w3dtf(w3dtfdate)
02102 registerDateHandler(_parse_date_hungarian)
02103 
02104 # W3DTF-style date parsing adapted from PyXML xml.utils.iso8601, written by
02105 # Drake and licensed under the Python license.  Removed all range checking
02106 # for month, day, hour, minute, and second, since mktime will normalize
# these later


def plone.app.portlets.portlets.feedparser._parse_date_iso8601 (   dateString ) [private]
Parse a variety of ISO-8601-compatible formats like 20040105

Definition at line 1868 of file feedparser.py.

01868 
01869 def _parse_date_iso8601(dateString):
01870     '''Parse a variety of ISO-8601-compatible formats like 20040105'''
01871     m = None
01872     for _iso8601_match in _iso8601_matches:
01873         m = _iso8601_match(dateString)
01874         if m: break
01875     if not m: return
01876     if m.span() == (0, 0): return
01877     params = m.groupdict()
01878     ordinal = params.get('ordinal', 0)
01879     if ordinal:
01880         ordinal = int(ordinal)
01881     else:
01882         ordinal = 0
01883     year = params.get('year', '--')
01884     if not year or year == '--':
01885         year = time.gmtime()[0]
01886     elif len(year) == 2:
01887         # ISO 8601 assumes current century, i.e. 93 -> 2093, NOT 1993
01888         year = 100 * int(time.gmtime()[0] / 100) + int(year)
01889     else:
01890         year = int(year)
01891     month = params.get('month', '-')
01892     if not month or month == '-':
01893         # ordinals are NOT normalized by mktime, we simulate them
01894         # by setting month=1, day=ordinal
01895         if ordinal:
01896             month = 1
01897         else:
01898             month = time.gmtime()[1]
01899     month = int(month)
01900     day = params.get('day', 0)
01901     if not day:
01902         # see above
01903         if ordinal:
01904             day = ordinal
01905         elif params.get('century', 0) or \
01906                  params.get('year', 0) or params.get('month', 0):
01907             day = 1
01908         else:
01909             day = time.gmtime()[2]
01910     else:
01911         day = int(day)
01912     # special case of the century - is the first year of the 21st century
01913     # 2000 or 2001 ? The debate goes on...
01914     if 'century' in params.keys():
01915         year = (int(params['century']) - 1) * 100 + 1
01916     # in ISO 8601 most fields are optional
01917     for field in ['hour', 'minute', 'second', 'tzhour', 'tzmin']:
01918         if not params.get(field, None):
01919             params[field] = 0
01920     hour = int(params.get('hour', 0))
01921     minute = int(params.get('minute', 0))
01922     second = int(params.get('second', 0))
01923     # weekday is normalized by mktime(), we can ignore it
01924     weekday = 0
01925     # daylight savings is complex, but not needed for feedparser's purposes
01926     # as time zones, if specified, include mention of whether it is active
01927     # (e.g. PST vs. PDT, CET). Using -1 is implementation-dependent and
01928     # and most implementations have DST bugs
01929     daylight_savings_flag = 0
01930     tm = [year, month, day, hour, minute, second, weekday,
01931           ordinal, daylight_savings_flag]
01932     # ISO 8601 time zone adjustments
01933     tz = params.get('tz')
01934     if tz and tz != 'Z':
01935         if tz[0] == '-':
01936             tm[3] += int(params.get('tzhour', 0))
01937             tm[4] += int(params.get('tzmin', 0))
01938         elif tz[0] == '+':
01939             tm[3] -= int(params.get('tzhour', 0))
01940             tm[4] -= int(params.get('tzmin', 0))
01941         else:
01942             return None
01943     # Python's time.mktime() is a wrapper around the ANSI C mktime(3c)
01944     # which is guaranteed to normalize d/m/y/h/m/s.
01945     # Many implementations have bugs, but we'll pretend they don't.
01946     return time.localtime(time.mktime(tm))
01947 registerDateHandler(_parse_date_iso8601)
01948     
# 8-bit date handling routines written by ytrewq1.


def plone.app.portlets.portlets.feedparser._parse_date_mssql (   dateString ) [private]
Parse a string according to the MS SQL date format

Definition at line 1994 of file feedparser.py.

01994 
01995 def _parse_date_mssql(dateString):
01996     '''Parse a string according to the MS SQL date format'''
01997     m = _mssql_date_re.match(dateString)
01998     if not m: return
01999     w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
02000                 {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
02001                  'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\
02002                  'zonediff': '+09:00'}
02003     if _debug: sys.stderr.write('MS SQL date parsed as: %s\n' % w3dtfdate)
02004     return _parse_date_w3dtf(w3dtfdate)
02005 registerDateHandler(_parse_date_mssql)
02006 
# Unicode strings for Greek date strings


def plone.app.portlets.portlets.feedparser._parse_date_nate (   dateString ) [private]
Parse a string according to the Nate 8-bit date format

Definition at line 1973 of file feedparser.py.

01973 
01974 def _parse_date_nate(dateString):
01975     '''Parse a string according to the Nate 8-bit date format'''
01976     m = _korean_nate_date_re.match(dateString)
01977     if not m: return
01978     hour = int(m.group(5))
01979     ampm = m.group(4)
01980     if (ampm == _korean_pm):
01981         hour += 12
01982     hour = str(hour)
01983     if len(hour) == 1:
01984         hour = '0' + hour
01985     w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
01986                 {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
01987                  'hour': hour, 'minute': m.group(6), 'second': m.group(7),\
01988                  'zonediff': '+09:00'}
01989     if _debug: sys.stderr.write('Nate date parsed as: %s\n' % w3dtfdate)
01990     return _parse_date_w3dtf(w3dtfdate)
01991 registerDateHandler(_parse_date_nate)
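One caveat: the listing adds 12 to PM hours but never special-cases 12 o'clock, so a fuller 12-hour to 24-hour conversion looks like the sketch below. Whether Nate feeds ever emit an hour of 12 is an assumption, and this helper is illustrative, not part of the module:

```python
# Hypothetical fuller version of the hour conversion above: handles
# 12 AM (-> 00) and 12 PM (-> 12) as well as the simple PM shift.
def to_24h(hour, is_pm):
    """Convert a 12-hour clock reading to a zero-padded 24-hour string."""
    hour = int(hour) % 12      # 12 wraps to 0 before the PM shift
    if is_pm:
        hour += 12
    return '%02d' % hour
```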


def plone.app.portlets.portlets.feedparser._parse_date_onblog (   dateString ) [private]
Parse a string according to the OnBlog 8-bit date format

Definition at line 1961 of file feedparser.py.

01961 
01962 def _parse_date_onblog(dateString):
01963     '''Parse a string according to the OnBlog 8-bit date format'''
01964     m = _korean_onblog_date_re.match(dateString)
01965     if not m: return
01966     w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
01967                 {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
01968                  'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\
01969                  'zonediff': '+09:00'}
01970     if _debug: sys.stderr.write('OnBlog date parsed as: %s\n' % w3dtfdate)
01971     return _parse_date_w3dtf(w3dtfdate)
01972 registerDateHandler(_parse_date_onblog)


def plone.app.portlets.portlets.feedparser._parse_date_rfc822 (   dateString ) [private]
Parse an RFC822, RFC1123, RFC2822, or asctime-style date

Definition at line 2202 of file feedparser.py.

02202 
02203 def _parse_date_rfc822(dateString):
02204     '''Parse an RFC822, RFC1123, RFC2822, or asctime-style date'''
02205     data = dateString.split()
02206     if data[0][-1] in (',', '.') or data[0].lower() in rfc822._daynames:
02207         del data[0]
02208     if len(data) == 4:
02209         s = data[3]
02210         i = s.find('+')
02211         if i > 0:
02212             data[3:] = [s[:i], s[i+1:]]
02213         else:
02214             data.append('')
02215         dateString = " ".join(data)
02216     if len(data) < 5:
02217         dateString += ' 00:00:00 GMT'
02218     tm = rfc822.parsedate_tz(dateString)
02219     if tm:
02220         return time.gmtime(rfc822.mktime_tz(tm))
02221 # rfc822.py defines several time zones, but we define some extra ones.
# 'ET' is equivalent to 'EST', etc.
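The rfc822 module this code relies on was removed in Python 3; email.utils keeps the same parsedate_tz/mktime_tz pair, so the core of the function can be sketched as follows (the asctime and four-field fixups above are omitted):

```python
import time
from email.utils import parsedate_tz, mktime_tz

# Python 3 sketch of the core of the listing above: parsedate_tz
# returns a 10-tuple (9 time fields plus a timezone offset), which
# mktime_tz converts to an epoch timestamp for gmtime.
def parse_date_rfc822(date_string):
    """Parse an RFC 822/1123/2822 date into a UTC 9-tuple, or None."""
    tm = parsedate_tz(date_string)
    if tm is None:
        return None
    return time.gmtime(mktime_tz(tm))
```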


def plone.app.portlets.portlets.feedparser._parse_date_w3dtf (   dateString ) [private]

Definition at line 2107 of file feedparser.py.

02107 
02108 def _parse_date_w3dtf(dateString):
02109     def __extract_date(m):
02110         year = int(m.group('year'))
02111         if year < 100:
02112             year = 100 * int(time.gmtime()[0] / 100) + int(year)
02113         if year < 1000:
02114             return 0, 0, 0
02115         julian = m.group('julian')
02116         if julian:
02117             julian = int(julian)
02118             month = julian / 30 + 1
02119             day = julian % 30 + 1
02120             jday = None
02121             while jday != julian:
02122                 t = time.mktime((year, month, day, 0, 0, 0, 0, 0, 0))
02123                 jday = time.gmtime(t)[-2]
02124                 diff = abs(jday - julian)
02125                 if jday > julian:
02126                     if diff < day:
02127                         day = day - diff
02128                     else:
02129                         month = month - 1
02130                         day = 31
02131                 elif jday < julian:
02132                     if day + diff < 28:
02133                        day = day + diff
02134                     else:
02135                         month = month + 1
02136             return year, month, day
02137         month = m.group('month')
02138         day = 1
02139         if month is None:
02140             month = 1
02141         else:
02142             month = int(month)
02143             day = m.group('day')
02144             if day:
02145                 day = int(day)
02146             else:
02147                 day = 1
02148         return year, month, day
02149 
02150     def __extract_time(m):
02151         if not m:
02152             return 0, 0, 0
02153         hours = m.group('hours')
02154         if not hours:
02155             return 0, 0, 0
02156         hours = int(hours)
02157         minutes = int(m.group('minutes'))
02158         seconds = m.group('seconds')
02159         if seconds:
02160             seconds = int(seconds)
02161         else:
02162             seconds = 0
02163         return hours, minutes, seconds
02164 
02165     def __extract_tzd(m):
02166         '''Return the Time Zone Designator as an offset in seconds from UTC.'''
02167         if not m:
02168             return 0
02169         tzd = m.group('tzd')
02170         if not tzd:
02171             return 0
02172         if tzd == 'Z':
02173             return 0
02174         hours = int(m.group('tzdhours'))
02175         minutes = m.group('tzdminutes')
02176         if minutes:
02177             minutes = int(minutes)
02178         else:
02179             minutes = 0
02180         offset = (hours*60 + minutes) * 60
02181         if tzd[0] == '+':
02182             return -offset
02183         return offset
02184 
02185     __date_re = ('(?P<year>\d\d\d\d)'
02186                  '(?:(?P<dsep>-|)'
02187                  '(?:(?P<julian>\d\d\d)'
02188                  '|(?P<month>\d\d)(?:(?P=dsep)(?P<day>\d\d))?))?')
02189     __tzd_re = '(?P<tzd>[-+](?P<tzdhours>\d\d)(?::?(?P<tzdminutes>\d\d))|Z)'
02190     __tzd_rx = re.compile(__tzd_re)
02191     __time_re = ('(?P<hours>\d\d)(?P<tsep>:|)(?P<minutes>\d\d)'
02192                  '(?:(?P=tsep)(?P<seconds>\d\d(?:[.,]\d+)?))?'
02193                  + __tzd_re)
02194     __datetime_re = '%s(?:T%s)?' % (__date_re, __time_re)
02195     __datetime_rx = re.compile(__datetime_re)
02196     m = __datetime_rx.match(dateString)
02197     if (m is None) or (m.group() != dateString): return
02198     gmt = __extract_date(m) + __extract_time(m) + (0, 0, 0)
02199     if gmt[0] == 0: return
02200     return time.gmtime(time.mktime(gmt) + __extract_tzd(m) - time.timezone)
02201 registerDateHandler(_parse_date_w3dtf)
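For the common case of a fully-specified W3C-DTF timestamp, the conversion above can be reproduced with strptime and calendar.timegm. A minimal Python 3 sketch (the real function also accepts ordinal and partial dates, which strptime does not):

```python
import calendar
import time

# a W3C-DTF / ISO 8601 timestamp, as found in Atom <updated> elements
stamp = '2003-12-31T10:14:55Z'
parsed = time.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ')
# calendar.timegm interprets the struct_time as UTC, mirroring the
# time.mktime(...) - time.timezone adjustment in _parse_date_w3dtf
utc = time.gmtime(calendar.timegm(parsed))
print(utc[:6])  # (2003, 12, 31, 10, 14, 55)
```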

def plone.app.portlets.portlets.feedparser._resolveRelativeURIs (   htmlSource,
  baseURI,
  encoding 
) [private]

Definition at line 1591 of file feedparser.py.

01591 
01592 def _resolveRelativeURIs(htmlSource, baseURI, encoding):
01593     if _debug: sys.stderr.write('entering _resolveRelativeURIs\n')
01594     p = _RelativeURIResolver(baseURI, encoding)
01595     p.feed(htmlSource)
01596     return p.output()
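_RelativeURIResolver is a _BaseHTMLProcessor subclass that rewrites URI-bearing attributes against the base URI. A much-reduced Python 3 sketch of the same idea, using the stdlib html.parser instead of feedparser's processor (RelativeLinkLister is a hypothetical name, and it only collects resolved links rather than rewriting the markup):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class RelativeLinkLister(HTMLParser):
    """Collect href/src attribute values resolved against a base URI."""
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.resolved = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ('href', 'src') and value is not None:
                self.resolved.append(urljoin(self.base, value))

p = RelativeLinkLister('http://example.org/feed/')
p.feed('<a href="../page.html">link</a> <img src="pic.png">')
print(p.resolved)
# ['http://example.org/page.html', 'http://example.org/feed/pic.png']
```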

def plone.app.portlets.portlets.feedparser._sanitizeHTML (   htmlSource,
  encoding 
) [private]

Definition at line 1650 of file feedparser.py.

01650 
01651 def _sanitizeHTML(htmlSource, encoding):
01652     p = _HTMLSanitizer(encoding)
01653     p.feed(htmlSource)
01654     data = p.output()
01655     if TIDY_MARKUP:
01656         # loop through list of preferred Tidy interfaces looking for one that's installed,
01657         # then set up a common _tidy function to wrap the interface-specific API.
01658         _tidy = None
01659         for tidy_interface in PREFERRED_TIDY_INTERFACES:
01660             try:
01661                 if tidy_interface == "uTidy":
01662                     from tidy import parseString as _utidy
01663                     def _tidy(data, **kwargs):
01664                         return str(_utidy(data, **kwargs))
01665                     break
01666                 elif tidy_interface == "mxTidy":
01667                     from mx.Tidy import Tidy as _mxtidy
01668                     def _tidy(data, **kwargs):
01669                         nerrors, nwarnings, data, errordata = _mxtidy.tidy(data, **kwargs)
01670                         return data
01671                     break
01672             except:
01673                 pass
01674         if _tidy:
01675             utf8 = type(data) == type(u'')
01676             if utf8:
01677                 data = data.encode('utf-8')
01678             data = _tidy(data, output_xhtml=1, numeric_entities=1, wrap=0, char_encoding="utf8")
01679             if utf8:
01680                 data = unicode(data, 'utf-8')
01681             if data.count('<body'):
01682                 data = data.split('<body', 1)[1]
01683                 if data.count('>'):
01684                     data = data.split('>', 1)[1]
01685             if data.count('</body'):
01686                 data = data.split('</body', 1)[0]
01687     data = data.strip().replace('\r\n', '\n')
01688     return data
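_HTMLSanitizer keeps only tags and attributes on an allowlist. A much-reduced Python 3 sketch of that approach using the stdlib html.parser (the tag set is an illustrative subset of feedparser's list, and unlike the real sanitizer this version keeps the text inside disallowed elements):

```python
from html.parser import HTMLParser

ALLOWED_TAGS = {'a', 'b', 'i', 'em', 'strong', 'p', 'br'}  # illustrative subset

class TagStripper(HTMLParser):
    """Drop any tag not on the allowlist; keep character data."""
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append('<%s>' % tag)

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

p = TagStripper()
p.feed('<p>hi <blink>there</blink></p>')
print(''.join(p.out))  # <p>hi there</p>
```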

def plone.app.portlets.portlets.feedparser._stripDoctype (   data ) [private]

Strips DOCTYPE from XML document, returns (rss_version, stripped_data)

rss_version may be 'rss091n' or None
stripped_data is the same XML document, minus the DOCTYPE

Definition at line 2431 of file feedparser.py.

02431 
02432 def _stripDoctype(data):
02433     '''Strips DOCTYPE from XML document, returns (rss_version, stripped_data)
02434 
02435     rss_version may be 'rss091n' or None
02436     stripped_data is the same XML document, minus the DOCTYPE
02437     '''
02438     entity_pattern = re.compile(r'<!ENTITY([^>]*?)>', re.MULTILINE)
02439     data = entity_pattern.sub('', data)
02440     doctype_pattern = re.compile(r'<!DOCTYPE([^>]*?)>', re.MULTILINE)
02441     doctype_results = doctype_pattern.findall(data)
02442     doctype = doctype_results and doctype_results[0] or ''
02443     if doctype.lower().count('netscape'):
02444         version = 'rss091n'
02445     else:
02446         version = None
02447     data = doctype_pattern.sub('', data)
02448     return version, data
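Worked through on a Netscape RSS 0.91 doctype, the logic above behaves like this (standalone sketch using the same patterns):

```python
import re

data = ('<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"\n'
        ' "http://my.netscape.com/publish/formats/rss-0.91.dtd">'
        '<rss version="0.91"/>')
doctype_pattern = re.compile(r'<!DOCTYPE([^>]*?)>', re.MULTILINE)
# [^>] also matches newlines, so a doctype broken across lines is still found
doctype = (doctype_pattern.findall(data) or [''])[0]
version = 'rss091n' if 'netscape' in doctype.lower() else None
stripped = doctype_pattern.sub('', data)
print(version)  # rss091n
```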
    

def plone.app.portlets.portlets.feedparser._toUTF8 (   data,
  encoding 
) [private]
Changes an XML data stream on the fly to specify a new encoding

data is a raw sequence of bytes (not Unicode) that is presumed to be in %encoding already
encoding is a string recognized by encodings.aliases

Definition at line 2378 of file feedparser.py.

02378 
02379 def _toUTF8(data, encoding):
02380     '''Changes an XML data stream on the fly to specify a new encoding
02381 
02382     data is a raw sequence of bytes (not Unicode) that is presumed to be in %encoding already
02383     encoding is a string recognized by encodings.aliases
02384     '''
02385     if _debug: sys.stderr.write('entering _toUTF8, trying encoding %s\n' % encoding)
02386     # strip Byte Order Mark (if present)
02387     if (len(data) >= 4) and (data[:2] == '\xfe\xff') and (data[2:4] != '\x00\x00'):
02388         if _debug:
02389             sys.stderr.write('stripping BOM\n')
02390             if encoding != 'utf-16be':
02391                 sys.stderr.write('trying utf-16be instead\n')
02392         encoding = 'utf-16be'
02393         data = data[2:]
02394     elif (len(data) >= 4) and (data[:2] == '\xff\xfe') and (data[2:4] != '\x00\x00'):
02395         if _debug:
02396             sys.stderr.write('stripping BOM\n')
02397             if encoding != 'utf-16le':
02398                 sys.stderr.write('trying utf-16le instead\n')
02399         encoding = 'utf-16le'
02400         data = data[2:]
02401     elif data[:3] == '\xef\xbb\xbf':
02402         if _debug:
02403             sys.stderr.write('stripping BOM\n')
02404             if encoding != 'utf-8':
02405                 sys.stderr.write('trying utf-8 instead\n')
02406         encoding = 'utf-8'
02407         data = data[3:]
02408     elif data[:4] == '\x00\x00\xfe\xff':
02409         if _debug:
02410             sys.stderr.write('stripping BOM\n')
02411             if encoding != 'utf-32be':
02412                 sys.stderr.write('trying utf-32be instead\n')
02413         encoding = 'utf-32be'
02414         data = data[4:]
02415     elif data[:4] == '\xff\xfe\x00\x00':
02416         if _debug:
02417             sys.stderr.write('stripping BOM\n')
02418             if encoding != 'utf-32le':
02419                 sys.stderr.write('trying utf-32le instead\n')
02420         encoding = 'utf-32le'
02421         data = data[4:]
02422     newdata = unicode(data, encoding)
02423     if _debug: sys.stderr.write('successfully converted %s data to unicode\n' % encoding)
02424     declmatch = re.compile('^<\?xml[^>]*?>')
02425     newdecl = '''<?xml version='1.0' encoding='utf-8'?>'''
02426     if declmatch.search(newdata):
02427         newdata = declmatch.sub(newdecl, newdata)
02428     else:
02429         newdata = newdecl + u'\n' + newdata
02430     return newdata.encode('utf-8')
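The BOM-sniffing cascade can be restated with the codecs BOM constants. A Python 3 sketch (sniff_bom is a hypothetical helper; note that the 4-byte UTF-32 marks must be tested before the UTF-16 marks they begin with, which the original achieves with its data[2:4] != '\x00\x00' guards):

```python
import codecs

def sniff_bom(data: bytes):
    """Return (encoding, payload) based on a leading byte order mark."""
    if data[:4] == codecs.BOM_UTF32_BE:
        return 'utf-32be', data[4:]
    if data[:4] == codecs.BOM_UTF32_LE:
        return 'utf-32le', data[4:]
    if data[:3] == codecs.BOM_UTF8:
        return 'utf-8', data[3:]
    if data[:2] == codecs.BOM_UTF16_BE:
        return 'utf-16be', data[2:]
    if data[:2] == codecs.BOM_UTF16_LE:
        return 'utf-16le', data[2:]
    return None, data

print(sniff_bom(b'\xef\xbb\xbf<feed/>'))  # ('utf-8', b'<feed/>')
```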

def plone.app.portlets.portlets.feedparser._urljoin (   base,
  uri 
) [private]

Definition at line 284 of file feedparser.py.

00284 
00285 def _urljoin(base, uri):
00286     uri = _urifixer.sub(r'\1\3', uri)
00287     return urlparse.urljoin(base, uri)
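The _urifixer step collapses stray slashes right after the scheme before handing the URI to urljoin. A Python 3 sketch with the same pattern (fix_join is a hypothetical wrapper name):

```python
import re
from urllib.parse import urljoin

# same pattern as feedparser's _urifixer: drop extra slashes after scheme://
_urifixer = re.compile('^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)')

def fix_join(base, uri):
    return urljoin(base, _urifixer.sub(r'\1\3', uri))

print(fix_join('http://example.org/feed', '/entries/1'))
# http://example.org/entries/1
print(fix_join('http://example.org/', 'http:///example.com/x'))
# http://example.com/x
```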

def plone.app.portlets.portlets.feedparser._xmlescape (   data ) [private]

Definition at line 98 of file feedparser.py.

00098 
00099     def _xmlescape(data):
00100         data = data.replace('&', '&amp;')
00101         data = data.replace('>', '&gt;')
00102         data = data.replace('<', '&lt;')
00103         return data
00104 
00105 # base64 support for Atom feeds that contain embedded binary data
try:
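Note the order of the replacements: '&' must be escaped first, or the entities produced for '<' and '>' would themselves be re-escaped. The stdlib provides the same behaviour:

```python
from xml.sax.saxutils import escape

# stdlib equivalent of _xmlescape; '&' is handled before '<' and '>'
escaped = escape('AT&T <rocks>')
print(escaped)  # AT&amp;T &lt;rocks&gt;
```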

def plone.app.portlets.portlets.feedparser.dict (   aList )

Definition at line 166 of file feedparser.py.

00166 
00167     def dict(aList):
00168         rc = {}
00169         for k, v in aList:
00170             rc[k] = v
00171         return rc
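This shim exists only for very old Pythons; since Python 2.2 the builtin dict() accepts a sequence of key/value pairs directly, which is all the shim emulates:

```python
# the builtin constructor does exactly what the shim does
pairs = [('a', 1), ('b', 2)]
print(dict(pairs))  # {'a': 1, 'b': 2}
```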

def plone.app.portlets.portlets.feedparser.parse (   url_file_stream_or_string,
  etag = None,
  modified = None,
  agent = None,
  referrer = None,
  handlers = [] 
)
Parse a feed from a URL, file, stream, or string

Definition at line 2449 of file feedparser.py.

02449 
02450 def parse(url_file_stream_or_string, etag=None, modified=None, agent=None, referrer=None, handlers=[]):
02451     '''Parse a feed from a URL, file, stream, or string'''
02452     result = FeedParserDict()
02453     result['feed'] = FeedParserDict()
02454     result['entries'] = []
02455     if _XML_AVAILABLE:
02456         result['bozo'] = 0
02457     if type(handlers) == types.InstanceType:
02458         handlers = [handlers]
02459     try:
02460         f = _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers)
02461         data = f.read()
02462     except Exception, e:
02463         result['bozo'] = 1
02464         result['bozo_exception'] = e
02465         data = ''
02466         f = None
02467 
02468     # if feed is gzip-compressed, decompress it
02469     if f and data and hasattr(f, 'headers'):
02470         if gzip and f.headers.get('content-encoding', '') == 'gzip':
02471             try:
02472                 data = gzip.GzipFile(fileobj=_StringIO(data)).read()
02473             except Exception, e:
02474                 # Some feeds claim to be gzipped but they're not, so
02475                 # we get garbage.  Ideally, we should re-request the
02476                 # feed without the 'Accept-encoding: gzip' header,
02477                 # but we don't.
02478                 result['bozo'] = 1
02479                 result['bozo_exception'] = e
02480                 data = ''
02481         elif zlib and f.headers.get('content-encoding', '') == 'deflate':
02482             try:
02483                 data = zlib.decompress(data, -zlib.MAX_WBITS)
02484             except Exception, e:
02485                 result['bozo'] = 1
02486                 result['bozo_exception'] = e
02487                 data = ''
02488 
02489     # save HTTP headers
02490     if hasattr(f, 'info'):
02491         info = f.info()
02492         result['etag'] = info.getheader('ETag')
02493         last_modified = info.getheader('Last-Modified')
02494         if last_modified:
02495             result['modified'] = _parse_date(last_modified)
02496     if hasattr(f, 'url'):
02497         result['href'] = f.url
02498         result['status'] = 200
02499     if hasattr(f, 'status'):
02500         result['status'] = f.status
02501     if hasattr(f, 'headers'):
02502         result['headers'] = f.headers.dict
02503     if hasattr(f, 'close'):
02504         f.close()
02505 
02506     # there are four encodings to keep track of:
02507     # - http_encoding is the encoding declared in the Content-Type HTTP header
02508     # - xml_encoding is the encoding declared in the <?xml declaration
02509     # - sniffed_encoding is the encoding sniffed from the first 4 bytes of the XML data
02510     # - result['encoding'] is the actual encoding, as per RFC 3023 and a variety of other conflicting specifications
02511     http_headers = result.get('headers', {})
02512     result['encoding'], http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type = \
02513         _getCharacterEncoding(http_headers, data)
02514     if http_headers and (not acceptable_content_type):
02515         if http_headers.has_key('content-type'):
02516             bozo_message = '%s is not an XML media type' % http_headers['content-type']
02517         else:
02518             bozo_message = 'no Content-type specified'
02519         result['bozo'] = 1
02520         result['bozo_exception'] = NonXMLContentType(bozo_message)
02521         
02522     result['version'], data = _stripDoctype(data)
02523 
02524     baseuri = http_headers.get('content-location', result.get('href'))
02525     baselang = http_headers.get('content-language', None)
02526 
02527     # if server sent 304, we're done
02528     if result.get('status', 0) == 304:
02529         result['version'] = ''
02530         result['debug_message'] = 'The feed has not changed since you last checked, ' + \
02531             'so the server sent no data.  This is a feature, not a bug!'
02532         return result
02533 
02534     # if there was a problem downloading, we're done
02535     if not data:
02536         return result
02537 
02538     # determine character encoding
02539     use_strict_parser = 0
02540     known_encoding = 0
02541     tried_encodings = []
02542     # try: HTTP encoding, declared XML encoding, encoding sniffed from BOM
02543     for proposed_encoding in (result['encoding'], xml_encoding, sniffed_xml_encoding):
02544         if not proposed_encoding: continue
02545         if proposed_encoding in tried_encodings: continue
02546         tried_encodings.append(proposed_encoding)
02547         try:
02548             data = _toUTF8(data, proposed_encoding)
02549             known_encoding = use_strict_parser = 1
02550             break
02551         except:
02552             pass
02553     # if no luck and we have auto-detection library, try that
02554     if (not known_encoding) and chardet:
02555         try:
02556             proposed_encoding = chardet.detect(data)['encoding']
02557             if proposed_encoding and (proposed_encoding not in tried_encodings):
02558                 tried_encodings.append(proposed_encoding)
02559                 data = _toUTF8(data, proposed_encoding)
02560                 known_encoding = use_strict_parser = 1
02561         except:
02562             pass
02563     # if still no luck and we haven't tried utf-8 yet, try that
02564     if (not known_encoding) and ('utf-8' not in tried_encodings):
02565         try:
02566             proposed_encoding = 'utf-8'
02567             tried_encodings.append(proposed_encoding)
02568             data = _toUTF8(data, proposed_encoding)
02569             known_encoding = use_strict_parser = 1
02570         except:
02571             pass
02572     # if still no luck and we haven't tried windows-1252 yet, try that
02573     if (not known_encoding) and ('windows-1252' not in tried_encodings):
02574         try:
02575             proposed_encoding = 'windows-1252'
02576             tried_encodings.append(proposed_encoding)
02577             data = _toUTF8(data, proposed_encoding)
02578             known_encoding = use_strict_parser = 1
02579         except:
02580             pass
02581     # if still no luck, give up
02582     if not known_encoding:
02583         result['bozo'] = 1
02584         result['bozo_exception'] = CharacterEncodingUnknown( \
02585             'document encoding unknown, I tried ' + \
02586             '%s, %s, utf-8, and windows-1252 but nothing worked' % \
02587             (result['encoding'], xml_encoding))
02588         result['encoding'] = ''
02589     elif proposed_encoding != result['encoding']:
02590         result['bozo'] = 1
02591         result['bozo_exception'] = CharacterEncodingOverride( \
02592             'document declared as %s, but parsed as %s' % \
02593             (result['encoding'], proposed_encoding))
02594         result['encoding'] = proposed_encoding
02595 
02596     if not _XML_AVAILABLE:
02597         use_strict_parser = 0
02598     if use_strict_parser:
02599         # initialize the SAX parser
02600         feedparser = _StrictFeedParser(baseuri, baselang, 'utf-8')
02601         saxparser = xml.sax.make_parser(PREFERRED_XML_PARSERS)
02602         saxparser.setFeature(xml.sax.handler.feature_namespaces, 1)
02603         saxparser.setContentHandler(feedparser)
02604         saxparser.setErrorHandler(feedparser)
02605         source = xml.sax.xmlreader.InputSource()
02606         source.setByteStream(_StringIO(data))
02607         if hasattr(saxparser, '_ns_stack'):
02608             # work around bug in built-in SAX parser (doesn't recognize xml: namespace)
02609             # PyXML doesn't have this problem, and it doesn't have _ns_stack either
02610             saxparser._ns_stack.append({'http://www.w3.org/XML/1998/namespace':'xml'})
02611         try:
02612             saxparser.parse(source)
02613         except Exception, e:
02614             if _debug:
02615                 import traceback
02616                 traceback.print_stack()
02617                 traceback.print_exc()
02618                 sys.stderr.write('xml parsing failed\n')
02619             result['bozo'] = 1
02620             result['bozo_exception'] = feedparser.exc or e
02621             use_strict_parser = 0
02622     if not use_strict_parser:
02623         feedparser = _LooseFeedParser(baseuri, baselang, known_encoding and 'utf-8' or '')
02624         feedparser.feed(data)
02625     result['feed'] = feedparser.feeddata
02626     result['entries'] = feedparser.entries
02627     result['version'] = result['version'] or feedparser.version
02628     result['namespaces'] = feedparser.namespacesInUse
02629     return result
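On the strict path, parse() wires a SAX parser to a content handler and feeds it the re-encoded bytes via an InputSource. A standalone Python 3 illustration of that wiring, with a toy handler standing in for _StrictFeedParser:

```python
import xml.sax
import xml.sax.handler
from io import BytesIO

class TitleGrabber(xml.sax.handler.ContentHandler):
    """Toy stand-in for _StrictFeedParser: collect <title> text."""
    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None

    def startElement(self, name, attrs):
        if name == 'title':
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == 'title':
            self.titles.append(''.join(self._buf))
            self._buf = None

data = (b"<?xml version='1.0' encoding='utf-8'?>"
        b"<rss><channel><title>Sample Feed</title></channel></rss>")
handler = TitleGrabber()
source = xml.sax.xmlreader.InputSource()
source.setByteStream(BytesIO(data))  # same pattern as setByteStream(_StringIO(data))
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.parse(source)
print(handler.titles)  # ['Sample Feed']
```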


def plone.app.portlets.portlets.feedparser.registerDateHandler (   func )

Register a date handler function (takes string, returns 9-tuple date in GMT)

Definition at line 1834 of file feedparser.py.

01834 
01835 def registerDateHandler(func):
01836     '''Register a date handler function (takes string, returns 9-tuple date in GMT)'''
01837     _date_handlers.insert(0, func)
01838     
01839 # ISO-8601 date parsing routines written by Fazal Majid.
01840 # The ISO 8601 standard is very convoluted and irregular - a full ISO 8601
01841 # parser is beyond the scope of feedparser and would be a worthwhile addition
01842 # to the Python library.
01843 # A single regular expression cannot parse ISO 8601 date formats into groups
01844 # as the standard is highly irregular (for instance is 030104 2003-01-04 or
01845 # 0301-04-01), so we use templates instead.
01846 # Please note the order in templates is significant because we need a
# greedy match.

def plone.app.portlets.portlets.feedparser.zopeCompatibilityHack ()

Definition at line 247 of file feedparser.py.

00247 
00248 def zopeCompatibilityHack():
00249     global FeedParserDict
00250     del FeedParserDict
00251     def FeedParserDict(aDict=None):
00252         rc = {}
00253         if aDict:
00254             rc.update(aDict)
00255         return rc


Variable Documentation

string plone.app.portlets.portlets.feedparser.__author__ = "Mark Pilgrim <http://diveintomark.org/>"

Definition at line 37 of file feedparser.py.

list plone.app.portlets.portlets.feedparser.__contributors__

Initial value:
00001 ["Jason Diamond <http://injektilo.org/>",
00002                     "John Beimler <http://john.beimler.org/>",
00003                     "Fazal Majid <http://www.majid.info/mylos/weblog/>",
00004                     "Aaron Swartz <http://aaronsw.com/>",
00005                     "Kevin Marks <http://epeus.blogspot.com/>"]

Definition at line 38 of file feedparser.py.

string plone.app.portlets.portlets.feedparser.__license__

Initial value:
00001 """Copyright (c) 2002-2006, Mark Pilgrim, All rights reserved.
00002 
00003 Redistribution and use in source and binary forms, with or without modification,
00004 are permitted provided that the following conditions are met:
00005 
00006 * Redistributions of source code must retain the above copyright notice,
00007   this list of conditions and the following disclaimer.
00008 * Redistributions in binary form must reproduce the above copyright notice,
00009   this list of conditions and the following disclaimer in the documentation
00010   and/or other materials provided with the distribution.
00011 
00012 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
00013 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
00014 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
00015 ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
00016 LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
00017 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
00018 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
00019 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
00020 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
00021 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
00022 POSSIBILITY OF SUCH DAMAGE."""

Definition at line 15 of file feedparser.py.

string plone.app.portlets.portlets.feedparser.__version__ = "4.1"

Definition at line 14 of file feedparser.py.

dictionary plone.app.portlets.portlets.feedparser._additional_timezones = {'AT': -400, 'ET': -500, 'CT': -600, 'MT': -700, 'PT': -800}

Definition at line 2222 of file feedparser.py.

list plone.app.portlets.portlets.feedparser._date_handlers = []

Definition at line 1833 of file feedparser.py.

int plone.app.portlets.portlets.feedparser._debug = 0

Definition at line 43 of file feedparser.py.

plone.app.portlets.portlets.feedparser._ebcdic_to_ascii_map = None

Definition at line 256 of file feedparser.py.

plone.app.portlets.portlets.feedparser._greek_date_format_re

Definition at line 2041 of file feedparser.py.

plone.app.portlets.portlets.feedparser._greek_months

Definition at line 2007 of file feedparser.py.

plone.app.portlets.portlets.feedparser._greek_wdays

Definition at line 2030 of file feedparser.py.

plone.app.portlets.portlets.feedparser._hungarian_date_format_re

Definition at line 2078 of file feedparser.py.

plone.app.portlets.portlets.feedparser._hungarian_months

Definition at line 2062 of file feedparser.py.

list plone.app.portlets.portlets.feedparser._iso8601_matches = [re.compile(regex).match for regex in _iso8601_re]

Definition at line 1866 of file feedparser.py.

list plone.app.portlets.portlets.feedparser._iso8601_re

Initial value:
00001 [
00002     tmpl.replace(
00003     'YYYY', r'(?P<year>\d{4})').replace(
00004     'YY', r'(?P<year>\d\d)').replace(
00005     'MM', r'(?P<month>[01]\d)').replace(
00006     'DD', r'(?P<day>[0123]\d)').replace(
00007     'OOO', r'(?P<ordinal>[0123]\d\d)').replace(
00008     'CC', r'(?P<century>\d\d$)')
00009     + r'(T?(?P<hour>\d{2}):(?P<minute>\d{2})'
00010     + r'(:(?P<second>\d{2}))?'
00011     + r'(?P<tz>[+-](?P<tzhour>\d{2})(:(?P<tzmin>\d{2}))?|Z)?)?'
00012     for tmpl in _iso8601_tmpl]

Definition at line 1853 of file feedparser.py.

list plone.app.portlets.portlets.feedparser._iso8601_tmpl

Initial value:
00001 ['YYYY-?MM-?DD', 'YYYY-MM', 'YYYY-?OOO',
00002                 'YY-?MM-?DD', 'YY-?OOO', 'YYYY', 
00003                 '-YY-?MM', '-OOO', '-YY',
00004                 '--MM-?DD', '--MM',
00005                 '---DD',
00006                 'CC', '']

Definition at line 1847 of file feedparser.py.

string plone.app.portlets.portlets.feedparser._korean_am = u'\uc624\uc804'

Definition at line 1952 of file feedparser.py.

string plone.app.portlets.portlets.feedparser._korean_day = u'\uc77c'

Definition at line 1951 of file feedparser.py.

string plone.app.portlets.portlets.feedparser._korean_month = u'\uc6d4'

Definition at line 1950 of file feedparser.py.

plone.app.portlets.portlets.feedparser._korean_nate_date_re

Definition at line 1958 of file feedparser.py.

plone.app.portlets.portlets.feedparser._korean_onblog_date_re

Definition at line 1955 of file feedparser.py.

string plone.app.portlets.portlets.feedparser._korean_pm = u'\uc624\ud6c4'

Definition at line 1953 of file feedparser.py.

string plone.app.portlets.portlets.feedparser._korean_year = u'\ub144'

Definition at line 1949 of file feedparser.py.

Definition at line 1992 of file feedparser.py.

tuple plone.app.portlets.portlets.feedparser._urifixer = re.compile('^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)')

Definition at line 283 of file feedparser.py.

int plone.app.portlets.portlets.feedparser._XML_AVAILABLE = 1

Definition at line 95 of file feedparser.py.

string plone.app.portlets.portlets.feedparser.ACCEPT_HEADER = "application/atom+xml,application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1"

Definition at line 52 of file feedparser.py.

plone.app.portlets.portlets.feedparser.base64

Definition at line 108 of file feedparser.py.

plone.app.portlets.portlets.feedparser.chardet = None

Definition at line 129 of file feedparser.py.

plone.app.portlets.portlets.feedparser.gzip = None

Definition at line 81 of file feedparser.py.

list plone.app.portlets.portlets.feedparser.PREFERRED_TIDY_INTERFACES = ["uTidy", "mxTidy"]

Definition at line 66 of file feedparser.py.

list plone.app.portlets.portlets.feedparser.PREFERRED_XML_PARSERS = ["drv_libxml2"]

Definition at line 57 of file feedparser.py.

Definition at line 2641 of file feedparser.py.

dictionary plone.app.portlets.portlets.feedparser.SUPPORTED_VERSIONS

Initial value:
00001 {'': 'unknown',
00002                       'rss090': 'RSS 0.90',
00003                       'rss091n': 'RSS 0.91 (Netscape)',
00004                       'rss091u': 'RSS 0.91 (Userland)',
00005                       'rss092': 'RSS 0.92',
00006                       'rss093': 'RSS 0.93',
00007                       'rss094': 'RSS 0.94',
00008                       'rss20': 'RSS 2.0',
00009                       'rss10': 'RSS 1.0',
00010                       'rss': 'RSS (unknown version)',
00011                       'atom01': 'Atom 0.1',
00012                       'atom02': 'Atom 0.2',
00013                       'atom03': 'Atom 0.3',
00014                       'atom10': 'Atom 1.0',
00015                       'atom': 'Atom (unknown version)',
00016                       'cdf': 'CDF',
00017                       'hotrss': 'Hot RSS'
00018                       }

Definition at line 142 of file feedparser.py.

int plone.app.portlets.portlets.feedparser.TIDY_MARKUP = 0

Definition at line 62 of file feedparser.py.

Definition at line 2635 of file feedparser.py.

string plone.app.portlets.portlets.feedparser.USER_AGENT = "UniversalFeedParser/%s +http://feedparser.org/"

Definition at line 48 of file feedparser.py.

plone.app.portlets.portlets.feedparser.UserDict = dict

Definition at line 162 of file feedparser.py.

plone.app.portlets.portlets.feedparser.zlib = None

Definition at line 85 of file feedparser.py.