python-biopython  1.60
Bio.Phylo.PhyloXMLIO.Parser Class Reference

Public Member Functions

def __init__
def read
def parse
def other
def accession
def annotation
def binary_characters
def clade_relation
def color
def confidence
def date
def distribution
def domain
def domain_architecture
def events
def id
def mol_seq
def point
def polygon
def property
def reference
def sequence_relation
def uri

Public Attributes

 root
 context

Private Member Functions

def _parse_phylogeny
def _parse_clade
def _parse_sequence
def _parse_taxonomy

Static Private Attributes

list _clade_complex_types = ['color', 'events', 'binary_characters', 'date']
dictionary _clade_list_types
tuple _clade_tracked_tags

Detailed Description

Methods for parsing all phyloXML nodes from an XML stream.

To minimize memory use, the tree of ElementTree parsing events is cleared
after each phylogeny, clade, and top-level 'other' element has been parsed.
Elements below the clade level are kept in memory until parsing of the
current clade is finished. This should not be a problem, because clade is
the only recursive element and non-clade nodes below this level are of
bounded size.

Definition at line 259 of file PhyloXMLIO.py.
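
The memory strategy in the class description can be illustrated with the standard library alone. The following is a minimal sketch, not the Parser implementation: count_clades and the sample document are hypothetical, and the namespace that real phyloXML tags carry is left out for brevity.

```python
import io
import xml.etree.ElementTree as ElementTree

# Hypothetical sample input; real phyloXML elements are namespace-qualified.
SAMPLE = b"""<phyloxml>
  <phylogeny rooted="true">
    <clade><name>root</name>
      <clade><name>A</name></clade>
      <clade><name>B</name></clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def count_clades(stream):
    """Count clade elements, clearing each one once it is complete."""
    count = 0
    for event, elem in ElementTree.iterparse(stream, events=("start", "end")):
        if event == "end" and elem.tag == "clade":
            count += 1
            # Drop the element's children, text and attributes so the
            # partially built tree does not keep growing.
            elem.clear()
    return count


print(count_clades(io.BytesIO(SAMPLE)))
```

Clearing on the 'end' event matters: at 'start' the element's children have not been read yet, so there is nothing to release.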


Constructor & Destructor Documentation

def Bio.Phylo.PhyloXMLIO.Parser.__init__ (   self,
  file 
)

Definition at line 270 of file PhyloXMLIO.py.

00270 
00271     def __init__(self, file):
00272         # Get an iterable context for XML parsing events
00273         context = iter(ElementTree.iterparse(file, events=('start', 'end')))
00274         event, root = context.next()
00275         self.root = root
00276         self.context = context
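
The listing calls context.next(), which is the Python 2 iterator protocol; Python 3 iterators do not have that method. A sketch of the same step using the built-in next(), which works on both, might look like this. The function name open_context and the sample document are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ElementTree


def open_context(stream):
    """Return (root, context) in the same way as the constructor above."""
    context = iter(ElementTree.iterparse(stream, events=("start", "end")))
    # The first event is the 'start' of the document's root element.
    event, root = next(context)
    return root, context


# Hypothetical input; only the root element's attributes are inspected.
SAMPLE = b"<phyloxml version='1.10'><phylogeny/></phyloxml>"
root, context = open_context(io.BytesIO(SAMPLE))
print(root.tag, root.attrib)
```

The returned context continues from the event after the root's 'start', so it can be handed to the parsing loops unchanged.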


Member Function Documentation

def Bio.Phylo.PhyloXMLIO.Parser._parse_clade (   self,
  parent 
) [private]
Parse a Clade node and its children, recursively.

Definition at line 366 of file PhyloXMLIO.py.

00366 
00367     def _parse_clade(self, parent):
00368         """Parse a Clade node and its children, recursively."""
00369         clade = PX.Clade(**parent.attrib)
00370         if clade.branch_length is not None:
00371             clade.branch_length = float(clade.branch_length)
00372         # NB: Only evaluate nodes at the current level
00373         tag_stack = []
00374         for event, elem in self.context:
00375             namespace, tag = _split_namespace(elem.tag)
00376             if event == 'start':
00377                 if tag == 'clade':
00378                     clade.clades.append(self._parse_clade(elem))
00379                     continue
00380                 if tag == 'taxonomy':
00381                     clade.taxonomies.append(self._parse_taxonomy(elem))
00382                     continue
00383                 if tag == 'sequence':
00384                     clade.sequences.append(self._parse_sequence(elem))
00385                     continue
00386                 if tag in self._clade_tracked_tags:
00387                     tag_stack.append(tag)
00388             if event == 'end':
00389                 if tag == 'clade':
00390                     elem.clear()
00391                     break
00392                 if tag != tag_stack[-1]:
00393                     continue
00394                 tag_stack.pop()
00395                 # Handle the other non-recursive children
00396                 if tag in self._clade_list_types:
00397                     getattr(clade, self._clade_list_types[tag]).append(
00398                             getattr(self, tag)(elem))
00399                 elif tag in self._clade_complex_types:
00400                     setattr(clade, tag, getattr(self, tag)(elem))
00401                 elif tag == 'branch_length':
00402                     # NB: possible collision with the attribute
00403                     if clade.branch_length is not None:
00404                         raise PhyloXMLError(
00405                                 'Attribute branch_length was already set '
00406                                 'for this Clade.')
00407                     clade.branch_length = _float(elem.text)
00408                 elif tag == 'width':
00409                     clade.width = _float(elem.text)
00410                 elif tag == 'name':
00411                     clade.name = _collapse_wspace(elem.text)
00412                 elif tag == 'node_id':
00413                     clade.node_id = PX.Id(elem.text.strip(),
00414                                           elem.attrib.get('provider'))
00415                 elif namespace != NAMESPACES['phy']:
00416                     clade.other.append(self.other(elem, namespace, tag))
00417                     elem.clear()
00418                 else:
00419                     raise PhyloXMLError('Misidentified tag: ' + tag)
00420         return clade
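
The key idea in the listing above is that every recursive call reads from the same event iterator, so a nested call consumes all events of its own subtree before the caller sees the next one. A reduced sketch of that pattern, assuming plain dictionaries instead of PX.Clade objects and tags without a namespace:

```python
import io
import xml.etree.ElementTree as ElementTree


def parse_clade(context):
    """Consume events up to the matching </clade> and return a dict."""
    clade = {"name": None, "clades": []}
    for event, elem in context:
        if event == "start" and elem.tag == "clade":
            # A nested clade: the recursive call consumes all of its events,
            # so this loop only ever handles children of the current clade.
            clade["clades"].append(parse_clade(context))
        elif event == "end":
            if elem.tag == "clade":
                elem.clear()
                break
            if elem.tag == "name":
                clade["name"] = elem.text.strip()
    return clade


# Hypothetical input with one level of nesting.
SAMPLE = (b"<clade><name>root</name>"
          b"<clade><name>A</name></clade>"
          b"<clade><name>B</name></clade></clade>")
context = iter(ElementTree.iterparse(io.BytesIO(SAMPLE), events=("start", "end")))
next(context)  # skip the 'start' event of the outermost clade
tree = parse_clade(context)
print(tree["name"], [child["name"] for child in tree["clades"]])
```

The sketch omits the tag stack used in the listing; it is only needed there because several non-clade children can themselves contain elements with the same tag names.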

def Bio.Phylo.PhyloXMLIO.Parser._parse_phylogeny (   self,
  parent 
) [private]
Parse a single phylogeny within the phyloXML tree.

Recursively builds a phylogenetic tree with help from _parse_clade, then
clears the XML event history for the phylogeny element and returns
control to the top-level parsing function.

Definition at line 310 of file PhyloXMLIO.py.

00310 
00311     def _parse_phylogeny(self, parent):
00312         """Parse a single phylogeny within the phyloXML tree.
00313 
00314         Recursively builds a phylogenetic tree with help from _parse_clade, then
00315         clears the XML event history for the phylogeny element and returns
00316         control to the top-level parsing function.
00317         """
00318         phylogeny = PX.Phylogeny(**_dict_str2bool(parent.attrib,
00319                                                    ['rooted', 'rerootable']))
00320         list_types = {
00321                 # XML tag, plural attribute
00322                 'confidence':   'confidences',
00323                 'property':     'properties',
00324                 'clade_relation': 'clade_relations',
00325                 'sequence_relation': 'sequence_relations',
00326                 }
00327         for event, elem in self.context:
00328             namespace, tag = _split_namespace(elem.tag)
00329             if event == 'start' and tag == 'clade':
00330                 assert phylogeny.root is None, \
00331                         "Phylogeny object should only have 1 clade"
00332                 phylogeny.root = self._parse_clade(elem)
00333                 continue
00334             if event == 'end':
00335                 if tag == 'phylogeny':
00336                     parent.clear()
00337                     break
00338                 # Handle the other non-recursive children
00339                 if tag in list_types:
00340                     getattr(phylogeny, list_types[tag]).append(
00341                             getattr(self, tag)(elem))
00342                 # Complex types
00343                 elif tag in ('date', 'id'):
00344                     setattr(phylogeny, tag, getattr(self, tag)(elem))
00345                 # Simple types
00346                 elif tag in ('name', 'description'):
00347                     setattr(phylogeny, tag, _collapse_wspace(elem.text))
00348                 # Unknown tags
00349                 elif namespace != NAMESPACES['phy']:
00350                     phylogeny.other.append(self.other(elem, namespace, tag))
00351                     parent.clear()
00352                 else:
00353                     # NB: This shouldn't happen in valid files
00354                     raise PhyloXMLError('Misidentified tag: ' + tag)
00355         return phylogeny
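
The list_types table in the listing drives a name-based dispatch: the XML tag selects both the converter method and the list attribute that receives the result. A stand-alone sketch of that dispatch follows; Handlers, Target and dispatch are hypothetical names, and the converters take plain text rather than XML elements.

```python
LIST_TYPES = {
    # XML tag: plural attribute on the target object
    "confidence": "confidences",
    "property": "properties",
}


class Handlers(object):
    """Hypothetical converters, one method per XML tag."""

    def confidence(self, text):
        return float(text)

    def property(self, text):
        return text.strip()


class Target(object):
    """Hypothetical stand-in for the object being populated."""

    def __init__(self):
        self.confidences = []
        self.properties = []


def dispatch(handlers, target, tag, text):
    """Append the converted value to the list registered for `tag`.

    Returns False when the tag has no list attribute registered.
    """
    if tag not in LIST_TYPES:
        return False
    convert = getattr(handlers, tag)           # e.g. handlers.confidence
    bucket = getattr(target, LIST_TYPES[tag])  # e.g. target.confidences
    bucket.append(convert(text))
    return True


target = Target()
for tag, text in [("confidence", "0.95"), ("property", " habitat "), ("name", "x")]:
    dispatch(Handlers(), target, tag, text)
print(target.confidences, target.properties)
```

Adding support for another repeated child then only requires a new table entry and a method of the same name.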

def Bio.Phylo.PhyloXMLIO.Parser._parse_sequence (   self,
  parent 
) [private]

Definition at line 421 of file PhyloXMLIO.py.

00421 
00422     def _parse_sequence(self, parent):
00423         sequence = PX.Sequence(**parent.attrib)
00424         for event, elem in self.context:
00425             namespace, tag = _split_namespace(elem.tag)
00426             if event == 'end':
00427                 if tag == 'sequence':
00428                     parent.clear()
00429                     break
00430                 if tag in ('accession', 'mol_seq', 'uri',
00431                         'domain_architecture'):
00432                     setattr(sequence, tag, getattr(self, tag)(elem))
00433                 elif tag == 'annotation':
00434                     sequence.annotations.append(self.annotation(elem))
00435                 elif tag == 'name': 
00436                     sequence.name = _collapse_wspace(elem.text)
00437                 elif tag in ('symbol', 'location'):
00438                     setattr(sequence, tag, elem.text)
00439                 elif namespace != NAMESPACES['phy']:
00440                     sequence.other.append(self.other(elem, namespace, tag))
00441                     parent.clear()
00442         return sequence

def Bio.Phylo.PhyloXMLIO.Parser._parse_taxonomy (   self,
  parent 
) [private]

Definition at line 443 of file PhyloXMLIO.py.

00443 
00444     def _parse_taxonomy(self, parent):
00445         taxonomy = PX.Taxonomy(**parent.attrib)
00446         for event, elem in self.context:
00447             namespace, tag = _split_namespace(elem.tag)
00448             if event == 'end':
00449                 if tag == 'taxonomy':
00450                     parent.clear()
00451                     break
00452                 if tag in ('id', 'uri'):
00453                     setattr(taxonomy, tag, getattr(self, tag)(elem))
00454                 elif tag == 'common_name':
00455                     taxonomy.common_names.append(_collapse_wspace(elem.text))
00456                 elif tag == 'synonym':
00457                     taxonomy.synonyms.append(elem.text)
00458                 elif tag in ('code', 'scientific_name', 'authority', 'rank'):
00459                     # ENH: check_str on rank
00460                     setattr(taxonomy, tag, elem.text)
00461                 elif namespace != NAMESPACES['phy']:
00462                     taxonomy.other.append(self.other(elem, namespace, tag))
00463                     parent.clear()
00464         return taxonomy

def Bio.Phylo.PhyloXMLIO.Parser.accession (   self,
  elem 
)

Definition at line 473 of file PhyloXMLIO.py.

00473 
00474     def accession(self, elem):
00475         return PX.Accession(elem.text.strip(), elem.get('source'))

def Bio.Phylo.PhyloXMLIO.Parser.annotation (   self,
  elem 
)

Definition at line 476 of file PhyloXMLIO.py.

00476 
00477     def annotation(self, elem):
00478         return PX.Annotation(
00479                 desc=_collapse_wspace(_get_child_text(elem, 'desc')),
00480                 confidence=_get_child_as(elem, 'confidence', self.confidence),
00481                 properties=_get_children_as(elem, 'property', self.property),
00482                 uri=_get_child_as(elem, 'uri', self.uri),
00483                 **elem.attrib)

def Bio.Phylo.PhyloXMLIO.Parser.binary_characters (   self,
  elem 
)

Definition at line 484 of file PhyloXMLIO.py.

00484 
00485     def binary_characters(self, elem):
00486         def bc_getter(elem):
00487             return _get_children_text(elem, 'bc')
00488         return PX.BinaryCharacters(
00489                 type=elem.get('type'),
00490                 gained_count=_int(elem.get('gained_count')),
00491                 lost_count=_int(elem.get('lost_count')),
00492                 present_count=_int(elem.get('present_count')),
00493                 absent_count=_int(elem.get('absent_count')),
00494                 # Flatten BinaryCharacterList sub-nodes into lists of strings
00495                 gained=_get_child_as(elem, 'gained', bc_getter),
00496                 lost=_get_child_as(elem, 'lost', bc_getter),
00497                 present=_get_child_as(elem, 'present', bc_getter),
00498                 absent=_get_child_as(elem, 'absent', bc_getter))

def Bio.Phylo.PhyloXMLIO.Parser.clade_relation (   self,
  elem 
)

Definition at line 499 of file PhyloXMLIO.py.

00499 
00500     def clade_relation(self, elem):
00501         return PX.CladeRelation(
00502                 elem.get('type'), elem.get('id_ref_0'), elem.get('id_ref_1'),
00503                 distance=elem.get('distance'),
00504                 confidence=_get_child_as(elem, 'confidence', self.confidence))

def Bio.Phylo.PhyloXMLIO.Parser.color (   self,
  elem 
)

Definition at line 505 of file PhyloXMLIO.py.

00505 
00506     def color(self, elem):
00507         red, green, blue = (_get_child_text(elem, color, int) for color in
00508                             ('red', 'green', 'blue'))
00509         return PX.BranchColor(red, green, blue)
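
The helper _get_child_text is used throughout these methods but is not shown on this page. Judging only from its call sites, it returns the text of a named child element, optionally passed through a converter. The sketch below is a guess at that behaviour under simplifying assumptions: the name get_child_text is hypothetical, tags carry no namespace, and a missing or empty child yields None.

```python
import xml.etree.ElementTree as ElementTree


def get_child_text(parent, tag, construct=str):
    """Return the converted text of the first child named `tag`, or None."""
    child = parent.find(tag)
    if child is not None and child.text:
        return construct(child.text)
    return None


# Hypothetical input mirroring the children read by the color method.
elem = ElementTree.fromstring(
    "<color><red>128</red><green>0</green><blue>64</blue></color>")
red, green, blue = (get_child_text(elem, name, int)
                    for name in ("red", "green", "blue"))
print(red, green, blue)
```

The generator expression on the right-hand side is unpacked into exactly three names, so a change to the tuple of channel names would fail loudly rather than silently.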

def Bio.Phylo.PhyloXMLIO.Parser.confidence (   self,
  elem 
)

Definition at line 510 of file PhyloXMLIO.py.

00510 
00511     def confidence(self, elem):
00512         return PX.Confidence(
00513                 _float(elem.text),
00514                 elem.get('type'))

def Bio.Phylo.PhyloXMLIO.Parser.date (   self,
  elem 
)

Definition at line 515 of file PhyloXMLIO.py.

00515 
00516     def date(self, elem):
00517         return PX.Date(
00518                 unit=elem.get('unit'),
00519                 desc=_collapse_wspace(_get_child_text(elem, 'desc')),
00520                 value=_get_child_text(elem, 'value', float),
00521                 minimum=_get_child_text(elem, 'minimum', float),
00522                 maximum=_get_child_text(elem, 'maximum', float),
00523                 )

def Bio.Phylo.PhyloXMLIO.Parser.distribution (   self,
  elem 
)

Definition at line 524 of file PhyloXMLIO.py.

00524 
00525     def distribution(self, elem):
00526         return PX.Distribution(
00527                 desc=_collapse_wspace(_get_child_text(elem, 'desc')),
00528                 points=_get_children_as(elem, 'point', self.point),
00529                 polygons=_get_children_as(elem, 'polygon', self.polygon))

def Bio.Phylo.PhyloXMLIO.Parser.domain (   self,
  elem 
)

Definition at line 530 of file PhyloXMLIO.py.

00530 
00531     def domain(self, elem):
00532         return PX.ProteinDomain(elem.text.strip(),
00533                 int(elem.get('from')) - 1,
00534                 int(elem.get('to')),
00535                 confidence=_float(elem.get('confidence')),
00536                 id=elem.get('id'))
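
The arithmetic in the listing (subtract 1 from 'from', leave 'to' unchanged) is what one would write to turn a 1-based, inclusive interval into a 0-based, half-open one, the convention used by Python slices. That reading is an assumption based on the code alone. A worked example with a hypothetical helper and a made-up sequence:

```python
def to_python_range(xml_from, xml_to):
    """Convert 1-based inclusive coordinates to a 0-based half-open range."""
    return int(xml_from) - 1, int(xml_to)


sequence = "MKTAYIAKQR"  # made-up residues, for illustration only
start, end = to_python_range("3", "5")
# Residues 3 to 5 in 1-based counting are T, A and Y.
print(start, end, sequence[start:end])
```

With this convention the length of the domain is simply end - start, and adjacent domains share a boundary value without overlapping.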

def Bio.Phylo.PhyloXMLIO.Parser.domain_architecture (   self,
  elem 
)

Definition at line 537 of file PhyloXMLIO.py.

00537 
00538     def domain_architecture(self, elem):
00539         return PX.DomainArchitecture(
00540                 length=int(elem.get('length')),
00541                 domains=_get_children_as(elem, 'domain', self.domain))

def Bio.Phylo.PhyloXMLIO.Parser.events (   self,
  elem 
)

Definition at line 542 of file PhyloXMLIO.py.

00542 
00543     def events(self, elem):
00544         return PX.Events(
00545                 type=_get_child_text(elem, 'type'),
00546                 duplications=_get_child_text(elem, 'duplications', int),
00547                 speciations=_get_child_text(elem, 'speciations', int),
00548                 losses=_get_child_text(elem, 'losses', int),
00549                 confidence=_get_child_as(elem, 'confidence', self.confidence))

def Bio.Phylo.PhyloXMLIO.Parser.id (   self,
  elem 
)

Definition at line 550 of file PhyloXMLIO.py.

00550 
00551     def id(self, elem):
00552         provider = elem.get('provider') or elem.get('type')
00553         return PX.Id(elem.text.strip(), provider)

def Bio.Phylo.PhyloXMLIO.Parser.mol_seq (   self,
  elem 
)

Definition at line 554 of file PhyloXMLIO.py.

00554 
00555     def mol_seq(self, elem):
00556         is_aligned = elem.get('is_aligned')
00557         if is_aligned is not None:
00558             is_aligned = _str2bool(is_aligned)
00559         return PX.MolSeq(elem.text.strip(), is_aligned=is_aligned)
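
The is_aligned attribute is treated as three-valued: absent (None), true, or false. The helper _str2bool is not shown on this page, so the version below is a hypothetical stand-in that accepts the XML Schema boolean spellings; the real helper may accept a different set of strings.

```python
def str_to_bool(text):
    """Map an XML Schema boolean string to a Python bool (assumed mapping)."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("Not a boolean string: %r" % text)


def parse_is_aligned(attrib):
    """Return None when the attribute is absent, otherwise a bool."""
    value = attrib.get("is_aligned")
    if value is None:
        return None
    return str_to_bool(value)


print(parse_is_aligned({}),
      parse_is_aligned({"is_aligned": "true"}),
      parse_is_aligned({"is_aligned": "false"}))
```

Keeping None distinct from False lets a writer omit the attribute again on output instead of inventing a value the input never had.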

def Bio.Phylo.PhyloXMLIO.Parser.other (   self,
  elem,
  namespace,
  localtag 
)

Definition at line 465 of file PhyloXMLIO.py.

00465 
00466     def other(self, elem, namespace, localtag):
00467         return PX.Other(localtag, namespace, elem.attrib,
00468                   value=elem.text and elem.text.strip() or None,
00469                   children=[self.other(child, *_split_namespace(child.tag))
00470                             for child in elem])
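
The method above converts an element from a foreign namespace, together with all of its descendants, into a generic container. The sketch below shows the same recursion with dictionaries in place of PX.Other. Both split_namespace and to_other are hypothetical; the real _split_namespace helper is not shown on this page and may behave differently for tags without a namespace.

```python
import xml.etree.ElementTree as ElementTree


def split_namespace(tag):
    """Split '{uri}local' into (uri, local); the URI is None if absent."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def to_other(elem):
    """Recursively convert an element into a plain dictionary."""
    namespace, localtag = split_namespace(elem.tag)
    text = elem.text.strip() if elem.text else ""
    return {
        "tag": localtag,
        "namespace": namespace,
        "attributes": dict(elem.attrib),
        "value": text or None,  # empty or whitespace-only text becomes None
        "children": [to_other(child) for child in elem],
    }


# Hypothetical foreign element; the namespace URI is made up.
elem = ElementTree.fromstring(
    '<ex:note xmlns:ex="http://example.org/ns" lang="en">hello<ex:sub/></ex:note>')
node = to_other(elem)
print(node["tag"], node["value"], [child["tag"] for child in node["children"]])
```

The `text or None` expression plays the role of the `elem.text and elem.text.strip() or None` idiom in the listing.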

def Bio.Phylo.PhyloXMLIO.Parser.parse (   self )
Parse the phyloXML file incrementally and return each phylogeny.

Definition at line 301 of file PhyloXMLIO.py.

00301 
00302     def parse(self):
00303         """Parse the phyloXML file incrementally and return each phylogeny."""
00304         phytag = _ns('phylogeny')
00305         for event, elem in self.context:
00306             if event == 'start' and elem.tag == phytag:
00307                 yield self._parse_phylogeny(elem)
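
Because the method above is a generator, a caller receives each phylogeny as soon as it has been read and never needs the whole file in memory. The sketch below shows the same lazy pattern with the standard library only. The function iter_phylogeny_names and the sample input are hypothetical, and tags are written without a namespace for brevity.

```python
import io
import xml.etree.ElementTree as ElementTree


def iter_phylogeny_names(stream):
    """Yield the name of each phylogeny as soon as it has been read."""
    for event, elem in ElementTree.iterparse(stream, events=("end",)):
        if elem.tag == "phylogeny":
            # Only direct children are searched, so clade names are ignored.
            yield elem.findtext("name")
            elem.clear()


# Hypothetical input with two phylogenies.
SAMPLE = (b"<phyloxml>"
          b"<phylogeny><name>first</name><clade/></phylogeny>"
          b"<phylogeny><name>second</name><clade/></phylogeny>"
          b"</phyloxml>")

for name in iter_phylogeny_names(io.BytesIO(SAMPLE)):
    print(name)
```

A caller that only needs the first tree can stop iterating early; the rest of the stream is then left unparsed.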

def Bio.Phylo.PhyloXMLIO.Parser.point (   self,
  elem 
)

Definition at line 560 of file PhyloXMLIO.py.

00560 
00561     def point(self, elem):
00562         return PX.Point(
00563                 elem.get('geodetic_datum'),
00564                 _get_child_text(elem, 'lat', float),
00565                 _get_child_text(elem, 'long', float),
00566                 alt=_get_child_text(elem, 'alt', float),
00567                 alt_unit=elem.get('alt_unit'))

def Bio.Phylo.PhyloXMLIO.Parser.polygon (   self,
  elem 
)

Definition at line 568 of file PhyloXMLIO.py.

00568 
00569     def polygon(self, elem):
00570         return PX.Polygon(
00571                 points=_get_children_as(elem, 'point', self.point))

def Bio.Phylo.PhyloXMLIO.Parser.property (   self,
  elem 
)

Definition at line 572 of file PhyloXMLIO.py.

00572 
00573     def property(self, elem):
00574         return PX.Property(elem.text.strip(),
00575                 elem.get('ref'), elem.get('applies_to'), elem.get('datatype'),
00576                 unit=elem.get('unit'),
00577                 id_ref=elem.get('id_ref'))

def Bio.Phylo.PhyloXMLIO.Parser.read (   self )
Parse the phyloXML file and create a single Phyloxml object.

Definition at line 277 of file PhyloXMLIO.py.

00277 
00278     def read(self):
00279         """Parse the phyloXML file and create a single Phyloxml object."""
00280         phyloxml = PX.Phyloxml(dict((_local(key), val)
00281                                 for key, val in self.root.items()))
00282         other_depth = 0
00283         for event, elem in self.context:
00284             namespace, localtag = _split_namespace(elem.tag)
00285             if event == 'start':
00286                 if namespace != NAMESPACES['phy']:
00287                     other_depth += 1
00288                     continue
00289                 if localtag == 'phylogeny':
00290                     phylogeny = self._parse_phylogeny(elem)
00291                     phyloxml.phylogenies.append(phylogeny)
00292             if event == 'end' and namespace != NAMESPACES['phy']:
00293                 # Deal with items not specified by phyloXML
00294                 other_depth -= 1
00295                 if other_depth == 0:
00296                     # We're directly under the root node -- evaluate
00297                     otr = self.other(elem, namespace, localtag)
00298                     phyloxml.other.append(otr)
00299                     self.root.clear()
00300         return phyloxml
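
The other_depth counter in the listing identifies foreign elements that sit directly under the root: it rises on each foreign 'start' event, falls on each foreign 'end' event, and reaches zero only when the outermost foreign element closes. A stand-alone sketch of that counting follows. The function name and the example.org namespace are made up, and the phyloXML namespace URI is assumed to be the value stored in NAMESPACES['phy'].

```python
import io
import xml.etree.ElementTree as ElementTree

PHY = "http://www.phyloxml.org"  # assumed value of NAMESPACES['phy']


def top_level_foreign_tags(stream):
    """Return the tags of the outermost non-phyloXML elements."""
    depth = 0
    found = []
    for event, elem in ElementTree.iterparse(stream, events=("start", "end")):
        if elem.tag.startswith("{" + PHY + "}"):
            continue  # phyloXML's own elements do not affect the counter
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                # All nested foreign elements have closed.
                found.append(elem.tag)
    return found


SAMPLE = b"""<phyloxml xmlns="http://www.phyloxml.org"
          xmlns:ex="http://example.org/ns">
  <ex:meta><ex:author>someone</ex:author></ex:meta>
  <phylogeny rooted="true"><clade/></phylogeny>
  <ex:footer/>
</phyloxml>"""
print(top_level_foreign_tags(io.BytesIO(SAMPLE)))
```

The nested ex:author element is not reported on its own, because the counter is still at 1 when it closes.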

def Bio.Phylo.PhyloXMLIO.Parser.reference (   self,
  elem 
)

Definition at line 578 of file PhyloXMLIO.py.

00578 
00579     def reference(self, elem):
00580         return PX.Reference(
00581                 doi=elem.get('doi'),
00582                 desc=_get_child_text(elem, 'desc'))

def Bio.Phylo.PhyloXMLIO.Parser.sequence_relation (   self,
  elem 
)

Definition at line 583 of file PhyloXMLIO.py.

00583 
00584     def sequence_relation(self, elem):
00585         return PX.SequenceRelation(
00586                 elem.get('type'), elem.get('id_ref_0'), elem.get('id_ref_1'),
00587                 distance=_float(elem.get('distance')),
00588                 confidence=_get_child_as(elem, 'confidence', self.confidence))

def Bio.Phylo.PhyloXMLIO.Parser.uri (   self,
  elem 
)

Definition at line 589 of file PhyloXMLIO.py.

00589 
00590     def uri(self, elem):
00591         return PX.Uri(elem.text.strip(),
00592                 desc=_collapse_wspace(elem.get('desc')),
00593                 type=elem.get('type'))


Member Data Documentation

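list Bio.Phylo.PhyloXMLIO.Parser._clade_complex_types = ['color', 'events', 'binary_characters', 'date'] [static, private]

Definition at line 356 of file PhyloXMLIO.py.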

dictionary Bio.Phylo.PhyloXMLIO.Parser._clade_list_types [static, private]
Initial value:
{
            'confidence':   'confidences',
            'distribution': 'distributions',
            'reference':    'references',
            'property':     'properties',
            }

Definition at line 357 of file PhyloXMLIO.py.

tuple Bio.Phylo.PhyloXMLIO.Parser._clade_tracked_tags [static, private]
Initial value:
set(_clade_complex_types + _clade_list_types.keys()
                              + ['branch_length', 'name', 'node_id', 'width'])
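
The initial value above adds the result of dict.keys() to a list, which works in Python 2, where keys() returns a list. In Python 3 keys() returns a view object and the addition raises TypeError. A sketch of an equivalent construction that runs on both versions, using module-level names without the leading underscore purely for illustration:

```python
clade_complex_types = ["color", "events", "binary_characters", "date"]
clade_list_types = {
    "confidence": "confidences",
    "distribution": "distributions",
    "reference": "references",
    "property": "properties",
}
# list(d) yields the dictionary's keys on both Python 2 and Python 3.
clade_tracked_tags = set(clade_complex_types + list(clade_list_types)
                         + ["branch_length", "name", "node_id", "width"])
print(sorted(clade_tracked_tags))
```

The result is a set, so the membership tests performed during clade parsing do not depend on the order of the tags.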

Definition at line 363 of file PhyloXMLIO.py.

Bio.Phylo.PhyloXMLIO.Parser.context

Definition at line 275 of file PhyloXMLIO.py.

Bio.Phylo.PhyloXMLIO.Parser.root

Definition at line 274 of file PhyloXMLIO.py.


The documentation for this class was generated from the following file: