python-biopython  1.60
Bio.Blast.NCBIXML.BlastParser Class Reference
[Inheritance diagram for Bio.Blast.NCBIXML.BlastParser]
[Collaboration diagram for Bio.Blast.NCBIXML.BlastParser]

Public Member Functions

def __init__
def reset
def startElement
def characters
def endElement

Private Member Functions

def _start_Iteration
def _end_Iteration
def _end_BlastOutput_program
def _end_BlastOutput_version
def _end_BlastOutput_reference
def _end_BlastOutput_db
def _end_BlastOutput_query_ID
def _end_BlastOutput_query_def
def _end_BlastOutput_query_len
def _end_Iteration_query_ID
def _end_Iteration_query_def
def _end_Iteration_query_len

Private Attributes

 _parser
 _records
 _header
 _parameters
 _blast

Detailed Description

Parse XML BLAST data into a Record.Blast object

All XML 'action' methods are private methods and may be:
_start_TAG      called when the start tag is found
_end_TAG        called when the end tag is found
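
The naming convention above can be illustrated with a small, self-contained SAX handler. This is only a sketch and not the BlastParser implementation: the class TagDispatcher, the helper collect and the sample document are hypothetical, and mapping hyphens in tag names to underscores is an assumption about how a tag such as <BlastOutput_query-def> reaches a method named _end_BlastOutput_query_def.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TagDispatcher(ContentHandler):
    """Hypothetical handler: routes each tag to _start_TAG / _end_TAG if defined."""

    def __init__(self):
        super().__init__()
        self.text = ""   # character buffer for the current element
        self.seen = {}   # values collected by the _end_* methods

    def _method_name(self, prefix, tag):
        # Assumption: hyphens become underscores so the result is a valid
        # Python method name.
        return prefix + tag.replace("-", "_")

    def startElement(self, name, attrs):
        self.text = ""
        method = getattr(self, self._method_name("_start_", name), None)
        if method is not None:
            method()

    def characters(self, content):
        self.text += content

    def endElement(self, name):
        method = getattr(self, self._method_name("_end_", name), None)
        if method is not None:
            method()
        self.text = ""

    def _end_BlastOutput_program(self):
        self.seen["application"] = self.text.upper()

    def _end_BlastOutput_query_def(self):
        self.seen["query"] = self.text


def collect(xml_bytes):
    """Parse xml_bytes and return the values gathered by the handler."""
    handler = TagDispatcher()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.seen


SAMPLE = (b"<BlastOutput>"
          b"<BlastOutput_program>blastp</BlastOutput_program>"
          b"<BlastOutput_query-def>example query</BlastOutput_query-def>"
          b"<BlastOutput_db>nr</BlastOutput_db>"
          b"</BlastOutput>")
print(collect(SAMPLE))
```

Tags without a matching method, such as <BlastOutput_db> in this sketch, are simply skipped, so support for a new tag only requires adding one method.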

Definition at line 119 of file NCBIXML.py.


Constructor & Destructor Documentation

def Bio.Blast.NCBIXML.BlastParser.__init__ (   self,
  debug = 0 
)
Constructor

debug - integer, amount of debug information to print

Reimplemented from Bio.Blast.NCBIXML._XMLparser.

Definition at line 127 of file NCBIXML.py.

    def __init__(self, debug=0):
        """Constructor

        debug - integer, amount of debug information to print
        """
        # Calling superclass method
        _XMLparser.__init__(self, debug)

        self._parser = xml.sax.make_parser()
        self._parser.setContentHandler(self)

        # To avoid ValueError: unknown url type: NCBI_BlastOutput.dtd
        self._parser.setFeature(xml.sax.handler.feature_validation, 0)
        self._parser.setFeature(xml.sax.handler.feature_namespaces, 0)
        self._parser.setFeature(xml.sax.handler.feature_external_pes, 0)
        self._parser.setFeature(xml.sax.handler.feature_external_ges, 0)

        self.reset()
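
As a rough illustration of why the constructor switches these features off, the following standalone sketch parses a document whose DOCTYPE names a DTD file that is not available. It uses only the Python 3 standard library; RootNameHandler, parse_without_dtd and the sample document are hypothetical names introduced for this example, not part of Biopython.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, feature_external_ges,
                             feature_external_pes, feature_namespaces,
                             feature_validation)


class RootNameHandler(ContentHandler):
    """Hypothetical handler that only remembers the element names it sees."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_dtd(xml_bytes):
    """Parse xml_bytes without trying to fetch the DTD named in its DOCTYPE."""
    handler = RootNameHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Same four switches as the constructor: no validation, no namespace
    # processing, and no loading of external parameter or general entities.
    for feature in (feature_validation, feature_namespaces,
                    feature_external_pes, feature_external_ges):
        parser.setFeature(feature, False)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names


SAMPLE = (b'<?xml version="1.0"?>\n'
          b'<!DOCTYPE BlastOutput SYSTEM "NCBI_BlastOutput.dtd">\n'
          b"<BlastOutput><BlastOutput_db>nr</BlastOutput_db></BlastOutput>")
print(parse_without_dtd(SAMPLE))
```

The features must be set before parsing starts; the standard expat-based reader rejects feature changes while a parse is in progress.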


Member Function Documentation

def Bio.Blast.NCBIXML.BlastParser._end_BlastOutput_db (   self ) [private]
The database(s) searched

Save this to put on each blast record object

Definition at line 251 of file NCBIXML.py.

    def _end_BlastOutput_db(self):
        """the database(s) searched

        Save this to put on each blast record object
        """
        self._header.database = self._value

def Bio.Blast.NCBIXML.BlastParser._end_BlastOutput_program (   self ) [private]
BLAST program, e.g., blastp, blastn, etc.

Save this to put on each blast record object

Definition at line 214 of file NCBIXML.py.

    def _end_BlastOutput_program(self):
        """BLAST program, e.g., blastp, blastn, etc.

        Save this to put on each blast record object
        """
        self._header.application = self._value.upper()

def Bio.Blast.NCBIXML.BlastParser._end_BlastOutput_query_def (   self ) [private]
The definition line of the query

Important for old (pre-2.2.14) BLAST; for recent versions
<Iteration_query-def> is enough

Definition at line 266 of file NCBIXML.py.

    def _end_BlastOutput_query_def(self):
        """the definition line of the query

        Important for old (pre-2.2.14) BLAST; for recent versions
        <Iteration_query-def> is enough
        """
        self._header.query = self._value

def Bio.Blast.NCBIXML.BlastParser._end_BlastOutput_query_ID (   self ) [private]
The identifier of the query

Important for old (pre-2.2.14) BLAST; for recent versions
<Iteration_query-ID> is enough

Definition at line 258 of file NCBIXML.py.

    def _end_BlastOutput_query_ID(self):
        """the identifier of the query

        Important for old (pre-2.2.14) BLAST; for recent versions
        <Iteration_query-ID> is enough
        """
        self._header.query_id = self._value

def Bio.Blast.NCBIXML.BlastParser._end_BlastOutput_query_len (   self ) [private]
The length of the query

Important for old (pre-2.2.14) BLAST; for recent versions
<Iteration_query-len> is enough

Definition at line 274 of file NCBIXML.py.

    def _end_BlastOutput_query_len(self):
        """the length of the query

        Important for old (pre-2.2.14) BLAST; for recent versions
        <Iteration_query-len> is enough
        """
        self._header.query_letters = int(self._value)

def Bio.Blast.NCBIXML.BlastParser._end_BlastOutput_reference (   self ) [private]
A reference to the article describing the algorithm

Save this to put on each blast record object

Definition at line 244 of file NCBIXML.py.

    def _end_BlastOutput_reference(self):
        """a reference to the article describing the algorithm

        Save this to put on each blast record object
        """
        self._header.reference = self._value

def Bio.Blast.NCBIXML.BlastParser._end_BlastOutput_version (   self ) [private]
Version number and date of the BLAST engine.

e.g. "BLASTX 2.2.12 [Aug-07-2005]" but there can also be
variants like "BLASTP 2.2.18+" without the date.

Save this to put on each blast record object

Definition at line 221 of file NCBIXML.py.

    def _end_BlastOutput_version(self):
        """version number and date of the BLAST engine.

        e.g. "BLASTX 2.2.12 [Aug-07-2005]" but there can also be
        variants like "BLASTP 2.2.18+" without the date.

        Save this to put on each blast record object
        """
        parts = self._value.split()
        # TODO - Check the first word starts with BLAST?

        # The version is the second word (field one)
        self._header.version = parts[1]

        # Check there is a third word (the date)
        if len(parts) >= 3:
            if parts[2][0] == "[" and parts[2][-1] == "]":
                self._header.date = parts[2][1:-1]
            else:
                # Assume this is still a date, but without the
                # square brackets
                self._header.date = parts[2]
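
The splitting rule described above can be tried in isolation. The function below is a hypothetical standalone helper, not part of NCBIXML.py; returning None for a missing date and raising ValueError for a string with fewer than two words are assumptions made for this sketch.

```python
def split_version(value):
    """Return (version, date) from a <BlastOutput_version> string.

    date is None when the string has no third word.
    """
    parts = value.split()
    if len(parts) < 2:
        # Assumption: fail with a clear message instead of an IndexError
        raise ValueError("Expected 'PROGRAM VERSION [DATE]', got %r" % value)
    version = parts[1]
    date = None
    if len(parts) >= 3:
        # Strip the square brackets when present, otherwise keep the word as is
        date = parts[2]
        if date.startswith("[") and date.endswith("]"):
            date = date[1:-1]
    return version, date


print(split_version("BLASTX 2.2.12 [Aug-07-2005]"))
print(split_version("BLASTP 2.2.18+"))
```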

def Bio.Blast.NCBIXML.BlastParser._end_Iteration (   self ) [private]

Definition at line 157 of file NCBIXML.py.

    def _end_Iteration(self):
        # We stored a lot of generic "top level" information
        # in self._header (an object of type Record.Header)
        self._blast.reference = self._header.reference
        self._blast.date = self._header.date
        self._blast.version = self._header.version
        self._blast.database = self._header.database
        self._blast.application = self._header.application

        # These are required for "old" pre 2.2.14 files
        # where only <BlastOutput_query-ID>, <BlastOutput_query-def>
        # and <BlastOutput_query-len> were used.  Now they
        # are supplemented/replaced by <Iteration_query-ID>,
        # <Iteration_query-def> and <Iteration_query-len>
        if not getattr(self._blast, "query", None):
            self._blast.query = self._header.query
        if not getattr(self._blast, "query_id", None):
            self._blast.query_id = self._header.query_id
        if not getattr(self._blast, "query_letters", None):
            self._blast.query_letters = self._header.query_letters

        # Hack to record the query length as both the query_letters and
        # query_length properties (as in the plain text parser, see
        # Bug 2176 comment 12):
        self._blast.query_length = self._blast.query_letters
        # Perhaps in the long term we should deprecate one, but I would
        # prefer to drop query_letters - so we need a transition period
        # with both.

        # Hack to record the claimed database size as database_length
        # (as well as in num_letters_in_database, see Bug 2176 comment 13):
        self._blast.database_length = self._blast.num_letters_in_database
        # TODO? Deprecate database_letters next?

        # Hack to record the claimed database sequence count as database_sequences
        self._blast.database_sequences = self._blast.num_sequences_in_database

        # Apply the "top level" parameter information
        self._blast.matrix = self._parameters.matrix
        self._blast.num_seqs_better_e = self._parameters.num_seqs_better_e
        self._blast.gap_penalties = self._parameters.gap_penalties
        self._blast.filter = self._parameters.filter
        self._blast.expect = self._parameters.expect
        self._blast.sc_match = self._parameters.sc_match
        self._blast.sc_mismatch = self._parameters.sc_mismatch

        # Add to the list
        self._records.append(self._blast)
        # Clear the object (a new empty one is created in _start_Iteration)
        self._blast = None

        if self._debug:
            print("NCBIXML: Added Blast record to results")
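
The fallback from header-level query fields to per-iteration ones can be sketched without Biopython. In the example below, fill_from_header is a hypothetical helper and SimpleNamespace objects stand in for the Record.Header and Record.Blast objects; only the attribute names query, query_id, query_letters and query_length are taken from the listing above.

```python
from types import SimpleNamespace


def fill_from_header(record, header,
                     names=("query", "query_id", "query_letters")):
    """Copy header values onto record where the record has none."""
    for name in names:
        # Missing and empty/zero values are treated alike, as in _end_Iteration
        if not getattr(record, name, None):
            setattr(record, name, getattr(header, name))
    # Mirror the length under both attribute names
    record.query_length = record.query_letters
    return record


# Stand-ins for the header and for a record where only one field was set
header = SimpleNamespace(query="old-style query", query_id="lcl|1",
                         query_letters=120)
record = SimpleNamespace(query="per-iteration query")
fill_from_header(record, header)
print(record.query, record.query_id, record.query_letters, record.query_length)
```

Values set from the <Iteration_query-*> tags win; the header only fills the gaps, which is what lets one code path serve both old and recent BLAST output.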

def Bio.Blast.NCBIXML.BlastParser._end_Iteration_query_def (   self ) [private]
The definition line of the query

Definition at line 287 of file NCBIXML.py.

    def _end_Iteration_query_def(self):
        """the definition line of the query
        """
        self._blast.query = self._value

def Bio.Blast.NCBIXML.BlastParser._end_Iteration_query_ID (   self ) [private]
The identifier of the query

Definition at line 282 of file NCBIXML.py.

    def _end_Iteration_query_ID(self):
        """the identifier of the query
        """
        self._blast.query_id = self._value

def Bio.Blast.NCBIXML.BlastParser._end_Iteration_query_len (   self ) [private]
The length of the query

Definition at line 292 of file NCBIXML.py.

    def _end_Iteration_query_len(self):
        """the length of the query
        """
        self._blast.query_letters = int(self._value)

def Bio.Blast.NCBIXML.BlastParser._start_Iteration (   self ) [private]

Definition at line 153 of file NCBIXML.py.

    def _start_Iteration(self):
        # Start a fresh record for this iteration
        self._blast = Record.Blast()

def Bio.Blast.NCBIXML._XMLparser.characters (   self,
  ch 
) [inherited]
Found some text

ch -- characters read

Definition at line 87 of file NCBIXML.py.

    def characters(self, ch):
        """Found some text

        ch -- characters read
        """
        self._value += ch # You don't ever get the whole string
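
The comment in the listing reflects a general property of SAX: a parser may deliver the text of one element through several characters() calls, so the handler has to accumulate it. The sketch below records the individual chunks next to the accumulated value; ChunkRecorder and read_text are hypothetical names, and how many chunks arrive depends on the underlying parser, so no particular count should be relied on.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkRecorder(ContentHandler):
    """Hypothetical handler that keeps each chunk and the joined text."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # every piece passed to characters()
        self.value = ""    # accumulated text, as BlastParser keeps in _value

    def characters(self, content):
        self.chunks.append(content)
        self.value += content


def read_text(xml_bytes):
    """Parse xml_bytes and return the handler holding the collected text."""
    handler = ChunkRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler


handler = read_text(b"<Hit_def>kinase A &amp; kinase B</Hit_def>")
print(len(handler.chunks), repr(handler.value))
```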

def Bio.Blast.NCBIXML._XMLparser.endElement (   self,
  name 
) [inherited]
Found XML end tag

name -- tag name

Definition at line 94 of file NCBIXML.py.

    def endElement(self, name):
        """Found XML end tag

        name -- tag name
        """
        # DON'T strip any white space, we may need it e.g. the hsp-midline

        # Try to call a method (defined in subclasses)
        method = self._secure_name('_end_' + name)
        # Note could use try / except AttributeError
        # BUT I found often triggered by nested errors...
        if hasattr(self, method):
            # Look the handler up by name instead of building code for eval()
            getattr(self, method)()
            if self._debug > 2:
                print("NCBIXML: Parsed:  " + method + " " + self._value)
        else:
            # Doesn't exist (yet)
            if method not in self._debug_ignore_list:
                if self._debug > 1:
                    print("NCBIXML: Ignored: " + method + " " + self._value)
                self._debug_ignore_list.append(method)

        # Reset character buffer
        self._value = ''

def Bio.Blast.NCBIXML.BlastParser.reset (   self )
Reset all the data allowing reuse of the BlastParser() object

Definition at line 146 of file NCBIXML.py.

    def reset(self):
        """Reset all the data allowing reuse of the BlastParser() object"""
        self._records = []
        self._header = Record.Header()
        self._parameters = Record.Parameters()
        self._parameters.filter = None # Maybe I should update the class?

def Bio.Blast.NCBIXML._XMLparser.startElement (   self,
  name,
  attr 
) [inherited]
Found XML start tag

There is no real need for attr; the BLAST DTD doesn't use attributes

name -- name of the tag

attr -- tag attributes

Definition at line 53 of file NCBIXML.py.

    def startElement(self, name, attr):
        """Found XML start tag

        There is no real need for attr; the BLAST DTD doesn't use attributes

        name -- name of the tag

        attr -- tag attributes
        """
        self._tag.append(name)

        # Try to call a method (defined in subclasses)
        method = self._secure_name('_start_' + name)

        # Note could use try / except AttributeError
        # BUT I found often triggered by nested errors...
        if hasattr(self, method):
            # Look the handler up by name instead of building code for eval()
            getattr(self, method)()
            if self._debug > 4:
                print("NCBIXML: Parsed:  " + method)
        else:
            # Doesn't exist (yet)
            if method not in self._debug_ignore_list:
                if self._debug > 3:
                    print("NCBIXML: Ignored: " + method)
                self._debug_ignore_list.append(method)

        # We don't care about white space in parent tags like Hsp,
        # but that white space doesn't belong to child tags like Hsp_midline
        if self._value.strip():
            raise ValueError("What should we do with %s before the %s tag?"
                             % (repr(self._value), name))
        self._value = ""
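
The final check in startElement rejects non-whitespace text that appears in a parent element before a child tag opens, while still preserving the exact text of leaf elements. A minimal sketch of that behaviour follows, using only the standard library; StrictTextHandler, parse_leaves and the two sample documents are hypothetical and exist only for this illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class StrictTextHandler(ContentHandler):
    """Hypothetical handler that refuses stray text before a child tag."""

    def __init__(self):
        super().__init__()
        self.value = ""    # text buffered since the last tag
        self.texts = {}    # buffered text recorded at each end tag

    def startElement(self, name, attrs):
        # Whitespace between tags is fine; anything else is unexpected
        if self.value.strip():
            raise ValueError("Unexpected text %r before the %s tag"
                             % (self.value, name))
        self.value = ""

    def characters(self, content):
        self.value += content

    def endElement(self, name):
        # Keep the text unstripped: leading spaces can be significant
        self.texts[name] = self.value
        self.value = ""


def parse_leaves(xml_bytes):
    """Parse xml_bytes and return the text recorded for each element."""
    handler = StrictTextHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.texts


GOOD = b"<Hsp>\n  <Hsp_midline> A +C</Hsp_midline>\n</Hsp>"
BAD = b"<Hsp>stray<Hsp_midline> A +C</Hsp_midline></Hsp>"

print(repr(parse_leaves(GOOD)["Hsp_midline"]))
try:
    parse_leaves(BAD)
except ValueError as err:
    print("rejected:", err)
```

An exception raised inside a handler method propagates out of the parse call, which is how the ValueError in the listing reaches the caller.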


Member Data Documentation

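Bio.Blast.NCBIXML.BlastParser._blast [private]
Definition at line 154 of file NCBIXML.py.

Bio.Blast.NCBIXML.BlastParser._header [private]
Definition at line 149 of file NCBIXML.py.

Bio.Blast.NCBIXML.BlastParser._parameters [private]
Definition at line 150 of file NCBIXML.py.

Bio.Blast.NCBIXML.BlastParser._parser [private]
Definition at line 135 of file NCBIXML.py.

Bio.Blast.NCBIXML.BlastParser._records [private]
Definition at line 148 of file NCBIXML.py.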


The documentation for this class was generated from the following file: