python-biopython  1.60
Bio.GenBank.Scanner.InsdcScanner Class Reference

Public Member Functions

def __init__
def set_handle
def find_start
def parse_header
def parse_features
def parse_feature
def parse_footer
def feed
def parse
def parse_records
def parse_cds_features

Public Attributes

 debug
 line
 handle

Static Public Attributes

string RECORD_START = "XXX"
int HEADER_WIDTH = 3
list FEATURE_START_MARKERS = ["XXX***FEATURES***XXX"]
list FEATURE_END_MARKERS = ["XXX***END FEATURES***XXX"]
int FEATURE_QUALIFIER_INDENT = 0
string FEATURE_QUALIFIER_SPACER = ""
list SEQUENCE_HEADERS = ["XXX"]

Private Member Functions

def _feed_first_line
def _feed_header_lines
def _feed_feature_table
def _feed_misc_lines

Detailed Description

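Basic functions for breaking up a GenBank/EMBL file into subsections.

The International Nucleotide Sequence Database Collaboration (INSDC) is a
collaboration between DDBJ, EMBL, and GenBank. These organisations all use the
same "Feature Table" layout in their plain text flat file formats.

However, the header and sequence sections of an EMBL file differ considerably
in layout from those produced by GenBank/DDBJ. This class therefore handles
only the shared parts; the subclasses Bio.GenBank.Scanner.GenBankScanner and
Bio.GenBank.Scanner.EmblScanner override the placeholder class attributes
(RECORD_START, HEADER_WIDTH, and so on) and the private _feed_* hooks.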

Definition at line 35 of file Scanner.py.


Constructor & Destructor Documentation

def Bio.GenBank.Scanner.InsdcScanner.__init__ (   self,
  debug = 0 
)

Definition at line 54 of file Scanner.py.

    def __init__(self, debug=0):
        assert len(self.RECORD_START)==self.HEADER_WIDTH
        for marker in self.SEQUENCE_HEADERS:
            assert marker==marker.rstrip()
        assert len(self.FEATURE_QUALIFIER_SPACER)==self.FEATURE_QUALIFIER_INDENT
        self.debug = debug
        self.line = None
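
The assertions in __init__ check that a subclass has replaced the placeholder class attributes consistently. The sketch below copies those checks into a stand-alone Python 3 class so they can be tried without Biopython. ToyInsdcScanner, ToyGenBankLikeScanner and InconsistentScanner are hypothetical names, and the GenBank-style values (a 12-column LOCUS prefix, a 21-space qualifier indent, ORIGIN/CONTIG sequence headers) are assumptions for illustration, not a copy of GenBankScanner.

```python
class ToyInsdcScanner:
    """Hypothetical stand-in copying InsdcScanner's placeholder class
    attributes and the consistency checks made in its __init__."""

    RECORD_START = "XXX"
    HEADER_WIDTH = 3
    FEATURE_QUALIFIER_INDENT = 0
    FEATURE_QUALIFIER_SPACER = ""
    SEQUENCE_HEADERS = ["XXX"]

    def __init__(self, debug=0):
        # The record start marker must fill exactly HEADER_WIDTH columns.
        assert len(self.RECORD_START) == self.HEADER_WIDTH
        # Sequence headers are compared after rstrip(), so they must not
        # carry trailing whitespace themselves.
        for marker in self.SEQUENCE_HEADERS:
            assert marker == marker.rstrip()
        # The qualifier spacer must be exactly as wide as the indent.
        assert len(self.FEATURE_QUALIFIER_SPACER) == self.FEATURE_QUALIFIER_INDENT
        self.debug = debug
        self.line = None


class ToyGenBankLikeScanner(ToyInsdcScanner):
    # Assumed GenBank-style layout (illustrative values only).
    RECORD_START = "LOCUS".ljust(12)
    HEADER_WIDTH = 12
    FEATURE_QUALIFIER_INDENT = 21
    FEATURE_QUALIFIER_SPACER = " " * 21
    SEQUENCE_HEADERS = ["ORIGIN", "CONTIG"]


class InconsistentScanner(ToyInsdcScanner):
    # Deliberately wrong: "ID   " is 5 characters but HEADER_WIDTH is still 3.
    RECORD_START = "ID   "


scanner = ToyGenBankLikeScanner(debug=1)
print(scanner.debug, scanner.line)
try:
    InconsistentScanner()
except AssertionError:
    print("RECORD_START does not match HEADER_WIDTH")
```

Because these are plain assert statements, the checks are skipped when Python runs with -O.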


Member Function Documentation

def Bio.GenBank.Scanner.InsdcScanner._feed_feature_table (   self,
  consumer,
  feature_tuples 
) [private]
Handle the feature table (list of tuples), passing data to the consumer.

Used by the parse_records() and parse() methods.

Definition at line 342 of file Scanner.py.

    def _feed_feature_table(self, consumer, feature_tuples):
        """Handle the feature table (list of tuples), passing data to the consumer

        Used by the parse_records() and parse() methods.
        """
        consumer.start_feature_table()
        for feature_key, location_string, qualifiers in feature_tuples:
            consumer.feature_key(feature_key)
            consumer.location(location_string)
            for q_key, q_value in qualifiers:
                if q_value is None:
                    consumer.feature_qualifier(q_key, q_value)
                else:
                    consumer.feature_qualifier(q_key, q_value.replace("\n"," "))
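
To see the sequence of consumer events that _feed_feature_table produces, the same loop can be run against a consumer that simply records each call. In this Python 3 sketch, feed_feature_table (a free-function copy of the method) and RecordingConsumer are hypothetical; the consumers used by Bio.GenBank build records from these events rather than logging them.

```python
def feed_feature_table(consumer, feature_tuples):
    # Same loop as InsdcScanner._feed_feature_table: one start event, then
    # for each feature its key, its location and each of its qualifiers.
    consumer.start_feature_table()
    for feature_key, location_string, qualifiers in feature_tuples:
        consumer.feature_key(feature_key)
        consumer.location(location_string)
        for q_key, q_value in qualifiers:
            if q_value is None:
                # Valueless qualifiers such as /pseudo are passed through.
                consumer.feature_qualifier(q_key, q_value)
            else:
                # Newlines left by wrapped values are turned into spaces.
                consumer.feature_qualifier(q_key, q_value.replace("\n", " "))


class RecordingConsumer:
    """Hypothetical consumer that logs every event it receives."""

    def __init__(self):
        self.events = []

    def start_feature_table(self):
        self.events.append(("start_feature_table",))

    def feature_key(self, key):
        self.events.append(("feature_key", key))

    def location(self, location):
        self.events.append(("location", location))

    def feature_qualifier(self, key, value):
        self.events.append(("feature_qualifier", key, value))


consumer = RecordingConsumer()
feed_feature_table(consumer, [
    ("gene", "1..879", [("locus_tag", '"NEQ001"'), ("pseudo", None)]),
    ("CDS", "1..879", [("note", '"conserved\nhypothetical"')]),
])
for event in consumer.events:
    print(event)
```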

def Bio.GenBank.Scanner.InsdcScanner._feed_first_line (   self,
  consumer,
  line 
) [private]
Handle the LOCUS/ID line, passing data to the consumer.

This should be implemented by the EMBL- or GenBank-specific subclass; the base-class version does nothing.

Used by the parse_records() and parse() methods.

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 323 of file Scanner.py.

    def _feed_first_line(self, consumer, line):
        """Handle the LOCUS/ID line, passing data to the consumer

        This should be implemented by the EMBL / GenBank specific subclass

        Used by the parse_records() and parse() methods.
        """
        pass

def Bio.GenBank.Scanner.InsdcScanner._feed_header_lines (   self,
  consumer,
  lines 
) [private]
Handle the header lines (list of strings), passing data to the consumer.

This should be implemented by the EMBL- or GenBank-specific subclass; the base-class version does nothing.

Used by the parse_records() and parse() methods.

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 332 of file Scanner.py.

    def _feed_header_lines(self, consumer, lines):
        """Handle the header lines (list of strings), passing data to the consumer

        This should be implemented by the EMBL / GenBank specific subclass

        Used by the parse_records() and parse() methods.
        """
        pass

def Bio.GenBank.Scanner.InsdcScanner._feed_misc_lines (   self,
  consumer,
  lines 
) [private]
Handle any lines between the features and the sequence (list of strings), passing data to the consumer.

This should be implemented by the EMBL- or GenBank-specific subclass; the base-class version does nothing.

Used by the parse_records() and parse() methods.

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 358 of file Scanner.py.

    def _feed_misc_lines(self, consumer, lines):
        """Handle any lines between features and sequence (list of strings), passing data to the consumer

        This should be implemented by the EMBL / GenBank specific subclass

        Used by the parse_records() and parse() methods.
        """
        pass
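
The three _feed_* hooks are no-ops in the base class: feed() calls them in a fixed order and the GenBank/EMBL subclasses supply the behaviour, which is a template-method design. A minimal Python 3 sketch of that pattern follows. ToyBaseScanner, ToyLocusScanner, feed_record and the list used as a "consumer" are all hypothetical, and the events they produce are not the ones the real subclasses emit.

```python
class ToyBaseScanner:
    """Hypothetical stand-in for the hook structure of InsdcScanner."""

    def feed_record(self, consumer, first_line, header_lines, misc_lines):
        # The base class fixes the order in which the hooks are called.
        self._feed_first_line(consumer, first_line)
        self._feed_header_lines(consumer, header_lines)
        self._feed_misc_lines(consumer, misc_lines)

    # Default hooks do nothing, like the InsdcScanner versions.
    def _feed_first_line(self, consumer, line):
        pass

    def _feed_header_lines(self, consumer, lines):
        pass

    def _feed_misc_lines(self, consumer, lines):
        pass


class ToyLocusScanner(ToyBaseScanner):
    """Hypothetical format-specific subclass overriding two hooks."""

    def _feed_first_line(self, consumer, line):
        # Report the second whitespace-separated token (the locus name).
        consumer.append(("locus", line.split()[1]))

    def _feed_header_lines(self, consumer, lines):
        for line in lines:
            # Split "KEYWORD     value" into the keyword and the rest.
            keyword, _, value = line.partition(" ")
            consumer.append((keyword, value.strip()))


events = []
ToyLocusScanner().feed_record(
    events,
    "LOCUS       NC_005213   490885 bp    DNA",
    ["DEFINITION  Nanoarchaeum equitans Kin4-M.", "ACCESSION   NC_005213"],
    [],
)
print(events)
```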

def Bio.GenBank.Scanner.InsdcScanner.feed (   self,
  handle,
  consumer,
  do_features = True 
)
Feed a set of data into the consumer.

This method is intended for use with the "old" code in Bio.GenBank.

Arguments:
handle - A handle with the information to parse.
consumer - The consumer that should be informed of events.
do_features - Boolean, should the features be parsed?
      Skipping the features can be much faster.

Return values:
True  - A record was found and passed to the consumer.
False - No (further) record was found; consumer.data is set to None.

Definition at line 367 of file Scanner.py.

    def feed(self, handle, consumer, do_features=True):
        """Feed a set of data into the consumer.

        This method is intended for use with the "old" code in Bio.GenBank

        Arguments:
        handle - A handle with the information to parse.
        consumer - The consumer that should be informed of events.
        do_features - Boolean, should the features be parsed?
                      Skipping the features can be much faster.

        Return values:
        true  - Passed a record
        false - Did not find a record
        """
        #Should work with both EMBL and GenBank files provided the
        #equivalent Bio.GenBank._FeatureConsumer methods are called...
        self.set_handle(handle)
        if not self.find_start():
            #Could not find (another) record
            consumer.data=None
            return False

        #We use the above class methods to parse the file into a simplified format.
        #The first line, header lines and any misc lines after the features will be
        #dealt with by GenBank / EMBL specific derived classes.

        #First line and header:
        self._feed_first_line(consumer, self.line)
        self._feed_header_lines(consumer, self.parse_header())

        #Features (common to both EMBL and GenBank):
        if do_features:
            self._feed_feature_table(consumer, self.parse_features(skip=False))
        else:
            self.parse_features(skip=True) # ignore the data

        #Footer and sequence
        misc_lines, sequence_string = self.parse_footer()
        self._feed_misc_lines(consumer, misc_lines)

        consumer.sequence(sequence_string)
        #Calls to consumer.base_number() do nothing anyway
        consumer.record_end("//")

        assert self.line == "//"

        #And we are done
        return True

def Bio.GenBank.Scanner.InsdcScanner.find_start (   self )
Reads lines until the ID/LOCUS line is found, and returns that line (or None at the end of the file).

Any preamble (such as the header used by the NCBI on *.seq.gz archives)
will be ignored.

Definition at line 66 of file Scanner.py.

    def find_start(self):
        """Read in lines until find the ID/LOCUS line, which is returned.

        Any preamble (such as the header used by the NCBI on *.seq.gz archives)
        will be ignored."""
        while True:
            if self.line:
                line = self.line
                self.line = ""
            else:
                line = self.handle.readline()
            if not line:
                if self.debug : print "End of file"
                return None
            if line[:self.HEADER_WIDTH]==self.RECORD_START:
                if self.debug > 1: print "Found the start of a record:\n" + line
                break
            line = line.rstrip()
            if line == "//":
                if self.debug > 1: print "Skipping // marking end of last record"
            elif line == "":
                if self.debug > 1: print "Skipping blank line before record"
            else:
                #Ignore any header before the first ID/LOCUS line.
                if self.debug > 1:
                        print "Skipping header line before record:\n" + line
        self.line = line
        return line
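
The skipping logic of find_start is easiest to see on a small in-memory file. The following is a Python 3 free-function sketch: find_record_start is a hypothetical name, its pending argument stands in for self.line, the debug printing is dropped, and the 12-column LOCUS marker is an assumed GenBank-style value rather than the base class's "XXX" placeholder.

```python
import io

# Assumed GenBank-style record marker (the base class only has "XXX").
RECORD_START = "LOCUS".ljust(12)
HEADER_WIDTH = len(RECORD_START)


def find_record_start(handle, pending=""):
    """Return the next line that starts a record, or None at end of file.

    `pending` plays the role of self.line: a line that was already read.
    Blank lines, stray "//" terminators and any other preamble are skipped.
    """
    while True:
        if pending:
            line, pending = pending, ""
        else:
            line = handle.readline()
        if not line:
            return None  # end of file: no (further) record
        if line[:HEADER_WIDTH] == RECORD_START:
            return line  # returned unstripped, as in find_start
        # Anything else ("//", blank lines, archive headers) is ignored.


handle = io.StringIO(
    "Made-up archive preamble\n"
    "\n"
    "//\n"
    + RECORD_START + "NC_005213   490885 bp    DNA\n"
    + "DEFINITION  Nanoarchaeum equitans Kin4-M.\n"
)
print(repr(find_record_start(handle)))
print(repr(handle.readline()))  # reading resumes after the LOCUS line
```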

def Bio.GenBank.Scanner.InsdcScanner.parse (   self,
  handle,
  do_features = True 
)
Returns a SeqRecord (with SeqFeatures if do_features=True), or None if no further record is found.

See also the method parse_records() for use on multi-record files.

Definition at line 417 of file Scanner.py.

    def parse(self, handle, do_features=True):
        """Returns a SeqRecord (with SeqFeatures if do_features=True)

        See also the method parse_records() for use on multi-record files.
        """
        from Bio.GenBank import _FeatureConsumer
        from Bio.GenBank.utils import FeatureValueCleaner

        consumer = _FeatureConsumer(use_fuzziness = 1,
                    feature_cleaner = FeatureValueCleaner())

        if self.feed(handle, consumer, do_features):
            return consumer.data
        else:
            return None

def Bio.GenBank.Scanner.InsdcScanner.parse_cds_features (   self,
  handle,
  alphabet = generic_protein,
  tags2id = ('protein_id','locus_tag','product') 
)
Returns a SeqRecord object iterator.

Each CDS feature becomes a SeqRecord. Only the feature table is parsed;
the header lines and sequence data are skipped.

alphabet - Used for any sequence found in a translation field.
tags2id  - Tuple of three strings: the qualifier keys whose values are used
   for the record id, name and description.

This method is intended for use in Bio.SeqIO.

Definition at line 454 of file Scanner.py.

    def parse_cds_features(self, handle, alphabet=generic_protein,
                           tags2id=('protein_id','locus_tag','product')):
        """Returns SeqRecord object iterator

        Each CDS feature becomes a SeqRecord.

        alphabet - Used for any sequence found in a translation field.
        tags2id  - Tuple of three strings, the feature keys to use
                   for the record id, name and description,

        This method is intended for use in Bio.SeqIO
        """
        self.set_handle(handle)
        while self.find_start():
            #Got an EMBL or GenBank record...
            self.parse_header() # ignore header lines!
            feature_tuples = self.parse_features()
            #self.parse_footer() # ignore footer lines!
            while True:
                line = self.handle.readline()
                if not line : break
                if line[:2]=="//" : break
            self.line = line.rstrip()

            #Now go through those features...
            for key, location_string, qualifiers in feature_tuples:
                if key=="CDS":
                    #Create SeqRecord
                    #================
                    #SeqRecord objects cannot be created with annotations, they
                    #must be added afterwards.  So create an empty record and
                    #then populate it:
                    record = SeqRecord(seq=None)
                    annotations = record.annotations

                    #Should we add a location object to the annotations?
                    #I *think* that only makes sense for SeqFeatures with their
                    #sub features...
                    annotations['raw_location'] = location_string.replace(' ','')

                    for (qualifier_name, qualifier_data) in qualifiers:
                        if qualifier_data is not None \
                        and qualifier_data[0]=='"' and qualifier_data[-1]=='"':
                            #Remove quotes
                            qualifier_data = qualifier_data[1:-1]
                        #Append the data to the annotation qualifier...
                        if qualifier_name == "translation":
                            assert record.seq is None, "Multiple translations!"
                            record.seq = Seq(qualifier_data.replace("\n",""), alphabet)
                        elif qualifier_name == "db_xref":
                            #it's a list, possibly empty.  It's safe to extend
                            record.dbxrefs.append(qualifier_data)
                        else:
                            if qualifier_data is not None:
                                qualifier_data = qualifier_data.replace("\n"," ").replace("  "," ")
                            try:
                                annotations[qualifier_name] += " " + qualifier_data
                            except KeyError:
                                #Not an addition to existing data, it's the first bit
                                annotations[qualifier_name]= qualifier_data

                    #Fill in the ID, Name, Description
                    #=================================
                    try:
                        record.id = annotations[tags2id[0]]
                    except KeyError:
                        pass
                    try:
                        record.name = annotations[tags2id[1]]
                    except KeyError:
                        pass
                    try:
                        record.description = annotations[tags2id[2]]
                    except KeyError:
                        pass

                    yield record
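
The per-CDS logic of parse_cds_features can be tried without Biopython by swapping SeqRecord for a plain dict. In this hypothetical Python 3 sketch, cds_to_records keeps the translation as a plain string (no alphabet) and returns None for any tags2id key that is missing. Two guards are additions of the sketch: the quote check skips empty values (the listed code indexes qualifier_data[0], which fails on the empty string that parse_feature returns for /note=), and repeated valueless qualifiers never trigger a None concatenation. In everyday use this method is reached through the "genbank-cds" and "embl-cds" formats of Bio.SeqIO.

```python
def cds_to_records(feature_tuples, tags2id=("protein_id", "locus_tag", "product")):
    """Dict-based sketch of the per-CDS loop in parse_cds_features.

    feature_tuples are (key, location, qualifiers) tuples as returned by
    parse_features(); each CDS feature becomes one dict "record".
    """
    for key, location_string, qualifiers in feature_tuples:
        if key != "CDS":
            continue
        annotations = {"raw_location": location_string.replace(" ", "")}
        record = {"seq": None, "dbxrefs": [], "annotations": annotations}
        for name, data in qualifiers:
            # Remove surrounding double quotes (None and "" are left alone).
            if data and len(data) >= 2 and data[0] == '"' and data[-1] == '"':
                data = data[1:-1]
            if name == "translation":
                record["seq"] = data.replace("\n", "")  # plain str, no alphabet
            elif name == "db_xref":
                record["dbxrefs"].append(data)
            else:
                if data is not None:
                    data = data.replace("\n", " ").replace("  ", " ")
                previous = annotations.get(name)
                if previous is None:
                    annotations[name] = data
                elif data is not None:
                    # Repeated qualifiers are joined with a space.
                    annotations[name] = previous + " " + data
        # id, name and description come from the qualifiers named in tags2id.
        record["id"] = annotations.get(tags2id[0])
        record["name"] = annotations.get(tags2id[1])
        record["description"] = annotations.get(tags2id[2])
        yield record


features = [
    ("gene", "1..879", [("locus_tag", '"NEQ001"')]),
    ("CDS", "complement(join(490883..490885, 1..879))", [
        ("locus_tag", '"NEQ001"'),
        ("note", ""),
        ("product", '"hypothetical protein"'),
        ("protein_id", '"NP_963295.1"'),
        ("db_xref", '"GI:41614797"'),
        ("db_xref", '"GeneID:2732620"'),
        ("translation", '"MRLLLE\nLKALNS"'),
    ]),
]
for rec in cds_to_records(features):
    print(rec["id"], rec["name"], rec["description"], rec["seq"], rec["dbxrefs"])
```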

def Bio.GenBank.Scanner.InsdcScanner.parse_feature (   self,
  feature_key,
  lines 
)
Expects a feature as a list of strings, returns a tuple (key, location, qualifiers)

For example, given this GenBank feature:

     CDS             complement(join(490883..490885,1..879))
             /locus_tag="NEQ001"
             /note="conserved hypothetical [Methanococcus jannaschii];
             COG1583:Uncharacterized ACR; IPR001472:Bipartite nuclear
             localization signal; IPR002743: Protein of unknown
             function DUF57"
             /codon_start=1
             /transl_table=11
             /product="hypothetical protein"
             /protein_id="NP_963295.1"
             /db_xref="GI:41614797"
             /db_xref="GeneID:2732620"
             /translation="MRLLLELKALNSIDKKQLSNYLIQGFIYNILKNTEYSWLHNWKK
             EKYFNFTLIPKKDIIENKRYYLIISSPDKRFIEVLHNKIKDLDIITIGLAQFQLRKTK
             KFDPKLRFPWVTITPIVLREGKIVILKGDKYYKVFVKRLEELKKYNLIKKKEPILEEP
             IEISLNQIKDGWKIIDVKDRYYDFRNKSFSAFSNWLRDLKEQSLRKYNNFCGKNFYFE
             EAIFEGFTFYKTVSIRIRINRGEAVYIGTLWKELNVYRKLDKEEREFYKFLYDCGLGS
             LNSMGFGFVNTKKNSAR"

The input should then be key="CDS" plus the rest of the data as a list of strings,
lines=["complement(join(490883..490885,1..879))", ..., "LNSMGFGFVNTKKNSAR"],
with the leading spaces and trailing newlines removed.

Returns tuple containing: (key as string, location string, qualifiers as list)
as follows for this example:

key = "CDS", string
location = "complement(join(490883..490885,1..879))", string
qualifiers = list of string tuples:

[('locus_tag', '"NEQ001"'),
 ('note', '"conserved hypothetical [Methanococcus jannaschii];\nCOG1583:..."'),
 ('codon_start', '1'),
 ('transl_table', '11'),
 ('product', '"hypothetical protein"'),
 ('protein_id', '"NP_963295.1"'),
 ('db_xref', '"GI:41614797"'),
 ('db_xref', '"GeneID:2732620"'),
 ('translation', '"MRLLLELKALNSIDKKQLSNYLIQGFIYNILKNTEYSWLHNWKK\nEKYFNFT..."')]

In the above example the "note" and "translation" values have been shortened for compactness;
in full they contain several newline characters (displayed above as \n).

If a qualifier is quoted (in this case, everything except codon_start and
transl_table) then the quotes are NOT removed.

Note that no whitespace is removed.

Definition at line 192 of file Scanner.py.

    def parse_feature(self, feature_key, lines):
        """Expects a feature as a list of strings, returns a tuple (key, location, qualifiers)

        For example given this GenBank feature:

             CDS             complement(join(490883..490885,1..879))
                             /locus_tag="NEQ001"
                             /note="conserved hypothetical [Methanococcus jannaschii];
                             COG1583:Uncharacterized ACR; IPR001472:Bipartite nuclear
                             localization signal; IPR002743: Protein of unknown
                             function DUF57"
                             /codon_start=1
                             /transl_table=11
                             /product="hypothetical protein"
                             /protein_id="NP_963295.1"
                             /db_xref="GI:41614797"
                             /db_xref="GeneID:2732620"
                             /translation="MRLLLELKALNSIDKKQLSNYLIQGFIYNILKNTEYSWLHNWKK
                             EKYFNFTLIPKKDIIENKRYYLIISSPDKRFIEVLHNKIKDLDIITIGLAQFQLRKTK
                             KFDPKLRFPWVTITPIVLREGKIVILKGDKYYKVFVKRLEELKKYNLIKKKEPILEEP
                             IEISLNQIKDGWKIIDVKDRYYDFRNKSFSAFSNWLRDLKEQSLRKYNNFCGKNFYFE
                             EAIFEGFTFYKTVSIRIRINRGEAVYIGTLWKELNVYRKLDKEEREFYKFLYDCGLGS
                             LNSMGFGFVNTKKNSAR"

        Then should give input key="CDS" and the rest of the data as a list of strings
        lines=["complement(join(490883..490885,1..879))", ..., "LNSMGFGFVNTKKNSAR"]
        where the leading spaces and trailing newlines have been removed.

        Returns tuple containing: (key as string, location string, qualifiers as list)
        as follows for this example:

        key = "CDS", string
        location = "complement(join(490883..490885,1..879))", string
        qualifiers = list of string tuples:

        [('locus_tag', '"NEQ001"'),
         ('note', '"conserved hypothetical [Methanococcus jannaschii];\nCOG1583:..."'),
         ('codon_start', '1'),
         ('transl_table', '11'),
         ('product', '"hypothetical protein"'),
         ('protein_id', '"NP_963295.1"'),
         ('db_xref', '"GI:41614797"'),
         ('db_xref', '"GeneID:2732620"'),
         ('translation', '"MRLLLELKALNSIDKKQLSNYLIQGFIYNILKNTEYSWLHNWKK\nEKYFNFT..."')]

        In the above example, the "note" and "translation" were edited for compactness,
        and they would contain multiple new line characters (displayed above as \n)

        If a qualifier is quoted (in this case, everything except codon_start and
        transl_table) then the quotes are NOT removed.

        Note that no whitespace is removed.
        """
        #Skip any blank lines
        iterator = iter(filter(None, lines))
        try:
            line = iterator.next()

            feature_location = line.strip()
            while feature_location[-1:]==",":
                #Multiline location, still more to come!
                line = iterator.next()
                feature_location += line.strip()

            qualifiers=[]

            for i, line in enumerate(iterator):
                # check for extra wrapping of the location closing parentheses
                if i == 0 and line.startswith(")"):
                    feature_location += line.strip()
                elif line[0]=="/":
                    #New qualifier
                    i = line.find("=")
                    key = line[1:i] #does not work if i==-1
                    value = line[i+1:] #we ignore 'value' if i==-1
                    if i==-1:
                        #Qualifier with no value, e.g. /pseudo
                        key = line[1:]
                        qualifiers.append((key,None))
                    elif not value:
                        #ApE can output /note=
                        qualifiers.append((key,""))
                    elif value[0]=='"':
                        #Quoted...
                        if value[-1]!='"' or value!='"':
                            #No closing quote on the first line...
                            while value[-1] != '"':
                                value += "\n" + iterator.next()
                        else:
                            #One single line (quoted)
                            assert value == '"'
                            if self.debug : print "Quoted line %s:%s" % (key, value)
                        #DO NOT remove the quotes...
                        qualifiers.append((key,value))
                    else:
                        #Unquoted
                        #if debug : print "Unquoted line %s:%s" % (key,value)
                        qualifiers.append((key,value))
                else:
                    #Unquoted continuation
                    assert len(qualifiers) > 0
                    assert key==qualifiers[-1][0]
                    #if debug : print "Unquoted Cont %s:%s" % (key, line)
                    qualifiers[-1] = (key, qualifiers[-1][1] + "\n" + line)
            return (feature_key, feature_location, qualifiers)
        except StopIteration:
            #Bummer
            raise ValueError("Problem with '%s' feature:\n%s" \
                              % (feature_key, "\n".join(lines)))
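
A simplified Python 3 sketch of the same parsing, runnable on the kind of lines list described above. parse_feature_sketch is a hypothetical name. It uses str.partition instead of find, reads the continuation lines of a quoted value directly (treating a lone " as an opening quote), and leaves out the wrapped ")" special case and the debug output, so it is an illustration rather than a drop-in replacement.

```python
def parse_feature_sketch(feature_key, lines):
    """Simplified Python 3 sketch of InsdcScanner.parse_feature.

    `lines` holds the feature's lines with the indentation already removed.
    Returns (key, location, qualifiers); quotes around values are kept.
    """
    iterator = iter([line for line in lines if line])  # skip blank lines
    try:
        location = next(iterator).strip()
        while location.endswith(","):
            # Multi-line location: keep appending while it ends in a comma.
            location += next(iterator).strip()
        qualifiers = []
        for line in iterator:
            if line.startswith("/"):
                key, sep, value = line[1:].partition("=")
                if not sep:
                    qualifiers.append((key, None))  # e.g. /pseudo
                elif value.startswith('"'):
                    # Quoted: read on until the value ends with its closing
                    # quote (a lone '"' only counts as the opening quote).
                    while len(value) < 2 or not value.endswith('"'):
                        value += "\n" + next(iterator)
                    qualifiers.append((key, value))
                else:
                    qualifiers.append((key, value))  # unquoted, possibly ""
            else:
                # Continuation line of the previous (unquoted) value.
                key, value = qualifiers[-1]
                qualifiers[-1] = (key, value + "\n" + line)
        return feature_key, location, qualifiers
    except StopIteration:
        raise ValueError("Problem with %r feature" % feature_key) from None


lines = [
    "complement(join(490883..490885,",
    "1..879))",
    '/locus_tag="NEQ001"',
    '/note="conserved hypothetical [Methanococcus jannaschii];',
    'COG1583:Uncharacterized ACR"',
    "/codon_start=1",
    "/pseudo",
    '/product="',
    'hypothetical protein"',
]
key, location, qualifiers = parse_feature_sketch("CDS", lines)
print(location)
for qualifier in qualifiers:
    print(qualifier)
```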

def Bio.GenBank.Scanner.InsdcScanner.parse_features (   self,
  skip = False 
)
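Returns a list of tuples for the features (an empty list if there is no feature table).

Each feature is returned as a tuple (key, location, qualifiers)
where key and location are strings (e.g. "CDS" and
"complement(join(490883..490885,1..879))") while qualifiers
is a list of (qualifier key, qualifier value) string tuples, as returned by parse_feature().

If skip is True the feature lines are read past without being parsed, and an empty list is returned.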

Assumes you have already read to the start of the features table.

Reimplemented in Bio.GenBank.Scanner._ImgtScanner.

Definition at line 126 of file Scanner.py.

    def parse_features(self, skip=False):
        """Return list of tuples for the features (if present)

        Each feature is returned as a tuple (key, location, qualifiers)
        where key and location are strings (e.g. "CDS" and
        "complement(join(490883..490885,1..879))") while qualifiers
        is a list of two string tuples (feature qualifier keys and values).

        Assumes you have already read to the start of the features table.
        """
        if self.line.rstrip() not in self.FEATURE_START_MARKERS:
            if self.debug : print "Didn't find any feature table"
            return []

        while self.line.rstrip() in self.FEATURE_START_MARKERS:
            self.line = self.handle.readline()

        features = []
        line = self.line
        while True:
            if not line:
                raise ValueError("Premature end of line during features table")
            if line[:self.HEADER_WIDTH].rstrip() in self.SEQUENCE_HEADERS:
                if self.debug : print "Found start of sequence"
                break
            line = line.rstrip()
            if line == "//":
                raise ValueError("Premature end of features table, marker '//' found")
            if line in self.FEATURE_END_MARKERS:
                if self.debug : print "Found end of features"
                line = self.handle.readline()
                break
            if line[2:self.FEATURE_QUALIFIER_INDENT].strip() == "":
                #This is an empty feature line between qualifiers. Empty
                #feature lines within qualifiers are handled below (ignored).
                line = self.handle.readline()
                continue

            if skip:
                line = self.handle.readline()
                while line[:self.FEATURE_QUALIFIER_INDENT] == self.FEATURE_QUALIFIER_SPACER:
                    line = self.handle.readline()
            else:
                #Build up a list of the lines making up this feature:
                if line[self.FEATURE_QUALIFIER_INDENT]!=" " \
                and " " in line[self.FEATURE_QUALIFIER_INDENT:]:
                    #The feature table design enforces a length limit on the feature keys.
                    #Some third party files (e.g. IMGT's EMBL like files) solve this by
                    #over indenting the location and qualifiers.
                    feature_key, line = line[2:].strip().split(None,1)
                    feature_lines = [line]
                    warnings.warn("Overindented %s feature?" % feature_key)
                else:
                    feature_key = line[2:self.FEATURE_QUALIFIER_INDENT].strip()
                    feature_lines = [line[self.FEATURE_QUALIFIER_INDENT:]]
                line = self.handle.readline()
                while line[:self.FEATURE_QUALIFIER_INDENT] == self.FEATURE_QUALIFIER_SPACER \
                or line.rstrip() == "" : # cope with blank lines in the midst of a feature
                    #Use strip to remove any harmless trailing white space AND any leading
                    #white space (e.g. out of spec files with too much indentation)
                    feature_lines.append(line[self.FEATURE_QUALIFIER_INDENT:].strip())
                    line = self.handle.readline()
                features.append(self.parse_feature(feature_key, feature_lines))
        self.line = line
        return features
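
The core of parse_features is grouping the raw feature-table lines by indentation: a line with text in the key columns starts a new feature, and a line that begins with FEATURE_QUALIFIER_SPACER continues the current one. A minimal Python 3 sketch of just that grouping step follows; group_feature_lines is a hypothetical name, the 21-column GenBank-style indent is an assumption for illustration, and the marker handling, skip mode and over-indentation workaround are omitted.

```python
# Assumed GenBank-style layout: keys follow five spaces, and locations and
# qualifiers start at column 21 (illustrative values).
QUALIFIER_INDENT = 21
QUALIFIER_SPACER = " " * QUALIFIER_INDENT


def group_feature_lines(table_lines):
    """Sketch of the grouping step inside InsdcScanner.parse_features.

    Returns (feature_key, feature_lines) pairs, ready to be passed to a
    parse_feature-style function.
    """
    features = []
    for line in table_lines:
        line = line.rstrip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(QUALIFIER_SPACER):
            # Continuation: drop the indent and stray whitespace.
            features[-1][1].append(line[QUALIFIER_INDENT:].strip())
        else:
            # New feature: key in the key columns, location after the indent.
            key = line[2:QUALIFIER_INDENT].strip()
            features.append((key, [line[QUALIFIER_INDENT:]]))
    return features


table = [
    "     " + "gene".ljust(16) + "1..879\n",
    QUALIFIER_SPACER + '/locus_tag="NEQ001"\n',
    "     " + "CDS".ljust(16) + "complement(join(490883..490885,\n",
    QUALIFIER_SPACER + "1..879))\n",
    QUALIFIER_SPACER + '/product="hypothetical protein"\n',
]
for key, lines in group_feature_lines(table):
    print(key, lines)
```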

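def Bio.GenBank.Scanner.InsdcScanner.parse_footer (   self )
Returns a tuple containing a list of any misc strings, and the sequence.

This base-class version only skips over the sequence data up to the // line and returns the dummy values ([], "").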

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 302 of file Scanner.py.

    def parse_footer(self):
        """returns a tuple containing a list of any misc strings, and the sequence"""
        #This is a basic bit of code to scan and discard the sequence,
        #which was useful when developing the sub classes.
        if self.line in self.FEATURE_END_MARKERS:
            while self.line[:self.HEADER_WIDTH].rstrip() not in self.SEQUENCE_HEADERS:
                self.line = self.handle.readline()
                if not self.line:
                    raise ValueError("Premature end of file")
                self.line = self.line.rstrip()

        assert self.line[:self.HEADER_WIDTH].rstrip() in self.SEQUENCE_HEADERS, \
               "Not at start of sequence"
        while True:
            line = self.handle.readline()
            if not line : raise ValueError("Premature end of line during sequence data")
            line = line.rstrip()
            if line == "//" : break
        self.line = line
        return ([],"") #Dummy values!

def Bio.GenBank.Scanner.InsdcScanner.parse_header (   self )
Returns a list of strings making up the header.

Trailing whitespace, including the newline characters, is removed.

Assumes you have just read in the ID/LOCUS line.

Definition at line 95 of file Scanner.py.

    def parse_header(self):
        """Return list of strings making up the header

        New line characters are removed.

        Assumes you have just read in the ID/LOCUS line.
        """
        assert self.line[:self.HEADER_WIDTH]==self.RECORD_START, \
               "Not at start of record"

        header_lines = []
        while True:
            line = self.handle.readline()
            if not line:
                raise ValueError("Premature end of line during sequence data")
            line = line.rstrip()
            if line in self.FEATURE_START_MARKERS:
                if self.debug : print "Found header table"
                break
            #if line[:self.HEADER_WIDTH]==self.FEATURE_START_MARKER[:self.HEADER_WIDTH]:
            #    if self.debug : print "Found header table (?)"
            #    break
            if line[:self.HEADER_WIDTH].rstrip() in self.SEQUENCE_HEADERS:
                if self.debug : print "Found start of sequence"
                break
            if line == "//":
                raise ValueError("Premature end of sequence data marker '//' found")
            header_lines.append(line)
        self.line = line
        return header_lines
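
parse_header stops at whichever comes first: a feature-table start marker or a sequence header. A Python 3 free-function sketch of that loop on an in-memory file follows; read_header is a hypothetical name, it returns the stopping line instead of storing it in self.line, and the GenBank-style markers and 12-column header width are assumptions for illustration.

```python
import io

# Assumed GenBank-style settings (illustrative, not copied from GenBankScanner).
HEADER_WIDTH = 12
FEATURE_START_MARKERS = ["FEATURES".ljust(21) + "Location/Qualifiers", "FEATURES"]
SEQUENCE_HEADERS = ["ORIGIN", "CONTIG"]


def read_header(handle):
    """Sketch of InsdcScanner.parse_header as a free function.

    Reads the lines after the LOCUS/ID line and returns
    (header_lines, stopping_line), where stopping_line is the feature
    table marker or sequence header that ended the header.
    """
    header_lines = []
    while True:
        line = handle.readline()
        if not line:
            raise ValueError("Premature end of file in the header")
        line = line.rstrip()  # drop the newline and trailing spaces
        if line in FEATURE_START_MARKERS:
            break  # the feature table starts here
        if line[:HEADER_WIDTH].rstrip() in SEQUENCE_HEADERS:
            break  # no feature table; the sequence starts here
        if line == "//":
            raise ValueError("Record ended before the header was complete")
        header_lines.append(line)
    return header_lines, line


handle = io.StringIO(
    "DEFINITION  Nanoarchaeum equitans Kin4-M.\n"
    "ACCESSION   NC_005213\n"
    + FEATURE_START_MARKERS[0] + "\n"
    + "     source          1..490885\n"
)
lines, stop = read_header(handle)
print(lines)
print(stop)
```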

def Bio.GenBank.Scanner.InsdcScanner.parse_records (   self,
  handle,
  do_features = True 
)
Returns a SeqRecord object iterator

Each record (from the ID/LOCUS line to the // line) becomes a SeqRecord

The SeqRecord objects include SeqFeatures if do_features=True

This method is intended for use in Bio.SeqIO

Definition at line 434 of file Scanner.py.

    def parse_records(self, handle, do_features=True):
        """Returns a SeqRecord object iterator

        Each record (from the ID/LOCUS line to the // line) becomes a SeqRecord

        The SeqRecord objects include SeqFeatures if do_features=True

        This method is intended for use in Bio.SeqIO
        """
        #This is a generator function
        while True:
            record = self.parse(handle, do_features)
            if record is None : break
            assert record.id is not None
            assert record.name != "<unknown name>"
            assert record.description != "<unknown description>"
            yield record
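
parse_records simply calls parse() until it returns None, yielding each record as it goes, so a multi-record file is processed one record at a time. The same generator pattern as a stand-alone Python 3 sketch: iterate_records and toy_parse_one are hypothetical, and the toy parser returns lists of lines rather than SeqRecord objects. In practice these iterators are normally reached through Bio.SeqIO.parse(handle, "genbank") or Bio.SeqIO.parse(handle, "embl") rather than by instantiating a scanner directly.

```python
import io


def iterate_records(parse_one, handle):
    """Yield parse_one(handle) results until it returns None,
    mirroring the loop in InsdcScanner.parse_records."""
    while True:
        record = parse_one(handle)
        if record is None:
            break  # no further record in the file
        yield record


def toy_parse_one(handle):
    """Hypothetical stand-in for InsdcScanner.parse(): return the non-blank
    lines of the next record (up to "//"), or None when none are left."""
    lines = []
    for line in handle:
        line = line.rstrip()
        if line == "//":
            return lines
        if line:
            lines.append(line)
    return lines or None


handle = io.StringIO("LOCUS A\n//\n\nLOCUS B\nORIGIN\n//\n")
for record in iterate_records(toy_parse_one, handle):
    print(record)
```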

def Bio.GenBank.Scanner.InsdcScanner.set_handle (   self,
  handle 
)

Definition at line 62 of file Scanner.py.

    def set_handle(self, handle):
        self.handle = handle
        self.line = ""


Member Data Documentation

Bio.GenBank.Scanner.InsdcScanner.debug

Definition at line 59 of file Scanner.py.

list Bio.GenBank.Scanner.InsdcScanner.FEATURE_END_MARKERS = ["XXX***END FEATURES***XXX"] [static]

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 49 of file Scanner.py.

int Bio.GenBank.Scanner.InsdcScanner.FEATURE_QUALIFIER_INDENT = 0 [static]

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 50 of file Scanner.py.

string Bio.GenBank.Scanner.InsdcScanner.FEATURE_QUALIFIER_SPACER = "" [static]

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 51 of file Scanner.py.

list Bio.GenBank.Scanner.InsdcScanner.FEATURE_START_MARKERS = ["XXX***FEATURES***XXX"] [static]

Bio.GenBank.Scanner.InsdcScanner.handle

Definition at line 63 of file Scanner.py.

int Bio.GenBank.Scanner.InsdcScanner.HEADER_WIDTH = 3 [static]

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 47 of file Scanner.py.

string Bio.GenBank.Scanner.InsdcScanner.RECORD_START = "XXX" [static]

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 46 of file Scanner.py.

list Bio.GenBank.Scanner.InsdcScanner.SEQUENCE_HEADERS = ["XXX"] [static]

Reimplemented in Bio.GenBank.Scanner.GenBankScanner, and Bio.GenBank.Scanner.EmblScanner.

Definition at line 52 of file Scanner.py.


The documentation for this class was generated from the following file: