python-biopython  1.60
Bio.GenBank._FeatureConsumer Class Reference

Public Member Functions

def __init__
def locus
def size
def residue_type
def data_file_division
def date
def definition
def accession
def wgs
def add_wgs_scafld
def nid
def pid
def version
def project
def dblink
def version_suffix
def db_source
def gi
def keywords
def segment
def source
def organism
def taxonomy
def reference_num
def reference_bases
def authors
def consrtm
def title
def journal
def medline_id
def pubmed_id
def remark
def comment
def features_line
def start_feature_table
def feature_key
def location
def feature_qualifier
def feature_qualifier_name
def feature_qualifier_description
def contig_location
def origin_name
def base_count
def base_number
def sequence
def record_end
def __getattr__

Public Attributes

 data

Static Public Attributes

list remove_space_keys = ["translation"]

Private Member Functions

def _split_reference_locations

Private Attributes

 _use_fuzziness
 _feature_cleaner
 _seq_type
 _seq_data
 _cur_reference
 _cur_feature
 _expected_size

Detailed Description

Create a SeqRecord object with Features to return (PRIVATE).

Attributes:
o use_fuzziness - specify whether or not to parse with fuzziness in
feature locations.
o feature_cleaner - a class that will be used to provide specialized
cleaning-up of feature values.

Definition at line 579 of file __init__.py.


Constructor & Destructor Documentation

def Bio.GenBank._FeatureConsumer.__init__ (   self,
  use_fuzziness,
  feature_cleaner = None 
)

Definition at line 588 of file __init__.py.

00588 
00589     def __init__(self, use_fuzziness, feature_cleaner = None):
00590         from Bio.SeqRecord import SeqRecord
00591         _BaseGenBankConsumer.__init__(self)
00592         self.data = SeqRecord(None, id = None)
00593         self.data.id = None
00594         self.data.description = ""
00595 
00596         self._use_fuzziness = use_fuzziness
00597         self._feature_cleaner = feature_cleaner
00598 
00599         self._seq_type = ''
00600         self._seq_data = []
00601         self._cur_reference = None
00602         self._cur_feature = None
00603         self._expected_size = None


Member Function Documentation

def Bio.GenBank._BaseGenBankConsumer.__getattr__ (   self,
  attr 
) [inherited]

Definition at line 479 of file __init__.py.

00479 
00480     def __getattr__(self, attr):
00481         return self._unhandled
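
Python calls `__getattr__` only when normal attribute lookup fails, so the inherited method above works as a catch-all: any event name the consumer does not define resolves to `self._unhandled`. The following is a minimal standalone sketch of that pattern. It assumes `_unhandled` is a do-nothing method (its body is not shown on this page); `SketchConsumer` and the event names are hypothetical.

```python
class SketchConsumer(object):
    """Hypothetical stand-in for a consumer with a catch-all handler."""

    def __init__(self):
        self.seen = []

    def _unhandled(self, data):
        # Assumed behaviour: silently ignore events with no dedicated handler.
        pass

    def __getattr__(self, attr):
        # Only reached when normal lookup finds no attribute called `attr`.
        return self._unhandled

    def locus(self, name):
        self.seen.append(("locus", name))


consumer = SketchConsumer()
consumer.locus("AB000001")                 # handled by the real method
result = consumer.some_unknown_event("x")  # falls through to _unhandled
```

With this design a scanner can emit new event types without breaking older consumers, at the cost of misspelled handler names being ignored silently.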

def Bio.GenBank._FeatureConsumer._split_reference_locations (   self,
  location_string 
) [private]
Get reference locations out of a string of reference information.

The passed string should be of the form:

    1 to 20; 20 to 100

This splits the information out and returns a list of location objects
based on the reference locations.

Definition at line 841 of file __init__.py.

00841 
00842     def _split_reference_locations(self, location_string):
00843         """Get reference locations out of a string of reference information.
00844         
00845         The passed string should be of the form:
00846 
00847             1 to 20; 20 to 100
00848 
00849         This splits the information out and returns a list of location objects
00850         based on the reference locations.
00851         """
00852         # split possibly multiple locations using the ';'
00853         all_base_info = location_string.split(';')
00854 
00855         new_locations = []
00856         for base_info in all_base_info:
00857             start, end = base_info.split('to')
00858             new_start, new_end = \
00859               self._convert_to_python_numbers(int(start.strip()),
00860                                               int(end.strip()))
00861             this_location = SeqFeature.FeatureLocation(new_start, new_end)
00862             new_locations.append(this_location)
00863         return new_locations
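
The splitting step can be tried without Biopython. The sketch below returns plain `(start, end)` tuples instead of `FeatureLocation` objects, and it assumes that `_convert_to_python_numbers` (not shown on this page) applies the same shift that `location` uses for simple locations: start minus one, end unchanged. The function name `split_reference_locations` is hypothetical.

```python
def split_reference_locations(location_string):
    """Turn '1 to 20; 20 to 100' into a list of (start, end) tuples."""
    locations = []
    # Several ranges may be given, separated by ';'
    for base_info in location_string.split(";"):
        start, end = base_info.split("to")
        # Assumed conversion: 1-based inclusive -> 0-based, end-exclusive.
        locations.append((int(start.strip()) - 1, int(end.strip())))
    return locations


pairs = split_reference_locations("1 to 20; 20 to 100")
```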

def Bio.GenBank._FeatureConsumer.accession (   self,
  acc_num 
)
Set the accession number as the id of the sequence.

If we have multiple accession numbers, the first one passed is
used.

Definition at line 634 of file __init__.py.

00634 
00635     def accession(self, acc_num):
00636         """Set the accession number as the id of the sequence.
00637 
00638         If we have multiple accession numbers, the first one passed is
00639         used.
00640         """
00641         new_acc_nums = self._split_accessions(acc_num)
00642 
00643         #Also record them ALL in the annotations
00644         try:
00645             #On the off chance there was more than one accession line:
00646             for acc in new_acc_nums:
00647                 #Prevent repeat entries
00648                 if acc not in self.data.annotations['accessions']:
00649                     self.data.annotations['accessions'].append(acc)
00650         except KeyError:
00651             self.data.annotations['accessions'] = new_acc_nums
00652 
00653         # if we haven't set the id information yet, add the first acc num
00654         if self.data.id is None:
00655             if len(new_acc_nums) > 0:
00656                 #self.data.id = new_acc_nums[0]
00657                 #Use the FIRST accession as the ID, not the first on this line!
00658                 self.data.id = self.data.annotations['accessions'][0]
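
To illustrate the "first accession overall" rule, here is a simplified standalone sketch that uses a plain dictionary in place of `self.data.annotations`. The helper name `record_accessions` and the accession values are hypothetical, and the already-split list stands in for the output of `_split_accessions`.

```python
def record_accessions(annotations, current_id, new_acc_nums):
    """Merge new accessions into annotations and return the record id."""
    known = annotations.setdefault("accessions", [])
    for acc in new_acc_nums:
        if acc not in known:  # prevent repeat entries
            known.append(acc)
    if current_id is None and known:
        # The id is the first accession seen overall,
        # not the first accession on the current line.
        current_id = known[0]
    return current_id


annotations = {}
record_id = record_accessions(annotations, None, ["U49845", "X12345"])
record_id = record_accessions(annotations, record_id, ["X12345", "Z99999"])
```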

def Bio.GenBank._FeatureConsumer.add_wgs_scafld (   self,
  content 
)

Definition at line 662 of file __init__.py.

00662 
00663     def add_wgs_scafld(self, content):
00664         self.data.annotations.setdefault('wgs_scafld',[]).append(content.split('-'))
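
The one-liner above builds a list of `[first, last]` pairs under the `wgs_scafld` key. A small sketch with a plain dictionary shows the resulting shape; the accession ranges are made-up illustrative values.

```python
annotations = {}
for content in ["CH398081-CH401163", "DS235878-DS236118"]:
    # setdefault creates the list on first use, then each range is
    # stored as a two-element list split on the hyphen.
    annotations.setdefault("wgs_scafld", []).append(content.split("-"))
```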

def Bio.GenBank._FeatureConsumer.authors (   self,
  content 
)

Definition at line 864 of file __init__.py.

00864 
00865     def authors(self, content):
00866         if self._cur_reference.authors:
00867             self._cur_reference.authors += ' ' + content
00868         else:
00869             self._cur_reference.authors = content

def Bio.GenBank._FeatureConsumer.base_count (   self,
  content 
)

Definition at line 1110 of file __init__.py.

01110 
01111     def base_count(self, content):
01112         pass

def Bio.GenBank._FeatureConsumer.base_number (   self,
  content 
)

Definition at line 1113 of file __init__.py.

01113 
01114     def base_number(self, content):
01115         pass

def Bio.GenBank._FeatureConsumer.comment (   self,
  content 
)

Definition at line 906 of file __init__.py.

00906 
00907     def comment(self, content):
00908         try:
00909             self.data.annotations['comment'] += "\n" + "\n".join(content)
00910         except KeyError:
00911             self.data.annotations['comment'] = "\n".join(content)
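
Note that `content` here is a list of lines rather than a single string. A standalone sketch of the same accumulation, with a plain dictionary standing in for `self.data.annotations` (the helper name `add_comment` and the sample text are hypothetical):

```python
def add_comment(annotations, content):
    """Join the lines of one COMMENT block and append to any earlier text."""
    text = "\n".join(content)
    if "comment" in annotations:
        annotations["comment"] += "\n" + text
    else:
        annotations["comment"] = text


annotations = {}
add_comment(annotations, ["First block, line 1", "First block, line 2"])
add_comment(annotations, ["Second block"])
```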

def Bio.GenBank._FeatureConsumer.consrtm (   self,
  content 
)

Definition at line 870 of file __init__.py.

00870 
00871     def consrtm(self, content):
00872         if self._cur_reference.consrtm:
00873             self._cur_reference.consrtm += ' ' + content
00874         else:
00875             self._cur_reference.consrtm = content

def Bio.GenBank._FeatureConsumer.contig_location (   self,
  content 
)
Deal with CONTIG information.

Definition at line 1088 of file __init__.py.

01088 
01089     def contig_location(self, content):
01090         """Deal with CONTIG information."""
01091         #Historically this was stored as a SeqFeature object, but it was
01092         #stored under record.annotations["contig"] and not under
01093         #record.features with the other SeqFeature objects.
01094         #
01095         #The CONTIG location line can include additional tokens like
01096         #Gap(), Gap(100) or Gap(unk100) which are not used in the feature
01097         #location lines, so storing it using SeqFeature based location
01098         #objects is difficult.
01099         #
01100         #We now store this as a string, which means for BioSQL we are now in
01101         #much better agreement with how BioPerl records the CONTIG line
01102         #in the database.
01103         #
01104         #NOTE - This code assumes the scanner will return all the CONTIG
01105         #lines already combined into one long string!
01106         self.data.annotations["contig"] = content

def Bio.GenBank._FeatureConsumer.data_file_division (   self,
  division 
)

Definition at line 618 of file __init__.py.

00618 
00619     def data_file_division(self, division):
00620         self.data.annotations['data_file_division'] = division

def Bio.GenBank._FeatureConsumer.date (   self,
  submit_date 
)

Definition at line 621 of file __init__.py.

00621 
00622     def date(self, submit_date):
00623         self.data.annotations['date'] = submit_date 

def Bio.GenBank._FeatureConsumer.db_source (   self,
  content 
)

Definition at line 753 of file __init__.py.

00753 
00754     def db_source(self, content):
00755         self.data.annotations['db_source'] = content.rstrip()

def Bio.GenBank._FeatureConsumer.dblink (   self,
  content 
)
Store DBLINK cross references as dbxrefs in our record object.

This line type is expected to replace the PROJECT line in 2009. e.g.

During transition:

PROJECT     GenomeProject:28471
DBLINK      Project:28471
    Trace Assembly Archive:123456

Once the project line is dropped:

DBLINK      Project:28471
    Trace Assembly Archive:123456

Note GenomeProject -> Project.

We'll have to see some real examples to be sure, but based on the
above example we can expect one reference per line.

Note that at some point the NCBI have included an extra space, e.g.

DBLINK      Project: 28471

Definition at line 701 of file __init__.py.

00701 
00702     def dblink(self, content):
00703         """Store DBLINK cross references as dbxrefs in our record object.
00704 
00705         This line type is expected to replace the PROJECT line in 2009. e.g.
00706 
00707         During transition:
00708         
00709         PROJECT     GenomeProject:28471
00710         DBLINK      Project:28471
00711                     Trace Assembly Archive:123456
00712 
00713         Once the project line is dropped:
00714 
00715         DBLINK      Project:28471
00716                     Trace Assembly Archive:123456
00717 
00718         Note GenomeProject -> Project.
00719 
00720         We'll have to see some real examples to be sure, but based on the
00721         above example we can expect one reference per line.
00722 
00723         Note that at some point the NCBI have included an extra space, e.g.
00724 
00725         DBLINK      Project: 28471
00726         """
00727         #During the transition period with both PROJECT and DBLINK lines,
00728         #we don't want to add the same cross reference twice.
00729         while ": " in content:
00730             content = content.replace(": ", ":")
00731         if content.strip() not in self.data.dbxrefs:
00732             self.data.dbxrefs.append(content.strip())
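
The normalisation and duplicate check can be sketched on a plain list, using the example values from the docstring. The helper name `add_dblink` is hypothetical.

```python
def add_dblink(dbxrefs, content):
    """Normalise 'Name: value' to 'Name:value' and add it once."""
    # Collapse the extra space that some records have after the colon.
    while ": " in content:
        content = content.replace(": ", ":")
    content = content.strip()
    if content not in dbxrefs:
        dbxrefs.append(content)


dbxrefs = ["Project:28471"]  # e.g. already added from a PROJECT line
add_dblink(dbxrefs, "Project: 28471")
add_dblink(dbxrefs, "Trace Assembly Archive:123456")
```

Because the PROJECT handler rewrites `GenomeProject:` to `Project:`, the same cross reference arriving from both line types ends up stored only once.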

def Bio.GenBank._FeatureConsumer.definition (   self,
  definition 
)
Set the definition as the description of the sequence.

Definition at line 624 of file __init__.py.

00624 
00625     def definition(self, definition):
00626         """Set the definition as the description of the sequence.
00627         """
00628         if self.data.description:
00629             #Append to any existing description
00630             #e.g. EMBL files with two DE lines.
00631             self.data.description += " " + definition
00632         else:
00633             self.data.description = definition

def Bio.GenBank._FeatureConsumer.feature_key (   self,
  content 
)

Definition at line 925 of file __init__.py.

00925 
00926     def feature_key(self, content):
00927         # start a new feature
00928         self._cur_feature = SeqFeature.SeqFeature()
00929         self._cur_feature.type = content
00930         self.data.features.append(self._cur_feature)

def Bio.GenBank._FeatureConsumer.feature_qualifier (   self,
  key,
  value 
)
Called when we get a qualifier key and its value.

The value can be None, since valueless keys such as /pseudo are allowed.

Definition at line 1058 of file __init__.py.

01058 
01059     def feature_qualifier(self, key, value):
01060         """Called when we get a qualifier key and its value.
01061         
01062         The value can be None, since valueless keys such as /pseudo are allowed.
01063         """
01064         # Hack to try to preserve historical behaviour of /pseudo etc
01065         if value is None:
01066             if key not in self._cur_feature.qualifiers:
01067                 self._cur_feature.qualifiers[key] = [""]
01068             return
01069             
01070         value = value.replace('"', '')
01071         if self._feature_cleaner is not None:
01072             value = self._feature_cleaner.clean_value(key, value)
01073 
01074         # if the qualifier name exists, append the value
01075         if key in self._cur_feature.qualifiers:
01076             self._cur_feature.qualifiers[key].append(value)
01077         # otherwise start a new list of the key with its values
01078         else:
01079             self._cur_feature.qualifiers[key] = [value]
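
The resulting data structure is a dictionary that maps each qualifier key to a list of values. The sketch below reproduces that shape on a plain dictionary. It leaves out the optional feature cleaner and assumes that a repeated valueless key is simply ignored; `add_qualifier` and the sample values are hypothetical.

```python
def add_qualifier(qualifiers, key, value):
    """Append one /key=value pair to a feature's qualifier dictionary."""
    if value is None:
        # Valueless key such as /pseudo: record it once as an empty string.
        # Assumption: seeing the same valueless key again changes nothing.
        qualifiers.setdefault(key, [""])
        return
    value = value.replace('"', "")  # strip the surrounding quotes
    qualifiers.setdefault(key, []).append(value)


qualifiers = {}
add_qualifier(qualifiers, "gene", '"thrA"')
add_qualifier(qualifiers, "db_xref", '"GeneID:945803"')
add_qualifier(qualifiers, "db_xref", '"GI:16127996"')
add_qualifier(qualifiers, "pseudo", None)
add_qualifier(qualifiers, "pseudo", None)
```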

def Bio.GenBank._FeatureConsumer.feature_qualifier_description (   self,
  content 
)
Use feature_qualifier instead (OBSOLETE).

Definition at line 1084 of file __init__.py.

01084 
01085     def feature_qualifier_description(self, content):
01086         """Use feature_qualifier instead (OBSOLETE)."""
01087         raise NotImplementedError("Use the feature_qualifier method instead.")

def Bio.GenBank._FeatureConsumer.feature_qualifier_name (   self,
  content_list 
)
Use feature_qualifier instead (OBSOLETE).

Definition at line 1080 of file __init__.py.

01080 
01081     def feature_qualifier_name(self, content_list):
01082         """Use feature_qualifier instead (OBSOLETE)."""
01083         raise NotImplementedError("Use the feature_qualifier method instead.")

def Bio.GenBank._FeatureConsumer.features_line (   self,
  content 
)
Get ready for the feature table when we reach the FEATURES line.

Definition at line 912 of file __init__.py.

00912 
00913     def features_line(self, content):
00914         """Get ready for the feature table when we reach the FEATURES line.
00915         """
00916         self.start_feature_table()

def Bio.GenBank._FeatureConsumer.gi (   self,
  content 
)

Definition at line 756 of file __init__.py.

00756 
00757     def gi(self, content):
00758         self.data.annotations['gi'] = content

def Bio.GenBank._FeatureConsumer.journal (   self,
  content 
)

Definition at line 887 of file __init__.py.

00887 
00888     def journal(self, content):
00889         if self._cur_reference.journal:
00890             self._cur_reference.journal += ' ' + content
00891         else:
00892             self._cur_reference.journal = content

def Bio.GenBank._FeatureConsumer.keywords (   self,
  content 
)

Definition at line 759 of file __init__.py.

00759 
00760     def keywords(self, content):
00761         self.data.annotations['keywords'] = self._split_keywords(content)

def Bio.GenBank._FeatureConsumer.location (   self,
  content 
)
Parse out location information from the location string.

This uses simple Python code with some regular expressions to do the
parsing, and then translates the results into appropriate objects.

Definition at line 931 of file __init__.py.

00931 
00932     def location(self, content):
00933         """Parse out location information from the location string.
00934 
00935         This uses simple Python code with some regular expressions to do the
00936         parsing, and then translates the results into appropriate objects.
00937         """
00938         # clean up newlines and other whitespace inside the location before
00939         # parsing - locations should have no whitespace whatsoever
00940         location_line = self._clean_location(content)
00941 
00942         # Older records have junk like replace(266,"c") in the
00943         # location line. Newer records just replace this with
00944         # the number 266 and have the information in a more reasonable
00945         # place. So we'll just grab out the number and feed this to the
00946         # parser. We shouldn't really be losing any info this way.
00947         if location_line.find('replace') != -1:
00948             comma_pos = location_line.find(',')
00949             location_line = location_line[8:comma_pos]
00950         
00951         cur_feature = self._cur_feature
00952 
00953         #Handle top level complement here for speed
00954         if location_line.startswith("complement("):
00955             assert location_line.endswith(")")
00956             location_line = location_line[11:-1]
00957             strand = -1
00958         elif self._seq_type.find("DNA") >= 0 \
00959         or self._seq_type.find("RNA") >= 0:
00960             #Nucleotide
00961             strand = 1
00962         else:
00963             #Protein
00964             strand = None
00965 
00966         #Special case handling of the most common cases for speed
00967         if _re_simple_location.match(location_line):
00968             #e.g. "123..456"
00969             s, e = location_line.split("..")
00970             cur_feature.location = SeqFeature.FeatureLocation(int(s)-1,
00971                                                               int(e),
00972                                                               strand)
00973             return
00974 
00975         if _re_simple_compound.match(location_line):
00976             #e.g. join(<123..456,480..>500)
00977             i = location_line.find("(")
00978             cur_feature.location_operator = location_line[:i]
00979             #we can split on the comma because these are simple locations
00980             for part in location_line[i+1:-1].split(","):
00981                 s, e = part.split("..")
00982                 f = SeqFeature.SeqFeature(SeqFeature.FeatureLocation(int(s)-1,
00983                                                                      int(e),
00984                                                                      strand),
00985                         location_operator=cur_feature.location_operator,
00986                         type=cur_feature.type)
00987                 cur_feature.sub_features.append(f)
00988             s = cur_feature.sub_features[0].location.start
00989             e = cur_feature.sub_features[-1].location.end
00990             cur_feature.location = SeqFeature.FeatureLocation(s,e, strand)
00991             return
00992         
00993         #Handle the general case with more complex regular expressions
00994         if _re_complex_location.match(location_line):
00995             #e.g. "AL121804.2:41..610"
00996             if ":" in location_line:
00997                 location_ref, location_line = location_line.split(":")
00998                 cur_feature.location = _loc(location_line, self._expected_size, strand)
00999                 cur_feature.location.ref = location_ref
01000             else:
01001                 cur_feature.location = _loc(location_line, self._expected_size, strand)
01002             return
01003 
01004         if _re_complex_compound.match(location_line):
01005             i = location_line.find("(")
01006             cur_feature.location_operator = location_line[:i]
01007             #Can't split on the comma because of positions like one-of(1,2,3)
01008             for part in _split_compound_loc(location_line[i+1:-1]):
01009                 if part.startswith("complement("):
01010                     assert part[-1]==")"
01011                     part = part[11:-1]
01012                     assert strand != -1, "Double complement?"
01013                     part_strand = -1
01014                 else:
01015                     part_strand = strand
01016                 if ":" in part:
01017                     ref, part = part.split(":")
01018                 else:
01019                     ref = None
01020                 try:
01021                     loc = _loc(part, self._expected_size, part_strand)
01022                 except ValueError, err:
01023                     print location_line
01024                     print part
01025                     raise err
01026                 f = SeqFeature.SeqFeature(location=loc, ref=ref,
01027                         location_operator=cur_feature.location_operator,
01028                         type=cur_feature.type)
01029                 cur_feature.sub_features.append(f)
01030             # Historically a join on the reverse strand has been represented
01031             # in Biopython with both the parent SeqFeature and its children
01032             # (the exons for a CDS) all given a strand of -1.  Likewise, for
01033             # a join feature on the forward strand they all have strand +1.
01034             # However, we must also consider evil mixed strand examples like
01035             # this, join(complement(69611..69724),139856..140087,140625..140650)
01036             strands = set(sf.strand for sf in cur_feature.sub_features)
01037             if len(strands)==1:
01038                 strand = cur_feature.sub_features[0].strand
01039             else:
01040                 strand = None # i.e. mixed strands
01041             s = cur_feature.sub_features[0].location.start
01042             e = cur_feature.sub_features[-1].location.end
01043             cur_feature.location = SeqFeature.FeatureLocation(s, e, strand)
01044             return
01045         #Not recognised
01046         if "order" in location_line and "join" in location_line:
01047             #See Bug 3197
01048             msg = 'Combinations of "join" and "order" within the same ' + \
01049                   'location (nested operators) are illegal:\n' + location_line
01050             raise LocationParserError(msg)
01051         #This used to be an error....
01052         cur_feature.location = None
01053         import warnings
01054         from Bio import BiopythonParserWarning
01055         warnings.warn(BiopythonParserWarning("Couldn't parse feature location: %r" \
01056                                              % (location_line)))
01057 
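
The fast path for the most common case can be sketched without Biopython. The example below covers only plain `N..M` locations with an optional top-level `complement(...)`, and returns a `(start, end, strand)` tuple rather than a `FeatureLocation`. The regular expression is an assumption, since the module's own `_re_simple_location` is not shown on this page, and `parse_simple_location` is a hypothetical name.

```python
import re

# Hypothetical pattern for the simplest case, e.g. "123..456".
SIMPLE_LOCATION = re.compile(r"^\d+\.\.\d+$")


def parse_simple_location(location_line, seq_type):
    """Return (start, end, strand) for 'N..M' or 'complement(N..M)', else None."""
    if location_line.startswith("complement("):
        if not location_line.endswith(")"):
            return None
        location_line = location_line[len("complement("):-1]
        strand = -1
    elif "DNA" in seq_type or "RNA" in seq_type:
        strand = 1      # nucleotide, forward strand
    else:
        strand = None   # protein: strand is not meaningful
    if not SIMPLE_LOCATION.match(location_line):
        return None     # compound or fuzzy locations are out of scope here
    start, end = location_line.split("..")
    # GenBank is 1-based and inclusive; Python slicing is 0-based, end-exclusive.
    return int(start) - 1, int(end), strand


forward = parse_simple_location("123..456", "DNA")
reverse = parse_simple_location("complement(10..20)", "DNA")
```

Subtracting one from the start only is what turns the inclusive GenBank range into a Python-style half-open range of the same length.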

def Bio.GenBank._FeatureConsumer.locus (   self,
  locus_name 
)
Set the locus name as the name of the sequence.

Definition at line 604 of file __init__.py.

00604 
00605     def locus(self, locus_name):
00606         """Set the locus name as the name of the sequence.
00607         """
00608         self.data.name = locus_name

def Bio.GenBank._FeatureConsumer.medline_id (   self,
  content 
)

Definition at line 893 of file __init__.py.

00893 
00894     def medline_id(self, content):
00895         self._cur_reference.medline_id = content

def Bio.GenBank._FeatureConsumer.nid (   self,
  content 
)

Definition at line 665 of file __init__.py.

00665 
00666     def nid(self, content):
00667         self.data.annotations['nid'] = content

def Bio.GenBank._FeatureConsumer.organism (   self,
  content 
)

Definition at line 776 of file __init__.py.

00776 
00777     def organism(self, content):
00778         self.data.annotations['organism'] = content

def Bio.GenBank._FeatureConsumer.origin_name (   self,
  content 
)

Definition at line 1107 of file __init__.py.

01107 
01108     def origin_name(self, content):
01109         pass

def Bio.GenBank._FeatureConsumer.pid (   self,
  content 
)

Definition at line 668 of file __init__.py.

00668 
00669     def pid(self, content):
00670         self.data.annotations['pid'] = content

def Bio.GenBank._FeatureConsumer.project (   self,
  content 
)
Handle the information from the PROJECT line as a list of projects.

e.g.
PROJECT     GenomeProject:28471

or:
PROJECT     GenomeProject:13543  GenomeProject:99999

This is stored as dbxrefs in the SeqRecord to be consistent with the
projected switch of this line to DBLINK in future GenBank versions.
Note the NCBI plan to replace "GenomeProject:28471" with the shorter
"Project:28471" as part of this transition.

Definition at line 684 of file __init__.py.

00684 
00685     def project(self, content):
00686         """Handle the information from the PROJECT line as a list of projects.
00687 
00688         e.g.
00689         PROJECT     GenomeProject:28471
00690 
00691         or:
00692         PROJECT     GenomeProject:13543  GenomeProject:99999
00693 
00694         This is stored as dbxrefs in the SeqRecord to be consistent with the
00695         projected switch of this line to DBLINK in future GenBank versions.
00696         Note the NCBI plan to replace "GenomeProject:28471" with the shorter
00697         "Project:28471" as part of this transition.
00698         """
00699         content = content.replace("GenomeProject:", "Project:")
00700         self.data.dbxrefs.extend([p for p in content.split() if p])
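
A short standalone sketch of the same two steps, using the second example from the docstring. The function name `project_to_dbxrefs` is hypothetical, and it returns the list rather than extending `self.data.dbxrefs`.

```python
def project_to_dbxrefs(content):
    """Convert a PROJECT line's content into a list of cross references."""
    # Use the shorter prefix that DBLINK lines carry.
    content = content.replace("GenomeProject:", "Project:")
    # split() with no argument handles the double space between entries.
    return [p for p in content.split() if p]


refs = project_to_dbxrefs("GenomeProject:13543  GenomeProject:99999")
```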

def Bio.GenBank._FeatureConsumer.pubmed_id (   self,
  content 
)

Definition at line 896 of file __init__.py.

00896 
00897     def pubmed_id(self, content):
00898         self._cur_reference.pubmed_id = content

def Bio.GenBank._FeatureConsumer.record_end (   self,
  content 
)
Clean up when we've finished the record.

Definition at line 1126 of file __init__.py.

01126 
01127     def record_end(self, content):
01128         """Clean up when we've finished the record.
01129         """
01130         from Bio import Alphabet
01131         from Bio.Alphabet import IUPAC
01132         from Bio.Seq import Seq, UnknownSeq
01133 
01134         #Try and append the version number to the accession for the full id
01135         if self.data.id is None:
01136             assert 'accessions' not in self.data.annotations, \
01137                    self.data.annotations['accessions']
01138             self.data.id = self.data.name #Good fall back?
01139         elif self.data.id.count('.') == 0:
01140             try:
01141                 self.data.id+='.%i' % self.data.annotations['sequence_version']
01142             except KeyError:
01143                 pass
01144         
01145         # add the sequence information
01146         # first, determine the alphabet
01147         # we default to a generic alphabet if we don't have a
01148         # seq type or have strange sequence information.
01149         seq_alphabet = Alphabet.generic_alphabet
01150 
01151         # now set the sequence
01152         sequence = "".join(self._seq_data)
01153 
01154         if self._expected_size is not None \
01155         and len(sequence) != 0 \
01156         and self._expected_size != len(sequence):
01157             import warnings
01158             from Bio import BiopythonParserWarning
01159             warnings.warn("Expected sequence length %i, found %i (%s)." \
01160                           % (self._expected_size, len(sequence), self.data.id),
01161                           BiopythonParserWarning)
01162 
01163         if self._seq_type:
01164             # mRNA is really also DNA, since it is actually cDNA
01165             if self._seq_type.find('DNA') != -1 or \
01166                self._seq_type.find('mRNA') != -1:
01167                 seq_alphabet = IUPAC.ambiguous_dna
01168             # are there ever really RNA sequences in GenBank?
01169             elif self._seq_type.find('RNA') != -1:
01170                 #Even for data which was from RNA, the sequence string
01171                 #is usually given as DNA (T not U).  Bug 2408
01172                 if "T" in sequence and "U" not in sequence:
01173                     seq_alphabet = IUPAC.ambiguous_dna
01174                 else:
01175                     seq_alphabet = IUPAC.ambiguous_rna
01176             elif self._seq_type.upper().find('PROTEIN') != -1:
01177                 seq_alphabet = IUPAC.protein  # or extended protein?
01178             # work around ugly GenBank records which have circular or
01179             # linear but no indication of sequence type
01180             elif self._seq_type in ["circular", "linear", "unspecified"]:
01181                 pass
01182             # we have a bug if we get here
01183             else:
01184                 raise ValueError("Could not determine alphabet for seq_type %s"
01185                                  % self._seq_type)
01186 
01187         if not sequence and self._expected_size:
01188             self.data.seq = UnknownSeq(self._expected_size, seq_alphabet)
01189         else:
01190             self.data.seq = Seq(sequence, seq_alphabet)
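
The alphabet selection is the part of this method most easily misread, so here is a standalone sketch of just that branching. It returns a short label instead of a `Bio.Alphabet` object, and the function name `choose_alphabet` is hypothetical.

```python
def choose_alphabet(seq_type, sequence):
    """Return a label for the alphabet the branching above would pick."""
    if not seq_type:
        return "generic"
    # mRNA is treated as DNA, since the record actually holds cDNA.
    if "DNA" in seq_type or "mRNA" in seq_type:
        return "dna"
    if "RNA" in seq_type:
        # RNA records are usually written with T rather than U.
        if "T" in sequence and "U" not in sequence:
            return "dna"
        return "rna"
    if "PROTEIN" in seq_type.upper():
        return "protein"
    # Records that give only the topology keep the generic default.
    if seq_type in ("circular", "linear", "unspecified"):
        return "generic"
    raise ValueError("Could not determine alphabet for seq_type %s" % seq_type)


picked = [
    choose_alphabet("DNA", "ACGT"),
    choose_alphabet("tRNA", "ACGT"),
    choose_alphabet("RNA", "ACGU"),
    choose_alphabet("PROTEIN", "MKV"),
    choose_alphabet("linear", ""),
]
```

The order of the tests matters: `"mRNA"` also contains `"RNA"`, so it has to be caught by the DNA branch first.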

def Bio.GenBank._FeatureConsumer.reference_bases (   self,
  content 
)
Attempt to determine the sequence region the reference entails.

Possible types of information we may have to deal with:

(bases 1 to 86436)
(sites)
(bases 1 to 105654; 110423 to 111122)
1  (residues 1 to 182)

Definition at line 800 of file __init__.py.

00800 
00801     def reference_bases(self, content):
00802         """Attempt to determine the sequence region the reference entails.
00803 
00804         Possible types of information we may have to deal with:
00805         
00806         (bases 1 to 86436)
00807         (sites)
00808         (bases 1 to 105654; 110423 to 111122)
00809         1  (residues 1 to 182)
00810         """
00811         # first remove the parentheses or other junk
00812         ref_base_info = content[1:-1]
00813 
00814         all_locations = []
00815         # parse if we've got 'bases' and 'to'
00816         if ref_base_info.find('bases') != -1 and \
00817             ref_base_info.find('to') != -1:
00818             # get rid of the beginning 'bases'
00819             ref_base_info = ref_base_info[5:]
00820             locations = self._split_reference_locations(ref_base_info)
00821             all_locations.extend(locations)
00822         elif (ref_base_info.find("residues") >= 0 and
00823               ref_base_info.find("to") >= 0):
00824             residues_start = ref_base_info.find("residues")
00825             # get only the information after "residues"
00826             ref_base_info = ref_base_info[(residues_start + len("residues ")):]
00827             locations = self._split_reference_locations(ref_base_info)
00828             all_locations.extend(locations)
00829 
00830         # make sure if we are not finding information then we have
00831         # the string 'sites' or the string 'bases'
00832         elif (ref_base_info == 'sites' or
00833               ref_base_info.strip() == 'bases'):
00834             pass
00835         # otherwise raise an error
00836         else:
00837             raise ValueError("Could not parse base info %s in record %s" %
00838                              (ref_base_info, self.data.id))
00839 
00840         self._cur_reference.location = all_locations
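
The four input shapes listed in the docstring can be exercised with a standalone sketch. It returns 0-based `(start, end)` tuples instead of location objects, locates the keyword with `find` rather than a fixed slice, and assumes the same start-minus-one conversion that `location` uses. The name `parse_reference_bases` is hypothetical.

```python
def parse_reference_bases(content):
    """Return 0-based (start, end) tuples for a REFERENCE base range string."""
    info = content[1:-1]  # drop the first and last characters (parentheses)
    if "to" in info and ("bases" in info or "residues" in info):
        keyword = "bases" if "bases" in info else "residues"
        # Keep only the text after the keyword, e.g. " 1 to 86436".
        info = info[info.find(keyword) + len(keyword):]
        spans = []
        for part in info.split(";"):
            start, end = part.split("to")
            # int() tolerates the surrounding whitespace.
            spans.append((int(start) - 1, int(end)))
        return spans
    if info == "sites" or info.strip() == "bases":
        return []  # nothing to locate
    raise ValueError("Could not parse base info %s" % info)


two_ranges = parse_reference_bases("(bases 1 to 105654; 110423 to 111122)")
```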

def Bio.GenBank._FeatureConsumer.reference_num (   self,
  content 
)
Signal the beginning of a new reference object.

Definition at line 788 of file __init__.py.

00788 
00789     def reference_num(self, content):
00790         """Signal the beginning of a new reference object.
00791         """
00792         # if we have a current reference that hasn't been added to
00793         # the list of references, add it.
00794         if self._cur_reference is not None:
00795             self.data.annotations['references'].append(self._cur_reference)
00796         else:
00797             self.data.annotations['references'] = []
00798 
00799         self._cur_reference = SeqFeature.Reference()

def Bio.GenBank._FeatureConsumer.remark (   self,
  content 
)
Deal with a reference comment.

Definition at line 899 of file __init__.py.

00899 
00900     def remark(self, content):
00901         """Deal with a reference comment."""
00902         if self._cur_reference.comment:
00903             self._cur_reference.comment += ' ' + content
00904         else:
00905             self._cur_reference.comment = content

def Bio.GenBank._FeatureConsumer.residue_type (   self,
  type 
)
Record the sequence type so we can choose an appropriate alphabet.

Definition at line 613 of file __init__.py.

00613 
00614     def residue_type(self, type):
00615         """Record the sequence type so we can choose an appropriate alphabet.
00616         """
00617         self._seq_type = type

def Bio.GenBank._FeatureConsumer.segment (   self,
  content 
)

Definition at line 762 of file __init__.py.

00762 
00763     def segment(self, content):
00764         self.data.annotations['segment'] = content

def Bio.GenBank._FeatureConsumer.sequence (   self,
  content 
)
Add up sequence information as we get it.

To try and make things speedier, this puts all of the strings
into a list of strings, and then uses string.join later to put
them together. Supposedly, this is a big time savings.

Definition at line 1116 of file __init__.py.

01116 
01117     def sequence(self, content):
01118         """Add up sequence information as we get it.
01119 
01120         To try and make things speedier, this puts all of the strings
01121         into a list of strings, and then uses string.join later to put
01122         them together. Supposedly, this is a big time savings.
01123         """
01124         assert ' ' not in content
01125         self._seq_data.append(content.upper())
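The append-then-join idea described in the docstring can be sketched on its own. `SequenceBuffer` is a hypothetical class written for this illustration, and it raises `ValueError` where the listing uses `assert`; it is not how the consumer itself is structured.

```python
class SequenceBuffer(object):
    """Collect sequence chunks and join them once at the end.

    Hypothetical class for illustration; not part of Bio.GenBank.
    """

    def __init__(self):
        self._chunks = []

    def add(self, content):
        # The consumer expects chunks with the spaces already removed.
        if " " in content:
            raise ValueError("unexpected space in sequence chunk")
        self._chunks.append(content.upper())

    def finish(self):
        # A single join avoids building a new string on every append.
        return "".join(self._chunks)


buf = SequenceBuffer()
buf.add("gatc")
buf.add("ttaa")
sequence = buf.finish()
```

Repeated `+=` on a string may copy the accumulated data on each step, which is why collecting chunks in a list and joining once is the usual Python idiom for long sequences.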


def Bio.GenBank._FeatureConsumer.size (   self,
  content 
)
Record the sequence length.

Definition at line 609 of file __init__.py.

00609 
00610     def size(self, content):
00611         """Record the sequence length."""
00612         self._expected_size = int(content)


def Bio.GenBank._FeatureConsumer.source (   self,
  content 
)

Definition at line 765 of file __init__.py.

00765 
00766     def source(self, content):
00767         #Note that some software (e.g. VectorNTI) may produce an empty
00768         #source (rather than using a dot/period as might be expected).
00769         if content == "":
00770             source_info = ""
00771         elif content[-1] == '.':
00772             source_info = content[:-1]
00773         else:
00774             source_info = content
00775         self.data.annotations['source'] = source_info
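The three branches above reduce to "drop one trailing period, if there is one". A minimal sketch of the same behaviour, using a hypothetical free function `clean_source` instead of the consumer method:

```python
def clean_source(content):
    """Strip a single trailing period from a SOURCE line.

    Hypothetical helper for illustration; not part of Bio.GenBank.
    """
    # str.endswith is safe on an empty string, so the empty-source
    # case needs no separate branch here.
    if content.endswith("."):
        return content[:-1]
    return content


cleaned = clean_source("Saccharomyces cerevisiae.")
```

The listing indexes `content[-1]`, which would fail on an empty string, hence its explicit first branch for empty input.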

def Bio.GenBank._FeatureConsumer.start_feature_table (   self )
Indicate we've got to the start of the feature table.

Definition at line 917 of file __init__.py.

00917 
00918     def start_feature_table(self):
00919         """Indicate we've got to the start of the feature table.
00920         """
00921         # make sure we've added on our last reference object
00922         if self._cur_reference is not None:
00923             self.data.annotations['references'].append(self._cur_reference)
00924             self._cur_reference = None
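The way a pending reference is held back until the next REFERENCE line or the feature table arrives can be sketched with a simplified collector. `ReferenceCollector` and its dictionary-based references are assumptions made for this example; the real consumer stores `SeqFeature.Reference` objects in `self.data.annotations['references']`.

```python
class ReferenceCollector(object):
    """Hold one in-progress reference and flush it when the next begins.

    Hypothetical class for illustration; not part of Bio.GenBank.
    """

    def __init__(self):
        self.references = []
        self._current = None

    def _flush(self):
        # Store the pending reference, if any, and clear the slot.
        if self._current is not None:
            self.references.append(self._current)
            self._current = None

    def reference_num(self):
        self._flush()
        self._current = {"title": ""}

    def title(self, content):
        if self._current is None:
            raise ValueError("TITLE line without REFERENCE line")
        # Continuation lines are joined with a single space.
        if self._current["title"]:
            self._current["title"] += " " + content
        else:
            self._current["title"] = content

    def start_feature_table(self):
        # The last reference has no successor to trigger its flush.
        self._flush()


collector = ReferenceCollector()
collector.reference_num()
collector.title("First title,")
collector.title("continued")
collector.reference_num()
collector.title("Second title")
collector.start_feature_table()
```

Without the final flush in `start_feature_table`, the last reference of a record would never be stored.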


def Bio.GenBank._FeatureConsumer.taxonomy (   self,
  content 
)
Records (another line of) the taxonomy lineage.

Definition at line 779 of file __init__.py.

00779 
00780     def taxonomy(self, content):
00781         """Records (another line of) the taxonomy lineage.
00782         """
00783         lineage = self._split_taxonomy(content)
00784         try:
00785             self.data.annotations['taxonomy'].extend(lineage)
00786         except KeyError:
00787             self.data.annotations['taxonomy'] = lineage
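A rough sketch of how lineage lines might accumulate across calls. `split_lineage` is a hypothetical stand-in for `_split_taxonomy`, whose body is not shown in this section; splitting on semicolons and dropping a trailing period are assumptions. `dict.setdefault` is used here as an alternative to the `try`/`except KeyError` in the listing.

```python
def split_lineage(line):
    """Split one taxonomy line into clean names.

    Hypothetical stand-in for _split_taxonomy; the separator and the
    trailing-period handling are assumptions.
    """
    line = line.strip()
    if line.endswith("."):
        line = line[:-1]
    return [name.strip() for name in line.split(";") if name.strip()]


annotations = {}
for raw in ["Eukaryota; Fungi; Dikarya;", "Ascomycota; Saccharomyces."]:
    # Create the list on first use, then extend it on every call.
    annotations.setdefault("taxonomy", []).extend(split_lineage(raw))
```

Either form gives the same result: the first taxonomy line creates the list and every later line extends it.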
        


def Bio.GenBank._FeatureConsumer.title (   self,
  content 
)

Definition at line 876 of file __init__.py.

00876 
00877     def title(self, content):
00878         if self._cur_reference is None:
00879             import warnings
00880             from Bio import BiopythonParserWarning
00881             warnings.warn("GenBank TITLE line without REFERENCE line.",
00882                           BiopythonParserWarning)
00883         elif self._cur_reference.title:
00884             self._cur_reference.title += ' ' + content
00885         else:
00886             self._cur_reference.title = content

def Bio.GenBank._FeatureConsumer.version (   self,
  version_id 
)

Definition at line 671 of file __init__.py.

00671 
00672     def version(self, version_id):
00673         #Want to use the versioned accession as the record.id
00674         #This comes from the VERSION line in GenBank files, or the
00675         #obsolete SV line in EMBL.  For the new EMBL files we need
00676         #both the version suffix from the ID line and the accession
00677         #from the AC line.
00678         if version_id.count(".")==1 and version_id.split(".")[1].isdigit():
00679             self.accession(version_id.split(".")[0])
00680             self.version_suffix(version_id.split(".")[1])
00681         else:
00682             #For backwards compatibility...
00683             self.data.id = version_id
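The test on line 678 accepts only identifiers with exactly one dot followed by digits. A standalone sketch of that check, using a hypothetical function `split_versioned_id` that returns the parts instead of updating the record:

```python
def split_versioned_id(version_id):
    """Return (accession, version), or (version_id, None) if unversioned.

    Hypothetical helper for illustration; not part of Bio.GenBank.
    """
    parts = version_id.split(".")
    # Exactly one dot, with an all-digit suffix, marks a versioned accession.
    if len(parts) == 2 and parts[1].isdigit():
        return parts[0], int(parts[1])
    return version_id, None


accession, version = split_versioned_id("U49845.1")
```

Anything that fails the check is passed through unchanged, mirroring the backwards-compatibility branch that assigns the whole string to `self.data.id`.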


def Bio.GenBank._FeatureConsumer.version_suffix (   self,
  version 
)
Set the version to overwrite the id.

Since the version provides the same information as the accession
number, plus some extra info, we set this as the id if we have
a version.

Definition at line 733 of file __init__.py.

00733 
00734     def version_suffix(self, version):
00735         """Set the version to overwrite the id.
00736 
00737         Since the version provides the same information as the accession
00738         number, plus some extra info, we set this as the id if we have
00739         a version.
00740         """
00741         #e.g. GenBank line:
00742         #VERSION     U49845.1  GI:1293613
00743         #or the obsolete EMBL line:
00744         #SV   U49845.1
00745         #Scanner calls consumer.version("U49845.1")
00746         #which then calls consumer.version_suffix(1)
00747         #
00748         #e.g. EMBL new line:
00749         #ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.
00750         #Scanner calls consumer.version_suffix(1)
00751         assert version.isdigit()
00752         self.data.annotations['sequence_version'] = int(version)


def Bio.GenBank._FeatureConsumer.wgs (   self,
  content 
)

Definition at line 659 of file __init__.py.

00659 
00660     def wgs(self, content):
00661         self.data.annotations['wgs'] = content.split('-')


Member Data Documentation

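Bio.GenBank._FeatureConsumer._cur_feature [private]

Definition at line 601 of file __init__.py.

Bio.GenBank._FeatureConsumer._cur_reference [private]

Definition at line 600 of file __init__.py.

Bio.GenBank._FeatureConsumer._expected_size [private]

Definition at line 602 of file __init__.py.

Bio.GenBank._FeatureConsumer._feature_cleaner [private]

Definition at line 596 of file __init__.py.

Bio.GenBank._FeatureConsumer._seq_data [private]

Definition at line 599 of file __init__.py.

Bio.GenBank._FeatureConsumer._seq_type [private]

Definition at line 598 of file __init__.py.

Bio.GenBank._FeatureConsumer._use_fuzziness [private]

Definition at line 595 of file __init__.py.

Bio.GenBank._FeatureConsumer.data

Definition at line 591 of file __init__.py.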

list Bio.GenBank._BaseGenBankConsumer.remove_space_keys = ["translation"] [static, inherited]

Definition at line 471 of file __init__.py.


The documentation for this class was generated from the following file: