Back to index

python-biopython  1.60
Public Member Functions | Public Attributes | Properties | Private Member Functions
Bio.SeqFeature.SeqFeature Class Reference

List of all members.

Public Member Functions

def __init__
def __repr__
def __str__
def extract
def __nonzero__
def __len__
def __iter__
def __contains__

Public Attributes

 location
 type
 location_operator
 id
 qualifiers
 sub_features

Properties

 strand
 ref
 ref_db

Private Member Functions

def _get_strand
def _set_strand
def _get_ref
def _set_ref
def _get_ref_db
def _set_ref_db
def _shift
def _flip

Detailed Description

Represent a Sequence Feature on an object.

Attributes:
o location - the location of the feature on the sequence (FeatureLocation)
o type - the specified type of the feature (ie. CDS, exon, repeat...)
o location_operator - a string specifying how this SeqFeature may
be related to others. For example, in the example GenBank feature
shown below, the location_operator would be "join"
o strand - A value specifying on which strand (of a DNA sequence, for
instance) the feature deals with. 1 indicates the plus strand, -1 
indicates the minus strand, 0 indicates stranded but unknown (? in GFF3),
while the default of None indicates that strand doesn't apply (dot in GFF3,
e.g. features on proteins). Note this is a shortcut for accessing the
strand property of the feature's location.
o id - A string identifier for the feature.
o ref - A reference to another sequence. This could be an accession
number for some different sequence. Note this is a shortcut for the
reference property of the feature's location.
o ref_db - A different database for the reference accession number.
Note this is a shortcut for the reference property of the location
o qualifiers - A dictionary of qualifiers on the feature. These are
analagous to the qualifiers from a GenBank feature table. The keys of
the dictionary are qualifier names, the values are the qualifier
values.
o sub_features - Additional SeqFeatures which fall under this 'parent'
feature. For instance, if we having something like:

CDS    join(1..10,30..40,50..60)

Then the top level feature would be of type 'CDS' from 1 to 60 (actually 0
to 60 in Python counting) with location_operator='join', and the three sub-
features would also be of type 'CDS', and would be from 1 to 10, 30 to
40 and 50 to 60, respectively (although actually using Python counting).

To get the nucleotide sequence for this CDS, you would need to take the
parent sequence and do seq[0:10]+seq[29:40]+seq[49:60] (Python counting).
Things are more complicated with strands and fuzzy positions. To save you
dealing with all these special cases, the SeqFeature provides an extract
method to do this for you.

Definition at line 52 of file SeqFeature.py.


Constructor & Destructor Documentation

def Bio.SeqFeature.SeqFeature.__init__ (   self,
  location = None,
  type = '',
  location_operator = '',
  strand = None,
  id = "<unknown id>",
  qualifiers = None,
  sub_features = None,
  ref = None,
  ref_db = None 
)
Initialize a SeqFeature on a Sequence.

location can either be a FeatureLocation (with strand argument also
given if required), or None.

e.g. With no strand, on the forward strand, and on the reverse strand:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f1 = SeqFeature(FeatureLocation(5, 10), type="domain")
>>> f1.strand == f1.location.strand == None
True
>>> f2 = SeqFeature(FeatureLocation(7, 110, strand=1), type="CDS")
>>> f2.strand == f2.location.strand == +1
True
>>> f3 = SeqFeature(FeatureLocation(9, 108, strand=-1), type="CDS")
>>> f3.strand == f3.location.strand == -1
True

An invalid strand will trigger an exception:

>>> f4 = SeqFeature(FeatureLocation(50, 60), strand=2)
Traceback (most recent call last):
   ...
ValueError: Strand should be +1, -1, 0 or None, not 2

Similarly if set via the FeatureLocation directly:

>>> loc4 = FeatureLocation(50, 60, strand=2)
Traceback (most recent call last):
   ...
ValueError: Strand should be +1, -1, 0 or None, not 2

For exact start/end positions, an integer can be used (as shown above)
as shorthand for the ExactPosition object. For non-exact locations, the
FeatureLocation must be specified via the appropriate position objects.

Definition at line 96 of file SeqFeature.py.

00096 
00097                  ref = None, ref_db = None):
00098         """Initialize a SeqFeature on a Sequence.
00099 
00100         location can either be a FeatureLocation (with strand argument also
00101         given if required), or None.
00102 
00103         e.g. With no strand, on the forward strand, and on the reverse strand:
00104 
00105         >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
00106         >>> f1 = SeqFeature(FeatureLocation(5, 10), type="domain")
00107         >>> f1.strand == f1.location.strand == None
00108         True
00109         >>> f2 = SeqFeature(FeatureLocation(7, 110, strand=1), type="CDS")
00110         >>> f2.strand == f2.location.strand == +1
00111         True
00112         >>> f3 = SeqFeature(FeatureLocation(9, 108, strand=-1), type="CDS")
00113         >>> f3.strand == f3.location.strand == -1
00114         True
00115 
00116         An invalid strand will trigger an exception:
00117 
00118         >>> f4 = SeqFeature(FeatureLocation(50, 60), strand=2)
00119         Traceback (most recent call last):
00120            ...
00121         ValueError: Strand should be +1, -1, 0 or None, not 2
00122 
00123         Similarly if set via the FeatureLocation directly:
00124 
00125         >>> loc4 = FeatureLocation(50, 60, strand=2)
00126         Traceback (most recent call last):
00127            ...
00128         ValueError: Strand should be +1, -1, 0 or None, not 2
00129 
00130         For exact start/end positions, an integer can be used (as shown above)
00131         as shorthand for the ExactPosition object. For non-exact locations, the
00132         FeatureLocation must be specified via the appropriate position objects.
00133         """
00134         if location is not None and not isinstance(location, FeatureLocation):
00135             raise TypeError("FeatureLocation (or None) required for the location")
00136         self.location = location
00137 
00138         self.type = type
00139         self.location_operator = location_operator
00140         if strand is not None:
00141             self.strand = strand
00142         self.id = id
00143         if qualifiers is None:
00144             qualifiers = {}
00145         self.qualifiers = qualifiers
00146         if sub_features is None:
00147             sub_features = []
00148         self.sub_features = sub_features
00149         if ref is not None:
00150             self.ref = ref
00151         if ref_db is not None:
00152             self.ref_db = ref_db


Member Function Documentation

def Bio.SeqFeature.SeqFeature.__contains__ (   self,
  value 
)
Check if an integer position is within the feature.

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f = SeqFeature(FeatureLocation(5,10), type="domain", strand=-1)
>>> len(f)
5
>>> [i for i in range(15) if i in f]
[5, 6, 7, 8, 9]

For example, to see which features include a SNP position, you could
use this:

>>> from Bio import SeqIO
>>> record = SeqIO.read("GenBank/NC_000932.gb", "gb")
>>> for f in record.features:
...     if 1750 in f:
...         print f.type, f.location
source [0:154478](+)
gene [1716:4347](-)
tRNA [1716:4347](-)

Note that for a feature defined as a join of several subfeatures (e.g.
the union of several exons) the gaps are not checked (e.g. introns).
In this example, the tRNA location is defined in the GenBank file as
complement(join(1717..1751,4311..4347)), so that position 1760 falls
in the gap:

>>> for f in record.features:
...     if 1760 in f:
...         print f.type, f.location
source [0:154478](+)
gene [1716:4347](-)

Note that additional care may be required with fuzzy locations, for
example just before a BeforePosition:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> from Bio.SeqFeature import BeforePosition
>>> f = SeqFeature(FeatureLocation(BeforePosition(3),8), type="domain")
>>> len(f)
5
>>> [i for i in range(10) if i in f]
[3, 4, 5, 6, 7]

Definition at line 378 of file SeqFeature.py.

00378 
00379     def __contains__(self, value):
00380         """Check if an integer position is within the feature.
00381 
00382         >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
00383         >>> f = SeqFeature(FeatureLocation(5,10), type="domain", strand=-1)
00384         >>> len(f)
00385         5
00386         >>> [i for i in range(15) if i in f]
00387         [5, 6, 7, 8, 9]
00388 
00389         For example, to see which features include a SNP position, you could
00390         use this:
00391 
00392         >>> from Bio import SeqIO
00393         >>> record = SeqIO.read("GenBank/NC_000932.gb", "gb")
00394         >>> for f in record.features:
00395         ...     if 1750 in f:
00396         ...         print f.type, f.location
00397         source [0:154478](+)
00398         gene [1716:4347](-)
00399         tRNA [1716:4347](-)
00400 
00401         Note that for a feature defined as a join of several subfeatures (e.g.
00402         the union of several exons) the gaps are not checked (e.g. introns).
00403         In this example, the tRNA location is defined in the GenBank file as
00404         complement(join(1717..1751,4311..4347)), so that position 1760 falls
00405         in the gap:
00406 
00407         >>> for f in record.features:
00408         ...     if 1760 in f:
00409         ...         print f.type, f.location
00410         source [0:154478](+)
00411         gene [1716:4347](-)
00412 
00413         Note that additional care may be required with fuzzy locations, for
00414         example just before a BeforePosition:
00415 
00416         >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
00417         >>> from Bio.SeqFeature import BeforePosition
00418         >>> f = SeqFeature(FeatureLocation(BeforePosition(3),8), type="domain")
00419         >>> len(f)
00420         5
00421         >>> [i for i in range(10) if i in f]
00422         [3, 4, 5, 6, 7]
00423         """
00424         if not isinstance(value, int):
00425             raise ValueError("Currently we only support checking for integer "
00426                              "positions being within a SeqFeature.")
00427         if self.sub_features:
00428             for f in self.sub_features:
00429                 if value in f:
00430                     return True
00431             return False
00432         else:
00433             return value in self.location
00434 
00435 # --- References
00436 
# TODO -- Will this hold PubMed and Medline information decently?
Iterate over the parent positions within the feature.

The iteration order is strand aware, and can be thought of as moving
along the feature using the parent sequence coordinates:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f = SeqFeature(FeatureLocation(5,10), type="domain", strand=-1)
>>> len(f)
5
>>> for i in f: print i
9
8
7
6
5
>>> list(f)
[9, 8, 7, 6, 5]

Definition at line 346 of file SeqFeature.py.

00346 
00347     def __iter__(self):
00348         """Iterate over the parent positions within the feature.
00349 
00350         The iteration order is strand aware, and can be thought of as moving
00351         along the feature using the parent sequence coordinates:
00352 
00353         >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
00354         >>> f = SeqFeature(FeatureLocation(5,10), type="domain", strand=-1)
00355         >>> len(f)
00356         5
00357         >>> for i in f: print i
00358         9
00359         8
00360         7
00361         6
00362         5
00363         >>> list(f)
00364         [9, 8, 7, 6, 5]
00365         """
00366         if self.sub_features:
00367             if self.strand == -1:
00368                 for f in self.sub_features[::-1]:
00369                     for i in f.location:
00370                         yield i
00371             else:
00372                 for f in self.sub_features:
00373                     for i in f.location:
00374                         yield i
00375         else:
00376             for i in self.location:
00377                 yield i

Here is the caller graph for this function:

Returns the length of the region described by a feature.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(FeatureLocation(8,15), type="domain")
>>> len(f)
7
>>> f.extract(seq)
Seq('VALIVIC', ProteinAlphabet())
>>> len(f.extract(seq))
7

For simple features without subfeatures this is the same as the region
spanned (end position minus start position). However, for a feature
defined by combining several subfeatures (e.g. a CDS as the join of
several exons) the gaps are not counted (e.g. introns). This ensures
that len(f) == len(f.extract(parent_seq)), and also makes sure things
work properly with features wrapping the origin etc.

Definition at line 319 of file SeqFeature.py.

00319 
00320     def __len__(self):
00321         """Returns the length of the region described by a feature.
00322 
00323         >>> from Bio.Seq import Seq
00324         >>> from Bio.Alphabet import generic_protein
00325         >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
00326         >>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
00327         >>> f = SeqFeature(FeatureLocation(8,15), type="domain")
00328         >>> len(f)
00329         7
00330         >>> f.extract(seq)
00331         Seq('VALIVIC', ProteinAlphabet())
00332         >>> len(f.extract(seq))
00333         7
00334 
00335         For simple features without subfeatures this is the same as the region
00336         spanned (end position minus start position). However, for a feature
00337         defined by combining several subfeatures (e.g. a CDS as the join of
00338         several exons) the gaps are not counted (e.g. introns). This ensures
00339         that len(f) == len(f.extract(parent_seq)), and also makes sure things
00340         work properly with features wrapping the origin etc.
00341         """
00342         if self.sub_features:
00343             return sum(len(f) for f in self.sub_features)
00344         else:
00345             return len(self.location)

Returns True regardless of the length of the feature.

This behaviour is for backwards compatibility, since until the
__len__ method was added, a SeqFeature always evaluated as True.

Note that in comparison, Seq objects, strings, lists, etc, will all
evaluate to False if they have length zero.

WARNING: The SeqFeature may in future evaluate to False when its
length is zero (in order to better match normal python behaviour)!

Definition at line 305 of file SeqFeature.py.

00305 
00306     def __nonzero__(self):
00307         """Returns True regardless of the length of the feature.
00308 
00309         This behaviour is for backwards compatibility, since until the
00310         __len__ method was added, a SeqFeature always evaluated as True.
00311 
00312         Note that in comparison, Seq objects, strings, lists, etc, will all
00313         evaluate to False if they have length zero.
00314 
00315         WARNING: The SeqFeature may in future evaluate to False when its
00316         length is zero (in order to better match normal python behaviour)!
00317         """
00318         return True

A string representation of the record for debugging.

Definition at line 197 of file SeqFeature.py.

00197 
00198     def __repr__(self):
00199         """A string representation of the record for debugging."""
00200         answer = "%s(%s" % (self.__class__.__name__, repr(self.location))
00201         if self.type:
00202             answer += ", type=%s" % repr(self.type)
00203         if self.location_operator:
00204             answer += ", location_operator=%s" % repr(self.location_operator)
00205         if self.id and self.id != "<unknown id>":
00206             answer += ", id=%s" % repr(self.id)
00207         if self.ref:
00208             answer += ", ref=%s" % repr(self.ref)
00209         if self.ref_db:
00210             answer += ", ref_db=%s" % repr(self.ref_db)
00211         answer += ")"
00212         return answer

A readable summary of the feature intended to be printed to screen.

Definition at line 213 of file SeqFeature.py.

00213 
00214     def __str__(self):
00215         """A readable summary of the feature intended to be printed to screen.
00216         """
00217         out = "type: %s\n" % self.type
00218         out += "location: %s\n" % self.location
00219         if self.id and self.id != "<unknown id>":
00220             out += "id: %s\n" % self.id
00221         out += "qualifiers: \n"
00222         for qual_key in sorted(self.qualifiers):
00223             out += "    Key: %s, Value: %s\n" % (qual_key,
00224                                                self.qualifiers[qual_key])
00225         if len(self.sub_features) != 0:
00226             out += "Sub-Features\n"
00227             for sub_feature in self.sub_features:
00228                 out +="%s\n" % sub_feature
00229         return out

def Bio.SeqFeature.SeqFeature._flip (   self,
  length 
) [private]
Returns a copy of the feature with its location flipped (PRIVATE).

The argument length gives the length of the parent sequence. For
example a location 0..20 (+1 strand) with parent length 30 becomes
after flipping 10..30 (-1 strand). Strandless (None) or unknown
strand (0) remain like that - just their end points are changed.

The annotation qaulifiers are copied.

Definition at line 241 of file SeqFeature.py.

00241 
00242     def _flip(self, length):
00243         """Returns a copy of the feature with its location flipped (PRIVATE).
00244         
00245         The argument length gives the length of the parent sequence. For
00246         example a location 0..20 (+1 strand) with parent length 30 becomes
00247         after flipping 10..30 (-1 strand). Strandless (None) or unknown
00248         strand (0) remain like that - just their end points are changed.
00249 
00250         The annotation qaulifiers are copied.
00251         """
00252         return SeqFeature(location = self.location._flip(length),
00253             type = self.type,
00254             location_operator = self.location_operator,
00255             id = self.id,
00256             qualifiers = dict(self.qualifiers.iteritems()),
00257             sub_features = [f._flip(length) for f in self.sub_features[::-1]])

def Bio.SeqFeature.SeqFeature._get_ref (   self) [private]

Definition at line 170 of file SeqFeature.py.

00170 
00171     def _get_ref(self):
        return self.location.ref
def Bio.SeqFeature.SeqFeature._get_ref_db (   self) [private]

Definition at line 187 of file SeqFeature.py.

00187 
00188     def _get_ref_db(self):
        return self.location.ref_db
def Bio.SeqFeature.SeqFeature._get_strand (   self) [private]

Definition at line 153 of file SeqFeature.py.

00153 
00154     def _get_strand(self):
        return self.location.strand
def Bio.SeqFeature.SeqFeature._set_ref (   self,
  value 
) [private]

Definition at line 172 of file SeqFeature.py.

00172 
00173     def _set_ref(self, value):
00174         try:
00175             self.location.ref = value
00176         except AttributeError:
00177             if self.location is None:
00178                 if value is not None:
00179                     raise ValueError("Can't set ref without a location.")
00180             else:
                raise
def Bio.SeqFeature.SeqFeature._set_ref_db (   self,
  value 
) [private]

Definition at line 189 of file SeqFeature.py.

00189 
00190     def _set_ref_db(self, value):
        self.location.ref_db = value
def Bio.SeqFeature.SeqFeature._set_strand (   self,
  value 
) [private]

Definition at line 155 of file SeqFeature.py.

00155 
00156     def _set_strand(self, value):
00157         try:
00158             self.location.strand = value
00159         except AttributeError:
00160             if self.location is None:
00161                 if value is not None:
00162                     raise ValueError("Can't set strand without a location.")
00163             else:
                raise
def Bio.SeqFeature.SeqFeature._shift (   self,
  offset 
) [private]
Returns a copy of the feature with its location shifted (PRIVATE).

The annotation qaulifiers are copied.

Definition at line 230 of file SeqFeature.py.

00230 
00231     def _shift(self, offset):
00232         """Returns a copy of the feature with its location shifted (PRIVATE).
00233 
00234         The annotation qaulifiers are copied."""
00235         return SeqFeature(location = self.location._shift(offset),
00236             type = self.type,
00237             location_operator = self.location_operator,
00238             id = self.id,
00239             qualifiers = dict(self.qualifiers.iteritems()),
00240             sub_features = [f._shift(offset) for f in self.sub_features])

def Bio.SeqFeature.SeqFeature.extract (   self,
  parent_sequence 
)
Extract feature sequence from the supplied parent sequence.

The parent_sequence can be a Seq like object or a string, and will
generally return an object of the same type. The exception to this is
a MutableSeq as the parent sequence will return a Seq object.

This should cope with complex locations including complements, joins
and fuzzy positions. Even mixed strand features should work! This
also covers features on protein sequences (e.g. domains), although
here reverse strand features are not permitted.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(FeatureLocation(8,15), type="domain")
>>> f.extract(seq)
Seq('VALIVIC', ProteinAlphabet())

Note - currently only sub-features of type "join" are supported.

Definition at line 258 of file SeqFeature.py.

00258 
00259     def extract(self, parent_sequence):
00260         """Extract feature sequence from the supplied parent sequence.
00261 
00262         The parent_sequence can be a Seq like object or a string, and will
00263         generally return an object of the same type. The exception to this is
00264         a MutableSeq as the parent sequence will return a Seq object.
00265 
00266         This should cope with complex locations including complements, joins
00267         and fuzzy positions. Even mixed strand features should work! This
00268         also covers features on protein sequences (e.g. domains), although
00269         here reverse strand features are not permitted.
00270 
00271         >>> from Bio.Seq import Seq
00272         >>> from Bio.Alphabet import generic_protein
00273         >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
00274         >>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
00275         >>> f = SeqFeature(FeatureLocation(8,15), type="domain")
00276         >>> f.extract(seq)
00277         Seq('VALIVIC', ProteinAlphabet())
00278 
00279         Note - currently only sub-features of type "join" are supported.
00280         """
00281         if isinstance(parent_sequence, MutableSeq):
00282             #This avoids complications with reverse complements
00283             #(the MutableSeq reverse complement acts in situ)
00284             parent_sequence = parent_sequence.toseq()
00285         if self.sub_features:
00286             if self.location_operator!="join":
00287                 raise ValueError(self.location_operator)
00288             if self.location.strand == -1:
00289                 #This is a special case given how the GenBank parser works.
00290                 #Must avoid doing the reverse complement twice.
00291                 parts = []
00292                 for f_sub in self.sub_features[::-1]:
00293                     assert f_sub.location.strand==-1
00294                     parts.append(f_sub.location.extract(parent_sequence))
00295             else:
00296                 #This copes with mixed strand features:
00297                 parts = [f_sub.location.extract(parent_sequence) \
00298                          for f_sub in self.sub_features]
00299             #We use addition rather than a join to avoid alphabet issues:
00300             f_seq = parts[0]
00301             for part in parts[1:] : f_seq += part
00302             return f_seq
00303         else:
00304             return self.location.extract(parent_sequence)
    

Member Data Documentation

Definition at line 141 of file SeqFeature.py.

Definition at line 135 of file SeqFeature.py.

Definition at line 138 of file SeqFeature.py.

Definition at line 144 of file SeqFeature.py.

Definition at line 147 of file SeqFeature.py.

Definition at line 137 of file SeqFeature.py.


Property Documentation

Initial value:
property(fget = _get_ref, fset = _set_ref,
                   doc = """Feature location reference (e.g. accession).      This is a shortcut for feature.location.ref      """)

Definition at line 181 of file SeqFeature.py.

Initial value:
property(fget = _get_ref_db, fset = _set_ref_db,
                      doc = """Feature location reference's database.      This is a shortcut for feature.location.ref_db      """)

Definition at line 191 of file SeqFeature.py.

Initial value:
property(fget = _get_strand, fset = _set_strand,
                      doc = """Feature's strand      This is a shortcut for feature.location.strand      """)

Definition at line 164 of file SeqFeature.py.


The documentation for this class was generated from the following file: