python-biopython  1.60
Bio.Seq.MutableSeq Class Reference

Public Member Functions

def __init__
def __repr__
def __str__
def __cmp__
def __len__
def __getitem__
def __setitem__
def __delitem__
def __add__
def __radd__
def append
def insert
def pop
def remove
def count
def index
def reverse
def complement
def reverse_complement
def extend
 Sorting a sequence makes no sense.
def tostring
def toseq

Public Attributes

 array_indicator
 data
 alphabet

Detailed Description

An editable sequence object (with an alphabet).

Unlike normal Python strings and our basic sequence object (the Seq class),
which are immutable, the MutableSeq lets you edit the sequence in place.
However, this means you cannot use a MutableSeq object as a dictionary key.

>>> from Bio.Seq import MutableSeq
>>> from Bio.Alphabet import generic_dna
>>> my_seq = MutableSeq("ACTCGTCGTCG", generic_dna)
>>> my_seq
MutableSeq('ACTCGTCGTCG', DNAAlphabet())
>>> my_seq[5]
'T'
>>> my_seq[5] = "A"
>>> my_seq
MutableSeq('ACTCGACGTCG', DNAAlphabet())
>>> my_seq[5]
'A'
>>> my_seq[5:8] = "NNN"
>>> my_seq
MutableSeq('ACTCGNNNTCG', DNAAlphabet())
>>> len(my_seq)
11

Note that the MutableSeq object does not support as many string-like
or biological methods as the Seq object.
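
The trade-off between in-place editing and use as a dictionary key can be illustrated with the standard library alone. The sketch below is not Biopython code: it uses `bytearray` and `bytes` as stand-ins for a mutable and an immutable sequence, on the assumption that the sequence is plain ASCII.

```python
# Stand-ins only: bytearray plays the mutable sequence, bytes the immutable one.
mutable = bytearray(b"ACTCGTCGTCG")
mutable[5:6] = b"A"      # edit one letter in place
mutable[5:8] = b"NNN"    # edit a three-letter slice in place

try:
    hash(mutable)        # mutable containers define no hash
    hashable = True
except TypeError:
    hashable = False

frozen = bytes(mutable)  # immutable snapshot, usable as a dictionary key
annotations = {frozen: "positions 5-7 masked"}
```

Converting to an immutable snapshot before using the sequence as a key is the same idea as calling toseq() on a MutableSeq.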

Definition at line 1498 of file Seq.py.


Constructor & Destructor Documentation

def Bio.Seq.MutableSeq.__init__ (   self,
  data,
  alphabet = Alphabet.generic_alphabet 
)

Definition at line 1526 of file Seq.py.

01526 
01527     def __init__(self, data, alphabet = Alphabet.generic_alphabet):
01528         if sys.version_info[0] == 3:
01529             self.array_indicator = "u"
01530         else:
01531             self.array_indicator = "c"
01532         if isinstance(data, str): #TODO - What about unicode?
01533             self.data = array.array(self.array_indicator, data)
01534         else:
01535             self.data = data   # assumes the input is an array
01536         self.alphabet = alphabet
    

Member Function Documentation

def Bio.Seq.MutableSeq.__add__ (   self,
  other 
)
Add another sequence or string to this sequence.

Returns a new MutableSeq object.

Definition at line 1648 of file Seq.py.

01648 
01649     def __add__(self, other):
01650         """Add another sequence or string to this sequence.
01651 
01652         Returns a new MutableSeq object."""
01653         if hasattr(other, "alphabet"):
01654             #other should be a Seq or a MutableSeq
01655             if not Alphabet._check_type_compatible([self.alphabet,
01656                                                     other.alphabet]):
01657                 raise TypeError("Incompatable alphabets %s and %s" \
01658                                 % (repr(self.alphabet), repr(other.alphabet)))
01659             #They should be the same sequence type (or one of them is generic)
01660             a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet])
01661             if isinstance(other, MutableSeq):
01662                 #See test_GAQueens.py for an historic usage of a non-string
01663                 #alphabet!  Adding the arrays should support this.
01664                 return self.__class__(self.data + other.data, a)
01665             else:
01666                 return self.__class__(str(self) + str(other), a)
01667         elif isinstance(other, basestring):
01668             #other is a plain string - use the current alphabet
01669             return self.__class__(str(self) + str(other), self.alphabet)
01670         else:
01671             raise TypeError

def Bio.Seq.MutableSeq.__cmp__ (   self,
  other 
)
Compare the sequence to another sequence or a string (README).

Currently if compared to another sequence the alphabets must be
compatible. Comparing DNA to RNA, or Nucleotide to Protein will raise
an exception. Otherwise only the sequence itself is compared, not the
precise alphabet.

A future release of Biopython will change this (and the Seq object etc)
to use simple string comparison. The plan is that comparing sequences
with incompatible alphabets (e.g. DNA to RNA) will trigger a warning
but not an exception.

During this transition period, please just do explicit comparisons:

>>> seq1 = MutableSeq("ACGT")
>>> seq2 = MutableSeq("ACGT")
>>> id(seq1) == id(seq2)
False
>>> str(seq1) == str(seq2)
True

This method indirectly supports ==, < , etc.

Definition at line 1562 of file Seq.py.

01562 
01563     def __cmp__(self, other):
01564         """Compare the sequence to another sequence or a string (README).
01565 
01566         Currently if compared to another sequence the alphabets must be
01567         compatible. Comparing DNA to RNA, or Nucleotide to Protein will raise
01568         an exception. Otherwise only the sequence itself is compared, not the
01569         precise alphabet.
01570 
01571         A future release of Biopython will change this (and the Seq object etc)
01572         to use simple string comparison. The plan is that comparing sequences
01573         with incompatible alphabets (e.g. DNA to RNA) will trigger a warning
01574         but not an exception.
01575 
01576         During this transition period, please just do explicit comparisons:
01577 
01578         >>> seq1 = MutableSeq("ACGT")
01579         >>> seq2 = MutableSeq("ACGT")
01580         >>> id(seq1) == id(seq2)
01581         False
01582         >>> str(seq1) == str(seq2)
01583         True
01584 
01585         This method indirectly supports ==, < , etc.
01586         """
01587         if hasattr(other, "alphabet"):
01588             #other should be a Seq or a MutableSeq
01589             import warnings
01590             warnings.warn("In future comparing incompatible alphabets will "
01591                           "only trigger a warning (not an exception). In " 
01592                           "the interim please use id(seq1)==id(seq2) or "
01593                           "str(seq1)==str(seq2) to make your code explicit "
01594                           "and to avoid this warning.", FutureWarning)
01595             if not Alphabet._check_type_compatible([self.alphabet,
01596                                                     other.alphabet]):
01597                 raise TypeError("Incompatable alphabets %s and %s" \
01598                                 % (repr(self.alphabet), repr(other.alphabet)))
01599             #They should be the same sequence type (or one of them is generic)
01600             if isinstance(other, MutableSeq):
01601                 #See test_GAQueens.py for an historic usage of a non-string
01602                 #alphabet!  Comparing the arrays supports this.
01603                 return cmp(self.data, other.data)
01604             else:
01605                 return cmp(str(self), str(other))
01606         elif isinstance(other, basestring):
01607             return cmp(str(self), other)
01608         else:
01609             raise TypeError

def Bio.Seq.MutableSeq.__delitem__ (   self,
  index 
)

Definition at line 1640 of file Seq.py.

01640 
01641     def __delitem__(self, index):
01642         #Note since Python 2.0, __delslice__ is deprecated
01643         #and __delitem__ is used instead.
01644         #See http://docs.python.org/ref/sequence-methods.html
01645         
01646         #Could be deleting a single letter, or a slice
01647         del self.data[index]
    
def Bio.Seq.MutableSeq.__getitem__ (   self,
  index 
)

Definition at line 1612 of file Seq.py.

01612 
01613     def __getitem__(self, index):
01614         #Note since Python 2.0, __getslice__ is deprecated
01615         #and __getitem__ is used instead.
01616         #See http://docs.python.org/ref/sequence-methods.html
01617         if isinstance(index, int):
01618             #Return a single letter as a string
01619             return self.data[index]
01620         else:
01621             #Return the (sub)sequence as another Seq object
01622             return MutableSeq(self.data[index], self.alphabet)
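
The listing above returns a single letter for an integer index and a new object for a slice. A minimal sketch of that dispatch follows; `MiniSeq` is a hypothetical list-backed class written for illustration, not the Biopython class, and it ignores alphabets.

```python
class MiniSeq:
    """Hypothetical list-backed sequence, for illustration only."""

    def __init__(self, letters):
        self.letters = list(letters)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self.letters[index]        # single letter, returned as a str
        return MiniSeq(self.letters[index])   # slice: new object of the same type

    def __str__(self):
        return "".join(self.letters)


seq = MiniSeq("ACTCGTCGTCG")
single = seq[5]     # one letter
piece = seq[5:8]    # a new MiniSeq holding three letters
```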

def Bio.Seq.MutableSeq.__len__ (   self)

Definition at line 1610 of file Seq.py.

01610 
01611     def __len__(self): return len(self.data)

def Bio.Seq.MutableSeq.__radd__ (   self,
  other 
)

Definition at line 1672 of file Seq.py.

01672 
01673     def __radd__(self, other):
01674         if hasattr(other, "alphabet"):
01675             #other should be a Seq or a MutableSeq
01676             if not Alphabet._check_type_compatible([self.alphabet,
01677                                                     other.alphabet]):
01678                 raise TypeError("Incompatable alphabets %s and %s" \
01679                                 % (repr(self.alphabet), repr(other.alphabet)))
01680             #They should be the same sequence type (or one of them is generic)
01681             a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet])
01682             if isinstance(other, MutableSeq):
01683                 #See test_GAQueens.py for an historic usage of a non-string
01684                 #alphabet!  Adding the arrays should support this.
01685                 return self.__class__(other.data + self.data, a)
01686             else:
01687                 return self.__class__(str(other) + str(self), a)
01688         elif isinstance(other, basestring):
01689             #other is a plain string - use the current alphabet
01690             return self.__class__(str(other) + str(self), self.alphabet)
01691         else:
01692             raise TypeError
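
The pairing of `__add__` and `__radd__` is what lets a sequence appear on either side of `+`. The sketch below shows the mechanism with a hypothetical `MiniSeq` class (not the Biopython class, and without alphabet checks). Unlike the listings above, it returns `NotImplemented` for unsupported operands and lets Python raise the `TypeError`.

```python
class MiniSeq:
    """Hypothetical list-backed sequence used only for this sketch."""

    def __init__(self, letters):
        self.letters = list(letters)

    def __str__(self):
        return "".join(self.letters)

    def __add__(self, other):
        # seq + other: our letters come first
        if isinstance(other, (MiniSeq, str)):
            return MiniSeq(str(self) + str(other))
        return NotImplemented

    def __radd__(self, other):
        # other + seq: tried when the left operand cannot handle the addition
        if isinstance(other, str):
            return MiniSeq(other + str(self))
        return NotImplemented


left = MiniSeq("ACG") + "TT"    # uses __add__
right = "TT" + MiniSeq("ACG")   # uses __radd__
```

In both cases a new object is returned and neither operand is modified.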

def Bio.Seq.MutableSeq.__repr__ (   self)
Returns a (truncated) representation of the sequence for debugging.

Definition at line 1537 of file Seq.py.

01537 
01538     def __repr__(self):
01539         """Returns a (truncated) representation of the sequence for debugging."""
01540         if len(self) > 60:
01541             #Shows the last three letters as it is often useful to see if there
01542             #is a stop codon at the end of a sequence.
01543             #Note total length is 54+3+3=60
01544             return "%s('%s...%s', %s)" % (self.__class__.__name__,
01545                                    str(self[:54]), str(self[-3:]),
01546                                    repr(self.alphabet))
01547         else:
01548             return "%s('%s', %s)" % (self.__class__.__name__,
01549                                    str(self),
01550                                    repr(self.alphabet))
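
The truncation rule in the listing (first 54 letters, three dots, last 3 letters once the sequence exceeds 60) can be tried on plain strings. The function below is a standalone sketch; `truncated_repr` is a name chosen for this example, and the alphabet part of the real output is left out.

```python
def truncated_repr(class_name, sequence):
    """Return a debugging string, shortening sequences longer than 60 letters."""
    if len(sequence) > 60:
        # Keep the first 54 letters and the last 3 (54 + 3 dots + 3 = 60).
        return "%s('%s...%s')" % (class_name, sequence[:54], sequence[-3:])
    return "%s('%s')" % (class_name, sequence)


short = truncated_repr("MutableSeq", "ACGT")
long_ = truncated_repr("MutableSeq", "A" * 70 + "TAG")
```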

def Bio.Seq.MutableSeq.__setitem__ (   self,
  index,
  value 
)

Definition at line 1623 of file Seq.py.

01623 
01624     def __setitem__(self, index, value):
01625         #Note since Python 2.0, __setslice__ is deprecated
01626         #and __setitem__ is used instead.
01627         #See http://docs.python.org/ref/sequence-methods.html
01628         if isinstance(index, int):
01629             #Replacing a single letter with a new string
01630             self.data[index] = value
01631         else:
01632             #Replacing a sub-sequence
01633             if isinstance(value, MutableSeq):
01634                 self.data[index] = value.data
01635             elif isinstance(value, type(self.data)):
01636                 self.data[index] = value
01637             else:
01638                 self.data[index] = array.array(self.array_indicator,
01639                                                str(value))

def Bio.Seq.MutableSeq.__str__ (   self)
Returns the full sequence as a python string.

Note that Biopython 1.44 and earlier would give a truncated
version of repr(my_seq) for str(my_seq).  If you are writing code
which needs to be backwards compatible with old Biopython, you
should continue to use my_seq.tostring() rather than str(my_seq).

Definition at line 1551 of file Seq.py.

01551 
01552     def __str__(self):
01553         """Returns the full sequence as a python string.
01554 
01555         Note that Biopython 1.44 and earlier would give a truncated
01556         version of repr(my_seq) for str(my_seq).  If you are writing code
01557         which needs to be backwards compatible with old Biopython, you
01558         should continue to use my_seq.tostring() rather than str(my_seq).
01559         """
01560         #See test_GAQueens.py for an historic usage of a non-string alphabet!
01561         return "".join(self.data)

def Bio.Seq.MutableSeq.append (   self,
  c 
)

Definition at line 1693 of file Seq.py.

01693 
01694     def append(self, c):
01695         self.data.append(c)

def Bio.Seq.MutableSeq.complement (   self)
Modify the mutable sequence to take on its complement.

Trying to complement a protein sequence raises an exception.

No return value.

Definition at line 1783 of file Seq.py.

01783 
01784     def complement(self):
01785         """Modify the mutable sequence to take on its complement.
01786 
01787         Trying to complement a protein sequence raises an exception.
01788 
01789         No return value.
01790         """
01791         if isinstance(Alphabet._get_base_alphabet(self.alphabet),
01792                       Alphabet.ProteinAlphabet):
01793             raise ValueError("Proteins do not have complements!")
01794         if self.alphabet in (IUPAC.ambiguous_dna, IUPAC.unambiguous_dna):
01795             d = ambiguous_dna_complement
01796         elif self.alphabet in (IUPAC.ambiguous_rna, IUPAC.unambiguous_rna):
01797             d = ambiguous_rna_complement
01798         elif 'U' in self.data and 'T' in self.data:
01799             #TODO - Handle this cleanly?
01800             raise ValueError("Mixed RNA/DNA found")
01801         elif 'U' in self.data:
01802             d = ambiguous_rna_complement
01803         else:
01804             d = ambiguous_dna_complement
01805         c = dict([(x.lower(), y.lower()) for x,y in d.iteritems()])
01806         d.update(c)
01807         self.data = map(lambda c: d[c], self.data)
01808         self.data = array.array(self.array_indicator, self.data)
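
The listing picks a DNA or RNA complement table and maps every letter through it. A simplified standalone sketch of that idea follows. It assumes unambiguous bases only, returns a new string instead of editing in place, and checks for mixed RNA/DNA case-insensitively (the listing tests only upper-case 'U' and 'T'). The names `DNA_COMPLEMENT`, `RNA_COMPLEMENT` and `complement` are chosen for this example.

```python
# Unambiguous bases only; an assumption made to keep the sketch short.
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def complement(sequence):
    """Return the complement of a DNA or RNA string."""
    upper = sequence.upper()
    has_u = "U" in upper
    has_t = "T" in upper
    if has_u and has_t:
        raise ValueError("Mixed RNA/DNA found")
    table = RNA_COMPLEMENT if has_u else DNA_COMPLEMENT
    # Letters missing from the table (for example "N") pass through unchanged.
    return sequence.translate(table)
```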
        

def Bio.Seq.MutableSeq.count (   self,
  sub,
  start = 0,
  end = sys.maxint 
)
Non-overlapping count method, like that of a python string.

This behaves like the python string method of the same name,
which does a non-overlapping count!

Returns an integer, the number of occurrences of substring
argument sub in the (sub)sequence given by [start:end].
Optional arguments start and end are interpreted as in slice
notation.
    
Arguments:
 - sub - a string or another Seq object to look for
 - start - optional integer, slice start
 - end - optional integer, slice end

e.g.

>>> from Bio.Seq import MutableSeq
>>> my_mseq = MutableSeq("AAAATGA")
>>> print my_mseq.count("A")
5
>>> print my_mseq.count("ATG")
1
>>> print my_mseq.count(Seq("AT"))
1
>>> print my_mseq.count("AT", 2, -1)
1

HOWEVER, please note that because Python strings, Seq objects and
MutableSeq objects do a non-overlapping search, this may not give
the answer you expect:

>>> "AAAA".count("AA")
2
>>> print MutableSeq("AAAA").count("AA")
2

An overlapping search would give the answer as three!

Definition at line 1711 of file Seq.py.

01711 
01712     def count(self, sub, start=0, end=sys.maxint):
01713         """Non-overlapping count method, like that of a python string.
01714 
01715         This behaves like the python string method of the same name,
01716         which does a non-overlapping count!
01717 
01718         Returns an integer, the number of occurrences of substring
01719         argument sub in the (sub)sequence given by [start:end].
01720         Optional arguments start and end are interpreted as in slice
01721         notation.
01722     
01723         Arguments:
01724          - sub - a string or another Seq object to look for
01725          - start - optional integer, slice start
01726          - end - optional integer, slice end
01727 
01728         e.g.
01729         
01730         >>> from Bio.Seq import MutableSeq
01731         >>> my_mseq = MutableSeq("AAAATGA")
01732         >>> print my_mseq.count("A")
01733         5
01734         >>> print my_mseq.count("ATG")
01735         1
01736         >>> print my_mseq.count(Seq("AT"))
01737         1
01738         >>> print my_mseq.count("AT", 2, -1)
01739         1
01740         
01741         HOWEVER, please note that because Python strings, Seq objects and
01742         MutableSeq objects do a non-overlapping search, this may not give
01743         the answer you expect:
01744 
01745         >>> "AAAA".count("AA")
01746         2
01747         >>> print MutableSeq("AAAA").count("AA")
01748         2
01749 
01750         An overlapping search would give the answer as three!
01751         """
01752         try:
01753             #TODO - Should we check the alphabet?
01754             search = sub.tostring()
01755         except AttributeError:
01756             search = sub
01757 
01758         if not isinstance(search, basestring):
01759             raise TypeError("expected a string, Seq or MutableSeq")
01760 
01761         if len(search) == 1:
01762             #Try and be efficient and work directly from the array.
01763             count = 0
01764             for c in self.data[start:end]:
01765                 if c == search: count += 1
01766             return count
01767         else:
01768             #TODO - Can we do this more efficiently?
01769             return self.tostring().count(search, start, end)
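
If an overlapping count is what you need, it has to be written separately. One possible approach, shown on plain strings, is to restart the search one letter after each hit. `count_overlapping` is a helper name invented for this sketch and is not part of Biopython 1.60 as documented here.

```python
def count_overlapping(sequence, sub):
    """Count occurrences of `sub` in `sequence`, allowing overlaps."""
    if not sub:
        raise ValueError("sub must not be empty")
    count = 0
    start = sequence.find(sub)
    while start != -1:
        count += 1
        # Advance by one letter, not by len(sub), so overlapping hits are seen.
        start = sequence.find(sub, start + 1)
    return count


non_overlapping = "AAAA".count("AA")
overlapping = count_overlapping("AAAA", "AA")
```

To apply this to a MutableSeq, convert it first with str(my_mseq).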

def Bio.Seq.MutableSeq.extend (   self,
  other 
)

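Source comment attached to this method: sorting a sequence makes no sense,
so the class provides no sort method. The comment includes the disabled
definition:

def sort(self, *args): self.data.sort(*args)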

Definition at line 1822 of file Seq.py.

01822 
01823     def extend(self, other):
01824         if isinstance(other, MutableSeq):
01825             for c in other.data:
01826                 self.data.append(c)
01827         else:
01828             for c in other:
01829                 self.data.append(c)

def Bio.Seq.MutableSeq.index (   self,
  item 
)

Definition at line 1770 of file Seq.py.

01770 
01771     def index(self, item):
01772         for i in range(len(self.data)):
01773             if self.data[i] == item:
01774                 return i
01775         raise ValueError("MutableSeq.index(x): x not in list")

def Bio.Seq.MutableSeq.insert (   self,
  i,
  c 
)

Definition at line 1696 of file Seq.py.

01696 
01697     def insert(self, i, c):
01698         self.data.insert(i, c)

def Bio.Seq.MutableSeq.pop (   self,
  i = (-1) 
)

Definition at line 1699 of file Seq.py.

01699 
01700     def pop(self, i = (-1)):
01701         c = self.data[i]
01702         del self.data[i]
01703         return c

def Bio.Seq.MutableSeq.remove (   self,
  item 
)

Definition at line 1704 of file Seq.py.

01704 
01705     def remove(self, item):
01706         for i in range(len(self.data)):
01707             if self.data[i] == item:
01708                 del self.data[i]
01709                 return
01710         raise ValueError("MutableSeq.remove(x): x not in list")
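
The append, insert, pop, remove and index listings carry no descriptions, but they read like the corresponding Python list methods applied to single letters. Under that assumption, a plain list of letters gives a feel for the behaviour. This is a stand-in, not the MutableSeq class itself.

```python
letters = list("ACGT")
letters.append("A")              # ACGTA
letters.insert(0, "G")           # GACGTA
last = letters.pop()             # removes and returns the final letter
letters.remove("G")              # removes the first "G" only
position = letters.index("T")    # position of the first "T"
result = "".join(letters)

try:
    letters.remove("N")          # a missing letter raises ValueError
    missing_raises = False
except ValueError:
    missing_raises = True
```

Because the listings compare one stored letter at a time, a multi-letter argument such as "AT" would not match anything in index or remove as shown.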

def Bio.Seq.MutableSeq.reverse (   self)
Modify the mutable sequence to reverse itself.

No return value.

Definition at line 1776 of file Seq.py.

01776 
01777     def reverse(self):
01778         """Modify the mutable sequence to reverse itself.
01779 
01780         No return value.
01781         """
01782         self.data.reverse()

def Bio.Seq.MutableSeq.reverse_complement (   self)
Modify the mutable sequence to take on its reverse complement.

Trying to reverse complement a protein sequence raises an exception.

No return value.

Definition at line 1809 of file Seq.py.

01809 
01810     def reverse_complement(self):
01811         """Modify the mutable sequence to take on its reverse complement.
01812 
01813         Trying to reverse complement a protein sequence raises an exception.
01814 
01815         No return value.
01816         """
01817         self.complement()
01818         self.data.reverse()
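
The two-step structure (complement, then reverse, both in place, no return value) can be mirrored with a `bytearray`. The sketch below assumes ASCII DNA with unambiguous bases; `DNA_TABLE` and `reverse_complement_in_place` are names made up for this example.

```python
# ASCII DNA only; bytes.maketrans builds a 256-entry translation table.
DNA_TABLE = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def reverse_complement_in_place(buffer):
    """Complement then reverse a bytearray of DNA letters; returns None."""
    buffer[:] = buffer.translate(DNA_TABLE)  # complement, keeping the same object
    buffer.reverse()                         # reverse in place


dna = bytearray(b"AAGTC")
alias = dna                                  # second name for the same object
outcome = reverse_complement_in_place(dna)
```

Since the object is edited in place, every reference to it sees the change, which is why the function has nothing useful to return.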

def Bio.Seq.MutableSeq.toseq (   self)
Returns the full sequence as a new immutable Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_mseq = MutableSeq("MKQHKAMIVALIVICITAVVAAL", 
...                      IUPAC.protein)
>>> my_mseq
MutableSeq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
>>> my_mseq.toseq()
Seq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())

Note that the alphabet is preserved.

Definition at line 1847 of file Seq.py.

01847 
01848     def toseq(self):
01849         """Returns the full sequence as a new immutable Seq object.
01850 
01851         >>> from Bio.Seq import Seq
01852         >>> from Bio.Alphabet import IUPAC
01853         >>> my_mseq = MutableSeq("MKQHKAMIVALIVICITAVVAAL", 
01854         ...                      IUPAC.protein)
01855         >>> my_mseq
01856         MutableSeq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
01857         >>> my_mseq.toseq()
01858         Seq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
01859 
01860         Note that the alphabet is preserved.
01861         """
01862         return Seq("".join(self.data), self.alphabet)
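
Because the listing builds the result from a freshly joined string, the returned object should not follow later edits to the mutable sequence. The same effect can be seen with a list of letters and a plain string, used here purely as stand-ins.

```python
letters = list("MKQHKAM")        # mutable working copy
snapshot = "".join(letters)      # immutable str built from the current letters
letters[0] = "V"                 # later edits touch only the list
current = "".join(letters)
```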

def Bio.Seq.MutableSeq.tostring (   self)
Returns the full sequence as a python string (semi-obsolete).

Although not formally deprecated, you are now encouraged to use
str(my_seq) instead of my_seq.tostring().

Because str(my_seq) will give you the full sequence as a python string,
there is often no need to make an explicit conversion.  For example,

print "ID={%s}, sequence={%s}" % (my_name, my_seq)

On Biopython 1.44 or older you would have to have done this:

print "ID={%s}, sequence={%s}" % (my_name, my_seq.tostring())

Definition at line 1830 of file Seq.py.

01830 
01831     def tostring(self):
01832         """Returns the full sequence as a python string (semi-obsolete).
01833 
01834         Although not formally deprecated, you are now encouraged to use
01835         str(my_seq) instead of my_seq.tostring().
01836 
01837         Because str(my_seq) will give you the full sequence as a python string,
01838         there is often no need to make an explicit conversion.  For example,
01839         
01840         print "ID={%s}, sequence={%s}" % (my_name, my_seq)
01841 
01842         On Biopython 1.44 or older you would have to have done this:
01843 
01844         print "ID={%s}, sequence={%s}" % (my_name, my_seq.tostring())
01845         """
01846         return "".join(self.data)
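
The reason no explicit conversion is needed is that `%s` formatting calls `__str__` on its argument. A minimal sketch in Python 3 syntax, with a hypothetical `Letters` class standing in for a sequence object:

```python
class Letters:
    """Hypothetical stand-in whose __str__ returns the full sequence."""

    def __init__(self, letters):
        self.letters = list(letters)

    def __str__(self):
        return "".join(self.letters)


my_name = "seq1"
my_seq = Letters("ACGT")
# %s calls str(my_seq) implicitly, so no tostring()-style call is required.
line = "ID={%s}, sequence={%s}" % (my_name, my_seq)
```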


Member Data Documentation

Bio.Seq.MutableSeq.alphabet

Definition at line 1535 of file Seq.py.

Bio.Seq.MutableSeq.array_indicator

Definition at line 1528 of file Seq.py.

Bio.Seq.MutableSeq.data

Definition at line 1532 of file Seq.py.


The documentation for this class was generated from the following file: