Back to index

python-biopython  1.60
Public Member Functions | Private Member Functions | Private Attributes
Bio.Align.MultipleSeqAlignment Class Reference

List of all members.

Public Member Functions

def __init__
def extend
def append
def __add__
def __getitem__
def sort
def get_column
def add_sequence

Private Member Functions

def _append

Private Attributes

 _alphabet
 _records

Detailed Description

Represents a classical multiple sequence alignment (MSA).

By this we mean a collection of sequences (usually shown as rows) which
are all the same length (usually with gap characters for insertions or
padding). The data can then be regarded as a matrix of letters, with well
defined columns.

You would typically create an MSA by loading an alignment file with the
AlignIO module:

>>> from Bio import AlignIO
>>> align = AlignIO.read("Clustalw/opuntia.aln", "clustal")
>>> print align
SingleLetterAlphabet() alignment with 7 rows and 156 columns
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273285|gb|AF191659.1|AF191
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273284|gb|AF191658.1|AF191
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273287|gb|AF191661.1|AF191
TATACATAAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273286|gb|AF191660.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273290|gb|AF191664.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273289|gb|AF191663.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273291|gb|AF191665.1|AF191

In some respects you can treat these objects as lists of SeqRecord objects,
each representing a row of the alignment. Iterating over an alignment gives
the SeqRecord object for each row:

>>> len(align)
7
>>> for record in align:
...     print record.id, len(record)
gi|6273285|gb|AF191659.1|AF191 156
gi|6273284|gb|AF191658.1|AF191 156
gi|6273287|gb|AF191661.1|AF191 156
gi|6273286|gb|AF191660.1|AF191 156
gi|6273290|gb|AF191664.1|AF191 156
gi|6273289|gb|AF191663.1|AF191 156
gi|6273291|gb|AF191665.1|AF191 156

You can also access individual rows as SeqRecord objects via their index:

>>> print align[0].id
gi|6273285|gb|AF191659.1|AF191
>>> print align[-1].id
gi|6273291|gb|AF191665.1|AF191

And extract columns as strings:

>>> print align[:,1]
AAAAAAA

Or, take just the first ten columns as a sub-alignment:

>>> print align[:,:10]
SingleLetterAlphabet() alignment with 7 rows and 10 columns
TATACATTAA gi|6273285|gb|AF191659.1|AF191
TATACATTAA gi|6273284|gb|AF191658.1|AF191
TATACATTAA gi|6273287|gb|AF191661.1|AF191
TATACATAAA gi|6273286|gb|AF191660.1|AF191
TATACATTAA gi|6273290|gb|AF191664.1|AF191
TATACATTAA gi|6273289|gb|AF191663.1|AF191
TATACATTAA gi|6273291|gb|AF191665.1|AF191

Combining this alignment slicing with alignment addition allows you to
remove a section of the alignment. For example, taking just the first
and last ten columns:

>>> print align[:,:10] + align[:,-10:]
SingleLetterAlphabet() alignment with 7 rows and 20 columns
TATACATTAAGTGTACCAGA gi|6273285|gb|AF191659.1|AF191
TATACATTAAGTGTACCAGA gi|6273284|gb|AF191658.1|AF191
TATACATTAAGTGTACCAGA gi|6273287|gb|AF191661.1|AF191
TATACATAAAGTGTACCAGA gi|6273286|gb|AF191660.1|AF191
TATACATTAAGTGTACCAGA gi|6273290|gb|AF191664.1|AF191
TATACATTAAGTATACCAGA gi|6273289|gb|AF191663.1|AF191
TATACATTAAGTGTACCAGA gi|6273291|gb|AF191665.1|AF191

Note - This object is intended to replace the existing Alignment object
defined in module Bio.Align.Generic but is not fully backwards compatible
with it.

Note - This object does NOT attempt to model the kind of alignments used
in next generation sequencing with multiple sequencing reads which are
much shorter than the alignment, and where there is usually a consensus or
reference sequence with special status.

Definition at line 20 of file __init__.py.


Constructor & Destructor Documentation

def Bio.Align.MultipleSeqAlignment.__init__ (   self,
  records,
  alphabet = None 
)
Initialize a new MultipleSeqAlignment object.

Arguments:
 - records - A list (or iterator) of SeqRecord objects, whose
     sequences are all the same length.  This may be an be an
     empty list.
 - alphabet - The alphabet for the whole alignment, typically a gapped
      alphabet, which should be a super-set of the individual
      record alphabets.  If omitted, a consensus alphabet is
      used.

You would normally load a MSA from a file using Bio.AlignIO, but you
can do this from a list of SeqRecord objects too:

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> a = SeqRecord(Seq("AAAACGT", generic_dna), id="Alpha")
>>> b = SeqRecord(Seq("AAA-CGT", generic_dna), id="Beta")
>>> c = SeqRecord(Seq("AAAAGGT", generic_dna), id="Gamma")
>>> align = MultipleSeqAlignment([a, b, c])
>>> print align
DNAAlphabet() alignment with 3 rows and 7 columns
AAAACGT Alpha
AAA-CGT Beta
AAAAGGT Gamma

NOTE - The older Bio.Align.Generic.Alignment class only accepted a
single argument, an alphabet.  This is still supported via a backwards
compatible "hack" so as not to disrupt existing scripts and users, but
is deprecated and will be removed in a future release.

Definition at line 107 of file __init__.py.

00107 
00108     def __init__(self, records, alphabet=None):
00109         """Initialize a new MultipleSeqAlignment object.
00110 
00111         Arguments:
00112          - records - A list (or iterator) of SeqRecord objects, whose
00113                      sequences are all the same length.  This may be an be an
00114                      empty list.
00115          - alphabet - The alphabet for the whole alignment, typically a gapped
00116                       alphabet, which should be a super-set of the individual
00117                       record alphabets.  If omitted, a consensus alphabet is
00118                       used.
00119 
00120         You would normally load a MSA from a file using Bio.AlignIO, but you
00121         can do this from a list of SeqRecord objects too:
00122 
00123         >>> from Bio.Alphabet import generic_dna
00124         >>> from Bio.Seq import Seq
00125         >>> from Bio.SeqRecord import SeqRecord
00126         >>> a = SeqRecord(Seq("AAAACGT", generic_dna), id="Alpha")
00127         >>> b = SeqRecord(Seq("AAA-CGT", generic_dna), id="Beta")
00128         >>> c = SeqRecord(Seq("AAAAGGT", generic_dna), id="Gamma")
00129         >>> align = MultipleSeqAlignment([a, b, c])
00130         >>> print align
00131         DNAAlphabet() alignment with 3 rows and 7 columns
00132         AAAACGT Alpha
00133         AAA-CGT Beta
00134         AAAAGGT Gamma
00135 
00136         NOTE - The older Bio.Align.Generic.Alignment class only accepted a
00137         single argument, an alphabet.  This is still supported via a backwards
00138         compatible "hack" so as not to disrupt existing scripts and users, but
00139         is deprecated and will be removed in a future release.
00140         """
00141         if isinstance(records, Alphabet.Alphabet) \
00142         or isinstance(records, Alphabet.AlphabetEncoder):
00143             if alphabet is None:
00144                 #TODO - Remove this backwards compatible mode!                
00145                 alphabet = records
00146                 records = []
00147                 import warnings
00148                 warnings.warn("Invalid records argument: While the old "
00149                               "Bio.Align.Generic.Alignment class only "
00150                               "accepted a single argument (the alphabet), the "
00151                               "newer Bio.Align.MultipleSeqAlignment class "
00152                               "expects a list/iterator of SeqRecord objects "
00153                               "(which can be an empty list) and an optional "
00154                               "alphabet argument")
00155             else :
00156                 raise ValueError("Invalid records argument")
00157         if alphabet is not None :
00158             if not (isinstance(alphabet, Alphabet.Alphabet) \
00159             or isinstance(alphabet, Alphabet.AlphabetEncoder)):
00160                 raise ValueError("Invalid alphabet argument")
00161             self._alphabet = alphabet
00162         else :
00163             #Default while we add sequences, will take a consensus later
00164             self._alphabet = Alphabet.single_letter_alphabet
00165 
00166         self._records = []
00167         if records:
00168             self.extend(records)
00169             if alphabet is None:
00170                 #No alphabet was given, take a consensus alphabet
00171                 self._alphabet = Alphabet._consensus_alphabet(rec.seq.alphabet for \
00172                                                               rec in self._records \
00173                                                               if rec.seq is not None)

Here is the caller graph for this function:


Member Function Documentation

def Bio.Align.MultipleSeqAlignment.__add__ (   self,
  other 
)
Combines to alignments with the same number of rows by adding them.

If you have two multiple sequence alignments (MSAs), there are two ways to think
about adding them - by row or by column. Using the extend method adds by row.
Using the addition operator adds by column. For example,

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.Align import MultipleSeqAlignment
>>> a1 = SeqRecord(Seq("AAAAC", generic_dna), id="Alpha")
>>> b1 = SeqRecord(Seq("AAA-C", generic_dna), id="Beta")
>>> c1 = SeqRecord(Seq("AAAAG", generic_dna), id="Gamma")
>>> a2 = SeqRecord(Seq("GT", generic_dna), id="Alpha")
>>> b2 = SeqRecord(Seq("GT", generic_dna), id="Beta")
>>> c2 = SeqRecord(Seq("GT", generic_dna), id="Gamma")
>>> left = MultipleSeqAlignment([a1, b1, c1])
>>> right = MultipleSeqAlignment([a2, b2, c2])

Now, let's look at these two alignments:

>>> print left
DNAAlphabet() alignment with 3 rows and 5 columns
AAAAC Alpha
AAA-C Beta
AAAAG Gamma
>>> print right
DNAAlphabet() alignment with 3 rows and 2 columns
GT Alpha
GT Beta
GT Gamma

And add them:

>>> print left + right
DNAAlphabet() alignment with 3 rows and 7 columns
AAAACGT Alpha
AAA-CGT Beta
AAAAGGT Gamma

For this to work, both alignments must have the same number of records (here
they both have 3 rows):

>>> len(left)
3
>>> len(right)
3

The individual rows are SeqRecord objects, and these can be added together. Refer
to the SeqRecord documentation for details of how the annotation is handled. This
example is a special case in that both original alignments shared the same names,
meaning when the rows are added they also get the same name.

Definition at line 301 of file __init__.py.

00301 
00302     def __add__(self, other):
00303         """Combines to alignments with the same number of rows by adding them.
00304 
00305         If you have two multiple sequence alignments (MSAs), there are two ways to think
00306         about adding them - by row or by column. Using the extend method adds by row.
00307         Using the addition operator adds by column. For example,
00308 
00309         >>> from Bio.Alphabet import generic_dna
00310         >>> from Bio.Seq import Seq
00311         >>> from Bio.SeqRecord import SeqRecord
00312         >>> from Bio.Align import MultipleSeqAlignment
00313         >>> a1 = SeqRecord(Seq("AAAAC", generic_dna), id="Alpha")
00314         >>> b1 = SeqRecord(Seq("AAA-C", generic_dna), id="Beta")
00315         >>> c1 = SeqRecord(Seq("AAAAG", generic_dna), id="Gamma")
00316         >>> a2 = SeqRecord(Seq("GT", generic_dna), id="Alpha")
00317         >>> b2 = SeqRecord(Seq("GT", generic_dna), id="Beta")
00318         >>> c2 = SeqRecord(Seq("GT", generic_dna), id="Gamma")
00319         >>> left = MultipleSeqAlignment([a1, b1, c1])
00320         >>> right = MultipleSeqAlignment([a2, b2, c2])
00321 
00322         Now, let's look at these two alignments:
00323 
00324         >>> print left
00325         DNAAlphabet() alignment with 3 rows and 5 columns
00326         AAAAC Alpha
00327         AAA-C Beta
00328         AAAAG Gamma
00329         >>> print right
00330         DNAAlphabet() alignment with 3 rows and 2 columns
00331         GT Alpha
00332         GT Beta
00333         GT Gamma
00334 
00335         And add them:
00336 
00337         >>> print left + right
00338         DNAAlphabet() alignment with 3 rows and 7 columns
00339         AAAACGT Alpha
00340         AAA-CGT Beta
00341         AAAAGGT Gamma
00342 
00343         For this to work, both alignments must have the same number of records (here
00344         they both have 3 rows):
00345 
00346         >>> len(left)
00347         3
00348         >>> len(right)
00349         3
00350 
00351         The individual rows are SeqRecord objects, and these can be added together. Refer
00352         to the SeqRecord documentation for details of how the annotation is handled. This
00353         example is a special case in that both original alignments shared the same names,
00354         meaning when the rows are added they also get the same name.
00355         """
00356         if not isinstance(other, MultipleSeqAlignment):
00357             raise NotImplementedError
00358         if len(self) != len(other):
00359             raise ValueError("When adding two alignments they must have the same length"
00360                              " (i.e. same number or rows)")
00361         alpha = Alphabet._consensus_alphabet([self._alphabet, other._alphabet])
00362         merged = (left+right for left,right in zip(self, other))
00363         return MultipleSeqAlignment(merged, alpha)

def Bio.Align.MultipleSeqAlignment.__getitem__ (   self,
  index 
)
Access part of the alignment.

Depending on the indices, you can get a SeqRecord object
(representing a single row), a Seq object (for a single columns),
a string (for a single characters) or another alignment
(representing some part or all of the alignment).

align[r,c] gives a single character as a string
align[r] gives a row as a SeqRecord
align[r,:] gives a row as a SeqRecord
align[:,c] gives a column as a Seq (using the alignment's alphabet)

align[:] and align[:,:] give a copy of the alignment

Anything else gives a sub alignment, e.g.
align[0:2] or align[0:2,:] uses only row 0 and 1
align[:,1:3] uses only columns 1 and 2
align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2

We'll use the following example alignment here for illustration:

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.Align import MultipleSeqAlignment
>>> a = SeqRecord(Seq("AAAACGT", generic_dna), id="Alpha")
>>> b = SeqRecord(Seq("AAA-CGT", generic_dna), id="Beta")
>>> c = SeqRecord(Seq("AAAAGGT", generic_dna), id="Gamma")
>>> d = SeqRecord(Seq("AAAACGT", generic_dna), id="Delta")
>>> e = SeqRecord(Seq("AAA-GGT", generic_dna), id="Epsilon")
>>> align = MultipleSeqAlignment([a, b, c, d, e], generic_dna)

You can access a row of the alignment as a SeqRecord using an integer
index (think of the alignment as a list of SeqRecord objects here):

>>> first_record = align[0]
>>> print first_record.id, first_record.seq
Alpha AAAACGT
>>> last_record = align[-1]
>>> print last_record.id, last_record.seq
Epsilon AAA-GGT

You can also access use python's slice notation to create a sub-alignment
containing only some of the SeqRecord objects:

>>> sub_alignment = align[2:5]
>>> print sub_alignment
DNAAlphabet() alignment with 3 rows and 7 columns
AAAAGGT Gamma
AAAACGT Delta
AAA-GGT Epsilon

This includes support for a step, i.e. align[start:end:step], which
can be used to select every second sequence:

>>> sub_alignment = align[::2]
>>> print sub_alignment
DNAAlphabet() alignment with 3 rows and 7 columns
AAAACGT Alpha
AAAAGGT Gamma
AAA-GGT Epsilon

Or to get a copy of the alignment with the rows in reverse order:

>>> rev_alignment = align[::-1]
>>> print rev_alignment
DNAAlphabet() alignment with 5 rows and 7 columns
AAA-GGT Epsilon
AAAACGT Delta
AAAAGGT Gamma
AAA-CGT Beta
AAAACGT Alpha
    
You can also use two indices to specify both rows and columns. Using simple
integers gives you the entry as a single character string. e.g.

>>> align[3,4]
'C'

This is equivalent to:

>>> align[3][4]
'C'

or:

>>> align[3].seq[4]
'C'

To get a single column (as a string) use this syntax:

>>> align[:,4]
'CCGCG'

Or, to get part of a column,

>>> align[1:3,4]
'CG'

However, in general you get a sub-alignment,

>>> print align[1:5,3:6]
DNAAlphabet() alignment with 4 rows and 3 columns
-CG Beta
AGG Gamma
ACG Delta
-GG Epsilon

This should all seem familiar to anyone who has used the NumPy
array or matrix objects.

Definition at line 364 of file __init__.py.

00364 
00365     def __getitem__(self, index):
00366         """Access part of the alignment.
00367 
00368         Depending on the indices, you can get a SeqRecord object
00369         (representing a single row), a Seq object (for a single columns),
00370         a string (for a single characters) or another alignment
00371         (representing some part or all of the alignment).
00372 
00373         align[r,c] gives a single character as a string
00374         align[r] gives a row as a SeqRecord
00375         align[r,:] gives a row as a SeqRecord
00376         align[:,c] gives a column as a Seq (using the alignment's alphabet)
00377 
00378         align[:] and align[:,:] give a copy of the alignment
00379 
00380         Anything else gives a sub alignment, e.g.
00381         align[0:2] or align[0:2,:] uses only row 0 and 1
00382         align[:,1:3] uses only columns 1 and 2
00383         align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2
00384 
00385         We'll use the following example alignment here for illustration:
00386 
00387         >>> from Bio.Alphabet import generic_dna
00388         >>> from Bio.Seq import Seq
00389         >>> from Bio.SeqRecord import SeqRecord
00390         >>> from Bio.Align import MultipleSeqAlignment
00391         >>> a = SeqRecord(Seq("AAAACGT", generic_dna), id="Alpha")
00392         >>> b = SeqRecord(Seq("AAA-CGT", generic_dna), id="Beta")
00393         >>> c = SeqRecord(Seq("AAAAGGT", generic_dna), id="Gamma")
00394         >>> d = SeqRecord(Seq("AAAACGT", generic_dna), id="Delta")
00395         >>> e = SeqRecord(Seq("AAA-GGT", generic_dna), id="Epsilon")
00396         >>> align = MultipleSeqAlignment([a, b, c, d, e], generic_dna)
00397         
00398         You can access a row of the alignment as a SeqRecord using an integer
00399         index (think of the alignment as a list of SeqRecord objects here):
00400 
00401         >>> first_record = align[0]
00402         >>> print first_record.id, first_record.seq
00403         Alpha AAAACGT
00404         >>> last_record = align[-1]
00405         >>> print last_record.id, last_record.seq
00406         Epsilon AAA-GGT
00407 
00408         You can also access use python's slice notation to create a sub-alignment
00409         containing only some of the SeqRecord objects:
00410 
00411         >>> sub_alignment = align[2:5]
00412         >>> print sub_alignment
00413         DNAAlphabet() alignment with 3 rows and 7 columns
00414         AAAAGGT Gamma
00415         AAAACGT Delta
00416         AAA-GGT Epsilon
00417 
00418         This includes support for a step, i.e. align[start:end:step], which
00419         can be used to select every second sequence:
00420 
00421         >>> sub_alignment = align[::2]
00422         >>> print sub_alignment
00423         DNAAlphabet() alignment with 3 rows and 7 columns
00424         AAAACGT Alpha
00425         AAAAGGT Gamma
00426         AAA-GGT Epsilon
00427 
00428         Or to get a copy of the alignment with the rows in reverse order:
00429 
00430         >>> rev_alignment = align[::-1]
00431         >>> print rev_alignment
00432         DNAAlphabet() alignment with 5 rows and 7 columns
00433         AAA-GGT Epsilon
00434         AAAACGT Delta
00435         AAAAGGT Gamma
00436         AAA-CGT Beta
00437         AAAACGT Alpha
00438     
00439         You can also use two indices to specify both rows and columns. Using simple
00440         integers gives you the entry as a single character string. e.g.
00441 
00442         >>> align[3,4]
00443         'C'
00444 
00445         This is equivalent to:
00446 
00447         >>> align[3][4]
00448         'C'
00449 
00450         or:
00451 
00452         >>> align[3].seq[4]
00453         'C'
00454 
00455         To get a single column (as a string) use this syntax:
00456 
00457         >>> align[:,4]
00458         'CCGCG'
00459 
00460         Or, to get part of a column,
00461 
00462         >>> align[1:3,4]
00463         'CG'
00464 
00465         However, in general you get a sub-alignment,
00466 
00467         >>> print align[1:5,3:6]
00468         DNAAlphabet() alignment with 4 rows and 3 columns
00469         -CG Beta
00470         AGG Gamma
00471         ACG Delta
00472         -GG Epsilon
00473 
00474         This should all seem familiar to anyone who has used the NumPy
00475         array or matrix objects.
00476         """
00477         if isinstance(index, int):
00478             #e.g. result = align[x]
00479             #Return a SeqRecord
00480             return self._records[index]
00481         elif isinstance(index, slice):
00482             #e.g. sub_align = align[i:j:k]
00483             return MultipleSeqAlignment(self._records[index], self._alphabet)
00484         elif len(index)!=2:
00485             raise TypeError("Invalid index type.")
00486 
00487         #Handle double indexing
00488         row_index, col_index = index
00489         if isinstance(row_index, int):
00490             #e.g. row_or_part_row = align[6, 1:4], gives a SeqRecord
00491             return self._records[row_index][col_index]
00492         elif isinstance(col_index, int):
00493             #e.g. col_or_part_col = align[1:5, 6], gives a string
00494             return "".join(rec[col_index] for rec in self._records[row_index])
00495         else:
00496             #e.g. sub_align = align[1:4, 5:7], gives another alignment
00497             return MultipleSeqAlignment((rec[col_index] for rec in self._records[row_index]),
00498                                         self._alphabet)

Here is the caller graph for this function:

def Bio.Align.MultipleSeqAlignment._append (   self,
  record,
  expected_length = None 
) [private]
Helper function (PRIVATE).

Definition at line 281 of file __init__.py.

00281 
00282     def _append(self, record, expected_length=None):
00283         """Helper function (PRIVATE)."""
00284         if not isinstance(record, SeqRecord):
00285             raise TypeError("New sequence is not a SeqRecord object")
00286 
00287         #Currently the get_alignment_length() call is expensive, so we need
00288         #to avoid calling it repeatedly for __init__ and extend, hence this
00289         #private _append method
00290         if expected_length is not None and len(record) != expected_length:
00291             #TODO - Use the following more helpful error, but update unit tests
00292             #raise ValueError("New sequence is not of length %i" \
00293             #                 % self.get_alignment_length())
00294             raise ValueError("Sequences must all be the same length")
00295             
00296         #Using not self.alphabet.contains(record.seq.alphabet) needs fixing
00297         #for AlphabetEncoders (e.g. gapped versus ungapped).
00298         if not Alphabet._check_type_compatible([self._alphabet, record.seq.alphabet]):
00299             raise ValueError("New sequence's alphabet is incompatible")
00300         self._records.append(record)

Here is the caller graph for this function:

def Bio.Align.MultipleSeqAlignment.add_sequence (   self,
  descriptor,
  sequence,
  start = None,
  end = None,
  weight = 1.0 
)
Add a sequence to the alignment (DEPRECATED).

The start, end, and weight arguments are not supported! This method
only provides limited backwards compatibility with the old
Bio.Align.Generic.Alignment object. Please use the append method with
a SeqRecord instead, since add_sequence is likely to be removed in a
future release of Biopython.

Definition at line 591 of file __init__.py.

00591 
00592                      weight = 1.0):
00593         """Add a sequence to the alignment (DEPRECATED).
00594 
00595         The start, end, and weight arguments are not supported! This method
00596         only provides limited backwards compatibility with the old
00597         Bio.Align.Generic.Alignment object. Please use the append method with
00598         a SeqRecord instead, since add_sequence is likely to be removed in a
00599         future release of Biopython.
00600         """
00601         import warnings
00602         import Bio
00603         warnings.warn("The start, end, and weight arguments are not supported! This method only provides limited backwards compatibility with the old Bio.Align.Generic.Alignment object. Please use the append method with a SeqRecord instead, as the add_sequence method is likely to be removed in a future release of Biopython.", Bio.BiopythonDeprecationWarning)
00604         #Should we handle start/end/strand information somehow? What for?
00605         #TODO - Should we handle weights somehow? See also AlignInfo code...
00606         if start is not None or end is not None or weight != 1.0:
00607             raise ValueError("The add_Sequence method is obsolete, and only "
00608                              "provides limited backwards compatibily. The"
00609                              "start, end and weight arguments are not "
00610                              "supported.")
00611         self.append(SeqRecord(Seq(sequence, self._alphabet),
00612                               id = descriptor, description = descriptor))
00613 

Here is the call graph for this function:

Here is the caller graph for this function:

def Bio.Align.MultipleSeqAlignment.append (   self,
  record 
)
Add one more SeqRecord object to the alignment as a new row.

This must have the same length as the original alignment (unless this is
the first record), and have an alphabet compatible with the alignment's
alphabet.

>>> from Bio import AlignIO
>>> align = AlignIO.read("Clustalw/opuntia.aln", "clustal")
>>> print align
SingleLetterAlphabet() alignment with 7 rows and 156 columns
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273285|gb|AF191659.1|AF191
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273284|gb|AF191658.1|AF191
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273287|gb|AF191661.1|AF191
TATACATAAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273286|gb|AF191660.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273290|gb|AF191664.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273289|gb|AF191663.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273291|gb|AF191665.1|AF191
>>> len(align)
7

We'll now construct a dummy record to append as an example:

>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> dummy = SeqRecord(Seq("N"*156), id="dummy")

Now append this to the alignment,

>>> align.append(dummy)
>>> print align
SingleLetterAlphabet() alignment with 8 rows and 156 columns
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273285|gb|AF191659.1|AF191
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273284|gb|AF191658.1|AF191
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273287|gb|AF191661.1|AF191
TATACATAAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273286|gb|AF191660.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273290|gb|AF191664.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273289|gb|AF191663.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273291|gb|AF191665.1|AF191
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNN dummy
>>> len(align)
8

Definition at line 232 of file __init__.py.

00232 
00233     def append(self, record):
00234         """Add one more SeqRecord object to the alignment as a new row.
00235 
00236         This must have the same length as the original alignment (unless this is
00237         the first record), and have an alphabet compatible with the alignment's
00238         alphabet.
00239 
00240         >>> from Bio import AlignIO
00241         >>> align = AlignIO.read("Clustalw/opuntia.aln", "clustal")
00242         >>> print align
00243         SingleLetterAlphabet() alignment with 7 rows and 156 columns
00244         TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273285|gb|AF191659.1|AF191
00245         TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273284|gb|AF191658.1|AF191
00246         TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273287|gb|AF191661.1|AF191
00247         TATACATAAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273286|gb|AF191660.1|AF191
00248         TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273290|gb|AF191664.1|AF191
00249         TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273289|gb|AF191663.1|AF191
00250         TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273291|gb|AF191665.1|AF191
00251         >>> len(align)
00252         7
00253 
00254         We'll now construct a dummy record to append as an example:
00255 
00256         >>> from Bio.Seq import Seq
00257         >>> from Bio.SeqRecord import SeqRecord
00258         >>> dummy = SeqRecord(Seq("N"*156), id="dummy")
00259 
00260         Now append this to the alignment,
00261 
00262         >>> align.append(dummy)
00263         >>> print align
00264         SingleLetterAlphabet() alignment with 8 rows and 156 columns
00265         TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273285|gb|AF191659.1|AF191
00266         TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273284|gb|AF191658.1|AF191
00267         TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273287|gb|AF191661.1|AF191
00268         TATACATAAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273286|gb|AF191660.1|AF191
00269         TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273290|gb|AF191664.1|AF191
00270         TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273289|gb|AF191663.1|AF191
00271         TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273291|gb|AF191665.1|AF191
00272         NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNN dummy
00273         >>> len(align)
00274         8
00275 
00276         """
00277         if self._records:
00278             self._append(record, self.get_alignment_length())
00279         else:
00280             self._append(record)
    

Here is the call graph for this function:

Here is the caller graph for this function:

def Bio.Align.MultipleSeqAlignment.extend (   self,
  records 
)
Add more SeqRecord objects to the alignment as rows.

They must all have the same length as the original alignment, and have
alphabets compatible with the alignment's alphabet. For example,

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.Align import MultipleSeqAlignment
>>> a = SeqRecord(Seq("AAAACGT", generic_dna), id="Alpha")
>>> b = SeqRecord(Seq("AAA-CGT", generic_dna), id="Beta")
>>> c = SeqRecord(Seq("AAAAGGT", generic_dna), id="Gamma")
>>> d = SeqRecord(Seq("AAAACGT", generic_dna), id="Delta")
>>> e = SeqRecord(Seq("AAA-GGT", generic_dna), id="Epsilon")

First we create a small alignment (three rows):

>>> align = MultipleSeqAlignment([a, b, c])
>>> print align
DNAAlphabet() alignment with 3 rows and 7 columns
AAAACGT Alpha
AAA-CGT Beta
AAAAGGT Gamma

Now we can extend this alignment with another two rows:

>>> align.extend([d, e])
>>> print align
DNAAlphabet() alignment with 5 rows and 7 columns
AAAACGT Alpha
AAA-CGT Beta
AAAAGGT Gamma
AAAACGT Delta
AAA-GGT Epsilon

Because the alignment object allows iteration over the rows as
SeqRecords, you can use the extend method with a second alignment
(provided its sequences have the same length as the original alignment).

Definition at line 174 of file __init__.py.

00174 
00175     def extend(self, records):
00176         """Add more SeqRecord objects to the alignment as rows.
00177 
00178         They must all have the same length as the original alignment, and have
00179         alphabets compatible with the alignment's alphabet. For example,
00180 
00181         >>> from Bio.Alphabet import generic_dna
00182         >>> from Bio.Seq import Seq
00183         >>> from Bio.SeqRecord import SeqRecord
00184         >>> from Bio.Align import MultipleSeqAlignment
00185         >>> a = SeqRecord(Seq("AAAACGT", generic_dna), id="Alpha")
00186         >>> b = SeqRecord(Seq("AAA-CGT", generic_dna), id="Beta")
00187         >>> c = SeqRecord(Seq("AAAAGGT", generic_dna), id="Gamma")
00188         >>> d = SeqRecord(Seq("AAAACGT", generic_dna), id="Delta")
00189         >>> e = SeqRecord(Seq("AAA-GGT", generic_dna), id="Epsilon")
00190 
00191         First we create a small alignment (three rows):
00192 
00193         >>> align = MultipleSeqAlignment([a, b, c])
00194         >>> print align
00195         DNAAlphabet() alignment with 3 rows and 7 columns
00196         AAAACGT Alpha
00197         AAA-CGT Beta
00198         AAAAGGT Gamma
00199 
00200         Now we can extend this alignment with another two rows:
00201 
00202         >>> align.extend([d, e])
00203         >>> print align
00204         DNAAlphabet() alignment with 5 rows and 7 columns
00205         AAAACGT Alpha
00206         AAA-CGT Beta
00207         AAAAGGT Gamma
00208         AAAACGT Delta
00209         AAA-GGT Epsilon
00210 
00211         Because the alignment object allows iteration over the rows as
00212         SeqRecords, you can use the extend method with a second alignment
00213         (provided its sequences have the same length as the original alignment).
00214         """
00215         if len(self):
00216             #Use the standard method to get the length
00217             expected_length = self.get_alignment_length()
00218         else:
00219             #Take the first record's length
00220             records = iter(records) #records arg could be list or iterator
00221             try:
00222                 rec = records.next()
00223             except StopIteration:
00224                 #Special case, no records
00225                 return
00226             expected_length = len(rec)
00227             self._append(rec, expected_length)
00228             #Now continue to the rest of the records as usual
00229             
00230         for rec in records:
00231             self._append(rec, expected_length)
            

Here is the call graph for this function:

Here is the caller graph for this function:

def Bio.Align.MultipleSeqAlignment.get_column (   self,
  col 
)
Returns a string containing a given column (DEPRECATED).

This is a method provided for backwards compatibility with the old
Bio.Align.Generic.Alignment object. Please use the slice notation
instead, since get_column is likely to be removed in a future release
of Biopython..

Definition at line 577 of file __init__.py.

00577 
00578     def get_column(self, col):
00579         """Returns a string containing a given column (DEPRECATED).
00580 
00581         This is a method provided for backwards compatibility with the old
00582         Bio.Align.Generic.Alignment object. Please use the slice notation
00583         instead, since get_column is likely to be removed in a future release
00584         of Biopython..
00585         """
00586         import warnings
00587         import Bio
00588         warnings.warn("This method is deprecated and is provided for backwards compatibility with the old Bio.Align.Generic.Alignment object. Please use the slice notation instead, as get_column is likely to be removed in a future release of Biopython.", Bio.BiopythonDeprecationWarning)
00589         return _Alignment.get_column(self, col)

Here is the call graph for this function:

def Bio.Align.MultipleSeqAlignment.sort (   self,
  key = None,
  reverse = False 
)
Sort the rows (SeqRecord objects) of the alignment in place.

This sorts the rows alphabetically using the SeqRecord object id by
default. The sorting can be controlled by supplying a key function
which must map each SeqRecord to a sort value.

This is useful if you want to add two alignments which use the same
record identifiers, but in a different order. For example,

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.Align import MultipleSeqAlignment
>>> align1 = MultipleSeqAlignment([
...              SeqRecord(Seq("ACGT", generic_dna), id="Human"),
...              SeqRecord(Seq("ACGG", generic_dna), id="Mouse"),
...              SeqRecord(Seq("ACGC", generic_dna), id="Chicken"),
...          ])
>>> align2 = MultipleSeqAlignment([
...              SeqRecord(Seq("CGGT", generic_dna), id="Mouse"),
...              SeqRecord(Seq("CGTT", generic_dna), id="Human"),
...              SeqRecord(Seq("CGCT", generic_dna), id="Chicken"),
...          ])

If you simple try and add these without sorting, you get this:

>>> print align1 + align2
DNAAlphabet() alignment with 3 rows and 8 columns
ACGTCGGT <unknown id>
ACGGCGTT <unknown id>
ACGCCGCT Chicken

Consult the SeqRecord documentation which explains why you get a
default value when annotation like the identifier doesn't match up.
However, if we sort the alignments first, then add them we get the
desired result:

>>> align1.sort()
>>> align2.sort()
>>> print align1 + align2
DNAAlphabet() alignment with 3 rows and 8 columns
ACGCCGCT Chicken
ACGTCGTT Human
ACGGCGGT Mouse

As an example using a different sort order, you could sort on the
GC content of each sequence.

>>> from Bio.SeqUtils import GC
>>> print align1
DNAAlphabet() alignment with 3 rows and 4 columns
ACGC Chicken
ACGT Human
ACGG Mouse
>>> align1.sort(key = lambda record: GC(record.seq))
>>> print align1
DNAAlphabet() alignment with 3 rows and 4 columns
ACGT Human
ACGC Chicken
ACGG Mouse

There is also a reverse argument, so if you wanted to sort by ID
but backwards:

>>> align1.sort(reverse=True)
>>> print align1
DNAAlphabet() alignment with 3 rows and 4 columns
ACGG Mouse
ACGT Human
ACGC Chicken

Definition at line 499 of file __init__.py.

00499 
00500     def sort(self, key=None, reverse=False):
00501         """Sort the rows (SeqRecord objects) of the alignment in place.
00502 
00503         This sorts the rows alphabetically using the SeqRecord object id by
00504         default. The sorting can be controlled by supplying a key function
00505         which must map each SeqRecord to a sort value.
00506 
00507         This is useful if you want to add two alignments which use the same
00508         record identifiers, but in a different order. For example,
00509 
00510         >>> from Bio.Alphabet import generic_dna
00511         >>> from Bio.Seq import Seq
00512         >>> from Bio.SeqRecord import SeqRecord
00513         >>> from Bio.Align import MultipleSeqAlignment
00514         >>> align1 = MultipleSeqAlignment([
00515         ...              SeqRecord(Seq("ACGT", generic_dna), id="Human"),
00516         ...              SeqRecord(Seq("ACGG", generic_dna), id="Mouse"),
00517         ...              SeqRecord(Seq("ACGC", generic_dna), id="Chicken"),
00518         ...          ])
00519         >>> align2 = MultipleSeqAlignment([
00520         ...              SeqRecord(Seq("CGGT", generic_dna), id="Mouse"),
00521         ...              SeqRecord(Seq("CGTT", generic_dna), id="Human"),
00522         ...              SeqRecord(Seq("CGCT", generic_dna), id="Chicken"),
00523         ...          ])
00524 
00525         If you simple try and add these without sorting, you get this:
00526 
00527         >>> print align1 + align2
00528         DNAAlphabet() alignment with 3 rows and 8 columns
00529         ACGTCGGT <unknown id>
00530         ACGGCGTT <unknown id>
00531         ACGCCGCT Chicken
00532 
00533         Consult the SeqRecord documentation which explains why you get a
00534         default value when annotation like the identifier doesn't match up.
00535         However, if we sort the alignments first, then add them we get the
00536         desired result:
00537 
00538         >>> align1.sort()
00539         >>> align2.sort()
00540         >>> print align1 + align2
00541         DNAAlphabet() alignment with 3 rows and 8 columns
00542         ACGCCGCT Chicken
00543         ACGTCGTT Human
00544         ACGGCGGT Mouse
00545 
00546         As an example using a different sort order, you could sort on the
00547         GC content of each sequence.
00548 
00549         >>> from Bio.SeqUtils import GC
00550         >>> print align1
00551         DNAAlphabet() alignment with 3 rows and 4 columns
00552         ACGC Chicken
00553         ACGT Human
00554         ACGG Mouse
00555         >>> align1.sort(key = lambda record: GC(record.seq))
00556         >>> print align1
00557         DNAAlphabet() alignment with 3 rows and 4 columns
00558         ACGT Human
00559         ACGC Chicken
00560         ACGG Mouse
00561 
00562         There is also a reverse argument, so if you wanted to sort by ID
00563         but backwards:
00564 
00565         >>> align1.sort(reverse=True)
00566         >>> print align1
00567         DNAAlphabet() alignment with 3 rows and 4 columns
00568         ACGG Mouse
00569         ACGT Human
00570         ACGC Chicken
00571 
00572         """
00573         if key is None:
00574             self._records.sort(key = lambda r: r.id, reverse = reverse)
00575         else:
00576             self._records.sort(key = key, reverse = reverse)


Member Data Documentation

Definition at line 160 of file __init__.py.

Definition at line 165 of file __init__.py.


The documentation for this class was generated from the following file: