python-biopython  1.60
Bio.AlignIO.PhylipIO.PhylipWriter Class Reference

Public Member Functions

def write_alignment

Detailed Description

PHYLIP alignment writer.

Definition at line 55 of file PhylipIO.py.


Member Function Documentation

def Bio.AlignIO.PhylipIO.PhylipWriter.write_alignment(self, alignment, id_width=_PHYLIP_ID_WIDTH)

Use this to write (another) single alignment to an open file.

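This code writes interleaved alignments: sequences longer than 50
characters are split into successive blocks of 50 columns. Each block
is written as chunks of ten letters separated by spaces, and record
names appear only in the first block (later blocks are indented).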
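A minimal standalone sketch of the interleaved layout described above, assuming plain Python strings in place of Biopython SeqRecord objects. `format_interleaved` is a hypothetical helper, not part of Bio.AlignIO, and it deliberately skips the name cleaning and input validation that write_alignment performs.

```python
def format_interleaved(names, seqs, id_width=10):
    """Return PHYLIP-style interleaved text for equal-length sequences.

    Hypothetical helper: a header line (sequence count, length), then
    blocks of up to 50 columns, each row written as space-separated
    chunks of ten letters. Names appear only in the first block.
    """
    length = len(seqs[0])
    lines = [" %i %i" % (len(seqs), length)]
    for start in range(0, length, 50):
        if start > 0:
            lines.append("")  # blank line between blocks
        for name, seq in zip(names, seqs):
            if start == 0:
                # Truncate, then right-pad the name to id_width characters.
                prefix = name[:id_width].ljust(id_width)
            else:
                # Later blocks are indented instead of repeating the name.
                prefix = " " * id_width
            stop = min(start + 50, length)
            chunks = [seq[i:i + 10] for i in range(start, stop, 10)]
            lines.append(prefix + "".join(" " + c for c in chunks))
    return "\n".join(lines) + "\n"
```

For example, `format_interleaved(["alpha", "beta"], ["A" * 60, "C" * 60])` produces one full block of five chunks per row, a blank line, and a second block holding the remaining ten columns.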

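Note that record identifiers are strictly truncated to id_width,
defaulting to the value required to comply with the PHYLIP standard
(ten characters). Before truncation, square brackets, parentheses and
commas are removed, and colons and semicolons are replaced by "|". A
ValueError is raised if two cleaned names collide.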
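The identifier rules above can be sketched on their own. `clean_phylip_names` is a hypothetical helper that mirrors the cleaning loop in write_alignment for a plain list of ID strings; it is not an exported Biopython function.

```python
def clean_phylip_names(ids, id_width=10):
    """Hypothetical helper mirroring write_alignment's identifier rules."""
    names = []
    for record_id in ids:
        name = record_id.strip()
        # Characters PHYLIP forbids in names: drop brackets, parentheses, commas.
        for char in "[](),":
            name = name.replace(char, "")
        # Map colons and semicolons to a pipe character instead.
        for char in ":;":
            name = name.replace(char, "|")
        # Truncate *before* checking for duplicates, as the writer does.
        name = name[:id_width]
        if name in names:
            raise ValueError("Repeated name %r (originally %r), "
                             "possibly due to truncation" % (name, record_id))
        names.append(name)
    return names
```

Checking for duplicates only after cleaning and truncation matters: two distinct IDs such as "sample_0001a" and "sample_0001b" both become "sample_000" at the default width, so they are rejected rather than written ambiguously.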

For more information on the file format, please see:
http://evolution.genetics.washington.edu/phylip/doc/sequence.html
http://evolution.genetics.washington.edu/phylip/doc/main.html#inputfiles

Definition at line 58 of file PhylipIO.py.

    def write_alignment(self, alignment, id_width=_PHYLIP_ID_WIDTH):
        """Use this to write (another) single alignment to an open file.

        This code will write interleaved alignments (when the sequences are
        longer than 50 characters).

        Note that record identifiers are strictly truncated to id_width,
        defaulting to the value required to comply with the PHYLIP standard.

        For more information on the file format, please see:
        http://evolution.genetics.washington.edu/phylip/doc/sequence.html
        http://evolution.genetics.washington.edu/phylip/doc/main.html#inputfiles
        """
        handle = self.handle

        if len(alignment) == 0:
            raise ValueError("Must have at least one sequence")
        length_of_seqs = alignment.get_alignment_length()
        for record in alignment:
            if length_of_seqs != len(record.seq):
                raise ValueError("Sequences must all be the same length")
            # Validate before anything is written, so a bad record cannot
            # leave a partially written alignment in the handle.
            if "." in str(record.seq):
                raise ValueError("PHYLIP format no longer allows dots in "
                                 "sequence")
        if length_of_seqs <= 0:
            raise ValueError("Non-empty sequences are required")

        # Check for repeated identifiers...
        # Apply this test *after* cleaning the identifiers
        names = []
        for record in alignment:
            # Quoting the PHYLIP version 3.6 documentation:
            #
            # The name should be ten characters in length, filled out to
            # the full ten characters by blanks if shorter. Any printable
            # ASCII/ISO character is allowed in the name, except for
            # parentheses ("(" and ")"), square brackets ("[" and "]"),
            # colon (":"), semicolon (";") and comma (","). If you forget
            # to extend the names to ten characters in length by blanks,
            # the program [i.e. PHYLIP] will get out of synchronization
            # with the contents of the data file, and an error message will
            # result.
            #
            # Note that Tab characters count as only one character in the
            # species names. Their inclusion can cause trouble.
            name = record.id.strip()
            # Remove brackets, parentheses and commas outright, and map
            # colons and semicolons to the pipe character "|".
            for char in "[](),":
                name = name.replace(char, "")
            for char in ":;":
                name = name.replace(char, "|")
            name = name[:id_width]
            if name in names:
                raise ValueError("Repeated name %r (originally %r), "
                                 "possibly due to truncation"
                                 % (name, record.id))
            names.append(name)

        # From experimentation, the use of tabs is not understood by the
        # EMBOSS suite.  The nature of the expected white space is not
        # defined in the PHYLIP documentation, simply "These are in free
        # format, separated by blanks".  We'll use spaces to keep EMBOSS
        # happy.
        handle.write(" %i %s\n" % (len(alignment), length_of_seqs))
        block = 0
        while True:
            for name, record in zip(names, alignment):
                if block == 0:
                    # Write the name (already truncated above), right-padded
                    # with spaces to id_width characters.
                    handle.write(name.ljust(id_width))
                else:
                    # Write an indent in place of the name
                    handle.write(" " * id_width)
                # Write up to five chunks of ten letters per line...
                sequence = str(record.seq)
                for chunk in range(0, 5):
                    i = block * 50 + chunk * 10
                    seq_segment = sequence[i:i + 10]
                    # TODO - Force any gaps to be '-' character?  Look at the
                    # alphabet...
                    # TODO - How to cope with '?' in the sequence?
                    handle.write(" %s" % seq_segment)
                    # Stop once this chunk reaches the end of the sequence
                    # (>= avoids an empty trailing chunk when the length is
                    # a multiple of ten).
                    if i + 10 >= length_of_seqs:
                        break
                handle.write("\n")
            block = block + 1
            # Stop once every column has been written (>= avoids an empty
            # extra block when the length is a multiple of 50).
            if block * 50 >= length_of_seqs:
                break
            handle.write("\n")
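The while loop above is meant to stop as soon as every column has been written. A small sketch of the resulting arithmetic, where `interleaved_shape` is a hypothetical helper: an alignment of length L occupies ceil(L/50) blocks, and the last block holds ceil(r/10) chunks, where r is the number of columns left for it. Comparing a writer's output against these counts is a quick check on its break conditions; with strict `>` comparisons, a length that is a multiple of 10 or 50 gets an extra empty chunk or block.

```python
import math


def interleaved_shape(length, block_width=50, chunk_width=10):
    """Return (number of blocks, chunks in the last block).

    Hypothetical helper, assuming writing stops once every column of an
    alignment of the given length has been output.
    """
    if length <= 0:
        raise ValueError("Non-empty sequences are required")
    blocks = math.ceil(length / block_width)
    # Columns remaining for the final block (between 1 and block_width).
    last_block_columns = length - (blocks - 1) * block_width
    last_chunks = math.ceil(last_block_columns / chunk_width)
    return blocks, last_chunks
```

For instance, a 50-column alignment fits in a single block of five chunks, while 51 columns need a second block holding one short chunk.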


The documentation for this class was generated from the following file: