Back to index

python-biopython  1.60
Public Member Functions | Static Public Attributes | Private Member Functions
Bio.SeqIO.InsdcIO.EmblWriter Class Reference
Inheritance diagram for Bio.SeqIO.InsdcIO.EmblWriter:
Inheritance graph
[legend]
Collaboration diagram for Bio.SeqIO.InsdcIO.EmblWriter:
Collaboration graph
[legend]

List of all members.

Public Member Functions

def write_record

Static Public Attributes

int HEADER_WIDTH = 5
int QUALIFIER_INDENT = 21
string QUALIFIER_INDENT_STR = "FT"
string QUALIFIER_INDENT_TMP = "FT %s "
string FEATURE_HEADER = "FH Key Location/Qualifiers\n"
int MAX_WIDTH = 80

Private Member Functions

def _write_contig
def _write_sequence
def _write_single_line
def _write_multi_line
def _write_the_first_lines
def _get_data_division
def _write_references
def _write_comment

Detailed Description

Definition at line 761 of file InsdcIO.py.


Member Function Documentation

def Bio.SeqIO.InsdcIO.EmblWriter._get_data_division (   self,
  record 
) [private]

Definition at line 901 of file InsdcIO.py.

00901 
00902     def _get_data_division(self, record):
00903         try:
00904             division = record.annotations["data_file_division"]
00905         except KeyError:
00906             division = "UNC"
00907         if division in ["PHG", "ENV", "FUN", "HUM", "INV", "MAM", "VRT",
00908                         "MUS", "PLN", "PRO", "ROD", "SYN", "TGN", "UNC",
00909                         "VRL", "XXX"]:
00910             #Good, already EMBL style
00911             #    Division                 Code
00912             #    -----------------        ----
00913             #    Bacteriophage            PHG
00914             #    Environmental Sample     ENV
00915             #    Fungal                   FUN
00916             #    Human                    HUM
00917             #    Invertebrate             INV
00918             #    Other Mammal             MAM
00919             #    Other Vertebrate         VRT
00920             #    Mus musculus             MUS
00921             #    Plant                    PLN
00922             #    Prokaryote               PRO
00923             #    Other Rodent             ROD
00924             #    Synthetic                SYN
00925             #    Transgenic               TGN
00926             #    Unclassified             UNC (i.e. unknown)
00927             #    Viral                    VRL
00928             #
00929             #(plus XXX used for submiting data to EMBL)
00930             pass
00931         else:
00932             #See if this is in GenBank style & can be converted.
00933             #Generally a problem as the GenBank groups are wider
00934             #than those of EMBL. Note that GenBank use "BCT" for
00935             #both bacteria and acherea thus this maps to EMBL's
00936             #"PRO" nicely.
00937             gbk_to_embl = {"BCT":"PRO",
00938                            "UNK":"UNC",
00939                            }
00940             try:
00941                 division = gbk_to_embl[division]
00942             except KeyError:
00943                 division = "UNC"
00944         assert len(division)==3
00945         return division

def Bio.SeqIO.InsdcIO.EmblWriter._write_comment (   self,
  record 
) [private]

Definition at line 976 of file InsdcIO.py.

00976 
00977     def _write_comment(self, record):
00978         #This is a bit complicated due to the range of possible
00979         #ways people might have done their annotation...
00980         #Currently the parser uses a single string with newlines.
00981         #A list of lines is also reasonable.
00982         #A single (long) string is perhaps the most natural of all.
00983         #This means we may need to deal with line wrapping.
00984         comment = record.annotations["comment"]
00985         if isinstance(comment, basestring):
00986             lines = comment.split("\n")
00987         elif isinstance(comment, list) or isinstance(comment, tuple):
00988             lines = comment
00989         else:
00990             raise ValueError("Could not understand comment annotation")
00991         #TODO - Merge this with the GenBank comment code?
00992         if not lines : return
00993         for line in lines:
00994             self._write_multi_line("CC", line)
00995         self.handle.write("XX\n")

Here is the call graph for this function:

def Bio.SeqIO.InsdcIO.EmblWriter._write_contig (   self,
  record 
) [private]

Definition at line 768 of file InsdcIO.py.

00768 
00769     def _write_contig(self, record):
00770         max_len = self.MAX_WIDTH - self.HEADER_WIDTH
00771         lines = self._split_contig(record, max_len)
00772         for text in lines:
00773             self._write_single_line("CO", text)

Here is the call graph for this function:

def Bio.SeqIO.InsdcIO.EmblWriter._write_multi_line (   self,
  tag,
  text 
) [private]

Definition at line 835 of file InsdcIO.py.

00835 
00836     def _write_multi_line(self, tag, text):
00837         max_len = self.MAX_WIDTH - self.HEADER_WIDTH
00838         lines = self._split_multi_line(text, max_len)
00839         for line in lines :
00840             self._write_single_line(tag, line)
        

Here is the call graph for this function:

Here is the caller graph for this function:

def Bio.SeqIO.InsdcIO.EmblWriter._write_references (   self,
  record 
) [private]

Definition at line 946 of file InsdcIO.py.

00946 
00947     def _write_references(self, record):
00948         #The order should be RN, RC, RP, RX, RG, RA, RT, RL
00949         number = 0
00950         for ref in record.annotations["references"]:
00951             if not isinstance(ref, SeqFeature.Reference):
00952                 continue
00953             number += 1
00954             self._write_single_line("RN", "[%i]" % number)
00955             #TODO - support for RC line (needed in parser too)
00956             #TODO - support more complex record reference locations?
00957             if ref.location and len(ref.location)==1:
00958                 self._write_single_line("RP", "%i-%i" % (ref.location[0].nofuzzy_start+1,
00959                                                          ref.location[0].nofuzzy_end))
00960             #TODO - record any DOI or AGRICOLA identifier in the reference object?
00961             if ref.pubmed_id:
00962                 self._write_single_line("RX", "PUBMED; %s." % ref.pubmed_id)
00963             if ref.consrtm:
00964                 self._write_single_line("RG", "%s" % ref.consrtm)
00965             if ref.authors:
00966                 #We store the AUTHORS data as a single string
00967                 self._write_multi_line("RA", ref.authors+";")
00968             if ref.title:
00969                 #We store the title as a single string
00970                 self._write_multi_line("RT", '"%s";' % ref.title)
00971             if ref.journal:
00972                 #We store this as a single string - holds the journal name,
00973                 #volume, year, and page numbers of the citation
00974                 self._write_multi_line("RL", ref.journal)
00975             self.handle.write("XX\n")

Here is the call graph for this function:

def Bio.SeqIO.InsdcIO.EmblWriter._write_sequence (   self,
  record 
) [private]

Definition at line 774 of file InsdcIO.py.

00774 
00775     def _write_sequence(self, record):
00776         LETTERS_PER_BLOCK = 10
00777         BLOCKS_PER_LINE = 6
00778         LETTERS_PER_LINE = LETTERS_PER_BLOCK * BLOCKS_PER_LINE
00779         POSITION_PADDING = 10
00780         handle = self.handle #save looking up this multiple times
00781         
00782         if isinstance(record.seq, UnknownSeq):
00783             #We have already recorded the length, and there is no need
00784             #to record a long sequence of NNNNNNN...NNN or whatever.
00785             if "contig" in record.annotations:
00786                 self._write_contig(record)
00787             else:
00788                 #TODO - Can the sequence just be left out as in GenBank files?
00789                 handle.write("SQ   \n")
00790             return
00791 
00792         #Catches sequence being None
00793         data = self._get_seq_string(record).lower()
00794         seq_len = len(data)
00795 
00796         #Get the base alphabet (underneath any Gapped or StopCodon encoding)
00797         a = Alphabet._get_base_alphabet(record.seq.alphabet)
00798         if isinstance(a, Alphabet.DNAAlphabet):
00799             #TODO - What if we have RNA?
00800             a_count = data.count('A') + data.count('a')
00801             c_count = data.count('C') + data.count('c')
00802             g_count = data.count('G') + data.count('g')
00803             t_count = data.count('T') + data.count('t')
00804             other = seq_len - (a_count + c_count + g_count + t_count)
00805             handle.write("SQ   Sequence %i BP; %i A; %i C; %i G; %i T; %i other;\n" \
00806                          % (seq_len, a_count, c_count, g_count, t_count, other))
00807         else:
00808             handle.write("SQ   \n")
00809         
00810         for line_number in range(0, seq_len // LETTERS_PER_LINE):
00811             handle.write("    ") #Just four, not five
00812             for block in range(BLOCKS_PER_LINE) :
00813                 index = LETTERS_PER_LINE*line_number + LETTERS_PER_BLOCK*block
00814                 handle.write((" %s" % data[index:index+LETTERS_PER_BLOCK]))
00815             handle.write(str((line_number+1)
00816                              *LETTERS_PER_LINE).rjust(POSITION_PADDING))
00817             handle.write("\n")
00818         if seq_len % LETTERS_PER_LINE:
00819             #Final (partial) line
00820             line_number = (seq_len // LETTERS_PER_LINE)
00821             handle.write("    ") #Just four, not five
00822             for block in range(BLOCKS_PER_LINE) :
00823                 index = LETTERS_PER_LINE*line_number + LETTERS_PER_BLOCK*block
00824                 handle.write((" %s" % data[index:index+LETTERS_PER_BLOCK]).ljust(11))
00825             handle.write(str(seq_len).rjust(POSITION_PADDING))
00826             handle.write("\n")

Here is the call graph for this function:

def Bio.SeqIO.InsdcIO.EmblWriter._write_single_line (   self,
  tag,
  text 
) [private]

Definition at line 827 of file InsdcIO.py.

00827 
00828     def _write_single_line(self, tag, text):
00829         assert len(tag)==2
00830         line = tag+"   "+text
00831         if len(text) > self.MAX_WIDTH:
00832             import warnings
00833             warnings.warn("Line %r too long" % line)
00834         self.handle.write(line+"\n")

Here is the caller graph for this function:

def Bio.SeqIO.InsdcIO.EmblWriter._write_the_first_lines (   self,
  record 
) [private]
Write the ID and AC lines.

Definition at line 841 of file InsdcIO.py.

00841 
00842     def _write_the_first_lines(self, record):
00843         """Write the ID and AC lines."""
00844         if "." in record.id and record.id.rsplit(".", 1)[1].isdigit():
00845             version = "SV " + record.id.rsplit(".", 1)[1]
00846             accession = self._get_annotation_str(record, "accession",
00847                                                  record.id.rsplit(".", 1)[0],
00848                                                  just_first=True)
00849         else :
00850             version = ""
00851             accession = self._get_annotation_str(record, "accession",
00852                                                  record.id,
00853                                                  just_first=True)
00854         
00855         if ";" in accession :
00856             raise ValueError("Cannot have semi-colon in EMBL accession, %s" \
00857                              % repr(str(accession)))
00858         if " " in accession :
00859             #This is out of practicallity... might it be allowed?
00860             raise ValueError("Cannot have spaces in EMBL accession, %s" \
00861                              % repr(str(accession)))
00862 
00863         #Get the molecule type
00864         #TODO - record this explicitly in the parser?
00865         #Get the base alphabet (underneath any Gapped or StopCodon encoding)
00866         a = Alphabet._get_base_alphabet(record.seq.alphabet)
00867         if not isinstance(a, Alphabet.Alphabet):
00868             raise TypeError("Invalid alphabet")
00869         elif isinstance(a, Alphabet.DNAAlphabet):
00870             mol_type = "DNA"
00871             units = "BP"
00872         elif isinstance(a, Alphabet.RNAAlphabet):
00873             mol_type = "RNA"
00874             units = "BP"
00875         elif isinstance(a, Alphabet.ProteinAlphabet):
00876             mol_type = "PROTEIN"
00877             units = "AA"
00878         else:
00879             #Must be something like NucleotideAlphabet
00880             raise ValueError("Need a DNA, RNA or Protein alphabet")
00881 
00882         #Get the taxonomy division
00883         division = self._get_data_division(record)
00884 
00885         #TODO - Full ID line
00886         handle = self.handle
00887         #ID   <1>; SV <2>; <3>; <4>; <5>; <6>; <7> BP.
00888         #1. Primary accession number
00889         #2. Sequence version number
00890         #3. Topology: 'circular' or 'linear'
00891         #4. Molecule type
00892         #5. Data class
00893         #6. Taxonomic division
00894         #7. Sequence length
00895         self._write_single_line("ID", "%s; %s; ; %s; ; %s; %i %s." \
00896                                 % (accession, version, mol_type,
00897                                    division, len(record), units))
00898         handle.write("XX\n")
00899         self._write_single_line("AC", accession+";")
00900         handle.write("XX\n")

Here is the call graph for this function:

def Bio.SeqIO.InsdcIO.EmblWriter.write_record (   self,
  record 
)
Write a single record to the output file.

Definition at line 996 of file InsdcIO.py.

00996 
00997     def write_record(self, record):
00998         """Write a single record to the output file."""
00999 
01000         handle = self.handle
01001         self._write_the_first_lines(record)
01002 
01003         #PR line (0 or 1 lines only), project identifier
01004         for xref in record.dbxrefs:
01005             if xref.startswith("Project:"):
01006                 self._write_single_line("PR", xref+";")
01007                 handle.write("XX\n")
01008                 break
01009 
01010         #TODO - DT lines (date)
01011 
01012         descr = record.description
01013         if descr == "<unknown description>" : descr = "."
01014         self._write_multi_line("DE", descr)
01015         handle.write("XX\n")
01016 
01017         #Should this be "source" or "organism"?
01018         self._write_multi_line("OS", self._get_annotation_str(record, "organism"))
01019         try:
01020             #List of strings
01021             taxonomy = "; ".join(record.annotations["taxonomy"]) + "."
01022         except KeyError:
01023             taxonomy = "."
01024         self._write_multi_line("OC", taxonomy)
01025         handle.write("XX\n")
01026 
01027         if "references" in record.annotations:
01028             self._write_references(record)
01029 
01030         if "comment" in record.annotations:
01031             self._write_comment(record)
01032 
01033         handle.write(self.FEATURE_HEADER)
01034         rec_length = len(record)
01035         for feature in record.features:
01036             self._write_feature(feature, rec_length)
01037 
01038         self._write_sequence(record)
01039         handle.write("//\n")

Here is the call graph for this function:

Here is the caller graph for this function:


Member Data Documentation

string Bio.SeqIO.InsdcIO.EmblWriter.FEATURE_HEADER = "FH Key Location/Qualifiers\n" [static]

Reimplemented in Bio.SeqIO.InsdcIO.ImgtWriter.

Definition at line 766 of file InsdcIO.py.

Reimplemented in Bio.SeqIO.InsdcIO.ImgtWriter.

Definition at line 762 of file InsdcIO.py.

int Bio.SeqIO.InsdcIO._InsdcWriter.MAX_WIDTH = 80 [static, inherited]

Definition at line 236 of file InsdcIO.py.

Reimplemented from Bio.SeqIO.InsdcIO._InsdcWriter.

Reimplemented in Bio.SeqIO.InsdcIO.ImgtWriter.

Definition at line 763 of file InsdcIO.py.

Reimplemented from Bio.SeqIO.InsdcIO._InsdcWriter.

Reimplemented in Bio.SeqIO.InsdcIO.ImgtWriter.

Definition at line 764 of file InsdcIO.py.

Reimplemented from Bio.SeqIO.InsdcIO._InsdcWriter.

Reimplemented in Bio.SeqIO.InsdcIO.ImgtWriter.

Definition at line 765 of file InsdcIO.py.


The documentation for this class was generated from the following file: