python-biopython  1.60
Bio.GenBank.Record.Record Class Reference

Public Member Functions

def __init__
def __str__

Public Attributes

 locus
 size
 residue_type
 data_file_division
 date
 definition
 accession
 nid
 pid
 version
 projects
 dblinks
 db_source
 gi
 keywords
 segment
 source
 organism
 taxonomy
 references
 comment
 features
 base_counts
 origin
 sequence
 contig
 primary
 wgs
 wgs_scafld

Static Public Attributes

int GB_LINE_LENGTH = 79
int GB_BASE_INDENT = 12
int GB_FEATURE_INDENT = 21
int GB_INTERNAL_INDENT = 2
int GB_OTHER_INTERNAL_INDENT = 3
int GB_FEATURE_INTERNAL_INDENT = 5
int GB_SEQUENCE_INDENT = 9
string BASE_FORMAT = "%-"
string INTERNAL_FORMAT = " "
string OTHER_INTERNAL_FORMAT = " "
string BASE_FEATURE_FORMAT = "%-"
string INTERNAL_FEATURE_FORMAT = " "
string SEQUENCE_FORMAT = "%"
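
The format strings above appear truncated in this listing (for example "%-"). As a rough sketch of how such a string pads a keyword to a fixed column, the snippet below assumes the base format is completed with the value of GB_BASE_INDENT and an "s" conversion. That completion is an assumption for illustration, not the value shown in the listing, and keyword_line is a hypothetical helper.

```python
# Assumption: the truncated "%-" is completed with the indent width and "s".
GB_BASE_INDENT = 12
BASE_FORMAT = "%-" + str(GB_BASE_INDENT) + "s"


def keyword_line(keyword, text):
    """Left-justify the keyword in a fixed-width column, then append text."""
    return BASE_FORMAT % keyword + text + "\n"


line = keyword_line("NID", "example text")
print(repr(line))
```

With a left-justified, fixed-width keyword column, the text of every line starts in the same column regardless of keyword length.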

Private Member Functions

def _locus_line
def _definition_line
def _accession_line
def _version_line
def _project_line
def _dblink_line
def _nid_line
def _pid_line
def _keywords_line
def _db_source_line
def _segment_line
def _source_line
def _organism_line
def _comment_line
def _features_line
def _base_count_line
def _origin_line
def _sequence_line
def _wgs_line
def _wgs_scafld_line
def _contig_line

Detailed Description

Hold GenBank information in a format similar to the original record.

The Record class is meant to make data easy to get to when you are
just interested in looking at GenBank data.

Attributes:
o locus - The name specified after the LOCUS keyword in the GenBank
record. This may be the accession number, or a clone id or something else.
o size - The size of the record.
o residue_type - The type of residues making up the sequence in this
record. Normally something like RNA, DNA or PROTEIN, but may be as
esoteric as 'ss-RNA circular'.
o data_file_division - The division this record is stored under in
GenBank (e.g. PLN -> plants; PRI -> humans, primates; BCT -> bacteria...)
o date - The date of submission of the record, in a form like '28-JUL-1998'.
o accession - A list of all accession numbers for the sequence.
o nid - Nucleotide identifier number.
o pid - Protein identifier number.
o version - The accession number + version (e.g. AB01234.2).
o db_source - Information about the database the record came from
o gi - The NCBI gi identifier for the record.
o keywords - A list of keywords related to the record.
o segment - If the record is one of a series, this is info about which
segment this record is (something like '1 of 6').
o source - The source of material where the sequence came from.
o organism - The genus and species of the organism (e.g. 'Homo sapiens').
o taxonomy - A listing of the taxonomic classification of the organism,
starting general and getting more specific.
o references - A list of Reference objects.
o comment - Text with any kind of comment about the record.
o features - A listing of Features making up the feature table.
o base_counts - A string with the counts of bases for the sequence.
o origin - A string specifying info about the origin of the sequence.
o sequence - A string with the sequence itself.
o contig - A string of location information for a CONTIG in a RefSeq file
o projects - The genome sequencing project numbers
            (will be replaced by the dblink cross-references in 2009).
o dblinks - The genome sequencing project number(s) and other links.
            (will replace the project information in 2009).

Definition at line 93 of file Record.py.


Constructor & Destructor Documentation

def Bio.GenBank.Record.Record.__init__ (   self)

Definition at line 156 of file Record.py.

00156 
00157     def __init__(self):
00158         self.locus = ''
00159         self.size = ''
00160         self.residue_type = ''
00161         self.data_file_division = ''
00162         self.date = ''
00163         self.definition = ''
00164         self.accession = []
00165         self.nid = ''
00166         self.pid = ''
00167         self.version = ''
00168         self.projects = []
00169         self.dblinks = []
00170         self.db_source = ''
00171         self.gi = ''
00172         self.keywords = []
00173         self.segment = ''
00174         self.source = ''
00175         self.organism = ''
00176         self.taxonomy = []
00177         self.references = []
00178         self.comment = ''
00179         self.features = []
00180         self.base_counts = ''
00181         self.origin = ''
00182         self.sequence = ''
00183         self.contig = ''
00184         self.primary=[]
00185         self.wgs = ''
00186         self.wgs_scafld = []


Member Function Documentation

def Bio.GenBank.Record.Record.__str__ (   self)
Provide a GenBank formatted output option for a Record.

The objective of this is to provide an easy way to read in a GenBank
record, modify it somehow, and then output it in 'GenBank format.'
We are striving to make this work so that a parsed Record that is
output using this function will look exactly like the original
record.

Much of the output is based on format description info at:

ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt

Definition at line 187 of file Record.py.

00187 
00188     def __str__(self):
00189         """Provide a GenBank formatted output option for a Record.
00190 
00191         The objective of this is to provide an easy way to read in a GenBank
00192         record, modify it somehow, and then output it in 'GenBank format.'
00193         We are striving to make this work so that a parsed Record that is
00194         output using this function will look exactly like the original
00195         record.
00196 
00197         Much of the output is based on format description info at:
00198 
00199         ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt
00200         """
00201         output = self._locus_line()
00202         output += self._definition_line()
00203         output += self._accession_line()
00204         output += self._version_line()
00205         output += self._project_line()
00206         output += self._dblink_line()
00207         output += self._nid_line()
00208         output += self._pid_line()
00209         output += self._keywords_line()
00210         output += self._db_source_line()
00211         output += self._segment_line()
00212         output += self._source_line()
00213         output += self._organism_line()
00214         for reference in self.references:
00215             output += str(reference)
00216         output += self._comment_line()
00217         output += self._features_line()
00218         for feature in self.features:
00219             output += str(feature)
00220         output += self._base_count_line()
00221         output += self._origin_line()
00222         output += self._sequence_line()
00223         output += self._wgs_line()
00224         output += self._wgs_scafld_line()
00225         output += self._contig_line()
00226         output += "//"
00227         return output
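
The concatenation pattern in __str__ can be sketched in isolation: each section method returns either formatted text or an empty string, so optional sections vanish from the output. MiniRecord below is a hypothetical stand-in with only two sections. It uses a plain "%-12s" column instead of the module's _wrapped_genbank helper, so it does not reproduce the real wrapping behaviour.

```python
class MiniRecord(object):
    """Hypothetical stand-in showing only the concatenation pattern."""

    def __init__(self):
        self.definition = ""
        self.nid = ""

    def _definition_line(self):
        # Always emitted, even when the definition is empty.
        return "%-12s%s\n" % ("DEFINITION", self.definition)

    def _nid_line(self):
        # Optional section: contributes nothing when the attribute is unset.
        if self.nid:
            return "%-12s%s\n" % ("NID", self.nid)
        return ""

    def __str__(self):
        output = self._definition_line()
        output += self._nid_line()
        output += "//"  # record terminator, as in the listing
        return output


rec = MiniRecord()
rec.definition = "Example sequence."
text = str(rec)
print(text)
```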
            

def Bio.GenBank.Record.Record._accession_line (   self) [private]
Output for the ACCESSION line.

Definition at line 268 of file Record.py.

00268 
00269     def _accession_line(self):
00270         """Output for the ACCESSION line.
00271         """
00272         if self.accession:
00273             output = Record.BASE_FORMAT % "ACCESSION"
00274 
00275             acc_info = ""
00276             for accession in self.accession:
00277                 acc_info += "%s " % accession
00278             # strip off an extra space at the end
00279             acc_info = acc_info.rstrip()
00280             output += _wrapped_genbank(acc_info, Record.GB_BASE_INDENT)
00281         else:
00282             output = ""
00283         
00284         return output

def Bio.GenBank.Record.Record._base_count_line (   self) [private]
Output for the BASE COUNT line with base information.

Definition at line 412 of file Record.py.

00412 
00413     def _base_count_line(self):
00414         """Output for the BASE COUNT line with base information.
00415         """
00416         output = ""
00417         if self.base_counts:
00418             output += Record.BASE_FORMAT % "BASE COUNT  "
00419             # split up the base counts into their individual parts
00420             count_parts = self.base_counts.split(" ")
00421             while '' in count_parts:
00422                 count_parts.remove('')
00423             # deal with the standard case, with a normal BASE COUNT line
00424             # like: 474 a    356 c    428 g    364 t
00425             if len(count_parts) % 2 == 0:
00426                 while len(count_parts) > 0:
00427                     count_info = count_parts.pop(0)
00428                     count_type = count_parts.pop(0)
00429 
00430                     output += "%7s %s" % (count_info, count_type)
00431             # deal with ugly BASE COUNT lines like:
00432             # 1311257 a2224835 c2190093 g1309889 t
00433             # by just outputting the raw information
00434             else:
00435                 output += self.base_counts
00436             output += "\n"
00437         return output
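
The token handling above can be tried without a Record instance. The function below is a minimal sketch: format_base_counts is a hypothetical name, the "BASE COUNT" label is left out, and str.split() with no argument stands in for the split(" ") plus remove('') loop (the two give the same tokens for space-separated input).

```python
def format_base_counts(base_counts):
    """Stand-in for the token handling in _base_count_line (label omitted)."""
    # split() with no argument drops the empty strings between runs of spaces.
    parts = base_counts.split()
    if len(parts) % 2 != 0:
        # Fused tokens such as "a2224835": fall back to the raw string.
        return base_counts + "\n"
    output = ""
    for i in range(0, len(parts), 2):
        count, base = parts[i], parts[i + 1]
        # Right-justify each count in 7 columns, followed by the base letter.
        output += "%7s %s" % (count, base)
    return output + "\n"


print(repr(format_base_counts("474 a    356 c    428 g    364 t")))
print(repr(format_base_counts("1311257 a2224835 c2190093 g1309889 t")))
```

An odd number of tokens is the signal that a count and a base letter have run together, which is why the raw string is passed through unchanged in that case.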

def Bio.GenBank.Record.Record._comment_line (   self) [private]
Output for the COMMENT lines.

Definition at line 393 of file Record.py.

00393 
00394     def _comment_line(self):
00395         """Output for the COMMENT lines.
00396         """
00397         output = ""
00398         if self.comment:
00399             output += Record.BASE_FORMAT % "COMMENT"
00400             output += _indent_genbank(self.comment,
00401                                       Record.GB_BASE_INDENT)
00402         return output

def Bio.GenBank.Record.Record._contig_line (   self) [private]
Output for CONTIG location information from RefSeq.

Definition at line 489 of file Record.py.

00489 
00490     def _contig_line(self):
00491         """Output for CONTIG location information from RefSeq.
00492         """
00493         output = ""
00494         if self.contig:
00495             output += Record.BASE_FORMAT % "CONTIG"
00496             output += _wrapped_genbank(self.contig,
00497                                        Record.GB_BASE_INDENT, split_char = ',')
00498         return output
00499         

def Bio.GenBank.Record.Record._db_source_line (   self) [private]
Output for DBSOURCE line.

Definition at line 350 of file Record.py.

00350 
00351     def _db_source_line(self):
00352         """Output for DBSOURCE line.
00353         """
00354         if self.db_source:
00355             output = Record.BASE_FORMAT % "DBSOURCE"
00356             output += "%s\n" % self.db_source
00357         else:
00358             output = ""
00359         return output

def Bio.GenBank.Record.Record._dblink_line (   self) [private]

Definition at line 304 of file Record.py.

00304 
00305     def _dblink_line(self):
00306         output = ""
00307         if len(self.dblinks) > 0:
00308             output = Record.BASE_FORMAT % "DBLINK"
00309             dblink_info = "\n".join(self.dblinks)
00310             output += _wrapped_genbank(dblink_info, Record.GB_BASE_INDENT)
00311         return output

def Bio.GenBank.Record.Record._definition_line (   self) [private]
Provide output for the DEFINITION line.

Definition at line 261 of file Record.py.

00261 
00262     def _definition_line(self):
00263         """Provide output for the DEFINITION line.
00264         """
00265         output = Record.BASE_FORMAT % "DEFINITION"
00266         output += _wrapped_genbank(self.definition, Record.GB_BASE_INDENT)
00267         return output

def Bio.GenBank.Record.Record._features_line (   self) [private]
Output for the FEATURES line.

Definition at line 403 of file Record.py.

00403 
00404     def _features_line(self):
00405         """Output for the FEATURES line.
00406         """
00407         output = ""
00408         if len(self.features) > 0:
00409             output += Record.BASE_FEATURE_FORMAT % "FEATURES"
00410             output += "Location/Qualifiers\n"
00411         return output

def Bio.GenBank.Record.Record._keywords_line (   self) [private]
Output for the KEYWORDS line.

Definition at line 332 of file Record.py.

00332 
00333     def _keywords_line(self):
00334         """Output for the KEYWORDS line.
00335         """
00336         output = ""
00337         if len(self.keywords) >= 0:
00338             output +=  Record.BASE_FORMAT % "KEYWORDS"
00339             keyword_info = ""
00340             for keyword in self.keywords:
00341                 keyword_info += "%s; " % keyword
00342             # replace the ; at the end with a period
00343             keyword_info = keyword_info[:-2]
00344             keyword_info += "."
00345             
00346             output += _wrapped_genbank(keyword_info,
00347                                        Record.GB_BASE_INDENT)
00348 
00349         return output
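
Note that the test len(self.keywords) >= 0 is always true for a list, so the KEYWORDS line is written even when there are no keywords, in which case its text is a lone period. The joining step can be sketched as follows. keyword_text is a hypothetical helper that uses "; ".join instead of the loop-and-slice in the listing, and it covers only the text passed on to _wrapped_genbank.

```python
def keyword_text(keywords):
    """Join keywords with '; ' and terminate the list with a period."""
    return "; ".join(keywords) + "."


print(keyword_text(["kinase", "membrane protein"]))
print(keyword_text([]))
```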

def Bio.GenBank.Record.Record._locus_line (   self) [private]
Provide the output string for the LOCUS line.

Definition at line 228 of file Record.py.

00228 
00229     def _locus_line(self):
00230         """Provide the output string for the LOCUS line.
00231         """
00232         output = "LOCUS"
00233         output += " " * 7 # 6-12 spaces
00234         output += "%-9s" % self.locus
00235         output += " " # 22 space
00236         output += "%7s" % self.size
00237         if self.residue_type.find("PROTEIN") >= 0:
00238             output += " aa"
00239         else:
00240             output += " bp "
00241 
00242         # treat circular types differently, since they'll have long residue
00243         # types
00244         if self.residue_type.find("circular") >= 0:
00245              output += "%17s" % self.residue_type
00246         # second case: ss-DNA types of records
00247         elif self.residue_type.find("-") >= 0:
00248             output += "%7s" % self.residue_type
00249             output += " " * 10 # spaces for circular
00250         else:
00251             output += " " * 3 # spaces for stuff like ss-
00252             output += "%-4s" % self.residue_type
00253             output += " " * 10 # spaces for circular
00254 
00255         output += " " * 2
00256         output += "%3s" % self.data_file_division
00257         output += " " * 7 # spaces for 56-63
00258         output += "%11s" % self.date
00259         output += "\n"
00260         return output
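
The column layout above is easier to follow as a standalone function. This is a sketch that mirrors the listing's branches. locus_line is a hypothetical name, and the sample values are taken from the attribute descriptions earlier on this page.

```python
def locus_line(locus, size, residue_type, division, date):
    """Build a LOCUS line using the same widths as the listing."""
    output = "LOCUS" + " " * 7
    output += "%-9s" % locus
    output += " "
    output += "%7s" % size
    output += " aa" if "PROTEIN" in residue_type else " bp "
    if "circular" in residue_type:
        # Long residue types take the whole 17-column field.
        output += "%17s" % residue_type
    elif "-" in residue_type:
        # Stranded types such as ss-DNA: 7 columns plus the circular field.
        output += "%7s" % residue_type + " " * 10
    else:
        # Plain types: blank strand prefix, 4-column type, circular field.
        output += " " * 3 + "%-4s" % residue_type + " " * 10
    output += " " * 2 + "%3s" % division
    output += " " * 7 + "%11s" % date
    return output + "\n"


line = locus_line("AB01234", "1200", "DNA", "PLN", "28-JUL-1998")
print(line)
```

All three residue-type branches fill the same 17 columns, which keeps the division and date fields aligned across records.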

def Bio.GenBank.Record.Record._nid_line (   self) [private]
Output for the NID line. Use of NID is obsolete in GenBank files.

Definition at line 312 of file Record.py.

00312 
00313     def _nid_line(self):
00314         """Output for the NID line. Use of NID is obsolete in GenBank files.
00315         """
00316         if self.nid:
00317             output = Record.BASE_FORMAT % "NID"
00318             output += "%s\n" % self.nid
00319         else:
00320             output = ""
00321         return output

def Bio.GenBank.Record.Record._organism_line (   self) [private]
Output for ORGANISM line with taxonomy info.

Definition at line 376 of file Record.py.

00376 
00377     def _organism_line(self):
00378         """Output for ORGANISM line with taxonomy info.
00379         """
00380         output = Record.INTERNAL_FORMAT % "ORGANISM"
00381         # Now that species names can be too long, this line can wrap (Bug 2591)
00382         output += _wrapped_genbank(self.organism, Record.GB_BASE_INDENT)
00383         output += " " * Record.GB_BASE_INDENT
00384         taxonomy_info = ""
00385         for tax in self.taxonomy:
00386             taxonomy_info += "%s; " % tax
00387         # replace the ; at the end with a period
00388         taxonomy_info = taxonomy_info[:-2]
00389         taxonomy_info += "."
00390         output += _wrapped_genbank(taxonomy_info, Record.GB_BASE_INDENT)
00391 
00392         return output
            

def Bio.GenBank.Record.Record._origin_line (   self) [private]
Output for the ORIGIN line

Definition at line 438 of file Record.py.

00438 
00439     def _origin_line(self):
00440         """Output for the ORIGIN line
00441         """
00442         output = ""
00443         # only output the ORIGIN line if we have a sequence
00444         if self.sequence:
00445             output += Record.BASE_FORMAT % "ORIGIN"
00446             if self.origin:
00447                 output += _wrapped_genbank(self.origin,
00448                                            Record.GB_BASE_INDENT)
00449             else:
00450                 output += "\n"
00451         return output

def Bio.GenBank.Record.Record._pid_line (   self) [private]
Output for PID line. Presumably, PID usage is also obsolete.

Definition at line 322 of file Record.py.

00322 
00323     def _pid_line(self):
00324         """Output for PID line. Presumably, PID usage is also obsolete.
00325         """
00326         if self.pid:
00327             output = Record.BASE_FORMAT % "PID"
00328             output += "%s\n" % self.pid
00329         else:
00330             output = ""
00331         return output

def Bio.GenBank.Record.Record._project_line (   self) [private]

Definition at line 297 of file Record.py.

00297 
00298     def _project_line(self):
00299         output = ""
00300         if len(self.projects) > 0:
00301             output = Record.BASE_FORMAT % "PROJECT"
00302             output += "%s\n" % "  ".join(self.projects)
00303         return output

def Bio.GenBank.Record.Record._segment_line (   self) [private]
Output for the SEGMENT line.

Definition at line 360 of file Record.py.

00360 
00361     def _segment_line(self):
00362         """Output for the SEGMENT line.
00363         """
00364         output = ""
00365         if self.segment:
00366             output += Record.BASE_FORMAT % "SEGMENT"
00367             output += _wrapped_genbank(self.segment, Record.GB_BASE_INDENT)
00368         return output

def Bio.GenBank.Record.Record._sequence_line (   self) [private]
Output for all of the sequence.

Definition at line 452 of file Record.py.

00452 
00453     def _sequence_line(self):
00454         """Output for all of the sequence.
00455         """
00456         output = ""
00457         if self.sequence:
00458             cur_seq_pos = 0
00459             while cur_seq_pos < len(self.sequence):
00460                 output += Record.SEQUENCE_FORMAT % str(cur_seq_pos + 1)
00461 
00462                 for section in range(6):
00463                     start_pos = cur_seq_pos + section * 10
00464                     end_pos = start_pos + 10
00465                     seq_section = self.sequence[start_pos:end_pos]
00466                     output += " %s" % seq_section.lower()
00467 
00468                     # stop looping if we are out of sequence
00469                     if end_pos > len(self.sequence):
00470                         break
00471                 
00472                 output += "\n"
00473                 cur_seq_pos += 60
00474         return output
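
The layout produced above is 60 bases per line in six blocks of 10, each line prefixed with the 1-based position of its first base. The sketch below reproduces that layout in a standalone function. sequence_lines is a hypothetical name, and the width of the position column is assumed to be GB_SEQUENCE_INDENT (9), since SEQUENCE_FORMAT is truncated in this listing. Unlike the listing, this version does not append a trailing space when a short final line ends exactly on a 10-base boundary.

```python
GB_SEQUENCE_INDENT = 9
# Assumption: the truncated "%" is completed with the indent width and "s".
SEQUENCE_FORMAT = "%" + str(GB_SEQUENCE_INDENT) + "s"


def sequence_lines(sequence):
    """Format a sequence as numbered lines of six 10-base blocks."""
    output = ""
    for line_start in range(0, len(sequence), 60):
        # Right-justified 1-based position of the first base on this line.
        output += SEQUENCE_FORMAT % str(line_start + 1)
        line = sequence[line_start:line_start + 60]
        for block_start in range(0, len(line), 10):
            output += " %s" % line[block_start:block_start + 10].lower()
        output += "\n"
    return output


formatted = sequence_lines("ACGT" * 18)
print(formatted)
```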

def Bio.GenBank.Record.Record._source_line (   self) [private]
Output for SOURCE line on where the sample came from.

Definition at line 369 of file Record.py.

00369 
00370     def _source_line(self):
00371         """Output for SOURCE line on where the sample came from.
00372         """
00373         output = Record.BASE_FORMAT % "SOURCE"
00374         output += _wrapped_genbank(self.source, Record.GB_BASE_INDENT)
00375         return output
    

def Bio.GenBank.Record.Record._version_line (   self) [private]
Output for the VERSION line.

Definition at line 285 of file Record.py.

00285 
00286     def _version_line(self):
00287         """Output for the VERSION line.
00288         """
00289         if self.version:
00290             output = Record.BASE_FORMAT % "VERSION"
00291             output += self.version
00292             output += "  GI:"
00293             output += "%s\n" % self.gi
00294         else:
00295             output = ""
00296         return output

def Bio.GenBank.Record.Record._wgs_line (   self) [private]

Definition at line 475 of file Record.py.

00475 
00476     def _wgs_line(self):
00477             output = ""
00478             if self.wgs:
00479                     output += Record.BASE_FORMAT % "WGS"
00480                     output += self.wgs
00481             return output

def Bio.GenBank.Record.Record._wgs_scafld_line (   self) [private]

Definition at line 482 of file Record.py.

00482 
00483     def _wgs_scafld_line(self):
00484             output = ""
00485             if self.wgs_scafld:
00486                     output += Record.BASE_FORMAT % "WGS_SCAFLD"
00487                     output += self.wgs_scafld
00488             return output
        


Member Data Documentation

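Bio.GenBank.Record.Record.accession

Definition at line 163 of file Record.py.

Bio.GenBank.Record.Record.base_counts

Definition at line 179 of file Record.py.

string Bio.GenBank.Record.Record.BASE_FEATURE_FORMAT = "%-" [static]

Definition at line 150 of file Record.py.

string Bio.GenBank.Record.Record.BASE_FORMAT = "%-" [static]

Definition at line 143 of file Record.py.

Bio.GenBank.Record.Record.comment

Definition at line 177 of file Record.py.

Bio.GenBank.Record.Record.contig

Definition at line 182 of file Record.py.

Bio.GenBank.Record.Record.data_file_division

Definition at line 160 of file Record.py.

Bio.GenBank.Record.Record.date

Definition at line 161 of file Record.py.

Bio.GenBank.Record.Record.db_source

Definition at line 169 of file Record.py.

Bio.GenBank.Record.Record.dblinks

Definition at line 168 of file Record.py.

Bio.GenBank.Record.Record.definition

Definition at line 162 of file Record.py.

Bio.GenBank.Record.Record.features

Definition at line 178 of file Record.py.

int Bio.GenBank.Record.Record.GB_BASE_INDENT = 12 [static]

Definition at line 136 of file Record.py.

int Bio.GenBank.Record.Record.GB_FEATURE_INDENT = 21 [static]

Definition at line 137 of file Record.py.

int Bio.GenBank.Record.Record.GB_FEATURE_INTERNAL_INDENT = 5 [static]

Definition at line 140 of file Record.py.

int Bio.GenBank.Record.Record.GB_INTERNAL_INDENT = 2 [static]

Definition at line 138 of file Record.py.

int Bio.GenBank.Record.Record.GB_LINE_LENGTH = 79 [static]

Definition at line 135 of file Record.py.

int Bio.GenBank.Record.Record.GB_OTHER_INTERNAL_INDENT = 3 [static]

Definition at line 139 of file Record.py.

int Bio.GenBank.Record.Record.GB_SEQUENCE_INDENT = 9 [static]

Definition at line 141 of file Record.py.

Bio.GenBank.Record.Record.gi

Definition at line 170 of file Record.py.

string Bio.GenBank.Record.Record.INTERNAL_FEATURE_FORMAT = " " [static]

Definition at line 151 of file Record.py.

string Bio.GenBank.Record.Record.INTERNAL_FORMAT = " " [static]

Definition at line 144 of file Record.py.

Bio.GenBank.Record.Record.keywords

Definition at line 171 of file Record.py.

Bio.GenBank.Record.Record.locus

Definition at line 157 of file Record.py.

Bio.GenBank.Record.Record.nid

Definition at line 164 of file Record.py.

Bio.GenBank.Record.Record.organism

Definition at line 174 of file Record.py.

Bio.GenBank.Record.Record.origin

Definition at line 180 of file Record.py.

string Bio.GenBank.Record.Record.OTHER_INTERNAL_FORMAT = " " [static]

Definition at line 146 of file Record.py.

Bio.GenBank.Record.Record.pid

Definition at line 165 of file Record.py.

Bio.GenBank.Record.Record.primary

Definition at line 183 of file Record.py.

Bio.GenBank.Record.Record.projects

Definition at line 167 of file Record.py.

Bio.GenBank.Record.Record.references

Definition at line 176 of file Record.py.

Bio.GenBank.Record.Record.residue_type

Definition at line 159 of file Record.py.

Bio.GenBank.Record.Record.segment

Definition at line 172 of file Record.py.

Bio.GenBank.Record.Record.sequence

Definition at line 181 of file Record.py.

string Bio.GenBank.Record.Record.SEQUENCE_FORMAT = "%" [static]

Definition at line 154 of file Record.py.

Bio.GenBank.Record.Record.size

Definition at line 158 of file Record.py.

Bio.GenBank.Record.Record.source

Definition at line 173 of file Record.py.

Bio.GenBank.Record.Record.taxonomy

Definition at line 175 of file Record.py.

Bio.GenBank.Record.Record.version

Definition at line 166 of file Record.py.

Bio.GenBank.Record.Record.wgs

Definition at line 184 of file Record.py.

Bio.GenBank.Record.Record.wgs_scafld

Definition at line 185 of file Record.py.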


The documentation for this class was generated from the following file: