python-biopython  1.60
Bio.SeqIO.FastaIO Namespace Reference

Classes

class  FastaWriter

Functions

def FastaIterator
def genbank_name_function
def print_record

Variables

string fna_filename = "NC_005213.fna"
string faa_filename = "NC_005213.faa"
tuple iterator = FastaIterator(open(fna_filename, "r"), alphabet=generic_nucleotide, title2ids=genbank_name_function)
int count = 0

Function Documentation

def Bio.SeqIO.FastaIO.FastaIterator (handle, alphabet = single_letter_alphabet, title2ids = None)
Generator function to iterate over Fasta records (as SeqRecord objects).

handle - input file handle
alphabet - optional alphabet
title2ids - A function that, when given the title line of a FASTA
record (without the leading ">"), returns the id, name and
description (in that order) for the record as a tuple of strings.

If title2ids is not given, the entire title line is used as the
description, and its first word as both the id and the name.

Note that use of title2ids matches that of Bio.Fasta.SequenceParser
but the defaults are slightly different.
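
The title handling described above can be tried without Biopython installed. What follows is a minimal Python 3 sketch, not the library's implementation: `iter_fasta` and `_make_record` are hypothetical stand-ins that yield plain tuples instead of SeqRecord objects, and `split_title` is a made-up title2ids callback used only for illustration.

```python
import io


def _make_record(title, chunks, title2ids):
    """Build an (id, name, description, sequence) tuple for one record."""
    if title2ids:
        # The callback receives the title without the leading ">"
        rec_id, name, descr = title2ids(title)
    else:
        # Default: whole title is the description, first word is id and name
        descr = title
        words = descr.split()
        rec_id = words[0] if words else ""
        name = rec_id
    # Drop internal spaces and stray carriage returns from the sequence
    seq = "".join(chunks).replace(" ", "").replace("\r", "")
    return rec_id, name, descr, seq


def iter_fasta(handle, title2ids=None):
    """Yield one tuple per FASTA record (hypothetical simplified stand-in)."""
    title = None
    chunks = []
    for line in handle:
        if line.startswith(">"):
            if title is not None:
                yield _make_record(title, chunks, title2ids)
            title = line[1:].rstrip()
            chunks = []
        elif title is not None:
            chunks.append(line.rstrip())
        # Any text before the first ">" line is skipped
    if title is not None:
        yield _make_record(title, chunks, title2ids)


def split_title(title):
    """Made-up callback: first word is the id, upper-cased as the name."""
    parts = title.split(None, 1)
    descr = parts[1] if len(parts) > 1 else ""
    return parts[0], parts[0].upper(), descr


EXAMPLE = ">seq1 first example\nACGT\nAC GT\n>seq2\nTTTT\n"

default_records = list(iter_fasta(io.StringIO(EXAMPLE)))
custom_records = list(iter_fasta(io.StringIO(EXAMPLE), title2ids=split_title))
```

With the default behaviour the description repeats the id because it is the whole title line; with the callback the three fields are whatever the callback returns.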

Definition at line 20 of file FastaIO.py.

def FastaIterator(handle, alphabet = single_letter_alphabet, title2ids = None):
    """Generator function to iterate over Fasta records (as SeqRecord objects).

    handle - input file
    alphabet - optional alphabet
    title2ids - A function that, when given the title of the FASTA
    file (without the beginning >), will return the id, name and
    description (in that order) for the record as a tuple of strings.

    If this is not given, then the entire title line will be used
    as the description, and the first word as the id and name.

    Note that use of title2ids matches that of Bio.Fasta.SequenceParser
    but the defaults are slightly different.
    """
    #Skip any text before the first record (e.g. blank lines, comments)
    while True:
        line = handle.readline()
        if line == "" : return #Premature end of file, or just empty?
        if line[0] == ">":
            break

    while True:
        if line[0]!=">":
            raise ValueError("Records in Fasta files should start with '>' character")
        if title2ids:
            id, name, descr = title2ids(line[1:].rstrip())
        else:
            descr = line[1:].rstrip()
            try:
                id = descr.split()[0]
            except IndexError:
                assert not descr, repr(line)
                #Should we use SeqRecord default for no ID?
                id = ""
            name = id

        lines = []
        line = handle.readline()
        while True:
            if not line : break
            if line[0] == ">": break
            lines.append(line.rstrip())
            line = handle.readline()

        #Remove trailing whitespace, and any internal spaces
        #(and any embedded \r which are possible in mangled files
        #when not opened in universal read lines mode)
        result = "".join(lines).replace(" ", "").replace("\r", "")

        #Return the record and then continue...
        yield SeqRecord(Seq(result, alphabet),
                         id = id, name = name, description = descr)

        if not line : return #StopIteration
    assert False, "Should not reach this line"

def Bio.SeqIO.FastaIO.genbank_name_function (text)

Definition at line 165 of file FastaIO.py.

    def genbank_name_function(text):
        text, descr = text.split(None,1)
        id = text.split("|")[3]
        name = id.split(".",1)[0]
        return id, name, descr
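
To make the splitting concrete, here is the same logic as a standalone Python 3 sketch applied to an NCBI-style title. The title string is invented for illustration (the GI number and description are placeholders), and the local variable names differ from the listing only to avoid shadowing the built-in `id`.

```python
def genbank_name_function(text):
    """Split an NCBI-style FASTA title into (id, name, description)."""
    # First whitespace-delimited word holds the pipe-separated identifiers
    first_word, descr = text.split(None, 1)
    # Fourth pipe-separated field is the versioned accession
    record_id = first_word.split("|")[3]
    # Name is the accession without its version suffix
    name = record_id.split(".", 1)[0]
    return record_id, name, descr


# Placeholder title: the GI number and description are made up
title = "gi|12345|ref|NC_005213.1| example genome description"
record_id, name, descr = genbank_name_function(title)
```

Here `record_id` is the versioned accession, `name` is the accession with the version removed, and `descr` is everything after the first whitespace. The function assumes this exact layout: a title with no whitespace raises ValueError at the first split, and a first word with fewer than four pipe-separated fields raises IndexError.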

def Bio.SeqIO.FastaIO.print_record (record)

Definition at line 171 of file FastaIO.py.

    def print_record(record):
        #See also bug 2057
        #http://bugzilla.open-bio.org/show_bug.cgi?id=2057
        print "ID:" + record.id
        print "Name:" + record.name
        print "Descr:" + record.description
        print record.seq
        for feature in record.annotations:
            print '/%s=%s' % (feature, record.annotations[feature])
        if record.dbxrefs:
            print "Database cross references:"
            for x in record.dbxrefs : print " - %s" % x
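
The listing above uses Python 2 print statements. As a rough Python 3 counterpart, the sketch below builds the same report as a string so it can be printed or inspected. `format_record` is a hypothetical helper, not part of Bio.SeqIO.FastaIO, and the demo object is a `SimpleNamespace` with invented values standing in for a SeqRecord.

```python
from types import SimpleNamespace


def format_record(record):
    """Return the text that print_record would print, one field per line."""
    lines = [
        "ID:" + record.id,
        "Name:" + record.name,
        "Descr:" + record.description,
        str(record.seq),
    ]
    for key in record.annotations:
        lines.append("/%s=%s" % (key, record.annotations[key]))
    if record.dbxrefs:
        lines.append("Database cross references:")
        for xref in record.dbxrefs:
            lines.append(" - %s" % xref)
    return "\n".join(lines)


# Stand-in object with the attributes the function reads; values are made up
demo = SimpleNamespace(
    id="NC_005213.1",
    name="NC_005213",
    description="example description",
    seq="ACGT",
    annotations={"source": "example"},
    dbxrefs=["Example:1"],
)
report = format_record(demo)
```

Calling `print(report)` gives the same layout as the original function: the three labelled fields, the sequence, one `/key=value` line per annotation, and the cross-reference list only when it is non-empty.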


Variable Documentation

int Bio.SeqIO.FastaIO.count = 0

Definition at line 188 of file FastaIO.py.

string Bio.SeqIO.FastaIO.faa_filename = "NC_005213.faa"

Definition at line 163 of file FastaIO.py.

string Bio.SeqIO.FastaIO.fna_filename = "NC_005213.fna"

Definition at line 162 of file FastaIO.py.

tuple Bio.SeqIO.FastaIO.iterator = FastaIterator(open(fna_filename, "r"), alphabet=generic_nucleotide, title2ids=genbank_name_function)

Definition at line 187 of file FastaIO.py.