python-biopython  1.60
Bio.Sequencing.Phd Namespace Reference

Classes

class Record

Functions

def read
def parse

Variables

list CKEYWORDS

Function Documentation

def Bio.Sequencing.Phd.parse(handle)
Iterates over a file returning multiple PHD records.

The data is read line by line from the handle. The handle can be a list
of lines, an open file, or similar; the only requirement is that we can
iterate over the handle to retrieve lines from it.

Typical usage:

records = parse(handle)
for record in records:
    # do something with the record object, e.g. print its file name
    print(record.file_name)

Definition at line 122 of file Phd.py.

def parse(handle):
    """Iterates over a file returning multiple PHD records.

    The data is read line by line from the handle. The handle can be a list
    of lines, an open file, or similar; the only requirement is that we can
    iterate over the handle to retrieve lines from it.

    Typical usage:

    records = parse(handle)
    for record in records:
        # do something with the record object
    """
    while True:
        record = read(handle)
        if not record:
            return
        yield record
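
The generator pattern used by parse can be tried without a PHD file. The following is a minimal sketch, not the library's code: read_block and iter_blocks are hypothetical stand-ins for read and parse that only track the BEGIN_SEQUENCE and END_SEQUENCE markers. It also illustrates one practical point about the handle: because the reader uses several consecutive for loops, the handle should be an iterator that remembers its position (an open file, a StringIO, or iter(list_of_lines)), since looping over a plain list would restart from the first line each time.

```python
import io


def read_block(handle):
    """Return the name on the next BEGIN_SEQUENCE line, or None at end of input.

    Hypothetical stand-in for read(): it only skips ahead to END_SEQUENCE.
    """
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            name = line[15:].rstrip()
            break
    else:
        return None  # input exhausted, no further record

    for line in handle:  # resumes where the previous loop stopped
        if line.startswith("END_SEQUENCE"):
            return name
    raise ValueError("Failed to find END_SEQUENCE line")


def iter_blocks(handle):
    """Yield one result per record until the reader reports exhaustion."""
    while True:
        name = read_block(handle)
        if name is None:
            return
        yield name


text = (
    "BEGIN_SEQUENCE first.ab1\n"
    "END_SEQUENCE\n"
    "\n"
    "BEGIN_SEQUENCE second.ab1\n"
    "END_SEQUENCE\n"
)

# A file-like object is its own iterator, so it can be passed directly.
names_from_file = list(iter_blocks(io.StringIO(text)))

# A list of lines is wrapped in iter() so that its position is preserved.
names_from_list = list(iter_blocks(iter(text.splitlines(True))))
```

Both calls collect the same two names, which is the behaviour the while True / yield loop in parse relies on.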

def Bio.Sequencing.Phd.read(handle)
Reads the next PHD record from the file, returning it as a Record object.

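This function reads PHD file data line by line from the handle,
and returns a single Record object. If no BEGIN_SEQUENCE line is
left in the handle it returns None; if a record has started but an
expected section marker (BEGIN_COMMENT, END_COMMENT, BEGIN_DNA or
END_SEQUENCE) is missing, or a DNA line is malformed, it raises
ValueError.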

Definition at line 38 of file Phd.py.

def read(handle):
    """Reads the next PHD record from the file, returning it as a Record object.

    This function reads PHD file data line by line from the handle,
    and returns a single Record object.
    """
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            record = Record()
            record.file_name = line[15:].rstrip()
            break
    else:
        return # No record found

    for line in handle:
        if line.startswith("BEGIN_COMMENT"):
            break
    else:
        raise ValueError("Failed to find BEGIN_COMMENT line")

    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line == "END_COMMENT":
            break
        keyword, value = line.split(":", 1)
        keyword = keyword.lower()
        value = value.strip()
        if keyword in ('chromat_file',
                       'phred_version',
                       'call_method',
                       'chem',
                       'dye',
                       'time',
                       'basecaller_version',
                       'trace_processor_version'):
            record.comments[keyword] = value
        elif keyword in ('abi_thumbprint',
                         'quality_levels',
                         'trace_array_min_index',
                         'trace_array_max_index'):
            record.comments[keyword] = int(value)
        elif keyword == 'trace_peak_area_ratio':
            record.comments[keyword] = float(value)
        elif keyword == 'trim':
            first, last, prob = value.split()
            record.comments[keyword] = (int(first), int(last), float(prob))
    else:
        raise ValueError("Failed to find END_COMMENT line")

    for line in handle:
        if line.startswith('BEGIN_DNA'):
            break
    else:
        raise ValueError("Failed to find BEGIN_DNA line")

    for line in handle:
        if line.startswith('END_DNA'):
            break
        else:
            # Line is: "site quality peak_location"
            # Peak location is optional according to
            # David Gordon (the Consed author)
            parts = line.split()
            if len(parts) in [2, 3]:
                record.sites.append(tuple(parts))
            else:
                raise ValueError("DNA line must contain a base and quality "
                                 "score, and optionally a peak location.")

    for line in handle:
        if line.startswith("END_SEQUENCE"):
            break
    else:
        raise ValueError("Failed to find END_SEQUENCE line")

    record.seq = Seq.Seq(''.join([n[0] for n in record.sites]), generic_dna)
    if record.comments['trim'] is not None:
        first, last = record.comments['trim'][:2]
        record.seq_trimmed = record.seq[first:last]

    return record
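
The comment-block handling in read can be isolated into a small function to see how each keyword's value is converted. This is a sketch under stated assumptions: parse_comment_line and INT_KEYS are hypothetical names, the sample lines are made up for illustration, and unlike read (which silently ignores unrecognised keywords) the sketch returns every other value as a string.

```python
# Keywords whose values read() converts with int().
INT_KEYS = {
    "abi_thumbprint",
    "quality_levels",
    "trace_array_min_index",
    "trace_array_max_index",
}


def parse_comment_line(line):
    """Split a 'KEYWORD: value' line and convert the value by keyword."""
    # maxsplit=1 keeps any further colons (e.g. in a time stamp) in the value.
    keyword, value = line.strip().split(":", 1)
    keyword = keyword.lower()
    value = value.strip()
    if keyword in INT_KEYS:
        return keyword, int(value)
    if keyword == "trace_peak_area_ratio":
        return keyword, float(value)
    if keyword == "trim":
        first, last, prob = value.split()
        return keyword, (int(first), int(last), float(prob))
    # Simplification: everything else is kept as a plain string.
    return keyword, value


# Made-up sample lines in the "KEYWORD: value" layout that read() expects.
sample = [
    "TIME: Thu Sep 12 15:42:21 2002",
    "QUALITY_LEVELS: 99",
    "TRACE_PEAK_AREA_RATIO: 0.1467",
    "TRIM: 3 39 0.1000",
]
comments = dict(parse_comment_line(line) for line in sample)
```

The TIME line shows why the split is limited to the first colon: the time stamp itself contains colons and would otherwise be broken into several pieces. The first two items of the trim tuple are the ones read uses as slice bounds for seq_trimmed.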

Variable Documentation

list Bio.Sequencing.Phd.CKEYWORDS

Initial value:
['CHROMAT_FILE','ABI_THUMBPRINT','PHRED_VERSION','CALL_METHOD',
 'QUALITY_LEVELS','TIME','TRACE_ARRAY_MIN_INDEX','TRACE_ARRAY_MAX_INDEX',
 'TRIM','TRACE_PEAK_AREA_RATIO','CHEM','DYE']

Definition at line 22 of file Phd.py.
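
The relationship between CKEYWORDS and the lower-case keys stored by read can be checked with a few lines. This is a sketch resting on an assumption: the Record class is not shown on this page, and the test record.comments['trim'] is not None in read suggests that the comments dictionary is pre-filled with a None entry per keyword; default_comments below models that guess, and handled_by_read is a hypothetical name for the keywords listed in read's comment loop.

```python
CKEYWORDS = ['CHROMAT_FILE', 'ABI_THUMBPRINT', 'PHRED_VERSION', 'CALL_METHOD',
             'QUALITY_LEVELS', 'TIME', 'TRACE_ARRAY_MIN_INDEX',
             'TRACE_ARRAY_MAX_INDEX', 'TRIM', 'TRACE_PEAK_AREA_RATIO',
             'CHEM', 'DYE']

# Assumption: comment keys are the lower-cased keywords, defaulting to None.
default_comments = {keyword.lower(): None for keyword in CKEYWORDS}

# Keywords that the comment loop in read() stores.
handled_by_read = {
    'chromat_file', 'phred_version', 'call_method', 'chem', 'dye', 'time',
    'basecaller_version', 'trace_processor_version',
    'abi_thumbprint', 'quality_levels',
    'trace_array_min_index', 'trace_array_max_index',
    'trace_peak_area_ratio', 'trim',
}

# Keywords stored by read() that have no entry in CKEYWORDS.
extra = sorted(handled_by_read - set(default_comments))
```

Here extra contains basecaller_version and trace_processor_version, so under the assumption above those two keys would appear in a record's comments only when the file actually provides them.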