
python-biopython  1.60
Bio.SeqIO._index.FastqRandomAccess Class Reference


Public Member Functions

def __iter__
def get_raw
def get

Detailed Description

Random access to a FASTQ file (any supported variant).

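Every FASTQ record starts with an "@" line, but a quality line can also begin with "@", so that character alone does not mark a record boundary.
The indexer therefore reads quality lines until their total length matches the sequence length, which also lets it cope with line-wrapped FASTQ files.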
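
The point about "@" can be illustrated with a minimal sketch. The helper name iter_fastq_records and the sample data are assumptions made for this example and are not part of Biopython; the reader works on a text-mode handle and counts quality characters against the sequence length instead of trusting a leading "@".

```python
import io


def iter_fastq_records(handle):
    """Yield (identifier, sequence, quality) from a text-mode FASTQ handle.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    line = handle.readline()
    while line:
        if not line.startswith("@"):
            raise ValueError("Expected '@' header line, got %r" % line)
        identifier = line[1:].split(None, 1)[0]
        # Sequence lines run until the '+' separator line.
        seq_parts = []
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = handle.readline()
        if not line:
            raise ValueError("Premature end of file in sequence section")
        seq = "".join(seq_parts)
        # Read quality lines until they hold as many characters as the
        # sequence, so a quality line starting with '@' is never mistaken
        # for the next header.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            line = handle.readline()
            if not line:
                raise ValueError("Premature end of file in quality section")
            qual_parts.append(line.strip())
            qual_len += len(line.strip())
        if qual_len != len(seq):
            raise ValueError("Sequence and quality lengths differ")
        yield identifier, seq, "".join(qual_parts)
        line = handle.readline()


# Two records: the first is line-wrapped, and both contain a quality line
# that begins with '@'.
EXAMPLE = (
    "@read1 wrapped\n"
    "ACGTAC\n"
    "GT\n"
    "+\n"
    "IIIIII\n"
    "@I\n"
    "@read2\n"
    "AC\n"
    "+\n"
    "@@\n"
)

records = list(iter_fastq_records(io.StringIO(EXAMPLE)))
# Counting lines that start with '@' overestimates the number of records.
naive_count = sum(1 for text in EXAMPLE.splitlines() if text.startswith("@"))
```

Here the naive count sees four "@" lines, while the length-aware reader finds the two records that are really present.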

Definition at line 988 of file _index.py.


Member Function Documentation

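def Bio.SeqIO._index.FastqRandomAccess.__iter__(self)
Yields (id, offset, length) tuples, one per record: the record identifier, the offset at which the record starts in the file, and the length of the raw record.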

Reimplemented from Bio.SeqIO._index.SeqFileRandomAccess.

Definition at line 994 of file _index.py.

    def __iter__(self):
        handle = self._handle
        handle.seek(0)
        id = None
        start_offset = handle.tell()
        line = handle.readline()
        if not line:
            #Empty file!
            return
        at_char = _as_bytes("@")
        plus_char = _as_bytes("+")
        if line[0:1] != at_char:
            raise ValueError("Problem with FASTQ @ line:\n%s" % repr(line))
        while line:
            #assert line[0]=="@"
            #This record seems OK (so far)
            id = line[1:].rstrip().split(None, 1)[0]
            #Find the seq line(s)
            seq_len = 0
            length = len(line)
            while line:
                line = handle.readline()
                length += len(line)
                if line.startswith(plus_char):
                    break
                seq_len += len(line.strip())
            if not line:
                raise ValueError("Premature end of file in seq section")
            #assert line[0]=="+"
            #Find the qual line(s)
            qual_len = 0
            while line:
                if seq_len == qual_len:
                    #Should be end of record...
                    end_offset = handle.tell()
                    line = handle.readline()
                    if line and line[0:1] != at_char:
                        #The original built this exception without raising it
                        raise ValueError("Problem with line %s" % repr(line))
                    break
                else:
                    line = handle.readline()
                    qual_len += len(line.strip())
                    length += len(line)
            if seq_len != qual_len:
                raise ValueError("Problem with quality section")
            yield _bytes_to_string(id), start_offset, length
            start_offset = end_offset
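
How the yielded tuples support random access can be sketched as follows. This is a simplified, self-contained illustration and not the Biopython implementation: scan_fastq_offsets and fetch_raw are hypothetical names, the file is an in-memory io.BytesIO, and identifiers are assumed to be ASCII.

```python
import io


def scan_fastq_offsets(handle):
    """Yield (identifier, offset, length) per record of a binary FASTQ handle.

    Hypothetical helper; a simplified take on the scan shown above.
    """
    handle.seek(0)
    while True:
        start = handle.tell()
        line = handle.readline()
        if not line:
            return
        if line[0:1] != b"@":
            raise ValueError("Problem with FASTQ @ line: %r" % line)
        identifier = line[1:].split(None, 1)[0].decode("ascii")
        # Count sequence characters up to the '+' separator line.
        seq_len = 0
        line = handle.readline()
        while line and not line.startswith(b"+"):
            seq_len += len(line.strip())
            line = handle.readline()
        if not line:
            raise ValueError("Premature end of file in seq section")
        # Consume quality lines until their length matches the sequence.
        qual_len = 0
        while qual_len < seq_len:
            line = handle.readline()
            if not line:
                raise ValueError("Premature end of file in quality section")
            qual_len += len(line.strip())
        if qual_len != seq_len:
            raise ValueError("Problem with quality section")
        yield identifier, start, handle.tell() - start


def fetch_raw(handle, offset, length):
    """Return the raw bytes of one record given its offset and length."""
    handle.seek(offset)
    return handle.read(length)


DATA = b"@r1\nACGT\n+\nIIII\n@r2 desc\nGG\nTT\n+\n@@\n@@\n"
handle = io.BytesIO(DATA)

# Finish the scan before seeking elsewhere, since the scanner relies on
# the handle position between records.
index = dict(
    (ident, (offset, length))
    for ident, offset, length in scan_fastq_offsets(handle)
)
raw_r2 = fetch_raw(handle, *index["r2"])
```

Storing the length next to the offset means a record can later be read back with a single seek and read, without re-parsing it.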


def Bio.SeqIO._index.SeqFileRandomAccess.get(self, offset) [inherited]
Returns SeqRecord.

Reimplemented in Bio.SeqIO._index.UniprotRandomAccess, Bio.SeqIO._index.SffTrimedRandomAccess, and Bio.SeqIO._index.SffRandomAccess.

Definition at line 540 of file _index.py.

    def get(self, offset):
        """Returns SeqRecord."""
        #Should be overridden for binary file formats etc:
        return self._parse(StringIO(_bytes_to_string(self.get_raw(offset))))
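
The composition used by get (fetch the raw record, convert it to text, wrap it in an in-memory handle, and hand it to the format parser) can be sketched in isolation. The names below are hypothetical stand-ins: parse_four_line_fastq plays the role of self._parse and returns a plain dict rather than a SeqRecord, and the decoding step assumes that _bytes_to_string turns bytes into text.

```python
from io import StringIO


def parse_four_line_fastq(text_handle):
    """Hypothetical stand-in for the format-specific parser (self._parse).

    Handles only unwrapped, four-line records and returns a plain dict.
    """
    header = text_handle.readline().rstrip()
    seq = text_handle.readline().strip()
    text_handle.readline()  # skip the '+' separator line
    qual = text_handle.readline().strip()
    return {"id": header[1:].split(None, 1)[0], "seq": seq, "qual": qual}


def get_record(raw_bytes, parse=parse_four_line_fastq, encoding="ascii"):
    """Decode one raw record and parse it through an in-memory text handle."""
    return parse(StringIO(raw_bytes.decode(encoding)))


record = get_record(b"@r1 first read\nACGT\n+\nIIII\n")
```

Because the parser only ever sees a handle containing one record, the same parsing code can serve both sequential reading and indexed access.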


def Bio.SeqIO._index.FastqRandomAccess.get_raw(self, offset)
Similar to the get method, but returns the record as a raw string.

Reimplemented from Bio.SeqIO._index.SeqFileRandomAccess.

Definition at line 1042 of file _index.py.

    def get_raw(self, offset):
        """Similar to the get method, but returns the record as a raw string."""
        #TODO - Refactor this and the __iter__ method to reduce code duplication?
        handle = self._handle
        handle.seek(offset)
        line = handle.readline()
        data = line
        at_char = _as_bytes("@")
        plus_char = _as_bytes("+")
        if line[0:1] != at_char:
            raise ValueError("Problem with FASTQ @ line:\n%s" % repr(line))
        identifier = line[1:].rstrip().split(None, 1)[0]
        #Find the seq line(s)
        seq_len = 0
        while line:
            line = handle.readline()
            data += line
            if line.startswith(plus_char):
                break
            seq_len += len(line.strip())
        if not line:
            raise ValueError("Premature end of file in seq section")
        assert line[0:1] == plus_char
        #Find the qual line(s)
        qual_len = 0
        while line:
            if seq_len == qual_len:
                #Should be end of record...
                pos = handle.tell()
                line = handle.readline()
                if line and line[0:1] != at_char:
                    #The original built this exception without raising it
                    raise ValueError("Problem with line %s" % repr(line))
                break
            else:
                line = handle.readline()
                data += line
                qual_len += len(line.strip())
        if seq_len != qual_len:
            raise ValueError("Problem with quality section")
        return data



The documentation for this class was generated from the file _index.py.