python-biopython  1.60
Bio.SeqIO._index._IndexedSeqFileDict Class Reference

Public Member Functions

def __init__
def __repr__
def __str__
def __contains__
def __len__
def values
def items
def keys
def itervalues
def iteritems
def iterkeys
def items
def values
def keys
def __iter__
def __getitem__
def get
def get_raw
def __setitem__
def update
def pop
def popitem
def clear
def fromkeys
def copy

Private Attributes

 _proxy
 _key_function
 _offsets

Detailed Description

Read-only dictionary interface to a sequential sequence file.

Keeps the keys and associated file offsets in memory, and reads the file
on demand to return entries as SeqRecord objects, using Bio.SeqIO to parse
them. Memory use grows with the number of records (one key and one offset
each), but the approach still works with millions of sequences.

Note - as with the Bio.SeqIO.to_dict() function, duplicate keys
(record identifiers by default) are not allowed. If this happens,
a ValueError exception is raised.
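
The idea described above (keep only keys and file offsets in memory, and reject duplicate keys) can be sketched without Biopython. The following is a minimal illustration for FASTA files only, written for Python 3, and is not the library's implementation; the helper names build_offset_index and read_record_at are hypothetical.

```python
import os
import tempfile


def build_offset_index(path, key_function=None):
    """Map each FASTA record key to the byte offset of its '>' line."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # The record id is taken to be the first word after '>'
                words = line[1:].split(None, 1)
                record_id = words[0].decode("ascii") if words else ""
                key = key_function(record_id) if key_function else record_id
                if key in offsets:
                    raise ValueError("Duplicate key '%s'" % key)
                offsets[key] = offset
    return offsets


def read_record_at(path, offset):
    """Return the text of the record that starts at the given offset."""
    lines = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        lines.append(handle.readline())
        for line in iter(handle.readline, b""):
            if line.startswith(b">"):
                break  # start of the next record
            lines.append(line)
    return b"".join(lines).decode("ascii")


# Small demonstration on a temporary two-record file
with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
    tmp.write(b">alpha first\nACGT\nAC\n>beta second\nGGCC\n")
    demo_path = tmp.name

demo_index = build_offset_index(demo_path)
demo_record = read_record_at(demo_path, demo_index["beta"])
os.remove(demo_path)
```

Only the small demo_index dictionary stays in memory; each record is read from disk when requested.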

By default the SeqRecord's id string is used as the dictionary
key. This can be changed by supplying an optional key_function,
a callback function which will be given the record id and must
return the desired key. For example, this allows you to parse
NCBI style FASTA identifiers, and extract the GI number to use
as the dictionary key.
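
As an illustration of such a callback, the sketch below pulls the GI number out of an NCBI style identifier. The function name get_gi_number and the sample identifier are made up for this example, and the sketch assumes the identifier always begins with "gi|".

```python
def get_gi_number(record_id):
    """Return the GI number from an id such as 'gi|45478711|ref|NC_005816.1|'."""
    parts = record_id.split("|")
    if len(parts) < 2 or parts[0] != "gi":
        raise ValueError("Not an NCBI style 'gi|...' identifier: %r" % record_id)
    return parts[1]


# Hypothetical identifier, used only to show the behaviour
example_key = get_gi_number("gi|45478711|ref|NC_005816.1|")
```

With Biopython installed, a function like this would be passed as the key_function argument, for example SeqIO.index("example.fasta", "fasta", key_function=get_gi_number), where the filename is a placeholder.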

Note that this dictionary is essentially read only. You cannot
add or change values, pop values, nor clear the dictionary.
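
The read-only behaviour can be pictured with a toy class in which lookups work but every mutating method is refused. This is only a sketch of the pattern; ReadOnlyIndex is a hypothetical stand-in backed by a plain dict, not the class documented here.

```python
class ReadOnlyIndex(object):
    """Toy read-only mapping: lookups work, mutation is refused."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        raise NotImplementedError("This index is read only.")

    def pop(self, key, default=None):
        raise NotImplementedError("This index is read only.")

    def clear(self):
        raise NotImplementedError("This index is read only.")


read_only = ReadOnlyIndex({"alpha": "ACGT"})
try:
    read_only["beta"] = "GGCC"
    assignment_refused = False
except NotImplementedError:
    assignment_refused = True
```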

Definition at line 52 of file _index.py.


Constructor & Destructor Documentation

def Bio.SeqIO._index._IndexedSeqFileDict.__init__ (   self,
  filename,
  format,
  alphabet,
  key_function 
)

Definition at line 74 of file _index.py.

    def __init__(self, filename, format, alphabet, key_function):
        #Use key_function=None for default value
        try:
            proxy_class = _FormatToRandomAccess[format]
        except KeyError:
            raise ValueError("Unsupported format '%s'" % format)
        random_access_proxy = proxy_class(filename, format, alphabet)
        self._proxy = random_access_proxy
        self._key_function = key_function
        if key_function:
            offset_iter = ((key_function(k),o,l) for (k,o,l) in random_access_proxy)
        else:
            offset_iter = random_access_proxy
        offsets = {}
        for key, offset, length in offset_iter:
            #Note - we don't store the length because I want to minimise the
            #memory requirements. With the SQLite backend the length is kept
            #and is used to speed up the get_raw method (by about 3 times).
            #The length should be provided by all the current backends except
            #SFF where there is an existing Roche index we can reuse (very fast
            #but lacks the record lengths)
            #assert length or format in ["sff", "sff-trim"], \
            #       "%s at offset %i given length %r (%s format %s)" \
            #       % (key, offset, length, filename, format)
            if key in offsets:
                self._proxy._handle.close()
                raise ValueError("Duplicate key '%s'" % key)
            else:
                offsets[key] = offset
        self._offsets = offsets

Member Function Documentation

def Bio.SeqIO._index._IndexedSeqFileDict.__contains__ (   self,
  key
)
Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 116 of file _index.py.

    def __contains__(self, key) :
        return key in self._offsets
        
def Bio.SeqIO._index._IndexedSeqFileDict.__getitem__ (   self,
  key
)
x.__getitem__(y) <==> x[y]

Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 188 of file _index.py.

    def __getitem__(self, key):
        """x.__getitem__(y) <==> x[y]"""
        #Pass the offset to the proxy
        record = self._proxy.get(self._offsets[key])
        if self._key_function:
            key2 = self._key_function(record.id)
        else:
            key2 = record.id
        if key != key2:
            raise ValueError("Key did not match (%s vs %s)" % (key, key2))
        return record

def Bio.SeqIO._index._IndexedSeqFileDict.__iter__ (   self   )
Iterate over the keys.

Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 184 of file _index.py.

    def __iter__(self):
        """Iterate over the keys."""
        return iter(self._offsets)
        

def Bio.SeqIO._index._IndexedSeqFileDict.__len__ (   self   )
How many records are there?

Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 119 of file _index.py.

    def __len__(self):
        """How many records are there?"""
        return len(self._offsets)

def Bio.SeqIO._index._IndexedSeqFileDict.__repr__ (   self   )
Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 105 of file _index.py.

    def __repr__(self):
        return "SeqIO.index(%r, %r, alphabet=%r, key_function=%r)" \
               % (self._proxy._handle.name, self._proxy._format,
                  self._proxy._alphabet, self._key_function)

def Bio.SeqIO._index._IndexedSeqFileDict.__setitem__ (   self,
  key,
  value 
)
Would allow setting or replacing records, but not implemented.

Definition at line 220 of file _index.py.

    def __setitem__(self, key, value):
        """Would allow setting or replacing records, but not implemented."""
        raise NotImplementedError("An indexed a sequence file is read only.")
    

def Bio.SeqIO._index._IndexedSeqFileDict.__str__ (   self   )
Definition at line 110 of file _index.py.

    def __str__(self):
        if self:
            return "{%s : SeqRecord(...), ...}" % repr(self.keys()[0])
        else:
            return "{}"

def Bio.SeqIO._index._IndexedSeqFileDict.clear (   self   )
Would clear dictionary, but not implemented.

Definition at line 238 of file _index.py.

    def clear(self):
        """Would clear dictionary, but not implemented."""
        raise NotImplementedError("An indexed a sequence file is read only.")

def Bio.SeqIO._index._IndexedSeqFileDict.copy (   self   )
A dictionary method which we don't implement.

Definition at line 247 of file _index.py.

    def copy(self):
        """A dictionary method which we don't implement."""
        raise NotImplementedError("An indexed a sequence file doesn't "
                                  "support this.")

def Bio.SeqIO._index._IndexedSeqFileDict.fromkeys (   self,
  keys,
  value = None 
)
A dictionary method which we don't implement.

Definition at line 242 of file _index.py.

    def fromkeys(self, keys, value=None):
        """A dictionary method which we don't implement."""
        raise NotImplementedError("An indexed a sequence file doesn't "
                                  "support this.")

def Bio.SeqIO._index._IndexedSeqFileDict.get (   self,
  k,
  d = None 
)
D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.

Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 200 of file _index.py.

    def get(self, k, d=None):
        """D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None."""
        try:
            return self.__getitem__(k)
        except KeyError:
            return d

def Bio.SeqIO._index._IndexedSeqFileDict.get_raw (   self,
  key
)
Similar to the get method, but returns the record as a raw string.

If the key is not found, a KeyError exception is raised.

Note that on Python 3 a bytes string is returned, not a typical
unicode string.

NOTE - This functionality is not supported for every file format.

Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 207 of file _index.py.

    def get_raw(self, key):
        """Similar to the get method, but returns the record as a raw string.

        If the key is not found, a KeyError exception is raised.

        Note that on Python 3 a bytes string is returned, not a typical
        unicode string.

        NOTE - This functionality is not supported for every file format.
        """
        #Pass the offset to the proxy
        return self._proxy.get_raw(self._offsets[key])
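
A rough picture of what a raw lookup involves, assuming a FASTA-like layout in which every record starts with a ">" line: seek to the stored offset and copy bytes up to the start of the next record. The helper raw_record_at is hypothetical and is not the proxy's actual get_raw; it is written for Python 3, where the result is a bytes string as noted above.

```python
import io


def raw_record_at(handle, offset):
    """Return the bytes of the '>'-delimited record starting at offset."""
    handle.seek(offset)
    chunks = [handle.readline()]
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line or line.startswith(b">"):
            # Leave the handle at the start of the next record (or at EOF)
            handle.seek(position)
            break
        chunks.append(line)
    return b"".join(chunks)


# In-memory stand-in for a binary file handle
raw_handle = io.BytesIO(b">r1\nAAAA\n>r2\nCC\nGG\n")
raw_first = raw_record_at(raw_handle, 0)
raw_second = raw_record_at(raw_handle, 9)
```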

def Bio.SeqIO._index._IndexedSeqFileDict.items (   self   )
Would be a list of the (key, SeqRecord) tuples, but not implemented.

In general you can be indexing very very large files, with millions
of sequences. Loading all these into memory at once as SeqRecord
objects would (probably) use up all the RAM. Therefore we simply
don't support this dictionary method.

Definition at line 137 of file _index.py.

        def items(self):
            """Would be a list of the (key, SeqRecord) tuples, but not implemented.

            In general you can be indexing very very large files, with millions
            of sequences. Loading all these into memory at once as SeqRecord
            objects would (probably) use up all the RAM. Therefore we simply
            don't support this dictionary method.
            """
            raise NotImplementedError("Due to memory concerns, when indexing a "
                                      "sequence file you cannot access all the "
                                      "records at once.")

def Bio.SeqIO._index._IndexedSeqFileDict.items (   self   )
Iterate over the (key, SeqRecord) items.

Definition at line 170 of file _index.py.

        def items(self):
            """Iterate over the (key, SeqRecord) items."""
            for key in self.__iter__():
                yield key, self.__getitem__(key)

def Bio.SeqIO._index._IndexedSeqFileDict.iteritems (   self   )
Iterate over the (key, SeqRecord) items.

Definition at line 159 of file _index.py.

        def iteritems(self):
            """Iterate over the (key, SeqRecord) items."""
            for key in self.__iter__():
                yield key, self.__getitem__(key)
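
Generator-based methods like the one above sidestep the memory concern by loading one record at a time. The stand-in class below is hypothetical, with a plain dict in place of the sequence file, and simply counts how many values have been fetched so the laziness is visible. It assumes Python 3.7 or later, where dict iteration follows insertion order.

```python
class LazyItems(object):
    """Toy mapping whose items() fetches one value at a time."""

    def __init__(self, store):
        self._store = store
        self.loads = 0  # number of values fetched so far

    def __iter__(self):
        return iter(self._store)

    def __getitem__(self, key):
        self.loads += 1
        return self._store[key]

    def items(self):
        """Iterate over the (key, value) pairs without building a list."""
        for key in self:
            yield key, self[key]


lazy = LazyItems({"a": "ACGT", "b": "GGCC"})
pairs = lazy.items()
loads_before = lazy.loads      # nothing fetched until iteration starts
first_pair = next(pairs)
loads_after_first = lazy.loads  # exactly one value fetched
```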
        

def Bio.SeqIO._index._IndexedSeqFileDict.iterkeys (   self   )
Iterate over the keys.

Definition at line 164 of file _index.py.

        def iterkeys(self):
            """Iterate over the keys."""
            return self.__iter__()

def Bio.SeqIO._index._IndexedSeqFileDict.itervalues (   self   )
Iterate over the SeqRecord items.

Definition at line 154 of file _index.py.

        def itervalues(self):
            """Iterate over the SeqRecord items."""
            for key in self.__iter__():
                yield self.__getitem__(key)

def Bio.SeqIO._index._IndexedSeqFileDict.keys (   self   )
Return a list of all the keys (SeqRecord identifiers).

Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 149 of file _index.py.

        def keys(self) :
            """Return a list of all the keys (SeqRecord identifiers)."""
            #TODO - Stick a warning in here for large lists? Or just refuse?
            return self._offsets.keys()

def Bio.SeqIO._index._IndexedSeqFileDict.keys (   self   )
Iterate over the keys.

Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 180 of file _index.py.

        def keys(self):
            """Iterate over the keys."""
            return self.__iter__()

def Bio.SeqIO._index._IndexedSeqFileDict.pop (   self,
  key,
  default = None 
)
Would remove specified record, but not implemented.

Definition at line 229 of file _index.py.

    def pop(self, key, default=None):
        """Would remove specified record, but not implemented."""
        raise NotImplementedError("An indexed a sequence file is read only.")
    
def Bio.SeqIO._index._IndexedSeqFileDict.popitem (   self   )
Would remove and return a SeqRecord, but not implemented.

Definition at line 233 of file _index.py.

    def popitem(self):
        """Would remove and return a SeqRecord, but not implemented."""
        raise NotImplementedError("An indexed a sequence file is read only.")
    
def Bio.SeqIO._index._IndexedSeqFileDict.update (   self,
  args,
  kwargs 
)
Would allow adding more values, but not implemented.

Definition at line 224 of file _index.py.

    def update(self, *args, **kwargs):
        """Would allow adding more values, but not implemented."""
        raise NotImplementedError("An indexed a sequence file is read only.")
    

def Bio.SeqIO._index._IndexedSeqFileDict.values (   self   )
Would be a list of the SeqRecord objects, but not implemented.

In general you can be indexing very very large files, with millions
of sequences. Loading all these into memory at once as SeqRecord
objects would (probably) use up all the RAM. Therefore we simply
don't support this dictionary method.

Definition at line 125 of file _index.py.

        def values(self):
            """Would be a list of the SeqRecord objects, but not implemented.

            In general you can be indexing very very large files, with millions
            of sequences. Loading all these into memory at once as SeqRecord
            objects would (probably) use up all the RAM. Therefore we simply
            don't support this dictionary method.
            """
            raise NotImplementedError("Due to memory concerns, when indexing a "
                                      "sequence file you cannot access all the "
                                      "records at once.")

def Bio.SeqIO._index._IndexedSeqFileDict.values (   self   )
Iterate over the SeqRecord items.

Definition at line 175 of file _index.py.

        def values(self):
            """Iterate over the SeqRecord items."""
            for key in self.__iter__():
                yield self.__getitem__(key)


Member Data Documentation

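Bio.SeqIO._index._IndexedSeqFileDict._key_function
Reimplemented in Bio.SeqIO._index._SQLiteManySeqFilesDict.

Definition at line 82 of file _index.py.

Bio.SeqIO._index._IndexedSeqFileDict._offsets
Definition at line 103 of file _index.py.

Bio.SeqIO._index._IndexedSeqFileDict._proxy
Definition at line 81 of file _index.py.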


The documentation for this class was generated from the following file: