python-biopython  1.60
Bio.TogoWS Namespace Reference

Functions

def _get_fields
def _get_entry_dbs
def _get_entry_fields
def _get_entry_formats
def _get_convert_formats
def entry
def search_count
def search_iter
def search
def convert
def _open

Variables

string _BASE_URL = "http://togows.dbcls.jp"
 _search_db_names = None
 _entry_db_names = None
dictionary _entry_db_fields = {}
dictionary _entry_db_formats = {}
list _convert_formats = []

Function Documentation

def Bio.TogoWS._get_convert_formats ( ) [private]

Definition at line 64 of file __init__.py.

def _get_convert_formats():
    return [pair.split(".") for pair in
            _get_fields(_BASE_URL + "/convert/")]
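The /convert/ listing is a whitespace-separated list of `input.output` names. `_get_convert_formats()` splits each name on "." into an `[input, output]` pair, and `convert()` later checks `[in_format, out_format]` against that list. The offline sketch below shows both steps. `parse_convert_pairs` and `is_supported` are hypothetical helpers. The sample listing text is invented rather than real TogoWS output. The code is written for Python 3, although the module itself is Python 2.

```python
def parse_convert_pairs(listing):
    """Split a whitespace-separated 'in.out' listing into [in, out] pairs.

    Hypothetical helper mirroring _get_convert_formats(); it takes the
    response text directly instead of fetching it over HTTP.
    """
    return [pair.split(".") for pair in listing.strip().split()]


def is_supported(pairs, in_format, out_format):
    """Return True if the [in_format, out_format] pair is listed."""
    # convert() performs the same list-membership test before building a URL.
    return [in_format, out_format] in pairs


# Invented sample text standing in for the /convert/ response body.
sample = "genbank.fasta\ngenbank.gff\nembl.fasta\n"
pairs = parse_convert_pairs(sample)
print(pairs)  # [['genbank', 'fasta'], ['genbank', 'gff'], ['embl', 'fasta']]
print(is_supported(pairs, "genbank", "fasta"))  # True
```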

def Bio.TogoWS._get_entry_dbs ( ) [private]

Definition at line 55 of file __init__.py.

def _get_entry_dbs():
    return _get_fields(_BASE_URL + "/entry")

def Bio.TogoWS._get_entry_fields ( db ) [private]

Definition at line 58 of file __init__.py.

def _get_entry_fields(db):
    return _get_fields(_BASE_URL + "/entry/%s?fields" % db)

def Bio.TogoWS._get_entry_formats ( db ) [private]

Definition at line 61 of file __init__.py.

def _get_entry_formats(db):
    return _get_fields(_BASE_URL + "/entry/%s?formats" % db)
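The three `_get_entry_*` helpers differ only in the metadata URL they pass to `_get_fields()`. `entry()` caches the field and format lists per database in the module-level dictionaries `_entry_db_fields` and `_entry_db_formats`, so each list is fetched at most once. The sketch below reproduces the URL patterns and that memoisation offline in Python 3. `metadata_urls`, `FieldCache` and `fake_fetch` are hypothetical names. The field names returned by `fake_fetch` are invented.

```python
BASE_URL = "http://togows.dbcls.jp"


def metadata_urls(db):
    """Return the three metadata URLs queried by the _get_entry_* helpers."""
    return {
        "dbs": BASE_URL + "/entry",
        "fields": BASE_URL + "/entry/%s?fields" % db,
        "formats": BASE_URL + "/entry/%s?formats" % db,
    }


class FieldCache:
    """Hypothetical per-database cache, like _entry_db_fields in entry()."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> list of strings
        self._cache = {}
        self.calls = 0  # how many times fetch was actually invoked

    def get(self, db):
        try:
            return self._cache[db]
        except KeyError:
            self.calls += 1
            values = self._fetch(metadata_urls(db)["fields"])
            self._cache[db] = values
            return values


def fake_fetch(url):
    # Stand-in for _get_fields(url); these field names are invented.
    return ["au", "ti", "dp"]


cache = FieldCache(fake_fetch)
cache.get("pubmed")
cache.get("pubmed")
print(cache.calls)  # 1: fetched once, then served from the cache
```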

def Bio.TogoWS._get_fields ( url ) [private]
Queries a TogoWS URL for a plain text list of values (PRIVATE).

Definition at line 48 of file __init__.py.

def _get_fields(url):
    """Queries a TogoWS URL for a plain text list of values (PRIVATE)."""
    handle = _open(url)
    fields = handle.read().strip().split()
    handle.close()
    return fields
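`_get_fields()` reads the whole response, strips surrounding whitespace and splits on any run of whitespace, so newline-separated and space-separated lists parse the same way. The sketch below applies that logic to an in-memory `io.StringIO` in place of the handle returned by `_open()`. `get_fields_from_handle` is a hypothetical name. Wrapping the handle in `contextlib.closing` is a choice made here, not something the listing does; it closes the handle even if reading raises.

```python
import io
from contextlib import closing


def get_fields_from_handle(handle):
    """Read a plain-text list of values from an open handle.

    Mirrors the read().strip().split() logic of _get_fields(), but uses
    contextlib.closing so the handle is closed even if reading fails.
    """
    with closing(handle):
        return handle.read().strip().split()


# io.StringIO stands in for the handle that _open() would return.
fields = get_fields_from_handle(io.StringIO("pubmed\nnucleotide\n  protein \n"))
print(fields)  # ['pubmed', 'nucleotide', 'protein']
```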

def Bio.TogoWS._open ( url, post = None ) [private]
Helper function to build the URL and open a handle to it (PRIVATE).

Opens a handle to TogoWS; raises an IOError if it encounters an error.

In the absence of clear guidelines, this function enforces a limit of
"up to three queries per second" to avoid abusing the TogoWS servers.

Definition at line 289 of file __init__.py.

def _open(url, post=None):
    """Helper function to build the URL and open a handle to it (PRIVATE).

    Opens a handle to TogoWS; raises an IOError if it encounters an error.

    In the absence of clear guidelines, this function enforces a limit of
    "up to three queries per second" to avoid abusing the TogoWS servers.
    """
    delay = 0.333333333 #one third of a second
    current = time.time()
    wait = _open.previous + delay - current
    if wait > 0:
        time.sleep(wait)
        _open.previous = current + wait
    else:
        _open.previous = current

    #print url
    try:
        if post:
            handle = urllib2.urlopen(url, _as_bytes(urllib.urlencode(post)))
        else:
            handle = urllib2.urlopen(url)
    except urllib2.HTTPError as exception:
        raise exception

    #We now trust TogoWS to have set an HTTP error code, that
    #suffices for my current unit tests. Previously we would
    #examine the start of the data returned back.
    return _binary_to_string_handle(handle)

_open.previous = 0
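The throttle in `_open()` stores the time of the previous request on the function object (`_open.previous`) and sleeps until `delay` seconds have passed since then. The following Python 3 sketch lifts that logic into a hypothetical `Throttle` class whose clock and sleep functions can be replaced, so the timing can be checked without real waiting. `FakeClock` is likewise only a test double. The 0.5 second delay in the usage was chosen because it gives exact floating-point arithmetic, not because it matches the module's one third of a second.

```python
import time


class Throttle:
    """Hypothetical rate limiter reproducing the timing logic of _open().

    At most one call is allowed per `delay` seconds (about three per second
    with the default). The clock and sleep functions are injectable so the
    behaviour can be checked without real waiting.
    """

    def __init__(self, delay=1.0 / 3, clock=time.time, sleep=time.sleep):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.previous = 0.0  # time of the previous request, like _open.previous

    def wait(self):
        """Sleep if needed before the next request; return the seconds slept."""
        current = self.clock()
        wait = self.previous + self.delay - current
        if wait > 0:
            self.sleep(wait)
            self.previous = current + wait
            return wait
        self.previous = current
        return 0.0


class FakeClock:
    """Test double: sleeping just advances the fake time."""

    def __init__(self, start=100.0):
        self.now = start

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
throttle = Throttle(delay=0.5, clock=clock.time, sleep=clock.sleep)
print(throttle.wait())  # 0.0, the first request goes straight through
print(throttle.wait())  # 0.5, the second request waits out the delay
```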

def Bio.TogoWS.convert ( data, in_format, out_format )
TogoWS convert (returns a handle).

data - string or handle containing input record(s)
in_format - string describing the input file format (e.g. "genbank")
out_format - string describing the requested output format (e.g. "fasta")

For a list of supported conversions (e.g. "genbank" to "fasta"), see
http://togows.dbcls.jp/convert/

Note that Biopython has built-in support for conversion of sequence and
alignment file formats (functions Bio.SeqIO.convert and Bio.AlignIO.convert).

Definition at line 261 of file __init__.py.

def convert(data, in_format, out_format):
    """TogoWS convert (returns a handle).

    data - string or handle containing input record(s)
    in_format - string describing the input file format (e.g. "genbank")
    out_format - string describing the requested output format (e.g. "fasta")

    For a list of supported conversions (e.g. "genbank" to "fasta"), see
    http://togows.dbcls.jp/convert/

    Note that Biopython has built-in support for conversion of sequence and
    alignment file formats (functions Bio.SeqIO.convert and Bio.AlignIO.convert)
    """
    global _convert_formats
    if not _convert_formats:
        _convert_formats = _get_convert_formats()
    if [in_format, out_format] not in _convert_formats:
        msg = "\n".join("%s -> %s" % tuple(pair) for pair in _convert_formats)
        raise ValueError("Unsupported conversion. Choose from:\n%s" % msg)
    url = _BASE_URL + "/convert/%s.%s" % (in_format, out_format)
    #TODO - Should we just accept a string not a handle? What about a filename?
    if hasattr(data, "read"):
        #Handle
        return _open(url, post={"data":data.read()})
    else:
        #String
        return _open(url, post={"data":data})
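`convert()` sends the records as a form-encoded `data` field in a POST to `/convert/<in>.<out>`. If it is given a handle, it reads the data first. The sketch below builds an equivalent request with Python 3's `urllib.request.Request` but does not send it. `build_convert_request` is a hypothetical helper that skips the supported-conversion check. The record text "ACGT" is only a placeholder.

```python
import io
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://togows.dbcls.jp"


def build_convert_request(data, in_format, out_format):
    """Build, but do not send, a POST request shaped like convert()'s.

    Hypothetical Python 3 helper: `data` may be a string or any object with
    a read() method, matching convert()'s handling of strings and handles.
    """
    if hasattr(data, "read"):
        data = data.read()
    url = BASE_URL + "/convert/%s.%s" % (in_format, out_format)
    # Form-encode the records as a single "data" field; the body must be bytes.
    body = urlencode({"data": data}).encode("ascii")
    return Request(url, data=body)


# Placeholder record text; a handle and a plain string give the same body.
req = build_convert_request(io.StringIO("ACGT"), "genbank", "fasta")
print(req.full_url)      # http://togows.dbcls.jp/convert/genbank.fasta
print(req.get_method())  # POST, because a request body is attached
print(req.data)          # b'data=ACGT'
```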

def Bio.TogoWS.entry ( db, id, format = None, field = None )
TogoWS fetch entry (returns a handle).

db - database (string), see list below.
id - identifier (string) or a list of identifiers (either as a list of
     strings or a single string with comma separators).
format - return data file format (string), options depend on the database
         e.g. "xml", "json", "gff", "fasta", "ttl" (RDF Turtle)
field - specific field from within the database record (string)
        e.g. "au" or "authors" for pubmed.

At the time of writing, the supported databases include the following:

KEGG: compound, drug, enzyme, genes, glycan, orthology, reaction,
      module, pathway
DDBJ: ddbj, dad, pdb
NCBI: nuccore, nucest, nucgss, nucleotide, protein, gene, omim,
      homologue, snp, mesh, pubmed
EBI:  embl, uniprot, uniparc, uniref100, uniref90, uniref50

For the current list, please see http://togows.dbcls.jp/entry/

This function is essentially equivalent to the NCBI Entrez service
EFetch, available in Biopython as Bio.Entrez.efetch(...), but that
does not offer field extraction.

Definition at line 68 of file __init__.py.

def entry(db, id, format=None, field=None):
    """TogoWS fetch entry (returns a handle).

    db - database (string), see list below.
    id - identifier (string) or a list of identifiers (either as a list of
         strings or a single string with comma separators).
    format - return data file format (string), options depend on the database
             e.g. "xml", "json", "gff", "fasta", "ttl" (RDF Turtle)
    field - specific field from within the database record (string)
            e.g. "au" or "authors" for pubmed.

    At the time of writing, the supported databases include the following:

    KEGG: compound, drug, enzyme, genes, glycan, orthology, reaction,
          module, pathway
    DDBJ: ddbj, dad, pdb
    NCBI: nuccore, nucest, nucgss, nucleotide, protein, gene, omim,
          homologue, snp, mesh, pubmed
    EBI:  embl, uniprot, uniparc, uniref100, uniref90, uniref50

    For the current list, please see http://togows.dbcls.jp/entry/

    This function is essentially equivalent to the NCBI Entrez service
    EFetch, available in Biopython as Bio.Entrez.efetch(...), but that
    does not offer field extraction.
    """
    global _entry_db_names, _entry_db_fields, _entry_db_formats
    if _entry_db_names is None:
        _entry_db_names = _get_entry_dbs()
    if db not in _entry_db_names:
        raise ValueError("TogoWS entry fetch does not officially support "
                         "database '%s'." % db)
    if field:
        try:
            fields = _entry_db_fields[db]
        except KeyError:
            fields = _get_entry_fields(db)
            _entry_db_fields[db] = fields
        if field not in fields:
            raise ValueError("TogoWS entry fetch does not explicitly support "
                             "field '%s' for database '%s'. Only: %s"
                             % (field, db, ", ".join(sorted(fields))))
    if format:
        try:
            formats = _entry_db_formats[db]
        except KeyError:
            formats = _get_entry_formats(db)
            _entry_db_formats[db] = formats
        if format not in formats:
            raise ValueError("TogoWS entry fetch does not explicitly support "
                             "format '%s' for database '%s'. Only: %s"
                             % (format, db, ", ".join(sorted(formats))))

    if isinstance(id, list):
        id = ",".join(id)
    url = _BASE_URL + "/entry/%s/%s" % (db, urllib.quote(id))
    if field:
        url += "/" + field
    if format:
        url += "." + format
    return _open(url)
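The URL assembled at the end of `entry()` has the shape `/entry/<db>/<ids>[/<field>][.<format>]`. A list of identifiers is joined with commas before percent-encoding. The sketch below rebuilds that URL offline in Python 3, where `urllib.parse.quote` plays the role of the module's `urllib.quote`. `entry_url` is a hypothetical helper that omits the database, field and format validation. The identifiers are arbitrary placeholders. Note that `quote()` percent-encodes the comma separator as `%2C`.

```python
from urllib.parse import quote

BASE_URL = "http://togows.dbcls.jp"


def entry_url(db, ids, format=None, field=None):
    """Build the URL that entry() would fetch (hypothetical helper).

    A list of identifiers is joined with commas, the identifier string is
    percent-encoded, then the optional /field and .format suffixes follow.
    No check is made that db, field or format are supported.
    """
    if isinstance(ids, list):
        ids = ",".join(ids)
    url = BASE_URL + "/entry/%s/%s" % (db, quote(ids))
    if field:
        url += "/" + field
    if format:
        url += "." + format
    return url


# Arbitrary placeholder identifiers.
print(entry_url("pubmed", ["12345", "67890"], field="au"))
# http://togows.dbcls.jp/entry/pubmed/12345%2C67890/au
```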

def Bio.TogoWS.search ( db, query, offset = None, limit = None, format = None )
TogoWS search (returns a handle).

This is a low level wrapper for the TogoWS search function, which
can return results in several formats. In general, the search_iter
function is more suitable for end users.

db - database (string), see http://togows.dbcls.jp/search/
query - search term (string)
offset, limit - optional integers specifying which result to start from
        (1 based) and the number of results to return; give both or neither.
format - return data file format (string), e.g. "json", "ttl" (RDF)
         By default plain text is returned, one result per line.

At the time of writing, TogoWS applies a default count limit of 100
search results, and this is an upper bound. To access more results,
use the offset argument or the search_iter(...) function.

TogoWS supports a long list of databases, including many from the NCBI
(e.g. "ncbi-pubmed" or "pubmed", "ncbi-genbank" or "genbank", and
"ncbi-taxonomy"), EBI (e.g. "ebi-embl" or "embl", "ebi-uniprot" or
"uniprot", "ebi-go"), and KEGG (e.g. "kegg-compound" or "compound").
For the current list, see http://togows.dbcls.jp/search/

The NCBI provide the Entrez Search service (ESearch) which is similar,
available in Biopython as the Bio.Entrez.esearch() function.

See also the function Bio.TogoWS.search_count() which returns the number
of matches found, and the Bio.TogoWS.search_iter() function which allows
you to iterate over the search results (taking care of batching for you).

Definition at line 199 of file __init__.py.

def search(db, query, offset=None, limit=None, format=None):
    """TogoWS search (returns a handle).

    This is a low level wrapper for the TogoWS search function, which
    can return results in several formats. In general, the search_iter
    function is more suitable for end users.

    db - database (string), see http://togows.dbcls.jp/search/
    query - search term (string)
    offset, limit - optional integers specifying which result to start from
            (1 based) and the number of results to return; give both or neither.
    format - return data file format (string), e.g. "json", "ttl" (RDF)
             By default plain text is returned, one result per line.

    At the time of writing, TogoWS applies a default count limit of 100
    search results, and this is an upper bound. To access more results,
    use the offset argument or the search_iter(...) function.

    TogoWS supports a long list of databases, including many from the NCBI
    (e.g. "ncbi-pubmed" or "pubmed", "ncbi-genbank" or "genbank", and
    "ncbi-taxonomy"), EBI (e.g. "ebi-embl" or "embl", "ebi-uniprot" or
    "uniprot", "ebi-go"), and KEGG (e.g. "kegg-compound" or "compound").
    For the current list, see http://togows.dbcls.jp/search/

    The NCBI provide the Entrez Search service (ESearch) which is similar,
    available in Biopython as the Bio.Entrez.esearch() function.

    See also the function Bio.TogoWS.search_count() which returns the number
    of matches found, and the Bio.TogoWS.search_iter() function which allows
    you to iterate over the search results (taking care of batching for you).
    """
    global _search_db_names
    if _search_db_names is None:
        _search_db_names = _get_fields(_BASE_URL + "/search")
    if db not in _search_db_names:
        #TODO - Make this a ValueError? Right now despite the HTML website
        #claiming to, the "gene" or "ncbi-gene" don't work and are not listed.
        import warnings
        warnings.warn("TogoWS search does not explicitly support database '%s'. "
                      "See %s/search/ for options." % (db, _BASE_URL))
    url = _BASE_URL + "/search/%s/%s" % (db, urllib.quote(query))
    if offset is not None and limit is not None:
        try:
            offset = int(offset)
        except (TypeError, ValueError):
            raise ValueError("Offset should be an integer (at least one), not %r" % offset)
        try:
            limit = int(limit)
        except (TypeError, ValueError):
            raise ValueError("Limit should be an integer (at least one), not %r" % limit)
        if offset <= 0:
            raise ValueError("Offset should be at least one, not %i" % offset)
        if limit <= 0:
            raise ValueError("Limit should be at least one, not %i" % limit)
        url += "/%i,%i" % (offset, limit)
    elif offset is not None or limit is not None:
        raise ValueError("Expect BOTH offset AND limit to be provided (or neither)")
    if format:
        url += "." + format
    #print url
    return _open(url)
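`search()` appends `/<offset>,<limit>` only when both values are given, and then adds an optional `.format` suffix. The Python 3 sketch below performs the same URL assembly and validation without network access and without checking the database name. `search_url` is a hypothetical helper. `quote()` percent-encodes a space in the query as `%20` and a literal `+` as `%2B`.

```python
from urllib.parse import quote

BASE_URL = "http://togows.dbcls.jp"


def search_url(db, query, offset=None, limit=None, format=None):
    """Build the URL that search() would fetch (hypothetical helper)."""
    if (offset is None) != (limit is None):
        raise ValueError("Expect BOTH offset AND limit to be provided (or neither)")
    url = BASE_URL + "/search/%s/%s" % (db, quote(query))
    if offset is not None:
        # int() raises ValueError/TypeError for non-numeric input.
        offset, limit = int(offset), int(limit)
        if offset < 1 or limit < 1:
            raise ValueError("Offset and limit should both be at least one")
        url += "/%i,%i" % (offset, limit)  # 1-based start, number of results
    if format:
        url += "." + format
    return url


print(search_url("pubmed", "lung cancer", offset=101, limit=100))
# http://togows.dbcls.jp/search/pubmed/lung%20cancer/101,100
```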

def Bio.TogoWS.search_count ( db, query )
TogoWS search count (returns an integer).

db - database (string), see http://togows.dbcls.jp/search
query - search term (string)

You could then use the count to download a large set of search results in
batches using the offset and limit options to Bio.TogoWS.search(). In
general however the Bio.TogoWS.search_iter() function is simpler to use.

Definition at line 130 of file __init__.py.

def search_count(db, query):
    """TogoWS search count (returns an integer).

    db - database (string), see http://togows.dbcls.jp/search
    query - search term (string)

    You could then use the count to download a large set of search results in
    batches using the offset and limit options to Bio.TogoWS.search(). In
    general however the Bio.TogoWS.search_iter() function is simpler to use.
    """
    global _search_db_names
    if _search_db_names is None:
        _search_db_names = _get_fields(_BASE_URL + "/search")
    if db not in _search_db_names:
        #TODO - Make this a ValueError? Right now despite the HTML website
        #claiming to, the "gene" or "ncbi-gene" don't work and are not listed.
        import warnings
        warnings.warn("TogoWS search does not officially support database '%s'. "
                      "See %s/search/ for options." % (db, _BASE_URL))
    handle = _open(_BASE_URL + "/search/%s/%s/count"
                   % (db, urllib.quote(query)))
    count = int(handle.read().strip())
    handle.close()
    return count
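`search_count()` reads an integer from the `/count` URL. As its docstring suggests, that count can then be split into 1-based `(offset, limit)` windows for `search()`. The sketch below shows both steps, assuming the 100-result cap described for `search()`. `parse_count` and `batch_windows` are hypothetical helpers, and the count of 250 is invented.

```python
import io


def parse_count(handle):
    """Read the integer body of a .../count response, as search_count() does."""
    try:
        return int(handle.read().strip())
    finally:
        handle.close()


def batch_windows(count, batch=100):
    """Yield 1-based (offset, limit) pairs covering `count` results."""
    offset = 1
    while offset <= count:
        limit = min(batch, count - offset + 1)
        yield offset, limit
        offset += limit


# io.StringIO stands in for the handle from _open(); the count is invented.
count = parse_count(io.StringIO("250\n"))
print(list(batch_windows(count)))  # [(1, 100), (101, 100), (201, 50)]
```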

def Bio.TogoWS.search_iter ( db, query, limit = None, batch = 100 )
TogoWS search iterating over the results (generator function).

db - database (string), see http://togows.dbcls.jp/search
query - search term (string)
limit - optional upper bound on number of search results
batch - number of search results to pull back each time we talk to
        TogoWS (currently limited to 100).

You would use this function within a for loop, e.g.

>>> for id in search_iter("pubmed", "lung+cancer+drug", limit=10):
...     print id #maybe fetch data with entry?

Internally this first calls Bio.TogoWS.search_count() and then
uses Bio.TogoWS.search() to get the results in batches.

Definition at line 155 of file __init__.py.

def search_iter(db, query, limit=None, batch=100):
    """TogoWS search iterating over the results (generator function).

    db - database (string), see http://togows.dbcls.jp/search
    query - search term (string)
    limit - optional upper bound on number of search results
    batch - number of search results to pull back each time we talk to
            TogoWS (currently limited to 100).

    You would use this function within a for loop, e.g.

    >>> for id in search_iter("pubmed", "lung+cancer+drug", limit=10):
    ...     print id #maybe fetch data with entry?

    Internally this first calls Bio.TogoWS.search_count() and then
    uses Bio.TogoWS.search() to get the results in batches.
    """
    count = search_count(db, query)
    if not count:
        return
    #NOTE - We leave it to TogoWS to enforce any upper bound on each
    #batch, they currently return an HTTP 400 Bad Request if above 100.
    remain = count
    if limit is not None:
        remain = min(remain, limit)
    offset = 1 #They don't use zero based counting
    prev_ids = [] #Just cache the last batch for error checking
    while remain:
        batch = min(batch, remain)
        #print "%r left, asking for %r" % (remain, batch)
        ids = search(db, query, offset, batch).read().strip().split()
        assert len(ids)==batch, "Got %i, expected %i" % (len(ids), batch)
        #print "offset %i, %s ... %s" % (offset, ids[0], ids[-1])
        if ids == prev_ids:
            raise RuntimeError("Same search results for previous offset")
        for identifier in ids:
            if identifier in prev_ids:
                raise RuntimeError("Result %s was in previous batch"
                                   % identifier)
            yield identifier
        offset += batch
        remain -= batch
        prev_ids = ids
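The loop in `search_iter()` can be exercised without TogoWS by passing in the batch-fetching step as a function. In the hypothetical `iter_results` below, `fetch_batch(offset, size)` stands in for `search(db, query, offset, size)`, and the count is passed in rather than fetched. The duplicate checks mirror the listing. The length check raises `RuntimeError` instead of using `assert`, a choice made here because assertions are stripped under `python -O`. When the count is zero the loop never runs and the generator simply finishes, so no explicit `raise StopIteration` is needed. Under PEP 479, Python 3.7 and later turn a `StopIteration` raised inside a generator into a `RuntimeError`. The identifiers are invented.

```python
def iter_results(fetch_batch, count, limit=None, batch=100):
    """Generator following the batching loop of search_iter() (hypothetical).

    fetch_batch(offset, size) must return a list of identifiers starting at
    the 1-based `offset`; it stands in for search(db, query, offset, size).
    """
    remain = count if limit is None else min(count, limit)
    offset = 1  # TogoWS offsets are 1-based
    prev_ids = []  # last batch, kept only for sanity checks
    while remain:
        size = min(batch, remain)
        ids = fetch_batch(offset, size)
        if len(ids) != size:
            raise RuntimeError("Got %i, expected %i" % (len(ids), size))
        if ids == prev_ids:
            raise RuntimeError("Same search results for previous offset")
        for identifier in ids:
            if identifier in prev_ids:
                raise RuntimeError("Result %s was in previous batch" % identifier)
            yield identifier
        offset += size
        remain -= size
        prev_ids = ids


RESULTS = ["id%03i" % i for i in range(1, 251)]  # invented identifiers


def fake_fetch(offset, size):
    # Slice the fake result list the way a server would page through it.
    return RESULTS[offset - 1:offset - 1 + size]


ids = list(iter_results(fake_fetch, len(RESULTS), limit=120, batch=50))
print(len(ids), ids[0], ids[-1])  # 120 id001 id120
```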

Variable Documentation

string Bio.TogoWS._BASE_URL = "http://togows.dbcls.jp"

Definition at line 39 of file __init__.py.

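list Bio.TogoWS._convert_formats = []

Definition at line 46 of file __init__.py.

dictionary Bio.TogoWS._entry_db_fields = {}

Definition at line 44 of file __init__.py.

dictionary Bio.TogoWS._entry_db_formats = {}

Definition at line 45 of file __init__.py.

Bio.TogoWS._entry_db_names = None

Definition at line 43 of file __init__.py.

Bio.TogoWS._search_db_names = None

Definition at line 42 of file __init__.py.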