python-biopython  1.60
Bio.Entrez Namespace Reference

Namespaces

namespace  Parser

Functions

def epost
def efetch
def esearch
def elink
def einfo
def esummary
def egquery
def espell
def read
def parse
def _open
def _test

Variables

email = None
string tool = "biopython"

Function Documentation

def Bio.Entrez._open(cgi, params = {}, post = False) [private]
Helper function to build the URL and open a handle to it (PRIVATE).

Open a handle to Entrez.  cgi is the URL for the cgi script to access.
params is a dictionary with the options to pass to it.  Does some
simple error checking, and will raise an IOError if it encounters one.

This function also enforces the "up to three queries per second" rule
to avoid abusing the NCBI servers.
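
The throttling described above can be sketched in isolation. This is a minimal illustration, not the Biopython implementation: the Throttle class and its injectable clock and sleep arguments are hypothetical names introduced here so the logic can be exercised without real waiting. It is written for Python 3, whereas the listing on this page targets Python 2.

```python
import time


class Throttle:
    """Enforce a minimum delay between calls (hypothetical helper)."""

    def __init__(self, delay=1.0 / 3, clock=time.time, sleep=time.sleep):
        self.delay = delay        # minimum seconds between two calls
        self.clock = clock        # injectable so tests can fake the time
        self.sleep = sleep        # injectable so tests need not really wait
        self.previous = 0.0       # time at which the last call was allowed

    def wait(self):
        """Block until `delay` seconds have passed since the previous call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        current = self.clock()
        remaining = self.previous + self.delay - current
        if remaining > 0:
            self.sleep(remaining)
            self.previous = current + remaining
            return remaining
        self.previous = current
        return 0.0


throttle = Throttle()
throttle.wait()   # first call: no wait, since `previous` starts at 0
throttle.wait()   # second call: sleeps for whatever remains of the 1/3 second
```

Keeping the timestamp on an object rather than on a function attribute is a design choice of this sketch only; the effect on request spacing should be the same.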

Definition at line 385 of file __init__.py.

00385 
00386 def _open(cgi, params={}, post=False):
00387     """Helper function to build the URL and open a handle to it (PRIVATE).
00388 
00389     Open a handle to Entrez.  cgi is the URL for the cgi script to access.
00390     params is a dictionary with the options to pass to it.  Does some
00391     simple error checking, and will raise an IOError if it encounters one.
00392 
00393     This function also enforces the "up to three queries per second" rule
00394     to avoid abusing the NCBI servers.
00395     """
00396     # NCBI requirement: At most three queries per second.
00397     # Equivalently, at least a third of a second between queries
00398     delay = 0.333333334
00399     current = time.time()
00400     wait = _open.previous + delay - current
00401     if wait > 0:
00402         time.sleep(wait)
00403         _open.previous = current + wait
00404     else:
00405         _open.previous = current
00406     # Remove None values from the parameters
00407     for key, value in params.items():
00408         if value is None:
00409             del params[key]
00410     # Tell Entrez that we are using Biopython (or whatever the user has
00411     # specified explicitly in the parameters or by changing the default)
00412     if not "tool" in params:
00413         params["tool"] = tool
00414     # Tell Entrez who we are
00415     if not "email" in params:
00416         if email!=None:
00417             params["email"] = email
00418         else:
00419             warnings.warn("""
00420 Email address is not specified.
00421 
00422 To make use of NCBI's E-utilities, NCBI strongly recommends you to specify
00423 your email address with each request. From June 1, 2010, this will be
00424 mandatory. As an example, if your email address is A.N.Other@example.com, you
00425 can specify it as follows:
00426    from Bio import Entrez
00427    Entrez.email = 'A.N.Other@example.com'
00428 In case of excessive usage of the E-utilities, NCBI will attempt to contact
00429 a user at the email address provided before blocking access to the
00430 E-utilities.""", UserWarning)
00431     # Open a handle to Entrez.
00432     options = urllib.urlencode(params, doseq=True)
00433     #print cgi + "?" + options
00434     try:
00435         if post:
00436             #HTTP POST
00437             handle = urllib2.urlopen(cgi, data=options)
00438         else:
00439             #HTTP GET
00440             cgi += "?" + options
00441             handle = urllib2.urlopen(cgi)
00442     except urllib2.HTTPError, exception:
00443         raise exception
00444 
00445     return _binary_to_string_handle(handle)
00446 
00447 _open.previous = 0
00448 
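
One detail of the listing above deserves care: params defaults to a shared dict and is modified in place, so entries such as tool can persist between calls that rely on the default, and a dict passed by the caller is altered. A minimal sketch of the same preparation steps working on a copy instead, assuming a hypothetical helper named prepare_params (not part of Bio.Entrez) and Python 3:

```python
import warnings


def prepare_params(params=None, tool="biopython", email=None):
    """Return a cleaned copy of `params`; the input dict is left untouched."""
    # Drop None values while copying, instead of deleting from the original.
    prepared = {k: v for k, v in (params or {}).items() if v is not None}
    # Identify the client unless the caller already chose a tool name.
    prepared.setdefault("tool", tool)
    # Add the contact address, or warn if none is available.
    if "email" not in prepared:
        if email is not None:
            prepared["email"] = email
        else:
            warnings.warn("Email address is not specified.", UserWarning)
    return prepared


original = {"db": "pubmed", "retmax": None}
cleaned = prepare_params(original, email="A.N.Other@example.com")
print(cleaned)    # contains db, tool and email; retmax was dropped
print(original)   # unchanged, still includes retmax=None
```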

def Bio.Entrez._test() [private]
Run the module's doctests (PRIVATE).

Definition at line 449 of file __init__.py.

00449 
00450 def _test():
00451     """Run the module's doctests (PRIVATE)."""
00452     print "Running doctests..."
00453     import doctest
00454     doctest.testmod()
00455     print "Done"

def Bio.Entrez.efetch(db, **keywds)
Fetches Entrez results which are returned as a handle.

EFetch retrieves records in the requested format from a list of one or
more UIs or from the user's environment.

See the online documentation for an explanation of the parameters:
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html

Return a handle to the results.

Raises an IOError exception if there's a network error.

Short example:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.efetch(db="nucleotide", id="57240072", rettype="gb", retmode="text")
>>> print handle.readline().strip()
LOCUS       AY851612                 892 bp    DNA     linear   PLN 10-APR-2007
>>> handle.close()

Warning: The NCBI changed the default retmode in Feb 2012, so many
databases which previously returned text output now give XML.

Definition at line 99 of file __init__.py.

00099 
00100 def efetch(db, **keywds):
00101     """Fetches Entrez results which are returned as a handle.
00102 
00103     EFetch retrieves records in the requested format from a list of one or
00104     more UIs or from the user's environment.
00105 
00106     See the online documentation for an explanation of the parameters:
00107     http://www.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html
00108 
00109     Return a handle to the results.
00110 
00111     Raises an IOError exception if there's a network error.
00112 
00113     Short example:
00114 
00115     >>> from Bio import Entrez
00116     >>> Entrez.email = "Your.Name.Here@example.org"
00117     >>> handle = Entrez.efetch(db="nucleotide", id="57240072", rettype="gb", retmode="text")
00118     >>> print handle.readline().strip()
00119     LOCUS       AY851612                 892 bp    DNA     linear   PLN 10-APR-2007
00120     >>> handle.close()
00121 
00122     Warning: The NCBI changed the default retmode in Feb 2012, so many
00123     databases which previously returned text output now give XML.
00124     """
00125     cgi='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
00126     variables = {'db' : db}
00127     keywords = keywds
00128     if "id" in keywds and isinstance(keywds["id"], list):
00129         #Fix for NCBI change (probably part of EFetch 2.0, Feb 2012) where
00130         #a list of ID strings now gives HTTP Error 500: Internal server error
00131         #This was turned into ...&id=22307645&id=22303114&... which used to work
00132         #while now the NCBI appear to insist on ...&id=22301129,22299544,...
00133         keywords = keywds.copy() #Don't alter input dict!
00134         keywords["id"] = ",".join(keywds["id"])
00135     variables.update(keywords)
00136     return _open(cgi, variables)
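
The ID handling in the listing above can be tried without contacting NCBI. The sketch below only builds the query string; build_efetch_query is a hypothetical name used for illustration, and it uses Python 3's urllib.parse.urlencode in place of the Python 2 urllib.urlencode call that _open makes.

```python
from urllib.parse import urlencode


def build_efetch_query(db, **keywds):
    """Build the EFetch query string, joining a list of IDs with commas."""
    variables = {"db": db}
    keywords = keywds
    if isinstance(keywds.get("id"), list):
        keywords = dict(keywds)                    # do not alter the input dict
        keywords["id"] = ",".join(keywds["id"])    # id=A,B rather than id=A&id=B
    variables.update(keywords)
    return urlencode(variables, doseq=True)


# With the join, both IDs travel in a single parameter (comma encoded as %2C).
print(build_efetch_query("pubmed", id=["22301129", "22299544"], retmode="xml"))

# Without the join, doseq=True would repeat the parameter instead.
print(urlencode({"db": "pubmed", "id": ["22301129", "22299544"]}, doseq=True))
```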

def Bio.Entrez.egquery(**keywds)
EGQuery provides Entrez database counts for a global search.

EGQuery provides Entrez database counts in XML for a single search
using Global Query.

See the online documentation for an explanation of the parameters:
http://www.ncbi.nlm.nih.gov/entrez/query/static/egquery_help.html

Return a handle to the results in XML format.

Raises an IOError exception if there's a network error.

This quick example based on a longer version from the Biopython
Tutorial just checks there are over 60 matches for 'Biopython'
in PubMedCentral:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.egquery(term="biopython")
>>> record = Entrez.read(handle)
>>> handle.close()
>>> for row in record["eGQueryResult"]:
...     if "pmc" in row["DbName"]:
...         print int(row["Count"]) > 60
True
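
Values parsed from the XML, Count included, behave as strings in this version, so a numeric check needs an explicit int() conversion. A small sketch with made-up rows shaped like record["eGQueryResult"] (the counts are illustrative, not real query results; counts_by_db is a hypothetical helper; Python 3 syntax):

```python
# Made-up rows; real rows come from Entrez.read(Entrez.egquery(...)).
rows = [
    {"DbName": "pubmed", "Count": "7"},
    {"DbName": "pmc", "Count": "1234"},
]


def counts_by_db(rows):
    """Map each database name to its hit count as an integer."""
    return {row["DbName"]: int(row["Count"]) for row in rows}


counts = counts_by_db(rows)
print(counts["pmc"] > 60)      # True: 1234 exceeds 60
print(counts["pubmed"] > 60)   # False: 7 does not

# Comparing the raw strings is lexicographic and gives a misleading answer.
print("7" > "60")              # True, because "7" sorts after "6"
```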

Definition at line 268 of file __init__.py.

00268 
00269 def egquery(**keywds):
00270     """EGQuery provides Entrez database counts for a global search.
00271 
00272     EGQuery provides Entrez database counts in XML for a single search
00273     using Global Query.
00274 
00275     See the online documentation for an explanation of the parameters:
00276     http://www.ncbi.nlm.nih.gov/entrez/query/static/egquery_help.html
00277 
00278     Return a handle to the results in XML format.
00279 
00280     Raises an IOError exception if there's a network error.
00281 
00282     This quick example based on a longer version from the Biopython
00283     Tutorial just checks there are over 60 matches for 'Biopython'
00284     in PubMedCentral:
00285 
00286     >>> from Bio import Entrez
00287     >>> Entrez.email = "Your.Name.Here@example.org"
00288     >>> handle = Entrez.egquery(term="biopython")
00289     >>> record = Entrez.read(handle)
00290     >>> handle.close()
00291     >>> for row in record["eGQueryResult"]:
00292     ...     if "pmc" in row["DbName"]:
00293     ...         print int(row["Count"]) > 60
00294     True
00295 
00296     """
00297     cgi='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi'
00298     variables = {}
00299     variables.update(keywds)
00300     return _open(cgi, variables)

def Bio.Entrez.einfo(**keywds)
EInfo returns a summary of the Entrez databases as a results handle.

EInfo provides field names, index term counts, last update, and
available links for each Entrez database.

See the online documentation for an explanation of the parameters:
http://www.ncbi.nlm.nih.gov/entrez/query/static/einfo_help.html

Return a handle to the results, by default in XML format.

Raises an IOError exception if there's a network error.

Short example:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> record = Entrez.read(Entrez.einfo())
>>> 'pubmed' in record['DbList']
True

Definition at line 210 of file __init__.py.

00210 
00211 def einfo(**keywds):
00212     """EInfo returns a summary of the Entrez databases as a results handle.
00213 
00214     EInfo provides field names, index term counts, last update, and
00215     available links for each Entrez database.
00216 
00217     See the online documentation for an explanation of the parameters:
00218     http://www.ncbi.nlm.nih.gov/entrez/query/static/einfo_help.html
00219 
00220     Return a handle to the results, by default in XML format.
00221 
00222     Raises an IOError exception if there's a network error.
00223 
00224     Short example:
00225 
00226     >>> from Bio import Entrez
00227     >>> Entrez.email = "Your.Name.Here@example.org"
00228     >>> record = Entrez.read(Entrez.einfo())
00229     >>> 'pubmed' in record['DbList']
00230     True
00231 
00232     """
00233     cgi='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi'
00234     variables = {}
00235     variables.update(keywds)
00236     return _open(cgi, variables)

def Bio.Entrez.elink(**keywds)
ELink checks for linked external articles and returns a handle.

ELink checks for the existence of an external or Related Articles link
from a list of one or more primary IDs;  retrieves IDs and relevancy
scores for links to Entrez databases or Related Articles; creates a
hyperlink to the primary LinkOut provider for a specific ID and
database, or lists LinkOut URLs and attributes for multiple IDs.

See the online documentation for an explanation of the parameters:
http://www.ncbi.nlm.nih.gov/entrez/query/static/elink_help.html

Return a handle to the results, by default in XML format.

Raises an IOError exception if there's a network error.

This example finds articles related to the Biopython application
note's entry in the PubMed database:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> pmid = "19304878"
>>> handle = Entrez.elink(dbfrom="pubmed", id=pmid, linkname="pubmed_pubmed")
>>> record = Entrez.read(handle)
>>> handle.close()
>>> print record[0]["LinkSetDb"][0]["LinkName"]
pubmed_pubmed
>>> linked = [link["Id"] for link in record[0]["LinkSetDb"][0]["Link"]]
>>> "17121776" in linked
True

This is explained in much more detail in the Biopython Tutorial.

Definition at line 172 of file __init__.py.

00172 
00173 def elink(**keywds):
00174     """ELink checks for linked external articles and returns a handle.
00175 
00176     ELink checks for the existence of an external or Related Articles link
00177     from a list of one or more primary IDs;  retrieves IDs and relevancy
00178     scores for links to Entrez databases or Related Articles; creates a
00179     hyperlink to the primary LinkOut provider for a specific ID and
00180     database, or lists LinkOut URLs and attributes for multiple IDs.
00181 
00182     See the online documentation for an explanation of the parameters:
00183     http://www.ncbi.nlm.nih.gov/entrez/query/static/elink_help.html
00184 
00185     Return a handle to the results, by default in XML format.
00186 
00187     Raises an IOError exception if there's a network error.
00188 
00189     This example finds articles related to the Biopython application
00190     note's entry in the PubMed database:
00191 
00192     >>> from Bio import Entrez
00193     >>> Entrez.email = "Your.Name.Here@example.org"
00194     >>> pmid = "19304878"
00195     >>> handle = Entrez.elink(dbfrom="pubmed", id=pmid, linkname="pubmed_pubmed")
00196     >>> record = Entrez.read(handle)
00197     >>> handle.close()
00198     >>> print record[0]["LinkSetDb"][0]["LinkName"]
00199     pubmed_pubmed
00200     >>> linked = [link["Id"] for link in record[0]["LinkSetDb"][0]["Link"]]
00201     >>> "17121776" in linked
00202     True
00203 
00204     This is explained in much more detail in the Biopython Tutorial.
00205     """
00206     cgi='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi'
00207     variables = {}
00208     variables.update(keywds)
00209     return _open(cgi, variables)

def Bio.Entrez.epost(db, **keywds)
Post a file of identifiers for future use.

Posts a file containing a list of UIs for future use in the user's
environment to use with subsequent search strategies.

See the online documentation for an explanation of the parameters:
http://www.ncbi.nlm.nih.gov/entrez/query/static/epost_help.html

Return a handle to the results.

Raises an IOError exception if there's a network error.

Definition at line 81 of file __init__.py.

00081 
00082 def epost(db, **keywds):
00083     """Post a file of identifiers for future use.
00084 
00085     Posts a file containing a list of UIs for future use in the user's
00086     environment to use with subsequent search strategies.
00087 
00088     See the online documentation for an explanation of the parameters:
00089     http://www.ncbi.nlm.nih.gov/entrez/query/static/epost_help.html
00090 
00091     Return a handle to the results.
00092 
00093     Raises an IOError exception if there's a network error.
00094     """
00095     cgi='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi'
00096     variables = {'db' : db}
00097     variables.update(keywds)
00098     return _open(cgi, variables, post=True)
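
EPost is the one function on this page that calls _open with post=True. The difference between the two request styles can be seen without sending anything, by building urllib.request.Request objects. This is a sketch under assumptions: build_request is a hypothetical helper, the ID values are placeholders, and the code uses Python 3's urllib rather than the Python 2 urllib2 calls in the listing.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(cgi, params, post=False):
    """Build (but do not send) a GET or POST request for an E-utility URL."""
    options = urlencode(params, doseq=True)
    if post:
        # HTTP POST: parameters travel in the request body.
        return Request(cgi, data=options.encode("ascii"))
    # HTTP GET: parameters are appended to the URL.
    return Request(cgi + "?" + options)


cgi = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"
params = {"db": "pubmed", "id": "1,2"}   # placeholder IDs for illustration

post_req = build_request(cgi, params, post=True)
get_req = build_request(cgi, params)
print(post_req.get_method(), post_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

POST keeps long ID lists out of the URL, which is a plausible reason for using it with EPost, although the source does not state the motivation.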

def Bio.Entrez.esearch(db, term, **keywds)
ESearch runs an Entrez search and returns a handle to the results.

ESearch searches and retrieves primary IDs (for use in EFetch, ELink
and ESummary) and term translations, and optionally retains results
for future use in the user's environment.

See the online documentation for an explanation of the parameters:
http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html

Return a handle to the results which are always in XML format.

Raises an IOError exception if there's a network error.

Short example:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.esearch(db="nucleotide", retmax=10, term="opuntia[ORGN] accD")
>>> record = Entrez.read(handle)
>>> handle.close()
>>> int(record["Count"]) >= 2
True
>>> "156535671" in record["IdList"]
True
>>> "156535673" in record["IdList"]
True

Definition at line 137 of file __init__.py.

00137 
00138 def esearch(db, term, **keywds):
00139     """ESearch runs an Entrez search and returns a handle to the results.
00140 
00141     ESearch searches and retrieves primary IDs (for use in EFetch, ELink
00142     and ESummary) and term translations, and optionally retains results
00143     for future use in the user's environment.
00144 
00145     See the online documentation for an explanation of the parameters:
00146     http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html
00147 
00148     Return a handle to the results which are always in XML format.
00149 
00150     Raises an IOError exception if there's a network error.
00151 
00152     Short example:
00153 
00154     >>> from Bio import Entrez
00155     >>> Entrez.email = "Your.Name.Here@example.org"
00156     >>> handle = Entrez.esearch(db="nucleotide", retmax=10, term="opuntia[ORGN] accD")
00157     >>> record = Entrez.read(handle)
00158     >>> handle.close()
00159     >>> int(record["Count"]) >= 2
00160     True
00161     >>> "156535671" in record["IdList"]
00162     True
00163     >>> "156535673" in record["IdList"]
00164     True
00165 
00166     """
00167     cgi='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'
00168     variables = {'db' : db,
00169                  'term' : term}
00170     variables.update(keywds)
00171     return _open(cgi, variables)

def Bio.Entrez.espell(**keywds)
ESpell retrieves spelling suggestions, returned in a results handle.

ESpell retrieves spelling suggestions, if available.

See the online documentation for an explanation of the parameters:
http://www.ncbi.nlm.nih.gov/entrez/query/static/espell_help.html

Return a handle to the results, by default in XML format.

Raises an IOError exception if there's a network error.

Short example:

>>> from Bio import Entrez 
>>> Entrez.email = "Your.Name.Here@example.org"
>>> record = Entrez.read(Entrez.espell(term="biopythooon"))
>>> print record["Query"] 
biopythooon
>>> print record["CorrectedQuery"] 
biopython

Definition at line 301 of file __init__.py.

00301 
00302 def espell(**keywds):
00303     """ESpell retrieves spelling suggestions, returned in a results handle.
00304 
00305     ESpell retrieves spelling suggestions, if available.
00306 
00307     See the online documentation for an explanation of the parameters:
00308     http://www.ncbi.nlm.nih.gov/entrez/query/static/espell_help.html
00309 
00310     Return a handle to the results, by default in XML format.
00311 
00312     Raises an IOError exception if there's a network error.
00313 
00314     Short example:
00315 
00316     >>> from Bio import Entrez 
00317     >>> Entrez.email = "Your.Name.Here@example.org"
00318     >>> record = Entrez.read(Entrez.espell(term="biopythooon"))
00319     >>> print record["Query"] 
00320     biopythooon
00321     >>> print record["CorrectedQuery"] 
00322     biopython
00323 
00324     """
00325     cgi='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi'
00326     variables = {}
00327     variables.update(keywds)
00328     return _open(cgi, variables)

def Bio.Entrez.esummary(**keywds)
ESummary retrieves document summaries as a results handle.

ESummary retrieves document summaries from a list of primary IDs or
from the user's environment.

See the online documentation for an explanation of the parameters:
http://www.ncbi.nlm.nih.gov/entrez/query/static/esummary_help.html

Return a handle to the results, by default in XML format.

Raises an IOError exception if there's a network error.

This example discovers more about entry 30367 in the journals database:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.esummary(db="journals", id="30367")
>>> record = Entrez.read(handle)
>>> handle.close()
>>> print record[0]["Id"]
30367
>>> print record[0]["Title"]
Computational biology and chemistry

Definition at line 237 of file __init__.py.

00237 
00238 def esummary(**keywds):
00239     """ESummary retrieves document summaries as a results handle.
00240 
00241     ESummary retrieves document summaries from a list of primary IDs or
00242     from the user's environment.
00243 
00244     See the online documentation for an explanation of the parameters:
00245     http://www.ncbi.nlm.nih.gov/entrez/query/static/esummary_help.html
00246 
00247     Return a handle to the results, by default in XML format.
00248 
00249     Raises an IOError exception if there's a network error.
00250 
00251     This example discovers more about entry 30367 in the journals database:
00252 
00253     >>> from Bio import Entrez
00254     >>> Entrez.email = "Your.Name.Here@example.org"
00255     >>> handle = Entrez.esummary(db="journals", id="30367")
00256     >>> record = Entrez.read(handle)
00257     >>> handle.close()
00258     >>> print record[0]["Id"]
00259     30367
00260     >>> print record[0]["Title"]
00261     Computational biology and chemistry
00262 
00263     """
00264     cgi='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi'
00265     variables = {}
00266     variables.update(keywds)
00267     return _open(cgi, variables)

def Bio.Entrez.parse(handle, validate = True)
Parses an XML file from the NCBI Entrez Utilities into Python objects.

This function parses an XML file created by NCBI's Entrez Utilities,
returning a multilevel data structure of Python lists and dictionaries.
This function is suitable for XML files that (in Python) can be represented
as a list of individual records. Whereas 'read' reads the complete file
and returns a single Python list, 'parse' is a generator function that
returns the records one by one. This function is therefore particularly
useful for parsing large files.
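
The contrast between reading everything at once and yielding records one by one can be illustrated with the standard library alone. The sketch below is not the Bio.Entrez parser: it uses xml.etree.ElementTree, the functions read_records and parse_records are hypothetical, and the XML snippet is invented for the example (Python 3).

```python
import io
import xml.etree.ElementTree as ET

XML = b"<Set><Record><Id>1</Id></Record><Record><Id>2</Id></Record></Set>"


def read_records(handle, tag):
    """Load the whole document, then return all record IDs as a list."""
    root = ET.parse(handle).getroot()
    return [elem.findtext("Id") for elem in root.iter(tag)]


def parse_records(handle, tag):
    """Yield record IDs one at a time as each record finishes parsing."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == tag:
            yield elem.findtext("Id")
            elem.clear()   # release the record's children once consumed


print(read_records(io.BytesIO(XML), "Record"))     # a complete list

for record_id in parse_records(io.BytesIO(XML), "Record"):
    print(record_id)                               # one record per iteration
```

With the generator form, only the record currently being handled needs to stay in memory, which is the property that matters for large files.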

Most XML files returned by NCBI's Entrez Utilities can be parsed by
this function, provided its DTD is available. Biopython includes the
DTDs for most commonly used Entrez Utilities.

If validate is True (default), the parser will validate the XML file
against the DTD, and raise an error if the XML file contains tags that
are not represented in the DTD. If validate is False, the parser will
simply skip such tags.

Whereas the data structure seems to consist of generic Python lists,
dictionaries, strings, and so on, each of these is actually a class
derived from the base type. This allows us to store the attributes
(if any) of each element in a dictionary my_element.attributes, and
the tag name in my_element.tag.

Definition at line 354 of file __init__.py.

00354 
00355 def parse(handle, validate=True):
00356     """Parses an XML file from the NCBI Entrez Utilities into Python objects.
00357     
00358     This function parses an XML file created by NCBI's Entrez Utilities,
00359     returning a multilevel data structure of Python lists and dictionaries.
00360     This function is suitable for XML files that (in Python) can be represented
00361     as a list of individual records. Whereas 'read' reads the complete file
00362     and returns a single Python list, 'parse' is a generator function that
00363     returns the records one by one. This function is therefore particularly
00364     useful for parsing large files.
00365 
00366     Most XML files returned by NCBI's Entrez Utilities can be parsed by
00367     this function, provided its DTD is available. Biopython includes the
00368     DTDs for most commonly used Entrez Utilities.
00369 
00370     If validate is True (default), the parser will validate the XML file
00371     against the DTD, and raise an error if the XML file contains tags that
00372     are not represented in the DTD. If validate is False, the parser will
00373     simply skip such tags.
00374 
00375     Whereas the data structure seems to consist of generic Python lists,
00376     dictionaries, strings, and so on, each of these is actually a class
00377     derived from the base type. This allows us to store the attributes
00378     (if any) of each element in a dictionary my_element.attributes, and
00379     the tag name in my_element.tag.
00380     """
00381     from Parser import DataHandler
00382     handler = DataHandler(validate)
00383     records = handler.parse(handle)
00384     return records

def Bio.Entrez.read(handle, validate = True)
Parses an XML file from the NCBI Entrez Utilities into Python objects.

This function parses an XML file created by NCBI's Entrez Utilities,
returning a multilevel data structure of Python lists and dictionaries.
Most XML files returned by NCBI's Entrez Utilities can be parsed by
this function, provided its DTD is available. Biopython includes the
DTDs for most commonly used Entrez Utilities.

If validate is True (default), the parser will validate the XML file
against the DTD, and raise an error if the XML file contains tags that
are not represented in the DTD. If validate is False, the parser will
simply skip such tags.

Whereas the data structure seems to consist of generic Python lists,
dictionaries, strings, and so on, each of these is actually a class
derived from the base type. This allows us to store the attributes
(if any) of each element in a dictionary my_element.attributes, and
the tag name in my_element.tag.
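
The idea of subclassing the built-in types so that each value can carry a tag and attributes can be shown with a short sketch. TaggedStr and TaggedList are hypothetical stand-ins written for this illustration; they are not the classes used by Bio.Entrez, and the sample values are invented (Python 3).

```python
class TaggedStr(str):
    """A str that also remembers its XML tag and attributes."""

    def __new__(cls, value, tag, attributes=None):
        obj = super().__new__(cls, value)
        obj.tag = tag
        obj.attributes = dict(attributes or {})
        return obj


class TaggedList(list):
    """A list that also remembers its XML tag and attributes."""

    def __init__(self, items=(), tag=None, attributes=None):
        super().__init__(items)
        self.tag = tag
        self.attributes = dict(attributes or {})


pmid = TaggedStr("19304878", tag="Id", attributes={"IdType": "pubmed"})
ids = TaggedList([pmid], tag="IdList")

print(pmid == "19304878")        # behaves like an ordinary string
print(pmid.tag, pmid.attributes)
print(len(ids), ids.tag)         # behaves like an ordinary list
```

Because the objects are genuine str and list instances, code that expects plain Python types keeps working, while code that needs the XML metadata can still reach it.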

Definition at line 329 of file __init__.py.

00329 
00330 def read(handle, validate=True):
00331     """Parses an XML file from the NCBI Entrez Utilities into Python objects.
00332     
00333     This function parses an XML file created by NCBI's Entrez Utilities,
00334     returning a multilevel data structure of Python lists and dictionaries.
00335     Most XML files returned by NCBI's Entrez Utilities can be parsed by
00336     this function, provided its DTD is available. Biopython includes the
00337     DTDs for most commonly used Entrez Utilities.
00338 
00339     If validate is True (default), the parser will validate the XML file
00340     against the DTD, and raise an error if the XML file contains tags that
00341     are not represented in the DTD. If validate is False, the parser will
00342     simply skip such tags.
00343 
00344     Whereas the data structure seems to consist of generic Python lists,
00345     dictionaries, strings, and so on, each of these is actually a class
00346     derived from the base type. This allows us to store the attributes
00347     (if any) of each element in a dictionary my_element.attributes, and
00348     the tag name in my_element.tag.
00349     """
00350     from Parser import DataHandler
00351     handler = DataHandler(validate)
00352     record = handler.read(handle)
00353     return record


Variable Documentation

Bio.Entrez.email = None

Definition at line 76 of file __init__.py.

string Bio.Entrez.tool = "biopython"

Definition at line 77 of file __init__.py.