
python-biopython  1.60
Bio.GenBank Namespace Reference

Namespaces

namespace  Record
namespace  Scanner
namespace  utils

Classes

class  Iterator
class  ParserFailureError
class  LocationParserError
class  FeatureParser
class  RecordParser
class  _BaseGenBankConsumer
class  _FeatureConsumer
class  _RecordConsumer

Functions

def _pos
def _loc
def _split_compound_loc
def parse
def read
def _test

Variables

int GENBANK_INDENT = 12
string GENBANK_SPACER = " "
int FEATURE_KEY_INDENT = 5
int FEATURE_QUALIFIER_INDENT = 21
string FEATURE_KEY_SPACER = " "
string FEATURE_QUALIFIER_SPACER = " "
string _solo_location = r"[<>]?\d+"
string _pair_location = r"[<>]?\d+\.\.[<>]?\d+"
string _between_location = r"\d+\^\d+"
string _within_position = r"\(\d+\.\d+\)"
tuple _re_within_position = re.compile(_within_position)
string _within_location = r"([<>]?\d+|%s)\.\.([<>]?\d+|%s)"
string _oneof_position = r"one\-of\(\d+(,\d+)+\)"
tuple _re_oneof_position = re.compile(_oneof_position)
string _oneof_location = r"([<>]?\d+|%s)\.\.([<>]?\d+|%s)"
string _simple_location = r"\d+\.\.\d+"
tuple _re_simple_location = re.compile(_simple_location)
tuple _re_simple_compound
string _complex_location = r"([a-zA-z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s|%s|%s|%s)"
tuple _re_complex_location = re.compile(r"^%s$" % _complex_location)
string _possibly_complemented_complex_location = r"(%s|complement\(%s\))"
tuple _re_complex_compound

Class Documentation

class Bio::GenBank::ParserFailureError
Failure caused by some kind of problem in the parser.

Definition at line 387 of file __init__.py.

class Bio::GenBank::LocationParserError
Could not properly parse out a location from a GenBank file.

Definition at line 392 of file __init__.py.


Function Documentation

def Bio.GenBank._loc(loc_str, expected_seq_length, strand) [private]
FeatureLocation from non-compound non-complement location (PRIVATE).

Simple examples:

>>> _loc("123..456", 1000, +1)
FeatureLocation(ExactPosition(122), ExactPosition(456), strand=1)
>>> _loc("<123..>456", 1000, strand = -1)
FeatureLocation(BeforePosition(122), AfterPosition(456), strand=-1)

A more complex location using within positions:

>>> _loc("(9.10)..(20.25)", 1000, 1)
FeatureLocation(WithinPosition(8, left=8, right=9), WithinPosition(25, left=20, right=25), strand=1)

Notice how this acts as though it has an overall start of 8 and end of 25.

A zero-length between feature:

>>> _loc("123^124", 1000, 0)
FeatureLocation(ExactPosition(123), ExactPosition(123), strand=0)

The expected sequence length is needed for a special case, a between
position at the start/end of a circular genome:

>>> _loc("1000^1", 1000, 1)
FeatureLocation(ExactPosition(1000), ExactPosition(1000), strand=1)

Apart from this special case, a between position P^Q must have P+1==Q:

>>> _loc("123^456", 1000, 1)
Traceback (most recent call last):
   ...
ValueError: Invalid between location '123^456'

Definition at line 226 of file __init__.py.

def _loc(loc_str, expected_seq_length, strand):
    """FeatureLocation from non-compound non-complement location (PRIVATE)."""
    try:
        s, e = loc_str.split("..")
    except ValueError:
        assert ".." not in loc_str
        if "^" in loc_str:
            #A between location like "67^68" (one based counting) is a
            #special case (note it has zero length). In python slice
            #notation this is 67:67, a zero length slice.  See Bug 2622
            #Furthermore, on a circular genome of length N you can have
            #a location N^1 meaning the junction at the origin. See Bug 3098.
            #NOTE - We can imagine between locations like "2^4", but this
            #is just "3".  Similarly, "2^5" is just "3..4"
            s, e = loc_str.split("^")
            if int(s)+1==int(e):
                pos = _pos(s)
            elif int(s)==expected_seq_length and e=="1":
                pos = _pos(s)
            else:
                raise ValueError("Invalid between location %s" % repr(loc_str))
            return SeqFeature.FeatureLocation(pos, pos, strand)
        else:
            #e.g. "123"
            s = loc_str
            e = loc_str
    return SeqFeature.FeatureLocation(_pos(s,-1), _pos(e), strand)
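To make the coordinate conversion concrete without depending on Bio.SeqFeature, here is a minimal Python 3 sketch that mirrors only the exact-position branches of `_loc` using plain integers. It returns a zero-based, half-open `(start, end)` tuple. The name `simple_loc_to_slice` is hypothetical, and fuzzy markers (`<`, `>`, within and one-of positions) are deliberately not handled.

```python
def simple_loc_to_slice(loc_str, expected_seq_length):
    """Hypothetical helper: one-based GenBank location -> zero-based (start, end).

    Handles only exact positions: "123..456", "123", and between
    locations such as "123^124" (including N^1 at a circular origin).
    """
    if ".." in loc_str:
        s, e = loc_str.split("..")
        # The one-based inclusive start becomes zero-based by subtracting one;
        # the one-based inclusive end is already the exclusive slice end.
        return int(s) - 1, int(e)
    if "^" in loc_str:
        s, e = (int(x) for x in loc_str.split("^"))
        if s + 1 == e or (s == expected_seq_length and e == 1):
            # A between location has zero length: "67^68" is the slice 67:67.
            return s, s
        raise ValueError("Invalid between location %r" % loc_str)
    # A single base such as "123" covers the slice 122:123.
    return int(loc_str) - 1, int(loc_str)
```

For example, `simple_loc_to_slice("123..456", 1000)` gives `(122, 456)`, the same numbers as the ExactPosition values in the first docstring example.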

def Bio.GenBank._pos(pos_str, offset=0) [private]
Build a Position object (PRIVATE).

For an end position, leave offset as zero (default):

>>> _pos("5")
ExactPosition(5)

For a start position, set offset to minus one (for Python counting):

>>> _pos("5", -1)
ExactPosition(4)

This also covers fuzzy positions:

>>> p = _pos("<5")
>>> p
BeforePosition(5)
>>> print p
<5
>>> int(p)
5

>>> _pos(">5")
AfterPosition(5)

By default an end position is assumed, so note the integer behaviour:

>>> p = _pos("one-of(5,8,11)")
>>> p
OneOfPosition(11, choices=[ExactPosition(5), ExactPosition(8), ExactPosition(11)])
>>> print p
one-of(5,8,11)
>>> int(p)
11

>>> _pos("(8.10)")
WithinPosition(10, left=8, right=10)

Fuzzy start positions:

>>> p = _pos("<5", -1)
>>> p
BeforePosition(4)
>>> print p
<4
>>> int(p)
4

Notice how the integer behaviour changes too!

>>> p = _pos("one-of(5,8,11)", -1)
>>> p
OneOfPosition(4, choices=[ExactPosition(4), ExactPosition(7), ExactPosition(10)])
>>> print(p)
one-of(4,7,10)
>>> int(p)
4

Definition at line 140 of file __init__.py.

def _pos(pos_str, offset=0):
    """Build a Position object (PRIVATE)."""
    if pos_str.startswith("<"):
        return SeqFeature.BeforePosition(int(pos_str[1:])+offset)
    elif pos_str.startswith(">"):
        return SeqFeature.AfterPosition(int(pos_str[1:])+offset)
    elif _re_within_position.match(pos_str):
        s,e = pos_str[1:-1].split(".")
        s = int(s) + offset
        e = int(e) + offset
        if offset == -1:
            default = s
        else:
            default = e
        return SeqFeature.WithinPosition(default, left=s, right=e)
    elif _re_oneof_position.match(pos_str):
        assert pos_str.startswith("one-of(")
        assert pos_str[-1]==")"
        parts = [SeqFeature.ExactPosition(int(pos)+offset) \
                 for pos in pos_str[7:-1].split(",")]
        if offset == -1:
            default = min(int(pos) for pos in parts)
        else:
            default = max(int(pos) for pos in parts)
        return SeqFeature.OneOfPosition(default, choices=parts)
    else:
        return SeqFeature.ExactPosition(int(pos_str)+offset)
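The branch structure of `_pos`, and in particular how the offset decides which value a fuzzy position reports as its integer, can be sketched in Python 3 without Biopython. `SketchPosition` and `sketch_pos` are hypothetical stand-ins, not the SeqFeature classes; the two regular expressions are copied from the `_within_position` and `_oneof_position` patterns on this page.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Copied from the _within_position and _oneof_position patterns on this page.
_within = re.compile(r"\(\d+\.\d+\)")
_oneof = re.compile(r"one\-of\(\d+(,\d+)+\)")


@dataclass
class SketchPosition:
    """Hypothetical stand-in for the SeqFeature position classes."""
    kind: str              # "exact", "before", "after", "within" or "oneof"
    default: int           # the value int() would report
    choices: List[int] = field(default_factory=list)


def sketch_pos(pos_str, offset=0):
    """Mirror the branch structure of _pos using plain integers."""
    if pos_str.startswith("<"):
        return SketchPosition("before", int(pos_str[1:]) + offset)
    if pos_str.startswith(">"):
        return SketchPosition("after", int(pos_str[1:]) + offset)
    if _within.match(pos_str):
        left, right = (int(x) + offset for x in pos_str[1:-1].split("."))
        # A start (offset -1) reports its left bound; an end its right bound.
        return SketchPosition("within", left if offset == -1 else right, [left, right])
    if _oneof.match(pos_str):
        parts = [int(x) + offset for x in pos_str[len("one-of("):-1].split(",")]
        # Same rule as _pos: min() for a start, max() for an end.
        return SketchPosition("oneof", min(parts) if offset == -1 else max(parts), parts)
    return SketchPosition("exact", int(pos_str) + offset)
```

As in the docstring, `sketch_pos("one-of(5,8,11)")` reports 11, while `sketch_pos("one-of(5,8,11)", -1)` reports 4 with choices `[4, 7, 10]`.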

def Bio.GenBank._split_compound_loc(compound_loc) [private]
Split a tricky compound location string (PRIVATE).

>>> list(_split_compound_loc("123..145"))
['123..145']
>>> list(_split_compound_loc("123..145,200..209"))
['123..145', '200..209']
>>> list(_split_compound_loc("one-of(200,203)..300"))
['one-of(200,203)..300']
>>> list(_split_compound_loc("complement(123..145),200..209"))
['complement(123..145)', '200..209']
>>> list(_split_compound_loc("123..145,one-of(200,203)..209"))
['123..145', 'one-of(200,203)..209']
>>> list(_split_compound_loc("123..145,one-of(200,203)..one-of(209,211),300"))
['123..145', 'one-of(200,203)..one-of(209,211)', '300']
>>> list(_split_compound_loc("123..145,complement(one-of(200,203)..one-of(209,211)),300"))
['123..145', 'complement(one-of(200,203)..one-of(209,211))', '300']
>>> list(_split_compound_loc("123..145,200..one-of(209,211),300"))
['123..145', '200..one-of(209,211)', '300']
>>> list(_split_compound_loc("123..145,200..one-of(209,211)"))
['123..145', '200..one-of(209,211)']
>>> list(_split_compound_loc("complement(149815..150200),complement(293787..295573),NC_016402.1:6618..6676,181647..181905"))
['complement(149815..150200)', 'complement(293787..295573)', 'NC_016402.1:6618..6676', '181647..181905']

Definition at line 287 of file __init__.py.

def _split_compound_loc(compound_loc):
    """Split a tricky compound location string (PRIVATE)."""
    if "one-of(" in compound_loc:
        #Hard case
        while "," in compound_loc:
            assert compound_loc[0] != ","
            assert compound_loc[0:2] != ".."
            i = compound_loc.find(",")
            part = compound_loc[:i]
            compound_loc = compound_loc[i:] #includes the comma
            while part.count("(") > part.count(")"):
                assert "one-of(" in part, (part, compound_loc)
                i = compound_loc.find(")")
                part += compound_loc[:i+1]
                compound_loc = compound_loc[i+1:]
            if compound_loc.startswith(".."):
                i = compound_loc.find(",")
                if i==-1:
                    part += compound_loc
                    compound_loc = ""
                else:
                    part += compound_loc[:i]
                    compound_loc = compound_loc[i:] #includes the comma
            while part.count("(") > part.count(")"):
                assert part.count("one-of(") == 2
                i = compound_loc.find(")")
                part += compound_loc[:i+1]
                compound_loc = compound_loc[i+1:]
            if compound_loc.startswith(","):
                compound_loc = compound_loc[1:]
            assert part
            yield part
        if compound_loc:
            yield compound_loc
    else:
        #Easy case
        for part in compound_loc.split(","):
            yield part
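`_split_compound_loc` splits on a comma and then re-attaches text until the parentheses balance. A different technique that produces the same results on the docstring examples is to track parenthesis depth and split only on commas at depth zero. The sketch below is that alternative, not the Biopython implementation; `split_top_level` is a hypothetical name.

```python
def split_top_level(compound_loc):
    """Yield comma-separated parts, ignoring commas nested inside parentheses."""
    depth = 0
    start = 0
    for i, ch in enumerate(compound_loc):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("Unbalanced ')' in %r" % compound_loc)
        elif ch == "," and depth == 0:
            # A comma at depth zero separates two sub-locations.
            yield compound_loc[start:i]
            start = i + 1
    if depth != 0:
        raise ValueError("Unbalanced '(' in %r" % compound_loc)
    yield compound_loc[start:]
```

Counting depth handles `one-of(...)` and `complement(...)` uniformly, so no special "hard case" branch is needed; the trade-off is that it validates only bracket balance, not the location syntax itself.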

def Bio.GenBank._test() [private]
Run the Bio.GenBank module's doctests.

Definition at line 1472 of file __init__.py.

def _test():
    """Run the Bio.GenBank module's doctests."""
    import doctest
    import os
    if os.path.isdir(os.path.join("..","..","Tests")):
        print "Running doctests..."
        cur_dir = os.path.abspath(os.curdir)
        os.chdir(os.path.join("..","..","Tests"))
        doctest.testmod()
        os.chdir(cur_dir)
        del cur_dir
        print "Done"
    elif os.path.isdir(os.path.join("Tests")):
        print "Running doctests..."
        cur_dir = os.path.abspath(os.curdir)
        os.chdir(os.path.join("Tests"))
        doctest.testmod()
        os.chdir(cur_dir)
        del cur_dir
        print "Done"
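`_test` changes into a Tests directory so that relative paths used in the doctests (such as "GenBank/NC_000932.gb") resolve. The listing restores the old directory only if `doctest.testmod()` returns normally. A Python 3 sketch of the same pattern with a try/finally guard follows; `working_directory` and `run_doctests_from` are hypothetical helpers, while `doctest.testmod` is the standard-library function the listing calls.

```python
import contextlib
import doctest
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily chdir into path, restoring the old directory even on error."""
    old = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)


def run_doctests_from(candidates, module):
    """Run module's doctests from the first existing candidate directory.

    Returns doctest.TestResults, or None if no candidate directory exists.
    """
    for path in candidates:
        if os.path.isdir(path):
            with working_directory(path):
                return doctest.testmod(module)
    return None
```

A call equivalent in spirit to `_test` would pass `[os.path.join("..", "..", "Tests"), "Tests"]` as the candidates, tried in that order.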

def Bio.GenBank.parse(handle)
Iterate over GenBank formatted entries as Record objects.

>>> from Bio import GenBank
>>> handle = open("GenBank/NC_000932.gb")
>>> for record in GenBank.parse(handle):
...     print record.accession
['NC_000932']
>>> handle.close()

To get SeqRecord objects use Bio.SeqIO.parse(..., format="gb")
instead.

Definition at line 1429 of file __init__.py.

def parse(handle):
    """Iterate over GenBank formatted entries as Record objects."""
    return iter(Iterator(handle, RecordParser()))
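`parse` delegates to the module's `Iterator` and `RecordParser`, whose internals are not shown on this page. Purely to illustrate the record-at-a-time idea, the sketch below splits GenBank flat-file text into per-entry chunks at the `//` line that terminates each GenBank entry. It is a simplification and not how `Iterator` or the `Scanner` namespace actually work; `iter_record_texts` is hypothetical.

```python
def iter_record_texts(handle):
    """Yield the raw text of each GenBank entry, one at a time.

    Assumes every entry ends with a line consisting of "//".
    """
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []
    # Leftover non-blank text means the last entry was truncated.
    if "".join(lines).strip():
        raise ValueError("Final GenBank entry is missing its // terminator")
```

Because it is a generator, only one entry is held in memory at a time, which is the same reason `parse` returns an iterator rather than a list.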

def Bio.GenBank.read(handle)
Read a handle containing a single GenBank entry as a Record object.

>>> from Bio import GenBank
>>> handle = open("GenBank/NC_000932.gb")
>>> record = GenBank.read(handle)
>>> print record.accession
['NC_000932']
>>> handle.close()
                   
To get a SeqRecord object use Bio.SeqIO.read(..., format="gb")
instead.

Definition at line 1444 of file __init__.py.

def read(handle):
    """Read a handle containing a single GenBank entry as a Record object."""
    iterator = parse(handle)
    try:
        first = iterator.next()
    except StopIteration:
        first = None
    if first is None:
        raise ValueError("No records found in handle")
    try:
        second = iterator.next()
    except StopIteration:
        second = None
    if second is not None:
        raise ValueError("More than one record found in handle")
    return first
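The listing uses the Python 2 `iterator.next()` method. The same "exactly one item" check can be sketched for Python 3 with the built-in `next(iterator, default)`; `read_exactly_one` is a hypothetical helper, and the error messages are copied from the listing.

```python
_MISSING = object()  # sentinel distinguishing "no item" from any real item


def read_exactly_one(iterable):
    """Return the only item of iterable, mirroring the checks in read()."""
    iterator = iter(iterable)
    first = next(iterator, _MISSING)
    if first is _MISSING:
        raise ValueError("No records found in handle")
    if next(iterator, _MISSING) is not _MISSING:
        raise ValueError("More than one record found in handle")
    return first
```

Using a private sentinel instead of `None` means an item that happens to be `None` is not mistaken for an empty iterator; for `read`, where records are never `None`, both approaches behave the same.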


Variable Documentation

string Bio.GenBank._between_location = r"\d+\^\d+"

Definition at line 63 of file __init__.py.

string Bio.GenBank._complex_location = r"([a-zA-z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s|%s|%s|%s)"

Definition at line 92 of file __init__.py.

string Bio.GenBank._oneof_location = r"([<>]?\d+|%s)\.\.([<>]?\d+|%s)"

Definition at line 76 of file __init__.py.

string Bio.GenBank._oneof_position = r"one\-of\(\d+(,\d+)+\)"

Definition at line 74 of file __init__.py.

string Bio.GenBank._pair_location = r"[<>]?\d+\.\.[<>]?\d+"

Definition at line 62 of file __init__.py.

string Bio.GenBank._possibly_complemented_complex_location = r"(%s|complement\(%s\))"

Definition at line 96 of file __init__.py.

tuple Bio.GenBank._re_complex_compound

Initial value:
re.compile(r"^(join|order|bond)\(%s(,%s)*\)$" \
           % (_possibly_complemented_complex_location,
              _possibly_complemented_complex_location))

Definition at line 98 of file __init__.py.

tuple Bio.GenBank._re_complex_location = re.compile(r"^%s$" % _complex_location)

Definition at line 95 of file __init__.py.

tuple Bio.GenBank._re_oneof_position = re.compile(_oneof_position)

Definition at line 75 of file __init__.py.

tuple Bio.GenBank._re_simple_compound

Initial value:
re.compile(r"^(join|order|bond)\(%s(,%s)*\)$" \
           % (_simple_location, _simple_location))

Definition at line 90 of file __init__.py.
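The initial value of `_re_simple_compound` accepts `join(...)`, `order(...)` or `bond(...)` wrapping one or more comma-separated exact `start..end` ranges. A minimal sketch rebuilding it from the two patterns shown on this page; `is_simple_compound` is a hypothetical helper.

```python
import re

_simple_location = r"\d+\.\.\d+"  # as defined at line 88
_re_simple_compound = re.compile(r"^(join|order|bond)\(%s(,%s)*\)$"
                                 % (_simple_location, _simple_location))


def is_simple_compound(loc_str):
    """True if loc_str is join/order/bond of plain start..end ranges only."""
    return _re_simple_compound.match(loc_str) is not None
```

Since no fuzzy markers or nested parentheses can appear in a string this pattern accepts, splitting the text inside the outer parentheses on commas is enough to recover its parts.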

tuple Bio.GenBank._re_simple_location = re.compile(_simple_location)

Definition at line 89 of file __init__.py.

tuple Bio.GenBank._re_within_position = re.compile(_within_position)

Definition at line 66 of file __init__.py.

string Bio.GenBank._simple_location = r"\d+\.\.\d+"

Definition at line 88 of file __init__.py.

string Bio.GenBank._solo_location = r"[<>]?\d+"

Definition at line 61 of file __init__.py.

string Bio.GenBank._within_location = r"([<>]?\d+|%s)\.\.([<>]?\d+|%s)"

Definition at line 67 of file __init__.py.

string Bio.GenBank._within_position = r"\(\d+\.\d+\)"

Definition at line 65 of file __init__.py.

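int Bio.GenBank.FEATURE_KEY_INDENT = 5

Definition at line 55 of file __init__.py.

string Bio.GenBank.FEATURE_KEY_SPACER = " "

Definition at line 57 of file __init__.py.

int Bio.GenBank.FEATURE_QUALIFIER_INDENT = 21

Definition at line 56 of file __init__.py.

string Bio.GenBank.FEATURE_QUALIFIER_SPACER = " "

Definition at line 58 of file __init__.py.

int Bio.GenBank.GENBANK_INDENT = 12

Definition at line 51 of file __init__.py.

string Bio.GenBank.GENBANK_SPACER = " "

Definition at line 52 of file __init__.py.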
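The indentation constants describe GenBank's fixed-column layout. Header keywords such as LOCUS fill the first `GENBANK_INDENT` (12) columns, feature keys follow `FEATURE_KEY_INDENT` (5) spaces, and feature locations and qualifiers begin after `FEATURE_QUALIFIER_INDENT` (21) characters. The spacer values on this page appear as a single space, most likely because the page collapsed runs of whitespace; the sketch assumes each spacer is the corresponding number of spaces. The helper functions are hypothetical and not part of Bio.GenBank.

```python
GENBANK_INDENT = 12
FEATURE_KEY_INDENT = 5
FEATURE_QUALIFIER_INDENT = 21

# Assumption: each spacer is the indent width made of spaces.
GENBANK_SPACER = " " * GENBANK_INDENT
FEATURE_KEY_SPACER = " " * FEATURE_KEY_INDENT
FEATURE_QUALIFIER_SPACER = " " * FEATURE_QUALIFIER_INDENT


def header_line(keyword, value):
    """Left-justify a header keyword in the first GENBANK_INDENT columns."""
    return keyword.ljust(GENBANK_INDENT) + value


def feature_line(key, location):
    """Indent a feature key and start its location after FEATURE_QUALIFIER_INDENT characters."""
    return (FEATURE_KEY_SPACER + key).ljust(FEATURE_QUALIFIER_INDENT) + location


def qualifier_line(qualifier):
    """Qualifier lines such as /gene="x" start in the same column as locations."""
    return FEATURE_QUALIFIER_SPACER + qualifier
```

For instance, `feature_line("source", "1..154478")` places `source` after five spaces and the location after 21 characters, matching the column layout of a GenBank FEATURES table.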