python-biopython  1.60
Bio.Alphabet Namespace Reference

Namespaces

namespace  IUPAC
namespace  Reduced

Classes

class  Alphabet
class  SingleLetterAlphabet
class  ProteinAlphabet
 Protein.
class  NucleotideAlphabet
 DNA.
class  DNAAlphabet
class  RNAAlphabet
 RNA.
class  SecondaryStructure
 Other per-sequence encodings.
class  ThreeLetterProtein
class  AlphabetEncoder
class  Gapped
class  HasStopCodon

Functions

def _get_base_alphabet
def _ungap
def _consensus_base_alphabet
def _consensus_alphabet
def _check_type_compatible
def _verify_alphabet

Variables

generic_alphabet = Alphabet()
single_letter_alphabet = SingleLetterAlphabet()
generic_protein = ProteinAlphabet()
generic_nucleotide = NucleotideAlphabet()
generic_dna = DNAAlphabet()
generic_rna = RNAAlphabet()

Function Documentation

def Bio.Alphabet._check_type_compatible (   alphabets) [private]
Returns True except for DNA+RNA or Nucleotide+Protein (PRIVATE).

>>> _check_type_compatible([generic_dna, generic_nucleotide])
True
>>> _check_type_compatible([generic_dna, generic_rna])
False
>>> _check_type_compatible([generic_dna, generic_protein])
False
>>> _check_type_compatible([single_letter_alphabet, generic_protein])
True

This relies on the Alphabet subclassing hierarchy.  It does not
check things like gap characters or stop symbols.
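
As a rough illustration of how a caller might use this check, the sketch below guards a plain string concatenation. The classes are simplified stand-ins, not the Bio.Alphabet classes; encoder wrappers such as Gapped are left out; and the names check_type_compatible and concat are hypothetical.

```python
# Simplified stand-ins for illustration; NOT the real Bio.Alphabet classes.
class Alphabet(object):
    pass

class SingleLetterAlphabet(Alphabet):
    pass

class ProteinAlphabet(SingleLetterAlphabet):
    pass

class NucleotideAlphabet(SingleLetterAlphabet):
    pass

class DNAAlphabet(NucleotideAlphabet):
    pass

class RNAAlphabet(NucleotideAlphabet):
    pass


def check_type_compatible(alphabets):
    """Return False for DNA+RNA or nucleotide+protein mixes, else True."""
    dna = rna = nucl = protein = False
    for a in alphabets:
        if isinstance(a, DNAAlphabet):
            dna = nucl = True
            if rna or protein:
                return False
        elif isinstance(a, RNAAlphabet):
            rna = nucl = True
            if dna or protein:
                return False
        elif isinstance(a, NucleotideAlphabet):
            nucl = True
            if protein:
                return False
        elif isinstance(a, ProteinAlphabet):
            protein = True
            if nucl:
                return False
    return True


def concat(text_a, alpha_a, text_b, alpha_b):
    """Hypothetical caller: join two strings only if their alphabets can mix."""
    if not check_type_compatible([alpha_a, alpha_b]):
        raise TypeError("Incompatible alphabets")
    return text_a + text_b


# DNA plus generic nucleotide is accepted; DNA plus RNA is rejected.
joined = concat("ACGT", DNAAlphabet(), "NNNN", NucleotideAlphabet())
try:
    concat("ACGT", DNAAlphabet(), "ACGU", RNAAlphabet())
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(joined, mixed_ok)
```

Because only the class hierarchy is inspected, two alphabets with different gap characters would still pass this check.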

Definition at line 340 of file __init__.py.

def _check_type_compatible(alphabets):
    """Returns True except for DNA+RNA or Nucleotide+Protein (PRIVATE).

    >>> _check_type_compatible([generic_dna, generic_nucleotide])
    True
    >>> _check_type_compatible([generic_dna, generic_rna])
    False
    >>> _check_type_compatible([generic_dna, generic_protein])
    False
    >>> _check_type_compatible([single_letter_alphabet, generic_protein])
    True

    This relies on the Alphabet subclassing hierarchy.  It does not
    check things like gap characters or stop symbols."""
    dna, rna, nucl, protein = False, False, False, False
    for alpha in alphabets:
        a = _get_base_alphabet(alpha)
        if isinstance(a, DNAAlphabet):
            dna = True
            nucl = True
            if rna or protein : return False
        elif isinstance(a, RNAAlphabet):
            rna = True
            nucl = True
            if dna or protein : return False
        elif isinstance(a, NucleotideAlphabet):
            nucl = True
            if protein : return False
        elif isinstance(a, ProteinAlphabet):
            protein = True
            if nucl : return False
    return True

def Bio.Alphabet._consensus_alphabet (   alphabets) [private]
Returns a common but often generic alphabet object (PRIVATE).

>>> from Bio.Alphabet import IUPAC
>>> _consensus_alphabet([IUPAC.extended_protein, IUPAC.protein])
ExtendedIUPACProtein()
>>> _consensus_alphabet([generic_protein, IUPAC.protein])
ProteinAlphabet()

Note that DNA+RNA -> Nucleotide, and Nucleotide+Protein -> generic single
letter.  These DO NOT raise an exception!

>>> _consensus_alphabet([generic_dna, generic_nucleotide])
NucleotideAlphabet()
>>> _consensus_alphabet([generic_dna, generic_rna])
NucleotideAlphabet()
>>> _consensus_alphabet([generic_dna, generic_protein])
SingleLetterAlphabet()
>>> _consensus_alphabet([single_letter_alphabet, generic_protein])
SingleLetterAlphabet()

This is aware of Gapped and HasStopCodon and new letters added by
other AlphabetEncoders.  This WILL raise an exception if more than
one gap character or stop symbol is present.

>>> from Bio.Alphabet import IUPAC
>>> _consensus_alphabet([Gapped(IUPAC.extended_protein), HasStopCodon(IUPAC.protein)])
HasStopCodon(Gapped(ExtendedIUPACProtein(), '-'), '*')
>>> _consensus_alphabet([Gapped(IUPAC.protein, "-"), Gapped(IUPAC.protein, "=")])
Traceback (most recent call last):
    ...
ValueError: More than one gap character present
>>> _consensus_alphabet([HasStopCodon(IUPAC.protein, "*"), HasStopCodon(IUPAC.protein, "+")])
Traceback (most recent call last):
    ...
ValueError: More than one stop symbol present

Definition at line 264 of file __init__.py.

def _consensus_alphabet(alphabets):
    """Returns a common but often generic alphabet object (PRIVATE).

    >>> from Bio.Alphabet import IUPAC
    >>> _consensus_alphabet([IUPAC.extended_protein, IUPAC.protein])
    ExtendedIUPACProtein()
    >>> _consensus_alphabet([generic_protein, IUPAC.protein])
    ProteinAlphabet()

    Note that DNA+RNA -> Nucleotide, and Nucleotide+Protein-> generic single
    letter.  These DO NOT raise an exception!

    >>> _consensus_alphabet([generic_dna, generic_nucleotide])
    NucleotideAlphabet()
    >>> _consensus_alphabet([generic_dna, generic_rna])
    NucleotideAlphabet()
    >>> _consensus_alphabet([generic_dna, generic_protein])
    SingleLetterAlphabet()
    >>> _consensus_alphabet([single_letter_alphabet, generic_protein])
    SingleLetterAlphabet()

    This is aware of Gapped and HasStopCodon and new letters added by
    other AlphabetEncoders.  This WILL raise an exception if more than
    one gap character or stop symbol is present.

    >>> from Bio.Alphabet import IUPAC
    >>> _consensus_alphabet([Gapped(IUPAC.extended_protein), HasStopCodon(IUPAC.protein)])
    HasStopCodon(Gapped(ExtendedIUPACProtein(), '-'), '*')
    >>> _consensus_alphabet([Gapped(IUPAC.protein, "-"), Gapped(IUPAC.protein, "=")])
    Traceback (most recent call last):
        ...
    ValueError: More than one gap character present
    >>> _consensus_alphabet([HasStopCodon(IUPAC.protein, "*"), HasStopCodon(IUPAC.protein, "+")])
    Traceback (most recent call last):
        ...
    ValueError: More than one stop symbol present
    """
    base = _consensus_base_alphabet(alphabets)
    gap = None
    stop = None
    new_letters = ""
    for alpha in alphabets:
        #Gaps...
        if not hasattr(alpha, "gap_char"):
            pass
        elif gap is None:
            gap = alpha.gap_char
        elif gap == alpha.gap_char:
            pass
        else:
            raise ValueError("More than one gap character present")
        #Stops...
        if not hasattr(alpha, "stop_symbol"):
            pass
        elif stop is None:
            stop = alpha.stop_symbol
        elif stop == alpha.stop_symbol:
            pass
        else:
            raise ValueError("More than one stop symbol present")
        #New letters...
        if hasattr(alpha, "new_letters"):
            for letter in alpha.new_letters:
                if letter not in new_letters \
                and letter != gap and letter != stop:
                    new_letters += letter

    alpha = base
    if new_letters:
        alpha = AlphabetEncoder(alpha, new_letters)
    if gap:
        alpha = Gapped(alpha, gap_char=gap)
    if stop:
        alpha = HasStopCodon(alpha, stop_symbol=stop)
    return alpha

def Bio.Alphabet._consensus_base_alphabet (   alphabets) [private]
Returns a common but often generic base alphabet object (PRIVATE).

This throws away any AlphabetEncoder information, e.g. Gapped alphabets.

Note that DNA+RNA -> Nucleotide, and Nucleotide+Protein -> generic single
letter.  These DO NOT raise an exception!
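
The widening rules described above can be sketched with a small stand-in hierarchy. This is only an approximation under stated assumptions: the classes and module-level instances below are simplified stand-ins for the Bio.Alphabet ones, the encoder-stripping step is omitted, and consensus_base is a hypothetical name.

```python
# Simplified stand-ins for illustration; NOT the real Bio.Alphabet classes.
class Alphabet(object):
    pass

class SingleLetterAlphabet(Alphabet):
    pass

class ProteinAlphabet(SingleLetterAlphabet):
    pass

class NucleotideAlphabet(SingleLetterAlphabet):
    pass

class DNAAlphabet(NucleotideAlphabet):
    pass

class RNAAlphabet(NucleotideAlphabet):
    pass

generic_alphabet = Alphabet()
single_letter_alphabet = SingleLetterAlphabet()
generic_protein = ProteinAlphabet()
generic_nucleotide = NucleotideAlphabet()
generic_dna = DNAAlphabet()
generic_rna = RNAAlphabet()


def consensus_base(alphabets):
    """Pick the most specific alphabet that still covers every input."""
    common = None
    for a in alphabets:
        if common is None or common is a:
            common = a
        elif isinstance(a, type(common)):
            pass  # a is a subtype of common, so common already covers it
        elif isinstance(common, type(a)):
            common = a  # a is the more general of the two, so widen to it
        elif isinstance(a, NucleotideAlphabet) and isinstance(common, NucleotideAlphabet):
            common = generic_nucleotide  # e.g. a DNA and RNA mix
        elif isinstance(a, SingleLetterAlphabet) and isinstance(common, SingleLetterAlphabet):
            common = single_letter_alphabet  # e.g. nucleotide and protein
        else:
            return generic_alphabet
    return generic_alphabet if common is None else common


for pair in ([generic_dna, generic_rna], [generic_dna, generic_protein], []):
    print(type(consensus_base(pair)).__name__)
```

The order of the inputs does not change the result in these cases, because each step only ever moves toward a more general class.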

Definition at line 230 of file __init__.py.

def _consensus_base_alphabet(alphabets):
    """Returns a common but often generic base alphabet object (PRIVATE).

    This throws away any AlphabetEncoder information, e.g. Gapped alphabets.

    Note that DNA+RNA -> Nucleotide, and Nucleotide+Protein-> generic single
    letter.  These DO NOT raise an exception!"""
    common = None
    for alpha in alphabets:
        a = _get_base_alphabet(alpha)
        if common is None:
            common = a
        elif common == a:
            pass
        elif isinstance(a, common.__class__):
            pass
        elif isinstance(common, a.__class__):
            common = a
        elif isinstance(a, NucleotideAlphabet) \
        and isinstance(common, NucleotideAlphabet):
            #e.g. Give a mix of RNA and DNA alphabets
            common = generic_nucleotide
        elif isinstance(a, SingleLetterAlphabet) \
        and isinstance(common, SingleLetterAlphabet):
            #This is a pretty big mis-match!
            common = single_letter_alphabet
        else:
            #We have a major mis-match... take the easy way out!
            return generic_alphabet
    if common is None:
        #Given NO alphabets!
        return generic_alphabet
    return common

def Bio.Alphabet._get_base_alphabet (   alphabet) [private]
Returns the non-gapped non-stop-codon Alphabet object (PRIVATE).
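
To make the unwrapping concrete, here is a minimal sketch using stand-in classes (they mimic the wrapper pattern but are not the Bio.Alphabet classes, and get_base_alphabet is a hypothetical name). It raises TypeError instead of using an assert, which is a deliberate change for the example.

```python
# Simplified stand-ins for illustration; NOT the real Bio.Alphabet classes.
class Alphabet(object):
    pass

class ProteinAlphabet(Alphabet):
    pass

class AlphabetEncoder(object):
    """Wrapper that holds another alphabet plus some extra letters."""
    def __init__(self, alphabet, new_letters):
        self.alphabet = alphabet
        self.new_letters = new_letters

class Gapped(AlphabetEncoder):
    def __init__(self, alphabet, gap_char="-"):
        AlphabetEncoder.__init__(self, alphabet, gap_char)
        self.gap_char = gap_char

class HasStopCodon(AlphabetEncoder):
    def __init__(self, alphabet, stop_symbol="*"):
        AlphabetEncoder.__init__(self, alphabet, stop_symbol)
        self.stop_symbol = stop_symbol


def get_base_alphabet(alphabet):
    """Peel off encoder layers until a plain Alphabet remains."""
    a = alphabet
    while isinstance(a, AlphabetEncoder):
        a = a.alphabet
    if not isinstance(a, Alphabet):
        raise TypeError("Invalid alphabet found, %r" % (a,))
    return a


protein = ProteinAlphabet()
wrapped = HasStopCodon(Gapped(protein, "-"), "*")
print(get_base_alphabet(wrapped) is protein)
```

The loop works for any nesting depth and any wrapper order, since each encoder only needs to expose the alphabet it wraps.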

Definition at line 207 of file __init__.py.

def _get_base_alphabet(alphabet):
    """Returns the non-gapped non-stop-codon Alphabet object (PRIVATE)."""
    a = alphabet
    while isinstance(a, AlphabetEncoder):
        a = a.alphabet
    assert isinstance(a, Alphabet), \
           "Invalid alphabet found, %s" % repr(a)
    return a

def Bio.Alphabet._ungap (   alphabet) [private]
Returns the alphabet without any gap encoder (PRIVATE).
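
The sketch below illustrates the idea of removing only the gap layer while keeping an outer stop-codon layer. It uses stand-in classes, not the Bio.Alphabet ones, and assumes that wrappers forward unknown attribute lookups to the alphabet they wrap. The name ungap is hypothetical, and the branch for other encoder types is omitted.

```python
# Simplified stand-ins for illustration; NOT the real Bio.Alphabet classes.
class Alphabet(object):
    pass

class AlphabetEncoder(object):
    def __init__(self, alphabet, new_letters):
        self.alphabet = alphabet
        self.new_letters = new_letters

    def __getattr__(self, name):
        # Called only when normal lookup fails: forward to the wrapped alphabet.
        if name == "alphabet":
            raise AttributeError(name)
        return getattr(self.alphabet, name)

class Gapped(AlphabetEncoder):
    def __init__(self, alphabet, gap_char="-"):
        AlphabetEncoder.__init__(self, alphabet, gap_char)
        self.gap_char = gap_char

class HasStopCodon(AlphabetEncoder):
    def __init__(self, alphabet, stop_symbol="*"):
        AlphabetEncoder.__init__(self, alphabet, stop_symbol)
        self.stop_symbol = stop_symbol


def ungap(alphabet):
    """Return an equivalent alphabet with the Gapped layer removed."""
    if not hasattr(alphabet, "gap_char"):
        return alphabet  # nothing to strip
    if isinstance(alphabet, Gapped):
        return alphabet.alphabet
    if isinstance(alphabet, HasStopCodon):
        # Rebuild the stop-codon layer around the ungapped inner alphabet.
        return HasStopCodon(ungap(alphabet.alphabet), alphabet.stop_symbol)
    raise NotImplementedError


dna = Alphabet()
layered = HasStopCodon(Gapped(dna, "-"), "*")
result = ungap(layered)
print(type(result).__name__, hasattr(result, "gap_char"))
```

Rebuilding the outer wrapper, rather than mutating it, leaves the original layered alphabet untouched.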

Definition at line 216 of file __init__.py.

def _ungap(alphabet):
    """Returns the alphabet without any gap encoder (PRIVATE)."""
    #TODO - Handle via method of the objects?
    if not hasattr(alphabet, "gap_char"):
        return alphabet
    elif isinstance(alphabet, Gapped):
        return alphabet.alphabet
    elif isinstance(alphabet, HasStopCodon):
        return HasStopCodon(_ungap(alphabet.alphabet), stop_symbol=alphabet.stop_symbol)
    elif isinstance(alphabet, AlphabetEncoder):
        return AlphabetEncoder(_ungap(alphabet.alphabet), letters=alphabet.letters)
    else:
        raise NotImplementedError
    
def Bio.Alphabet._verify_alphabet (   sequence) [private]
Check all letters in sequence are in the alphabet (PRIVATE).

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF",
...              IUPAC.protein)
>>> _verify_alphabet(my_seq)
True

This example has an X, which is not in the IUPAC protein alphabet
(you should be using the IUPAC extended protein alphabet):

>>> bad_seq = Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVFX",
...                IUPAC.protein)
>>> _verify_alphabet(bad_seq)
False

This replaces Bio.utils.verify_alphabet() since we are deprecating
that. Potentially this could be added to the Alphabet object, and
I would like it to be an option when creating a Seq object... but
that might slow things down.
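
For readers without the Seq and IUPAC objects to hand, the same membership test can be sketched on plain strings. The letter string below is the 20 standard amino acids, used here as an assumed stand-in for the IUPAC protein alphabet; verify_letters and first_invalid are hypothetical helper names, not part of the module.

```python
# Assumed stand-in for the protein alphabet: the 20 standard amino acids.
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"


def verify_letters(sequence, letters):
    """Return True if every character of sequence occurs in letters."""
    if not letters:
        raise ValueError("Alphabet does not define letters.")
    allowed = set(letters)  # set lookup avoids rescanning the string
    return all(ch in allowed for ch in sequence)


def first_invalid(sequence, letters):
    """Return (index, letter) for the first offending letter, or None."""
    allowed = set(letters)
    for index, ch in enumerate(sequence):
        if ch not in allowed:
            return index, ch
    return None


good = "MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF"
bad = good + "X"
print(verify_letters(good, PROTEIN_LETTERS), verify_letters(bad, PROTEIN_LETTERS))
print(first_invalid(bad, PROTEIN_LETTERS))
```

Reporting the position of the first bad letter is an addition for the example; the function documented here only returns True or False.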

Definition at line 373 of file __init__.py.

def _verify_alphabet(sequence):
    """Check all letters in sequence are in the alphabet (PRIVATE).

    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> my_seq = Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF",
    ...              IUPAC.protein)
    >>> _verify_alphabet(my_seq)
    True

    This example has an X, which is not in the IUPAC protein alphabet
    (you should be using the IUPAC extended protein alphabet):

    >>> bad_seq = Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVFX",
    ...                IUPAC.protein)
    >>> _verify_alphabet(bad_seq)
    False

    This replaces Bio.utils.verify_alphabet() since we are deprecating
    that. Potentially this could be added to the Alphabet object, and
    I would like it to be an option when creating a Seq object... but
    that might slow things down.
    """
    letters = sequence.alphabet.letters
    if not letters:
        raise ValueError("Alphabet does not define letters.")
    for letter in sequence:
        if letter not in letters:
            return False
    return True

Variable Documentation

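Bio.Alphabet.generic_alphabet = Alphabet()
Definition at line 67 of file __init__.py.

Bio.Alphabet.generic_dna = DNAAlphabet()
Definition at line 91 of file __init__.py.

Bio.Alphabet.generic_nucleotide = NucleotideAlphabet()
Definition at line 86 of file __init__.py.

Bio.Alphabet.generic_protein = ProteinAlphabet()
Definition at line 80 of file __init__.py.

Bio.Alphabet.generic_rna = RNAAlphabet()
Definition at line 99 of file __init__.py.

Bio.Alphabet.single_letter_alphabet = SingleLetterAlphabet()
Definition at line 73 of file __init__.py.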