python-biopython  1.60
Bio.SeqUtils.ProtParam.ProteinAnalysis Class Reference

Public Member Functions

def __init__
def count_amino_acids
def get_amino_acids_percent
def molecular_weight
def aromaticity
def instability_index
def flexibility
def gravy
def protein_scale
def isoelectric_point
def secondary_structure_fraction

Public Attributes

 sequence
 amino_acids_content
 amino_acids_percent
 length

Private Member Functions

def _weight_list

Detailed Description

Class containing methods for protein analysis.

The constructor takes one argument: the protein sequence as a string. It
builds a sequence object from it using the Bio.Seq module and the IUPAC
protein alphabet; an all-lowercase string is converted to uppercase first.
This is done just to make sure the sequence is a protein sequence and not
anything else.

Definition at line 27 of file ProtParam.py.


Constructor & Destructor Documentation

def Bio.SeqUtils.ProtParam.ProteinAnalysis.__init__ (   self,
  prot_sequence 
)

Definition at line 35 of file ProtParam.py.

00035 
00036     def __init__(self, prot_sequence):
00037         if prot_sequence.islower():
00038             self.sequence = Seq(prot_sequence.upper(), IUPAC.protein)
00039         else:
00040             self.sequence = Seq(prot_sequence, IUPAC.protein)
00041         self.amino_acids_content = None
00042         self.amino_acids_percent = None
00043         self.length = len(self.sequence)
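
The case handling in the constructor can be sketched without Biopython. This
is a minimal illustration, assuming plain Python strings; normalize_case is a
hypothetical helper name, not part of the class. It shows that only an
all-lowercase input is converted, while a mixed-case input is stored as given.

```python
def normalize_case(prot_sequence):
    """Mimic the constructor's test: upper-case only all-lowercase input."""
    # str.islower() is True only when every cased character is lowercase.
    if prot_sequence.islower():
        return prot_sequence.upper()
    return prot_sequence


lower_result = normalize_case("mkvla")   # converted to upper case
mixed_result = normalize_case("MkvLA")   # left untouched
```

Because a mixed-case sequence is not normalized, it may be safer to upper-case
the string before passing it to the constructor.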
        

Member Function Documentation

def Bio.SeqUtils.ProtParam.ProteinAnalysis._weight_list (   self,
  window,
  edge 
) [private]
Makes a list of the relative weights of the window edges compared to the
window center. The weights are linear. Only half of the list is generated:
for a window of size 9 and an edge of 0.4 you get [0.4, 0.55, 0.7, 0.85].

Definition at line 165 of file ProtParam.py.

00165 
00166     def _weight_list(self, window, edge):
00167         """Makes a list of relative weight of the
00168         window edges compared to the window center. The weights are linear.
00169         it actually generates half a list. For a window of size 9 and edge 0.4
00170         you get a list of [0.4, 0.55, 0.7, 0.85]. 
00171         """
00172         unit = 2 * (1.0 - edge) / (window - 1)
00173         weights = [0.0] * (window // 2)
00174         
00175         for i in range(window // 2):
00176             weights[i] = edge + unit * i
00177 
00178         return weights
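
The weight computation can be reproduced as a standalone function. This is a
sketch of the same arithmetic, not the library code itself; weight_list and
the mirrored full list are illustrative names introduced here.

```python
def weight_list(window, edge):
    """Return the one-sided list of linear weights, outermost position first."""
    # Step between neighbouring positions, chosen so the centre reaches 1.0.
    unit = 2 * (1.0 - edge) / (window - 1)
    return [edge + unit * i for i in range(window // 2)]


half = weight_list(9, 0.4)
# Mirror the half list around the centre weight of 1.0 to see the full window.
full = half + [1.0] + half[::-1]
```

For window=9 and edge=0.4 the half list holds four values and the mirrored
list holds nine, with the centre position weighted 1.0.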
    

def Bio.SeqUtils.ProtParam.ProteinAnalysis.aromaticity (   self )
Calculate the aromaticity according to Lobry, 1994.

Calculates the aromaticity value of a protein according to Lobry, 1994.
It is simply the relative frequency of Phe+Trp+Tyr.

Definition at line 98 of file ProtParam.py.

00098 
00099     def aromaticity(self):
00100         """Calculate the aromaticity according to Lobry, 1994.
00101 
00102         Calculates the aromaticity value of a protein according to Lobry, 1994.
00103         It is simply the relative frequency of Phe+Trp+Tyr.
00104         """
00105         aromatic_aas = 'YWF'
00106         aa_percentages = self.get_amino_acids_percent()
00107         
00108         aromaticity = sum([aa_percentages[aa] for aa in aromatic_aas])
00109 
00110         return aromaticity
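
The relative frequency of Phe+Trp+Tyr can be illustrated on a plain string.
This is a minimal sketch that bypasses the class and its cached dictionaries;
the standalone aromaticity function and the example sequence are assumptions
made for illustration.

```python
def aromaticity(sequence):
    """Fraction of residues that are Tyr (Y), Trp (W) or Phe (F)."""
    aromatic_aas = "YWF"
    aromatic_count = sum(sequence.count(aa) for aa in aromatic_aas)
    return aromatic_count / float(len(sequence))


# Three aromatic residues out of eight.
example = aromaticity("FWYAAAAA")
```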

def Bio.SeqUtils.ProtParam.ProteinAnalysis.count_amino_acids (   self )
Count standard amino acids, returns a dict.

Counts the number of times each amino acid occurs in the protein
sequence. Returns a dictionary {AminoAcid:Number}.

The return value is cached in self.amino_acids_content.
It is not recalculated upon subsequent calls.

Definition at line 44 of file ProtParam.py.

00044 
00045     def count_amino_acids(self):
00046         """Count standard amino acids, returns a dict.
00047             
00048         Counts the number times each amino acid is in the protein
00049         sequence. Returns a dictionary {AminoAcid:Number}.
00050         
00051         The return value is cached in self.amino_acids_content.
00052         It is not recalculated upon subsequent calls.
00053         """
00054         if self.amino_acids_content is None:
00055             prot_dic = dict([(k, 0) for k in IUPACData.protein_letters])
00056             for aa in prot_dic:
00057                 prot_dic[aa] = self.sequence.count(aa)
00058             
00059             self.amino_acids_content = prot_dic
00060             
00061         return self.amino_acids_content
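
The caching behaviour can be sketched with a small stand-in class. This is
not the library class: CountingSketch, PROTEIN_LETTERS and the calls counter
are hypothetical names used only to make the single computation visible, and
the 20-letter string is assumed to match the standard one-letter codes.

```python
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


class CountingSketch(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self.amino_acids_content = None
        self.calls = 0  # bookkeeping for this illustration only

    def count_amino_acids(self):
        # Compute the dictionary once, then reuse the cached object.
        if self.amino_acids_content is None:
            self.calls += 1
            self.amino_acids_content = dict(
                (aa, self.sequence.count(aa)) for aa in PROTEIN_LETTERS)
        return self.amino_acids_content


sketch = CountingSketch("MKVLAAGG")
first = sketch.count_amino_acids()
second = sketch.count_amino_acids()  # served from the cache
```

Every standard letter gets a key, so absent residues are reported with a
count of 0, while letters outside the standard set are not counted at all.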
    

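def Bio.SeqUtils.ProtParam.ProteinAnalysis.flexibility (   self )
Calculate the flexibility according to Vihinen, 1994.

There is no argument to change the window size, because the parameters are
specific to a window of 9 residues. The parameters used are optimized for
determining the flexibility. Returns a list of scores, one per window.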

Definition at line 131 of file ProtParam.py.

00131 
00132     def flexibility(self):
00133         """Calculate the flexibility according to Vihinen, 1994.
00134         
00135         No argument to change window size because parameters are specific for a
00136         window=9. The parameters used are optimized for determining the flexibility.
00137         """
00138         flexibilities = ProtParamData.Flex
00139         window_size = 9
00140         weights = [0.25, 0.4375, 0.625, 0.8125, 1]
00141         scores = []
00142 
00143         for i in range(self.length - window_size):
00144             subsequence = self.sequence[i:i+window_size]
00145             score = 0.0
00146 
00147             for j in range(window_size // 2):
00148                 front = subsequence[j]
00149                 back = subsequence[window_size - j - 1]
00150                 score += (flexibilities[front] + flexibilities[back]) * weights[j]
00151 
00152             middle = subsequence[window_size // 2 + 1]
00153             score += flexibilities[middle]
00154             
00155             scores.append(score / 5.25)
00156 
00157         return scores
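
The divisor 5.25 in the listing equals the total weight of one window: the
four flank weights counted twice plus the centre weight of 1. The short check
below is only a worked piece of arithmetic; the sequence length of 20 is an
arbitrary value chosen for illustration.

```python
window_size = 9
weights = [0.25, 0.4375, 0.625, 0.8125, 1]

# The loop over j uses only the first window_size // 2 weights, each applied
# to one residue in front of the centre and one behind it.
flank = weights[:window_size // 2]
total_weight = 2 * sum(flank) + weights[-1]

# Number of scores produced by range(length - window_size) for length 20.
n_scores = len(range(20 - window_size))
```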

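def Bio.SeqUtils.ProtParam.ProteinAnalysis.get_amino_acids_percent (   self )
Calculate the amino acid content in percentages.

The same as count_amino_acids, except that each count is divided by the
length of the entire sequence. Returns a dictionary of {AminoAcid:fraction}.
Despite the method name, the values are fractions between 0 and 1, not
percentages between 0 and 100.

The return value is cached in self.amino_acids_percent.

The input is the dictionary returned by count_amino_acids
(self.amino_acids_content); the output is a dictionary with amino acids
as keys.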

Definition at line 62 of file ProtParam.py.

00062 
00063     def get_amino_acids_percent(self):
00064         """Calculate the amino acid content in percentages.
00065 
00066         The same as count_amino_acids only returns the Number in percentage of
00067         entire sequence. Returns a dictionary of {AminoAcid:percentage}.
00068         
00069         The return value is cached in self.amino_acids_percent.
00070         
00071         input is the dictionary self.amino_acids_content.
00072         output is a dictionary with amino acids as keys.
00073         """
00074         if self.amino_acids_percent is None:
00075             aa_counts = self.count_amino_acids()
00076                 
00077             percentages = {}
00078             for aa in aa_counts:
00079                 percentages[aa] = aa_counts[aa] / float(self.length)
00080                 
00081             self.amino_acids_percent = percentages
00082 
00083         return self.amino_acids_percent
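
The division by the sequence length yields values between 0 and 1. The sketch
below reproduces that on a plain string; amino_acid_fractions and
PROTEIN_LETTERS are hypothetical names, and the multiplication by 100 is an
extra step a caller would add, not something the method does.

```python
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def amino_acid_fractions(sequence):
    """Return {amino acid: count / sequence length} for the standard letters."""
    length = float(len(sequence))
    return dict((aa, sequence.count(aa) / length) for aa in PROTEIN_LETTERS)


fractions = amino_acid_fractions("AAGC")
# Convert to a true percentage on the caller's side if that is what is needed.
percent_alanine = 100 * fractions["A"]
```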

def Bio.SeqUtils.ProtParam.ProteinAnalysis.gravy (   self )
Calculate the GRAVY (grand average of hydropathy) according to Kyte and
Doolittle.

Definition at line 158 of file ProtParam.py.

00158 
00159     def gravy(self):
00160         """Calculate the gravy according to Kyte and Doolittle."""
00161         total_gravy = sum(ProtParamData.kd[aa] for aa in self.sequence)
00162             
00163         return total_gravy / self.length
00164 
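
The GRAVY value is the mean hydropathy over all residues. The sketch below
uses the Kyte and Doolittle hydropathy values typed in by hand, on the
assumption that they match the contents of ProtParamData.kd; the standalone
gravy function is an illustrative name.

```python
# Kyte-Doolittle hydropathy values (assumed to mirror ProtParamData.kd).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Sum the hydropathy of every residue and divide by the length."""
    total = sum(KD[aa] for aa in sequence)
    return total / float(len(sequence))


# (1.8 + 4.5 + 3.8 - 3.9) / 4
value = gravy("AILK")
```

A positive value points to a hydrophobic sequence on this scale, a negative
value to a hydrophilic one.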

def Bio.SeqUtils.ProtParam.ProteinAnalysis.instability_index (   self )
Calculate the instability index according to Guruprasad et al., 1990.

Implementation of the method of Guruprasad et al. (1990) to test a
protein for stability. Any value above 40 means the protein is unstable
(has a short half-life).

See: Guruprasad K., Reddy B.V.B., Pandit M.W.
Protein Engineering 4:155-161 (1990).

Definition at line 111 of file ProtParam.py.

00111 
00112     def instability_index(self):
00113         """Calculate the instability index according to Guruprasad et al 1990.
00114 
00115         Implementation of the method of Guruprasad et al. 1990 to test a
00116         protein for stability. Any value above 40 means the protein is unstable
00117         (has a short half life). 
00118         
00119         See: Guruprasad K., Reddy B.V.B., Pandit M.W.
00120         Protein Engineering 4:155-161(1990).
00121         """
00122         index = ProtParamData.DIWV
00123         score = 0.0
00124         
00125         for i in range(self.length - 1):
00126             this, next = self.sequence[i:i+2]
00127             dipeptide_value = index[this][next]
00128             score += dipeptide_value
00129 
00130         return (10.0 / self.length) * score
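
The index sums one weight per adjacent residue pair and scales the sum by
10 / length. The sketch below shows only that arithmetic: TOY_DIWV holds
made-up numbers for two residues and is NOT the published dipeptide table
stored in ProtParamData.DIWV.

```python
# Made-up dipeptide weights for illustration only (not the published values).
TOY_DIWV = {
    "A": {"A": 1.0, "G": -2.0},
    "G": {"A": 3.0, "G": 0.5},
}


def instability_index(sequence, table):
    """Sum the weight of each overlapping dipeptide, then scale by 10/length."""
    score = 0.0
    for i in range(len(sequence) - 1):
        first, second = sequence[i], sequence[i + 1]
        score += table[first][second]
    return (10.0 / len(sequence)) * score


# Dipeptides AA, AG, GA contribute 1.0 - 2.0 + 3.0 = 2.0; 10 / 4 * 2.0 = 5.0
value = instability_index("AAGA", TOY_DIWV)
```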

def Bio.SeqUtils.ProtParam.ProteinAnalysis.isoelectric_point (   self )
Calculate the isoelectric point.

Uses the module IsoelectricPoint to calculate the pI of a protein.

Definition at line 248 of file ProtParam.py.

00248 
00249     def isoelectric_point(self):
00250         """Calculate the isoelectric point.
00251         
00252         Uses the module IsoelectricPoint to calculate the pI of a protein.
00253         """
00254         aa_content = self.count_amino_acids()
00255             
00256         ie_point = IsoelectricPoint.IsoelectricPoint(self.sequence, aa_content)
00257         return ie_point.pi()
        

def Bio.SeqUtils.ProtParam.ProteinAnalysis.molecular_weight (   self )
Calculate the molecular weight (MW) from the protein sequence.

Definition at line 84 of file ProtParam.py.

00084 
00085     def molecular_weight (self):
00086         """Calculate MW from Protein sequence"""
00087         # make local dictionary for speed
00088         aa_weights = {}
00089         for i in IUPACData.protein_weights:
00090             # remove a molecule of water from the amino acid weight
00091             aa_weights[i] = IUPACData.protein_weights[i] - 18.02
00092 
00093         total_weight = 18.02 # add just one water molecule for the whole sequence
00094         for aa in self.sequence:
00095             total_weight += aa_weights[aa]
00096 
00097         return total_weight
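
The water bookkeeping can be checked by hand on a dipeptide: each residue
contributes its free amino acid weight minus one water, and a single water is
added back for the whole chain. In this sketch TOY_WEIGHTS holds approximate
weights for glycine and alanine typed in as an assumption rather than read
from IUPACData.protein_weights.

```python
WATER = 18.02
# Approximate free amino acid weights, assumed for illustration.
TOY_WEIGHTS = {"G": 75.07, "A": 89.09}


def molecular_weight(sequence, weights):
    """Add residue weights (amino acid minus water) to one water molecule."""
    total = WATER  # one water for the free N- and C-terminus
    for aa in sequence:
        total += weights[aa] - WATER
    return total


# 75.07 + 89.09 - 18.02: forming one peptide bond releases one water.
mw = molecular_weight("GA", TOY_WEIGHTS)
```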

def Bio.SeqUtils.ProtParam.ProteinAnalysis.protein_scale (   self,
  param_dict,
  window,
  edge = 1.0 
)
Compute a profile by any amino acid scale.

An amino acid scale is defined by a numerical value assigned to each type of
amino acid. The most frequently used scales are the hydrophobicity or
hydrophilicity scales and the secondary structure conformational parameter
scales, but many other scales exist, based on different chemical and
physical properties of the amino acids. Two parameters control the
computation of a scale profile: the window size and the relative weight of
the window edges.

WindowSize (the window argument): The window size is the length of the
interval used for the profile computation. For a window size n, the score
for residue i is computed from residue i and its (n-1)/2 neighboring
residues on each side. The score is the sum of the scale values of these
amino acids, optionally weighted according to their position in the window,
divided by the sum of the weights.

Edge (the edge argument): The central amino acid of the window always has a
weight of 1. By default, the amino acids at the remaining window positions
have the same weight, but you can give the residue at the center of the
window a larger weight than the others by setting the edge value for the
residues at the beginning and end of the interval to a value between 0 and
1. For instance, for Edge=0.4 and a window size of 5 the weights will be:
0.4, 0.7, 1.0, 0.7, 0.4.

The method returns a list with one value per window position (sequence
length - window + 1 values in total), which can be plotted to view the
change along a protein sequence. Many scales exist. Just add your favorites
to the ProtParamData module.

Similar to expasy's ProtScale: http://www.expasy.org/cgi-bin/protscale.pl

Definition at line 179 of file ProtParam.py.

00179 
00180     def protein_scale(self, param_dict, window, edge=1.0):
00181         """Compute a profile by any amino acid scale.
00182         
00183         An amino acid scale is defined by a numerical value assigned to each type of
00184         amino acid. The most frequently used scales are the hydrophobicity or
00185         hydrophilicity scales and the secondary structure conformational parameters
00186         scales, but many other scales exist which are based on different chemical and
00187         physical properties of the amino acids.  You can set several parameters that
00188         control the computation  of a scale profile, such as the window size and the
00189         window edge relative weight value.  
00190         
00191         WindowSize: The window size is the length
00192         of the interval to use for the profile computation. For a window size n, we
00193         use the i-(n-1)/2 neighboring residues on each side to compute
00194         the score for residue i. The score for residue i is the sum of the scaled values
00195         for these amino acids, optionally weighted according to their position in the
00196         window.  
00197         
00198         Edge: The central amino acid of the window always has a weight of 1.
00199         By default, the amino acids at the remaining window positions have the same
00200         weight, but you can make the residue at the center of the window  have a
00201         larger weight than the others by setting the edge value for the  residues at
00202         the beginning and end of the interval to a value between 0 and 1. For
00203         instance, for Edge=0.4 and a window size of 5 the weights will be: 0.4, 0.7,
00204         1.0, 0.7, 0.4.  
00205         
00206         The method returns a list of values which can be plotted to
00207         view the change along a protein sequence.  Many scales exist. Just add your
00208         favorites to the ProtParamData modules.
00209 
00210         Similar to expasy's ProtScale: http://www.expasy.org/cgi-bin/protscale.pl
00211         """
00212         # generate the weights
00213         #   _weight_list returns only one tail. If the list should be [0.4,0.7,1.0,0.7,0.4]
00214         #   what you actually get from _weights_list is [0.4,0.7]. The correct calculation is done
00215         #   in the loop.
00216         weights = self._weight_list(window, edge)
00217         scores = []
00218         
00219         # the score in each Window is divided by the sum of weights
00220         # (* 2 + 1) since the weight list is one sided:
00221         sum_of_weights = sum(weights) * 2 + 1
00222         
00223         for i in range(self.length - window + 1):
00224             subsequence = self.sequence[i:i+window]
00225             score = 0.0
00226             
00227             for j in range(window // 2):
00228                 # walk from the outside of the Window towards the middle.
00229                 # Iddo: try/except clauses added to avoid raising an exception on a non-standard amino acid
00230                 try:
00231                     front = param_dict[subsequence[j]]
00232                     back = param_dict[subsequence[window - j - 1]]
00233                     score += weights[j] * front + weights[j] * back
00234                 except KeyError:
00235                     sys.stderr.write('warning: %s or %s is not a standard amino acid.\n' %
00236                              (subsequence[j], subsequence[window - j - 1]))
00237 
00238             # Now add the middle value, which always has a weight of 1.
00239             middle = subsequence[window // 2]
00240             if middle in param_dict:
00241                 score += param_dict[middle]
00242             else:
00243                 sys.stderr.write('warning: %s  is not a standard amino acid.\n' % (middle))
00244         
00245             scores.append(score / sum_of_weights)
00246             
00247         return scores
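
The windowed, weighted average can be sketched as a standalone function. This
is an illustration under stated assumptions, not the library code:
scale_profile and TOY_SCALE are hypothetical names, the scale values are
made up, and unlike the method this sketch raises KeyError for a residue
missing from the scale instead of writing a warning.

```python
def scale_profile(sequence, scale, window, edge=1.0):
    """Weighted average of scale values over each window of the sequence."""
    # One-sided linear weights, outermost position first.
    unit = 2 * (1.0 - edge) / (window - 1)
    weights = [edge + unit * i for i in range(window // 2)]
    # Both flanks plus the centre, which always has a weight of 1.
    sum_of_weights = sum(weights) * 2 + 1

    scores = []
    for i in range(len(sequence) - window + 1):
        sub = sequence[i:i + window]
        score = scale[sub[window // 2]]
        for j in range(window // 2):
            score += weights[j] * (scale[sub[j]] + scale[sub[window - j - 1]])
        scores.append(score / sum_of_weights)
    return scores


TOY_SCALE = {"A": 1.0, "G": 0.0}  # made-up values for illustration
flat = scale_profile("GGAGG", TOY_SCALE, 3)              # equal weights
peaked = scale_profile("GGAGG", TOY_SCALE, 3, edge=0.5)  # centre emphasised
```

With equal weights every window containing the single A scores the same,
whereas edge=0.5 makes the window centred on A stand out. With the class
itself, a comparable call would pass a scale dictionary such as
ProtParamData.kd as param_dict.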

def Bio.SeqUtils.ProtParam.ProteinAnalysis.secondary_structure_fraction (   self )
Calculate fraction of helix, turn and sheet.

Returns the fraction of amino acids which tend to be in helix, turn or
sheet.

Amino acids in helix: V, I, Y, F, W, L.
Amino acids in turn: N, P, G, S.
Amino acids in sheet: E, M, A, L.

Returns a tuple of three floats (Helix, Turn, Sheet).

Definition at line 258 of file ProtParam.py.

00258 
00259     def secondary_structure_fraction (self):
00260         """Calculate fraction of helix, turn and sheet.
00261         
00262         Returns a list of the fraction of amino acids which tend
00263         to be in Helix, Turn or Sheet.
00264         
00265         Amino acids in helix: V, I, Y, F, W, L.
00266         Amino acids in Turn: N, P, G, S.
00267         Amino acids in sheet: E, M, A, L.
00268         
00269         Returns a tuple of three integers (Helix, Turn, Sheet).
00270         """
00271         aa_percentages = self.get_amino_acids_percent()
00272             
00273         helix = sum([aa_percentages[r] for r in 'VIYFWL'])
00274         turn  = sum([aa_percentages[r] for r in 'NPGS'])
00275         sheet = sum([aa_percentages[r] for r in 'EMAL'])
00276 
00277         return helix, turn, sheet
00278 
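
The three fractions can be reproduced on a plain string. This is a minimal
sketch using the residue sets from the listing; the standalone function and
the example sequence are assumptions made for illustration.

```python
def secondary_structure_fraction(sequence):
    """Return (helix, turn, sheet) fractions as floats."""
    length = float(len(sequence))

    def fraction(residues):
        return sum(sequence.count(r) for r in residues) / length

    return fraction("VIYFWL"), fraction("NPGS"), fraction("EMAL")


helix, turn, sheet = secondary_structure_fraction("VLAG")
```

Because L appears in both the helix and the sheet set, the three values are
not guaranteed to sum to 1; for the example above they sum to 1.25.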


Member Data Documentation

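Bio.SeqUtils.ProtParam.ProteinAnalysis.amino_acids_content

Definition at line 40 of file ProtParam.py.

Bio.SeqUtils.ProtParam.ProteinAnalysis.amino_acids_percent

Definition at line 41 of file ProtParam.py.

Bio.SeqUtils.ProtParam.ProteinAnalysis.length

Definition at line 42 of file ProtParam.py.

Bio.SeqUtils.ProtParam.ProteinAnalysis.sequence

Definition at line 37 of file ProtParam.py.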


The documentation for this class was generated from the following file:

ProtParam.py