
python-biopython  1.60
Bio.NeuralNetwork.Gene.Pattern.PatternRepository Class Reference


Public Member Functions

def __init__
def get_all
def get_random
def get_top_percentage
def get_top
def get_differing
def remove_polyA
def count

Private Attributes

 _pattern_dict
 _pattern_list

Detailed Description

This holds a list of specific patterns found in sequences.

This is designed to be a general holder for a set of patterns and
should be subclassed for specific implementations (i.e. holding Motifs
or Signatures).

Definition at line 107 of file Pattern.py.


Constructor & Destructor Documentation

def Bio.NeuralNetwork.Gene.Pattern.PatternRepository.__init__(self, pattern_info)
Initialize a repository with patterns.

Arguments:

o pattern_info - A representation of all of the patterns found in
a *Finder search. This should be a dictionary, where the keys
are patterns, and the values are the number of times a pattern is
found. 

The patterns are represented internally as a list of 2-tuples,
where the first element is the number of times a pattern
occurs, and the second is the pattern itself. This makes it easy
to sort the list and return the top N patterns.

Definition at line 114 of file Pattern.py.

    def __init__(self, pattern_info):
        """Initialize a repository with patterns.

        Arguments:

        o pattern_info - A representation of all of the patterns found in
        a *Finder search. This should be a dictionary, where the keys
        are patterns, and the values are the number of times a pattern is
        found.

        The patterns are represented internally as a list of 2-tuples,
        where the first element is the number of times a pattern
        occurs, and the second is the pattern itself. This makes it easy
        to sort the list and return the top N patterns.
        """
        self._pattern_dict = pattern_info

        # create the list representation
        self._pattern_list = []
        for pattern_name in self._pattern_dict:
            self._pattern_list.append((self._pattern_dict[pattern_name],
                                       pattern_name))

        # sort by count, then put the most frequent patterns first
        self._pattern_list.sort()
        self._pattern_list.reverse()
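
The internal representation described above can be illustrated with a small standalone sketch. It mirrors the constructor's logic without importing Biopython; the pattern strings and counts are made-up illustration data, and the variable names are not part of the library.

```python
# Standalone sketch of the constructor's logic (not the Biopython class itself).
# The patterns and counts below are invented for illustration.
pattern_info = {"GATC": 7, "TTAA": 3, "CCGG": 12}

# Build (count, pattern) tuples so that sorting orders by count first.
pattern_list = [(count, pattern) for pattern, count in pattern_info.items()]
pattern_list.sort()
pattern_list.reverse()  # most frequent pattern first

# Taking the "top N" is then a plain slice of the sorted list.
top_two = [pattern for _count, pattern in pattern_list[:2]]

print(pattern_list)  # [(12, 'CCGG'), (7, 'GATC'), (3, 'TTAA')]
print(top_two)       # ['CCGG', 'GATC']
```

Because the count is the first tuple element, patterns with equal counts fall back to comparing the pattern strings themselves.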



Member Function Documentation

def Bio.NeuralNetwork.Gene.Pattern.PatternRepository.count(self, pattern)
Return the number of times the specified pattern is found.

Definition at line 247 of file Pattern.py.

    def count(self, pattern):
        """Return the number of times the specified pattern is found.
        """
        try:
            return self._pattern_dict[pattern]
        except KeyError:
            return 0


def Bio.NeuralNetwork.Gene.Pattern.PatternRepository.get_all(self)
Retrieve all of the patterns in the repository.

Definition at line 140 of file Pattern.py.

    def get_all(self):
        """Retrieve all of the patterns in the repository.
        """
        patterns = []
        for pattern_info in self._pattern_list:
            patterns.append(pattern_info[1])

        return patterns


def Bio.NeuralNetwork.Gene.Pattern.PatternRepository.get_differing(self, top_num, bottom_num)
Retrieve patterns that are at the extreme ranges.

This returns both patterns at the top of the list (ie. the same as
returned by get_top) and at the bottom of the list. This
is especially useful for patterns that are the differences between
two sets of patterns.

Arguments:

o top_num - The number of patterns to take from the top of the list.

o bottom_num - The number of patterns to take from the bottom of
the list.

Definition at line 194 of file Pattern.py.

    def get_differing(self, top_num, bottom_num):
        """Retrieve patterns that are at the extreme ranges.

        This returns both patterns at the top of the list (ie. the same as
        returned by get_top) and at the bottom of the list. This
        is especially useful for patterns that are the differences between
        two sets of patterns.

        Arguments:

        o top_num - The number of patterns to take from the top of the list.

        o bottom_num - The number of patterns to take from the bottom of
        the list.
        """
        all_patterns = []
        # first get from the top of the list
        for pattern_info in self._pattern_list[:top_num]:
            all_patterns.append(pattern_info[1])

        # then from the bottom
        for pattern_info in self._pattern_list[-bottom_num:]:
            all_patterns.append(pattern_info[1])

        return all_patterns
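
The two slices used by get_differing can be tried out in isolation. The sketch below is a standalone copy of the slicing logic, not the Biopython method itself; the function takes the sorted list as an explicit argument, and the data (including the negative difference scores) is invented for illustration.

```python
def get_differing(pattern_list, top_num, bottom_num):
    """Standalone copy of the slicing logic from the listing above."""
    top = [pattern for _count, pattern in pattern_list[:top_num]]
    bottom = [pattern for _count, pattern in pattern_list[-bottom_num:]]
    return top + bottom


# Made-up (score, pattern) pairs, already sorted from highest to lowest.
ranked = [(9, "AAAC"), (5, "GGTC"), (2, "CTTG"), (-4, "TGCA"), (-8, "CCAT")]

extremes = get_differing(ranked, 2, 1)
print(extremes)  # ['AAAC', 'GGTC', 'CCAT']

# Python treats the slice [-0:] as [0:], so bottom_num == 0 selects the
# whole list rather than nothing.
zero_bottom = get_differing(ranked, 1, 0)
print(len(zero_bottom))  # 6
```

Two edge cases follow from plain Python slicing: a bottom_num of 0 returns every pattern in the bottom portion, and if top_num + bottom_num exceeds the list length the two slices overlap, so some patterns appear twice.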

def Bio.NeuralNetwork.Gene.Pattern.PatternRepository.get_random(self, num_patterns)
Retrieve the specified number of patterns randomly.

Randomly selects patterns from the list and returns them.

Arguments:

o num_patterns - The total number of patterns to return.

Definition at line 149 of file Pattern.py.

    def get_random(self, num_patterns):
        """Retrieve the specified number of patterns randomly.

        Randomly selects patterns from the list and returns them.

        Arguments:

        o num_patterns - The total number of patterns to return.
        """
        all_patterns = []

        while len(all_patterns) < num_patterns:
            # pick a pattern, and only add it if it is not already present
            new_pattern_info = random.choice(self._pattern_list)

            if new_pattern_info[1] not in all_patterns:
                all_patterns.append(new_pattern_info[1])

        return all_patterns
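
Because the loop above only stops once num_patterns distinct patterns have been collected, it cannot finish when num_patterns is larger than the number of patterns in the repository. The following is a hypothetical variant, not part of Biopython, that selects distinct patterns with the standard library's random.sample and caps the request at the number available; the function name and data are assumptions made for illustration.

```python
import random


def pick_random_patterns(pattern_list, num_patterns):
    """Hypothetical variant: pick distinct patterns without a retry loop."""
    patterns = [pattern for _count, pattern in pattern_list]
    # random.sample raises ValueError when asked for more items than exist,
    # so cap the request at the number of available patterns.
    return random.sample(patterns, min(num_patterns, len(patterns)))


# Made-up (count, pattern) pairs for illustration.
ranked = [(12, "CCGG"), (7, "GATC"), (3, "TTAA")]

picked = pick_random_patterns(ranked, 2)
everything = pick_random_patterns(ranked, 10)  # capped at 3 patterns
print(picked)
print(everything)
```

Capping is a design choice of this sketch; raising an error for an oversized request would be an equally reasonable alternative.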

def Bio.NeuralNetwork.Gene.Pattern.PatternRepository.get_top(self, num_patterns)
Return the specified number of most frequently occurring patterns.

Arguments:

o num_patterns - The number of patterns to return.

Definition at line 181 of file Pattern.py.

    def get_top(self, num_patterns):
        """Return the specified number of most frequently occurring patterns

        Arguments:

        o num_patterns - The number of patterns to return.
        """
        all_patterns = []
        for pattern_info in self._pattern_list[:num_patterns]:
            all_patterns.append(pattern_info[1])

        return all_patterns
    
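def Bio.NeuralNetwork.Gene.Pattern.PatternRepository.get_top_percentage(self, percent)
Return a percentage of the patterns.

This returns the top 'percent' percentage of the patterns in the
repository. Despite its name, percent is used as a multiplier on the
number of patterns, so it should be given as a fraction between 0 and 1
(for example, 0.25 for the top quarter). The result is truncated to a
whole number of patterns.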

Definition at line 169 of file Pattern.py.

    def get_top_percentage(self, percent):
        """Return a percentage of the patterns.

        This returns the top 'percent' percentage of the patterns in the
        repository.
        """
        all_patterns = self.get_all()

        num_to_return = int(len(all_patterns) * percent)

        return all_patterns[:num_to_return]
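
The arithmetic in the listing above treats percent as a multiplier on the list length. A minimal standalone sketch of that calculation, with a made-up pattern list and a hypothetical helper name, shows the effect of passing a fraction versus a whole-number percentage:

```python
def top_fraction(all_patterns, percent):
    """Standalone copy of the calculation used by get_top_percentage."""
    num_to_return = int(len(all_patterns) * percent)
    return all_patterns[:num_to_return]


# Made-up patterns, assumed to be sorted from most to least frequent.
all_patterns = ["CCGG", "GATC", "TTAA", "ACGT", "TATA"]

half = top_fraction(all_patterns, 0.5)   # int(2.5) truncates to 2
whole = top_fraction(all_patterns, 50)   # int(250) exceeds the length

print(half)   # ['CCGG', 'GATC']
print(whole)  # ['CCGG', 'GATC', 'TTAA', 'ACGT', 'TATA']
```

Passing 50 rather than 0.5 therefore returns every pattern, because the slice simply stops at the end of the list.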
        


def Bio.NeuralNetwork.Gene.Pattern.PatternRepository.remove_polyA(self, at_percentage = .9)
Remove patterns which are likely due to polyA tails from the lists.

This is just a helper function to remove patterns which are likely
just due to polyA tails, and thus are not really great motifs.
This will also get rid of stuff like ATATAT, which might be a
useful motif, so use at your own discretion.

XXX Could we write a more general function, based on info content
or something like that?

Arguments:

o at_percentage - The fraction (between 0 and 1) of A and T residues
in a pattern above which the pattern is removed.

Definition at line 220 of file Pattern.py.

    def remove_polyA(self, at_percentage = .9):
        """Remove patterns which are likely due to polyA tails from the lists.

        This is just a helper function to remove patterns which are likely
        just due to polyA tails, and thus are not really great motifs.
        This will also get rid of stuff like ATATAT, which might be a
        useful motif, so use at your own discretion.

        XXX Could we write a more general function, based on info content
        or something like that?

        Arguments:

        o at_percentage - The percentage of A and T residues in a pattern
        that qualifies it for being removed.
        """
        remove_list = []
        # find all of the really AT rich patterns
        for pattern_info in self._pattern_list:
            pattern_at = float(pattern_info[1].count('A') + pattern_info[1].count('T')) / len(pattern_info[1])
            if pattern_at > at_percentage:
                remove_list.append(pattern_info)

        # now remove them from the master list
        for to_remove in remove_list:
            self._pattern_list.remove(to_remove)
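
The AT-content test can be reproduced outside the class to see which patterns would be dropped. This is a standalone sketch of the same comparison, not the Biopython method; the helper name and the example patterns are assumptions made for illustration.

```python
def at_fraction(pattern):
    """Fraction of residues in the pattern that are A or T."""
    return (pattern.count("A") + pattern.count("T")) / len(pattern)


# Made-up (count, pattern) pairs, sorted from most to least frequent.
pattern_list = [
    (12, "AAAAAA"),      # fraction 1.0, removed
    (9, "ATATAT"),       # fraction 1.0, removed even though it may be a motif
    (7, "GATTACA"),      # 5 of 7 residues are A or T, kept
    (3, "AAAAAAAAAG"),   # exactly 0.9, kept because the test is strictly greater
    (2, "CCGG"),         # fraction 0.0, kept
]

at_percentage = 0.9
kept = [info for info in pattern_list if at_fraction(info[1]) <= at_percentage]
print(kept)  # [(7, 'GATTACA'), (3, 'AAAAAAAAAG'), (2, 'CCGG')]
```

Note that the listing above only removes entries from _pattern_list; _pattern_dict is left unchanged, so count() would still report the original count for a removed pattern.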



Member Data Documentation

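Bio.NeuralNetwork.Gene.Pattern.PatternRepository._pattern_dict
Definition at line 129 of file Pattern.py.

Bio.NeuralNetwork.Gene.Pattern.PatternRepository._pattern_list
Definition at line 132 of file Pattern.py.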


The documentation for this class was generated from the following file: