python-biopython  1.60
Bio.NeuralNetwork.Gene.Schema.SchemaFactory Class Reference

Public Member Functions

def __init__
def from_motifs
def from_signatures

Private Member Functions

def _get_num_motifs
def _get_unique_schema
def _schema_from_motif

Private Attributes

 _ambiguity_symbol

Detailed Description

Generate Schema from inputs of Motifs or Signatures.

Definition at line 556 of file Schema.py.


Constructor & Destructor Documentation

def Bio.NeuralNetwork.Gene.Schema.SchemaFactory.__init__ (   self,
  ambiguity_symbol = '*' 
)
Initialize the SchemaFactory

Arguments:

o ambiguity_symbol -- The symbol to use when specifying that
a position is arbitrary.

Definition at line 559 of file Schema.py.

00559 
00560     def __init__(self, ambiguity_symbol = '*'):
00561         """Initialize the SchemaFactory
00562 
00563         Arguments:
00564 
00565         o ambiguity_symbol -- The symbol to use when specifying that
00566         a position is arbitrary.
00567         """
00568         self._ambiguity_symbol = ambiguity_symbol
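
As a rough illustration of what an "arbitrary" position means, the sketch below treats the ambiguity symbol as a wildcard when comparing two strings of equal length. The helper matches_with_wildcard is a hypothetical stand-in written for this page, not the module's own matches_schema, and the two-way wildcard rule is an assumption.

```python
def matches_with_wildcard(pattern, schema, ambiguity_symbol="*"):
    """Hypothetical stand-in: compare position by position, letting the
    ambiguity symbol on either side match any character."""
    if len(pattern) != len(schema):
        return False
    return all(
        p == s or ambiguity_symbol in (p, s)
        for p, s in zip(pattern, schema)
    )


print(matches_with_wildcard("GATC", "G*TC"))  # True
print(matches_with_wildcard("GATC", "G*TA"))  # False
# A different symbol can be used, mirroring the constructor argument.
print(matches_with_wildcard("GATC", "G.TC", ambiguity_symbol="."))  # True
```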


Member Function Documentation

def Bio.NeuralNetwork.Gene.Schema.SchemaFactory._get_num_motifs (   self,
  repository,
  motif_list 
) [private]
Return the total of the repository counts for the motifs in motif_list.

Definition at line 622 of file Schema.py.

00622 
00623     def _get_num_motifs(self, repository, motif_list):
00624         """Return the number of motif counts for the list of motifs.
00625         """
00626         motif_count = 0
00627         for motif in motif_list:
00628             motif_count += repository.count(motif)
00629 
00630         return motif_count
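
The summation above can be tried in isolation with a toy repository. ToyRepository is a hypothetical stand-in that only provides a count(motif) method. Returning 0 for unknown motifs is an assumption of this sketch, not a statement about the real repository class.

```python
class ToyRepository:
    """Hypothetical stand-in exposing only count(motif)."""

    def __init__(self, counts):
        self._counts = dict(counts)

    def count(self, motif):
        # Assumption: motifs that were never seen count as zero.
        return self._counts.get(motif, 0)


def total_motif_count(repository, motif_list):
    """Sum the repository counts of every motif in motif_list."""
    return sum(repository.count(motif) for motif in motif_list)


repo = ToyRepository({"GATC": 5, "GATA": 3, "TTTT": 1})
total = total_motif_count(repo, ["GATC", "GATA"])
print(total)  # 8
```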

def Bio.NeuralNetwork.Gene.Schema.SchemaFactory._get_unique_schema (   self,
  cur_schemas,
  motif_list,
  num_ambiguous 
) [private]
Retrieve a unique schema from a motif.

We don't want to end up with schema that match the same thing,
since this could lead to ambiguous results, and be messy. This
tries to create schema, and checks that they do not match any
currently existing schema.

Definition at line 631 of file Schema.py.

00631 
00632     def _get_unique_schema(self, cur_schemas, motif_list, num_ambiguous):
00633         """Retrieve a unique schema from a motif.
00634 
00635         We don't want to end up with schema that match the same thing,
00636         since this could lead to ambiguous results, and be messy. This
00637         tries to create schema, and checks that they do not match any
00638         currently existing schema.
00639         """
00640         # create a schema starting with a random motif
00641         # we'll keep doing this until we get a completely new schema that
00642         # doesn't match any old schema
00643         num_tries = 0
00644         
00645         while 1:
00646             # pick a motif to work from and make a schema from it
00647             cur_motif = random.choice(motif_list)
00648             
00649             num_tries += 1
00650                 
00651             new_schema, matching_motifs = \
00652                         self._schema_from_motif(cur_motif, motif_list,
00653                                                 num_ambiguous)
00654 
00655             has_match = 0
00656             for old_schema in cur_schemas:
00657                 if matches_schema(new_schema, old_schema,
00658                                   self._ambiguity_symbol):
00659                     has_match = 1
00660 
00661             # if the schema doesn't match any other schema we've got
00662             # a good one
00663             if not(has_match):
00664                 break
00665 
00666             # check for big loops in which we can't find a new schema
00667             assert num_tries < 150, \
00668                    "Could not generate schema in %s tries from %s with %s" \
00669                    % (num_tries, motif_list, cur_schemas)
00670 
00671         return new_schema, matching_motifs
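
The retry pattern in this listing can be sketched on its own: keep proposing candidates until one matches none of the existing schema, and give up after a fixed number of tries. The names first_unique, propose and matches are hypothetical. Raising RuntimeError instead of using an assert is a swapped-in choice, made because assert statements are skipped when Python runs with -O.

```python
import random


def matches(a, b, sym="*"):
    """Hypothetical wildcard comparison; not the module's matches_schema."""
    return len(a) == len(b) and all(
        x == y or sym in (x, y) for x, y in zip(a, b)
    )


def first_unique(propose, existing, sym="*", max_tries=150):
    """Call propose() until its result matches nothing in existing."""
    for _ in range(max_tries):
        candidate = propose()
        if not any(matches(candidate, old, sym) for old in existing):
            return candidate
    raise RuntimeError("no unique schema after %d tries" % max_tries)


rng = random.Random(0)
candidates = ["G*TC", "GA*C", "*TTT", "TT*T"]
# "G*TC" and "GA*C" both overlap with "GAT*", so only the last two qualify.
schema = first_unique(lambda: rng.choice(candidates), existing=["GAT*"])
print(schema)
```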

def Bio.NeuralNetwork.Gene.Schema.SchemaFactory._schema_from_motif (   self,
  motif,
  motif_list,
  num_ambiguous 
) [private]
Create a schema from a given starting motif.

Arguments:

o motif - A motif with the pattern we will start from.

o motif_list - The total motifs we have to match to.

o num_ambiguous - The number of ambiguous characters that should
be present in the schema.

Returns:

o A string representing the newly generated schema.

o A list of all of the motifs in motif_list that match the schema.

Definition at line 672 of file Schema.py.

00672 
00673     def _schema_from_motif(self, motif, motif_list, num_ambiguous):
00674         """Create a schema from a given starting motif.
00675 
00676         Arguments:
00677 
00678         o motif - A motif with the pattern we will start from.
00679 
00680         o motif_list - The total motifs we have to match to.
00681 
00682         o num_ambiguous - The number of ambiguous characters that should
00683         be present in the schema.
00684 
00685         Returns:
00686 
00687         o A string representing the newly generated schema.
00688 
00689         o A list of all of the motifs in motif_list that match the schema.
00690         """
00691         assert motif in motif_list, \
00692                "Expected starting motif present in remaining motifs."
00693 
00694         # convert random positions in the motif to ambiguous characters
00695         # convert the motif into a list of characters so we can manipulate it
00696         new_schema_list = list(motif)
00697         for add_ambiguous in range(num_ambiguous):
00698             # add an ambiguous position in a new place in the motif
00699             while 1:
00700                 ambig_pos = random.choice(range(len(new_schema_list)))
00701 
00702                 # only add a position if it isn't already ambiguous
00703                 # otherwise, we'll try again
00704                 if new_schema_list[ambig_pos] != self._ambiguity_symbol:
00705                     new_schema_list[ambig_pos] = self._ambiguity_symbol
00706                     break
00707 
00708         # convert the schema back to a string
00709         new_schema = ''.join(new_schema_list)
00710 
00711         # get the motifs that the schema matches
00712         matched_motifs = []
00713         for motif in motif_list:
00714             if matches_schema(motif, new_schema, self._ambiguity_symbol):
00715                 matched_motifs.append(motif)
00716 
00717         return new_schema, matched_motifs
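
In the listing above, the inner while loop appears unable to finish if num_ambiguous is larger than the number of positions that are not yet ambiguous. The sketch below is one possible variant, not the library's code: it chooses distinct positions with random.sample and rejects impossible requests up front. The function names, the rng parameter and the matches helper are all assumptions made for illustration.

```python
import random


def matches(a, b, sym="*"):
    """Hypothetical wildcard comparison; not the module's matches_schema."""
    return len(a) == len(b) and all(
        x == y or sym in (x, y) for x, y in zip(a, b)
    )


def schema_from_motif(motif, motif_list, num_ambiguous, sym="*", rng=random):
    # Only positions that are not already ambiguous can be converted.
    free = [i for i, ch in enumerate(motif) if ch != sym]
    if num_ambiguous > len(free):
        raise ValueError("not enough non-ambiguous positions in %r" % motif)
    chars = list(motif)
    # sample() picks distinct positions, so no retry loop is needed.
    for pos in rng.sample(free, num_ambiguous):
        chars[pos] = sym
    schema = "".join(chars)
    # Collect every motif the new schema matches.
    matched = [m for m in motif_list if matches(m, schema, sym)]
    return schema, matched


schema, matched = schema_from_motif(
    "GATC", ["GATC", "GATA", "TTTT"], 1, rng=random.Random(1)
)
print(schema, matched)
```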
            

def Bio.NeuralNetwork.Gene.Schema.SchemaFactory.from_motifs (   self,
  motif_repository,
  motif_percent,
  num_ambiguous 
)
Generate schema from a list of motifs.

Arguments:

o motif_repository - A MotifRepository class that has all of the
motifs we want to convert to Schema.

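o motif_percent - The percentage of motifs in the motif bank which
should be matched. We'll try to create schema that match this
percentage of motifs. The code compares this value against
matched_count / total_count, so it is expressed as a fraction
such as 0.5, not as a number out of 100.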

o num_ambiguous - The number of ambiguous characters to include
in each schema. The positions of these ambiguous characters will
be randomly selected.

Definition at line 569 of file Schema.py.

00569 
00570     def from_motifs(self, motif_repository, motif_percent, num_ambiguous):
00571         """Generate schema from a list of motifs.
00572 
00573         Arguments:
00574 
00575         o motif_repository - A MotifRepository class that has all of the
00576         motifs we want to convert to Schema.
00577 
00578         o motif_percent - The percentage of motifs in the motif bank which
00579         should be matches. We'll try to create schema that match this
00580         percentage of motifs.
00581 
00582         o num_ambiguous - The number of ambiguous characters to include
00583         in each schema. The positions of these ambiguous characters will
00584         be randomly selected.
00585         """
00586         # get all of the motifs we can deal with
00587         all_motifs = motif_repository.get_top_percentage(motif_percent)
00588 
00589         # start building up schemas
00590         schema_info = {}
00591         # continue until we've built schema matching the desired percentage
00592         # of motifs
00593         total_count = self._get_num_motifs(motif_repository, all_motifs)
00594         matched_count = 0
00595         assert total_count > 0, "Expected to have motifs to match"
00596         while (float(matched_count) / float(total_count)) < motif_percent:
00597             
00598             new_schema, matching_motifs = \
00599                         self._get_unique_schema(schema_info.keys(),
00600                                                 all_motifs, num_ambiguous)
00601 
00602             # get the number of counts for the new schema and clean up
00603             # the motif list
00604             schema_counts = 0
00605             for motif in matching_motifs:
00606                 # get the counts for the motif
00607                 schema_counts += motif_repository.count(motif)
00608 
00609                 # remove the motif from the motif list since it is already
00610                 # represented by this schema
00611                 all_motifs.remove(motif)
00612 
00613 
00614             # all the schema info
00615             schema_info[new_schema] = schema_counts
00616 
00617             matched_count += schema_counts
00618 
00619             # print "percentage:", float(matched_count) / float(total_count)
00620 
00621         return PatternRepository(schema_info)
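
The coverage loop above can be approximated without the library, using a plain dict of motif counts in place of the repository. This is a minimal sketch under stated assumptions: build_schemas, blur and matches are hypothetical names, the result is a plain dict rather than a PatternRepository, and the retry limit is handled with an exception.

```python
import random


def matches(a, b, sym="*"):
    """Hypothetical wildcard comparison; not the module's matches_schema."""
    return len(a) == len(b) and all(
        x == y or sym in (x, y) for x, y in zip(a, b)
    )


def blur(motif, num_ambiguous, sym, rng):
    """Replace num_ambiguous randomly chosen positions with sym."""
    chars = list(motif)
    for pos in rng.sample(range(len(chars)), num_ambiguous):
        chars[pos] = sym
    return "".join(chars)


def build_schemas(counts, fraction, num_ambiguous, sym="*", seed=None,
                  max_tries=150):
    """Map each generated schema to the summed count of motifs it covers."""
    rng = random.Random(seed)
    remaining = list(counts)
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("expected motifs with positive counts")
    schemas, covered, tries = {}, 0, 0
    while covered / total < fraction:
        if not remaining or tries >= max_tries:
            raise RuntimeError("could not reach the requested coverage")
        tries += 1
        schema = blur(rng.choice(remaining), num_ambiguous, sym, rng)
        if any(matches(schema, old, sym) for old in schemas):
            continue  # overlaps an existing schema, so try again
        hits = [m for m in remaining if matches(m, schema, sym)]
        schemas[schema] = sum(counts[m] for m in hits)
        covered += schemas[schema]
        # Covered motifs are dropped so they are not counted twice.
        remaining = [m for m in remaining if m not in hits]
        tries = 0
    return schemas


counts = {"GATC": 5, "GATA": 3, "TTTT": 2}
result = build_schemas(counts, fraction=1.0, num_ambiguous=1, seed=0)
print(result)
```

Because each covered motif is removed from the working list, the schema counts add up to the total motif count once full coverage is requested.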

def Bio.NeuralNetwork.Gene.Schema.SchemaFactory.from_signatures (   self,
  signature_repository,
  num_ambiguous 
)
Not implemented. Calling this method raises NotImplementedError.

Definition at line 718 of file Schema.py.

00718 
00719     def from_signatures(self, signature_repository, num_ambiguous):
00720         raise NotImplementedError("Still need to code this.")

Member Data Documentation

Definition at line 567 of file Schema.py.


The documentation for this class was generated from the following file: