python-biopython  1.60
Bio.HMM.MarkovModel.MarkovModelBuilder Class Reference

Public Member Functions

def __init__
def get_markov_model
def set_initial_probabilities
def set_equal_probabilities
def set_random_initial_probabilities
def set_random_transition_probabilities
def set_random_emission_probabilities
def set_random_probabilities
def allow_all_transitions
def allow_transition
def destroy_transition
def set_transition_score
def set_transition_pseudocount
def set_emission_score
def set_emission_pseudocount

Public Attributes

 initial_prob
 transition_prob
 emission_prob
 transition_pseudo
 emission_pseudo

Static Public Attributes

int DEFAULT_PSEUDO = 1

Private Member Functions

def _all_blank
def _all_pseudo

Private Attributes

 _state_alphabet
 _emission_alphabet

Detailed Description

Interface to build up a Markov Model.

This class is designed to separate the task of specifying the
Markov Model from the actual model itself, in the hope of keeping
the actual Markov Model classes smaller.

So, this builder class should be used to create Markov models instead
of trying to instantiate a Markov Model directly.

Definition at line 73 of file MarkovModel.py.


Constructor & Destructor Documentation

def Bio.HMM.MarkovModel.MarkovModelBuilder.__init__ (   self,
  state_alphabet,
  emission_alphabet 
)
Initialize a builder to create Markov Models.

Arguments:

o state_alphabet -- An alphabet containing all of the letters that
can appear in the states.

o emission_alphabet -- An alphabet containing all of the letters
that can be emitted by the HMM.

Definition at line 86 of file MarkovModel.py.

00086 
00087     def __init__(self, state_alphabet, emission_alphabet):
00088         """Initialize a builder to create Markov Models.
00089 
00090         Arguments:
00091 
00092         o state_alphabet -- An alphabet containing all of the letters that
00093         can appear in the states
00094        
00095         o emission_alphabet -- An alphabet containing all of the letters
00096         that can be emitted by the HMM.
00097         """
00098         self._state_alphabet = state_alphabet
00099         self._emission_alphabet = emission_alphabet
00100 
00101         # probabilities for the initial state, initialized by calling
00102         # set_initial_probabilities (required)
00103         self.initial_prob = {}
00104 
00105         # the probabilities for transitions and emissions
00106         # by default we have no transitions and all possible emissions
00107         self.transition_prob = {}
00108         self.emission_prob = self._all_blank(state_alphabet,
00109                                              emission_alphabet)
00110 
00111         # the default pseudocounts for transition and emission counting
00112         self.transition_pseudo = {}
00113         self.emission_pseudo = self._all_pseudo(state_alphabet,
00114                                                 emission_alphabet)


Member Function Documentation

def Bio.HMM.MarkovModel.MarkovModelBuilder._all_blank (   self,
  first_alphabet,
  second_alphabet 
) [private]
Return a dictionary with all counts set to zero.

This uses the letters in the first and second alphabet to create
a dictionary whose keys are 2-tuples of the form
(letter of first alphabet, letter of second alphabet). The values
are all set to 0.

Definition at line 115 of file MarkovModel.py.

00115 
00116     def _all_blank(self, first_alphabet, second_alphabet):
00117         """Return a dictionary with all counts set to zero.
00118 
00119         This uses the letters in the first and second alphabet to create
00120         a dictionary with keys of two tuples organized as
00121         (letter of first alphabet, letter of second alphabet). The values
00122         are all set to 0.
00123         """
00124         all_blank = {}
00125         for first_state in first_alphabet.letters:
00126             for second_state in second_alphabet.letters:
00127                 all_blank[(first_state, second_state)] = 0
00128 
00129         return all_blank
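The pair-keyed dictionary described above can be sketched without Biopython. This is a minimal standalone illustration, not the library code: the helper name all_pairs, the plain lists standing in for alphabet objects, and the example letters are all assumptions made for the example.

```python
def all_pairs(first_letters, second_letters, value=0):
    """Map every (first, second) letter pair to the same starting value."""
    return {(a, b): value for a in first_letters for b in second_letters}


# Hypothetical two-state model emitting DNA letters.
states = ["F", "L"]
emissions = ["A", "C", "G", "T"]

blank = all_pairs(states, emissions)            # every value 0, as in _all_blank
pseudo = all_pairs(states, emissions, value=1)  # every value 1, as in _all_pseudo
```

Both dictionaries share the same keys, one per (state, emission) pair, so a probability table and its pseudocount table can always be indexed with the same tuple.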

def Bio.HMM.MarkovModel.MarkovModelBuilder._all_pseudo (   self,
  first_alphabet,
  second_alphabet 
) [private]
Return a dictionary with all counts set to a default value.

This takes the letters in the first and second alphabet and
creates a dictionary whose keys are 2-tuples of the form
(letter of first alphabet, letter of second alphabet). The values
are all set to the value of the class attribute DEFAULT_PSEUDO.

Definition at line 130 of file MarkovModel.py.

00130 
00131     def _all_pseudo(self, first_alphabet, second_alphabet):
00132         """Return a dictionary with all counts set to a default value.
00133 
00134         This takes the letters in first alphabet and second alphabet and
00135         creates a dictionary with keys of two tuples organized as:
00136         (letter of first alphabet, letter of second alphabet). The values
00137         are all set to the value of the class attribute DEFAULT_PSEUDO.
00138         """
00139         all_counts = {}
00140         for first_state in first_alphabet.letters:
00141             for second_state in second_alphabet.letters:
00142                 all_counts[(first_state, second_state)] = self.DEFAULT_PSEUDO
00143 
00144         return all_counts
                

def Bio.HMM.MarkovModel.MarkovModelBuilder.allow_all_transitions (   self )
A convenience function to create transitions between all states.

By default all transitions within the alphabet are disallowed; this
is a way to change this to allow all possible transitions.

Definition at line 301 of file MarkovModel.py.

00301 
00302     def allow_all_transitions(self):
00303         """A convenience function to create transitions between all states.
00304 
00305         By default all transitions within the alphabet are disallowed; this
00306         is a way to change this to allow all possible transitions.
00307         """
00308         # first get all probabilities and pseudo counts set
00309         # to the default values
00310         all_probs = self._all_blank(self._state_alphabet,
00311                                     self._state_alphabet)
00312 
00313         all_pseudo = self._all_pseudo(self._state_alphabet,
00314                                       self._state_alphabet)
00315 
00316         # now set any probabilities and pseudo counts that
00317         # were previously set
00318         for set_key in self.transition_prob:
00319             all_probs[set_key] = self.transition_prob[set_key]
00320 
00321         for set_key in self.transition_pseudo:
00322             all_pseudo[set_key] = self.transition_pseudo[set_key]
00323 
00324         # finally reinitialize the transition probs and pseudo counts
00325         self.transition_prob = all_probs
00326         self.transition_pseudo = all_pseudo
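The merge performed above (start from a full table of defaults, then copy back anything that was already set) can be sketched in isolation. This is an illustrative standalone version under stated assumptions: the function name allow_all and the plain list of states are hypothetical, not part of the library.

```python
def allow_all(states, current, default):
    """Return a full state-by-state table, keeping values already in `current`."""
    table = {(a, b): default for a in states for b in states}
    table.update(current)  # previously set entries override the defaults
    return table


states = ["A", "B"]
transition_prob = {("A", "B"): 0.7}  # one transition that was set earlier
full = allow_all(states, transition_prob, 0)
```

After the call every (from, to) pair is present, and the one transition that had been configured keeps its value of 0.7.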

def Bio.HMM.MarkovModel.MarkovModelBuilder.allow_transition (   self,
  from_state,
  to_state,
  probability = None,
  pseudocount = None 
)
Set a transition as being possible between the two states.

probability and pseudocount are optional arguments specifying the
probability and pseudocount for the transition. If they are not
supplied, the probability defaults to 0 and the pseudocount to
DEFAULT_PSEUDO.

Raises:
KeyError -- if the two states already have an allowed transition.

Definition at line 328 of file MarkovModel.py.

00328     def allow_transition(self, from_state, to_state, probability = None,
00329                          pseudocount = None):
00330         """Set a transition as being possible between the two states.
00331 
00332         probability and pseudocount are optional arguments
00333         specifying the probabilities and pseudo counts for the transition.
00334         If these are not supplied, then the values are set to the
00335         default values.
00336 
00337         Raises:
00338         KeyError -- if the two states already have an allowed transition.
00339         """
00340         # check the sanity of adding these states
00341         for state in [from_state, to_state]:
00342             assert state in self._state_alphabet.letters, \
00343                    "State %s was not found in the sequence alphabet" % state
00344 
00345         # ensure that the states are not already set
00346         if ((from_state, to_state) not in self.transition_prob and 
00347             (from_state, to_state) not in self.transition_pseudo):
00348             # set the initial probability
00349             if probability is None:
00350                 probability = 0
00351             self.transition_prob[(from_state, to_state)] = probability
00352 
00353             # set the initial pseudocounts
00354             if pseudocount is None:
00355                 pseudocount = self.DEFAULT_PSEUDO
00356             self.transition_pseudo[(from_state, to_state)] = pseudocount 
00357         else:
00358             raise KeyError("Transition from %s to %s is already allowed."
00359                            % (from_state, to_state))

def Bio.HMM.MarkovModel.MarkovModelBuilder.destroy_transition (   self,
  from_state,
  to_state 
)
Disallow the transition between the two states.

Raises:
KeyError if the transition is not currently allowed.

Definition at line 360 of file MarkovModel.py.

00360 
00361     def destroy_transition(self, from_state, to_state):
00362         """Restrict transitions between the two states.
00363 
00364         Raises:
00365         KeyError if the transition is not currently allowed.
00366         """
00367         try:
00368             del self.transition_prob[(from_state, to_state)]
00369             del self.transition_pseudo[(from_state, to_state)]
00370         except KeyError:
00371             raise KeyError("Transition from %s to %s is already disallowed."
00372                            % (from_state, to_state))

def Bio.HMM.MarkovModel.MarkovModelBuilder.get_markov_model (   self )
Return the Markov model corresponding to the current parameters.

Each Markov model returned by a call to this function is independent
of the others (i.e. they don't influence each other).

Definition at line 145 of file MarkovModel.py.

00145 
00146     def get_markov_model(self):
00147         """Return the markov model corresponding with the current parameters.
00148 
00149         Each markov model returned by a call to this function is unique
00150         (ie. they don't influence each other).
00151         """
00152 
00153         # user must set initial probabilities
00154         if not self.initial_prob:
00155             raise Exception("set_initial_probabilities must be called to " +
00156                             "fully initialize the Markov model")
00157 
00158         initial_prob = copy.deepcopy(self.initial_prob)
00159         transition_prob = copy.deepcopy(self.transition_prob)
00160         emission_prob = copy.deepcopy(self.emission_prob)
00161         transition_pseudo = copy.deepcopy(self.transition_pseudo)
00162         emission_pseudo = copy.deepcopy(self.emission_pseudo)
00163         
00164         return HiddenMarkovModel(initial_prob, transition_prob, emission_prob,
00165                                  transition_pseudo, emission_pseudo)
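The independence guarantee comes from copying the builder's dictionaries before handing them to the model. The snippet below is a small standalone sketch of that idea using only copy.deepcopy; the variable names are illustrative and no Biopython class is involved.

```python
import copy

# Stand-in for the builder's transition table.
builder_transitions = {("A", "B"): 0.5}

# Each "model" receives its own deep copy of the table.
model_one = copy.deepcopy(builder_transitions)
model_two = copy.deepcopy(builder_transitions)

# Changing one copy leaves the builder and the other copy untouched.
model_one[("A", "B")] = 0.9
```

Because each copy is a separate object, later edits to the builder or to one model do not leak into models that were created earlier.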

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_emission_pseudocount (   self,
  seq_state,
  emission_state,
  count 
)
Set the default pseudocount for an emission.

To avoid computational problems, it is helpful to be able to
set a 'default' pseudocount to start with for estimating
transition and emission probabilities (see p62 in Durbin et al.
for more discussion on this). By default, all emissions have
a pseudocount of 1.

Raises:
KeyError if the emission from the given state is not allowed.

Definition at line 417 of file MarkovModel.py.

00417 
00418     def set_emission_pseudocount(self, seq_state, emission_state, count):
00419         """Set the default pseudocount for an emission.
00420 
00421         To avoid computational problems, it is helpful to be able to
00422         set a 'default' pseudocount to start with for estimating
00423         transition and emission probabilities (see p62 in Durbin et al.
00424         for more discussion on this). By default, all emissions have
00425         a pseudocount of 1.
00426 
00427         Raises:
00428         KeyError if the emission from the given state is not allowed.
00429         """
00430         if (seq_state, emission_state) in self.emission_pseudo:
00431             self.emission_pseudo[(seq_state, emission_state)] = count
00432         else:
00433             raise KeyError("Emission of %s from %s is not allowed."
00434                            % (emission_state, seq_state))
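To see why a pseudocount helps, consider the usual way of turning counts into probabilities: add the pseudocount to each observed count and normalize. The sketch below illustrates that standard calculation only; it is not taken from the library's training code, and the function name estimate and the counts are made up for the example.

```python
def estimate(counts, pseudo):
    """Turn observed counts plus pseudocounts into probabilities."""
    total = sum(counts[key] + pseudo[key] for key in counts)
    return {key: (counts[key] + pseudo[key]) / float(total) for key in counts}


# Hypothetical emission counts for a single state 'F'; 'T' was never observed.
counts = {("F", "A"): 3, ("F", "C"): 2, ("F", "G"): 1, ("F", "T"): 0}
pseudo = {key: 1 for key in counts}  # the builder's default pseudocount of 1
probs = estimate(counts, pseudo)
```

With a pseudocount of 1 the unobserved emission of 'T' gets a small non-zero probability instead of 0, which avoids ruling it out entirely on the basis of limited training data.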

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_emission_score (   self,
  seq_state,
  emission_state,
  probability 
)
Set the probability of an emission from a particular state.

Raises:
KeyError if the emission from the given state is not allowed.

Definition at line 405 of file MarkovModel.py.

00405 
00406     def set_emission_score(self, seq_state, emission_state, probability):
00407         """Set the probability of an emission from a particular state.
00408 
00409         Raises:
00410         KeyError if the emission from the given state is not allowed.
00411         """
00412         if (seq_state, emission_state) in self.emission_prob:
00413             self.emission_prob[(seq_state, emission_state)] = probability
00414         else:
00415             raise KeyError("Emission of %s from %s is not allowed."
00416                            % (emission_state, seq_state))

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_equal_probabilities (   self )
Reset all probabilities to be an average value.

Resets the values of all initial probabilities and all allowed
transitions and all allowed emissions to be equal to 1 divided by the
number of possible elements.

This is useful if you just want to initialize a Markov Model to
starting values (i.e. if you have no prior notions of what the
probabilities should be -- or if you are just feeling too lazy
to calculate them :-).

Warning 1 -- This will reset all currently set probabilities.

Warning 2 -- This sets the transition probabilities so that they
total 1 across the whole table, and likewise for the emissions, so it
does not ensure that the transitions out of each individual state
add up to 1.

Definition at line 207 of file MarkovModel.py.

00207 
00208     def set_equal_probabilities(self):
00209         """Reset all probabilities to be an average value.
00210 
00211         Resets the values of all initial probabilities and all allowed
00212         transitions and all allowed emissions to be equal to 1 divided by the
00213         number of possible elements.
00214 
00215         This is useful if you just want to initialize a Markov Model to
00216         starting values (ie. if you have no prior notions of what the
00217         probabilities should be -- or if you are just feeling too lazy
00218         to calculate them :-).
00219 
00220         Warning 1 -- this will reset all currently set probabilities.
00221 
00222         Warning 2 -- This just sets all probabilities for transitions and
00223         emissions to total up to 1, so it doesn't ensure that the sum of
00224         each set of transitions adds up to 1.
00225         """
00226 
00227         # set initial state probabilities
00228         new_initial_prob = float(1) / float(len(self.transition_prob))
00229         for state in self._state_alphabet.letters:
00230             self.initial_prob[state] = new_initial_prob
00231 
00232         # set the transitions
00233         new_trans_prob = float(1) / float(len(self.transition_prob))
00234         for key in self.transition_prob:
00235             self.transition_prob[key] = new_trans_prob
00236 
00237         # set the emissions
00238         new_emission_prob = float(1) / float(len(self.emission_prob))
00239         for key in self.emission_prob:
00240             self.emission_prob[key] = new_emission_prob
00241 
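Warning 2 is easy to check numerically. The following standalone sketch assigns 1 / (number of allowed transitions) to every entry, as the listing does for transitions, and then sums the outgoing probability per state. The three allowed transitions are a made-up example.

```python
from collections import defaultdict

# Hypothetical model in which three transitions are allowed.
transition_prob = {("A", "A"): 0, ("A", "B"): 0, ("B", "A"): 0}

# Every allowed transition gets the same share of a single unit of probability.
equal = 1.0 / len(transition_prob)
for key in transition_prob:
    transition_prob[key] = equal

# Sum the probability leaving each state.
outgoing = defaultdict(float)
for (from_state, to_state), prob in transition_prob.items():
    outgoing[from_state] += prob
```

The whole table sums to 1, but state 'A' has outgoing probability 2/3 and state 'B' has 1/3, so neither row is a proper distribution on its own.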

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_initial_probabilities (   self,
  initial_prob 
)
Set initial state probabilities.

initial_prob is a dictionary mapping states to probabilities.
Suppose, for example, that the state alphabet is ['A', 'B']. Call
set_initial_probabilities({'A': 1}) to guarantee that the initial
state will be 'A'. Call set_initial_probabilities({'A': 0.5, 'B': 0.5})
to make each initial state equally probable.

This method must now be called in order to use the Markov model
because the calculation of initial probabilities has changed
incompatibly; the previous calculation was incorrect.

If initial probabilities are set for all states, then they should add up
to 1. Otherwise the sum should be <= 1. The residual probability is
divided up evenly between all the states for which the initial
probability has not been set. For example, calling
set_initial_probabilities({}) results in P('A') = 0.5 and P('B') = 0.5,
for the above example.

Definition at line 166 of file MarkovModel.py.

00166 
00167     def set_initial_probabilities(self, initial_prob):
00168         """Set initial state probabilities.
00169 
00170         initial_prob is a dictionary mapping states to probabilities.
00171         Suppose, for example, that the state alphabet is ['A', 'B']. Call
00172         set_initial_probabilities({'A': 1}) to guarantee that the initial
00173         state will be 'A'. Call set_initial_probabilities({'A': 0.5, 'B': 0.5})
00174         to make each initial state equally probable.
00175 
00176         This method must now be called in order to use the Markov model
00177         because the calculation of initial probabilities has changed
00178         incompatibly; the previous calculation was incorrect.
00179 
00180         If initial probabilities are set for all states, then they should add up
00181         to 1. Otherwise the sum should be <= 1. The residual probability is
00182         divided up evenly between all the states for which the initial
00183         probability has not been set. For example, calling
00184         set_initial_probabilities({}) results in P('A') = 0.5 and P('B') = 0.5,
00185         for the above example.
00186         """
00187         self.initial_prob = copy.copy(initial_prob)
00188 
00189         # ensure that all referenced states are valid
00190         for state in initial_prob.iterkeys():
00191             assert state in self._state_alphabet.letters, \
00192                    "State %s was not found in the sequence alphabet" % state
00193 
00194         # distribute the residual probability, if any
00195         num_states_not_set =\
00196             len(self._state_alphabet.letters) - len(self.initial_prob)
00197         if num_states_not_set < 0:
00198             raise Exception("Initial probabilities can't exceed # of states")
00199         prob_sum = sum(self.initial_prob.values())
00200         if prob_sum > 1.0:
00201             raise Exception("Total initial probability cannot exceed 1.0")
00202         if num_states_not_set > 0:
00203             prob = (1.0 - prob_sum) / num_states_not_set
00204             for state in self._state_alphabet.letters:
00205                 if not state in self.initial_prob:
00206                     self.initial_prob[state] = prob
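The residual-probability rule can be tried out without the library. The function below is a minimal standalone sketch of the same rule: fill_initial is a hypothetical name, the states are a plain list, and it raises ValueError where the listing uses assert and Exception.

```python
def fill_initial(states, initial_prob):
    """Spread the unassigned probability evenly over the states not yet set."""
    result = dict(initial_prob)
    for state in result:
        if state not in states:
            raise ValueError("State %s is not in the state alphabet" % state)
    total = sum(result.values())
    if total > 1.0:
        raise ValueError("Total initial probability cannot exceed 1.0")
    unset = [state for state in states if state not in result]
    for state in unset:
        result[state] = (1.0 - total) / len(unset)
    return result


nothing_set = fill_initial(["A", "B"], {})
one_certain = fill_initial(["A", "B"], {"A": 1})
partly_set = fill_initial(["A", "B", "C"], {"A": 0.5})
```

An empty dictionary gives every state an equal share, fixing 'A' at 1 leaves 0 for 'B', and fixing 'A' at 0.5 among three states leaves 0.25 each for 'B' and 'C'.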

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_random_emission_probabilities (   self )
Set all allowed emission probabilities to a randomly generated
distribution. Returns the dictionary containing the emission
probabilities.

Definition at line 270 of file MarkovModel.py.

00270 
00271     def set_random_emission_probabilities(self):
00272         """Set all allowed emission probabilities to a randomly generated
00273         distribution.  Returns the dictionary containing the emission
00274         probabilities.
00275         """
00276 
00277         if not self.emission_prob:
00278             raise Exception("No emissions have been allowed yet. " +
00279                             "Allow some or all emissions.")
00280 
00281         emissions = _calculate_emissions(self.emission_prob)
00282         for state in emissions.iterkeys():
00283             freqs = _gen_random_array(len(emissions[state]))
00284             for symbol in emissions[state]:
00285                 self.emission_prob[(state, symbol)] = freqs.pop()
00286 
00287         return self.emission_prob
00288 
        

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_random_initial_probabilities (   self )
Set all initial state probabilities to a randomly generated distribution.
Returns the dictionary containing the initial probabilities.

Definition at line 242 of file MarkovModel.py.

00242 
00243     def set_random_initial_probabilities(self):
00244         """Set all initial state probabilities to a randomly generated distribution.
00245         Returns the dictionary containing the initial probabilities.
00246         """
00247         initial_freqs = _gen_random_array(len(self._state_alphabet.letters))
00248         for state in self._state_alphabet.letters:
00249             self.initial_prob[state] = initial_freqs.pop()
00250 
00251         return self.initial_prob

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_random_probabilities (   self )
Set all probabilities to randomly generated numbers.

Resets probabilities of all initial states, transitions, and
emissions to random values.

Definition at line 289 of file MarkovModel.py.

00289 
00290     def set_random_probabilities(self):
00291         """Set all probabilities to randomly generated numbers.
00292 
00293         Resets probabilities of all initial states, transitions, and
00294         emissions to random values.
00295         """
00296         self.set_random_initial_probabilities()
00297         self.set_random_transition_probabilities()
00298         self.set_random_emission_probabilities()

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_random_transition_probabilities (   self )
Set all allowed transition probabilities to a randomly generated distribution.
Returns the dictionary containing the transition probabilities.

Definition at line 252 of file MarkovModel.py.

00252 
00253     def set_random_transition_probabilities(self):
00254         """Set all allowed transition probabilities to a randomly generated distribution.
00255         Returns the dictionary containing the transition probabilities.
00256         """
00257 
00258         if not self.transition_prob:
00259             raise Exception("No transitions have been allowed yet. " +
00260                             "Allow some or all transitions by calling " + 
00261                             "allow_transition or allow_all_transitions first.")
00262 
00263         transitions_from = _calculate_from_transitions(self.transition_prob)
00264         for from_state in transitions_from.keys():
00265             freqs = _gen_random_array(len(transitions_from[from_state]))
00266             for to_state in transitions_from[from_state]:
00267                 self.transition_prob[(from_state, to_state)] = freqs.pop()
00268 
00269         return self.transition_prob
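The listing relies on two module-level helpers, _calculate_from_transitions and _gen_random_array, whose source is not shown on this page. The sketch below uses hypothetical stand-ins (group_by_source and random_distribution) to illustrate the apparent idea: group the allowed transitions by source state, then give each group its own random distribution that sums to 1. The real helpers may differ in detail.

```python
import random


def random_distribution(n):
    """Return n positive floats that sum to 1."""
    draws = [1.0 - random.random() for _ in range(n)]  # each draw lies in (0, 1]
    total = sum(draws)
    return [draw / total for draw in draws]


def group_by_source(transition_prob):
    """Map each source state to the list of states it may move to."""
    targets = {}
    for from_state, to_state in transition_prob:
        targets.setdefault(from_state, []).append(to_state)
    return targets


# Hypothetical model in which three transitions are allowed.
transition_prob = {("A", "A"): 0, ("A", "B"): 0, ("B", "A"): 0}
for from_state, to_states in group_by_source(transition_prob).items():
    freqs = random_distribution(len(to_states))
    for to_state in to_states:
        transition_prob[(from_state, to_state)] = freqs.pop()
```

Unlike the equal-probability reset, normalizing per source state means the transitions out of each state sum to 1.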

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_transition_pseudocount (   self,
  from_state,
  to_state,
  count 
)
Set the default pseudocount for a transition.

To avoid computational problems, it is helpful to be able to
set a 'default' pseudocount to start with for estimating
transition and emission probabilities (see p62 in Durbin et al.
for more discussion on this). By default, all transitions have
a pseudocount of 1.

Raises:
KeyError if the transition is not allowed.

Definition at line 385 of file MarkovModel.py.

00385 
00386     def set_transition_pseudocount(self, from_state, to_state, count):
00387         """Set the default pseudocount for a transition.
00388 
00389         To avoid computational problems, it is helpful to be able to
00390         set a 'default' pseudocount to start with for estimating
00391         transition and emission probabilities (see p62 in Durbin et al.
00392         for more discussion on this). By default, all transitions have
00393         a pseudocount of 1.
00394 
00395         Raises:
00396         KeyError if the transition is not allowed.
00397         """
00398         if (from_state, to_state) in self.transition_pseudo:
00399             self.transition_pseudo[(from_state, to_state)] = count
00400         else:
00401             raise KeyError("Transition from %s to %s is not allowed."
00402                            % (from_state, to_state))

def Bio.HMM.MarkovModel.MarkovModelBuilder.set_transition_score (   self,
  from_state,
  to_state,
  probability 
)
Set the probability of a transition between two states.

Raises:
KeyError if the transition is not allowed.

Definition at line 373 of file MarkovModel.py.

00373 
00374     def set_transition_score(self, from_state, to_state, probability):
00375         """Set the probability of a transition between two states.
00376 
00377         Raises:
00378         KeyError if the transition is not allowed.
00379         """
00380         if (from_state, to_state) in self.transition_prob:
00381             self.transition_prob[(from_state, to_state)] = probability
00382         else:
00383             raise KeyError("Transition from %s to %s is not allowed."
00384                            % (from_state, to_state))


Member Data Documentation

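Bio.HMM.MarkovModel.MarkovModelBuilder._emission_alphabet [private]

Definition at line 98 of file MarkovModel.py.

Bio.HMM.MarkovModel.MarkovModelBuilder._state_alphabet [private]

Definition at line 97 of file MarkovModel.py.

int Bio.HMM.MarkovModel.MarkovModelBuilder.DEFAULT_PSEUDO = 1 [static]

Definition at line 84 of file MarkovModel.py.

Bio.HMM.MarkovModel.MarkovModelBuilder.emission_prob

Definition at line 107 of file MarkovModel.py.

Bio.HMM.MarkovModel.MarkovModelBuilder.emission_pseudo

Definition at line 112 of file MarkovModel.py.

Bio.HMM.MarkovModel.MarkovModelBuilder.initial_prob

Definition at line 102 of file MarkovModel.py.

Bio.HMM.MarkovModel.MarkovModelBuilder.transition_prob

Definition at line 106 of file MarkovModel.py.

Bio.HMM.MarkovModel.MarkovModelBuilder.transition_pseudo

Definition at line 111 of file MarkovModel.py.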


The documentation for this class was generated from the following file: