
python-biopython  1.60
Bio.HMM.Trainer.KnownStateTrainer Class Reference

Public Member Functions

def __init__
def train
def log_likelihood
def estimate_params
def ml_estimator

Private Member Functions

def _count_emissions
def _count_transitions

Detailed Description

Estimate probabilities with known state sequences.

This should be used for direct estimation of emission and transition
probabilities when both the state path and emission sequence are
known for the training examples.

Definition at line 341 of file Trainer.py.


Constructor & Destructor Documentation

def Bio.HMM.Trainer.KnownStateTrainer.__init__(self, markov_model)

Reimplemented from Bio.HMM.Trainer.AbstractTrainer.

Definition at line 348 of file Trainer.py.

    def __init__(self, markov_model):
        AbstractTrainer.__init__(self, markov_model)


Member Function Documentation

def Bio.HMM.Trainer.KnownStateTrainer._count_emissions(self, training_seq, emission_counts) [private]

Add emissions from the training sequence to the current counts.

Arguments:

o training_seq -- A TrainingSequence with states and emissions
to get the counts from

o emission_counts -- The current emission counts to add to.

Definition at line 379 of file Trainer.py.

    def _count_emissions(self, training_seq, emission_counts):
        """Add emissions from the training sequence to the current counts.

        Arguments:

        o training_seq -- A TrainingSequence with states and emissions
        to get the counts from

        o emission_counts -- The current emission counts to add to.
        """
        for index in range(len(training_seq.emissions)):
            cur_state = training_seq.states[index]
            cur_emission = training_seq.emissions[index]

            try:
                emission_counts[(cur_state, cur_emission)] += 1
            except KeyError:
                raise KeyError("Unexpected emission (%s, %s)"
                               % (cur_state, cur_emission))
        return emission_counts
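
As a rough, stand-alone illustration of the counting loop above, the sketch below does the same bookkeeping on plain strings instead of a TrainingSequence. It is not Biopython code: the helper name `count_emissions`, the two states "F" and "L", and the die-face emission letters are hypothetical, and the explicit length check is an addition (the listing simply indexes the states by emission position).

```python
def count_emissions(states, emissions, emission_counts):
    """Add one count per (state, emission) pair at each position.

    Hypothetical stand-alone version of _count_emissions that takes the
    state path and emission sequence as plain sequences.
    """
    # Added check (not in the listing): the two sequences must line up.
    if len(states) != len(emissions):
        raise ValueError("states and emissions must have the same length")
    for state, emission in zip(states, emissions):
        key = (state, emission)
        if key not in emission_counts:
            # Only pairs already present in the blank counts are allowed.
            raise KeyError("Unexpected emission (%s, %s)" % key)
        emission_counts[key] += 1
    return emission_counts


# Blank counts for hypothetical states "F" and "L" emitting die faces "1".."6".
blank = {(s, str(d)): 0 for s in "FL" for d in range(1, 7)}
counts = count_emissions("FFL", "126", blank)
print(counts[("F", "1")], counts[("F", "2")], counts[("L", "6")])  # 1 1 1
```

As in the listing, a pair that the blank counts do not already contain is treated as an error rather than silently added.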


def Bio.HMM.Trainer.KnownStateTrainer._count_transitions(self, state_seq, transition_counts) [private]

Add transitions from the training sequence to the current counts.

Arguments:

o state_seq -- A Seq object with the states of the current training
sequence.

o transition_counts -- The current transition counts to add to.

Definition at line 400 of file Trainer.py.

    def _count_transitions(self, state_seq, transition_counts):
        """Add transitions from the training sequence to the current counts.

        Arguments:

        o state_seq -- A Seq object with the states of the current training
        sequence.

        o transition_counts -- The current transition counts to add to.
        """
        for cur_pos in range(len(state_seq) - 1):
            cur_state = state_seq[cur_pos]
            next_state = state_seq[cur_pos + 1]

            try:
                transition_counts[(cur_state, next_state)] += 1
            except KeyError:
                raise KeyError("Unexpected transition (%s, %s)" %
                               (cur_state, next_state))

        return transition_counts
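
A minimal sketch of the same transition counting, assuming the state path is a plain string or list rather than a Seq object. The helper name `count_transitions` and the states "F" and "L" are hypothetical. Pairing the path with itself shifted by one position yields exactly the consecutive pairs visited by `range(len(state_seq) - 1)`, so a path of length n contributes n - 1 transitions.

```python
def count_transitions(states, transition_counts):
    """Add one count per consecutive (state, next_state) pair.

    Hypothetical stand-alone version of _count_transitions.
    """
    # zip(states, states[1:]) yields the same pairs as looping over
    # range(len(states) - 1) and reading positions i and i + 1.
    for cur_state, next_state in zip(states, states[1:]):
        key = (cur_state, next_state)
        if key not in transition_counts:
            raise KeyError("Unexpected transition (%s, %s)" % key)
        transition_counts[key] += 1
    return transition_counts


blank = {(a, b): 0 for a in "FL" for b in "FL"}
counts = count_transitions("FFFLLF", blank)
print(counts)
# {('F', 'F'): 2, ('F', 'L'): 1, ('L', 'F'): 1, ('L', 'L'): 1}
```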


def Bio.HMM.Trainer.AbstractTrainer.estimate_params(self, transition_counts, emission_counts) [inherited]

Get maximum likelihood estimates of the transition and emission probabilities.

Arguments:

o transition_counts -- A dictionary with the total number of counts
of transitions between two states.

o emission_counts -- A dictionary with the total number of counts
of emissions of a particular emission letter by a state letter.

It returns the maximum likelihood estimators for the transitions and
emissions, given by equation 3.18 in Durbin et al.:

a_{kl} = A_{kl} / \sum_{l'} A_{kl'}
e_{k}(b) = E_{k}(b) / \sum_{b'} E_{k}(b')

where A_{kl} is the count of transitions from state k to state l and
E_{k}(b) is the count of emissions of letter b by state k.

Returns:
Transition and emission dictionaries containing the maximum
likelihood estimators.

Definition at line 63 of file Trainer.py.

    def estimate_params(self, transition_counts, emission_counts):
        """Get a maximum likelihood estimation of transition and emission.

        Arguments:

        o transition_counts -- A dictionary with the total number of counts
        of transitions between two states.

        o emission_counts -- A dictionary with the total number of counts
        of emissions of a particular emission letter by a state letter.

        This then returns the maximum likelihood estimators for the
        transitions and emissions, estimated by equation 3.18 in
        Durbin et al:

        a_{kl} = A_{kl} / sum(A_{kl'})
        e_{k}(b) = E_{k}(b) / sum(E_{k}(b'))

        Returns:
        Transition and emission dictionaries containing the maximum
        likelihood estimators.
        """
        # normalize each count table into probabilities
        ml_transitions = self.ml_estimator(transition_counts)
        ml_emissions = self.ml_estimator(emission_counts)

        return ml_transitions, ml_emissions
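
To make equation 3.18 concrete, here is a worked example with hypothetical counts for a model with only two states, F and L: suppose state F was followed by F 8 times and by L 2 times, and state L emitted the letter 6 five times and each of the letters 1 to 5 once.

```latex
\begin{aligned}
a_{FF} &= \frac{A_{FF}}{A_{FF} + A_{FL}} = \frac{8}{8 + 2} = 0.8, &
a_{FL} &= \frac{2}{8 + 2} = 0.2, \\
e_{L}(6) &= \frac{E_{L}(6)}{\sum_{b'} E_{L}(b')} = \frac{5}{5 + 5 \cdot 1} = 0.5, &
e_{L}(1) &= \frac{1}{10} = 0.1.
\end{aligned}
```

Each row sums to 1: 0.8 + 0.2 for the transitions out of F, and 0.5 + 5 x 0.1 for the emissions of L.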


def Bio.HMM.Trainer.AbstractTrainer.log_likelihood(self, probabilities) [inherited]

Calculate the log likelihood of the training sequences.

Arguments:

o probabilities -- A list of the probabilities of each training
sequence under the current parameters, calculated using the forward
algorithm.

The result is the sum of the natural logarithms of these probabilities.

Definition at line 48 of file Trainer.py.

    def log_likelihood(self, probabilities):
        """Calculate the log likelihood of the training sequences.

        Arguments:

        o probabilities -- A list of the probabilities of each training
        sequence under the current parameters, calculated using the forward
        algorithm.
        """
        total_likelihood = 0
        for probability in probabilities:
            total_likelihood += math.log(probability)

        return total_likelihood
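
Summing logarithms instead of multiplying probabilities matters numerically: forward-algorithm probabilities of long sequences can be smaller than the smallest representable float. The sketch below, using two made-up probabilities, shows the plain product underflowing to 0.0 while the log sum stays finite. It is a stand-alone function, not the Biopython method; like the listing, it raises ValueError if any probability is exactly 0, because math.log(0) is undefined.

```python
import math


def log_likelihood(probabilities):
    """Sum of natural-log probabilities (stand-alone sketch)."""
    return sum(math.log(p) for p in probabilities)


# Two made-up sequence probabilities, each tiny but representable.
probs = [1e-200, 1e-200]

product = probs[0] * probs[1]  # true value 1e-400 is below the float range
print(product)                 # 0.0
print(log_likelihood(probs))   # about -921.03, i.e. 2 * ln(1e-200)
```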
                 


def Bio.HMM.Trainer.AbstractTrainer.ml_estimator(self, counts) [inherited]

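Calculate the maximum likelihood estimator.

This works for both transition and emission counts: each count is
divided by the total of all counts whose key has the same first
letter (the source state for transitions, the emitting state for
emissions).

Arguments:

o counts -- A dictionary mapping (first letter, second letter) tuples
to counts.

See estimate_params for the formulas used in the calculation.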

Definition at line 91 of file Trainer.py.

    def ml_estimator(self, counts):
        """Calculate the maximum likelihood estimator.

        This can calculate maximum likelihoods for both transitions
        and emissions.

        Arguments:

        o counts -- A dictionary of the counts for each item.

        See estimate_params for a description of the formula used for
        calculation.
        """
        # get an ordered list of all items
        all_ordered = counts.keys()
        all_ordered.sort()

        ml_estimation = {}

        # the total counts for the current letter we are on
        cur_letter = None
        cur_letter_counts = 0

        for cur_item in all_ordered:
            # if we are on a new letter (ie. the first letter of the tuple)
            if cur_item[0] != cur_letter:
                # set the new letter we are working with
                cur_letter = cur_item[0]

                # count up the total counts for this letter
                cur_letter_counts = counts[cur_item]

                # add counts for all other items with the same first letter
                cur_position = all_ordered.index(cur_item) + 1

                # keep adding while we have the same first letter or until
                # we get to the end of the ordered list
                while (cur_position < len(all_ordered) and
                       all_ordered[cur_position][0] == cur_item[0]):
                    cur_letter_counts += counts[all_ordered[cur_position]]
                    cur_position += 1
            # otherwise we've already got the total counts for this letter
            else:
                pass

            # now calculate the ml and add it to the estimation
            cur_ml = float(counts[cur_item]) / float(cur_letter_counts)
            ml_estimation[cur_item] = cur_ml

        return ml_estimation
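
The listing finds each row total by sorting the keys and rescanning from `all_ordered.index(cur_item)`, and it relies on Python 2 semantics, where `counts.keys()` returns a list that can be sorted in place. A two-pass alternative that should compute the same estimates using one dictionary of row totals is sketched below; it is an illustrative rewrite for Python 3, not the Biopython code. As in the listing, a row whose counts sum to zero raises ZeroDivisionError.

```python
from collections import defaultdict


def ml_estimator(counts):
    """Turn {(first, second): count} into row-normalized probabilities.

    Two-pass sketch of the estimate described under estimate_params;
    not the Biopython implementation.
    """
    # Pass 1: total count for each first letter (state k).
    totals = defaultdict(int)
    for (first, _second), count in counts.items():
        totals[first] += count

    # Pass 2: divide each count by its row total; a zero total raises
    # ZeroDivisionError.
    return {key: count / totals[key[0]] for key, count in counts.items()}


transition_counts = {("F", "F"): 8, ("F", "L"): 2, ("L", "F"): 1, ("L", "L"): 3}
print(ml_estimator(transition_counts))
# {('F', 'F'): 0.8, ('F', 'L'): 0.2, ('L', 'F'): 0.25, ('L', 'L'): 0.75}
```

Computing totals in a separate pass avoids both the sort and the repeated `index` lookups, so the work grows linearly with the number of keys.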
            


def Bio.HMM.Trainer.KnownStateTrainer.train(self, training_seqs)

Estimate the Markov Model parameters with known state paths.

This trainer requires that both the state path and the emissions are
known for every training sequence in the list of TrainingSequence
objects. It counts all of the transitions and emissions, uses these
counts to estimate the parameters of the model, stores the estimates
on the model, and returns the updated model.

Definition at line 351 of file Trainer.py.

    def train(self, training_seqs):
        """Estimate the Markov Model parameters with known state paths.

        This trainer requires that both the state and the emissions are
        known for all of the training sequences in the list of
        TrainingSequence objects.
        This training will then count all of the transitions and emissions,
        and use this to estimate the parameters of the model.
        """
        # count up all of the transitions and emissions
        transition_counts = self._markov_model.get_blank_transitions()
        emission_counts = self._markov_model.get_blank_emissions()

        for training_seq in training_seqs:
            emission_counts = self._count_emissions(training_seq,
                                                    emission_counts)
            transition_counts = self._count_transitions(training_seq.states,
                                                        transition_counts)

        # update the markov model from the counts
        ml_transitions, ml_emissions = \
                        self.estimate_params(transition_counts,
                                             emission_counts)
        self._markov_model.transition_prob = ml_transitions
        self._markov_model.emission_prob = ml_emissions

        return self._markov_model
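
Putting the pieces together, the following self-contained sketch mirrors the flow of train: start from blank count tables, add transition and emission counts for every labelled sequence, then normalize each row. It does not use Biopython. The function name `train_known_states`, the (states, emissions) tuple format standing in for TrainingSequence objects, and the optional `pseudocount` argument (a hypothetical starting value for every count) are assumptions for illustration only.

```python
from collections import defaultdict


def train_known_states(training_pairs, state_letters, emission_letters,
                       pseudocount=0):
    """Estimate transition and emission probabilities from labelled data.

    training_pairs: iterable of (states, emissions) pairs of equal length
    (a hypothetical stand-in for TrainingSequence objects).
    pseudocount: hypothetical starting value for every count (0 gives
    plain maximum likelihood estimates).
    """
    # Blank count tables covering every allowed (from, to) and (state, letter) pair.
    trans = {(a, b): pseudocount for a in state_letters for b in state_letters}
    emit = {(s, e): pseudocount for s in state_letters for e in emission_letters}

    for states, emissions in training_pairs:
        # Consecutive state pairs; an undeclared state raises KeyError.
        for pair in zip(states, states[1:]):
            trans[pair] += 1
        # (state, emission) pairs at each position.
        for pair in zip(states, emissions):
            emit[pair] += 1

    def normalize(counts):
        # Divide each count by the total for its first letter (equation 3.18).
        totals = defaultdict(int)
        for (first, _), n in counts.items():
            totals[first] += n
        return {key: n / totals[key[0]] for key, n in counts.items()}

    return normalize(trans), normalize(emit)


# Two hypothetical labelled sequences: F = fair die, L = loaded die.
data = [("FFFL", "1236"), ("LLF", "664")]
trans_p, emit_p = train_known_states(data, "FL", "123456")
print(trans_p[("L", "F")], emit_p[("F", "1")], emit_p[("L", "6")])  # 0.5 0.25 1.0
```

With pseudocount=0, a state that never appears in a non-final position, or never emits, has a zero row total and the normalization fails with ZeroDivisionError; a positive pseudocount avoids that at the cost of pulling the estimates toward uniform.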



The documentation for this class was generated from the following file: