
python-biopython  1.60
Bio.NeuralNetwork.Gene.Signature.SignatureCoder Class Reference


Public Member Functions

def __init__
def representation

Private Attributes

 _signatures
 _max_gap

Detailed Description

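Convert a sequence into its signature representation.

This takes a sequence and a set of signatures and converts the
sequence into a list of numbers representing how often each
signature is seen in the sequence, relative to the other signatures.
A signature is a pair of equal-length motifs, such as ('GATC', 'GATC'),
whose second motif starts at most max_gap residues after the first
one ends. The resulting list of numbers allows a sequence to serve
as input to a neural network.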
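A minimal standalone sketch of the conversion, assuming Python 3 and plain strings instead of Seq objects. `signature_vector` is a hypothetical name, not part of Biopython; it follows the scanning and normalization steps of the `representation()` listing further down this page.

```python
def signature_vector(sequence, signatures, max_gap):
    """Hypothetical sketch of SignatureCoder.representation on plain strings."""
    if not signatures:
        return []
    counts = dict.fromkeys(signatures, 0)
    first_parts = {first for first, _ in signatures}
    size = len(signatures[0][0])
    # Scan only start positions where both parts fit with a zero gap.
    for start in range(len(sequence) - (2 * size - 1)):
        first = sequence[start:start + size]
        if first not in first_parts:
            continue
        # The second part may begin 0..max_gap positions after the first ends.
        for second_start in range(start + size, start + size + max_gap + 1):
            second = sequence[second_start:second_start + size]
            if (first, second) in counts:
                counts[(first, second)] += 1
    # Same normalization as the listing: (count - min) / max.
    low, high = min(counts.values()), max(counts.values())
    if high > 0:
        counts = {sig: (c - low) / high for sig, c in counts.items()}
    return [counts[sig] for sig in signatures]


vector = signature_vector("GATCAGATCGATC",
                          [("GATC", "GATC"), ("GATC", "AAAA")], max_gap=1)
print(vector)  # [1.0, 0.0]
```

Each position in the returned list corresponds to the signature at the same position in the input list, which keeps the feature order stable across sequences.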

Definition at line 113 of file Signature.py.


Constructor & Destructor Documentation

def Bio.NeuralNetwork.Gene.Signature.SignatureCoder.__init__(self, signatures, max_gap)
Initialize with the signatures to look for.

Arguments:

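o signatures - A complete list of signatures, in order, that
are to be searched for in the sequences. Each signature should
be a tuple of (first part of the signature, second part of the
signature), for example ('GATC', 'GATC'). All parts of all
signatures must have the same length; the constructor checks
this with assert statements.

o max_gap - The maximum gap allowed between the two parts of a
signature, that is, the largest number of residues that may lie
between the end of the first part and the start of the second.
Gaps from 0 up to and including max_gap are counted.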
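To make the meaning of max_gap concrete, here is a small sketch that lists the positions the listing checks for the second part of a signature, mirroring `range(start + sig_size, (start + sig_size + 1) + self._max_gap)`. The helper name `second_part_starts` is hypothetical.

```python
def second_part_starts(start, sig_size, max_gap):
    """Hypothetical helper: candidate start positions for the second part."""
    first_end = start + sig_size  # index just past the first part
    # A gap of 0 means the second part begins immediately at first_end.
    return list(range(first_end, first_end + max_gap + 1))


print(second_part_starts(0, 4, 2))  # [4, 5, 6]
```

With 4-letter parts and max_gap=2, a first part at position 0 is paired with a second part starting directly after it, or after one or two intervening residues.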

Definition at line 121 of file Signature.py.

00121 
00122     def __init__(self, signatures, max_gap):
00123         """Initialize with the signatures to look for.
00124 
00125         Arguments:
00126 
00127         o signatures - A complete list of signatures, in order, that
00128         are to be searched for in the sequences. The signatures should
00129         be represented as a tuple of (first part of the signature,
00130         second_part of the signature) -- ('GATC', 'GATC').
00131 
00132         o max_gap - The maximum gap we can have between the two
00133         elements of the signature.
00134         """
00135         self._signatures = signatures
00136         self._max_gap = max_gap
00137 
00138         # check to be sure the signatures are all the same size
00139         # only do this if we actually have signatures
00140         if len(self._signatures) > 0:
00141             first_sig_size = len(self._signatures[0][0])
00142             second_sig_size = len(self._signatures[0][1])
00143 
00144             assert first_sig_size == second_sig_size, \
00145                    "Ends of the signature do not match: %s" \
00146                    % self._signatures[0]
00147 
00148             for sig in self._signatures:
00149                 assert len(sig[0]) == first_sig_size, \
00150                        "Got first part of signature %s, expected size %s" % \
00151                        (sig[0], first_sig_size)
00152                 assert len(sig[1]) == second_sig_size, \
00153                        "Got second part of signature %s, expected size %s" % \
00154                        (sig[1], second_sig_size)
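The constructor enforces equal part lengths with assert statements, which Python skips entirely when run with the -O flag. A minimal sketch of the same checks raising ValueError instead; `check_signatures` is a hypothetical helper, not a Biopython function.

```python
def check_signatures(signatures):
    """Hypothetical sketch of the constructor's size checks using ValueError.

    Unlike assert statements, these checks still run under ``python -O``.
    """
    if not signatures:
        return  # nothing to check, as in the original constructor
    size = len(signatures[0][0])
    for first, second in signatures:
        # Every first and second part must match the first signature's size.
        if len(first) != size or len(second) != size:
            raise ValueError(
                "signature %r has parts of length %d and %d, expected %d"
                % ((first, second), len(first), len(second), size))


check_signatures([("GATC", "GATC"), ("AATT", "CCGG")])  # passes silently
try:
    check_signatures([("GATC", "GAT")])
except ValueError as err:
    print(err)  # signature ('GATC', 'GAT') has parts of length 4 and 3, expected 4
```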



Member Function Documentation

def Bio.NeuralNetwork.Gene.Signature.SignatureCoder.representation(self, sequence)
Convert a sequence into a representation of its signatures.

Arguments:

o sequence - A Seq object we are going to convert into a set of
signatures.

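Returns:
A list of normalized signature counts, one for each signature
passed to the initializer and in the same order. As the code is
written, each value is the number of times that signature was
found, minus the smallest count among all signatures, divided by
the largest count, so every value lies between zero and one. If no
signature is found at all, every value is 0. If the coder was
created with no signatures, an empty list is returned.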
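The docstring in the listing below describes each value as a fraction of the total, but the code computes (count - min) / max. A small sketch in Python 3 comparing the two on the same counts; both function names are hypothetical.

```python
def listing_normalize(counts):
    """(count - min) / max, as computed in the representation() listing."""
    low, high = min(counts), max(counts)
    if high == 0:
        return list(counts)  # all zeros are returned unchanged
    return [(c - low) / high for c in counts]


def fraction_of_total(counts):
    """count / sum(counts), the docstring's description, for comparison."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


counts = [4, 2, 2]
print(listing_normalize(counts))  # [0.5, 0.0, 0.0]
print(fraction_of_total(counts))  # [0.5, 0.25, 0.25]
```

One consequence of subtracting the minimum: the least frequent signature always maps to 0.0, and with a single signature the minimum equals the maximum, so `listing_normalize([5])` gives `[0.0]` even though the signature occurs.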

Definition at line 155 of file Signature.py.

00155 
00156     def representation(self, sequence):
00157         """Convert a sequence into a representation of its signatures.
00158 
00159         Arguments:
00160 
00161         o sequence - A Seq object we are going to convert into a set of
00162         signatures.
00163 
00164         Returns:
00165         A list of relative signature representations. Each item in the
00166         list corresponds to the signature passed in to the initializer and
00167         is the number of times that the signature was found, divided by the
00168         total number of signatures found in the sequence.
00169         """
00170         # check to be sure we have signatures to deal with,
00171         # otherwise just return an empty list
00172         if len(self._signatures) == 0:
00173             return []
00174         
00175         # initialize a dictionary to hold the signature counts
00176         sequence_sigs = {}
00177         for sig in self._signatures:
00178             sequence_sigs[sig] = 0
00179 
00180         # get a list of all of the first parts of the signatures
00181         all_first_sigs = []
00182         for sig_start, sig_end in self._signatures:
00183             all_first_sigs.append(sig_start)
00184         
00185         # count all of the signatures we are looking for in the sequence
00186         sig_size = len(self._signatures[0][0])
00187         smallest_sig_size = sig_size * 2
00188 
00189         for start in range(len(sequence) - (smallest_sig_size - 1)):
00190             # if the first part matches any of the signatures we are looking
00191             # for, then expand out to look for the second part
00192             first_sig = sequence[start:start + sig_size].tostring()
00193             if first_sig in all_first_sigs:
00194                 for second in range(start + sig_size,
00195                                     (start + sig_size + 1) + self._max_gap):
00196                     second_sig = sequence[second:second + sig_size].tostring()
00197 
00198                     # if we find the motif, increase the counts for it
00199                     if (first_sig, second_sig) in sequence_sigs:
00200                         sequence_sigs[(first_sig, second_sig)] += 1
00201 
00202         # -- normalize the signature info to go between zero and one
00203         min_count = min(sequence_sigs.values())
00204         max_count = max(sequence_sigs.values())
00205 
00206         # as long as we have some signatures present, normalize them
00207         # otherwise we'll just return 0 for everything 
00208         if max_count > 0:
00209             for sig in sequence_sigs:
00210                 sequence_sigs[sig] = (float(sequence_sigs[sig] - min_count)
00211                                       / float(max_count))
00212 
00213         # return the relative signature info in the specified order
00214         sig_amounts = []
00215         for sig in self._signatures:
00216             sig_amounts.append(sequence_sigs[sig])
00217 
00218         return sig_amounts
00219         



Member Data Documentation

_max_gap

Definition at line 135 of file Signature.py.

_signatures

Definition at line 134 of file Signature.py.


The documentation for this class was generated from the following file: Signature.py