python-biopython  1.60
Bio.kNN Namespace Reference

Classes

class  kNN

Functions

def equal_weight
def train
def calculate
def classify

Function Documentation

def Bio.kNN.calculate(knn, x, weight_fn=equal_weight, distance_fn=None)
calculate(knn, x[, weight_fn][, distance_fn]) -> weight dict

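Calculate the weight assigned to each class for an observation.  knn
is a kNN object and x is the observed data.  weight_fn is an optional
function that takes x and a training example and returns a weight;
the default, equal_weight, gives every neighbor a weight of 1.
distance_fn is an optional function that takes two points and returns
the distance between them; if it is None (the default), the Euclidean
distance is used.  Only the knn.k nearest training examples
contribute.  Returns a dictionary mapping each class to its total
weight.  The weights are not normalized, so they are vote totals
rather than probabilities.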
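
The computation described above can be sketched without Biopython or
NumPy.  The helper vote_weights below is a hypothetical stand-in written
for illustration and is not part of Bio.kNN; it assumes Python 3.8+ for
math.dist and takes the training data directly instead of a kNN object.

```python
import math


def equal_weight(x, y):
    # Every neighbor contributes one vote, mirroring equal_weight(x, y) -> 1.
    return 1


def vote_weights(xs, ys, k, x, weight_fn=equal_weight, distance_fn=None):
    """Hypothetical pure-Python analogue of calculate(); not the library code."""
    if distance_fn is None:
        distance_fn = math.dist  # Euclidean distance (Python 3.8+)
    # Pair each training point's distance with its index, nearest first.
    order = sorted((distance_fn(x, xi), i) for i, xi in enumerate(xs))
    # Every known class starts at zero, so classes with no votes still appear.
    weights = {label: 0.0 for label in set(ys)}
    for _dist, i in order[:k]:
        weights[ys[i]] += weight_fn(x, xs[i])
    return weights


xs = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
ys = ["a", "a", "b", "b"]
weights = vote_weights(xs, ys, k=3, x=(0.5, 0.5))
print(weights)
```

With equal weighting and k=3, the two "a" points and the nearer "b"
point are selected, so "a" receives 2.0 and "b" receives 1.0.  The
values sum to k, which is why they read as vote totals and not as
probabilities.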

Definition at line 71 of file kNN.py.

def calculate(knn, x, weight_fn=equal_weight, distance_fn=None):
    """calculate(knn, x[, weight_fn][, distance_fn]) -> weight dict

    Calculate the probability for each class.  knn is a kNN object.  x
    is the observed data.  weight_fn is an optional function that
    takes x and a training example, and returns a weight.  distance_fn
    is an optional function that takes two points and returns the
    distance between them.  If distance_fn is None (the default), the
    Euclidean distance is used.  Returns a dictionary of the class to
    the weight given to the class.

    """
    x = numpy.asarray(x)

    order = []  # list of (distance, index)
    if distance_fn:
        for i in range(len(knn.xs)):
            dist = distance_fn(x, knn.xs[i])
            order.append((dist, i))
    else:
        # Default: Use a fast implementation of the Euclidean distance
        temp = numpy.zeros(len(x))
        # Predefining temp allows reuse of this array, making this
        # function about twice as fast.
        for i in range(len(knn.xs)):
            temp[:] = x - knn.xs[i]
            dist = numpy.sqrt(numpy.dot(temp, temp))
            order.append((dist, i))
    order.sort()

    # first 'k' are the ones I want.
    weights = {}  # class -> number of votes
    for k in knn.classes:
        weights[k] = 0.0
    for dist, i in order[:knn.k]:
        klass = knn.ys[i]
        weights[klass] = weights[klass] + weight_fn(x, knn.xs[i])

    return weights

def Bio.kNN.classify(knn, x, weight_fn=equal_weight, distance_fn=None)
classify(knn, x[, weight_fn][, distance_fn]) -> class

Classify an observation into a class by returning the class with the
largest total weight among the k nearest neighbors.  If not specified,
weight_fn gives all neighbors equal weight.  distance_fn is an
optional function that takes two points and returns the distance
between them.  If distance_fn is None (the default), the Euclidean
distance is used.
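
The selection step can be illustrated on a plain dictionary of
weights.  The function pick_class below is a hypothetical name used
only for this sketch; it follows the same strict "greater than"
comparison, so on a tie the class encountered first while iterating
over the dictionary is returned.

```python
def pick_class(weights):
    """Return the key with the largest weight; the first one seen wins ties."""
    best_class = None
    best_weight = None
    for klass, weight in weights.items():
        # Strict '>' means a later class with an equal weight does not replace
        # the current best.
        if best_class is None or weight > best_weight:
            best_class, best_weight = klass, weight
    return best_class


print(pick_class({"a": 2.0, "b": 1.0}))  # a
print(pick_class({"a": 1.0, "b": 1.0}))  # a (tie: first key in iteration order)
```

Because the weight dictionary in the listed code is filled by
iterating over a set of classes, the order in which tied classes are
seen is not something the caller controls; an odd k or a
distance-based weight_fn makes ties less likely.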

Definition at line 111 of file kNN.py.

def classify(knn, x, weight_fn=equal_weight, distance_fn=None):
    """classify(knn, x[, weight_fn][, distance_fn]) -> class

    Classify an observation into a class.  If not specified, weight_fn will
    give all neighbors equal weight.  distance_fn is an optional function
    that takes two points and returns the distance between them.  If
    distance_fn is None (the default), the Euclidean distance is used.
    """
    weights = calculate(
        knn, x, weight_fn=weight_fn, distance_fn=distance_fn)

    most_class = None
    most_weight = None
    for klass, weight in weights.items():
        if most_class is None or weight > most_weight:
            most_class = klass
            most_weight = weight
    return most_class

def Bio.kNN.equal_weight(x, y)
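equal_weight(x, y) -> 1

Return a weight of 1 regardless of x and y, so that every neighbor
gets one vote.  This is the default weight_fn for calculate and
classify.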
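
Any function with the same two-argument shape can serve as a
weight_fn.  As an illustration only, the sketch below defines a
hypothetical inverse_distance_weight that gives closer training
examples a larger weight; the name, the eps parameter and the use of
math.dist (Python 3.8+) are assumptions of this sketch, not part of
Bio.kNN.

```python
import math


def inverse_distance_weight(x, y, eps=1e-9):
    """Hypothetical weight_fn: closer training examples get larger weights."""
    # eps avoids division by zero when x coincides with a training example.
    return 1.0 / (math.dist(x, y) + eps)


near = inverse_distance_weight((0.0, 0.0), (0.0, 1.0))
far = inverse_distance_weight((0.0, 0.0), (0.0, 4.0))
print(near > far)  # True
```

Here the point at distance 1 gets a weight close to 1 and the point at
distance 4 a weight close to 0.25, so a single very near neighbor can
outvote several distant ones.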

Definition at line 49 of file kNN.py.

def equal_weight(x, y):
    """equal_weight(x, y) -> 1"""
    # everything gets 1 vote
    return 1

def Bio.kNN.train(xs, ys, k, typecode=None)
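train(xs, ys, k[, typecode]) -> kNN

Train a k nearest neighbors classifier on a training set.  xs is a
list of observations and ys is a list of the class assignments.
Thus, xs and ys should contain the same number of elements.  k is
the number of neighbors that should be examined when doing the
classification.  typecode is optional and is passed to numpy.asarray
as the data type used when xs is converted to an array.  Training
only stores the data on the returned kNN object; all distance
computation happens in calculate.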
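
The listed code does not check that xs and ys have the same length.
A caller who wants that guarantee could run a check along the
following lines before training.  check_training_set is a hypothetical
helper, and the bound on k is a choice made for this sketch: the
listed code slices the sorted neighbors, so a k larger than the
training set simply uses every observation.

```python
def check_training_set(xs, ys, k):
    """Hypothetical pre-flight check to run before calling train()."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of observations")
    return True


xs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
ys = ["a", "a", "b"]
print(check_training_set(xs, ys, k=3))  # True
```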

Definition at line 54 of file kNN.py.

def train(xs, ys, k, typecode=None):
    """train(xs, ys, k) -> kNN

    Train a k nearest neighbors classifier on a training set.  xs is a
    list of observations and ys is a list of the class assignments.
    Thus, xs and ys should contain the same number of elements.  k is
    the number of neighbors that should be examined when doing the
    classification.

    """
    knn = kNN()
    knn.classes = set(ys)
    knn.xs = numpy.asarray(xs, typecode)
    knn.ys = ys
    knn.k = k
    return knn