
python-biopython  1.60
Bio.Cluster.Record Class Reference

Public Member Functions

def __init__
def treecluster
def kcluster
def somcluster
def clustercentroids
def clusterdistance
def distancematrix
def save

Public Attributes

 data
 mask
 geneid
 genename
 gweight
 gorder
 expid
 eweight
 eorder
 uniqid

Private Member Functions

def _savekmeans
def _savedata

Detailed Description

Store gene expression data.

A Record stores the gene expression data and related information contained
in a data file following the file format defined for Michael Eisen's
Cluster/TreeView program. A Record has the following members:

data:     a matrix containing the gene expression data
mask:     a matrix containing only 1's and 0's, denoting which values
          are present (1) or missing (0). If all elements of mask are
          one (no missing data), then mask is set to None.
geneid:   a list containing a unique identifier for each gene
          (e.g., ORF name)
genename: a list containing an additional description for each gene
          (e.g., gene name)
gweight:  the weight to be used for each gene when calculating the
          distance
gorder:   an array of real numbers indicating the preferred order of the
          genes in the output file
expid:    a list containing a unique identifier for each experimental
          condition
eweight:  the weight to be used for each experimental condition when
          calculating the distance
eorder:   an array of real numbers indicating the preferred order in the
          output file of the experimental conditions
uniqid:   the string that was used instead of UNIQID in the input file.

Definition at line 111 of file __init__.py.


Constructor & Destructor Documentation

def Bio.Cluster.Record.__init__ (   self,
  handle = None 
)
Read gene expression data from the file handle and return a Record.

The file should be in the format defined for Michael Eisen's
Cluster/TreeView program.
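
As a rough illustration of the tab-delimited layout the constructor expects, the sketch below parses a small table held in memory. It is not the Bio.Cluster implementation: the helper name parse_expression_table and the sample table are made up for this example, it returns plain lists in a dictionary rather than NumPy arrays on a Record, and it assumes the optional NAME, GWEIGHT and GORDER columns come directly after the identifier column.

```python
import io

def parse_expression_table(handle):
    """Parse a small Cluster/TreeView-style table into plain Python lists.

    Hypothetical helper for illustration only; not the Bio.Cluster code.
    """
    header = handle.readline().rstrip("\r\n").split("\t")
    special = {"NAME", "GWEIGHT", "GORDER"}
    # Column 0 holds the gene identifiers; NAME/GWEIGHT/GORDER are optional.
    meta_cols = {0: "GENEID"}
    for i, word in enumerate(header[1:], start=1):
        if word in special:
            meta_cols[i] = word
    first_data_col = max(meta_cols) + 1
    record = {
        "uniqid": header[0],
        "expid": [w for i, w in enumerate(header) if i not in meta_cols],
        "geneid": [], "data": [], "mask": [], "eweight": None,
    }
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "EWEIGHT":
            record["eweight"] = [float(w) for w in fields[first_data_col:]]
            continue
        record["geneid"].append(fields[0])
        values = fields[first_data_col:]
        # An empty cell marks a missing value: store 0.0 and flag it in the mask.
        record["data"].append([float(w) if w else 0.0 for w in values])
        record["mask"].append([1 if w else 0 for w in values])
    return record

# Made-up sample: two genes, two experiments, one missing value.
text = ("YORF\tNAME\tGWEIGHT\texp1\texp2\n"
        "EWEIGHT\t\t\t1.0\t0.5\n"
        "YAL001C\tTFC3\t1.0\t0.3\t\n"
        "YAL002W\tVPS8\t1.0\t-1.2\t2.4\n")
record = parse_expression_table(io.StringIO(text))
```

The empty last cell of the first gene row ends up as 0.0 in the data with a 0 in the mask, which is the same convention the listing below uses before deciding whether a mask is needed at all.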

Definition at line 140 of file __init__.py.

00140 
00141     def __init__(self, handle=None):
00142         """Read gene expression data from the file handle and return a Record.
00143 
00144 The file should be in the format defined for Michael Eisen's
00145 Cluster/TreeView program.
00146 
00147 """
00148         self.data = None
00149         self.mask = None
00150         self.geneid = None
00151         self.genename = None
00152         self.gweight = None
00153         self.gorder = None
00154         self.expid = None
00155         self.eweight = None
00156         self.eorder = None
00157         self.uniqid = None
00158         if not handle:
00159             return
00160         line = handle.readline().strip("\r\n").split("\t")
00161         n = len(line)
00162         self.uniqid = line[0]
00163         self.expid = []
00164         cols = {0: "GENEID"}
00165         for word in line[1:]:
00166             if word == "NAME":
00167                 cols[line.index(word)] = word
00168                 self.genename = []
00169             elif word == "GWEIGHT":
00170                 cols[line.index(word)] = word
00171                 self.gweight = []
00172             elif word=="GORDER":
00173                 cols[line.index(word)] = word
00174                 self.gorder = []
00175             else:
00176                 self.expid.append(word)
00177         self.geneid = []
00178         self.data = []
00179         self.mask = []
00180         needmask = 0
00181         for line in handle:
00182             line = line.strip("\r\n").split("\t")
00183             if len(line) != n:
00184                 raise ValueError("Line with %d columns found (expected %d)" %
00185                                  (len(line), n))
00186             if line[0] == "EWEIGHT":
00187                 i = max(cols) + 1
00188                 self.eweight = numpy.array(line[i:], float)
00189                 continue
00190             if line[0] == "EORDER":
00191                 i = max(cols) + 1
00192                 self.eorder = numpy.array(line[i:], float)
00193                 continue
00194             rowdata = []
00195             rowmask = []
00196             n = len(line)
00197             for i in range(n):
00198                 word = line[i]
00199                 if i in cols:
00200                     if cols[i] == "GENEID":
00201                         self.geneid.append(word)
00202                     if cols[i] == "NAME":
00203                         self.genename.append(word)
00204                     if cols[i] == "GWEIGHT":
00205                         self.gweight.append(float(word))
00206                     if cols[i] == "GORDER":
00207                         self.gorder.append(float(word))
00208                     continue
00209                 if not word:
00210                     rowdata.append(0.0)
00211                     rowmask.append(0)
00212                     needmask = 1
00213                 else:
00214                     rowdata.append(float(word))
00215                     rowmask.append(1)
00216             self.data.append(rowdata)
00217             self.mask.append(rowmask)
00218         self.data = numpy.array(self.data)
00219         if needmask:
00220             self.mask = numpy.array(self.mask, int)
00221         else:
00222             self.mask = None
00223         if self.gweight:
00224             self.gweight = numpy.array(self.gweight)
00225         if self.gorder:
00226             self.gorder = numpy.array(self.gorder)


Member Function Documentation

def Bio.Cluster.Record._savedata (   self,
  jobname,
  gid,
  aid,
  geneindex,
  expindex 
) [private]

Definition at line 570 of file __init__.py.

00570 
00571     def _savedata(self, jobname, gid, aid, geneindex, expindex):
00572         # Save the clustered data.
00573         if self.genename == None:
00574             genename = self.geneid
00575         else:
00576             genename = self.genename
00577         (ngenes, nexps) = numpy.shape(self.data)
00578         try:
00579             outputfile = open(jobname+'.cdt', 'w')
00580         except IOError:
00581             raise IOError("Unable to open output file")
00582         if self.mask!=None:
00583             mask = self.mask
00584         else:
00585             mask = numpy.ones((ngenes,nexps), int)
00586         if self.gweight!=None:
00587             gweight = self.gweight
00588         else:
00589             gweight = numpy.ones(ngenes)
00590         if self.eweight!=None:
00591             eweight = self.eweight
00592         else:
00593             eweight = numpy.ones(nexps)
00594         if gid:
00595             outputfile.write('GID\t')
00596         outputfile.write(self.uniqid)
00597         outputfile.write('\tNAME\tGWEIGHT')
00598         # Now add headers for data columns.
00599         for j in expindex:
00600             outputfile.write('\t%s' % self.expid[j])
00601         outputfile.write('\n')
00602         if aid:
00603             outputfile.write("AID")
00604             if gid:
00605                 outputfile.write('\t')
00606             outputfile.write("\t\t")
00607             for j in expindex:
00608                 outputfile.write('\tARRY%dX' % j)
00609             outputfile.write('\n')
00610         outputfile.write('EWEIGHT')
00611         if gid:
00612             outputfile.write('\t')
00613         outputfile.write('\t\t')
00614         for j in expindex:
00615             outputfile.write('\t%f' % eweight[j])
00616         outputfile.write('\n')
00617         for i in geneindex:
00618             if gid:
00619                 outputfile.write('GENE%dX\t' % i)
00620             outputfile.write("%s\t%s\t%f" %
00621                              (self.geneid[i], genename[i], gweight[i]))
00622             for j in expindex:
00623                 outputfile.write('\t')
00624                 if mask[i,j]:
00625                     outputfile.write(str(self.data[i,j]))
00626             outputfile.write('\n')
00627         outputfile.close()
00628 

def Bio.Cluster.Record._savekmeans (   self,
  filename,
  clusterids,
  order,
  transpose 
) [private]

Definition at line 542 of file __init__.py.

00542 
00543     def _savekmeans(self, filename, clusterids, order, transpose):
00544         # Save a k-means clustering solution
00545         if transpose == 0:
00546             label = self.uniqid
00547             names = self.geneid
00548         else:
00549             label = "ARRAY"
00550             names = self.expid
00551         try:
00552             outputfile = open(filename, "w")
00553         except IOError:
00554             raise IOError("Unable to open output file")
00555         outputfile.write(label + "\tGROUP\n")
00556         index = numpy.argsort(order)
00557         n = len(names)
00558         sortedindex = numpy.zeros(n, int)
00559         counter = 0
00560         cluster = 0
00561         while counter < n:
00562             for j in index:
00563                 if clusterids[j] == cluster:
00564                     outputfile.write("%s\t%s\n" % (names[j], cluster))
00565                     sortedindex[counter] = j
00566                     counter += 1
00567             cluster += 1
00568         outputfile.close()
00569         return sortedindex

def Bio.Cluster.Record.clustercentroids (   self,
  clusterid = None,
  method = 'a',
  transpose = 0 
)
Calculate the cluster centroids and return a tuple (cdata, cmask).

The centroid is defined as either the mean or the median over all elements
for each dimension.

The expression data and the mask are taken from the Record itself
(self.data and self.mask) rather than passed as arguments; if
mask[i][j]==0, then data[i][j] is missing.
transpose: if equal to 0, gene (row) clusters are considered;
   if equal to 1, microarray (column) clusters are considered.
clusterid: array containing the cluster number for each gene or
   microarray. The cluster number should be non-negative.
method   : specifies how the centroid is calculated:
   method=='a': arithmetic mean over each dimension. (default)
   method=='m': median over each dimension.

Return values:
cdata    : 2D array containing the cluster centroids. If transpose==0,
   then the dimensions of cdata are nclusters x ncolumns. If
   transpose==1, then the dimensions of cdata are
   nrows x nclusters.
cmask    : 2D array of integers describing which elements in cdata,
   if any, are missing.
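
To make the role of the mask concrete, here is a minimal pure-Python sketch of the transpose==0, method=='a' case. The function masked_centroids is a hypothetical stand-in written for this page, not the library routine, and it expects an explicit mask (it does not handle mask=None).

```python
def masked_centroids(data, mask, clusterid):
    """Mean-based row-cluster centroids that skip masked-out values.

    Hypothetical sketch of the transpose==0, method=='a' case.
    """
    nclusters = max(clusterid) + 1
    ncols = len(data[0])
    sums = [[0.0] * ncols for _ in range(nclusters)]
    counts = [[0] * ncols for _ in range(nclusters)]
    for row, rowmask, k in zip(data, mask, clusterid):
        for j in range(ncols):
            if rowmask[j]:  # only values that are present contribute
                sums[k][j] += row[j]
                counts[k][j] += 1
    cdata = [[s / c if c else 0.0 for s, c in zip(srow, crow)]
             for srow, crow in zip(sums, counts)]
    # A centroid element is missing when no cluster member had a value there.
    cmask = [[1 if c else 0 for c in crow] for crow in counts]
    return cdata, cmask

data = [[1.0, 2.0], [3.0, 0.0], [10.0, 0.0]]
mask = [[1, 1], [1, 0], [1, 0]]
cdata, cmask = masked_centroids(data, mask, clusterid=[0, 0, 1])
# cdata has one row per cluster (nclusters x ncolumns), as described above.
```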

Definition at line 357 of file __init__.py.

00357 
00358     def clustercentroids(self, clusterid=None, method='a', transpose=0):
00359         """Calculate the cluster centroids and return a tuple (cdata, cmask).
00360 
00361 The centroid is defined as either the mean or the median over all elements
00362 for each dimension.
00363 
00364 data     : nrows x ncolumns array containing the expression data
00365 mask     : nrows x ncolumns array of integers, showing which data are
00366            missing. If mask[i][j]==0, then data[i][j] is missing.
00367 transpose: if equal to 0, gene (row) clusters are considered;
00368            if equal to 1, microarray (column) clusters are considered.
00369 clusterid: array containing the cluster number for each gene or
00370            microarray. The cluster number should be non-negative.
00371 method   : specifies how the centroid is calculated:
00372            method=='a': arithmetic mean over each dimension. (default)
00373            method=='m': median over each dimension.
00374 
00375 Return values:
00376 cdata    : 2D array containing the cluster centroids. If transpose==0,
00377            then the dimensions of cdata are nclusters x ncolumns. If
00378            transpose==1, then the dimensions of cdata are
00379            nrows x nclusters.
00380 cmask    : 2D array of integers describing which elements in cdata,
00381            if any, are missing.
00382 
00383 """
00384         return clustercentroids(self.data, self.mask, clusterid, method,
00385                                 transpose)

def Bio.Cluster.Record.clusterdistance (   self,
  index1 = [0],
  index2 = [0],
  method = 'a',
  dist = 'e',
  transpose = 0 
)
Calculate the distance between two clusters.

index1   : 1D array identifying which genes/microarrays belong to the
   first cluster. If the cluster contains only one gene, then
   index1 can also be written as a single integer.
index2   : 1D array identifying which genes/microarrays belong to the
   second cluster. If the cluster contains only one gene, then
   index2 can also be written as a single integer.
dist     : specifies the distance function to be used:
   dist=='e': Euclidean distance
   dist=='b': City Block distance
   dist=='c': Pearson correlation
   dist=='a': absolute value of the correlation
   dist=='u': uncentered correlation
   dist=='x': absolute uncentered correlation
   dist=='s': Spearman's rank correlation
   dist=='k': Kendall's tau
method   : specifies how the distance between two clusters is defined:
   method=='a': the distance between the arithmetic means of the
        two clusters
   method=='m': the distance between the medians of the two
        clusters
   method=='s': the smallest pairwise distance between members
        of the two clusters
   method=='x': the largest pairwise distance between members of
        the two clusters
   method=='v': average of the pairwise distances between
        members of the clusters
transpose: if equal to 0: clusters of genes (rows) are considered;
   if equal to 1: clusters of microarrays (columns) are
          considered.
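
The difference between the method options can be sketched in a few lines of pure Python. This is an illustration under simplifying assumptions, not the library code: cluster_distance and euclid are hypothetical names, masks and weights are ignored, only rows are compared, and the plain Euclidean distance used here may be scaled differently from the library's dist=='e' measure.

```python
import math

def euclid(a, b):
    """Plain Euclidean distance (an assumption made for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(data, index1, index2, method="a", distance=euclid):
    """Distance between two row clusters for methods 'a', 's', 'x' and 'v'."""
    rows1 = [data[i] for i in index1]
    rows2 = [data[i] for i in index2]
    if method == "a":
        # Distance between the arithmetic means of the two clusters.
        mean1 = [sum(col) / len(rows1) for col in zip(*rows1)]
        mean2 = [sum(col) / len(rows2) for col in zip(*rows2)]
        return distance(mean1, mean2)
    # The remaining methods summarize all pairwise distances.
    pairwise = [distance(r1, r2) for r1 in rows1 for r2 in rows2]
    if method == "s":
        return min(pairwise)
    if method == "x":
        return max(pairwise)
    if method == "v":
        return sum(pairwise) / len(pairwise)
    raise ValueError("unsupported method: %r" % method)

data = [[0.0, 0.0], [0.0, 2.0], [3.0, 0.0], [3.0, 4.0]]
smallest = cluster_distance(data, [0, 1], [2, 3], method="s")
largest = cluster_distance(data, [0, 1], [2, 3], method="x")
```

The median-based method 'm' is left out to keep the sketch short.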

Definition at line 387 of file __init__.py.

00387     def clusterdistance(self, index1=[0], index2=[0], method='a', dist='e',
00388                         transpose=0):
00389         """Calculate the distance between two clusters.
00390 
00391 index1   : 1D array identifying which genes/microarrays belong to the
00392            first cluster. If the cluster contains only one gene, then
00393            index1 can also be written as a single integer.
00394 index2   : 1D array identifying which genes/microarrays belong to the
00395            second cluster. If the cluster contains only one gene, then
00396            index2 can also be written as a single integer.
00397 transpose: if equal to 0, genes (rows) are clustered;
00398            if equal to 1, microarrays (columns) are clustered.
00399 dist     : specifies the distance function to be used:
00400            dist=='e': Euclidean distance
00401            dist=='b': City Block distance
00402            dist=='c': Pearson correlation
00403            dist=='a': absolute value of the correlation
00404            dist=='u': uncentered correlation
00405            dist=='x': absolute uncentered correlation
00406            dist=='s': Spearman's rank correlation
00407            dist=='k': Kendall's tau
00408 method   : specifies how the distance between two clusters is defined:
00409            method=='a': the distance between the arithmetic means of the
00410                         two clusters
00411            method=='m': the distance between the medians of the two
00412                         clusters
00413            method=='s': the smallest pairwise distance between members
00414                         of the two clusters
00415            method=='x': the largest pairwise distance between members of
00416                         the two clusters
00417            method=='v': average of the pairwise distances between
00418                         members of the clusters
00419 transpose: if equal to 0: clusters of genes (rows) are considered;
00420            if equal to 1: clusters of microarrays (columns) are
00421                           considered.
00422 
00423 """
00424 
00425         if transpose == 0:
00426             weight = self.eweight
00427         else:
00428             weight = self.gweight
00429         return clusterdistance(self.data, self.mask, weight,
00430                                index1, index2, method, dist, transpose)

def Bio.Cluster.Record.distancematrix (   self,
  transpose = 0,
  dist = 'e' 
)
Calculate the distance matrix and return it as a list of arrays.

transpose: if equal to 0: calculate the distances between genes (rows);
   if equal to 1: calculate the distances between microarrays
          (columns).
dist     : specifies the distance function to be used:
   dist=='e': Euclidean distance
   dist=='b': City Block distance
   dist=='c': Pearson correlation
   dist=='a': absolute value of the correlation
   dist=='u': uncentered correlation
   dist=='x': absolute uncentered correlation
   dist=='s': Spearman's rank correlation
   dist=='k': Kendall's tau

Return value:
The distance matrix is returned as a list of 1D arrays holding the lower
triangle of the symmetric distance matrix. Row i contains i elements (its
distances to rows 0 through i-1), so the first row is empty. An example of
the return value is
matrix = [[],
  array([1.]),
  array([7., 3.]),
  array([4., 2., 6.])]
This corresponds to the distance matrix
 [0., 1., 7., 4.]
 [1., 0., 3., 2.]
 [7., 3., 0., 6.]
 [4., 2., 6., 0.]
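
If a full square matrix is more convenient, the ragged list can be expanded as in the following sketch. The helper to_full_matrix is hypothetical (it is not part of Bio.Cluster) and works on plain lists; rows given as NumPy arrays are iterated the same way.

```python
def to_full_matrix(ragged):
    """Expand a lower-triangular list of rows into a full symmetric matrix."""
    n = len(ragged)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(ragged):
        for j, value in enumerate(row):
            # Mirror each stored distance across the diagonal.
            full[i][j] = full[j][i] = float(value)
    return full

# The example return value from the description above.
ragged = [[], [1.0], [7.0, 3.0], [4.0, 2.0, 6.0]]
full = to_full_matrix(ragged)
```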

Definition at line 431 of file __init__.py.

00431 
00432     def distancematrix(self, transpose=0, dist='e'):
00433         """Calculate the distance matrix and return it as a list of arrays
00434 
00435 transpose: if equal to 0: calculate the distances between genes (rows);
00436            if equal to 1: calculate the distances between microarrays
00437                           (columns).
00438 dist     : specifies the distance function to be used:
00439            dist=='e': Euclidean distance
00440            dist=='b': City Block distance
00441            dist=='c': Pearson correlation
00442            dist=='a': absolute value of the correlation
00443            dist=='u': uncentered correlation
00444            dist=='x': absolute uncentered correlation
00445            dist=='s': Spearman's rank correlation
00446            dist=='k': Kendall's tau
00447 
00448 Return value:
00449 The distance matrix is returned as a list of 1D arrays containing the
00450 distance matrix between the gene expression data. The number of columns
00451 in each row is equal to the row number. Hence, the first row has zero
00452 elements. An example of the return value is
00453 matrix = [[],
00454           array([1.]),
00455           array([7., 3.]),
00456           array([4., 2., 6.])]
00457 This corresponds to the distance matrix
00458  [0., 1., 7., 4.]
00459  [1., 0., 3., 2.]
00460  [7., 3., 0., 6.]
00461  [4., 2., 6., 0.]
00462 
00463 """
00464         if transpose == 0:
00465             weight = self.eweight
00466         else:
00467             weight = self.gweight
00468         return distancematrix(self.data, self.mask, weight, transpose, dist)

def Bio.Cluster.Record.kcluster (   self,
  nclusters = 2,
  transpose = 0,
  npass = 1,
  method = 'a',
  dist = 'e',
  initialid = None 
)
Apply k-means or k-median clustering.

This method returns a tuple (clusterid, error, nfound).

nclusters: number of clusters (the 'k' in k-means)
transpose: if equal to 0, genes (rows) are clustered;
   if equal to 1, microarrays (columns) are clustered.
npass    : number of times the k-means clustering algorithm is
   performed, each time with a different (random) initial
   condition.
method   : specifies how the center of a cluster is found:
   method=='a': arithmetic mean
   method=='m': median
dist     : specifies the distance function to be used:
   dist=='e': Euclidean distance
   dist=='b': City Block distance
   dist=='c': Pearson correlation
   dist=='a': absolute value of the correlation
   dist=='u': uncentered correlation
   dist=='x': absolute uncentered correlation
   dist=='s': Spearman's rank correlation
   dist=='k': Kendall's tau
initialid: the initial clustering from which the algorithm should start.
   If initialid is None, the routine carries out npass
   repetitions of the EM algorithm, each time starting from a
   different random initial clustering. If initialid is given,
   the routine carries out the EM algorithm only once, starting
   from the given initial clustering and without randomizing the
   order in which items are assigned to clusters (i.e., using
   the same order as in the data matrix). In that case, the
   k-means algorithm is fully deterministic.

Return values:
clusterid: array containing the number of the cluster to which each
   gene/microarray was assigned in the best k-means clustering
   solution that was found in the npass runs;
error:     the within-cluster sum of distances for the returned k-means
   clustering solution;
nfound:    the number of times this solution was found.
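
The interplay of npass, initialid and nfound can be illustrated with a small pure-Python k-means. This is a sketch under stated assumptions rather than the library algorithm: kcluster_sketch, kmeans_once, canonical and sqdist are hypothetical names, the distance is a plain squared Euclidean distance, masks and weights are ignored, and the seed argument exists only to make the example repeatable.

```python
import random

def sqdist(a, b):
    """Plain squared Euclidean distance (an assumption made for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def canonical(ids):
    """Relabel clusters by first appearance so two labelings can be compared."""
    seen = {}
    return [seen.setdefault(c, len(seen)) for c in ids]

def kmeans_once(data, clusterid, nclusters, maxiter=100):
    """Alternate centroid updates and reassignment from a given start."""
    clusterid = list(clusterid)
    centroids = [None] * nclusters
    for _ in range(maxiter):
        for k in range(nclusters):
            members = [row for row, c in zip(data, clusterid) if c == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        usable = [k for k in range(nclusters) if centroids[k] is not None]
        new = [min(usable, key=lambda k: sqdist(row, centroids[k])) for row in data]
        if new == clusterid:
            break
        clusterid = new
    error = sum(sqdist(row, centroids[c]) for row, c in zip(data, clusterid))
    return clusterid, error

def kcluster_sketch(data, nclusters=2, npass=1, initialid=None, seed=0):
    """Return (clusterid, error, nfound) in the spirit of the description above."""
    if initialid is not None:
        # Deterministic: a single run from the given initial clustering.
        clusterid, error = kmeans_once(data, initialid, nclusters)
        return clusterid, error, 1
    rng = random.Random(seed)
    best, best_error, nfound = None, None, 0
    for _ in range(npass):
        # Random start that gives every cluster at least one item.
        start = [i % nclusters for i in range(len(data))]
        rng.shuffle(start)
        clusterid, error = kmeans_once(data, start, nclusters)
        if best is None or error < best_error - 1e-12:
            best, best_error, nfound = clusterid, error, 1
        elif abs(error - best_error) <= 1e-12 and canonical(clusterid) == canonical(best):
            nfound += 1
    return best, best_error, nfound

data = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
clusterid, error, nfound = kcluster_sketch(data, nclusters=2, npass=5)
```

A value of nfound close to npass suggests that the restarts keep converging to the same solution; a small nfound is a hint that more passes may be worthwhile.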

Definition at line 262 of file __init__.py.

00262     def kcluster(self, nclusters=2, transpose=0, npass=1, method='a', dist='e',
00263                  initialid=None):
00264         """Apply k-means or k-median clustering.
00265 
00266 This method returns a tuple (clusterid, error, nfound).
00267 
00268 nclusters: number of clusters (the 'k' in k-means)
00269 transpose: if equal to 0, genes (rows) are clustered;
00270            if equal to 1, microarrays (columns) are clustered.
00271 npass    : number of times the k-means clustering algorithm is
00272            performed, each time with a different (random) initial
00273            condition.
00274 method   : specifies how the center of a cluster is found:
00275            method=='a': arithmetic mean
00276            method=='m': median
00277 dist     : specifies the distance function to be used:
00278            dist=='e': Euclidean distance
00279            dist=='b': City Block distance
00280            dist=='c': Pearson correlation
00281            dist=='a': absolute value of the correlation
00282            dist=='u': uncentered correlation
00283            dist=='x': absolute uncentered correlation
00284            dist=='s': Spearman's rank correlation
00285            dist=='k': Kendall's tau
00286 initialid: the initial clustering from which the algorithm should start.
00287            If initialid is None, the routine carries out npass
00288            repetitions of the EM algorithm, each time starting from a
00289            different random initial clustering. If initialid is given,
00290            the routine carries out the EM algorithm only once, starting
00291            from the given initial clustering and without randomizing the
00292            order in which items are assigned to clusters (i.e., using
00293            the same order as in the data matrix). In that case, the
00294            k-means algorithm is fully deterministic.
00295 
00296 Return values:
00297 clusterid: array containing the number of the cluster to which each
00298            gene/microarray was assigned in the best k-means clustering
00299            solution that was found in the npass runs;
00300 error:     the within-cluster sum of distances for the returned k-means
00301            clustering solution;
00302 nfound:    the number of times this solution was found.
00303 
00304 """
00305 
00306         if transpose == 0:
00307             weight = self.eweight
00308         else:
00309             weight = self.gweight
00310         return kcluster(self.data, nclusters, self.mask, weight, transpose,
00311                         npass, method, dist, initialid)

def Bio.Cluster.Record.save (   self,
  jobname,
  geneclusters = None,
  expclusters = None 
)
Save the clustering results.

The saved files follow the convention for the Java TreeView program,
which can therefore be used to view the clustering result.

Arguments:
jobname:   The base name of the files to be saved. The filenames are
   jobname.cdt, jobname.gtr, and jobname.atr for
   hierarchical clustering, and jobname_K*.cdt,
   jobname_K*.kgg, jobname_K*.kag for k-means clustering
   results (for example jobname_K_G3.kgg for three gene
   clusters).
geneclusters=None:  For hierarchical clustering results, geneclusters
   is a Tree object as returned by the treecluster method.
   For k-means clustering results, geneclusters is a vector
   containing ngenes integers, describing to which cluster a
   given gene belongs. This vector can be calculated by
   kcluster.
expclusters=None:  For hierarchical clustering results, expclusters
   is a Tree object as returned by the treecluster method.
   For k-means clustering results, expclusters is a vector
   containing nexps integers, describing to which cluster a
   given experimental condition belongs. This vector can be
   calculated by kcluster.
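
The naming of the k-means output files can be read off the source listing of save on this page. The sketch below reproduces only that string logic; kmeans_output_names is a hypothetical helper, and gene_k and exp_k stand for the number of clusters, which the listing derives as the largest cluster number plus one.

```python
def kmeans_output_names(jobname, gene_k=None, exp_k=None):
    """Return the file names for k-means results, following the listing of save.

    gene_k / exp_k are the numbers of gene and experiment clusters, or None
    when that dimension was not clustered. Hypothetical helper.
    """
    names = {}
    filename = jobname
    postfix = ""
    if gene_k is not None:
        filename = jobname + "_K"
        names["kgg"] = "%s_K_G%d.kgg" % (jobname, gene_k)
        postfix = "_G%d" % gene_k
    if exp_k is not None:
        filename = jobname + "_K"
        names["kag"] = "%s_K_A%d.kag" % (jobname, exp_k)
        postfix += "_A%d" % exp_k
    # The data file carries both postfixes when both dimensions were clustered.
    names["cdt"] = filename + postfix + ".cdt"
    return names

names = kmeans_output_names("job", gene_k=3, exp_k=2)
```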

Definition at line 469 of file __init__.py.

00469 
00470     def save(self, jobname, geneclusters=None, expclusters=None):
00471         """Save the clustering results.
00472 
00473 The saved files follow the convention for the Java TreeView program,
00474 which can therefore be used to view the clustering result.
00475 
00476 Arguments:
00477 jobname:   The base name of the files to be saved. The filenames are
00478            jobname.cdt, jobname.gtr, and jobname.atr for
00479            hierarchical clustering, and jobname-K*.cdt,
00480            jobname-K*.kgg, jobname-K*.kag for k-means clustering
00481            results.
00482 geneclusters=None:  For hierarchical clustering results, geneclusters
00483            is a Tree object as returned by the treecluster method.
00484            For k-means clustering results, geneclusters is a vector
00485            containing ngenes integers, describing to which cluster a
00486            given gene belongs. This vector can be calculated by
00487            kcluster.
00488 expclusters=None:  For hierarchical clustering results, expclusters
00489            is a Tree object as returned by the treecluster method.
00490            For k-means clustering results, expclusters is a vector
00491            containing nexps integers, describing to which cluster a
00492            given experimental condition belongs. This vector can be
00493            calculated by kcluster.
00494 
00495 """
00496         (ngenes,nexps) = numpy.shape(self.data)
00497         if self.gorder == None:
00498             gorder = numpy.arange(ngenes)
00499         else:
00500             gorder = self.gorder
00501         if self.eorder == None:
00502             eorder = numpy.arange(nexps)
00503         else:
00504             eorder = self.eorder
00505         if geneclusters!=None and expclusters!=None and \
00506            type(geneclusters) != type(expclusters):
00507             raise ValueError("found one k-means and one hierarchical "
00508                            + "clustering solution in geneclusters and "
00509                            + "expclusters")
00510         gid = 0
00511         aid = 0
00512         filename = jobname
00513         postfix = ""
00514         if type(geneclusters) == Tree:
00515             # This is a hierarchical clustering result.
00516             geneindex = _savetree(jobname, geneclusters, gorder, 0)
00517             gid = 1
00518         elif geneclusters!=None:
00519             # This is a k-means clustering result.
00520             filename = jobname + "_K"
00521             k = max(geneclusters+1)
00522             kggfilename = "%s_K_G%d.kgg" % (jobname, k)
00523             geneindex = self._savekmeans(kggfilename, geneclusters, gorder, 0)
00524             postfix = "_G%d" % k
00525         else:
00526             geneindex = numpy.argsort(gorder)
00527         if type(expclusters) == Tree:
00528             # This is a hierarchical clustering result.
00529             expindex = _savetree(jobname, expclusters, eorder, 1)
00530             aid = 1
00531         elif expclusters!=None:
00532             # This is a k-means clustering result.
00533             filename = jobname + "_K"
00534             k = max(expclusters+1)
00535             kagfilename = "%s_K_A%d.kag" % (jobname, k)
00536             expindex = self._savekmeans(kagfilename, expclusters, eorder, 1)
00537             postfix += "_A%d" % k
00538         else:
00539             expindex = numpy.argsort(eorder)
00540         filename = filename + postfix
00541         self._savedata(filename,gid,aid,geneindex,expindex)

def Bio.Cluster.Record.somcluster (   self,
  transpose = 0,
  nxgrid = 2,
  nygrid = 1,
  inittau = 0.02,
  niter = 1,
  dist = 'e' 
)
Calculate a self-organizing map on a rectangular grid.

The somcluster method returns a tuple (clusterid, celldata).

transpose: if equal to 0, genes (rows) are clustered;
   if equal to 1, microarrays (columns) are clustered.
nxgrid   : the horizontal dimension of the rectangular SOM map
nygrid   : the vertical dimension of the rectangular SOM map
inittau  : the initial value of tau (the neighborhood function)
niter    : the number of iterations
dist     : specifies the distance function to be used:
   dist=='e': Euclidean distance
   dist=='b': City Block distance
   dist=='c': Pearson correlation
   dist=='a': absolute value of the correlation
   dist=='u': uncentered correlation
   dist=='x': absolute uncentered correlation
   dist=='s': Spearman's rank correlation
   dist=='k': Kendall's tau

Return values:
clusterid: array with two columns, while the number of rows is equal to
   the number of genes or the number of microarrays depending on
   whether genes or microarrays are being clustered. Each row in
   the array contains the x and y coordinates of the cell in the
   rectangular SOM grid to which the gene or microarray was
   assigned.
celldata:  an array with dimensions (nxgrid, nygrid, number of
   microarrays) if genes are being clustered, or (nxgrid,
   nygrid, number of genes) if microarrays are being clustered.
   Each element [ix][iy] of this array is a 1D vector containing
   the gene expression data for the centroid of the cluster in
   the SOM grid cell with coordinates (ix, iy).
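
Because clusterid holds one (x, y) grid coordinate per item, a common follow-up step is to group the items by grid cell. The sketch below shows one way to do that; items_by_cell is a hypothetical helper and the sample coordinates are made up.

```python
from collections import defaultdict

def items_by_cell(clusterid):
    """Map each SOM grid cell (ix, iy) to the indices of the items assigned to it."""
    cells = defaultdict(list)
    for item, (ix, iy) in enumerate(clusterid):
        cells[(int(ix), int(iy))].append(item)
    return dict(cells)

# Made-up assignment of five items on a 2 x 1 grid.
clusterid = [[0, 0], [1, 0], [0, 0], [1, 0], [1, 0]]
cells = items_by_cell(clusterid)
```

With celldata from the same call, celldata[ix][iy] would then be the centroid that belongs to the items listed under the key (ix, iy).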

Definition at line 313 of file __init__.py.

00313     def somcluster(self, transpose=0, nxgrid=2, nygrid=1, inittau=0.02,
00314                    niter=1, dist='e'):
00315         """Calculate a self-organizing map on a rectangular grid.
00316 
00317 The somcluster method returns a tuple (clusterid, celldata).
00318 
00319 transpose: if equal to 0, genes (rows) are clustered;
00320            if equal to 1, microarrays (columns) are clustered.
00321 nxgrid   : the horizontal dimension of the rectangular SOM map
00322 nygrid   : the vertical dimension of the rectangular SOM map
00323 inittau  : the initial value of tau (the neighborhood function)
00324 niter    : the number of iterations
00325 dist     : specifies the distance function to be used:
00326            dist=='e': Euclidean distance
00327            dist=='b': City Block distance
00328            dist=='c': Pearson correlation
00329            dist=='a': absolute value of the correlation
00330            dist=='u': uncentered correlation
00331            dist=='x': absolute uncentered correlation
00332            dist=='s': Spearman's rank correlation
00333            dist=='k': Kendall's tau
00334 
00335 Return values:
00336 clusterid: array with two columns, while the number of rows is equal to
00337            the number of genes or the number of microarrays depending on
00338            whether genes or microarrays are being clustered. Each row in
00339            the array contains the x and y coordinates of the cell in the
00340            rectangular SOM grid to which the gene or microarray was
00341            assigned.
00342 celldata:  an array with dimensions (nxgrid, nygrid, number of
00343            microarrays) if genes are being clustered, or (nxgrid,
00344            nygrid, number of genes) if microarrays are being clustered.
00345            Each element [ix][iy] of this array is a 1D vector containing
00346            the gene expression data for the centroid of the cluster in
00347            the SOM grid cell with coordinates (ix, iy).
00348 
00349 """
00350 
00351         if transpose == 0:
00352             weight = self.eweight
00353         else:
00354             weight = self.gweight
00355         return somcluster(self.data, self.mask, weight, transpose,
00356                           nxgrid, nygrid, inittau, niter, dist)

def Bio.Cluster.Record.treecluster (   self,
  transpose = 0,
  method = 'm',
  dist = 'e' 
)
Apply hierarchical clustering and return a Tree object.

The pairwise single, complete, centroid, and average linkage hierarchical
clustering methods are available.

transpose: if equal to 0, genes (rows) are clustered;
   if equal to 1, microarrays (columns) are clustered.
dist     : specifies the distance function to be used:
   dist=='e': Euclidean distance
   dist=='b': City Block distance
   dist=='c': Pearson correlation
   dist=='a': absolute value of the correlation
   dist=='u': uncentered correlation
   dist=='x': absolute uncentered correlation
   dist=='s': Spearman's rank correlation
   dist=='k': Kendall's tau
method   : specifies which linkage method is used:
   method=='s': Single pairwise linkage
   method=='m': Complete (maximum) pairwise linkage (default)
   method=='c': Centroid linkage
   method=='a': Average pairwise linkage

See the description of the Tree class for more information about the Tree
object returned by this method.
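
How the pairwise linkage methods differ can be sketched with a naive agglomerative loop. This is an illustration, not the library algorithm: agglomerate and euclid are hypothetical names, the plain Euclidean distance is an assumption, masks and weights are ignored, centroid linkage is omitted, and the labeling of merged clusters with negative numbers is this sketch's own convention.

```python
import math

def euclid(a, b):
    """Plain Euclidean distance (an assumption made for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(data, method="m"):
    """Pairwise-linkage hierarchical clustering of rows.

    Returns a list of (left, right, distance) merges. Items are numbered
    0..n-1 and the cluster created by merge number i is labeled -(i + 1).
    """
    linkage = {"s": min, "m": max, "a": lambda d: sum(d) / len(d)}[method]
    clusters = {i: [i] for i in range(len(data))}
    merges = []
    while len(clusters) > 1:
        best = None
        labels = sorted(clusters)
        for pos, a in enumerate(labels):
            for b in labels[pos + 1:]:
                dists = [euclid(data[i], data[j])
                         for i in clusters[a] for j in clusters[b]]
                d = linkage(dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Join the two closest clusters under a new negative label.
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[-(len(merges) + 1)] = merged
        merges.append((a, b, d))
    return merges

data = [[0.0], [1.0], [5.0], [7.0]]
merges = agglomerate(data, method="m")
```

Switching method from 'm' to 's' or 'a' changes only the distance recorded for the last merge in this small data set, which shows how the linkage rule enters the result.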

Definition at line 227 of file __init__.py.

00227 
00228     def treecluster(self, transpose=0, method='m', dist='e'):
00229         """Apply hierarchical clustering and return a Tree object.
00230 
00231 The pairwise single, complete, centroid, and average linkage hierarchical
00232 clustering methods are available.
00233 
00234 transpose: if equal to 0, genes (rows) are clustered;
00235            if equal to 1, microarrays (columns) are clustered.
00236 dist     : specifies the distance function to be used:
00237            dist=='e': Euclidean distance
00238            dist=='b': City Block distance
00239            dist=='c': Pearson correlation
00240            dist=='a': absolute value of the correlation
00241            dist=='u': uncentered correlation
00242            dist=='x': absolute uncentered correlation
00243            dist=='s': Spearman's rank correlation
00244            dist=='k': Kendall's tau
00245 method   : specifies which linkage method is used:
00246            method=='s': Single pairwise linkage
00247            method=='m': Complete (maximum) pairwise linkage (default)
00248            method=='c': Centroid linkage
00249            method=='a': Average pairwise linkage
00250 
00251 See the description of the Tree class for more information about the Tree
00252 object returned by this method.
00253 
00254 """
00255         if transpose == 0:
00256             weight = self.eweight
00257         else:
00258             weight = self.gweight
00259         return treecluster(self.data, self.mask, weight, transpose, method,
00260                            dist)


Member Data Documentation

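Bio.Cluster.Record.data

Definition at line 147 of file __init__.py.

Bio.Cluster.Record.eorder

Definition at line 155 of file __init__.py.

Bio.Cluster.Record.eweight

Definition at line 154 of file __init__.py.

Bio.Cluster.Record.expid

Definition at line 153 of file __init__.py.

Bio.Cluster.Record.geneid

Definition at line 149 of file __init__.py.

Bio.Cluster.Record.genename

Definition at line 150 of file __init__.py.

Bio.Cluster.Record.gorder

Definition at line 152 of file __init__.py.

Bio.Cluster.Record.gweight

Definition at line 151 of file __init__.py.

Bio.Cluster.Record.mask

Definition at line 148 of file __init__.py.

Bio.Cluster.Record.uniqid

Definition at line 156 of file __init__.py.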


The documentation for this class was generated from the following file: