moin  1.9.0~rc2
MoinMoin.support.xappy.searchconnection.SearchResults Class Reference

Public Member Functions

def __init__
def __repr__
def get_hit
def __iter__
def __len__
def get_top_tags
def get_suggested_facets

Properties

 more_matches
 startrank
 endrank
 matches_lower_bound
 matches_upper_bound
 matches_human_readable_estimate
 matches_estimated
 estimate_is_exact

Private Member Functions

def _cluster
def _reorder_by_clusters
def _make_expand_decider
def _reorder_by_similarity
def _get_more_matches
def _get_startrank
def _get_endrank
def _get_lower_bound
def _get_upper_bound
def _get_human_readable_estimate
def _get_estimated
def _estimate_is_exact

Private Attributes

 _conn
 _enq
 _query
 _mset
 _mset_order
 _fieldmappings
 _tagspy
 _tagfields
 _facetspy
 _facetfields
 _facethierarchy
 _facetassocs
 _numeric_ranges_built

Static Private Attributes

 __getitem__ = get_hit

Detailed Description

A set of results of a search.

Definition at line 233 of file searchconnection.py.


Constructor & Destructor Documentation

def MoinMoin.support.xappy.searchconnection.SearchResults.__init__ (   self,
  conn,
  enq,
  query,
  mset,
  fieldmappings,
  tagspy,
  tagfields,
  facetspy,
  facetfields,
  facethierarchy,
  facetassocs 
)

Definition at line 239 of file searchconnection.py.

    def __init__(self, conn, enq, query, mset, fieldmappings, tagspy,
                 tagfields, facetspy, facetfields, facethierarchy,
                 facetassocs):
        self._conn = conn
        self._enq = enq
        self._query = query
        self._mset = mset
        self._mset_order = None
        self._fieldmappings = fieldmappings
        self._tagspy = tagspy
        if tagfields is None:
            self._tagfields = None
        else:
            self._tagfields = set(tagfields)
        self._facetspy = facetspy
        self._facetfields = facetfields
        self._facethierarchy = facethierarchy
        self._facetassocs = facetassocs
        self._numeric_ranges_built = {}


Member Function Documentation

def MoinMoin.support.xappy.searchconnection.SearchResults.__iter__ (   self )
Get an iterator over the hits in the search result.

The iterator returns the results in increasing order of rank.

Definition at line 525 of file searchconnection.py.

    def __iter__(self):
        """Get an iterator over the hits in the search result.

        The iterator returns the results in increasing order of rank.

        """
        return SearchResultIter(self, self._mset_order)

def MoinMoin.support.xappy.searchconnection.SearchResults.__len__ (   self )
Get the number of hits in the search result.

Note that this is not (usually) the number of matching documents for
the search.  If startrank is non-zero, it's not even the rank of the
last document in the search result.  It's simply the number of hits
stored in the search result.

It is, however, the number of items returned by the iterator produced
by calling iter() on this SearchResults object.

Definition at line 533 of file searchconnection.py.

    def __len__(self):
        """Get the number of hits in the search result.

        Note that this is not (usually) the number of matching documents for
        the search.  If startrank is non-zero, it's not even the rank of the
        last document in the search result.  It's simply the number of hits
        stored in the search result.

        It is, however, the number of items returned by the iterator produced
        by calling iter() on this SearchResults object.

        """
        return len(self._mset)
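
The difference between len(), the rank properties, and the match-count bounds can be shown with a small stand-in. The sketch below is not xappy code: FakeResults and its constructor arguments are hypothetical, and it only mirrors the arithmetic visible in the listings on this page (endrank is the first rank plus the number of stored hits).

```python
class FakeResults:
    """Hypothetical stand-in for SearchResults; mirrors only the rank arithmetic."""

    def __init__(self, hits, first_rank, lower_bound):
        self._hits = list(hits)          # hits actually stored in this page
        self._first_rank = first_rank    # rank of the first stored hit
        self._lower_bound = lower_bound  # lower bound on all matching documents

    def __len__(self):
        return len(self._hits)

    def __iter__(self):
        return iter(self._hits)

    @property
    def startrank(self):
        return self._first_rank

    @property
    def endrank(self):
        return self._first_rank + len(self._hits)

    @property
    def matches_lower_bound(self):
        return self._lower_bound


# Second page of ten hits out of (at least) 57 matching documents.
page = FakeResults(hits=["doc%d" % i for i in range(10, 20)],
                   first_rank=10, lower_bound=57)
```

Here len(page) is 10, which is neither the last rank on the page nor the number of matching documents; it only agrees with the number of items the iterator yields.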

def MoinMoin.support.xappy.searchconnection.SearchResults.__repr__ (   self )

Definition at line 416 of file searchconnection.py.

    def __repr__(self):
        return ("<SearchResults(startrank=%d, "
                "endrank=%d, "
                "more_matches=%s, "
                "matches_lower_bound=%d, "
                "matches_upper_bound=%d, "
                "matches_estimated=%d, "
                "estimate_is_exact=%s)>" %
                (
                 self.startrank,
                 self.endrank,
                 self.more_matches,
                 self.matches_lower_bound,
                 self.matches_upper_bound,
                 self.matches_estimated,
                 self.estimate_is_exact,
                ))

def MoinMoin.support.xappy.searchconnection.SearchResults._cluster (   self,
  num_clusters,
  maxdocs,
  fields = None 
) [private]
Cluster results based on similarity.

Note: this method is experimental, and will probably disappear or
change in the future.

The number of clusters is specified by num_clusters: unless there are
too few results, there will be exactly this number of clusters in the
result.

Definition at line 257 of file searchconnection.py.

    def _cluster(self, num_clusters, maxdocs, fields=None):
        """Cluster results based on similarity.

        Note: this method is experimental, and will probably disappear or
        change in the future.

        The number of clusters is specified by num_clusters: unless there are
        too few results, there will be exactly this number of clusters in the
        result.

        """
        clusterer = _xapian.ClusterSingleLink()
        xapclusters = _xapian.ClusterAssignments()
        docsim = _xapian.DocSimCosine()
        source = _xapian.MSetDocumentSource(self._mset, maxdocs)

        if fields is None:
            clusterer.cluster(self._conn._index, xapclusters, docsim, source, num_clusters)
        else:
            decider = self._make_expand_decider(fields)
            clusterer.cluster(self._conn._index, xapclusters, docsim, source, decider, num_clusters)

        newid = 0
        idmap = {}
        clusters = {}
        for item in self._mset:
            docid = item.docid
            clusterid = xapclusters.cluster(docid)
            if clusterid not in idmap:
                idmap[clusterid] = newid
                newid += 1
            clusterid = idmap[clusterid]
            if clusterid not in clusters:
                clusters[clusterid] = []
            clusters[clusterid].append(item.rank)
        return clusters
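
The second half of _cluster renumbers the cluster ids in order of first appearance and groups ranks under the new ids. A minimal standalone sketch of just that step follows. The assignments mapping stands in for xapclusters.cluster(docid) and the (docid, rank) pairs stand in for mset items; both are assumptions made so the example runs without Xapian.

```python
def group_ranks_by_cluster(items, assignments):
    """Group ranks by cluster, renumbering clusters by first appearance.

    items: iterable of (docid, rank) pairs in rank order (stand-in for the mset).
    assignments: dict docid -> raw cluster id (stand-in for ClusterAssignments).
    """
    idmap = {}      # raw cluster id -> dense id, in the order first seen
    clusters = {}   # dense id -> list of ranks belonging to that cluster
    for docid, rank in items:
        raw = assignments[docid]
        dense = idmap.setdefault(raw, len(idmap))
        clusters.setdefault(dense, []).append(rank)
    return clusters


items = [(101, 0), (205, 1), (317, 2), (422, 3)]
assignments = {101: 7, 205: 3, 317: 7, 422: 9}
clusters = group_ranks_by_cluster(items, assignments)
```

Because items are visited in rank order, the first entry of each list is the best-ranked member of its cluster.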

def MoinMoin.support.xappy.searchconnection.SearchResults._estimate_is_exact (   self ) [private]

Definition at line 499 of file searchconnection.py.

    def _estimate_is_exact(self):
        return self._mset.get_matches_lower_bound() == \
               self._mset.get_matches_upper_bound()

def MoinMoin.support.xappy.searchconnection.SearchResults._get_endrank (   self ) [private]

Definition at line 452 of file searchconnection.py.

    def _get_endrank(self):
        return self._mset.get_firstitem() + len(self._mset)

def MoinMoin.support.xappy.searchconnection.SearchResults._get_estimated (   self ) [private]

Definition at line 492 of file searchconnection.py.

    def _get_estimated(self):
        return self._mset.get_matches_estimated()

def MoinMoin.support.xappy.searchconnection.SearchResults._get_human_readable_estimate (   self ) [private]

Definition at line 476 of file searchconnection.py.

    def _get_human_readable_estimate(self):
        lower = self._mset.get_matches_lower_bound()
        upper = self._mset.get_matches_upper_bound()
        est = self._mset.get_matches_estimated()
        return _get_significant_digits(est, lower, upper)
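
The helper _get_significant_digits is not shown on this page, so its exact behaviour is not documented here. Purely as an illustration of the idea of rounding an estimate only as far as the bounds allow, the sketch below uses a hypothetical function, round_estimate, with a rounding rule chosen for this example; it should not be read as the xappy implementation.

```python
def round_estimate(est, lower, upper):
    """Hypothetical rounding rule (not the xappy helper).

    Drops trailing digits from est for as long as the rounded value stays
    within the inclusive bounds [lower, upper].
    """
    best = est
    step = 10
    while step <= est:
        # Round half up to the nearest multiple of step, using integers only.
        rounded = ((est + step // 2) // step) * step
        if not (lower <= rounded <= upper):
            break
        best = rounded
        step *= 10
    return best


loose = round_estimate(1234, 1200, 1300)   # bounds are loose: digits can be dropped
exact = round_estimate(1234, 1234, 1234)   # bounds are tight: nothing is rounded
```

With loose bounds the estimate is shortened to a rounder figure; when the bounds coincide, the estimate is left unchanged.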

def MoinMoin.support.xappy.searchconnection.SearchResults._get_lower_bound (   self ) [private]

Definition at line 462 of file searchconnection.py.

    def _get_lower_bound(self):
        return self._mset.get_matches_lower_bound()

def MoinMoin.support.xappy.searchconnection.SearchResults._get_more_matches (   self ) [private]

Definition at line 434 of file searchconnection.py.

    def _get_more_matches(self):
        # This check relies on the search having asked for at least one more
        # result to be checked than was retrieved.
        return (self.matches_lower_bound > self.endrank)
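
The comment in the listing hints at why the comparison works: if the matcher is made to check at least one document beyond the requested page, its lower bound can exceed endrank only when a further match exists. The sketch below models this with a plain list. The function fake_search and its checkatleast behaviour are simplifying assumptions for illustration, not the Xapian matcher.

```python
def fake_search(matching_docs, startrank, endrank, checkatleast):
    """Hypothetical matcher that only vouches for documents it examined."""
    # Documents the matcher actually looks at.
    examined = matching_docs[:max(endrank, checkatleast)]
    hits = examined[startrank:endrank]
    return {
        "startrank": startrank,
        "endrank": startrank + len(hits),
        # The lower bound counts only documents that were examined.
        "matches_lower_bound": len(examined),
    }


def more_matches(result):
    return result["matches_lower_bound"] > result["endrank"]


docs = ["doc%d" % i for i in range(25)]

# Checking one document past the page end reveals that rank 20 exists.
with_extra = fake_search(docs, 10, 20, checkatleast=21)
# Without the extra check, the lower bound stops at endrank.
without_extra = fake_search(docs, 10, 20, checkatleast=20)
# Last page: there is nothing beyond rank 25.
last_page = fake_search(docs, 20, 30, checkatleast=31)
```

In this model, only with_extra reports further matches; without_extra cannot tell, and last_page correctly reports none.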

def MoinMoin.support.xappy.searchconnection.SearchResults._get_startrank (   self ) [private]

Definition at line 443 of file searchconnection.py.

    def _get_startrank(self):
        return self._mset.get_firstitem()

def MoinMoin.support.xappy.searchconnection.SearchResults._get_upper_bound (   self ) [private]

Definition at line 469 of file searchconnection.py.

    def _get_upper_bound(self):
        return self._mset.get_matches_upper_bound()

def MoinMoin.support.xappy.searchconnection.SearchResults._make_expand_decider (   self,
  fields 
) [private]
Make an expand decider which accepts only terms in the specified
field.

Definition at line 313 of file searchconnection.py.

    def _make_expand_decider(self, fields):
        """Make an expand decider which accepts only terms in the specified
        field.

        """
        prefixes = {}
        if isinstance(fields, basestring):
            fields = [fields]
        for field in fields:
            try:
                actions = self._conn._field_actions[field]._actions
            except KeyError:
                continue
            for action, kwargslist in actions.iteritems():
                if action == FieldActions.INDEX_FREETEXT:
                    prefix = self._conn._field_mappings.get_prefix(field)
                    prefixes[prefix] = None
                    prefixes['Z' + prefix] = None
                if action in (FieldActions.INDEX_EXACT,
                              FieldActions.TAG,
                              FieldActions.FACET,):
                    prefix = self._conn._field_mappings.get_prefix(field)
                    prefixes[prefix] = None
        prefix_re = _re.compile('|'.join([_re.escape(x) + '[^A-Z]' for x in prefixes.keys()]))
        class decider(_xapian.ExpandDecider):
            def __call__(self, term):
                return prefix_re.match(term) is not None
        return decider()
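
The decider's behaviour comes down to the regular expression built from the collected prefixes. The sketch below isolates that pattern in modern Python, without Xapian. The function make_prefix_filter and the example prefixes 'XA' and 'ZXA' are made up for illustration; real prefixes come from the connection's field mappings.

```python
import re


def make_prefix_filter(prefixes):
    """Return a predicate accepting terms that start with one of the prefixes.

    The character after the prefix must not be an uppercase ASCII letter, so
    a prefix such as 'XA' does not also accept terms of a longer prefix 'XAB'.
    """
    pattern = re.compile('|'.join(re.escape(p) + '[^A-Z]' for p in prefixes))
    return lambda term: pattern.match(term) is not None


# 'XA' plays the role of a field prefix, 'ZXA' the 'Z'-prefixed variant that
# the listing registers for free-text fields.
accepts = make_prefix_filter(['XA', 'ZXA'])
```

One property this sketch shares with the listing: if no prefixes are collected, the joined pattern is empty and matches every term, so the filter accepts everything.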

def MoinMoin.support.xappy.searchconnection.SearchResults._reorder_by_clusters (   self,
  clusters 
) [private]
Reorder the mset based on some clusters.

Definition at line 294 of file searchconnection.py.

    def _reorder_by_clusters(self, clusters):
        """Reorder the mset based on some clusters.

        """
        if self.startrank != 0:
            raise _errors.SearchError("startrank must be zero to reorder by clusters")
        reordered = False
        tophits = []
        nottophits = []

        clusterstarts = dict(((c[0], None) for c in clusters.itervalues()))
        for i in xrange(self.endrank):
            if i in clusterstarts:
                tophits.append(i)
            else:
                nottophits.append(i)
        self._mset_order = tophits
        self._mset_order.extend(nottophits)
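
The reordering promotes the first (best-ranked) member of every cluster and leaves all other hits in their original relative order. A standalone sketch of that rule in modern Python is given below; order_by_cluster_starts is a hypothetical name, and the clusters argument is assumed to have the shape returned by _cluster (cluster id mapped to a list of ranks).

```python
def order_by_cluster_starts(clusters, endrank):
    """Put the first rank of every cluster ahead of all other ranks.

    clusters: dict cluster id -> list of ranks (first entry is the best rank).
    endrank: number of ranks to order (startrank is assumed to be 0).
    """
    starts = {ranks[0] for ranks in clusters.values()}
    tophits = [i for i in range(endrank) if i in starts]
    nottophits = [i for i in range(endrank) if i not in starts]
    return tophits + nottophits


order = order_by_cluster_starts({0: [0, 2], 1: [1], 2: [3, 4]}, endrank=5)
```

The result is a permutation of the ranks, which is the form stored in _mset_order.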

def MoinMoin.support.xappy.searchconnection.SearchResults._reorder_by_similarity (   self,
  count,
  maxcount,
  max_similarity,
  fields = None 
) [private]
Reorder results based on similarity.

The top `count` documents will be chosen such that they are relatively
dissimilar.  `maxcount` documents will be considered for moving around,
and `max_similarity` is a value between 0 and 1 indicating the maximum
similarity to the previous document before a document is moved down the
result set.

Note: this method is experimental, and will probably disappear or
change in the future.

Definition at line 343 of file searchconnection.py.

    def _reorder_by_similarity(self, count, maxcount, max_similarity,
                               fields=None):
        """Reorder results based on similarity.

        The top `count` documents will be chosen such that they are relatively
        dissimilar.  `maxcount` documents will be considered for moving around,
        and `max_similarity` is a value between 0 and 1 indicating the maximum
        similarity to the previous document before a document is moved down the
        result set.

        Note: this method is experimental, and will probably disappear or
        change in the future.

        """
        if self.startrank != 0:
            raise _errors.SearchError("startrank must be zero to reorder by similarity")
        ds = _xapian.DocSimCosine()
        ds.set_termfreqsource(_xapian.DatabaseTermFreqSource(self._conn._index))

        if fields is not None:
            ds.set_expand_decider(self._make_expand_decider(fields))

        tophits = []
        nottophits = []
        full = False
        reordered = False

        sim_count = 0
        new_order = []
        end = min(self.endrank, maxcount)
        for i in xrange(end):
            if full:
                new_order.append(i)
                continue
            hit = self._mset.get_hit(i)
            if len(tophits) == 0:
                tophits.append(hit)
                continue

            # Compare each incoming hit to tophits
            maxsim = 0.0
            for tophit in tophits[-1:]:
                sim_count += 1
                sim = ds.similarity(hit.document, tophit.document)
                if sim > maxsim:
                    maxsim = sim

            # If it's not similar to an existing hit, add to tophits.
            if maxsim < max_similarity:
                tophits.append(hit)
            else:
                nottophits.append(hit)
                reordered = True

            # If we're full of hits, append to the end.
            if len(tophits) >= count:
                for hit in tophits:
                    new_order.append(hit.rank)
                for hit in nottophits:
                    new_order.append(hit.rank)
                full = True
        if not full:
            for hit in tophits:
                new_order.append(hit.rank)
            for hit in nottophits:
                new_order.append(hit.rank)
        if end != self.endrank:
            new_order.extend(range(end, self.endrank))
        assert len(new_order) == self.endrank
        if reordered:
            self._mset_order = new_order
        else:
            assert new_order == range(self.endrank)
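
The control flow of the reordering can be followed more easily without the Xapian objects. The sketch below reproduces the same greedy pass over plain strings. The names diversify and jaccard are hypothetical, and the word-overlap similarity is a stand-in chosen for the example; the listing uses a cosine document similarity instead.

```python
def jaccard(a, b):
    """Toy similarity: shared words divided by all words (assumed helper)."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)


def diversify(docs, count, maxcount, max_similarity, similarity):
    """Return a rank order in which the top `count` hits are dissimilar.

    docs: list of documents in rank order (index == rank).
    similarity: callable(doc_a, doc_b) -> float between 0 and 1.
    """
    end = min(len(docs), maxcount)
    tophits, demoted, order = [], [], []
    full = False
    for rank in range(end):
        if full:
            order.append(rank)  # top slots are filled; keep original order
            continue
        if not tophits:
            tophits.append(rank)
            continue
        # Compare only with the most recently accepted hit, as the listing does.
        if similarity(docs[rank], docs[tophits[-1]]) < max_similarity:
            tophits.append(rank)
        else:
            demoted.append(rank)
        if len(tophits) >= count:
            order = tophits + demoted
            full = True
    if not full:
        order = tophits + demoted
    order.extend(range(end, len(docs)))  # ranks beyond maxcount are untouched
    return order


docs = ["red apple pie", "red apple tart", "green pear",
        "blue plum jam", "red apple pie recipe"]
order = diversify(docs, count=3, maxcount=4, max_similarity=0.5,
                  similarity=jaccard)
```

In this run the second document is too close to the first and is moved below the three dissimilar hits, while the fifth document lies outside maxcount and keeps its place.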

def MoinMoin.support.xappy.searchconnection.SearchResults.get_hit (   self,
  index 
)
Get the hit with a given index.

Definition at line 514 of file searchconnection.py.

    def get_hit(self, index):
        """Get the hit with a given index.

        """
        if self._mset_order is None:
            msetitem = self._mset.get_hit(index)
        else:
            msetitem = self._mset.get_hit(self._mset_order[index])
        return SearchResult(msetitem, self)

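
The lookup goes through an optional order list, and the class also exposes it as __getitem__ (see the static attribute listed near the top of this page). A reduced sketch of that pattern follows; OrderedHits is a hypothetical class that holds plain values instead of mset items.

```python
class OrderedHits:
    """Hypothetical container showing the optional reordering indirection."""

    def __init__(self, hits, order=None):
        self._hits = list(hits)
        self._order = order  # None means "use the original rank order"

    def get_hit(self, index):
        if self._order is None:
            return self._hits[index]
        return self._hits[self._order[index]]

    # Alias so that results[index] behaves like results.get_hit(index).
    __getitem__ = get_hit

    def __len__(self):
        return len(self._hits)


plain = OrderedHits(["a", "b", "c"])
reordered = OrderedHits(["a", "b", "c"], order=[0, 2, 1])
```

Indexing reordered at position 1 returns the item that originally had rank 2, while plain returns items in their original order.
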
def MoinMoin.support.xappy.searchconnection.SearchResults.get_suggested_facets (   self,
  maxfacets = 5,
  desired_num_of_categories = 7,
  required_facets = None 
)
Get a suggested set of facets, to present to the user.

This returns a list, in descending order of the usefulness of the
facet, in which each item is a tuple holding:

 - fieldname of facet.
 - sequence of 2-tuples holding the suggested values or ranges for that
   field:

   For facets of type 'string', the first item in the 2-tuple will
   simply be the string supplied when the facet value was added to its
   document.  For facets of type 'float', it will be a 2-tuple, holding
   floats giving the start and end of the suggested value range.

   The second item in the 2-tuple will be the frequency of the facet
   value or range in the result set.

If required_facets is not None, it must be a field name, or a sequence
of field names.  Any field names mentioned in required_facets will be
returned if there are any facet values at all in the search results for
that field.  The facet will only be omitted if there are no facet
values at all for the field.

The value of maxfacets will be respected as far as possible; the
exception is that if there are too many fields listed in
required_facets with at least one value in the search results, extra
facets will be returned (ie, obeying the required_facets parameter is
considered more important than the maxfacets parameter).

If facet_hierarchy was indicated when search() was called, and the
query included facets, then only subfacets of those query facets and
top-level facets will be included in the returned list. Furthermore
top-level facets will only be returned if there are remaining places
in the list after it has been filled with subfacets. Note that
required_facets is still respected regardless of the facet hierarchy.

If a query type was specified when search() was called, and the query
included facets, then facets with an association of Never to the
query type are never returned, even if mentioned in required_facets.
Facets with an association of Preferred are listed before others in
the returned list.

Definition at line 567 of file searchconnection.py.

    def get_suggested_facets(self, maxfacets=5, desired_num_of_categories=7,
                             required_facets=None):
        """Get a suggested set of facets, to present to the user.

        This returns a list, in descending order of the usefulness of the
        facet, in which each item is a tuple holding:

         - fieldname of facet.
         - sequence of 2-tuples holding the suggested values or ranges for that
           field:

           For facets of type 'string', the first item in the 2-tuple will
           simply be the string supplied when the facet value was added to its
           document.  For facets of type 'float', it will be a 2-tuple, holding
           floats giving the start and end of the suggested value range.

           The second item in the 2-tuple will be the frequency of the facet
           value or range in the result set.

        If required_facets is not None, it must be a field name, or a sequence
        of field names.  Any field names mentioned in required_facets will be
        returned if there are any facet values at all in the search results for
        that field.  The facet will only be omitted if there are no facet
        values at all for the field.

        The value of maxfacets will be respected as far as possible; the
        exception is that if there are too many fields listed in
        required_facets with at least one value in the search results, extra
        facets will be returned (ie, obeying the required_facets parameter is
        considered more important than the maxfacets parameter).

        If facet_hierarchy was indicated when search() was called, and the
        query included facets, then only subfacets of those query facets and
        top-level facets will be included in the returned list. Furthermore
        top-level facets will only be returned if there are remaining places
        in the list after it has been filled with subfacets. Note that
        required_facets is still respected regardless of the facet hierarchy.

        If a query type was specified when search() was called, and the query
        included facets, then facets with an association of Never to the
        query type are never returned, even if mentioned in required_facets.
        Facets with an association of Preferred are listed before others in
        the returned list.

        """
        if 'facets' in _checkxapian.missing_features:
            raise _errors.SearchError("Facets unsupported with this release of xapian")
        if self._facetspy is None:
            raise _errors.SearchError("Facet selection wasn't enabled when the search was run")
        if isinstance(required_facets, basestring):
            required_facets = [required_facets]
        scores = []
        facettypes = {}
        for field, slot, kwargslist in self._facetfields:
            type = None
            for kwargs in kwargslist:
                type = kwargs.get('type', None)
                if type is not None: break
            if type is None: type = 'string'

            if type == 'float':
                if field not in self._numeric_ranges_built:
                    self._facetspy.build_numeric_ranges(slot, desired_num_of_categories)
                    self._numeric_ranges_built[field] = None
            facettypes[field] = type
            score = self._facetspy.score_categorisation(slot, desired_num_of_categories)
            scores.append((score, field, slot))

        # Sort on whether facet is top-level ahead of score (use subfacets first),
        # and on whether facet is preferred for the query type ahead of anything else
        if self._facethierarchy:
            # Note, tuple[-2] is the value of 'field' in a scores tuple
            scores = [(tuple[-2] not in self._facethierarchy,) + tuple for tuple in scores]
        if self._facetassocs:
            preferred = _indexerconnection.IndexerConnection.FacetQueryType_Preferred
            scores = [(self._facetassocs.get(tuple[-2]) != preferred,) + tuple for tuple in scores]
        scores.sort()
        if self._facethierarchy:
            index = 1
        else:
            index = 0
        if self._facetassocs:
            index += 1
        if index > 0:
            scores = [tuple[index:] for tuple in scores]

        results = []
        required_results = []
        for score, field, slot in scores:
            # Check if the facet is required
            required = False
            if required_facets is not None:
                required = field in required_facets

            # If we've got enough facets, and the field isn't required, skip it
            if not required and len(results) + len(required_results) >= maxfacets:
                continue

            # Get the values
            values = self._facetspy.get_values_as_dict(slot)
            if field in self._numeric_ranges_built:
                if '' in values:
                    del values['']

            # Required facets must occur at least once, other facets must occur
            # at least twice.
            if required:
                if len(values) < 1:
                    continue
            else:
                if len(values) <= 1:
                    continue

            newvalues = []
            if facettypes[field] == 'float':
                # Convert numbers to python numbers, and number ranges to a
                # python tuple of two numbers.
                for value, frequency in values.iteritems():
                    if len(value) <= 9:
                        value1 = _log(_xapian.sortable_unserialise, value)
                        value2 = value1
                    else:
                        value1 = _log(_xapian.sortable_unserialise, value[:9])
                        value2 = _log(_xapian.sortable_unserialise, value[9:])
                    newvalues.append(((value1, value2), frequency))
            else:
                for value, frequency in values.iteritems():
                    newvalues.append((value, frequency))

            newvalues.sort()
            if required:
                required_results.append((score, field, newvalues))
            else:
                results.append((score, field, newvalues))

        # Throw away any excess results if we have more required_results to
        # insert.
        maxfacets = maxfacets - len(required_results)
        if maxfacets <= 0:
            results = required_results
        else:
            results = results[:maxfacets]
            results.extend(required_results)
            results.sort()

        # Throw away the scores because they're not meaningful outside this
        # algorithm.
        results = [(field, newvalues) for (score, field, newvalues) in results]
        return results
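
The interaction between maxfacets and required_facets is easiest to see in the final selection step. The sketch below extracts only that step into a standalone function, select_facets, which is a hypothetical name. It takes already-scored facets as input; the scores and facet values in the example are invented, and lower scores are assumed to sort first as in the ascending sort of the listing.

```python
def select_facets(scored, maxfacets, required_facets=()):
    """Pick facets from (score, field, values) tuples sorted best-first.

    Required facets need at least one value; other facets need at least two
    and are skipped once maxfacets facets have been collected.
    """
    results, required_results = [], []
    for score, field, values in scored:
        required = field in required_facets
        if not required and len(results) + len(required_results) >= maxfacets:
            continue
        if len(values) < (1 if required else 2):
            continue
        target = required_results if required else results
        target.append((score, field, values))

    # Required facets always survive; optional ones fill the remaining room.
    room = maxfacets - len(required_results)
    if room <= 0:
        chosen = required_results
    else:
        chosen = sorted(results[:room] + required_results)
    # Scores are dropped because they only matter inside the selection.
    return [(field, values) for score, field, values in chosen]


scored = [
    (0.1, "colour", [("blue", 4), ("red", 7)]),
    (0.2, "size", [("large", 3), ("small", 8)]),
    (0.3, "brand", [("acme", 11)]),
    (0.4, "price", [((0.0, 9.99), 5), ((10.0, 49.99), 6)]),
]
with_required = select_facets(scored, maxfacets=2, required_facets=["brand"])
without_required = select_facets(scored, maxfacets=3)
```

In the first call the required single-valued facet displaces an optional one; in the second it is dropped because optional facets need at least two values. The "price" entry shows the float form, where the first item of each 2-tuple is itself a (start, end) range.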

def MoinMoin.support.xappy.searchconnection.SearchResults.get_top_tags (   self,
  field,
  maxtags 
)
Get the most frequent tags in a given field.

 - `field` - the field to get tags for.  This must have been specified
   in the "gettags" argument of the search() call.
 - `maxtags` - the maximum number of tags to return.

Returns a sequence of 2-item tuples, in which the first item in the
tuple is the tag, and the second is the frequency of the tag in the
matches seen (as an integer).

Definition at line 547 of file searchconnection.py.

    def get_top_tags(self, field, maxtags):
        """Get the most frequent tags in a given field.

         - `field` - the field to get tags for.  This must have been specified
           in the "gettags" argument of the search() call.
         - `maxtags` - the maximum number of tags to return.

        Returns a sequence of 2-item tuples, in which the first item in the
        tuple is the tag, and the second is the frequency of the tag in the
        matches seen (as an integer).

        """
        if 'tags' in _checkxapian.missing_features:
            raise _errors.SearchError("Tags unsupported with this release of xapian")
        if self._tagspy is None or field not in self._tagfields:
            raise _errors.SearchError("Field %r was not specified for getting tags" % field)
        prefix = self._conn._field_mappings.get_prefix(field)
        return self._tagspy.get_top_terms(prefix, maxtags)
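
The shape of the return value, a sequence of (tag, frequency) pairs limited to maxtags entries, can be illustrated without Xapian. The sketch below counts tags with collections.Counter. The function top_tags and the sample data are hypothetical, and the ordering of equally frequent tags in the real tag spy is not documented on this page.

```python
from collections import Counter


def top_tags(tags_per_hit, maxtags):
    """Count tags over the matches seen and return the most frequent ones."""
    counts = Counter(tag for tags in tags_per_hit for tag in tags)
    return counts.most_common(maxtags)


tags_seen = [["python", "search"], ["python"], ["xapian", "search"], ["python"]]
top = top_tags(tags_seen, maxtags=2)
```

Each entry pairs a tag with an integer frequency, and less frequent tags beyond maxtags are omitted.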


Member Data Documentation

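MoinMoin.support.xappy.searchconnection.SearchResults.__getitem__ = get_hit [static, private]

Definition at line 523 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._conn [private]

Definition at line 240 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._enq [private]

Definition at line 241 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._facetassocs [private]

Definition at line 254 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._facetfields [private]

Definition at line 252 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._facethierarchy [private]

Definition at line 253 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._facetspy [private]

Definition at line 251 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._fieldmappings [private]

Definition at line 245 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._mset [private]

Definition at line 243 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._mset_order [private]

Definition at line 244 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._numeric_ranges_built [private]

Definition at line 255 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._query [private]

Definition at line 242 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._tagfields [private]

Definition at line 248 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults._tagspy [private]

Definition at line 246 of file searchconnection.py.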


Property Documentation

MoinMoin.support.xappy.searchconnection.SearchResults.endrank

Initial value:
property(_get_endrank, doc=
    """Get the rank of the item after the end of the search results.

    If there are sufficient results in the index, this corresponds to the
    "endrank" parameter passed to the search() method.

    """)

Definition at line 454 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults.estimate_is_exact

Initial value:
property(_estimate_is_exact, doc=
    """Check whether the estimated number of matching documents is exact.

    If this returns true, the estimate given by the `matches_estimated`
    property is guaranteed to be correct.

    If this returns false, it is possible that the actual number of matching
    documents is different from the number given by the `matches_estimated`
    property.

    """)

Definition at line 502 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults.matches_estimated

Initial value:
property(_get_estimated, doc=
    """Get an estimate for the total number of matching documents.""")

Definition at line 494 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults.matches_human_readable_estimate

Initial value:
property(_get_human_readable_estimate,
                                               doc=
    """Get a human readable estimate of the number of matching documents.

    This consists of the value returned by the "matches_estimated" property,
    rounded to an appropriate number of significant digits (as determined by
    the values of the "matches_lower_bound" and "matches_upper_bound"
    properties).

    """)

Definition at line 481 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults.matches_lower_bound

Initial value:
property(_get_lower_bound, doc=
    """Get a lower bound on the total number of matching documents.""")

Definition at line 464 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults.matches_upper_bound

Initial value:
property(_get_upper_bound, doc=
    """Get an upper bound on the total number of matching documents.""")

Definition at line 471 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults.more_matches

Initial value:
property(_get_more_matches, doc=
    """Check whether there are further matches after those in this result set.""")

Definition at line 438 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResults.startrank

Initial value:
property(_get_startrank, doc=
    """Get the rank of the first item in the search results.

    This corresponds to the "startrank" parameter passed to the search() method.

    """)

Definition at line 445 of file searchconnection.py.


The documentation for this class was generated from the following file:
searchconnection.py