
moin  1.9.0~rc2
MoinMoin.support.xappy.searchconnection.SearchResult Class Reference


Public Member Functions

def __init__
def summarise
def highlight
def __repr__
def add_term
def add_value
def get_value
def prepare

Public Attributes

 rank
 weight
 percent

Properties

 data
 id

Private Member Functions

def _get_language

Private Attributes

 _results

Detailed Description

A result from a search.

As well as being a ProcessedDocument representing the document in the
database, the result has several members which may be used to get
information about how well the document matches the search:

 - `rank`: The rank of the document in the search results, starting at 0
   (i.e., 0 is the "top" result, 1 is the second result, etc.).

 - `weight`: A floating point number indicating the weight of the result
   document.  The value is only meaningful relative to other results for a
   given search - a different search, or the same search with a different
   database, may give an entirely different scale to the weights.  This
   should not usually be displayed to users, but may be useful if trying to
   perform advanced reweighting operations on search results.

 - `percent`: A percentage value for the weight of a document.  This is
   just a rescaled form of the `weight` member.  It doesn't represent any
   kind of probability value; the only real meaning of the numbers is that,
   within a single set of results, a document with a higher percentage
   corresponds to a better match.  Because the percentage is neither a
   probability nor a confidence value, it is usually unhelpful to display
   it to users, who tend to read too much into it.  It is included because
   it may occasionally be useful.

Definition at line 38 of file searchconnection.py.
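
As a rough illustration of how the three members relate, the sketch below uses a hypothetical `FakeResult` stand-in (not part of xappy) and a made-up rescaling rule. It only shows that weights are comparable within one result set, not across searches. The real percentages come from Xapian, and its formula is not reproduced here.

```python
from collections import namedtuple

# Hypothetical stand-in for SearchResult; only the three members
# described above are modelled.
FakeResult = namedtuple("FakeResult", ["rank", "weight", "percent"])


def make_results(weights):
    """Build ranked results from raw weights (illustrative rescaling only)."""
    ordered = sorted(weights, reverse=True)
    top = ordered[0] if ordered and ordered[0] > 0 else 1.0
    return [
        # Assumption: percent is the weight rescaled against the best match.
        FakeResult(rank=i, weight=w, percent=int(round(100.0 * w / top)))
        for i, w in enumerate(ordered)
    ]


results_a = make_results([2.5, 10.0, 5.0])
results_b = make_results([0.25, 1.0, 0.5])  # same ordering, different scale
```

Both result sets rank their documents identically even though the raw weights differ by a factor of ten, which is why a weight from one search says nothing about a weight from another.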


Constructor & Destructor Documentation

def MoinMoin.support.xappy.searchconnection.SearchResult.__init__ (   self,
  fieldmappings,
  xapdoc 
)
Create a ProcessedDocument.

`fieldmappings` is the configuration from a database connection, used to look
up the configuration to use to store each field.
    
If supplied, `xapdoc` is a Xapian document to store in the processed
document.  Otherwise, a new Xapian document is created.

Reimplemented from MoinMoin.support.xappy.datastructures.ProcessedDocument.

Definition at line 66 of file searchconnection.py.

00066 
00067     def __init__(self, msetitem, results):
00068         ProcessedDocument.__init__(self, results._fieldmappings, msetitem.document)
00069         self.rank = msetitem.rank
00070         self.weight = msetitem.weight
00071         self.percent = msetitem.percent
00072         self._results = results


Member Function Documentation

def MoinMoin.support.xappy.searchconnection.SearchResult.__repr__ (   self  )
Reimplemented from MoinMoin.support.xappy.datastructures.ProcessedDocument.

Definition at line 158 of file searchconnection.py.

00158 
00159     def __repr__(self):
00160         return ('<SearchResult(rank=%d, id=%r, data=%r)>' %
00161                 (self.rank, self.id, self.data))
00162 

def MoinMoin.support.xappy.searchconnection.SearchResult._get_language (   self,
  field 
) [private]
Get the language that should be used for a given field.

Raises a KeyError if the field is not known.

Definition at line 73 of file searchconnection.py.

00073 
00074     def _get_language(self, field):
00075         """Get the language that should be used for a given field.
00076 
00077         Raises a KeyError if the field is not known.
00078 
00079         """
00080         actions = self._results._conn._field_actions[field]._actions
00081         for action, kwargslist in actions.iteritems():
00082             if action == FieldActions.INDEX_FREETEXT:
00083                 for kwargs in kwargslist:
00084                     try:
00085                         return kwargs['language']
00086                     except KeyError:
00087                         pass
00088         return 'none'
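
The lookup in the listing can be sketched on plain dictionaries. This is a minimal illustration, not xappy code: `INDEX_FREETEXT` and `STORE_CONTENT` are hypothetical string constants standing in for `FieldActions` members, and `config` is an invented field configuration.

```python
# Hypothetical constants standing in for FieldActions members.
INDEX_FREETEXT = "index_freetext"
STORE_CONTENT = "store_content"


def get_language(field_actions, field):
    """Return the first configured free-text language for `field`.

    Raises KeyError if `field` is unknown; returns 'none' if no
    free-text action names a language.
    """
    actions = field_actions[field]  # KeyError for unknown fields
    for kwargs in actions.get(INDEX_FREETEXT, []):
        if "language" in kwargs:
            return kwargs["language"]
    return "none"


# Invented configuration: field name -> action -> list of keyword dicts.
config = {
    "title": {INDEX_FREETEXT: [{"weight": 5}, {"language": "en"}]},
    "category": {STORE_CONTENT: [{}]},
}
```

Here `get_language(config, "title")` skips the first keyword dict because it has no `language` entry, and a field with no free-text action falls through to the string `'none'`.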


def MoinMoin.support.xappy.datastructures.ProcessedDocument.add_term (   self,
  field,
  term,
  wdfinc = 1,
  positions = None 
) [inherited]
Add a term to the document.

Terms are the main unit of information used for performing searches.

- `field` is the field to add the term to.
- `term` is the term to add.
- `wdfinc` is the value to increase the within-document-frequency
  measure for the term by.
- `positions` is the positional information to add for the term.
  This may be None to indicate that there is no positional information,
  or may be an integer to specify one position, or may be a sequence of
  integers to specify several positions.  (Note that the wdf is not
  increased automatically for each position: if you add a term at 7
  positions, and the wdfinc value is 2, the total wdf for the term will
  only be increased by 2, not by 14.)

Definition at line 98 of file datastructures.py.

00098 
00099     def add_term(self, field, term, wdfinc=1, positions=None):
00100         """Add a term to the document.
00101 
00102         Terms are the main unit of information used for performing searches.
00103 
00104         - `field` is the field to add the term to.
00105         - `term` is the term to add.
00106         - `wdfinc` is the value to increase the within-document-frequency
00107           measure for the term by.
00108         - `positions` is the positional information to add for the term.
00109           This may be None to indicate that there is no positional information,
00110           or may be an integer to specify one position, or may be a sequence of
00111           integers to specify several positions.  (Note that the wdf is not
00112           increased automatically for each position: if you add a term at 7
00113           positions, and the wdfinc value is 2, the total wdf for the term will
00114           only be increased by 2, not by 14.)
00115 
00116         """
00117         prefix = self._fieldmappings.get_prefix(field)
00118         if len(term) > 0:
00119             # We use the following check, rather than "isupper()" to ensure
00120             # that we match the check performed by the queryparser, regardless
00121             # of our locale.
00122             if ord(term[0]) >= ord('A') and ord(term[0]) <= ord('Z'):
00123                 prefix = prefix + ':'
00124 
00125         # Note - xapian currently restricts term lengths to about 248
00126         # characters - except that zero bytes are encoded in two bytes, so
00127         # in practice a term of length 125 characters could be too long.
00128         # Xapian will give an error when commit() is called after such
00129         # documents have been added to the database.
00130         # As a simple workaround, we give an error here for terms over 220
00131         # characters, which will catch most occurrences of the error early.
00132         #
00133         # In future, it might be good to change to a hashing scheme in this
00134         # situation (or for terms over, say, 64 characters), where the
00135         # characters after position 64 are hashed (we obviously need to do this
00136         # hashing at search time, too).
00137         if len(prefix + term) > 220:
00138             raise errors.IndexerError("Field %r is too long: maximum length "
00139                                        "220 - was %d (%r)" %
00140                                        (field, len(prefix + term),
00141                                         prefix + term))
00142 
00143         if positions is None:
00144             self._doc.add_term(prefix + term, wdfinc)
00145         elif isinstance(positions, int):
00146             self._doc.add_posting(prefix + term, positions, wdfinc)
00147         else:
00148             self._doc.add_term(prefix + term, wdfinc)
00149             for pos in positions:
00150                 self._doc.add_posting(prefix + term, pos, 0)
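
The interaction between `wdfinc` and `positions` is easy to misread, so here is a small sketch of it. `RecordingDoc` is a hypothetical stand-in for a Xapian document that tracks only wdf and positions, the `"XA"` prefix is invented, and `ValueError` is used in place of `errors.IndexerError` to keep the example self-contained.

```python
class RecordingDoc(object):
    """Hypothetical stand-in for a Xapian document (wdf and positions only)."""

    def __init__(self):
        self.wdf = {}
        self.positions = {}

    def add_term(self, term, wdfinc=1):
        self.wdf[term] = self.wdf.get(term, 0) + wdfinc

    def add_posting(self, term, pos, wdfinc=1):
        self.wdf[term] = self.wdf.get(term, 0) + wdfinc
        self.positions.setdefault(term, []).append(pos)


MAX_TERM_LEN = 220  # the limit used in the listing above


def add_term(doc, prefix, term, wdfinc=1, positions=None):
    # Terms starting with an ASCII capital get a ':' after the prefix.
    if term and 'A' <= term[0] <= 'Z':
        prefix = prefix + ':'
    full = prefix + term
    if len(full) > MAX_TERM_LEN:
        raise ValueError("term too long: %d characters" % len(full))
    if positions is None:
        doc.add_term(full, wdfinc)
    elif isinstance(positions, int):
        doc.add_posting(full, positions, wdfinc)
    else:
        # Apply wdfinc once, then record each position with no extra wdf.
        doc.add_term(full, wdfinc)
        for pos in positions:
            doc.add_posting(full, pos, 0)


doc = RecordingDoc()
add_term(doc, "XA", "hello", wdfinc=2, positions=[1, 4, 9, 12, 20, 31, 40])
add_term(doc, "XA", "Paris")
```

After these calls the term `XAhello` has seven recorded positions but a wdf of 2, and `Paris` is stored as `XA:Paris` because of its leading capital.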

def MoinMoin.support.xappy.datastructures.ProcessedDocument.add_value (   self,
  field,
  value,
  purpose = '' 
) [inherited]
Add a value to the document.

Values are additional units of information used when performing
searches.  Note that values are _not_ intended to be used to store
information for display in the search results - use the document data
for that.  The intention is that as little information as possible is
stored in values, so that they can be accessed as quickly as possible
during the search operation.

Unlike terms, each document may have at most one value in each field
(whereas there may be an arbitrary number of terms in a given field).
If an attempt to add multiple values to a single field is made, only
the last value added will be stored.

Definition at line 151 of file datastructures.py.

00151 
00152     def add_value(self, field, value, purpose=''):
00153         """Add a value to the document.
00154 
00155         Values are additional units of information used when performing
00156         searches.  Note that values are _not_ intended to be used to store
00157         information for display in the search results - use the document data
00158         for that.  The intention is that as little information as possible is
00159         stored in values, so that they can be accessed as quickly as possible
00160         during the search operation.
00161         
00162         Unlike terms, each document may have at most one value in each field
00163         (whereas there may be an arbitrary number of terms in a given field).
00164         If an attempt to add multiple values to a single field is made, only
00165         the last value added will be stored.
00166 
00167         """
00168         slot = self._fieldmappings.get_slot(field, purpose)
00169         self._doc.add_value(slot, value)
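
The "last value wins" behaviour can be sketched as follows. `SlotDoc` is a hypothetical stand-in for the underlying document, and the `slots` table with its field and purpose names is an assumption made for illustration; the real slot allocation performed by `get_slot()` is not shown on this page.

```python
class SlotDoc(object):
    """Hypothetical document holding at most one value per numbered slot."""

    def __init__(self):
        self._values = {}

    def add_value(self, slot, value):
        self._values[slot] = value  # overwrites any earlier value in the slot

    def get_value(self, slot):
        return self._values.get(slot, "")


# Assumed mapping from (field, purpose) to slot number.
slots = {("price", "collsort"): 0, ("price", "facet"): 1}

doc = SlotDoc()
doc.add_value(slots[("price", "collsort")], "0010")
doc.add_value(slots[("price", "collsort")], "0025")  # replaces "0010"
doc.add_value(slots[("price", "facet")], "cheap")
```

Because the same field can map to different slots for different purposes, the second `purpose` keeps its own value while the repeated write to the first slot discards the earlier one.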

def MoinMoin.support.xappy.datastructures.ProcessedDocument.get_value (   self,
  field,
  purpose = '' 
) [inherited]
Get a value from the document.

Definition at line 170 of file datastructures.py.

00170 
00171     def get_value(self, field, purpose=''):
00172         """Get a value from the document.
00173 
00174         """
00175         slot = self._fieldmappings.get_slot(field, purpose)
00176         return self._doc.get_value(slot)

def MoinMoin.support.xappy.searchconnection.SearchResult.highlight (   self,
  field,
  hl = ('<b>', '</b>'),
  strip_tags = False,
  query = None 
)
Return a highlighted version of the field specified.

This will return all the contents of the field stored in the search
result, with words which match the query highlighted.

The return value will be a list of strings (corresponding to the list
of strings which is the raw field data).

Each highlight will consist of the first entry in the `hl` list being
placed before the word, and the second entry in the `hl` list being
placed after the word.

If `strip_tags` is True, any XML or HTML style markup tags in the field
will be stripped before highlighting is applied.

If `query` is supplied, it should contain a Query object, as returned
from SearchConnection.query_parse() or related methods, which will be
used as the basis of the summarisation and highlighting rather than the
query which was used for the search.

Raises KeyError if the field is not known.

Definition at line 125 of file searchconnection.py.

00125 
00126     def highlight(self, field, hl=('<b>', '</b>'), strip_tags=False, query=None):
00127         """Return a highlighted version of the field specified.
00128 
00129         This will return all the contents of the field stored in the search
00130         result, with words which match the query highlighted.
00131 
00132         The return value will be a list of strings (corresponding to the list
00133         of strings which is the raw field data).
00134 
00135         Each highlight will consist of the first entry in the `hl` list being
00136         placed before the word, and the second entry in the `hl` list being
00137         placed after the word.
00138 
00139         If `strip_tags` is True, any XML or HTML style markup tags in the field
00140         will be stripped before highlighting is applied.
00141 
00142         If `query` is supplied, it should contain a Query object, as returned
00143         from SearchConnection.query_parse() or related methods, which will be
00144         used as the basis of the summarisation and highlighting rather than the
00145         query which was used for the search.
00146 
00147         Raises KeyError if the field is not known.
00148 
00149         """
00150         highlighter = _highlight.Highlighter(language_code=self._get_language(field))
00151         field = self.data[field]
00152         results = []
00153         if query is None:
00154             query = self._results._query
00155         for text in field:
00156             results.append(highlighter.highlight(text, query, hl, strip_tags))
00157         return results
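
To make the role of `hl` and `strip_tags` concrete, the sketch below wraps whole-word matches in each string of a field. It is a deliberately naive stand-in written for this page: the real `Highlighter` works from a parsed Query object and is language aware, whereas `highlight_all` and its plain list of query words are assumptions.

```python
import re


def highlight_all(texts, query_words, hl=('<b>', '</b>'), strip_tags=False):
    """Wrap whole-word matches in each string with the hl pair."""
    wanted = set(w.lower() for w in query_words)
    word_re = re.compile(r"\w+")
    tag_re = re.compile(r"<[^>]*>")

    def mark(match):
        word = match.group(0)
        if word.lower() in wanted:
            return hl[0] + word + hl[1]
        return word

    out = []
    for text in texts:
        if strip_tags:
            text = tag_re.sub("", text)  # drop markup before highlighting
        out.append(word_re.sub(mark, text))
    return out


highlighted = highlight_all(
    ["<p>Wiki search</p>", "Search engines"], ["search"], strip_tags=True)
```

As with the documented method, the result is a list with one highlighted string per entry in the raw field data.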


def MoinMoin.support.xappy.datastructures.ProcessedDocument.prepare (   self  ) [inherited]
Prepare the document for adding to a xapian database.

This updates the internal xapian document with any changes which have
been made, and then returns it.

Definition at line 177 of file datastructures.py.

00177 
00178     def prepare(self):
00179         """Prepare the document for adding to a xapian database.
00180 
00181         This updates the internal xapian document with any changes which have
00182         been made, and then returns it.
00183 
00184         """
00185         if self._data is not None:
00186             self._doc.set_data(cPickle.dumps(self._data, 2))
00187             self._data = None
00188         return self._doc
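
The deferred serialisation in `prepare()` can be sketched with the standard `pickle` module (the listing uses Python 2's `cPickle`). `LazyDoc` is hypothetical, and its lazy-loading `data` getter is an assumption, since the corresponding getter is not shown on this page.

```python
import pickle


class LazyDoc(object):
    """Hypothetical document that serialises its data only when prepared."""

    def __init__(self):
        self._raw = b""    # stands in for the Xapian document's data
        self._data = None  # unpickled cache, set once data is touched

    @property
    def data(self):
        # Assumed behaviour: unpickle on first access and cache the result.
        if self._data is None:
            self._data = pickle.loads(self._raw) if self._raw else {}
        return self._data

    def prepare(self):
        # Flush pending changes into the raw form, then drop the cache.
        if self._data is not None:
            self._raw = pickle.dumps(self._data, 2)
            self._data = None
        return self._raw


doc = LazyDoc()
doc.data.setdefault("title", []).append("Front page")
raw = doc.prepare()
```

Calling `prepare()` a second time without touching `data` returns the same serialised form, because there is nothing pending to flush.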

def MoinMoin.support.xappy.searchconnection.SearchResult.summarise (   self,
  field,
  maxlen = 600,
  hl = ('<b>', '</b>'),
  query = None 
)
Return a summarised version of the field specified.

This will return a summary of the contents of the field stored in the
search result, with words which match the query highlighted.

The maximum length of the summary (in characters) may be set using the
maxlen parameter.

The return value will be a string holding the summary, with
highlighting applied.  If there are multiple instances of the field in
the document, the instances will be joined with a newline character.

To turn off highlighting, set hl to None.  Each highlight will consist
of the first entry in the `hl` list being placed before the word, and
the second entry in the `hl` list being placed after the word.

Any XML or HTML style markup tags in the field will be stripped before
the summarisation algorithm is applied.

If `query` is supplied, it should contain a Query object, as returned
from SearchConnection.query_parse() or related methods, which will be
used as the basis of the summarisation and highlighting rather than the
query which was used for the search.

Raises KeyError if the field is not known.

Definition at line 89 of file searchconnection.py.

00089 
00090     def summarise(self, field, maxlen=600, hl=('<b>', '</b>'), query=None):
00091         """Return a summarised version of the field specified.
00092 
00093         This will return a summary of the contents of the field stored in the
00094         search result, with words which match the query highlighted.
00095 
00096         The maximum length of the summary (in characters) may be set using the
00097         maxlen parameter.
00098 
00099         The return value will be a string holding the summary, with
00100         highlighting applied.  If there are multiple instances of the field in
00101         the document, the instances will be joined with a newline character.
00102         
00103         To turn off highlighting, set hl to None.  Each highlight will consist
00104         of the first entry in the `hl` list being placed before the word, and
00105         the second entry in the `hl` list being placed after the word.
00106 
00107         Any XML or HTML style markup tags in the field will be stripped before
00108         the summarisation algorithm is applied.
00109 
00110         If `query` is supplied, it should contain a Query object, as returned
00111         from SearchConnection.query_parse() or related methods, which will be
00112         used as the basis of the summarisation and highlighting rather than the
00113         query which was used for the search.
00114 
00115         Raises KeyError if the field is not known.
00116 
00117         """
00118         highlighter = _highlight.Highlighter(language_code=self._get_language(field))
00119         field = self.data[field]
00120         results = []
00121         text = '\n'.join(field)
00122         if query is None:
00123             query = self._results._query
00124         return highlighter.makeSample(text, query, maxlen, hl)
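
The overall shape of the method (join the instances, strip markup, limit the length, optionally highlight) can be sketched as below. This is not the algorithm behind `makeSample`: the sketch simply truncates to the first `maxlen` characters, and it takes a plain list of query words instead of a Query object. Both simplifications are assumptions made to keep the example self-contained.

```python
import re


def summarise(instances, query_words, maxlen=600, hl=('<b>', '</b>')):
    """Join field instances, strip tags, truncate, optionally highlight."""
    text = re.sub(r"<[^>]*>", "", "\n".join(instances))
    sample = text[:maxlen]  # naive truncation; the real sampler is smarter
    if hl is None:
        return sample  # highlighting turned off
    wanted = set(w.lower() for w in query_words)

    def mark(match):
        word = match.group(0)
        return hl[0] + word + hl[1] if word.lower() in wanted else word

    return re.sub(r"\w+", mark, sample)


summary = summarise(["<h1>Moin wiki</h1>", "A wiki engine"], ["wiki"], maxlen=20)
plain = summarise(["Moin wiki"], ["wiki"], hl=None)
```

In this sketch `maxlen` limits the text before the highlight markers are added, so the returned string can be longer than `maxlen` once markers are inserted.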



Member Data Documentation

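MoinMoin.support.xappy.searchconnection.SearchResult._results [private]

Definition at line 71 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResult.percent

Definition at line 70 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResult.rank

Definition at line 68 of file searchconnection.py.

MoinMoin.support.xappy.searchconnection.SearchResult.weight

Definition at line 69 of file searchconnection.py.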


Property Documentation

MoinMoin.support.xappy.datastructures.ProcessedDocument.data [inherited]

Initial value:
property(_get_data, _set_data, doc=
    """The data stored in this processed document.

    This data is a dictionary of entries, where the key is a fieldname, and the
    value is a list of strings.

    """)

Definition at line 201 of file datastructures.py.

MoinMoin.support.xappy.datastructures.ProcessedDocument.id [inherited]

Initial value:
property(_get_id, _set_id, doc=
    """The unique ID for this document.""")

Definition at line 228 of file datastructures.py.


The documentation for this class was generated from the following file: