moin  1.9.0~rc2
MoinMoin.support.xappy.datastructures.ProcessedDocument Class Reference

Public Member Functions

def __init__
def add_term
def add_value
def get_value
def prepare
def __repr__

Properties

 data
 id

Private Member Functions

def _get_data
def _set_data
def _get_id
def _set_id

Private Attributes

 _doc
 _fieldmappings
 _data

Static Private Attributes

string __slots__ = '_doc'

Detailed Description

A processed document, as stored in the index.

This represents an item which is ready to be stored in the search engine,
or which has been returned by the search engine.

Definition at line 72 of file datastructures.py.


Constructor & Destructor Documentation

def MoinMoin.support.xappy.datastructures.ProcessedDocument.__init__ (   self,
  fieldmappings,
  xapdoc = None 
)
Create a ProcessedDocument.

`fieldmappings` is the configuration from a database connection, used to look
up the configuration to use to store each field.
    
If supplied, `xapdoc` is a Xapian document to store in the processed
document.  Otherwise, a new Xapian document is created.

Reimplemented in MoinMoin.support.xappy.searchconnection.SearchResult.

Definition at line 81 of file datastructures.py.

    def __init__(self, fieldmappings, xapdoc=None):
        """Create a ProcessedDocument.

        `fieldmappings` is the configuration from a database connection, used
        to look up the configuration to use to store each field.

        If supplied, `xapdoc` is a Xapian document to store in the processed
        document.  Otherwise, a new Xapian document is created.

        """
        if xapdoc is None:
            self._doc = log(xapian.Document)
        else:
            self._doc = xapdoc
        self._fieldmappings = fieldmappings
        self._data = None


Member Function Documentation

def MoinMoin.support.xappy.datastructures.ProcessedDocument.__repr__ (   self )

Reimplemented in MoinMoin.support.xappy.searchconnection.SearchResult.

Definition at line 233 of file datastructures.py.

    def __repr__(self):
        return '<ProcessedDocument(%r)>' % (self.id)

def MoinMoin.support.xappy.datastructures.ProcessedDocument._get_data (   self )

Definition at line 189 of file datastructures.py.

    def _get_data(self):
        if self._data is None:
            rawdata = self._doc.get_data()
            if rawdata == '':
                self._data = {}
            else:
                self._data = cPickle.loads(rawdata)
        return self._data

def MoinMoin.support.xappy.datastructures.ProcessedDocument._get_id (   self )

Definition at line 209 of file datastructures.py.

    def _get_id(self):
        tl = self._doc.termlist()
        try:
            term = tl.skip_to('Q').term
            if len(term) == 0 or term[0] != 'Q':
                return None
        except StopIteration:
            return None
        return term[1:]
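
The ID lookup in `_get_id` relies on the document's term list being sorted, so that skipping to 'Q' lands on the ID term if one exists. A minimal sketch of that idea, assuming a plain sorted Python list in place of the Xapian term list (`get_id` and the sample terms are hypothetical, and `bisect` only imitates `skip_to`):

```python
import bisect


def get_id(sorted_terms):
    """Return the ID stored as a 'Q'-prefixed term, or None if there is none."""
    # Imitate termlist().skip_to('Q'): jump to the first term >= 'Q'.
    i = bisect.bisect_left(sorted_terms, "Q")
    if i == len(sorted_terms):
        # Ran off the end of the list (StopIteration in the original).
        return None
    term = sorted_terms[i]
    if not term.startswith("Q"):
        # The next term sorts after every 'Q' term, so no ID is stored.
        return None
    return term[1:]


with_id = get_id(["Qdoc42", "XAapple", "ZZ"])
without_id = get_id(["Aone", "XAapple"])
```

Here `with_id` is the ID with the 'Q' prefix stripped, and `without_id` is `None` because the first term at or after 'Q' does not start with 'Q'.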

def MoinMoin.support.xappy.datastructures.ProcessedDocument._set_data (   self,
  data 
)

Definition at line 197 of file datastructures.py.

    def _set_data(self, data):
        if not isinstance(data, dict):
            raise TypeError("Cannot set data to any type other than a dict")
        self._data = data

def MoinMoin.support.xappy.datastructures.ProcessedDocument._set_id (   self,
  id 
)

Definition at line 218 of file datastructures.py.

    def _set_id(self, id):
        tl = self._doc.termlist()
        try:
            term = tl.skip_to('Q').term
        except StopIteration:
            term = ''
        if len(term) != 0 and term[0] == 'Q':
            self._doc.remove_term(term)
        if id is not None:
            self._doc.add_term('Q' + id, 0)

def MoinMoin.support.xappy.datastructures.ProcessedDocument.add_term (   self,
  field,
  term,
  wdfinc = 1,
  positions = None 
)
Add a term to the document.

Terms are the main unit of information used for performing searches.

- `field` is the field to add the term to.
- `term` is the term to add.
- `wdfinc` is the value to increase the within-document-frequency
  measure for the term by.
- `positions` is the positional information to add for the term.
  This may be None to indicate that there is no positional information,
  or may be an integer to specify one position, or may be a sequence of
  integers to specify several positions.  (Note that the wdf is not
  increased automatically for each position: if you add a term at 7
  positions, and the wdfinc value is 2, the total wdf for the term will
  only be increased by 2, not by 14.)
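
The relationship between `wdfinc` and `positions` described above can be sketched with a small stand-in for the Xapian document. `FakeDoc`, `add_term_with_positions` and the term "XAfoo" are hypothetical names used only for illustration; the stand-in merely records wdf and positions and follows the three branches of the method.

```python
class FakeDoc:
    """Hypothetical stand-in that records wdf and positions per term."""

    def __init__(self):
        self.wdf = {}
        self.positions = {}

    def add_term(self, term, wdfinc=1):
        self.wdf[term] = self.wdf.get(term, 0) + wdfinc

    def add_posting(self, term, pos, wdfinc=1):
        self.wdf[term] = self.wdf.get(term, 0) + wdfinc
        self.positions.setdefault(term, set()).add(pos)


def add_term_with_positions(doc, term, wdfinc=1, positions=None):
    """Follow the three cases: no position, one position, many positions."""
    if positions is None:
        doc.add_term(term, wdfinc)
    elif isinstance(positions, int):
        doc.add_posting(term, positions, wdfinc)
    else:
        # wdf is increased once; each position is then added with wdfinc=0.
        doc.add_term(term, wdfinc)
        for pos in positions:
            doc.add_posting(term, pos, 0)


doc = FakeDoc()
add_term_with_positions(doc, "XAfoo", wdfinc=2, positions=range(7))
total_wdf = doc.wdf["XAfoo"]
position_count = len(doc.positions["XAfoo"])
```

With seven positions and `wdfinc=2`, `total_wdf` ends up as 2 rather than 14, while `position_count` is 7.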

Definition at line 98 of file datastructures.py.

    def add_term(self, field, term, wdfinc=1, positions=None):
        """Add a term to the document.

        Terms are the main unit of information used for performing searches.

        - `field` is the field to add the term to.
        - `term` is the term to add.
        - `wdfinc` is the value to increase the within-document-frequency
          measure for the term by.
        - `positions` is the positional information to add for the term.
          This may be None to indicate that there is no positional information,
          or may be an integer to specify one position, or may be a sequence of
          integers to specify several positions.  (Note that the wdf is not
          increased automatically for each position: if you add a term at 7
          positions, and the wdfinc value is 2, the total wdf for the term will
          only be increased by 2, not by 14.)

        """
        prefix = self._fieldmappings.get_prefix(field)
        if len(term) > 0:
            # We use the following check, rather than "isupper()" to ensure
            # that we match the check performed by the queryparser, regardless
            # of our locale.
            if ord(term[0]) >= ord('A') and ord(term[0]) <= ord('Z'):
                prefix = prefix + ':'

        # Note - xapian currently restricts term lengths to about 248
        # characters - except that zero bytes are encoded in two bytes, so
        # in practice a term of length 125 characters could be too long.
        # Xapian will give an error when commit() is called after such
        # documents have been added to the database.
        # As a simple workaround, we give an error here for terms over 220
        # characters, which will catch most occurrences of the error early.
        #
        # In future, it might be good to change to a hashing scheme in this
        # situation (or for terms over, say, 64 characters), where the
        # characters after position 64 are hashed (we obviously need to do this
        # hashing at search time, too).
        if len(prefix + term) > 220:
            raise errors.IndexerError("Field %r is too long: maximum length "
                                       "220 - was %d (%r)" %
                                       (field, len(prefix + term),
                                        prefix + term))

        if positions is None:
            self._doc.add_term(prefix + term, wdfinc)
        elif isinstance(positions, int):
            self._doc.add_posting(prefix + term, positions, wdfinc)
        else:
            self._doc.add_term(prefix + term, wdfinc)
            for pos in positions:
                self._doc.add_posting(prefix + term, pos, 0)
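
The prefix handling and the length check in the listing above can be isolated into a small function. This is only a sketch: `build_term` and the prefix "XA" are hypothetical, and `ValueError` stands in for `errors.IndexerError`, which is not available outside xappy.

```python
MAX_TERM_LENGTH = 220  # the limit enforced by add_term in the listing


def build_term(prefix, term):
    """Return prefix + term, applying the capital-letter and length rules."""
    # ASCII-only range check, so the result does not depend on the locale.
    if term and "A" <= term[0] <= "Z":
        # A ':' separates the prefix from a term starting with a capital.
        prefix = prefix + ":"
    full = prefix + term
    if len(full) > MAX_TERM_LENGTH:
        raise ValueError("term too long: maximum length %d - was %d"
                         % (MAX_TERM_LENGTH, len(full)))
    return full


lower = build_term("XA", "apple")
upper = build_term("XA", "Apple")
```

`lower` is the plain concatenation, while `upper` gains the ':' separator because the term starts with an ASCII capital letter.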

def MoinMoin.support.xappy.datastructures.ProcessedDocument.add_value (   self,
  field,
  value,
  purpose = '' 
)
Add a value to the document.

Values are additional units of information used when performing
searches.  Note that values are _not_ intended to be used to store
information for display in the search results - use the document data
for that.  The intention is that as little information as possible is
stored in values, so that they can be accessed as quickly as possible
during the search operation.

Unlike terms, each document may have at most one value in each field
(whereas there may be an arbitrary number of terms in a given field).
If an attempt to add multiple values to a single field is made, only
the last value added will be stored.

Definition at line 151 of file datastructures.py.

    def add_value(self, field, value, purpose=''):
        """Add a value to the document.

        Values are additional units of information used when performing
        searches.  Note that values are _not_ intended to be used to store
        information for display in the search results - use the document data
        for that.  The intention is that as little information as possible is
        stored in values, so that they can be accessed as quickly as possible
        during the search operation.

        Unlike terms, each document may have at most one value in each field
        (whereas there may be an arbitrary number of terms in a given field).
        If an attempt to add multiple values to a single field is made, only
        the last value added will be stored.

        """
        slot = self._fieldmappings.get_slot(field, purpose)
        self._doc.add_value(slot, value)
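
The "last value wins" behaviour follows from each (field, purpose) pair mapping to a single slot. A minimal sketch, assuming a dictionary in place of the Xapian value slots (`SlotStore`, the slot numbering and the "price" field are all hypothetical):

```python
class SlotStore:
    """Hypothetical stand-in for the value slots of a document."""

    def __init__(self, slot_numbers):
        self._slot_numbers = slot_numbers  # maps (field, purpose) -> slot
        self._values = {}

    def add_value(self, field, value, purpose=""):
        slot = self._slot_numbers[(field, purpose)]
        # One value per slot: a second add overwrites the first.
        self._values[slot] = value

    def get_value(self, field, purpose=""):
        slot = self._slot_numbers[(field, purpose)]
        # The empty-string default is a choice made for this sketch.
        return self._values.get(slot, "")


store = SlotStore({("price", ""): 0, ("price", "collsort"): 1})
store.add_value("price", "10")
store.add_value("price", "25")
current = store.get_value("price")
```

After both calls, `current` holds only the second value; the slot for the other purpose is untouched.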

def MoinMoin.support.xappy.datastructures.ProcessedDocument.get_value (   self,
  field,
  purpose = '' 
)
Get a value from the document.

Definition at line 170 of file datastructures.py.

    def get_value(self, field, purpose=''):
        """Get a value from the document.

        """
        slot = self._fieldmappings.get_slot(field, purpose)
        return self._doc.get_value(slot)

def MoinMoin.support.xappy.datastructures.ProcessedDocument.prepare (   self )
Prepare the document for adding to a xapian database.

This updates the internal xapian document with any changes which have
been made, and then returns it.

Definition at line 177 of file datastructures.py.

    def prepare(self):
        """Prepare the document for adding to a xapian database.

        This updates the internal xapian document with any changes which have
        been made, and then returns it.

        """
        if self._data is not None:
            self._doc.set_data(cPickle.dumps(self._data, 2))
            self._data = None
        return self._doc
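
Taken together, `_get_data`, `_set_data` and `prepare` form a lazy cache: the data dictionary is unpickled on first access and only written back when the document is prepared. A self-contained sketch of that round trip, assuming Python 3's `pickle` in place of `cPickle` and a hypothetical `FakeXapDoc` that exposes only `get_data` and `set_data`:

```python
import pickle


class FakeXapDoc:
    """Hypothetical stand-in exposing only get_data/set_data."""

    def __init__(self):
        self._raw = b""

    def get_data(self):
        return self._raw

    def set_data(self, raw):
        self._raw = raw


class DataHolder:
    """Sketch of the lazy data property and the flush done by prepare()."""

    def __init__(self, xapdoc):
        self._doc = xapdoc
        self._data = None

    @property
    def data(self):
        if self._data is None:
            raw = self._doc.get_data()
            # Empty document data means "no fields stored yet".
            self._data = pickle.loads(raw) if raw else {}
        return self._data

    @data.setter
    def data(self, value):
        if not isinstance(value, dict):
            raise TypeError("Cannot set data to any type other than a dict")
        self._data = value

    def prepare(self):
        if self._data is not None:
            # Write the cached dictionary back, then drop the cache.
            self._doc.set_data(pickle.dumps(self._data, 2))
            self._data = None
        return self._doc


holder = DataHolder(FakeXapDoc())
holder.data["title"] = ["Front page"]
stored = holder.prepare()
restored = DataHolder(stored).data
```

Changes made through `data` are invisible to the underlying document until `prepare()` runs, which is why `prepare()` should be called before the document is handed to the database.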


Member Data Documentation

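MoinMoin.support.xappy.datastructures.ProcessedDocument.__slots__

Definition at line 80 of file datastructures.py.

MoinMoin.support.xappy.datastructures.ProcessedDocument._data

Definition at line 96 of file datastructures.py.

MoinMoin.support.xappy.datastructures.ProcessedDocument._doc

Definition at line 92 of file datastructures.py.

MoinMoin.support.xappy.datastructures.ProcessedDocument._fieldmappings

Definition at line 95 of file datastructures.py.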


Property Documentation

MoinMoin.support.xappy.datastructures.ProcessedDocument.data

Initial value:
property(_get_data, _set_data, doc=
    """The data stored in this processed document. This data is a dictionary of entries, where the key is a fieldname, and the value is a list of strings.""")

Definition at line 201 of file datastructures.py.

MoinMoin.support.xappy.datastructures.ProcessedDocument.id

Initial value:
property(_get_id, _set_id, doc=
    """The unique ID for this document.""")

Definition at line 228 of file datastructures.py.


The documentation for this class was generated from the following file: