
moin  1.9.0~rc2
MoinMoin.support.xappy.indexerconnection.IndexerConnection Class Reference

Public Member Functions

def __init__
def set_max_mem_use
def add_field_action
def clear_field_actions
def get_fields_with_actions
def process
def add
def replace
def add_synonym
def remove_synonym
def clear_synonyms
def add_subfacet
def remove_subfacet
def get_subfacets
def set_facet_for_query_type
def get_facets_for_query_type
def set_metadata
def get_metadata
def delete
def flush
def close
def get_doccount
def iterids
def get_document
def iter_synonyms
def iter_subfacets
def iter_facet_query_types

Static Public Attributes

int FacetQueryType_Preferred = 1
int FacetQueryType_Never = 2

Private Member Functions

def _store_config
def _load_config
def _allocate_id
def _get_bytes_used_by_doc_terms
def _make_synonym_key
def _assert_facet

Private Attributes

 _index
 _indexpath
 _field_actions
 _field_mappings
 _facet_hierarchy
 _facet_query_table
 _next_docid
 _config_modified
 _mem_buffered
 _max_mem

Detailed Description

A connection to the search engine for indexing.

Definition at line 34 of file indexerconnection.py.


Constructor & Destructor Documentation

Create a new connection to the index.

There may only be one indexer connection for a particular database open
at a given time.  Therefore, if a connection to the database is already
open, this will raise a xapian.DatabaseLockError.

If the database doesn't already exist, it will be created.

Definition at line 39 of file indexerconnection.py.

00039 
00040     def __init__(self, indexpath):
00041         """Create a new connection to the index.
00042 
00043         There may only be one indexer connection for a particular database open
00044         at a given time.  Therefore, if a connection to the database is already
00045         open, this will raise a xapian.DatabaseLockError.
00046 
00047         If the database doesn't already exist, it will be created.
00048 
00049         """
00050         self._index = log(xapian.WritableDatabase, indexpath, xapian.DB_CREATE_OR_OPEN)
00051         self._indexpath = indexpath
00052 
00053         # Read existing actions.
00054         self._field_actions = {}
00055         self._field_mappings = fieldmappings.FieldMappings()
00056         self._facet_hierarchy = {}
00057         self._facet_query_table = {}
00058         self._next_docid = 0
00059         self._config_modified = False
00060         self._load_config()
00061 
00062         # Set management of the memory used.
00063         # This can be removed once Xapian implements this itself.
00064         self._mem_buffered = 0
00065         self.set_max_mem_use()


Member Function Documentation

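Allocate a new unique document ID.

The next value of the `_next_docid` counter whose hexadecimal form does not
already exist in the index as a 'Q' term is used. The counter advances past
it, and the configuration is marked as modified so that the new counter value
is saved on the next flush().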

Definition at line 151 of file indexerconnection.py.

00151 
00152     def _allocate_id(self):
00153         """Allocate a new ID.
00154 
00155         """
00156         while True:
00157             idstr = "%x" % self._next_docid
00158             self._next_docid += 1
00159             if not self._index.term_exists('Q' + idstr):
00160                 break
00161         self._config_modified = True
00162         return idstr
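
The allocation loop can be tried out without Xapian. In this minimal sketch, `allocate_id` is a hypothetical free function, a plain set of existing terms stands in for `self._index.term_exists()`, and the counter is passed in and returned rather than stored on the connection.

```python
def allocate_id(next_docid, existing_terms):
    """Hypothetical stand-in for _allocate_id().

    Returns (idstr, next_docid); `existing_terms` is a set of 'Q' unique-ID
    terms that replaces the real index lookup.
    """
    while True:
        idstr = "%x" % next_docid          # IDs are lowercase hexadecimal
        next_docid += 1
        if "Q" + idstr not in existing_terms:
            return idstr, next_docid


# "9" and "a" are already taken, so allocation starting at 9 moves on to "b".
print(allocate_id(9, {"Q9", "Qa"}))  # ('b', 12)
```

Because the counter only moves forward, IDs that callers supplied explicitly (such as "a" above) are skipped rather than reused.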

Raise an error if facet is not a declared facet field.

Definition at line 394 of file indexerconnection.py.

00394 
00395     def _assert_facet(self, facet):
00396         """Raise an error if facet is not a declared facet field.
00397 
00398         """
00399         for action in self._field_actions[facet]._actions:
00400             if action == FieldActions.FACET:
00401                 return
00402         raise errors.IndexerError("Field %r is not indexed as a facet" % facet)
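
The check relies on two different failures: an unknown field name fails on the dictionary lookup with a KeyError, while a known field without a facet action raises IndexerError. The sketch below reproduces that contract with hypothetical stand-ins: a dict from field name to a list of action names replaces `self._field_actions`, and the string "facet" replaces `FieldActions.FACET`.

```python
class IndexerError(Exception):
    """Hypothetical stand-in for xappy's errors.IndexerError."""


FACET = "facet"  # hypothetical stand-in for FieldActions.FACET


def assert_facet(field_actions, facet):
    # An unknown field raises KeyError from this lookup.
    for action in field_actions[facet]:
        if action == FACET:
            return
    # Known field, but it was never declared as a facet.
    raise IndexerError("Field %r is not indexed as a facet" % facet)


actions = {"colour": ["index_exact", FACET], "title": ["index_freetext"]}
assert_facet(actions, "colour")        # passes silently
try:
    assert_facet(actions, "title")
except IndexerError as e:
    print(e)                           # Field 'title' is not indexed as a facet
```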

Get an estimate of the bytes used by the terms in a document.

(This is a very rough estimate.)

Definition at line 234 of file indexerconnection.py.

00234 
00235     def _get_bytes_used_by_doc_terms(self, xapdoc):
00236         """Get an estimate of the bytes used by the terms in a document.
00237 
00238         (This is a very rough estimate.)
00239 
00240         """
00241         count = 0
00242         for item in xapdoc.termlist():
00243             # The term may also be stored in the spelling correction table, so
00244             # double the amount used.
00245             count += len(item.term) * 2
00246 
00247             # Add a few more bytes for holding the wdf, and other bits and
00248             # pieces.
00249             count += 8
00250 
00251         # Empirical observations indicate that about 5 times as much memory as
00252         # the above calculation predicts is used for buffering in practice.
00253         return count * 5
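
Written out, the estimate is 5 * sum(2 * len(t) + 8) over the document's terms t. The sketch below applies the same arithmetic to a plain list of term strings, which stands in for the items returned by `xapdoc.termlist()`; `estimate_term_bytes` is a hypothetical name.

```python
def estimate_term_bytes(terms):
    """Rough buffer estimate using the same arithmetic as
    _get_bytes_used_by_doc_terms(); `terms` is a list of term strings."""
    count = 0
    for term in terms:
        count += len(term) * 2  # the term may also be stored for spelling
        count += 8              # wdf and other per-term overhead
    return count * 5            # empirical correction factor


# ((2*2 + 8) + (5*2 + 8)) * 5 = (12 + 18) * 5 = 150
print(estimate_term_bytes(["Qa", "hello"]))  # 150
```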

Load the configuration for the database.

Definition at line 130 of file indexerconnection.py.

00130 
00131     def _load_config(self):
00132         """Load the configuration for the database.
00133 
00134         """
00135         assert self._index is not None
00136 
00137         config_str = log(self._index.get_metadata, '_xappy_config')
00138         if len(config_str) == 0:
00139             return
00140 
00141         try:
00142             (self._field_actions, mappings, self._facet_hierarchy, self._facet_query_table, self._next_docid) = cPickle.loads(config_str)
00143         except ValueError:
00144             # Backwards compatibility - configuration used to lack _facet_hierarchy and _facet_query_table
00145             (self._field_actions, mappings, self._next_docid) = cPickle.loads(config_str)
00146             self._facet_hierarchy = {}
00147             self._facet_query_table = {}
00148         self._field_mappings = fieldmappings.FieldMappings(mappings)
00149 
00150         self._config_modified = False
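
The backwards-compatibility branch works because unpacking a 3-tuple into five names raises ValueError. This sketch shows the same fallback as a hypothetical free function `load_config`, using Python's `pickle` in place of `cPickle` and taking the stored string as an argument instead of reading '_xappy_config' from the index; `None` stands in for a fresh `FieldMappings` object.

```python
import pickle


def load_config(config_str):
    """Return (field_actions, mappings, facet_hierarchy, facet_query_table,
    next_docid) from a pickled configuration, accepting the old layout."""
    if len(config_str) == 0:
        # Nothing stored yet: keep the defaults set up by the constructor.
        return {}, None, {}, {}, 0
    try:
        (field_actions, mappings, facet_hierarchy,
         facet_query_table, next_docid) = pickle.loads(config_str)
    except ValueError:
        # Older configurations stored only three items.
        (field_actions, mappings, next_docid) = pickle.loads(config_str)
        facet_hierarchy = {}
        facet_query_table = {}
    return field_actions, mappings, facet_hierarchy, facet_query_table, next_docid


old = pickle.dumps(({"title": []}, {"title": "XA"}, 3), 2)  # old 3-item layout
print(load_config(old))  # ({'title': []}, {'title': 'XA'}, {}, {}, 3)
```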

def MoinMoin.support.xappy.indexerconnection.IndexerConnection._make_synonym_key(self, original, field) [private]
Make a synonym key (ie, the term or group of terms to store in
xapian).

Definition at line 326 of file indexerconnection.py.

00326 
00327     def _make_synonym_key(self, original, field):
00328         """Make a synonym key (ie, the term or group of terms to store in
00329         xapian).
00330 
00331         """
00332         if field is not None:
00333             prefix = self._field_mappings.get_prefix(field)
00334         else:
00335             prefix = ''
00336         original = original.lower()
00337         # Add the prefix to the start of each word.
00338         return ' '.join((prefix + word for word in original.split(' ')))
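
The key format is easiest to see with concrete values. In this sketch a plain dict stands in for `FieldMappings.get_prefix()`, and the prefix "XA" for a `title` field is made up for illustration.

```python
def make_synonym_key(original, field, prefixes):
    """Sketch of _make_synonym_key(); `prefixes` maps field name to term prefix."""
    prefix = prefixes[field] if field is not None else ""
    original = original.lower()
    # Prefix every space-separated word.
    return " ".join(prefix + word for word in original.split(" "))


prefixes = {"title": "XA"}  # hypothetical field-to-prefix mapping
print(make_synonym_key("Foo Bar", "title", prefixes))   # XAfoo XAbar
print(make_synonym_key("Foo Bar", None, prefixes))      # foo bar
print(make_synonym_key("foo  bar", "title", prefixes))  # XAfoo XA XAbar
```

The last call shows why add_synonym() asks for words separated by a single space: splitting on " " turns a double space into an empty word, which still receives the prefix.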

Store the configuration for the database.

The field actions, field mappings, facet hierarchy, facet query table and
next document ID are pickled together and stored as database metadata
under the key '_xappy_config', so they are committed along with the other
changes to the database rather than kept in a separate file.

Definition at line 108 of file indexerconnection.py.

00108 
00109     def _store_config(self):
00110         """Store the configuration for the database.
00111 
00112         The configuration is pickled and stored as database metadata under
00113         the key '_xappy_config', so changes to it are committed together
00114         with the other changes to the database, rather than being kept in a
00115         separate file in the database directory.
00116 
00117         """
00118         assert self._index is not None
00119 
00120         config_str = cPickle.dumps((
00121                                      self._field_actions,
00122                                      self._field_mappings.serialise(),
00123                                      self._facet_hierarchy,
00124                                      self._facet_query_table,
00125                                      self._next_docid,
00126                                     ), 2)
00127         log(self._index.set_metadata, '_xappy_config', config_str)
00128 
00129         self._config_modified = False

Add a new document to the search engine index.

If the document has an ID set, and that ID already exists in
the database, an exception will be raised.  Use the replace() method
instead if you wish to overwrite documents.

Returns the ID of the newly added document (making up a new
unique ID if no ID was set).

The supplied document may be an instance of UnprocessedDocument, or an
instance of ProcessedDocument.

Definition at line 254 of file indexerconnection.py.

00254 
00255     def add(self, document):
00256         """Add a new document to the search engine index.
00257 
00258         If the document has an id set, and the id already exists in
00259         the database, an exception will be raised.  Use the replace() method
00260         instead if you wish to overwrite documents.
00261 
00262         Returns the id of the newly added document (making up a new
00263         unique ID if no id was set).
00264 
00265         The supplied document may be an instance of UnprocessedDocument, or an
00266         instance of ProcessedDocument.
00267 
00268         """
00269         if self._index is None:
00270             raise errors.IndexerError("IndexerConnection has been closed")
00271         if not hasattr(document, '_doc'):
00272             # It's not a processed document.
00273             document = self.process(document)
00274 
00275         # Ensure that we have a id
00276         orig_id = document.id
00277         if orig_id is None:
00278             id = self._allocate_id()
00279             document.id = id
00280         else:
00281             id = orig_id
00282             if self._index.term_exists('Q' + id):
00283                 raise errors.IndexerError("Document ID of document supplied to add() is not unique.")
00284             
00285         # Add the document.
00286         xapdoc = document.prepare()
00287         self._index.add_document(xapdoc)
00288 
00289         if self._max_mem is not None:
00290             self._mem_buffered += self._get_bytes_used_by_doc_terms(xapdoc)
00291             if self._mem_buffered > self._max_mem:
00292                 self.flush()
00293 
00294         if id is not orig_id:
00295             document.id = orig_id
00296         return id
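
The ID handling and memory-based flushing in add() can be modelled without Xapian. `ToyIndexer` below is hypothetical: a dict from unique ID to a list of terms stands in for the database, documents are plain term lists rather than ProcessedDocument objects, and the memory estimate leaves out the 'Q' ID term.

```python
class IndexerError(Exception):
    """Hypothetical stand-in for xappy's errors.IndexerError."""


class ToyIndexer(object):
    """Hypothetical in-memory model of add()'s ID and memory handling."""

    def __init__(self, max_mem=None):
        self.docs = {}            # unique ID -> list of terms
        self.next_docid = 0
        self.max_mem = max_mem    # None disables automatic flushing
        self.mem_buffered = 0
        self.flushes = 0

    def _allocate_id(self):
        while True:
            idstr = "%x" % self.next_docid
            self.next_docid += 1
            if idstr not in self.docs:
                return idstr

    def flush(self):
        self.flushes += 1
        self.mem_buffered = 0

    def add(self, terms, docid=None):
        if docid is None:
            docid = self._allocate_id()
        elif docid in self.docs:
            raise IndexerError("Document ID of document supplied to add() "
                               "is not unique.")
        self.docs[docid] = list(terms)
        if self.max_mem is not None:
            # Same rough estimate as _get_bytes_used_by_doc_terms().
            self.mem_buffered += sum(len(t) * 2 + 8 for t in terms) * 5
            if self.mem_buffered > self.max_mem:
                self.flush()
        return docid


idx = ToyIndexer(max_mem=100)
print(idx.add(["hello"]))           # 0  (buffers 90 bytes)
print(idx.add(["hi"], docid="x"))   # x  (150 > 100, so it flushes)
```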

def MoinMoin.support.xappy.indexerconnection.IndexerConnection.add_field_action(self, fieldname, fieldtype, **kwargs)
Add an action to be performed on a field.

Note that this change to the configuration will not be preserved on
disk until the next call to flush().

Definition at line 163 of file indexerconnection.py.

00163 
00164     def add_field_action(self, fieldname, fieldtype, **kwargs):
00165         """Add an action to be performed on a field.
00166 
00167         Note that this change to the configuration will not be preserved on
00168         disk until the next call to flush().
00169 
00170         """
00171         if self._index is None:
00172             raise errors.IndexerError("IndexerConnection has been closed")
00173         if fieldname in self._field_actions:
00174             actions = self._field_actions[fieldname]
00175         else:
00176             actions = FieldActions(fieldname)
00177             self._field_actions[fieldname] = actions
00178         actions.add(self._field_mappings, fieldtype, **kwargs)
00179         self._config_modified = True

Add a subfacet-facet relationship to the facet hierarchy.

Any existing relationship for that subfacet is replaced.

Raises a KeyError if either facet or subfacet is not a field,
and an IndexerError if either facet or subfacet is not a facet field.

Definition at line 403 of file indexerconnection.py.

00403 
00404     def add_subfacet(self, subfacet, facet):
00405         """Add a subfacet-facet relationship to the facet hierarchy.
00406         
00407         Any existing relationship for that subfacet is replaced.
00408 
00409         Raises a KeyError if either facet or subfacet is not a field,
00410         and an IndexerError if either facet or subfacet is not a facet field.
00411         """
00412         if self._index is None:
00413             raise errors.IndexerError("IndexerConnection has been closed")
00414         self._assert_facet(facet)
00415         self._assert_facet(subfacet)
00416         self._facet_hierarchy[subfacet] = facet
00417         self._config_modified = True

def MoinMoin.support.xappy.indexerconnection.IndexerConnection.add_synonym(self, original, synonym, field=None, original_field=None, synonym_field=None)
Add a synonym to the index.

 - `original` is the word or words which will be synonym expanded in
   searches (if multiple words are specified, each word should be
   separated by a single space).
 - `synonym` is a synonym for `original`.
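 - `field` is the field which the synonym is specific to.  If no field
   is specified, the synonym will be used for searches which are not
   specific to any particular field.
 - `original_field` and `synonym_field`, if given, override `field` for
   `original` and `synonym` respectively; each defaults to `field`.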

Definition at line 340 of file indexerconnection.py.

00340     def add_synonym(self, original, synonym, field=None,
00341                     original_field=None, synonym_field=None):
00342         """Add a synonym to the index.
00343 
00344          - `original` is the word or words which will be synonym expanded in
00345            searches (if multiple words are specified, each word should be
00346            separated by a single space).
00347          - `synonym` is a synonym for `original`.
00348          - `field` is the field which the synonym is specific to.  If no field
00349            is specified, the synonym will be used for searches which are not
00350            specific to any particular field.
00351 
00352         """
00353         if self._index is None:
00354             raise errors.IndexerError("IndexerConnection has been closed")
00355         if original_field is None:
00356             original_field = field
00357         if synonym_field is None:
00358             synonym_field = field
00359         key = self._make_synonym_key(original, original_field)
00360         # FIXME - this only works for exact fields which have no upper case
00361         # characters, or single words
00362         value = self._make_synonym_key(synonym, synonym_field)
00363         self._index.add_synonym(key, value)

Clear all actions for the specified field.

This does not report an error if there are already no actions for the
specified field.

Note that this change to the configuration will not be preserved on
disk until the next call to flush().

Definition at line 180 of file indexerconnection.py.

00180 
00181     def clear_field_actions(self, fieldname):
00182         """Clear all actions for the specified field.
00183 
00184         This does not report an error if there are already no actions for the
00185         specified field.
00186 
00187         Note that this change to the configuration will not be preserved on
00188         disk until the next call to flush().
00189 
00190         """
00191         if self._index is None:
00192             raise errors.IndexerError("IndexerConnection has been closed")
00193         if fieldname in self._field_actions:
00194             del self._field_actions[fieldname]
00195             self._config_modified = True

def MoinMoin.support.xappy.indexerconnection.IndexerConnection.clear_synonyms(self, original, field=None)
Remove all synonyms for a word (or phrase).

 - `field` is the field which this synonym is specific to.  If no field
   is specified, the synonym will be used for searches which are not
   specific to any particular field.

Definition at line 381 of file indexerconnection.py.

00381 
00382     def clear_synonyms(self, original, field=None):
00383         """Remove all synonyms for a word (or phrase).
00384 
00385          - `field` is the field which this synonym is specific to.  If no field
00386            is specified, the synonym will be used for searches which are not
00387            specific to any particular field.
00388 
00389         """
00390         if self._index is None:
00391             raise errors.IndexerError("IndexerConnection has been closed")
00392         key = self._make_synonym_key(original, field)
00393         self._index.clear_synonyms(key)

Close the connection to the database.

It is important to call this method before allowing the object to be
garbage collected, because it ensures that any unflushed changes
are flushed.  It also ensures that the connection is cleaned up
promptly.

No other methods may be called on the connection after this has been
called.  (It is permissible to call close() multiple times, but
only the first call will have any effect.)

If an exception occurs, the database will be closed, but changes since
the last call to flush() may be lost.

Definition at line 545 of file indexerconnection.py.

00545 
00546     def close(self):
00547         """Close the connection to the database.
00548 
00549         It is important to call this method before allowing the class to be
00550         garbage collected, because it will ensure that any un-flushed changes
00551         will be flushed.  It also ensures that the connection is cleaned up
00552         promptly.
00553 
00554         No other methods may be called on the connection after this has been
00555         called.  (It is permissible to call close() multiple times, but
00556         only the first call will have any effect.)
00557 
00558         If an exception occurs, the database will be closed, but changes since
00559         the last call to flush may be lost.
00560 
00561         """
00562         if self._index is None:
00563             return
00564         try:
00565             self.flush()
00566         finally:
00567             # There is currently no "close()" method for xapian databases, so
00568             # we have to rely on the garbage collector.  Since we never copy
00569             # the _index property out of this class, there should be no cycles,
00570             # so the standard python implementation should garbage collect
00571             # _index straight away.  A close() method is planned to be added to
00572             # xapian at some point - when it is, we should call it here to make
00573             # the code more robust.
00574             self._index = None
00575             self._indexpath = None
00576             self._field_actions = None
00577             self._config_modified = False
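
The try/finally structure is what makes close() both idempotent and safe: the index handle is dropped even if the final flush fails. `ToyConnection` below is a hypothetical model of just that control flow, with a plain object standing in for the Xapian database and an optional simulated flush failure.

```python
class ToyConnection(object):
    """Hypothetical model of close()'s control flow."""

    def __init__(self, flush_fails=False):
        self.index = object()        # stand-in for the Xapian database
        self.flush_fails = flush_fails
        self.flush_calls = 0

    def flush(self):
        self.flush_calls += 1
        if self.flush_fails:
            raise IOError("simulated flush failure")

    def close(self):
        if self.index is None:
            return                   # already closed: later calls do nothing
        try:
            self.flush()
        finally:
            # Runs even if flush() raised, so the connection always ends up
            # closed; unflushed changes are lost in that case.
            self.index = None


conn = ToyConnection()
conn.close()
conn.close()                         # second call is a no-op
print(conn.flush_calls)              # 1
```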

Delete a document from the search engine index.

If the id does not already exist in the database, this method
will have no effect (and will not report an error).

Definition at line 520 of file indexerconnection.py.

00520 
00521     def delete(self, id):
00522         """Delete a document from the search engine index.
00523 
00524         If the id does not already exist in the database, this method
00525         will have no effect (and will not report an error).
00526 
00527         """
00528         if self._index is None:
00529             raise errors.IndexerError("IndexerConnection has been closed")
00530         self._index.delete_document('Q' + id)

Apply recent changes to the database.

If an exception occurs, any changes since the last call to flush() may
be lost.

Definition at line 531 of file indexerconnection.py.

00531 
00532     def flush(self):
00533         """Apply recent changes to the database.
00534 
00535         If an exception occurs, any changes since the last call to flush() may
00536         be lost.
00537 
00538         """
00539         if self._index is None:
00540             raise errors.IndexerError("IndexerConnection has been closed")
00541         if self._config_modified:
00542             self._store_config()
00543         self._index.flush()
00544         self._mem_buffered = 0
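
This is why the configuration-changing methods warn that their changes are not preserved until the next flush(): they only set `_config_modified`, and flush() writes the configuration when that flag is set. The hypothetical `ToyFlush` class below models the flag; `config_writes` and `commits` are counters standing in for calls to `_store_config()` and to the database's own flush.

```python
class ToyFlush(object):
    """Hypothetical model of flush()'s handling of the configuration flag."""

    def __init__(self):
        self.config_modified = False
        self.config_writes = 0       # stands in for _store_config() calls
        self.commits = 0             # stands in for self._index.flush() calls
        self.mem_buffered = 0

    def add_field_action(self):
        self.config_modified = True  # configuration changed, not yet stored

    def flush(self):
        if self.config_modified:
            self.config_writes += 1
            self.config_modified = False  # _store_config() clears the flag
        self.commits += 1
        self.mem_buffered = 0


f = ToyFlush()
f.flush()               # nothing to store: commit only
f.add_field_action()
f.flush()               # stores the configuration, then commits
f.flush()               # flag already cleared: commit only
print(f.config_writes)  # 1
print(f.commits)        # 3
```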

Count the number of documents in the database.

This count includes documents which have been added or removed but
not yet flushed.

Definition at line 578 of file indexerconnection.py.

00578 
00579     def get_doccount(self):
00580         """Count the number of documents in the database.
00581 
00582         This count will include documents which have been added or removed but
00583         not yet flushed().
00584 
00585         """
00586         if self._index is None:
00587             raise errors.IndexerError("IndexerConnection has been closed")
00588         return self._index.get_doccount()

Get the document with the specified unique ID.

Raises a KeyError if there is no such document.  Otherwise, it returns
a ProcessedDocument.

Definition at line 600 of file indexerconnection.py.

00600 
00601     def get_document(self, id):
00602         """Get the document with the specified unique ID.
00603 
00604         Raises a KeyError if there is no such document.  Otherwise, it returns
00605         a ProcessedDocument.
00606 
00607         """
00608         if self._index is None:
00609             raise errors.IndexerError("IndexerConnection has been closed")
00610         postlist = self._index.postlist('Q' + id)
00611         try:
00612             plitem = postlist.next()
00613         except StopIteration:
00614             # Unique ID not found
00615             raise KeyError('Unique ID %r not found' % id)
00616         try:
00617             postlist.next()
00618             raise errors.IndexerError("Multiple documents " #pragma: no cover
00619                                        "found with same unique ID")
00620         except StopIteration:
00621             # Only one instance of the unique ID found, as it should be.
00622             pass
00623 
00624         result = ProcessedDocument(self._field_mappings)
00625         result.id = id
00626         result._doc = self._index.get_document(plitem.docid)
00627         return result
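
The two next() calls implement an "exactly one match" check on the posting list for the 'Q' term. The same pattern works on any Python iterator; in this sketch, `single_match` is a hypothetical helper, a list stands in for the Xapian posting list, and a local IndexerError class stands in for xappy's.

```python
class IndexerError(Exception):
    """Hypothetical stand-in for xappy's errors.IndexerError."""


def single_match(postlist, unique_id):
    """Return the only item in `postlist`, mirroring get_document()'s checks."""
    it = iter(postlist)
    try:
        item = next(it)
    except StopIteration:
        raise KeyError("Unique ID %r not found" % unique_id)
    try:
        next(it)
    except StopIteration:
        return item                  # exactly one match, as it should be
    raise IndexerError("Multiple documents found with same unique ID")


print(single_match([42], "a"))       # 42
```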

Get the set of facets associated with a query type.

Only those facets associated with the query type in the specified
manner are returned; `association` must be one of
IndexerConnection.FacetQueryType_Preferred or
IndexerConnection.FacetQueryType_Never.

If the query type has no entry in the facet query table, None is
returned; if it has an entry but no facets with the requested
association, an empty set is returned.

Definition at line 463 of file indexerconnection.py.

00463 
00464     def get_facets_for_query_type(self, query_type, association):
00465         """Get the set of facets associated with a query type.
00466 
00467         Only those facets associated with the query type in the specified
00468         manner are returned; `association` must be one of
00469         IndexerConnection.FacetQueryType_Preferred or
00470         IndexerConnection.FacetQueryType_Never.
00471 
00472         If the query type has no facets associated with it, None is returned.
00473 
00474         """
00475         if self._index is None:
00476             raise errors.IndexerError("IndexerConnection has been closed")
00477         if query_type not in self._facet_query_table:
00478             return None
00479         facet_dict = self._facet_query_table[query_type]
00480         return set([facet for facet, assoc in facet_dict.iteritems() if assoc == association])
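
The facet query table is a dict from query type to a dict mapping facet name to association. The sketch below, a hypothetical `facets_for_query_type` operating on a plain dict, shows the difference between an unknown query type, which gives None, and a known query type with no facets of the requested association, which gives an empty set.

```python
FacetQueryType_Preferred = 1
FacetQueryType_Never = 2


def facets_for_query_type(facet_query_table, query_type, association):
    # facet_query_table: query type -> {facet name: association}
    if query_type not in facet_query_table:
        return None
    facet_dict = facet_query_table[query_type]
    return set(f for f, assoc in facet_dict.items() if assoc == association)


table = {"type1": {"foo": FacetQueryType_Preferred,
                   "bar": FacetQueryType_Never},
         "type2": {"bar": FacetQueryType_Preferred}}
print(facets_for_query_type(table, "type3", FacetQueryType_Never))  # None
empty = facets_for_query_type(table, "type2", FacetQueryType_Never)
print(len(empty))  # 0: an empty set rather than None
```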

Get a list of field names which have actions defined.

Definition at line 196 of file indexerconnection.py.

00196 
00197     def get_fields_with_actions(self):
00198         """Get a list of field names which have actions defined.
00199 
00200         """
00201         if self._index is None:
00202             raise errors.IndexerError("IndexerConnection has been closed")
00203         return self._field_actions.keys()

Get an item of metadata stored in the connection.

This returns a value stored by a previous call to set_metadata.

If the value is not found, this will return the empty string.

Definition at line 506 of file indexerconnection.py.

00506 
00507     def get_metadata(self, key):
00508         """Get an item of metadata stored in the connection.
00509 
00510         This returns a value stored by a previous call to set_metadata.
00511 
00512         If the value is not found, this will return the empty string.
00513 
00514         """
00515         if self._index is None:
00516             raise errors.IndexerError("IndexerConnection has been closed")
00517         if not hasattr(self._index, 'get_metadata'):
00518             raise errors.IndexerError("Version of xapian in use does not support metadata")
00519         return log(self._index.get_metadata, key)

Get a list of subfacets of a facet.

Definition at line 428 of file indexerconnection.py.

00428 
00429     def get_subfacets(self, facet):
00430         """Get a list of subfacets of a facet.
00431 
00432         """
00433         if self._index is None:
00434             raise errors.IndexerError("IndexerConnection has been closed")
00435         return [k for k, v in self._facet_hierarchy.iteritems() if v == facet] 
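
Because `_facet_hierarchy` maps each subfacet to its single parent, add_subfacet() is a plain assignment (so re-adding a subfacet replaces its parent) and get_subfacets() has to scan every entry. A minimal sketch on a plain dict; `get_subfacets` here is a hypothetical free function.

```python
def get_subfacets(facet_hierarchy, facet):
    # facet_hierarchy: subfacet -> parent facet
    return [sub for sub, parent in facet_hierarchy.items() if parent == facet]


hierarchy = {}
hierarchy["foo"] = "bar"   # like add_subfacet('foo', 'bar')
hierarchy["baz"] = "bar"   # like add_subfacet('baz', 'bar')
hierarchy["foo"] = "qux"   # re-adding 'foo' replaces its parent
print(sorted(get_subfacets(hierarchy, "bar")))  # ['baz']
```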

Get an iterator over query types and their associated facets.

Only facets associated with the query types in the specified manner
are returned; `association` must be one of IndexerConnection.FacetQueryType_Preferred
or IndexerConnection.FacetQueryType_Never.

The iterator returns 2-tuples, in which the first item is the query
type and the second item is the associated set of facets.

The return values are suitable for the dict() builtin, for example:

 >>> conn = IndexerConnection('db')
 >>> conn.add_field_action('foo', FieldActions.FACET)
 >>> conn.add_field_action('bar', FieldActions.FACET)
 >>> conn.add_field_action('baz', FieldActions.FACET)
 >>> conn.set_facet_for_query_type('type1', 'foo', conn.FacetQueryType_Preferred)
 >>> conn.set_facet_for_query_type('type1', 'bar', conn.FacetQueryType_Never)
 >>> conn.set_facet_for_query_type('type1', 'baz', conn.FacetQueryType_Never)
 >>> conn.set_facet_for_query_type('type2', 'bar', conn.FacetQueryType_Preferred)
 >>> dict(conn.iter_facet_query_types(conn.FacetQueryType_Preferred))
 {'type1': set(['foo']), 'type2': set(['bar'])}
 >>> dict(conn.iter_facet_query_types(conn.FacetQueryType_Never))
 {'type1': set(['bar', 'baz'])}

Definition at line 679 of file indexerconnection.py.

00679 
00680     def iter_facet_query_types(self, association):
00681         """Get an iterator over query types and their associated facets.
00682 
00683         Only facets associated with the query types in the specified manner
00684         are returned; `association` must be one of IndexerConnection.FacetQueryType_Preferred
00685         or IndexerConnection.FacetQueryType_Never.
00686 
00687         The iterator returns 2-tuples, in which the first item is the query
00688         type and the second item is the associated set of facets.
00689 
00690         The return values are suitable for the dict() builtin, for example:
00691 
00692          >>> conn = IndexerConnection('db')
00693          >>> conn.add_field_action('foo', FieldActions.FACET)
00694          >>> conn.add_field_action('bar', FieldActions.FACET)
00695          >>> conn.add_field_action('baz', FieldActions.FACET)
00696          >>> conn.set_facet_for_query_type('type1', 'foo', conn.FacetQueryType_Preferred)
00697          >>> conn.set_facet_for_query_type('type1', 'bar', conn.FacetQueryType_Never)
00698          >>> conn.set_facet_for_query_type('type1', 'baz', conn.FacetQueryType_Never)
00699          >>> conn.set_facet_for_query_type('type2', 'bar', conn.FacetQueryType_Preferred)
00700          >>> dict(conn.iter_facet_query_types(conn.FacetQueryType_Preferred))
00701          {'type1': set(['foo']), 'type2': set(['bar'])}
00702          >>> dict(conn.iter_facet_query_types(conn.FacetQueryType_Never))
00703          {'type1': set(['bar', 'baz'])}
00704 
00705         """
00706         if self._index is None:
00707             raise errors.IndexerError("IndexerConnection has been closed")
00708         if 'facets' in _checkxapian.missing_features:
00709             raise errors.IndexerError("Facets unsupported with this release of xapian")
00710         return FacetQueryTypeIter(self._facet_query_table, association)
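
FacetQueryTypeIter itself is not shown on this page. Judging from the doctest above, it yields a (query type, set of facets) pair for each query type that has at least one facet with the requested association. The generator below is a sketch under that assumption, reusing the doctest's data as a plain dict; it is not xappy's implementation.

```python
FacetQueryType_Preferred = 1
FacetQueryType_Never = 2


def iter_facet_query_types(facet_query_table, association):
    """Yield (query_type, facets) pairs; query types with no matching
    facets are skipped (assumed from the doctest output)."""
    for query_type, facet_dict in facet_query_table.items():
        facets = set(f for f, assoc in facet_dict.items() if assoc == association)
        if facets:
            yield query_type, facets


table = {"type1": {"foo": FacetQueryType_Preferred,
                   "bar": FacetQueryType_Never,
                   "baz": FacetQueryType_Never},
         "type2": {"bar": FacetQueryType_Preferred}}
never = dict(iter_facet_query_types(table, FacetQueryType_Never))
print(sorted(never))  # ['type1']
```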

Get an iterator over the facet hierarchy.

The iterator returns 2-tuples, in which the first item is the
subfacet and the second item is its parent facet.

The return values are suitable for the dict() builtin, for example:

 >>> conn = IndexerConnection('db')
 >>> conn.add_field_action('foo', FieldActions.FACET)
 >>> conn.add_field_action('bar', FieldActions.FACET)
 >>> conn.add_field_action('baz', FieldActions.FACET)
 >>> conn.add_subfacet('foo', 'bar')
 >>> conn.add_subfacet('baz', 'bar')
 >>> dict(conn.iter_subfacets())
 {'foo': 'bar', 'baz': 'bar'}

Definition at line 655 of file indexerconnection.py.

00655 
00656     def iter_subfacets(self):
00657         """Get an iterator over the facet hierarchy.
00658 
00659         The iterator returns 2-tuples, in which the first item is the
00660         subfacet and the second item is its parent facet.
00661 
00662         The return values are suitable for the dict() builtin, for example:
00663 
00664          >>> conn = IndexerConnection('db')
00665          >>> conn.add_field_action('foo', FieldActions.FACET)
00666          >>> conn.add_field_action('bar', FieldActions.FACET)
00667          >>> conn.add_field_action('baz', FieldActions.FACET)
00668          >>> conn.add_subfacet('foo', 'bar')
00669          >>> conn.add_subfacet('baz', 'bar')
00670          >>> dict(conn.iter_subfacets())
00671          {'foo': 'bar', 'baz': 'bar'}
00672 
00673         """
00674         if self._index is None:
00675             raise errors.IndexerError("IndexerConnection has been closed")
00676         if 'facets' in _checkxapian.missing_features:
00677             raise errors.IndexerError("Facets unsupported with this release of xapian")
00678         return self._facet_hierarchy.iteritems()

Get an iterator over the synonyms.

 - `prefix`: if specified, only synonym keys with this prefix will be
   returned.

The iterator returns 2-tuples, in which the first item is the key (ie,
a 2-tuple holding the term or terms which will be synonym expanded,
followed by the fieldname specified (or None if no fieldname)), and the
second item is a tuple of strings holding the synonyms for the first
item.

These return values are suitable for the dict() builtin, so you can
write things like:

 >>> conn = IndexerConnection('foo')
 >>> conn.add_synonym('foo', 'bar')
 >>> conn.add_synonym('foo bar', 'baz')
 >>> conn.add_synonym('foo bar', 'foo baz')
 >>> dict(conn.iter_synonyms())
 {('foo', None): ('bar',), ('foo bar', None): ('baz', 'foo baz')}

Definition at line 628 of file indexerconnection.py.

00628 
00629     def iter_synonyms(self, prefix=""):
00630         """Get an iterator over the synonyms.
00631 
00632          - `prefix`: if specified, only synonym keys with this prefix will be
00633            returned.
00634 
00635         The iterator returns 2-tuples, in which the first item is the key (ie,
00636         a 2-tuple holding the term or terms which will be synonym expanded,
00637         followed by the fieldname specified (or None if no fieldname)), and the
00638         second item is a tuple of strings holding the synonyms for the first
00639         item.
00640 
00641         These return values are suitable for the dict() builtin, so you can
00642         write things like:
00643 
00644          >>> conn = IndexerConnection('foo')
00645          >>> conn.add_synonym('foo', 'bar')
00646          >>> conn.add_synonym('foo bar', 'baz')
00647          >>> conn.add_synonym('foo bar', 'foo baz')
00648          >>> dict(conn.iter_synonyms())
00649          {('foo', None): ('bar',), ('foo bar', None): ('baz', 'foo baz')}
00650 
00651         """
00652         if self._index is None:
00653             raise errors.IndexerError("IndexerConnection has been closed")
00654         return SynonymIter(self._index, self._field_mappings, prefix)
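
Because iter_synonyms() yields ((original, fieldname), synonyms) pairs, its output can be reshaped with ordinary dictionary operations. As a sketch, the hypothetical helper below inverts it into a map from each (synonym, fieldname) pair to the originals that expand to it. It uses the sample data from the doctest above in place of a live connection.

```python
from collections import defaultdict


def invert_synonyms(pairs):
    """Map each (synonym, fieldname) to the sorted originals expanding to it.

    `pairs` is an iterable of ((original, fieldname), synonyms) 2-tuples,
    shaped like the items yielded by iter_synonyms().
    """
    inverted = defaultdict(set)
    for (original, fieldname), synonyms in pairs:
        for synonym in synonyms:
            inverted[(synonym, fieldname)].add(original)
    return dict((key, sorted(originals)) for key, originals in inverted.items())


# Same data as the doctest output of dict(conn.iter_synonyms()).
pairs = [(('foo', None), ('bar',)), (('foo bar', None), ('baz', 'foo baz'))]
print(invert_synonyms(pairs))
```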

Get an iterator which returns all the ids in the database.

The unique ids are currently returned in binary lexicographical sort
order, but this should not be relied on.

Definition at line 589 of file indexerconnection.py.

00589 
00590     def iterids(self):
00591         """Get an iterator which returns all the ids in the database.
00592 
00593         The unique ids are currently returned in binary lexicographical sort
00594         order, but this should not be relied on.
00595 
00596         """
00597         if self._index is None:
00598             raise errors.IndexerError("IndexerConnection has been closed")
00599         return PrefixedTermIter('Q', self._index.allterms())
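
Document ids are stored as terms with a 'Q' prefix (replace() indexes each document under 'Q' + id), and iterids() returns those terms with the prefix stripped. The sketch below is a hypothetical, simplified stand-in for PrefixedTermIter that scans a plain list of terms linearly; the real iterator may skip ahead more efficiently. Because the iteration order should not be relied on, the example sorts explicitly.

```python
def iter_prefixed(prefix, terms):
    """Yield the part after `prefix` of every term that starts with it.

    Hypothetical stand-in for PrefixedTermIter: `terms` is a plain
    iterable of strings rather than a xapian term iterator.
    """
    for term in terms:
        if term.startswith(prefix):
            yield term[len(prefix):]


# Made-up term list: 'Q'-prefixed id terms mixed with other terms.
all_terms = ['Qdoc1', 'Qdoc10', 'Qdoc2', 'XTtitle', 'hello']
ids = sorted(iter_prefixed('Q', all_terms))  # sort explicitly if order matters
print(ids)  # ['doc1', 'doc10', 'doc2']
```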

Process an UnprocessedDocument with the settings in this database.

The resulting ProcessedDocument is returned.

Note that this processing will be automatically performed if an
UnprocessedDocument is supplied to the add() or replace() methods of
IndexerConnection.  This method is exposed to allow the processing to
be performed separately, which may be desirable if you wish to manually
modify the processed document before adding it to the database, or if
you want to split processing of documents from adding documents to the
database for performance reasons.

Definition at line 204 of file indexerconnection.py.

00204 
00205     def process(self, document):
00206         """Process an UnprocessedDocument with the settings in this database.
00207 
00208         The resulting ProcessedDocument is returned.
00209 
00210         Note that this processing will be automatically performed if an
00211         UnprocessedDocument is supplied to the add() or replace() methods of
00212         IndexerConnection.  This method is exposed to allow the processing to
00213         be performed separately, which may be desirable if you wish to manually
00214         modify the processed document before adding it to the database, or if
00215         you want to split processing of documents from adding documents to the
00216         database for performance reasons.
00217 
00218         """
00219         if self._index is None:
00220             raise errors.IndexerError("IndexerConnection has been closed")
00221         result = ProcessedDocument(self._field_mappings)
00222         result.id = document.id
00223         context = ActionContext(self._index)
00224 
00225         for field in document.fields:
00226             try:
00227                 actions = self._field_actions[field.name]
00228             except KeyError:
00229                 # If no actions are defined, just ignore the field.
00230                 continue
00231             actions.perform(result, field.value, context)
00232 
00233         return result
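
The loop in process() looks up each field's actions by name and silently skips fields that have none. The sketch below reproduces that dispatch with plain Python objects. FieldActionsStub and process_fields are hypothetical stand-ins for the real field-action objects and ProcessedDocument; here the "result" is just a list of (action, value) pairs.

```python
class FieldActionsStub(object):
    """Hypothetical stand-in for the actions configured for one field."""

    def __init__(self, tag):
        self.tag = tag

    def perform(self, result, value, context):
        # Same call shape as actions.perform(result, field.value, context).
        result.append((self.tag, value))


def process_fields(fields, field_actions, context=None):
    """Apply the configured actions to each (name, value) field."""
    result = []
    for name, value in fields:
        try:
            actions = field_actions[name]
        except KeyError:
            # As in process(): fields without actions are ignored.
            continue
        actions.perform(result, value, context)
    return result


actions = {'title': FieldActionsStub('index'), 'url': FieldActionsStub('store')}
doc_fields = [('title', 'Hello'), ('author', 'Ann'), ('url', 'http://example.com/')]
print(process_fields(doc_fields, actions))
# [('index', 'Hello'), ('store', 'http://example.com/')]
```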

Remove any existing facet hierarchy relationship for a subfacet.

Definition at line 418 of file indexerconnection.py.

00418 
00419     def remove_subfacet(self, subfacet):
00420         """Remove any existing facet hierarchy relationship for a subfacet.
00421 
00422         """
00423         if self._index is None:
00424             raise errors.IndexerError("IndexerConnection has been closed")
00425         if subfacet in self._facet_hierarchy:
00426             del self._facet_hierarchy[subfacet]
00427             self._config_modified = True

def MoinMoin.support.xappy.indexerconnection.IndexerConnection.remove_synonym (self, original, synonym, field = None)
Remove a synonym from the index.

 - `original` is the word or words which will be synonym expanded in
   searches (if multiple words are specified, each word should be
   separated by a single space).
 - `synonym` is a synonym for `original`.
 - `field` is the field which this synonym is specific to.  If no field
   is specified, the synonym will be used for searches which are not
   specific to any particular field.

Definition at line 364 of file indexerconnection.py.

00364 
00365     def remove_synonym(self, original, synonym, field=None):
00366         """Remove a synonym from the index.
00367 
00368          - `original` is the word or words which will be synonym expanded in
00369            searches (if multiple words are specified, each word should be
00370            separated by a single space).
00371          - `synonym` is a synonym for `original`.
00372          - `field` is the field which this synonym is specific to.  If no field
00373            is specified, the synonym will be used for searches which are not
00374            specific to any particular field.
00375 
00376         """
00377         if self._index is None:
00378             raise errors.IndexerError("IndexerConnection has been closed")
00379         key = self._make_synonym_key(original, field)
00380         self._index.remove_synonym(key, synonym.lower())
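
The docstring asks for multi-word `original` values to be separated by single spaces, while remove_synonym() itself already lowercases `synonym`. A caller accepting free-form input could normalize `original` first. normalize_original is a hypothetical helper; it relies on str.split(), which splits on any run of whitespace and drops leading and trailing whitespace.

```python
def normalize_original(text):
    """Return `text` with its words joined by exactly one space."""
    # str.split() with no argument splits on runs of any whitespace and
    # ignores leading and trailing whitespace.
    return ' '.join(text.split())


print(repr(normalize_original('  foo \t bar\n')))  # 'foo bar'
# Hypothetical usage with an open IndexerConnection `conn`:
# conn.remove_synonym(normalize_original(user_text), 'baz')
```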

Replace a document in the search engine index.

If the document does not have an id set, an IndexerError will be
raised.

If the document has an id set, and the id does not already
exist in the database, this method will have the same effect as add().

Definition at line 297 of file indexerconnection.py.

00297 
00298     def replace(self, document):
00299         """Replace a document in the search engine index.
00300 
00301         If the document does not have an id set, an exception will be
00302         raised.
00303 
00304         If the document has an id set, and the id does not already
00305         exist in the database, this method will have the same effect as add().
00306 
00307         """
00308         if self._index is None:
00309             raise errors.IndexerError("IndexerConnection has been closed")
00310         if not hasattr(document, '_doc'):
00311             # It's not a processed document.
00312             document = self.process(document)
00313 
00314         # Ensure that we have a id
00315         id = document.id
00316         if id is None:
00317             raise errors.IndexerError("No document ID set for document supplied to replace().")
00318 
00319         xapdoc = document.prepare()
00320         self._index.replace_document('Q' + id, xapdoc)
00321 
00322         if self._max_mem is not None:
00323             self._mem_buffered += self._get_bytes_used_by_doc_terms(xapdoc)
00324             if self._mem_buffered > self._max_mem:
00325                 self.flush()
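
replace() keeps a running estimate of buffered bytes and calls flush() once that estimate exceeds the configured limit. With no limit set, it never flushes on its own. The toy model below mirrors that control flow under stated assumptions: a document's cost is taken to be the total length of its terms, standing in for _get_bytes_used_by_doc_terms(), and flush() is assumed to reset the running total, which this section does not show.

```python
class BufferedIndexer(object):
    """Toy model of the flush-on-threshold logic in replace().

    Hypothetical: cost = total length of a document's terms, and flush()
    is assumed to reset the running total.
    """

    def __init__(self, max_mem=None):
        self.max_mem = max_mem  # None disables the limit, as in replace()
        self.mem_buffered = 0
        self.pending = []
        self.flush_count = 0

    def replace(self, doc_id, terms):
        terms = list(terms)  # allow any iterable, consumed only once
        self.pending.append(('Q' + doc_id, terms))
        if self.max_mem is not None:
            self.mem_buffered += sum(len(term) for term in terms)
            if self.mem_buffered > self.max_mem:
                self.flush()

    def flush(self):
        # A real flush would commit the pending changes to the database.
        self.pending = []
        self.mem_buffered = 0
        self.flush_count += 1


indexer = BufferedIndexer(max_mem=10)
indexer.replace('a', ['hello', 'world'])  # 10 bytes buffered: not over the limit
indexer.replace('b', ['x'])               # 11 bytes buffered: triggers flush()
print(indexer.flush_count)  # 1
```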

def MoinMoin.support.xappy.indexerconnection.IndexerConnection.set_facet_for_query_type (self, query_type, facet, association)
Set the association between a query type and a facet.

The value of `association` must be one of
IndexerConnection.FacetQueryType_Preferred,
IndexerConnection.FacetQueryType_Never or None. A value of None removes
any previously set association.

Definition at line 438 of file indexerconnection.py.

00438 
00439     def set_facet_for_query_type(self, query_type, facet, association):
00440         """Set the association between a query type and a facet.
00441 
00442         The value of `association` must be one of
00443         IndexerConnection.FacetQueryType_Preferred,
00444         IndexerConnection.FacetQueryType_Never or None. A value of None removes
00445         any previously set association.
00446 
00447         """
00448         if self._index is None:
00449             raise errors.IndexerError("IndexerConnection has been closed")
00450         if query_type is None:
00451             raise errors.IndexerError("Cannot set query type information for None")
00452         self._assert_facet(facet)
00453         if query_type not in self._facet_query_table:
00454             self._facet_query_table[query_type] = {}
00455         if association is None:
00456             if facet in self._facet_query_table[query_type]:
00457                 del self._facet_query_table[query_type][facet]
00458         else:
00459             self._facet_query_table[query_type][facet] = association
00460         if self._facet_query_table[query_type] == {}:
00461             del self._facet_query_table[query_type]
00462         self._config_modified = True
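
set_facet_for_query_type() maintains a two-level table, {query_type: {facet: association}}, in which None deletes an entry and a query type left with no facets is dropped. The sketch below expresses the same update with dict.setdefault() and dict.pop(). Unlike the original, it omits the _assert_facet() check and the _config_modified flag and raises ValueError instead of errors.IndexerError. The query type and facet names in the example are made up.

```python
PREFERRED = 1  # same value as IndexerConnection.FacetQueryType_Preferred
NEVER = 2      # same value as IndexerConnection.FacetQueryType_Never


def set_association(table, query_type, facet, association):
    """Update a {query_type: {facet: association}} table in place."""
    if query_type is None:
        raise ValueError("Cannot set query type information for None")
    inner = table.setdefault(query_type, {})
    if association is None:
        inner.pop(facet, None)  # remove any previously set association
    else:
        inner[facet] = association
    if not inner:
        del table[query_type]  # drop query types with no facets left


table = {}
set_association(table, 'keyword', 'colour', PREFERRED)
set_association(table, 'keyword', 'price', NEVER)
print(table)
set_association(table, 'keyword', 'colour', None)
set_association(table, 'keyword', 'price', None)
print(table)  # {}
```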

def MoinMoin.support.xappy.indexerconnection.IndexerConnection.set_max_mem_use (self, max_mem = None, max_mem_proportion = None)
Set the maximum memory to use.

This call sets the amount of memory used to buffer changes.  This will
affect the speed of indexing, but should not otherwise change the
results of indexing.

Note: this is an approximate measure; the actual amount of memory used
may exceed the specified amount.  Also, note that future versions of
xapian are likely to implement this differently, so this setting may be
entirely ignored.

The absolute amount of memory to use (in bytes) may be set by setting
max_mem.  Alternatively, the proportion of the available memory may be
set by setting max_mem_proportion (this should be a value between 0 and
1).

Setting too low a value will result in excessive flushing, and very
slow indexing.  Setting too high a value will result in excessive
buffering, leading to swapping, and very slow indexing.

A reasonable default for max_mem_proportion for a system which is
dedicated to indexing is probably 0.5: if other tasks are also being
performed on the system, the value should be lowered.

Definition at line 66 of file indexerconnection.py.

00066 
00067     def set_max_mem_use(self, max_mem=None, max_mem_proportion=None):
00068         """Set the maximum memory to use.
00069 
00070         This call allows the amount of memory to use to buffer changes to be
00071         set.  This will affect the speed of indexing, but should not result in
00072         other changes to the indexing.
00073 
00074         Note: this is an approximate measure - the actual amount of memory used
00075         may exceed the specified amount.  Also, note that future versions of
00076         xapian are likely to implement this differently, so this setting may be
00077         entirely ignored.
00078 
00079         The absolute amount of memory to use (in bytes) may be set by setting
00080         max_mem.  Alternatively, the proportion of the available memory may be
00081         set by setting max_mem_proportion (this should be a value between 0 and
00082         1).
00083 
00084         Setting too low a value will result in excessive flushing, and very
00085         slow indexing.  Setting too high a value will result in excessive
00086         buffering, leading to swapping, and very slow indexing.
00087 
00088         A reasonable default for max_mem_proportion for a system which is
00089         dedicated to indexing is probably 0.5: if other tasks are also being
00090         performed on the system, the value should be lowered.
00091 
00092         """
00093         if self._index is None:
00094             raise errors.IndexerError("IndexerConnection has been closed")
00095         if max_mem is not None and max_mem_proportion is not None:
00096             raise errors.IndexerError("Only one of max_mem and "
00097                                        "max_mem_proportion may be specified")
00098 
00099         if max_mem is None and max_mem_proportion is None:
00100             self._max_mem = None
00101 
00102         if max_mem_proportion is not None:
00103             physmem = memutils.get_physical_memory()
00104             if physmem is not None:
00105                 max_mem = int(physmem * max_mem_proportion)
00106 
00107         self._max_mem = max_mem
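
The limit that set_max_mem_use() stores can be computed without a database. The hypothetical function below follows the same rules as the method. The two arguments are mutually exclusive. A proportion is converted to bytes using the physical memory size. If that size is unknown (memutils.get_physical_memory() returned None in the original), no limit is set at all.

```python
def resolve_max_mem(max_mem=None, max_mem_proportion=None, physmem=None):
    """Return the byte limit that set_max_mem_use() would store, or None.

    `physmem` is the total physical memory in bytes; the original obtains
    it from memutils.get_physical_memory(), which may return None.
    """
    if max_mem is not None and max_mem_proportion is not None:
        raise ValueError("Only one of max_mem and max_mem_proportion may be specified")
    if max_mem_proportion is not None:
        if physmem is None:
            return None  # memory size unknown: no limit is applied
        return int(physmem * max_mem_proportion)
    return max_mem


eight_gib = 8 * 1024 ** 3
print(resolve_max_mem(max_mem_proportion=0.5, physmem=eight_gib))  # 4294967296
print(resolve_max_mem(max_mem=64 * 1024 ** 2))                     # 67108864
```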

Set an item of metadata stored in the connection.

The value supplied will be returned by subsequent calls to
get_metadata() which use the same key.

Keys with a leading underscore are reserved for internal use - you
should not use such keys unless you really know what you are doing.

This will store the value supplied in the database.  It will not be
visible to readers (i.e., search connections) until after the next flush.

The key is limited to about 200 characters (the same limit that applies
to terms).  The value can be several megabytes in size.

To remove an item of metadata, simply call this with a `value`
parameter containing an empty string.

Definition at line 481 of file indexerconnection.py.

00481 
00482     def set_metadata(self, key, value):
00483         """Set an item of metadata stored in the connection.
00484 
00485         The value supplied will be returned by subsequent calls to
00486         get_metadata() which use the same key.
00487 
00488         Keys with a leading underscore are reserved for internal use - you
00489         should not use such keys unless you really know what you are doing.
00490 
00491         This will store the value supplied in the database.  It will not be
00492         visible to readers (ie, search connections) until after the next flush.
00493 
00494         The key is limited to about 200 characters (the same length as a term
00495         is limited to).  The value can be several megabytes in size.
00496 
00497         To remove an item of metadata, simply call this with a `value`
00498         parameter containing an empty string.
00499 
00500         """
00501         if self._index is None:
00502             raise errors.IndexerError("IndexerConnection has been closed")
00503         if not hasattr(self._index, 'set_metadata'):
00504             raise errors.IndexerError("Version of xapian in use does not support metadata")
00505         log(self._index.set_metadata, key, value)
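
The removal-by-empty-string convention and the key restrictions described above can be modeled with a dictionary. InMemoryMetadata is a hypothetical stand-in, not part of xappy, and it makes several assumptions. It returns an empty string for a missing key. It rejects underscore-prefixed keys outright, whereas the real method only reserves them. It treats the approximate 200-character key limit as a hard cutoff. It does not model the rule that values become visible to readers only after a flush.

```python
class InMemoryMetadata(object):
    """Hypothetical dict-backed model of set_metadata()/get_metadata()."""

    MAX_KEY_LEN = 200  # assumption: the docstring only says "about 200 characters"

    def __init__(self):
        self._store = {}

    def set_metadata(self, key, value):
        if key.startswith('_'):
            # Policy choice for this sketch: the real method only reserves these keys.
            raise ValueError("keys with a leading underscore are reserved")
        if len(key) > self.MAX_KEY_LEN:
            raise ValueError("metadata key is too long")
        if value == '':
            self._store.pop(key, None)  # an empty value removes the item
        else:
            self._store[key] = value

    def get_metadata(self, key):
        # Assumption: a missing key reads back as an empty string.
        return self._store.get(key, '')


meta = InMemoryMetadata()
meta.set_metadata('schema_version', '3')
print(meta.get_metadata('schema_version'))  # 3
meta.set_metadata('schema_version', '')     # remove the item
print(repr(meta.get_metadata('schema_version')))  # ''
```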


Member Data Documentation

Definition at line 58 of file indexerconnection.py.

Definition at line 55 of file indexerconnection.py.

Definition at line 56 of file indexerconnection.py.

Definition at line 53 of file indexerconnection.py.

Definition at line 54 of file indexerconnection.py.

Definition at line 49 of file indexerconnection.py.

Definition at line 50 of file indexerconnection.py.

Definition at line 99 of file indexerconnection.py.

Definition at line 63 of file indexerconnection.py.

Definition at line 57 of file indexerconnection.py.

Definition at line 437 of file indexerconnection.py.

Definition at line 436 of file indexerconnection.py.


The documentation for this class was generated from the following file: