openldap  2.4.31
Reader Lock Table
MDB Internals

Readers don't acquire any locks for their data access.

Classes

struct  MDB_rxbody
 The information we store in a single slot of the reader table.
struct  MDB_reader
 The actual reader record, with cacheline padding.
union  MDB_reader.mru
struct  MDB_txbody
 The header for the reader table.
struct  MDB_txninfo
 The actual reader table definition.
union  MDB_txninfo.mt1
union  MDB_txninfo.mt2

Defines

#define DEFAULT_READERS   126
 Number of slots in the reader table.
#define CACHELINE   64
 The size of a CPU cache line in bytes.

Typedefs

typedef struct MDB_rxbody MDB_rxbody
 The information we store in a single slot of the reader table.
typedef struct MDB_reader MDB_reader
 The actual reader record, with cacheline padding.
typedef struct MDB_txbody MDB_txbody
 The header for the reader table.
typedef struct MDB_txninfo MDB_txninfo
 The actual reader table definition.

Detailed Description

Readers don't acquire any locks for their data access.

Instead, they simply record their transaction ID in the reader table. The reader mutex is needed just to find an empty slot in the reader table. The slot's address is saved in thread-specific data so that subsequent read transactions started by the same thread need no further locking to proceed.
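
As a rough illustration of the slot handling described above, the following sketch takes a mutex only when a thread starts its first read transaction, then caches the slot pointer in thread-specific data. The names `reader_slot`, `NUM_SLOTS`, `reader_begin` and the `in_use` flag are hypothetical and do not come from mdb.c; treating a transaction ID of 0 as "idle" is likewise an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 4              /* hypothetical: tiny table for illustration */

typedef struct {
    uint64_t txnid;              /* txn ID in use; 0 means idle (assumption) */
    int      in_use;             /* nonzero once a thread has claimed the slot */
} reader_slot;

static reader_slot     slots[NUM_SLOTS];
static pthread_mutex_t reader_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t   slot_key;
static pthread_once_t  slot_key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    pthread_key_create(&slot_key, NULL);
}

/* Begin a read transaction on the calling thread.
 * Returns the thread's slot, or NULL if every slot is taken. */
reader_slot *reader_begin(uint64_t current_txnid)
{
    assert(current_txnid != 0);
    pthread_once(&slot_key_once, make_key);

    reader_slot *slot = pthread_getspecific(slot_key);
    if (slot == NULL) {
        /* First read txn on this thread: lock only to find an empty slot. */
        pthread_mutex_lock(&reader_mutex);
        for (size_t i = 0; i < NUM_SLOTS; i++) {
            if (!slots[i].in_use) {
                slots[i].in_use = 1;
                slot = &slots[i];
                break;
            }
        }
        pthread_mutex_unlock(&reader_mutex);
        if (slot == NULL)
            return NULL;
        /* Remember the slot so later txns on this thread skip the mutex. */
        pthread_setspecific(slot_key, slot);
    }
    slot->txnid = current_txnid; /* no lock: only this thread writes here */
    return slot;
}
```

A second call to `reader_begin()` from the same thread returns the same slot and never touches the mutex.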

Since the database uses multi-version concurrency control, readers don't actually need any locking. This table is used to keep track of which readers are using data from which old transactions, so that we'll know when a particular old transaction is no longer in use. Old transactions that have discarded any data pages can then have those pages reclaimed for use by a later write transaction.

The lock table is constructed such that reader slots are aligned with the processor's cache line size. Any slot is only ever used by one thread. This alignment guarantees that there will be no contention or cache thrashing as threads update their own slot info, and also eliminates any need for locking when accessing a slot.
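
The padding idea can be sketched with a union whose second member is a byte array rounded up to a whole number of cache lines. This is a minimal illustration, not the definition from mdb.c: `slot_body`, `padded_slot`, `PADDED_SIZE` and `slot_stride` are hypothetical names, and the payload is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64   /* same value as the CACHELINE define on this page */

/* Hypothetical, simplified slot payload. */
typedef struct {
    uint64_t txnid;
    uint32_t owner;
} slot_body;

/* Round a type's size up to a whole number of cache lines. */
#define PADDED_SIZE(type) \
    ((sizeof(type) + (CACHELINE - 1)) & ~(size_t)(CACHELINE - 1))

/* The pad member forces each slot to occupy complete cache lines. */
typedef union {
    slot_body body;
    char      pad[PADDED_SIZE(slot_body)];
} padded_slot;

static_assert(sizeof(padded_slot) % CACHELINE == 0,
              "each slot must fill whole cache lines");

/* Byte distance between two adjacent slots in an array. */
size_t slot_stride(void)
{
    static padded_slot table[2];
    return (size_t)((char *)&table[1] - (char *)&table[0]);
}
```

Padding alone only fixes the stride between slots. The slots land on cache line boundaries only if the first slot starts on one, for example at a suitable offset inside a page-aligned memory mapping.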

A writer thread scans every slot in the table to determine the oldest outstanding reader transaction, and reclaims any freed pages older than that. The writer takes no locks while scanning the table, so there is no guarantee that it sees the most up-to-date reader info. That is not required for correct operation: all we need to know is the upper bound on the oldest reader; we don't care at all about the newest reader. The only consequence of reading stale information here is that old pages might hang around a while longer before being reclaimed. That is actually beneficial, because the longer we delay reclaiming old pages, the more likely it is that a string of contiguous pages can be found after coalescing old pages from many old transactions together.

Todo:
We don't actually do such coalescing yet; we grab pages from one old transaction at a time.
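
The writer's lock-free scan described above might look roughly like the sketch below. It assumes a transaction ID of 0 marks an idle slot and that `txnid_t` is 64 bits wide; neither detail is stated on this page, and `reader_slot` and `oldest_reader` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t txnid_t;    /* assumption: the real txnid_t may differ */

typedef struct {
    volatile txnid_t txnid;  /* 0 means "slot idle" in this sketch */
} reader_slot;

/* Return the oldest txn ID any reader is using,
 * or `latest` if no reader is active. */
txnid_t oldest_reader(const reader_slot *slots, size_t nslots, txnid_t latest)
{
    assert(slots != NULL || nslots == 0);

    txnid_t oldest = latest;
    for (size_t i = 0; i < nslots; i++) {
        txnid_t t = slots[i].txnid;   /* one unlocked read per slot */
        if (t != 0 && t < oldest)
            oldest = t;
    }
    return oldest;
}
```

Pages freed by transactions older than the returned ID would then be candidates for reuse.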

Class Documentation

struct MDB_rxbody

The information we store in a single slot of the reader table.

In addition to a transaction ID, we also record the process and thread ID that owns a slot, so that we can detect stale information, e.g. threads or processes that went away without cleaning up.

Note:
We currently don't check for stale records. We simply re-init the table when we know that we're the only process opening the lock file.
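
Since the code described here does not check for stale records, the following is only a sketch of how a recorded process ID could be used for such a check on POSIX systems. It relies on the standard behaviour of `kill()` with signal 0, which performs error checking without sending a signal; `process_exists` is a hypothetical helper, not part of mdb.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a process with this pid appears to exist, 0 otherwise.
 * EPERM means the process exists but we may not signal it. */
int process_exists(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}
```

A slot whose recorded pid no longer exists could be treated as abandoned. Because process IDs are reused, a positive answer does not prove that the original owner is still running.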

Definition at line 457 of file mdb.c.

Class Members
pid_t mrb_pid
 The process ID of the process owning this reader txn.
pthread_t mrb_tid
 The thread ID of the thread owning this txn.
txnid_t mrb_txnid
 The current Transaction ID when this transaction began. Multiple readers that start at the same time will probably have the same ID here. Again, it's not important to exclude them from anything; all we need to know is which version of the DB they started from so we can avoid overwriting any data used in that particular version.

struct MDB_reader

The actual reader record, with cacheline padding.

Definition at line 473 of file mdb.c.

Class Members
union MDB_reader mru
union MDB_reader.mru

Definition at line 474 of file mdb.c.

Class Members
MDB_rxbody mrx
char pad
 Cache line alignment.

struct MDB_txbody

The header for the reader table.

The table resides in a memory-mapped file. (This is a different file than is used for the main database.)

For POSIX the actual mutexes reside in the shared memory of this mapped file. On Windows, mutexes are named objects allocated by the kernel; we store the mutex names in this mapped file so that other processes can grab them. This same approach is also used on MacOSX/Darwin (using named semaphores) since MacOSX doesn't support process-shared POSIX mutexes. For these cases where a named object is used, the object name is derived from a 64-bit FNV hash of the environment pathname. As such, naming collisions are extremely unlikely. If a collision occurs, the results are unpredictable.
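
To illustrate deriving an object name from a pathname hash, here is a sketch using the 64-bit FNV-1a variant. The text above says only "FNV", so the choice of FNV-1a is an assumption, and the `mdb-<kind>-<hex>` name format and the `make_object_name` helper are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a NUL-terminated string. */
uint64_t fnv1a_64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;    /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;             /* FNV prime */
    }
    return h;
}

/* Build a hypothetical object name from the environment path.
 * `kind` distinguishes objects, e.g. 'r' for reader, 'w' for writer.
 * Returns the name length as reported by snprintf(). */
int make_object_name(char *buf, size_t buflen, const char *env_path, char kind)
{
    assert(buf != NULL && env_path != NULL);
    return snprintf(buf, buflen, "mdb-%c-%016llx", kind,
                    (unsigned long long)fnv1a_64(env_path));
}
```

Every process that opens the same environment path computes the same name, which is how unrelated processes would locate the same kernel object.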

Definition at line 499 of file mdb.c.

Class Members
uint32_t mtb_magic
 Stamp identifying this as an MDB file. It must be set to MDB_MAGIC.
uint32_t mtb_me_toggle
 The ID of the most recent meta page in the database. This is recorded here only for convenience; the value can always be determined by reading the main database meta pages.
pthread_mutex_t mtb_mutex
 Mutex protecting access to this table. This is the reader lock that LOCK_MUTEX_R acquires.
unsigned mtb_numreaders
 The number of slots that have been used in the reader table. This always records the maximum count; it is not decremented when readers release their slots.
txnid_t mtb_txnid
 The ID of the last transaction committed to the database. This is recorded here only for convenience; the value can always be determined by reading the main database meta pages.
uint32_t mtb_version
 Version number of this lock file. Must be set to MDB_VERSION.

struct MDB_txninfo

The actual reader table definition.

Definition at line 531 of file mdb.c.

Class Members
union MDB_txninfo mt1
union MDB_txninfo mt2
MDB_reader mti_readers
union MDB_txninfo.mt1

Definition at line 532 of file mdb.c.

Class Members
MDB_txbody mtb
char pad
union MDB_txninfo.mt2

Definition at line 543 of file mdb.c.

Class Members
pthread_mutex_t mt2_wmutex
char pad
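
The overall shape suggested by these members (a padded header, a padded writer mutex, then an array of padded reader slots) can be sketched as follows. The field sets are simplified and every name (`table_header`, `padded_reader`, `lock_table`, `lock_table_size`) is hypothetical; only the padding pattern is taken from the member lists above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64
#define ROUND_UP(n) (((n) + (CACHELINE - 1)) & ~(size_t)(CACHELINE - 1))

/* Simplified stand-ins for the header and the slot payload. */
typedef struct {
    uint32_t        magic;
    uint32_t        version;
    pthread_mutex_t mutex;       /* reader-table mutex */
    uint64_t        txnid;
    unsigned        numreaders;
} table_header;

typedef struct { uint64_t txnid; } slot_body;

typedef union {
    slot_body body;
    char      pad[ROUND_UP(sizeof(slot_body))];
} padded_reader;

typedef struct {
    union { table_header hdr;       char pad[ROUND_UP(sizeof(table_header))]; } u1;
    union { pthread_mutex_t wmutex; char pad[ROUND_UP(sizeof(pthread_mutex_t))]; } u2;
    padded_reader readers[1];    /* the real table extends past this entry */
} lock_table;

/* Each section starts on a cache line boundary relative to the table start. */
static_assert(offsetof(lock_table, u2) % CACHELINE == 0, "writer mutex aligned");
static_assert(offsetof(lock_table, readers) % CACHELINE == 0, "slots aligned");

/* Bytes needed for a table holding `nreaders` slots. */
size_t lock_table_size(unsigned nreaders)
{
    return offsetof(lock_table, readers) + (size_t)nreaders * sizeof(padded_reader);
}
```

Keeping the writer mutex in its own padded union means writers contending for it do not disturb the cache line that holds the reader-table header.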

Define Documentation

#define CACHELINE   64

The size of a CPU cache line in bytes.

We want our lock structures aligned to this size to avoid false cache line sharing in the lock table. This value works for most CPUs. For Itanium this should be 128.

Definition at line 446 of file mdb.c.

#define DEFAULT_READERS   126

Number of slots in the reader table.

This value was chosen somewhat arbitrarily. 126 readers plus a couple of mutexes fit exactly into 8KB on my development machine. Applications should set the table size using mdb_env_set_maxreaders().
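
One plausible reading of the 8KB figure is sketched below. It assumes that the table header (with the reader mutex) and the writer mutex each occupy exactly one 64-byte cache line and that each reader slot occupies one as well. The page does not state these sizes, so the assumption may not hold on every platform; `table_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE       64
#define DEFAULT_READERS 126

/* Bytes used by a table with `nreaders` slots, assuming one cache line
 * for the header, one for the writer mutex, and one per reader slot. */
size_t table_bytes(size_t nreaders)
{
    return (2 + nreaders) * CACHELINE;
}

/* (2 + 126) * 64 = 128 * 64 = 8192 bytes = 8 KB */
static_assert((2 + DEFAULT_READERS) * CACHELINE == 8192,
              "126 slots plus two cache lines fill 8 KB");
```

Under the same assumption, a different reader count passed to mdb_env_set_maxreaders() changes the table size by 64 bytes per slot.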

Definition at line 438 of file mdb.c.


Typedef Documentation

typedef struct MDB_reader MDB_reader

The actual reader record, with cacheline padding.

typedef struct MDB_rxbody MDB_rxbody

The information we store in a single slot of the reader table.

typedef struct MDB_txbody MDB_txbody

The header for the reader table.

typedef struct MDB_txninfo MDB_txninfo

The actual reader table definition.