Zope Object Database
Stable release: 3.10.3 (April 12, 2011)
License: Zope Public License
The Zope Object Database (ZODB) is an object-oriented database for transparently and persistently storing Python objects. It is included as part of the Zope web application server, but can also be used independently of Zope.
Features of the ZODB include: transactions, history/undo, transparently pluggable storage, built-in caching, multiversion concurrency control (MVCC), and scalability across a network (using ZEO).
The ZODB is a mature Python datastore, with hundreds of thousands of systems running on top of it today.
- Created by Jim Fulton of Zope Corporation in the late 1990s.
- Started as a simple Persistent Object System (POS) during the development of Principia (which later became Zope).
- ZODB 3 received its name when a significant architectural change landed.
- ZODB 4 was a short-lived project to re-implement the entire ZODB 3 package in 100% Python.
A ZODB storage is basically a directed graph of (Python) objects pointing at each other, with a Python dictionary at the root. Objects are accessed by starting at the root and following pointers until the target object is reached. In this respect, ZODB can be seen as a sophisticated Python persistence layer.
For example, say we have a car described using three classes: Car, Wheel and Screw. In Python, this could be represented as follows (the coding style is deliberately crude, as this is for illustration only):
class Car: [...]
class Wheel: [...]
class Screw: [...]
myCar = Car()
myCar.wheel1 = Wheel()
myCar.wheel2 = Wheel()
for wheel in (myCar.wheel1, myCar.wheel2):
    wheel.screws = [Screw(), Screw()]
(Car() creates a new instance of class Car).
If the variable zodb is the root of persistence, then
zodb['mycar'] = myCar
puts all the objects (the instances of Car, Wheel, Screw and so on) into the storage, from which they can be retrieved later. If, for example, another program gets a connection to the database through the zodb object, performing:
carzz = zodb['mycar']
retrieves all the objects, with the reference to the car held in the carzz variable.
The object can then be altered, for example if some later Python code reads:
carzz.wheel3 = Wheel()
carzz.wheel3.screws = [Screw()]
the storage is altered to reflect the change of data (once a commit is issued).
Python has no declaration of data structures, and neither does ZODB: new fields can be freely added to an existing object.
Storage unit 
Actually, the above oversimplifies a bit. For persistence to take place, the Python Car class must derive from the persistent.Persistent class. This class holds the data necessary for the persistence machinery to work, such as the internal object id and the state of the object, and it also defines the boundary of persistence in the following sense: every object whose class derives from Persistent is an atomic unit of storage (the whole object is copied to the storage when a field is modified).
In the example above, if Car is the only class deriving from Persistent, then when wheel3 is added to the car, all the objects must be written to the storage (the Car, wheel1, wheel2, the screws and so on). In contrast, if Wheel also derives from Persistent, then when carzz.wheel3 = Wheel() is performed, a new record is written to the storage to hold the new value of the Car (along with a record for the new Wheel), but the existing Wheel records are kept, and the new Car record points to the Wheel records already in the storage.
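The record boundary described above can be illustrated with a small stdlib-only model. This is a sketch, not ZODB code: the Persistent class below is a hypothetical marker standing in for persistent.Persistent, and the helper names snapshot and write_records, as well as the use of id() as an object id, are assumptions made for the illustration.

```python
class Persistent:
    """Hypothetical marker base class standing in for persistent.Persistent."""


class Screw:  # plain object: it is embedded in its owner's record
    pass


class Wheel(Persistent):
    def __init__(self):
        self.screws = [Screw(), Screw()]


class Car(Persistent):
    def __init__(self):
        self.wheel1 = Wheel()
        self.wheel2 = Wheel()


def snapshot(value, owner, found):
    """Copy value into plain data; other Persistent objects become references."""
    if isinstance(value, Persistent) and value is not owner:
        found.append(value)
        return ("ref", id(value))  # id() plays the role of an object id here
    if isinstance(value, list):
        return [snapshot(item, owner, found) for item in value]
    if hasattr(value, "__dict__"):
        return {name: snapshot(attr, owner, found)
                for name, attr in vars(value).items()}
    return value


def write_records(root):
    """Return one record per Persistent object reachable from root."""
    records, queue = {}, [root]
    while queue:
        obj = queue.pop()
        if id(obj) in records:
            continue
        found = []
        records[id(obj)] = snapshot(obj, obj, found)
        queue.extend(found)
    return records


car = Car()
before = write_records(car)
car.wheel3 = Wheel()
after = write_records(car)
# Records that are new or differ after the change: the Car and the new Wheel.
changed = [oid for oid in after if before.get(oid) != after[oid]]
```

In this model the records for wheel1 and wheel2 are identical before and after the change, so only the Car record and the record for the new Wheel would need to be written. If Wheel did not derive from Persistent, everything would be embedded in the single Car record.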
The ZODB machinery doesn't chase modifications down through the graph of pointers. In the example above, carzz.wheel3 = something is a modification automatically tracked by the ZODB machinery, because carzz is an instance of the (Persistent) class Car. The ZODB machinery does this basically by marking the record as dirty. However, if an attribute holds a list (for example), a change inside the list isn't noticed by the ZODB machinery, and the programmer must help by manually adding
carzz._p_changed = 1
to notify ZODB that the record actually changed. Thus the programmer must, to a certain extent, be aware of how the persistence machinery works.
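Why a change inside a list goes unnoticed can be seen in a toy model of dirty tracking. The sketch below assumes that tracking hooks into attribute assignment. The Persistent class here is a hypothetical stand-in that implements only the dirty flag; it is not how persistent.Persistent is actually written.

```python
class Persistent:
    """Hypothetical stand-in for persistent.Persistent (dirty flag only)."""
    _p_changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != "_p_changed":
            # Any ordinary attribute assignment marks the record as dirty.
            object.__setattr__(self, "_p_changed", True)


class Wheel:
    pass


class Car(Persistent):
    def __init__(self):
        self.wheels = [Wheel(), Wheel()]


carzz = Car()
carzz._p_changed = False        # pretend the object was just committed

carzz.wheels.append(Wheel())    # mutates the list; Car.__setattr__ is not called
missed = carzz._p_changed       # False: the change went unnoticed

carzz._p_changed = True         # manual notification, as described in the text
carzz._p_changed = False        # pretend another commit happened

carzz.wheel3 = Wheel()          # attribute assignment on the Persistent object
tracked = carzz._p_changed      # True: the change was tracked automatically
```

The list append never passes through the Car object, which is why the flag has to be set by hand in that case.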
The storage unit (that is, an object whose class derives from Persistent) is also the atomicity unit. In the example above, if Car is the only Persistent class, and one thread modifies a Wheel (so the Car record must be marked as changed) while another thread modifies another Wheel in another transaction, the second commit will fail. If Wheel is also Persistent, the two Wheels can be modified independently by two different threads in two different transactions.
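The effect of record granularity on concurrent commits can be modeled with a per-record commit counter. The following is a simplified sketch of optimistic conflict detection in general, not ZODB's actual algorithm. ToyStore, ToyTransaction, two_writers and this ConflictError class are all hypothetical names introduced for the illustration.

```python
class ConflictError(Exception):
    """Hypothetical exception used only by this sketch."""


class ToyStore:
    def __init__(self):
        self.serials = {}  # record name -> number of commits that touched it

    def begin(self):
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, store):
        self.store = store
        self.seen = dict(store.serials)  # counters as of transaction start
        self.dirty = set()

    def modify(self, record):
        self.dirty.add(record)

    def commit(self):
        serials = self.store.serials
        # Refuse to commit if a dirty record was committed by someone else
        # after this transaction started.
        for record in self.dirty:
            if serials.get(record, 0) != self.seen.get(record, 0):
                raise ConflictError(record)
        for record in self.dirty:
            serials[record] = serials.get(record, 0) + 1


def two_writers(record_a, record_b):
    """Two overlapping transactions, each modifying one record."""
    store = ToyStore()
    t1, t2 = store.begin(), store.begin()
    t1.modify(record_a)
    t2.modify(record_b)
    t1.commit()
    try:
        t2.commit()
    except ConflictError:
        return "conflict"
    return "ok"


# Only Car is Persistent: both wheel changes land in the same "car" record.
only_car_persistent = two_writers("car", "car")
# Wheel is Persistent too: each change lands in its own record.
wheels_persistent = two_writers("wheel1", "wheel2")
```

In this model the first scenario ends in a conflict and the second commits cleanly, which matches the behaviour described above.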
Class persistence 
Class persistence (that is, recording the class of a particular object in the storage) is obtained by writing a kind of "fully qualified" name of the class into each record on disk. Note that, in Python, this name involves the directory hierarchy in which the source file of the class resides. A consequence is that the source file of a persistent object's class cannot be moved. If it is, the ZODB machinery is unable to locate the class of an object when retrieving it from the storage, resulting in a broken object.
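Python's standard pickle module records classes in the same by-name fashion, so it can be used to illustrate the problem without ZODB. In the sketch below, the module names garage_models_demo and workshop_models_demo are hypothetical, and the module is built in memory rather than loaded from a source file. Note that plain pickle raises an error where the text above says ZODB produces a broken object.

```python
import pickle
import sys
import types

# Build a throwaway module holding Car, as if Car had been defined in a
# source file named garage_models_demo.py (a hypothetical name).
garage = types.ModuleType("garage_models_demo")
exec("class Car:\n    pass\n", garage.__dict__)
sys.modules["garage_models_demo"] = garage

record = pickle.dumps(garage.Car())
# The record names the module and the class; it does not contain the class code.
has_module_name = b"garage_models_demo" in record
has_class_name = b"Car" in record

# Simulate moving the source file: the same module is now importable only
# under a different (also hypothetical) name.
sys.modules["workshop_models_demo"] = sys.modules.pop("garage_models_demo")

try:
    pickle.loads(record)
    outcome = "loaded"
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    outcome = type(exc).__name__

del sys.modules["workshop_models_demo"]  # clean up the demo module
```

The stored record still refers to the old module name, so loading fails once the module can no longer be found under that name.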
Log file 
ZEO (Zope Enterprise Objects) is a ZODB storage implementation that allows multiple client processes to persist objects to a single ZEO server. This allows transparent scaling, but the ZEO server is still a single point of failure.
Pluggable Storages 
- Network Storage (aka ZEO) - Enables multiple Python processes to load and store persistent instances concurrently.
- File Storage - Enables a single Python process to talk to a file on disk.
- relstorage - Enables the persistence backing store to be an RDBMS.
- Directory Storage - Each persistent object is stored as a separate file on the filesystem. Similar to FSFS in Subversion.
- Demo Storage - An in-memory back end for the persistent store.
- BDBStorage - Uses a Berkeley DB back end. Now abandoned.
- Zope Replication Services (ZRS) - A commercial add-on that removes the single point of failure, providing hot backup for writes and load-balancing for reads.
- zeoraid - An open source solution that provides a proxy Network Server that distributes object stores and recovery across a series of Network Servers.
- relstorage - Since RDBMS technologies are used, this obviates the need for a ZEO server.
- NEO - A distributed storage implementation (fault tolerance, load balancing).
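The idea behind the pluggable storages listed above is that the code storing and loading records does not need to know which back end it is talking to. A minimal sketch of that idea follows. The Storage protocol, its store and load methods, and the two toy back ends are hypothetical names chosen for illustration, not ZODB's actual storage interface.

```python
import os
import tempfile
from typing import Protocol


class Storage(Protocol):
    """Hypothetical minimal storage interface: records are bytes keyed by id."""

    def store(self, oid: int, data: bytes) -> None: ...

    def load(self, oid: int) -> bytes: ...


class ToyMemoryStorage:
    """Keeps every record in a dictionary (in-memory back end)."""

    def __init__(self):
        self._records = {}

    def store(self, oid, data):
        self._records[oid] = data

    def load(self, oid):
        return self._records[oid]


class ToyDirectoryStorage:
    """Writes each record to its own file inside a directory."""

    def __init__(self, path):
        self._path = path

    def _file(self, oid):
        return os.path.join(self._path, f"{oid:016x}.rec")

    def store(self, oid, data):
        with open(self._file(oid), "wb") as handle:
            handle.write(data)

    def load(self, oid):
        with open(self._file(oid), "rb") as handle:
            return handle.read()


def save_and_reload(storage: Storage) -> bytes:
    """Client code: identical whichever back end is plugged in."""
    storage.store(1, b"car record")
    return storage.load(1)


with tempfile.TemporaryDirectory() as tmp:
    results = [
        save_and_reload(ToyMemoryStorage()),
        save_and_reload(ToyDirectoryStorage(tmp)),
    ]
```

Both back ends return the same bytes to the caller, which is what makes them interchangeable from the client's point of view.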