Implementation
PyORQ relies heavily on new features in Python 2.2/2.3, including properties,
metaclasses (2.2) and the datetime module (2.3). I want to highlight some
parts of PyORQ that illustrate nicely what you can do with current Python. I
also want to explain some of the gory details of the query-to-SQL
translation. Simply because I'm rather proud of them.
I assume you have some basic understanding of the new features Python 2.3. You
can read up on them here.
Object identities
The key ingredient of any object-relational mapping is the notion of object
identity. PyORQ uses the shorthand oid . In addition, each
persistent class has a corresponding table with table identity
tid .
At the Python side of PyORQ, each instance i can be uniquely
identified with a (cls, oid) pair. If we know the
oid of an instance of class cls , we can retrieve
that instance using:
i = cls(oid=oid)
i.oid is oid # This will now be True
Of course, we normally don't know the object identity, and that's why we use
queries to instantiate objects. However, PyORQ does use this instantiation
mechanism internally.
At the SQL side each reference can be described by a (tid, oid)
pair. We need the tid , because references can point to subclasses.
The ptype metaclass
Modifying the semantics of class creation and object instantiation
are the canonical examples of the things you can do with metaclasses. The
metaclass for persistent objects is rather simple:
class ptype(type):
def __init__(cls, name, bases, attributes):
"""Called after class has been created"""
...do some bookkeeping on persistent properties...
cls.database.register_class(cls)
def __call__(cls, *args, **kwargs)
"""Called when object is instantiated"""
return cls.database.get_instance(cls, *args, **kwargs)
The ptype.__init__() method is the constructor of the ptype
instance, which is a class. It passes the newly created class object to the
database for registration. The database can then decide if it needs to create
a new table for the class.
The ptype.__call__() methods is called when an instance of a
persistent class is created (when a persistent class is 'called'). This job is
also passed to the database. The database can decide if a new instance
must be created or if an existing instance should be retrieved from either a
cache or the backend.
The main purpose of pobject , the base class of all persistent
classes, is to make these classes instances of ptype .
class pobject(object):
__metaclass__ = ptype
The full definition of ptype and pobject can be
found in module pyorq.ptype . The abstract base class defining the
database interface can be found in module pyorq.interface.db_base .
Persistent properties
Properties are objects that define __get__() ,
__set__() and __del__() methods to get, set, and
delete instance attributes.
The module pyorq.pprop defines persistent value properties
(pval objects) and persistent references (pref
objects). The different types of persistent values (pint ,
pfloat , etc.) derive from pval .
The __set__() method for persistent properties is very
straightforward. It performs a type check on the value, stores the value in
the instance __dict__ , and sets the dirty bit.
The __get__() method for persistent values is a little more
interesting, because it gets called when we get the attribute for an instance,
as well as when we get the attribute for the class:
class pval(persistent):
...
def __get__(self, instance, cls=None):
if instance is None:
# called as cls.attr
return prel.value_attr(cls, self)
else:
# called as instance.attr
return instance.__dict__.get(self.key, self.default)
If we do the attribute access on an instance, we return the value from the
__dict__ or a default. If we do an attribute access on a class, we
create a new object value_attr , which is where the
query-mechanism starts. I will come back to that later.
The __get__() method for persistent references is also
interesting, because it solves a small problem we have with retrieving
references from the database.
If we retrieve an instance, and also, recursively, retrieve the instances it
refers to, we could end up with a huge number of retrieval
operations. We therefore initially don't retrieve these
referred objects, but rather store the (cls, oid) pair. Only when
we access a reference do we instantiate this object. In code:
class pref(persistent):
...
def __get__(self, instance, cls=None):
if instance is None:
return prel.ref_attr(cls, self)
else:
val = instance.__dict__.get(self.key, self.default)
if isinstance(val, tuple):
val = val[0](oid=val[1])
instance.__dict__[self.key] = val
return val
Note that accessing a class attribute returns a prel.ref_attr
object if that attribute is a pref object.
Queries
The module pyorq.prel , defines the relational algebra for
PyORQ. As we saw earlier, attribute access on classes may return
value_attr and ref_attr objects for persistent values
and persistent references respectively. These are both subtypes of the
expression object, defined in this module.
Expressions define arithmetic operators which return other types of
expression object, depending on the operator. Expressions also
define comparison operators, which return relation objects.
Relations define logical operators (using bitwise operators), that return new
relation objects.
For example, the query
(A.x>=7) & (A.x + 1 < A.y.z)
returns an object
rel_and(comp_ge(value_attr(A, A_x),
7),
comp_lt(expr_add(value_attr(A, A_x),
1),
value_attr(ref_attr(A, A_y),
Y_z))
where A_x and Y_z are pval objects, and
A_y is a pref object.
Translating queries into SQL
The relation object in module pyorq.prel defines an
iterator method (__iter__() ) that generates instances that
satisfy the query.
I distinguish between queries that that don't contain references, and queries
that do. Let's first consider simple queries without references (or bound
variables, as I call them).
Given a class hierarchy:
class A(pobject):
x = pint()
class B(A):
pass
and the queries
q1 = (B.x == 3) # q1.free_variable() is B
q2 = (A.x == 3) # q2.free_variable() is A
The query q1 should only generate instances of type
B , since the free variable B has no subclasses. The
query q2 should generate instances of types A and/or
B , since in both cases the instance i , may satisfy
(isinstance(i, A) and i.x==3) .
Here is the __iter__ method for relation objects:
class relation(object):
...
def __iter__(self):
cls = self.free_variable()
db = cls.database
for i in cls.all_subclasses():
for j in db.query_generator(i, self):
yield j
The class method all_subclasses is defined in
ptype.ptype . It recursively generates the class and all of its
subclasses.
The query_generator method of the database interface is
responsible for generating the the SQL query, and retrieving the instances.
In the simple case, without references, it uses this code-path:
def query_generator(self, cls, relation)
cmnd = "SELECT _x.oid FROM %s _x WHERE %s" % (cls.table_name,
relation.sql_repr(self))
oids = self.fetchall(cmnd)
for oid in oids:
yield cls(oid=oid)
Note that I use _x as an alias for the free variable. The
sql_repr method of the relation object knows about
this aliasing convention, and can thus easily produce a WHERE-clause.
Queries become more complicated if they contain references. Consider the
following addition to our database schema:
class C(pobject):
ref1 = pref(A)
ref2 = pref(A)
class D(pobject):
ref = pref(C)
If we want to translate the query
C.ref1.x == 3
we have to take into account that C.ref1 refers to objects of type
A or B . To do this, the query generator first asks
the relation object for an inventory of bound variables and their 'reference
chains'. In this case there is one bound variable (A ) with
'chain' ('_x', 'ref1') .
The reference chain is used to generate aliases for the bound variable, in
this case '_x_ref1' . They are also used to define join
clauses. In this case _x.ref1_oid = _x_ref1.oid
We now have to loop over all permutations that are generated by considering
all subclasses of each bound variable. In this case, that means we execute the
query twice, Once with the table for A aliased to '_x_ref1', and
once with the table for B , using the same alias. In SQL:
SELECT * FROM c _x, a _x_ref1
WHERE _x.ref1_oid=_x_ref1.oid AND (_x_ref1.x = 3)
SELECT * FROM c _x, b _x_ref1
WHERE _x.ref1_oid=_x_ref1.oid AND (_x_ref1.x = 3)
Note that the WHERE clause does not change. Only the table
that is aliased to _x_ref1 is changed.
The query
(C.ref1.x == 3) | (C.ref2.x==3)
has two bound variables, that happen to be of the same type
(A ). The reference chains are ('_x', 'ref1') and
('_x', 'ref2') respectively. This means that we have to evaluate
the query for the the following permutations of the bound variables: (A,
A) , (A, B) , (B, A) , and (B,
B) . Or, four queries of the form:
SELECT * FROM c _x, a _x_ref1, a _x_ref2
WHERE _x.ref1_oid=_x_ref1.oid AND _x.ref2_oid = _x_ref2.oid
AND ((_x_ref1.x = 3) OR (_x_ref2.x = 3))
with four permutations of the tables that refer to _x_ref1 and
_x_ref2 respectively.
Queries containing references to references, introduce extra bound
variables. For example, the query
D.ref.ref1.x == 3
has two bound variables C and A , with reference
chains ('_x', 'ref') and ('_x', 'ref', 'ref1')
respectively. We have to execute the following query with permutations
(C, A) and (C, B) .
SELECT * FROM d _x, c _x_ref, a _x_ref_ref1
WHERE _x.ref_oid = _x_ref.oid AND _x_ref.ref1_oid = _x_ref_ref1.oid
AND (_x_ref_ref.x = 3)
The code path for the query generator with references is:
def query_generator(self, cls, relation):
# get the list of bound variables and their reference chains
# use a dict mapping chains to bound variables to avoid duplicate joins
bound_variables = {}
relation.update_bound_variables(bound_variables)
bound_variables = bound_variables.items()
cmnd_format = "SELECT _x.oid FROM %s _x,%s WHERE %s AND %s"
# loop over all permutations of the bound variables set
for current_vars in self.generate_bound_variables(bound_variables):
aliases = self.create_aliases(current_vars)
join_clause = self.create_join_clause(current_vars)
cmnd = cmnd_format % (cls.table_name, aliases,
join_clause, select_clause)
oids = self.fetchall(cmnd)
for oid in oids:
yield cls(oid=oid)
|