Introduction
PyORQ (Python Object Relational binding with Queries) implements persistence
for Python objects using a relational database (RDBMS, e.g. PostgreSQL or MySQL)
for storage.
The innovative aspect of PyORQ is the use of Python expressions to denote
queries which can be automatically translated into SQL and then be executed by
the backend. This leverages the full search capabilities of RDBMSs in an
object-oriented programming environment. Unlike other Python
object-relational mappings, PyORQ requires no knowledge of SQL to search the
database.
Object-relational mappings have been done before. They are relatively
straightforward: classes map to tables, attributes map to columns and
instances map to rows. However, fundamental to the object paradigm is that
identity maps to state, and not the other way around. Hence, to search
(i.e. map state to identity) one has to loop over the collection of all
objects and examine their state. If the objects are in a persistent store, the
objects need to be instantiated first, which may be prohibitively expensive.
Traditionally there have been two solutions to this problem:
- Use persistent containers that use knowledge of the object's state to allow
  efficient searches (e.g. B-Trees). However, this essentially generalizes the
  notion of identity, and does not allow for arbitrary queries without
  instantiation.
- Use an object-relational mapping and write SQL queries that
  return object identities, which can then be used to instantiate the results of
  the query. However, this means that the mechanism of the object-relational
  mapping becomes part of the interface, and requires the user to use SQL within
  his application.
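The second approach can be sketched with Python's standard-library sqlite3 module. This is only an illustration under assumptions: the `thing` table, its `id` and `attr` columns, and the `load` helper are hypothetical and not part of PyORQ.

```python
import sqlite3


class Thing:
    """Plain Python object backed by one row of the (hypothetical) thing table."""

    def __init__(self, oid, attr):
        self.oid = oid
        self.attr = attr


def load(conn, oid):
    # Instantiate one object from its identity (the primary key).
    row = conn.execute(
        "SELECT id, attr FROM thing WHERE id = ?", (oid,)
    ).fetchone()
    return Thing(*row)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY, attr INTEGER)")
conn.executemany(
    "INSERT INTO thing (id, attr) VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 10)],
)

# The caller must know the table and column names and write SQL by hand:
# the mapping has become part of the interface.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM thing WHERE attr = ? ORDER BY id", (10,)
    )
]

# Only the matching objects are instantiated.
matches = [load(conn, oid) for oid in ids]
print([t.oid for t in matches])
```

Only the matching rows are turned into objects, but the application code now contains SQL and depends on the schema.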
PyORQ provides a new solution to this problem. Using expressions with class
attributes as arguments, PyORQ defines a notation for queries on persistent
objects. These expressions can be translated automatically into SQL queries,
using knowledge of the SQL schema that is implied by the persistent class
hierarchy.
Python 2.2 introduced properties. Properties are objects that provide accessor
methods for instance attributes. As we explain in the section on
implementation, PyORQ can do more with properties. In particular, given the
following definition of a persistent class (where pint() is a persistent
property object of type int):
    class Thing(pobject):
        attr = pint()
we can write an expression
    Thing.attr == val
which we define as the set of all instances i for which
(isinstance(i, Thing) and i.attr==val) is true. Or, in SQL:
    SELECT * FROM thing WHERE attr=val;
This approach is based on the idea that a class is, in a sense, the set of all
its instances.
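One way such an expression could be built in present-day Python is with a descriptor that behaves differently on the class and on an instance. The following is a minimal sketch, not PyORQ's implementation: `Column`, `Comparison` and `pint_sketch` are hypothetical names, the table name is assumed to be the lower-cased class name, and the `?` parameter style is an assumption.

```python
class Comparison:
    """Query node for an expression such as Thing.attr == 42 (hypothetical)."""

    def __init__(self, table, column, op, value):
        self.table = table
        self.column = column
        self.op = op
        self.value = value

    def to_sql(self):
        # Return a parameterized statement and its parameters.
        sql = f"SELECT * FROM {self.table} WHERE {self.column} {self.op} ?"
        return sql, (self.value,)


class Column:
    """What attribute access on the class (not an instance) returns."""

    def __init__(self, table, name):
        self.table = table
        self.name = name

    def __eq__(self, other):
        # Build a query node instead of returning True or False.
        return Comparison(self.table, self.name, "=", other)


class pint_sketch:
    """Stand-in for a persistent int property (hypothetical)."""

    def __init__(self, default=0):
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class: hand back a column reference.
            return Column(owner.__name__.lower(), self.name)
        # Accessed on an instance: behave like a normal attribute.
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = int(value)


class Thing:
    attr = pint_sketch()


query = Thing.attr == 42
sql, params = query.to_sql()
print(sql, params)
```

On an instance, `attr` still reads and writes an ordinary integer; only access through the class yields an object whose comparison operators build a query.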
History
PyORQ is based on a similar approach to object-relational mapping that I
implemented together with Danny Boxhoorn for the astro-wise astronomical data reduction
pipeline.
The astro-wise software will process the data from OmegaCAM, a large
(16k x 16k pixels) multi-CCD camera that will image the southern sky
with a field-of-view of 1 square degree. OmegaCAM will produce 512 MB images,
or up to 100 GB of data per day (night). At these data rates the data
reduction procedures have to be fully automated. Ironically, your biggest
problem is then no longer the need for raw processing power, but the need for
a data administration that allows you to find out what happened to all that
data.
The astro-wise data reduction pipeline is being developed in Python. The
software is built around a comprehensive OO data-model that describes the
relation between the science and calibration observations and the processing
parameters and operations. By implementing a persistence mechanism similar to
PyORQ we were able to 'get' the data-administration part for free.
About me
Hi, my name is Roeland Rengelink.
I am originally an astronomer. I got my PhD in 1999 for my thesis on the
Westerbork Northern Sky Survey and the cosmological evolution of radio
sources.
I discovered Python in 1998, when I was working for the ESO Imaging Survey (EIS). The
data-reduction pipeline of EIS then consisted of C-shell scripts, which we
desperately wanted to port to something more useful. Bless Mario Nonino, who
saw me getting the camel book from the library, and suggested Python
instead. The EIS data-reduction environment must now be one of the biggest
Python applications in existence (300k+ LOC, last I counted).
I officially became a software developer when I started working for
astro-wise in 1999. Python became the language of choice for that project
when I showed my boss a code fragment, and he asked me how long it would
take to get that 'pseudocode' working. On March 1st, 2004, astro-wise delivered
the astro-wise data-reduction pipeline.
March 1st 2004 was also the day I said goodbye to astronomy. I've granted
myself a sabbatical, and one of the things I really wanted to do was to show
people this 'neat trick with queries in Python'. Hence, PyORQ.