PyORQ - Python Object Relational binding with Queries

Introduction

PyORQ (Python Object Relational binding with Queries) implements persistence for Python objects using a relational database (RDBMS, e.g. PostgreSQL, MySQL) for storage.

The innovative aspect of PyORQ is the use of Python expressions to denote queries, which can be automatically translated into SQL and then executed by the backend. This leverages the full search capabilities of RDBMSs in an object-oriented programming environment. Unlike with other object-relational Python-SQL mappings, the user needs no knowledge of SQL to search the database.

Object-relational mappings have been done before. They are relatively straightforward: classes map to tables, attributes map to columns and instances map to rows. However, fundamental to the object paradigm is that identity maps to state, and not the other way around. Hence, to search (i.e. map state to identity) one has to loop over the collection of all objects and examine their state. If the objects are in a persistent store, they must first be instantiated, which may be prohibitively expensive.
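To make the cost concrete, here is a minimal sketch of search by iteration. It assumes a hypothetical store in which each object's state is kept as a dict keyed by object id, and every entry must be turned back into an object before its state can be examined. The names `STORE`, `load_all` and `search` are illustrative only and are not part of PyORQ.

```python
class Thing:
    """A plain class: the object is the identity, attr is its state."""
    def __init__(self, attr):
        self.attr = attr

# Hypothetical persistent store: object id -> stored state.
STORE = {1: {"attr": 3}, 2: {"attr": 7}, 3: {"attr": 3}}

def load_all(store):
    """Instantiate every stored object (the potentially expensive step)."""
    for oid, state in store.items():
        yield oid, Thing(**state)

def search(store, val):
    """Map state to identity by examining every instantiated object."""
    return [oid for oid, obj in load_all(store) if obj.attr == val]

print(search(STORE, 3))  # [1, 3]
```

Every query pays for instantiating the whole collection, even when only a few objects match.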

Traditionally there have been two solutions to this problem:

  • Use persistent containers that use knowledge of the object's state to allow efficient searches (e.g. B-Trees). However, this essentially generalizes the notion of identity, and does not allow for arbitrary queries without instantiation.
  • Use an object-relational mapping and write SQL queries that return object identities, which can then be used to instantiate the results of the query. However, this means that the mechanism of the object-relational mapping becomes part of the interface, and the user has to write SQL within the application.
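The second approach can be sketched with Python's standard `sqlite3` module. The table layout (a `thing` table with `id` and `attr` columns) and the `Thing.from_row` helper are assumptions made for illustration. The point is that the application code itself has to contain SQL and know how the mapping stores objects.

```python
import sqlite3

class Thing:
    def __init__(self, oid, attr):
        self.oid = oid
        self.attr = attr

    @classmethod
    def from_row(cls, row):
        # Hypothetical helper: rebuild an object from an (id, attr) row.
        return cls(*row)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY, attr INTEGER)")
conn.executemany("INSERT INTO thing (id, attr) VALUES (?, ?)",
                 [(1, 3), (2, 7), (3, 3)])

# The application must write SQL against the mapping's schema...
ids = [row[0] for row in
       conn.execute("SELECT id FROM thing WHERE attr = ? ORDER BY id", (3,))]

# ...and then instantiate only the matching objects, by identity.
results = [Thing.from_row(conn.execute(
               "SELECT id, attr FROM thing WHERE id = ?", (oid,)).fetchone())
           for oid in ids]

print([t.oid for t in results])  # [1, 3]
```

Only matching objects are instantiated, but the table and column names have leaked into the application.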

PyORQ provides a new solution to this problem. Using expressions with class attributes as arguments, PyORQ defines a notation for queries on persistent objects. These expressions can be translated automatically into SQL queries, using knowledge of the SQL schema that is implied by the persistent class hierarchy.

Python 2.2 introduced properties. Properties are objects that provide accessor methods for instance attributes. However, as we explain in the section on implementation, PyORQ can do more with properties. In particular, given the following definition of a persistent class (where pint() is a persistent property object of type int):

class Thing(pobject):
    attr = pint()

we can write an expression

Thing.attr == val

which we define as the set of all instances i for which (isinstance(i, Thing) and i.attr==val) is true. Or, in SQL:

SELECT * FROM thing WHERE attr=val;

This approach is based on the idea that a class is, in a sense, the set of all its instances.
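The core trick can be illustrated with a minimal sketch built on Python's descriptor protocol, which properties are based on. Accessed on an instance, the hypothetical `pint` below behaves like an ordinary attribute. Accessed on the class, it returns a column object whose `==` operator builds a query expression instead of returning a boolean. The names `pobject`, `pint`, `Column`, `Query` and `to_sql` are assumptions for this sketch and do not reproduce PyORQ's actual implementation. In particular, the value is passed as an SQL parameter rather than spliced into the query text.

```python
class Query:
    """A query expression: an SQL WHERE fragment plus its parameters."""
    def __init__(self, table, where, params):
        self.table, self.where, self.params = table, where, params

    def to_sql(self):
        return f"SELECT * FROM {self.table} WHERE {self.where}", self.params

class Column:
    """What a persistent attribute looks like when accessed on the class."""
    def __init__(self, table, name):
        self.table, self.name = table, name

    def __eq__(self, other):
        # Build a query expression instead of comparing values.
        return Query(self.table, f"{self.name}=?", (other,))

class pint:
    """Hypothetical persistent int property (a data descriptor)."""
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            # Thing.attr: a column that can take part in queries.
            return Column(owner.__name__.lower(), self.name)
        # i.attr: the value stored on this instance.
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = int(value)

class pobject:
    """Hypothetical base class for persistent objects."""

class Thing(pobject):
    attr = pint()

sql, params = (Thing.attr == 42).to_sql()
print(sql, params)  # SELECT * FROM thing WHERE attr=? (42,)

t = Thing()
t.attr = 5
print(t.attr)  # 5
```

The same attribute name thus denotes a value on an instance and a column on the class, which is what lets an ordinary Python comparison stand for a set of instances.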

History

PyORQ is based on a similar approach to object-relational mapping that I implemented together with Danny Boxhoorn for the astro-wise astronomical data reduction pipeline.

The astro-wise software will process the data from OmegaCAM, a large (16k × 16k pixels) multi-CCD camera that will image the southern sky with a field-of-view of 1 square degree. OmegaCAM will produce 512 MB images, or up to 100 GB of data per day (night). At these data rates the data reduction procedures have to be fully automated. Ironically, your biggest problem is then no longer the need for raw processing power, but the need for a data administration that allows you to find out what happened to all that data.

The astro-wise data reduction pipeline is being developed in Python. The software is built around a comprehensive OO data-model that describes the relation between the science and calibration observations and the processing parameters and operations. By implementing a persistence mechanism similar to PyORQ, we were able to 'get' the data-administration part for free.

About me

Hi, my name is Roeland Rengelink.

I am originally an astronomer. I got my PhD in 1999 for my thesis on the Westerbork Northern Sky Survey and the cosmological evolution of radio sources.

I discovered Python in 1998, when I was working for the ESO Imaging Survey (EIS). The data-reduction pipeline of EIS then consisted of C-shell scripts, which we desperately wanted to port to something more useful. Bless Mario Nonino, who saw me getting the camel book from the library, and suggested Python instead. The EIS data-reduction environment must now be one of the biggest Python applications in existence (300k+ LOC, last I counted).

I officially became a software developer when I started working for astro-wise in 1999. Python became the language of choice for that project when I showed my boss a code fragment and he asked me how long it would take to get that 'pseudocode' working. On March 1st, 2004, astro-wise delivered its data-reduction pipeline.

March 1st, 2004 was also the day I said goodbye to astronomy. I've granted myself a sabbatical, and one of the things I really wanted to do was to show people this 'neat trick with queries in Python'. Hence, PyORQ.