From the Django ORM to SQLAlchemy, Gracefully
For some time now Axial has been outgrowing Django. We’ve been spreading out code into multiple services, each responsible for a core aspect of our data and member-facing products. To ensure we don’t make the single worst strategic mistake, we’ve generally decided not to rewrite large portions of code all at once. This has worked pretty well, even though it sometimes feels a lot like we’re changing tires on a moving car.
One task that I’ve been focusing on is redesigning and unifying two disparate database models that have grown organically over time. My goal has been to restructure the models into a much more cohesive and easier-to-work-with design that will greatly increase our ability to grow and iterate rapidly.
As the core of several of our products, these models are also quite intertwined with our once-unified Django platform. This restructuring not only changes the database representation of the objects, but also delegates responsibility for the data to a standalone service.
A Reduced Testcase
Pretend we’re a company that sells entertainment media, like books and movies. At first, our media types (books and movies) had nothing to do with each other. One can’t talk about a book’s resolution nor about a movie’s font size, for example. Naturally, we had a table for books and a table for movies. Over time, however, we started to do things like recommend books if you’ve watched certain movies and vice versa.
Even though our media looked very different superficially, we realized we approach our data in very similar ways. We talk about media, sell media, and recommend media very similarly no matter its type. It began to make more and more sense to treat all media nearly identically, leading us to the following database design:
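As a rough sketch of that design, one shared media table holds the common columns and each media type gets its own table keyed by the media row. The table names, column names, and the `books_owned_by` helper below are assumptions made for illustration, not our actual schema; only the overall shape (a concrete base with two children and an owner reference on the base) comes from the description above.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Shared base table: everything common to all media lives here,
-- including the reference to the owning user.
CREATE TABLE media (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    owned_by_id INTEGER NOT NULL REFERENCES user(id)
);
-- Child tables reuse the media row's id as their own primary key.
CREATE TABLE book (
    media_id  INTEGER PRIMARY KEY REFERENCES media(id),
    font_size INTEGER NOT NULL
);
CREATE TABLE video (
    media_id   INTEGER PRIMARY KEY REFERENCES media(id),
    resolution TEXT NOT NULL
);
""")

conn.execute("INSERT INTO user (id, name) VALUES (1, 'ann')")
conn.execute("INSERT INTO media (id, title, owned_by_id) VALUES (1, 'Dune', 1)")
conn.execute("INSERT INTO book (media_id, font_size) VALUES (1, 12)")
conn.execute("INSERT INTO media (id, title, owned_by_id) VALUES (2, 'Alien', 1)")
conn.execute("INSERT INTO video (media_id, resolution) VALUES (2, '1080p')")


def books_owned_by(user_id):
    """Return (title, font_size) for each book owned by the given user."""
    rows = conn.execute(
        "SELECT media.title, book.font_size FROM book "
        "JOIN media ON media.id = book.media_id "
        "WHERE media.owned_by_id = ? ORDER BY media.id",
        (user_id,),
    )
    return rows.fetchall()
```

Note that answering "which books does this user own?" always goes through the media table, because that is where the owner column lives.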
The Issue
While integrating this code into our existing codebase, an omission from Django’s ORM came to light. We have a concrete base model (Media) and two child models that derive from it (Book, Video). We also have a foreign key on the base model (Media) that points to an existing (and currently unchanging) model (User), since both logical children reference that tangential model. Django’s ORM provides no way for the related_name attribute on Media.owned_by to point directly to the correct child model.
In other words, we had code everywhere that uses User.book_set and User.video_set, and that code now had to work against our new database definition. This case cannot be handled by the Django ORM out of the box.
Interestingly, Django actually has support for this exact use case, but only if the parent model is an abstract model. In our case it is not: Media is concrete and represents a real database table. Although abstract inheritance is nice for mixin classes, concrete inheritance is a more accurate match for how we view our data. SQLAlchemy supports this case very well using joined table inheritance, and we were surprised to find Django’s support in this area lacking.
This issue is also described in this Stack Overflow post, but the solutions provided rely on the author’s usage of an abstract base class. Specifically, the abstract scenario works because Django actually clones the Field object that represents the foreign key, allowing it to assign related_name descriptors normally through the Field object.
Hacking at Django
As I’m apt to do when given a problem, I dived deeply into its source. In this case that source was literally the source of Django’s ORM. After some digging I came back with (something of) an elegant solution.
This solution uses the Django ORM internals to copy what Django itself does when a ForeignKey field is initialized. The reason this works well is that after the classes are defined the same field is represented in both child classes by the same attribute name. The reverse relations work properly. Django correctly JOINs to the OneToOne field in the children and everything “just works”(TM). Additionally, this method is cleaner than simply adding a property to the target model, since it exposes the full RelatedObjectManager (which may have had custom methods added to it).
Though this code will not live in our codebase for long, it is an important step to move forward iteratively. Much like engineering improvements to a building, there is a place in code for temporary scaffolding.