For some time now, Axial has been outgrowing Django. We’ve been spreading code out into multiple services, each responsible for a core aspect of our data and member-facing products. To avoid making the single worst strategic mistake, we’ve generally decided not to rewrite large portions of code all at once. This has worked pretty well, even though it sometimes feels a lot like we’re changing tires on a moving car.

One task that I’ve been focusing on is redesigning and unifying two disparate database models that have grown organically over time. My goal has been to restructure the models into a much more cohesive and easier-to-work-with design that will greatly increase our ability to grow and iterate rapidly.

As the core of several of our products, these models are also quite intertwined with our once-unified Django platform. This restructuring not only changes the database representation of the objects, but also delegates responsibility for the data to a standalone service.

A Reduced Testcase

Pretend we’re a company that sells entertainment media, like books and movies. At first, our media types (books and movies) had nothing to do with each other. One can’t talk about a book’s resolution nor about a movie’s font size, for example. Naturally, we had a table for books and a table for movies. Over time, however, we started to do things like recommend books if you’ve watched certain movies and vice versa.

Even though our media looked very different superficially, we realized we approach our data in very similar ways. We talk about media, sell media, and recommend media very similarly no matter its type. It began to make more and more sense to treat all media nearly identically, leading us to the following database design:

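from django.db import models
# User is our existing user model, imported from wherever it is defined.

class Media(models.Model):
    # CharField requires max_length; 255 is an arbitrary choice here.
    name = models.CharField(max_length=255)
    description = models.TextField()
    # One foreign key on the parent, shared by every child model.
    owned_by = models.ForeignKey(User, related_name='all_media')

class Book(Media):
    page_count = models.IntegerField()

class Video(Media):
    runtime = models.IntegerField()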
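
To make the resulting layout concrete, here is a rough sketch of the tables that multi-table inheritance produces for models like these, built with Python's sqlite3 module. The table and column names (media_media, media_book, media_ptr_id, owned_by_id) follow Django's usual naming conventions for an app assumed to be called media; they are illustrative, not copied from our schema, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    -- Parent table: the shared columns plus the foreign key to the user.
    CREATE TABLE media_media (
        id INTEGER PRIMARY KEY,
        name TEXT,
        description TEXT,
        owned_by_id INTEGER REFERENCES auth_user(id)
    );
    -- Child tables: their own columns plus a one-to-one link to the parent row.
    CREATE TABLE media_book (
        media_ptr_id INTEGER PRIMARY KEY REFERENCES media_media(id),
        page_count INTEGER
    );
    CREATE TABLE media_video (
        media_ptr_id INTEGER PRIMARY KEY REFERENCES media_media(id),
        runtime INTEGER
    );
    INSERT INTO auth_user VALUES (1, 'alice');
    INSERT INTO media_media VALUES (10, 'Dune', 'A novel', 1);
    INSERT INTO media_media VALUES (11, 'Alien', 'A film', 1);
    INSERT INTO media_book VALUES (10, 412);
    INSERT INTO media_video VALUES (11, 117);
""")

def books_owned_by(user_id):
    """Fetch a user's books by joining each child row to its parent row."""
    rows = conn.execute(
        """
        SELECT m.name, b.page_count
        FROM media_book AS b
        JOIN media_media AS m ON m.id = b.media_ptr_id
        WHERE m.owned_by_id = ?
        """,
        (user_id,),
    )
    return rows.fetchall()

print(books_owned_by(1))  # [('Dune', 412)]
```

Note that the owner column lives only on the parent table, so any "books for this user" query has to go through the join.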

The Issue

While integrating this code into our existing codebase, we ran into an omission in Django’s ORM. We have a concrete base model (Media) and two child models that derive from it (Book, Video). The base model also holds a foreign key to an existing (and currently unchanging) model (User), since both children reference that model. A foreign key accepts only one related_name, so Media.owned_by exposes a single reverse accessor on User, and that accessor returns Media objects. Django’s ORM offers no built-in way to add reverse accessors that point directly at each child model.

In other words, we had code everywhere that uses User.book_set and User.video_set, and that code needs to keep working against our new database definition. The Django ORM cannot handle this case out of the box.

# Given a User instance named user, this works:
user.all_media.all()
# But what we really want is:
user.book_set.all()
user.video_set.all()
# because a non-trivial amount of existing code already uses these accessors.

Interestingly, Django actually supports this exact use case, but only if the parent model is an abstract model. In our case, it is not: Media is concrete and represents a real database table. Although abstract inheritance is nice for mixin classes, concrete inheritance is a more accurate match for how we view our data. SQLAlchemy supports this case very well using joined table inheritance, and we were surprised to find Django’s support in this area lacking.

This issue is also described in this Stack Overflow post, but the solutions provided rely on the author’s use of an abstract base class. Specifically, the abstract scenario works because Django clones the Field object that represents the foreign key for each child class, which lets it assign related_name descriptors normally through each cloned Field.
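
The difference between the two scenarios can be illustrated with a toy model in plain Python. None of this is Django's actual code: ToyUser, ToyForeignKey, and define_models are hypothetical stand-ins, and the only real Django feature borrowed is the documented %(class)s placeholder in related_name. The sketch assumes that an abstract parent hands each child its own copy of the field, while a concrete parent keeps a single field of its own.

```python
import copy

class ToyUser:
    """Stand-in for the model that the foreign key points at."""
    reverse_accessors = {}

class ToyForeignKey:
    """Toy field that registers a reverse accessor on its target."""

    def __init__(self, target, related_name):
        self.target = target
        self.related_name = related_name

    def contribute_to_class(self, owner):
        # Fill in the owning class name, then register the accessor.
        name = self.related_name % {"class": owner.__name__.lower()}
        self.target.reverse_accessors[name] = owner

def define_models(abstract):
    """Build Media, Book and Video; return the accessor names on ToyUser."""
    ToyUser.reverse_accessors = {}
    field = ToyForeignKey(ToyUser, "%(class)s_set")
    media = type("Media", (), {})
    children = [type(name, (media,), {}) for name in ("Book", "Video")]
    if abstract:
        # Abstract parent: every child receives its own copy of the field.
        for child in children:
            copy.copy(field).contribute_to_class(child)
    else:
        # Concrete parent: the single field is attached once, to the parent.
        field.contribute_to_class(media)
    return sorted(ToyUser.reverse_accessors)

print(define_models(abstract=True))   # ['book_set', 'video_set']
print(define_models(abstract=False))  # ['media_set']
```

With one shared field there is only one place to hang a reverse accessor, which is the gap we needed to work around.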

Hacking at Django

As I’m apt to do when given a problem, I dove deep into its source. In this case, that source was literally the source of Django’s ORM. After some digging, I came back with (something of) an elegant solution:

def add_adhoc_reverse_relation(source_model, source_column_name, dest_model, dest_attribute):
    """Add a descriptor on dest_model with the name dest_attribute that allows related queries.

    This function duplicates the functionality of the related_name keyword
    argument to a ForeignKey field, so that one foreign key can have several
    reverse accessors. It also allows a reverse relation to a subclass of the
    model where the foreign key is defined.
    """
    # These are private ORM internals of the Django version we run, not a stable API.
    from django.db.models.fields.related import ForeignRelatedObjectsDescriptor, add_lazy_relation
    from django.db.models.related import RelatedObject
    # The foreign key descriptor on the model class exposes the underlying Field.
    field = getattr(source_model, source_column_name).field
    def do_related_class(dest_model, source_model):
        # Build a reverse descriptor whose manager queries source_model.
        rel_obj = RelatedObject(source_model, source_model, field)
        desc = ForeignRelatedObjectsDescriptor(rel_obj)
        setattr(dest_model, dest_attribute, desc)
    def resolve_related_class(field, dest_model, source_model):
        field.rel.to = dest_model
        do_related_class(dest_model, source_model)
    # dest_model may be given as an 'app.Model' string and resolved lazily.
    if isinstance(dest_model, basestring) or dest_model._meta.pk is None:
        add_lazy_relation(source_model, field, dest_model, resolve_related_class)
    else:
        do_related_class(dest_model, source_model)

# Arguments: the child model to query, its foreign key name, the model that
# receives the accessor, and the accessor name. Call after Book and Video are defined.
add_adhoc_reverse_relation(Book, 'owned_by', User, 'book_set')
add_adhoc_reverse_relation(Video, 'owned_by', User, 'video_set')

This solution uses Django ORM internals to copy what Django itself does when a ForeignKey field is initialized. It works well because, once the classes are defined, the same field is available on both child classes under the same attribute name. The reverse relations work properly: Django correctly JOINs through the OneToOne field that links each child to its parent, and everything “just works”(TM). Additionally, this method is cleaner than simply adding a property to the target model, since it exposes the full related manager (which may have had custom methods added to it).
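
The core trick, installing a descriptor on an already-defined class so that each instance gets a manager bound to it, can be shown without Django at all. The sketch below is a plain-Python analogy, not Django's implementation: BookManager, ReverseDescriptor, the toy User class, and the BOOKS list are all hypothetical names invented for illustration.

```python
class BookManager:
    """Toy manager bound to one owner; titles() stands in for a custom method."""

    def __init__(self, books, owner):
        self._books = books
        self._owner = owner

    def all(self):
        return [b for b in self._books if b["owned_by"] == self._owner.name]

    def titles(self):
        return [b["name"] for b in self.all()]

class ReverseDescriptor:
    """Returns a fresh manager scoped to whichever instance is accessed."""

    def __init__(self, manager_cls, rows):
        self.manager_cls = manager_cls
        self.rows = rows

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: hand back the descriptor.
            return self
        return self.manager_cls(self.rows, instance)

class User:
    def __init__(self, name):
        self.name = name

BOOKS = [
    {"name": "Dune", "owned_by": "alice"},
    {"name": "Emma", "owned_by": "bob"},
]

# Attach the accessor after the class is defined, much like setattr() above.
setattr(User, "book_set", ReverseDescriptor(BookManager, BOOKS))

alice = User("alice")
print(alice.book_set.titles())  # ['Dune']
```

Because the accessor returns a manager object instead of a precomputed list, callers keep access to every method on that manager, which is the advantage over a simple property.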

Though this code will not live in our codebase for long, it is an important step to move forward iteratively. Much like engineering improvements to a building, there is a place in code for temporary scaffolding.