Handling Relationships in RESTful APIs

Django Edition

Me

  • Grew up in Connecticut
  • PhD in Math at University of Connecticut
  • Have been in Salt Lake City working on Django based APIs since 2013
  • Currently Director of Engineering at Teem.com

So you want to write an App

Goal:

  • Fast, modern, web application
  • You are thinking API driven Single Page App

Modern web options:

  • REST or GraphQL?
  • Python, Node, Golang, or Java?
  • Angular, React, Ember, etc, etc

My defaults:

  • Python, Django and Django-Rest-Framework, Angular or React
  • If you are able to use Python 3, check out apistar and pydantic

 

This talk focuses on the API design and making the App fast.

REST is Great

  • Easy to understand
  • Easy to explore
  • Easy to expand
  • Scales well 

 

BUT ...

REST is not so Great

Get many resources in a single request
GraphQL queries access not just the properties of one resource but also smoothly follow references between them. While typical REST APIs require loading from multiple URLs, GraphQL APIs get all the data your app needs in a single request. Apps using GraphQL can be quick even on slow mobile network connections.

~ graphql.org

  • Round Trip and Repeat Trip Times
  • Over/Under Fetching

 

 

Spoiler: we don't need to give up REST to resolve these issues

Quick Blog Example

// /api/v1/authors
{
    "id": 1,
    "created_at": "2017-01-01T00:00:00Z",
    "updated_at": "2017-03-11T13:04:00Z",
    "displayName": "Jane Smith",
    "email": "jane.smith@example.com",
    "image_url": "https://example.com/jane.smith.jpg",
    "recent_posts": [
        90,
        27,
        23
    ]
}

// /api/v1/posts
{
    "id": 90,
    "title": "Fullstack doesn't scale!",
    "content": "lorem ipsum ...",
    "created_at": "2017-05-01T00:00:00Z",
    "updated_at": "2017-05-01T00:00:00Z",
    "published_at": "2017-05-02T00:00:00Z",
    "author_id": 1
}

Results in multiple requests!
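To make the cost concrete, here is a minimal sketch of the "1 + N" request pattern. `FakeClient` is a hypothetical in-memory stand-in (no real HTTP), the detail URLs are assumed to follow the `/api/v1/<resource>/<id>` shape, and the title of post 23 is a placeholder since the slides never show it.

```python
# Hypothetical in-memory data standing in for the two endpoints above.
AUTHORS = {1: {"id": 1, "displayName": "Jane Smith", "recent_posts": [90, 27, 23]}}
POSTS = {
    90: {"id": 90, "title": "Fullstack doesn't scale!", "author_id": 1},
    27: {"id": 27, "title": "Best micro-brews in LA", "author_id": 1},
    23: {"id": 23, "title": "(placeholder title)", "author_id": 1},
}


class FakeClient:
    """Pretend API client that only counts how many requests were issued."""

    def __init__(self):
        self.request_count = 0

    def get(self, path):
        self.request_count += 1
        # "/api/v1/authors/1" -> ("authors", "1")
        resource, pk = path.strip("/").split("/")[-2:]
        table = AUTHORS if resource == "authors" else POSTS
        return table[int(pk)]


def load_author_page(client, author_id):
    """Fetch the author, then one request per related post id."""
    author = client.get("/api/v1/authors/{}".format(author_id))
    posts = [client.get("/api/v1/posts/{}".format(pk))
             for pk in author["recent_posts"]]
    return author, posts


client = FakeClient()
author, posts = load_author_page(client, 1)
print(client.request_count)  # 1 author request + 3 post requests
```

Every extra related id costs another round trip, which is exactly what hurts on a slow connection.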

Quick Blog Example 2

{
    "id": 1,
    "created_at": "2017-01-01T00:00:00Z",
    "updated_at": "2017-03-11T13:04:00Z",
    "displayName": "Jane Smith",
    "email": "jane.smith@example.com",
    "image_url": "https://example.com/jane.smith.jpg",
    "recent_posts": [
        {
            "id": 90,
            "title": "Fullstack doesn't scale!",
            "content": "lorem ipsum ...",
            "created_at": "2017-05-01T00:00:00Z",
            "updated_at": "2017-05-01T00:00:00Z",
            "published_at": "2017-05-02T00:00:00Z",
            "author_id": 1
        },
        {
            "id": 27,
            "title": "Best micro-brews in LA",
            "content": "lorem ipsum ...",
            "created_at": "2017-03-11T00:00:00Z",
            "updated_at": "2017-03-11T00:00:00Z",
            "published_at": "2017-03-13T00:00:00Z",
            "author_id": 1
        }
    ]
}  // /api/v1/authors

Out of the box, DRF supports embedding related models.

 

A little bit of repeated data.

Quick Blog Example 3

[
    {
        "id": 90,
        "title": "Fullstack doesn't scale!",
        "content": "lorem ipsum ...",
        "created_at": "2017-05-01T00:00:00Z",
        "updated_at": "2017-05-01T00:00:00Z",
        "published_at": "2017-05-02T00:00:00Z",
        "author": {
            "id": 1,
            "created_at": "2017-01-01T00:00:00Z",
            "updated_at": "2017-03-11T13:04:00Z",
            "displayName": "Jane Smith",
            "email": "jane.smith@example.com",
            "image_url": "https://example.com/jane.smith.jpg"
        }
    },
    {
        "id": 27,
        "title": "Best micro-brews in LA",
        "content": "lorem ipsum ...",
        "created_at": "2017-03-11T00:00:00Z",
        "updated_at": "2017-03-11T00:00:00Z",
        "published_at": "2017-03-13T00:00:00Z",
        "author": {
            "id": 1,
            "created_at": "2017-01-01T00:00:00Z",
            "updated_at": "2017-03-11T13:04:00Z",
            "displayName": "Jane Smith",
            "email": "jane.smith@example.com",
            "image_url": "https://example.com/jane.smith.jpg"
        }
    }
]  // /api/v1/posts?author_id=1

A lot of repeated data!

It Doesn't Have to be this Way

EmberJS recommends a solution called sideloading.

https://guides.emberjs.com/v1.10.0/models/the-rest-adapter/#toc_sideloaded-relationships

Sideloading attempts to partially resolve these issues in REST

  • Round trip and repeat trip times
    • You can request the details of related objects to reduce trips to the API
  • Over/Under fetching
    • You can request related objects to avoid under fetching
    • You only get related object details once in the API, partially avoiding over fetching
    • You really need sparse fieldsets to solve this completely, see JSONAPI

Blog Revisited

{
    "posts": [
        {
            "id": 90,
            "title": "Fullstack doesn't scale!",
            "content": "lorem ipsum ...",
            "created_at": "2017-05-01T00:00:00Z",
            "updated_at": "2017-05-01T00:00:00Z",
            "published_at": "2017-05-02T00:00:00Z",
            "author_id": 1
        },
        {
            "id": 27,
            "title": "Best micro-brews in LA",
            "content": "lorem ipsum ...",
            "created_at": "2017-03-11T00:00:00Z",
            "updated_at": "2017-03-11T00:00:00Z",
            "published_at": "2017-03-13T00:00:00Z",
            "author_id": 1
        }
    ],
    "authors": [
        {
            "id": 1,
            "created_at": "2017-01-01T00:00:00Z",
            "updated_at": "2017-03-11T13:04:00Z",
            "displayName": "Jane Smith",
            "email": "jane.smith@example.com",
            "image_url": "https://example.com/jane.smith.jpg"
        }
    ]
}  // /api/v1/posts?author_id=1&include[]=authors

The only repeated data are the ids.
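The reshaping from the embedded response to the sideloaded one can be sketched in plain Python. This is only an illustration of the payload shape, not how the API builds it; the `sideload` function and its parameters are hypothetical names.

```python
def sideload(posts, relation="author", plural="authors"):
    """Pull embedded related objects out of each post into a de-duplicated list.

    Each post keeps only `<relation>_id`; every related object appears once
    under the top-level `plural` key.
    """
    flat_posts = []
    related_by_id = {}
    for post in posts:
        post = dict(post)                 # shallow copy, leave the input untouched
        related = post.pop(relation)
        post[relation + "_id"] = related["id"]
        related_by_id.setdefault(related["id"], related)
        flat_posts.append(post)
    return {"posts": flat_posts, plural: list(related_by_id.values())}


jane = {"id": 1, "displayName": "Jane Smith"}
embedded = [
    {"id": 90, "title": "Fullstack doesn't scale!", "author": jane},
    {"id": 27, "title": "Best micro-brews in LA", "author": jane},
]
payload = sideload(embedded)
print(payload["authors"])  # Jane appears once, however many posts she wrote
```

The client joins `author_id` back to the `authors` list, much like a foreign key lookup.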

Sideloading at Teem

At Teem we have two models that we use constantly and we want to load some or all of the related objects:

  1. Users
    • groups, organizations
  2. Rooms
    • floors, buildings, campuses, calendars, room_resources

Sideloading at Teem

  • We currently support sideloading for all new (v4+) APIs
  • Most important reason for doing this:

 

It minimizes the number of API calls during a hard refresh of the Single-Page-App

Sideloading in DRF v1

class RoomViewSet(viewsets.ModelViewSet):
    serializer_class = serializers.RoomSerializer
    queryset = models.Room.objects.all()
    resource_name = 'room'
    resource_name_plural = 'rooms'

    def list(self, request, **kwargs):
        page = self.paginate_queryset(
            self.filter_queryset(
                self.get_queryset()
            )
        )

        # base response will always contain the resource and the meta
        response = {
            self.resource_name_plural: self.get_serializer(
                page, many=True).data,
        }

        response.update(self.get_sideload_data(request, page))

        return Response(response)

    def get_sideload_data(self, request, rooms):
        if isinstance(rooms, models.Room):
            rooms = [rooms]

        data = {}
        sideload_calendars = self.has_sideload_field('calendars')
        sideload_licenses = self.has_sideload_field('licenses')
        sideload_campuses = self.has_sideload_field('campuses')
        sideload_buildings = self.has_sideload_field('buildings')
        sideload_floors = self.has_sideload_field('floors')
        sideload_room_resources = self.has_sideload_field('room_resources')
        sideload_room_resource_categories = self.has_sideload_field(
            'room_resource_categories')
        sideload_cloud_files = self.has_sideload_field('cloud_files')
        room_image_files = self.has_sideload_field('room_images')

        serializer_context = self.get_serializer_context()

        if sideload_licenses or sideload_campuses or sideload_buildings or \
                sideload_floors or sideload_cloud_files or room_image_files:

            licenses = set()
            floor_ids = set()
            cloud_files = []
            room_images = set()

            for room in rooms:
                for l in room.licenses:
                    licenses.add(l)

                if sideload_floors and room.floor_id:
                    floor_ids.add(room.floor_id)

                if sideload_cloud_files:
                    cloud_files += list(room.cloud_files.all())

                if room_image_files and room.room_image_id:
                    room_images.add(room.room_image)

        if sideload_licenses:
            data['licenses'] = LicenseSerializer(
                instance=list(licenses),
                context=serializer_context,
                many=True).data

        if sideload_floors or sideload_campuses or sideload_buildings:
            floor_ids = list(floor_ids)

            floors = models.Floor.objects.filter(
                or_item_query([r.floor_id for r in rooms]))

            if sideload_floors:
                data['floors'] = serializers.FloorSerializer(
                    instance=floors,
                    context=serializer_context,
                    many=True).data

        if sideload_buildings or sideload_campuses:
            buildings = models.Building.objects.filter(
                or_item_query([f.building_id for f in floors]))

            if sideload_buildings:
                data['buildings'] = serializers.BuildingSerializer(
                    instance=buildings,
                    many=True,
                    context=serializer_context).data

        if sideload_campuses:
            campuses = models.Campus.objects.filter(
                or_item_query([b.campus_id for b in buildings]))

            data['campuses'] = serializers.CampusSerializer(
                instance=campuses,
                many=True,
                context=serializer_context).data

        if sideload_room_resources:
            # note that Django has already selected the room_resources for us
            # because of the `prefetch_related` in the `get_queryset`
            room_resources = set()
            for r in rooms:
                for x in r.room_resource.all():
                    room_resources.add(x)
            data['room_resources'] = serializers.RoomResourceSerializer(
                instance=list(room_resources),
                context=serializer_context,
                many=True).data

        if sideload_room_resource_categories:
            room_resource_categories = \
                models.RoomResourceCategory.objects.filter(
                    roomresource__room__pk__in=[
                        r.id for r in rooms]).distinct()
            data['room_resource_categories'] = serializers.\
                RoomResourceCategorySerializer(
                    instance=room_resource_categories,
                    context=serializer_context,
                    many=True).data

        if sideload_cloud_files:
            cloud_files = list(set(cloud_files))
            data['cloud_files'] = CloudFileSerializer(
                instance=cloud_files,
                context=serializer_context,
                many=True).data

        if room_image_files:
            data['room_images'] = CloudFileSerializer(
                instance=list(room_images),
                context=serializer_context,
                many=True).data

        if sideload_calendars:
            # note that Django has already selected the calendars for us
            # because of the `select_related` in the `get_queryset`
            data['calendars'] = CalendarSerializer(
                instance=[r.calendar for r in rooms if r.calendar_id],
                many=True,
                context=serializer_context,
                ).data

        return data

* Code has been modified from its original version. It has been formatted to fit this screen

Sideloading in DRF v2

class UserAPIViewset(SideloadViewSet):
    queryset = models.User.objects.all()
    serializer_class = serializers.UserSerializer
    filter_backends = (
        core_filters.IdFilter,
        core_filters.BooleanFieldFilterFactory('is_active'),
        core_filters.BooleanFieldFilterFactory('is_admin', 'is_ebadmin'),
        core_filters.DateTimeFilterFactory('created_at'),
        core_filters.DateTimeFilterFactory('updated_at'),
        account_filters.GroupIdFilter,
    )
    resource_name = 'user'
    resource_name_plural = 'users'

    sideload_relations = {
        'organizations': {
            'serializer': serializers.CompanyInfoSerializer,
            'field': 'company_id'
        },
        'groups': {
            'serializer': serializers.GroupSerializer,
            'manager': True,
            'field': 'ebgroups'
        },
        'calendars': {
            'serializer': 'calendars.drf.v4.serializers.CalendarSerializer',
            'manager': True,
            'field': 'calendar_set',
        }
    }

Sideloading Implementation

class SideloadViewSet(viewsets.ModelViewSet):
    def get_sideload_data(self, request, resources):
        resources = [resources] if isinstance(resources, Model) else resources
        extra_response, context = {}, self.get_serializer_context()

        for field in self.sideload_fields_to_show(request):
            f = self.sideload_relations.get(field)
            if f is None:
                continue

            serializer = f['serializer']

            if f.get('manager', False):
                # The related objects are ManyToMany or a reverse ForeignKey
                field_obj_ids = []
                for x in resources:
                    field_obj_ids.extend(self.get_related_ids(x, f['field']))
            else:
                # The related object is a ForeignKey, a OneToOne, or a property
                field_obj_ids = [getattr(x, f['field']) for x in resources]

            # serialize the data
            if f.get('include_archived', False):
                qs = serializer.Meta.model.all_objects.filter(id__in=field_obj_ids)
            else:
                qs = serializer.Meta.model.objects.filter(id__in=field_obj_ids)
            extra_response[field] = serializer(qs, context=context, many=True).data

        return extra_response

Where all the magic happens
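The slides do not show `sideload_fields_to_show`. As a guess at what such a helper might do, this sketch reads the `include[]` query parameters (the format used in the blog example URL) and keeps only names declared in `sideload_relations`. The standalone function `requested_sideloads` and its signature are assumptions made for illustration; the real method takes a DRF request.

```python
from urllib.parse import parse_qs, urlsplit


def requested_sideloads(url, sideload_relations):
    """Return the declared relation names requested via `include[]`, in order.

    Unknown names are ignored and duplicates are dropped, so a client cannot
    trigger work for relations the viewset never declared.
    """
    query = parse_qs(urlsplit(url).query)
    fields = []
    for name in query.get("include[]", []):
        if name in sideload_relations and name not in fields:
            fields.append(name)
    return fields


relations = {"organizations": {}, "groups": {}, "calendars": {}}
url = "/api/v4/users?is_active=true&include[]=groups&include[]=bogus&include[]=groups"
print(requested_sideloads(url, relations))
```

Whitelisting against the declared relations keeps the query string from driving arbitrary attribute access in the `getattr` call above.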

Sideloading Implementation

class SideloadViewSet(viewsets.ModelViewSet):
    resource_name = None
    resource_name_plural = None
    sideload_relations = {}

    def __init__(self, *args, **kwargs):
        """Lazy load the sideload relation serializers."""
        super(SideloadViewSet, self).__init__(*args, **kwargs)

        self.validate_resource_name()
        self.init_serializers()

    def init_serializers(self):
        """Initializes special serializers, like the ones that sideload data."""
        for field in self.sideload_relations:
            self.sideload_relations[field]['serializer'] = \
                self.get_sideload_serializer(field)

    def validate_resource_name(self):
        """Validates that `resource_name` and `resource_name_plural` are set correctly."""
        if self.resource_name is None:
            raise self.ResourceNameException(
                'You must set `resource_name` on the viewset.')

        if self.resource_name_plural is None:
            raise self.ResourceNameException(
                'You must set `resource_name_plural` on the viewset.')

Avoid circular import issues

Sideloading Implementation

# avoid circular imports etc
class SideloadViewSet(viewsets.ModelViewSet):
    def get_sideload_serializer(self, field):
        fqp = self.sideload_relations.get(field, {}).get('serializer', '')

        # it is already a serializer, return now
        if isinstance(fqp, SerializerMetaclass):
            return fqp

        if not isinstance(fqp, str):
            raise self.InvalidSideloadSerializer(
                'Invalid serializer for {}'.format(field))

        app, serializer = fqp.rsplit('.', 1)

        try:
            serializer = importlib.import_module(app).__dict__.get(serializer)
        except ImportError:
            raise self.NotImportableSerializer(
                'Model path {} is not importable for'
                ' sideload_relation {}'.format(fqp, field)
            )
        return serializer

class UserAPIViewset(SideloadViewSet):
    sideload_relations = {
        'calendars': {
            'serializer': 'calendars.drf.v4.serializers.CalendarSerializer',
            'manager': True,
            'field': 'calendar_set',
        }
    }
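The dotted-path trick can be tried outside Django. A minimal sketch, using a standard library class in place of a serializer; `import_by_path` is a hypothetical name. Unlike `__dict__.get(...)`, which quietly returns `None` for a misspelled class name, this version raises. Django also ships `django.utils.module_loading.import_string` for the same job.

```python
import importlib


def import_by_path(dotted_path):
    """Resolve 'package.module.ClassName' to the object it names.

    The import happens when this function is called, not when the module
    containing the string is loaded, which is what sidesteps circular imports.
    """
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError("{!r} is not a dotted path".format(dotted_path))
    module = importlib.import_module(module_path)
    try:
        return getattr(module, attr)
    except AttributeError:
        raise ImportError(
            "Module {!r} has no attribute {!r}".format(module_path, attr))


# A stdlib class stands in for a serializer class here.
OrderedDict = import_by_path("collections.OrderedDict")
print(OrderedDict.__name__)
```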

Wins

  • This implementation significantly reduces boilerplate code
  • Easy for any backend dev to utilize
  • Time to build an API is almost completely determined by the effort to build the serializers (this is generally fast)
  • Seems to cover all of our needs; we haven't had to extend it in a while

Things to Consider/Improve

  • Building the extra sideload data does require a nested loop.

        for field in self.sideload_fields_to_show(request):
            field_obj_ids = [getattr(x, f['field']) for x in resources]

  • Fetching the related model ids for M2M or reverse relationships can result in multiple db queries.  How can we do this in a single query?

        if f.get('manager', False):
            # The related objects are ManyToMany or a reverse ForeignKey
            field_obj_ids = []
            for x in resources:
                field_obj_ids.extend(self.get_related_ids(x, f['field']))

  • How does this work with relationships crossing services?
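One possible answer to the single-query question: ask the join table for all related ids at once instead of once per resource. The sketch below shows the idea with the standard library `sqlite3` module; the `user_groups` table and its columns are invented for illustration. In Django the same query could plausibly be written against the relation's through model, but that depends on the actual models, which the slides do not show.

```python
import sqlite3

# Hypothetical join table for a users <-> groups many-to-many relation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_groups (user_id INTEGER, group_id INTEGER);
    INSERT INTO user_groups VALUES (1, 10), (1, 11), (2, 11), (3, 12);
""")


def related_ids(conn, user_ids):
    """Fetch the distinct related ids for all users with one query."""
    user_ids = list(user_ids)
    if not user_ids:
        return []
    placeholders = ",".join("?" for _ in user_ids)
    sql = ("SELECT DISTINCT group_id FROM user_groups "
           "WHERE user_id IN ({}) ORDER BY group_id".format(placeholders))
    return [row[0] for row in conn.execute(sql, user_ids)]


group_ids = related_ids(conn, [1, 2])
print(group_ids)  # [10, 11]
```

One query per sideloaded relation replaces one query per resource per relation, so the cost no longer grows with page size.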

Thanks

Lucas Roesler (lucasroesler.com)
