Frontend Data Normalization

Flat Structure, Retrieval In O(1), No Duplication, Fewer Re-renders

About the Speaker

Muhammad Omer Khan

Senior Software Engineer @ KoderLabs

/omerkhan8

@omerkhan97

What is Normalization?

Normalization is the process of reducing a complex data structure into its simplest, most stable structure to minimize redundancy.

Why do we need normalization on the frontend?

Many applications deal with data that is nested or relational in nature. For example, a blog editor could have many Posts, each Post could have many Comments, and both Posts and Comments would be written by a User. Data for this kind of application might look like:

const blogPosts = [
  {
    id: 'post1',
    author: { username: 'user1', name: 'User 1' },
    body: '......',
    comments: [
      {
        id: 'comment1',
        author: { username: 'user2', name: 'User 2' },
        comment: '.....'
      },
      {
        id: 'comment2',
        author: { username: 'user3', name: 'User 3' },
        comment: '.....'
      }
    ]
  },
  {
    id: 'post2',
    author: { username: 'user2', name: 'User 2' },
    body: '......',
    comments: [
      {
        id: 'comment3',
        author: { username: 'user3', name: 'User 3' },
        comment: '.....'
      },
      {
        id: 'comment4',
        author: { username: 'user1', name: 'User 1' },
        comment: '.....'
      },
      {
        id: 'comment5',
        author: { username: 'user3', name: 'User 3' },
        comment: '.....'
      }
    ]
  }
  // and repeat many times
];

Notice that the structure of the data is a bit complex, and some of the data is repeated. This is a concern for several reasons:

When a piece of data is duplicated in several places, it becomes harder to make sure that it is updated appropriately.

Nested data means that the corresponding reducer logic has to be more nested and therefore more complex. In particular, trying to update a deeply nested field can become very ugly very fast.

Since immutable data updates require all ancestors in the state tree to be copied and updated as well, and new object references will cause connected UI components to re-render, an update to a deeply nested data object could force totally unrelated UI components to re-render even if the data they're displaying hasn't actually changed.
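To make the last point concrete, here is a minimal sketch of changing one comment's text in the nested shape without mutating anything. The helper name updateNestedComment is hypothetical, and nestedPosts is a trimmed-down stand-in for the blog data above.

```javascript
// Trimmed-down sample of the nested blog data
const nestedPosts = [
  {
    id: 'post1',
    author: { username: 'user1', name: 'User 1' },
    comments: [
      { id: 'comment1', author: { username: 'user2', name: 'User 2' }, comment: 'old text' },
      { id: 'comment2', author: { username: 'user3', name: 'User 3' }, comment: '.....' }
    ]
  },
  {
    id: 'post2',
    author: { username: 'user2', name: 'User 2' },
    comments: [
      { id: 'comment3', author: { username: 'user3', name: 'User 3' }, comment: '.....' }
    ]
  }
];

// Hypothetical helper: immutably change one comment's text.
// The comment's ancestors (posts array, parent post, comments array)
// must all be copied; unrelated objects keep their references.
function updateNestedComment(posts, postId, commentId, text) {
  return posts.map(post =>
    post.id !== postId
      ? post
      : {
          ...post,
          comments: post.comments.map(c =>
            c.id !== commentId ? c : { ...c, comment: text }
          )
        }
  );
}

const updated = updateNestedComment(nestedPosts, 'post1', 'comment1', 'new text');
console.log(updated === nestedPosts);       // false: new posts array
console.log(updated[0] === nestedPosts[0]); // false: parent post copied
console.log(updated[1] === nestedPosts[1]); // true: other post reused
```

Any component that receives the posts array or the first post now sees a new reference, even if it only displays that post's body or author.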

Designing a Normalized State

The basic concepts of normalizing data are:

Each type of data gets its own "table" in the state.

Each "data table" should store the individual items in an object, with the IDs of the items as keys and the items themselves as the values.

Any references to individual items should be done by storing the item's ID.

Arrays of IDs should be used to indicate ordering.

An example of a normalized state structure for the blog example above might look like:

const normalizedState = {
    post: {
        post1: {
            id: "post1",
            authorId: "user1",
            body: "......",
            commentIds: ["comment1", "comment2"]
        },
        post2: {
            id: "post2",
            authorId: "user2",
            body: "......",
            commentIds: ["comment3", "comment4", "comment5"]
        }
    },
    comment: {
        comment1: {
            id: "comment1",
            authorId: "user2",
            comment: "....."
        },
        comment2: {
            id: "comment2",
            authorId: "user3",
            comment: "....."
        },
        comment3: {
            id: "comment3",
            authorId: "user3",
            comment: "....."
        },
        comment4: {
            id: "comment4",
            authorId: "user1",
            comment: "....."
        },
        comment5: {
            id: "comment5",
            authorId: "user3",
            comment: "....."
        }
    },
    user: {
        user1: {
            username: "user1",
            name: "User 1"
        },
        user2: {
            username: "user2",
            name: "User 2"
        },
        user3: {
            username: "user3",
            name: "User 3"
        }
    }
};
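The conversion from the nested array to this shape can be written by hand. Below is a minimal sketch, assuming the nested shape shown earlier (each post has an embedded author and a comments array) and using username as the user ID; normalizePosts is a hypothetical name, not a library function.

```javascript
// Hypothetical hand-written normalizer for the nested blog data.
// Assumes each post has { id, author, body, comments } and each
// comment has { id, author, comment }; users are keyed by username.
function normalizePosts(posts) {
  const state = { post: {}, comment: {}, user: {} };

  // Store a user once, however many posts/comments reference it
  const addUser = author => {
    state.user[author.username] = { username: author.username, name: author.name };
    return author.username;
  };

  for (const post of posts) {
    const commentIds = [];
    for (const c of post.comments) {
      state.comment[c.id] = { id: c.id, authorId: addUser(c.author), comment: c.comment };
      commentIds.push(c.id); // the ID array preserves comment order
    }
    state.post[post.id] = {
      id: post.id,
      authorId: addUser(post.author),
      body: post.body,
      commentIds
    };
  }
  return state;
}

const nested = [
  {
    id: 'post1',
    author: { username: 'user1', name: 'User 1' },
    body: '......',
    comments: [
      { id: 'comment1', author: { username: 'user2', name: 'User 2' }, comment: '.....' },
      { id: 'comment2', author: { username: 'user3', name: 'User 3' }, comment: '.....' }
    ]
  }
];

const normalized = normalizePosts(nested);
console.log(normalized.post.post1.commentIds);    // [ 'comment1', 'comment2' ]
console.log(Object.keys(normalized.user).length); // 3
```

If the same user appears in many posts or comments, they still end up as a single entry in the user table. Libraries such as normalizr automate this kind of schema-driven flattening.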

This state structure is much flatter overall. Compared to the original nested format, this is an improvement in several ways:

Because each item is only defined in one place, we don't have to try to make changes in multiple places if that item is updated.

The reducer logic doesn't have to deal with deep levels of nesting, so it will probably be much simpler.

The logic for retrieving or updating a given item is now fairly simple and consistent. Given an item's type and its ID, we can directly look it up in a couple of simple steps, without having to dig through other objects to find it.

Since each data type is separated, an update like changing the text of a comment would only require new copies of the "comment > comment1" portion of the tree: the comment table and the one comment inside it. This generally means fewer parts of the UI need to re-render because their data has changed. In contrast, updating a comment in the original nested shape would have required copying the comment object, its comments array, the parent post object, and the array of all post objects, and would likely have caused all of the Post and Comment components in the UI to re-render.
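The lookup and update points above can be sketched together. This is a minimal, hypothetical example: selectById and updateComment are illustrative names rather than Redux APIs, and state is a trimmed copy of the normalized example. Lookups are plain object property accesses (constant time on average), and the update copies only the root object, the comment table, and the edited comment.

```javascript
// Trimmed copy of the normalized state (sample data)
const state = {
  post: {
    post1: { id: 'post1', authorId: 'user1', body: '......', commentIds: ['comment1', 'comment2'] }
  },
  comment: {
    comment1: { id: 'comment1', authorId: 'user2', comment: 'old text' },
    comment2: { id: 'comment2', authorId: 'user3', comment: '.....' }
  },
  user: {
    user1: { username: 'user1', name: 'User 1' },
    user2: { username: 'user2', name: 'User 2' },
    user3: { username: 'user3', name: 'User 3' }
  }
};

// Hypothetical selector: pick the table by type, then the item by ID
const selectById = (state, type, id) => state[type][id];

// Hypothetical reducer-style update: copy the root, the comment table,
// and the one edited comment; every other table keeps its reference
function updateComment(state, commentId, text) {
  return {
    ...state,
    comment: {
      ...state.comment,
      [commentId]: { ...state.comment[commentId], comment: text }
    }
  };
}

// Following references is just more lookups by ID
const post = selectById(state, 'post', 'post1');
const commentAuthors = post.commentIds.map(commentId => {
  const comment = selectById(state, 'comment', commentId);
  return selectById(state, 'user', comment.authorId).name;
});
console.log(selectById(state, 'user', post.authorId).name); // User 1
console.log(commentAuthors); // [ 'User 2', 'User 3' ]

const next = updateComment(state, 'comment1', 'new text');
console.log(next.post === state.post);       // true: posts untouched
console.log(next.user === state.user);       // true: users untouched
console.log(next.comment === state.comment); // false: comment table copied
```

Because next.post and next.user are the same objects as before, selectors that read only posts or users return the same references after the edit, which is what lets components that depend on them skip re-rendering.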

Note that a normalized state structure generally implies that more components are connected and each component is responsible for looking up its own data, as opposed to a few connected components looking up large amounts of data and passing all that data downwards. As it turns out, having connected parent components simply pass item IDs to connected children is a good pattern for optimizing UI performance in a React Redux application, so keeping state normalized plays a key role in improving performance.

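import React from 'react';
import { useSelector, shallowEqual } from 'react-redux';

// Parent Component: selects only the list of post IDs
const Posts = () => {
    // Object.keys returns a new array on every call, so shallowEqual
    // avoids a re-render unless the IDs themselves change
    const postIds = useSelector(
        state => Object.keys(state.entities.post),
        shallowEqual
    );

    return (
        <div>
            {postIds.map(postId => (
                <PostItem key={postId} postId={postId} />
            ))}
        </div>
    );
};

// Child Component: looks up its own data by ID
const PostItem = ({ postId }) => {
    const postData = useSelector(state => state.entities.post[postId]);
    const postAuthor = useSelector(state => state.entities.user[postData.authorId]);

    return (
        <div>
            <span>{postData.body}</span>
            <span>{postAuthor.name}</span>
        </div>
    );
};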

Organizing Normalized Data in State

A typical application will likely have a mixture of relational data and non-relational data. While there is no single rule for exactly how those different types of data should be organized, one common pattern is to put the relational "tables" under a common parent key, such as "entities". A state structure using this approach might look like:

{
    simpleDomainData1: {....},
    simpleDomainData2: {....},
    entities : {
        entityType1 : {....},
        entityType2 : {....}
    }
}

Relationships and Tables

Because we're treating a portion of our Redux store as a "database", many principles of database design apply here as well. For example, a many-to-many relationship can be modeled with an intermediate table that stores the IDs of the corresponding items (often known as a "join table" or an "associative table"). Tables can also use a byId/allIds layout, where byId is the ID-keyed lookup object and allIds is an array of IDs that records ordering; for consistency, the join table can use the same layout as the item tables, like this:

const state = {
    entities: {
        authors : { byId : {}, allIds : [] },
        books : { byId : {}, allIds : [] },
        authorBook : {
            byId : {
                1 : {
                    id : 1,
                    authorId : 5,
                    bookId : 22
                },
                2 : {
                    id : 2,
                    authorId : 5,
                    bookId : 15
                },
                3 : {
                    id : 3,
                    authorId : 42,
                    bookId : 12
                }
            },
            allIds : [1, 2, 3]
        }
    }
};
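Querying the join table comes down to filtering its rows. Here is a minimal sketch using the sample authorBook data; selectBookIdsByAuthor is a hypothetical selector, and because it scans every row, a large app might keep a precomputed index instead.

```javascript
// Join table from the example (author and book tables left empty)
const state = {
  entities: {
    authors: { byId: {}, allIds: [] },
    books: { byId: {}, allIds: [] },
    authorBook: {
      byId: {
        1: { id: 1, authorId: 5, bookId: 22 },
        2: { id: 2, authorId: 5, bookId: 15 },
        3: { id: 3, authorId: 42, bookId: 12 }
      },
      allIds: [1, 2, 3]
    }
  }
};

// Hypothetical selector: walk the join rows in allIds order and
// keep the book IDs that belong to the given author
function selectBookIdsByAuthor(state, authorId) {
  const { byId, allIds } = state.entities.authorBook;
  return allIds
    .map(id => byId[id])
    .filter(row => row.authorId === authorId)
    .map(row => row.bookId);
}

console.log(selectBookIdsByAuthor(state, 5));  // [ 22, 15 ]
console.log(selectBookIdsByAuthor(state, 42)); // [ 12 ]
```

The reverse query (authors of a given book) is symmetric: filter on bookId and return authorId.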

What's Next?

normalizr