Quantum Software Architecture Design Course

Written by: Igor Korotach

Practical Part 1

Instagram Architecture Overview

Instagram

Instagram is a free photo and video sharing app available on iPhone and Android. People can upload photos or videos to the service and share them with their followers or with a select group of friends. They can also view, comment on, and like posts shared by their friends on Instagram.

Instagram numbers

With 2 billion monthly active users (MAUs), Instagram is the 4th largest social network worldwide: 37.74% of the world’s 5.3 billion internet users access the app monthly.

February 2013: Instagram hit 100 million users (MAU)
June 2016: Instagram hit 500 million users (MAU)
June 2018: Instagram hit 1 billion users (MAU)
October 2022: Instagram hit 2 billion users (MAU)

Right now, the country with the largest addressable Instagram ad audience is India (229.55 million), followed by the United States (143.35 million) and Brazil (113.5 million).

Interesting statistics

100 million photos and videos are uploaded on Instagram every 24 hours
Around 800 million users access Reels every day (short videos)
Reels are shared 1 billion times every 24 hours
The average American uses Instagram for about 30 minutes a day
Initially, only square (1:1) images could be uploaded, to match the display of iPhones; other aspect ratios were allowed later
More than 500 million users access Instagram daily, across the world
Almost 70% of all Instagram posts are branded (business-linked)
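
As a rough back-of-envelope check, the daily figures above can be converted into average per-second rates, which is the form capacity estimates usually need. This is a minimal sketch that assumes traffic is spread evenly over the day; real traffic is not, so peak rates will be higher. The helper name per_second is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average rate per second for a daily event count."""
    return events_per_day / SECONDS_PER_DAY


# Daily figures taken from the statistics above.
uploads_per_second = per_second(100_000_000)        # photos and videos uploaded
reel_shares_per_second = per_second(1_000_000_000)  # Reels shares

print(f"uploads/s:     {uploads_per_second:,.0f}")      # about 1,157
print(f"reel shares/s: {reel_shares_per_second:,.0f}")  # about 11,574
```

For comparison, the early architecture described later handled a little over 25 photo uploads a second.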

Functional requirements

User Authentication and Authorization: Secure user login, password recovery, and access control to personal data and content.
Content Creation and Sharing: Ability to upload, edit, and share photos, videos, and stories with various sharing options.
Content Consumption: Displaying user feeds, explore page, ability to like, comment, and share others' content.
Direct Messaging: Sending and receiving text, photos, and videos through direct messages, including group chats.
User Profiles: Creating, editing, and managing personal profiles, including bio, profile picture, and follower/following lists.
Search and Discovery: Finding users, hashtags, and content through search functionalities.

Non-Functional requirements

Performance: Ensuring fast load times, smooth scrolling, and efficient handling of large amounts of data and users.
Security: Protecting user data, preventing unauthorized access, and ensuring data privacy.
Usability: Creating an intuitive and user-friendly interface with easy navigation.
Scalability: The system should be able to handle increasing user numbers and data volume without compromising performance.
Reliability: Maintaining high availability and preventing system failures or data loss.
Accessibility: Designing the platform so that people with different abilities, devices, and needs can use it.

Evolution of Design

Early Instagram architecture

Early Instagram Tech Stack

Python/Django/Gunicorn
Postgres
S3/Cloudfront
Redis + Memcached
Gearman/Celery + RabbitMQ
Sentry/Munin/Pingdom
AWS (later migrated to Facebook Data Centers)

Sharding challenge

Session: The application server grabs the latest relevant photo IDs from Postgres.

The application server would pull data from PostgreSQL, which stored most of Instagram’s data, such as users and photo metadata.

The connections between Postgres and Django were pooled using Pgbouncer.

Instagram sharded their data because of the volume they were receiving (over 25 photos and 90 likes a second). They used code to map several thousand ‘logical’ shards to a few physical shards.
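
The logical-to-physical mapping can be sketched as a small lookup table. This is an illustration only, not Instagram's actual code: the shard count (4096), the host names, and the choice of user_id modulo the shard count as the routing rule are all assumptions made for the example.

```python
NUM_LOGICAL_SHARDS = 4096  # assumed; the text only says "several thousand"
PHYSICAL_HOSTS = ["pg-a", "pg-b", "pg-c", "pg-d"]  # hypothetical host names

# Explicit table: each logical shard points at one physical host.
# Contiguous ranges of logical shards start out on the same host.
LOGICAL_TO_PHYSICAL = {
    shard: PHYSICAL_HOSTS[shard * len(PHYSICAL_HOSTS) // NUM_LOGICAL_SHARDS]
    for shard in range(NUM_LOGICAL_SHARDS)
}


def logical_shard_for(user_id: int) -> int:
    """Pick a logical shard for a user (assumed rule: simple modulo)."""
    return user_id % NUM_LOGICAL_SHARDS


def physical_host_for(user_id: int) -> str:
    """Resolve the physical host that currently holds the user's logical shard."""
    return LOGICAL_TO_PHYSICAL[logical_shard_for(user_id)]


print(logical_shard_for(31341), physical_host_for(31341))
```

The point of the indirection is that a logical shard can be moved to a new machine by changing one table entry, without re-hashing any user data.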

An interesting challenge that Instagram faced and solved is generating IDs that could be sorted by time. Their resulting sortable-by-time IDs looked like this:

41 bits for time in milliseconds (the source post says this gives 41 years of IDs with a custom epoch; 2^41 milliseconds actually works out to roughly 69.7 years)

13 bits that represent the logical shard ID

10 bits that represent an auto-incrementing sequence, modulus 1024. This means we can generate 1024 IDs, per shard, per millisecond

Thanks to the sortable-by-time IDs in Postgres, the application server has successfully received the latest relevant photo IDs.

Source: https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c
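
The 41/13/10 bit layout described above can be sketched in a few lines of Python. The source post implements this inside Postgres; the version below is only an illustration of the bit packing. The epoch value (2011-01-01 UTC) and the function names make_id and split_id are assumptions for the example.

```python
CUSTOM_EPOCH_MS = 1_293_840_000_000  # assumed custom epoch: 2011-01-01 00:00:00 UTC

TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10  # 64 bits in total


def make_id(now_ms: int, shard_id: int, sequence: int) -> int:
    """Pack time, logical shard ID, and sequence into one 64-bit integer."""
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard ID outside the 13-bit range")
    seq = sequence % (1 << SEQ_BITS)  # auto-incrementing sequence, modulo 1024
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def split_id(id_: int) -> tuple[int, int, int]:
    """Recover (timestamp in ms, shard ID, sequence) from an ID."""
    seq = id_ & ((1 << SEQ_BITS) - 1)
    shard_id = (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = id_ >> (SHARD_BITS + SEQ_BITS)
    return elapsed + CUSTOM_EPOCH_MS, shard_id, seq


older = make_id(CUSTOM_EPOCH_MS + 1000, shard_id=1341, sequence=5001)
newer = make_id(CUSTOM_EPOCH_MS + 1001, shard_id=7, sequence=0)
print(older < newer)    # the later timestamp sorts higher, whatever the shard
print(split_id(older))
```

Because the timestamp occupies the most significant bits, sorting by ID is the same as sorting by creation time at millisecond granularity, which is what lets the application server fetch the latest photo IDs with a plain ordered query.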

Current Instagram architecture

Storage @ Instagram

PostgreSQL is still the primary database of the application. It stores most of the platform's data, such as user data, tags, and meta-tags.

The main database cluster of Instagram involves 12 Quadruple Extra-Large memory instances, with 12 replicas in different zones.

Hive is used for data archiving. It’s a data warehousing software built on top of Apache Hadoop for data query and analytics capabilities. A scheduled batch process runs at regular intervals to archive data from PostgreSQL DB to Hive.

Vmtouch is used to manage in-memory data when moving from one machine to another.

Cassandra is used for analytics data storage.

Redis and Memcached are still used for caching.

Data pipeline

Challenge: Computing Trending HashTags On The Platform

Pre-processor Node
The pre-processor node takes the original media, which already has metadata attached, and adds the data needed to apply filters later in the pipeline.

Parser Node
The parser node extracts all the hashtags attached to an image and applies filters to them.

Scorer Node
The scorer node keeps track of the counters for each hashtag over time. All the counter data is kept in the cache and is also persisted for durability.
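
Time-based hashtag counters of the kind the scorer node maintains can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions: the 5-minute bucket size, the regular expression used to pull hashtags out of a caption, and the class name HashtagCounters are all made up for the example, and the persistence step is left out.

```python
import re
from collections import Counter, defaultdict

HASHTAG_RE = re.compile(r"#(\w+)")  # assumed hashtag syntax
BUCKET_SECONDS = 300                # assumed 5-minute time buckets


class HashtagCounters:
    """Per-hashtag counters grouped into fixed time buckets (in memory only)."""

    def __init__(self) -> None:
        # bucket start time (seconds) -> Counter of hashtag -> occurrences
        self._buckets = defaultdict(Counter)

    def record(self, caption: str, timestamp: int) -> None:
        """Count every hashtag in the caption in the bucket for this timestamp."""
        bucket = timestamp - timestamp % BUCKET_SECONDS
        for tag in HASHTAG_RE.findall(caption.lower()):
            self._buckets[bucket][tag] += 1

    def count(self, tag: str, start: int, end: int) -> int:
        """Occurrences of a tag in buckets starting in the range [start, end)."""
        return sum(c[tag] for b, c in self._buckets.items() if start <= b < end)


counters = HashtagCounters()
counters.record("Sunset #Travel #sunset", timestamp=1000)
counters.record("#travel again", timestamp=1100)
counters.record("#travel later", timestamp=1600)
print(counters.count("travel", 0, 2000))
```

Bucketing by time keeps the number of counters bounded and lets a later stage compare recent buckets against older ones.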

Ranker Node
The role of the ranker node is to compute the trending scores of hashtags. The trends are served from a read-through cache (Memcached) backed by the Postgres database.
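
The read-through pattern mentioned above can be sketched like this: callers only ever ask the cache, and the cache itself loads from the database on a miss. In this illustration a plain dictionary stands in for Memcached and a function stands in for the Postgres query; the class name, the TTL value, and load_trends_from_db are assumptions, not part of Instagram's code.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Serve values from the cache, loading from the backing store on a miss."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader    # stands in for the database query
        self._ttl = ttl_seconds  # assumed expiry policy
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit
        value = self._loader(key)                # miss or expired: read through
        self._entries[key] = (now + self._ttl, value)
        return value


db_calls = []


def load_trends_from_db(region: str) -> list:
    """Hypothetical stand-in for a Postgres query returning trending hashtags."""
    db_calls.append(region)
    return ["#travel", "#food"]


cache = ReadThroughCache(load_trends_from_db, ttl_seconds=60)
cache.get("global")
cache.get("global")
print(db_calls)  # the second call is served from the cache
```

The clock is injectable only so that expiry can be exercised without waiting in real time.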

Challenge: Search

Instagram initially used Elasticsearch for its search feature but later migrated to Unicorn, a social graph-aware search engine built by Facebook in-house.

Architecture assessment

I will quote Instagram's guiding design principles as the basis for assessing their architecture.

Instagram’s Guiding Principles:

Keep things very simple.
Don’t re-invent the wheel.
Use proven, solid technologies when possible.

Useful links

https://scaleyourapp.com/facebook-real-time-chat-architecture-scaling-with-over-multi-billion-messages-daily/

https://instagram-engineering.com/

https://engineering.fb.com/tag/instagram/

https://engineering.fb.com/2023/08/15/developer-tools/immortal-objects-for-python-instagram-meta/

https://engineering.fb.com/2023/08/09/ml-applications/scaling-instagram-explore-recommendations-system/

Thanks for your attention. You've been awesome!

Questions?

Presentation link: https://slides.com/emulebest/quantum-architecture-course-instagram
