Surviving System Design Interviews

Agenda

  • How System Design Interviews Work

    • What interviewers really evaluate

    • Core technical concepts

    • The 4-step framework

    • Communication tips

  • Ticketmaster: Concert booking/Flight booking system

How System Design Interviews Work

What You Must Demonstrate

  • Requirements: scope, constraints, NFRs, edge cases
  • Architecture: components + data flows + bottlenecks
  • Data: models, indexes, consistency, transactions
  • Scale: caching, sharding, async, capacity thinking
  • Reliability: retries, idempotency, fallbacks, observability

What Interviewers Evaluate

What They Evaluate   What They Look For
Problem-Solving      Structured thinking, clarifying questions
Technical Depth      Trade-offs, not just buzzwords
Communication        Clear explanations, diagrams
Trade-off Analysis   Pros and cons of decisions

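It's Not About the "Perfect" Solution

There is rarely one correct design. Interviewers reward structured reasoning and well-justified trade-offs more than a "perfect" answer.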

Core Concepts (Beginners)

REST API Basics

POST   /bookings      → Create
GET    /bookings/:id  → Read
PUT    /bookings/:id  → Update
DELETE /bookings/:id  → Delete

Status Codes:

  • 200 OK | 201 Created | 400 Bad Request | 404 Not Found | 500 Internal Server Error
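A framework-agnostic sketch of how these endpoints map to handlers and status codes. The handler names, the in-memory dict standing in for a database, and the (status, body) return convention are illustrative assumptions, not any particular web framework's API.

```python
import itertools

# Hypothetical in-memory store standing in for a bookings table.
_bookings: dict[int, dict] = {}
_next_id = itertools.count(1)


def create_booking(body: dict) -> tuple[int, dict]:
    """POST /bookings: 201 Created, or 400 Bad Request if fields are missing."""
    if "seat_id" not in body or "user_id" not in body:
        return 400, {"error": "seat_id and user_id are required"}
    booking_id = next(_next_id)
    _bookings[booking_id] = {**body, "id": booking_id}
    return 201, _bookings[booking_id]


def get_booking(booking_id: int) -> tuple[int, dict]:
    """GET /bookings/:id: 200 OK, or 404 Not Found."""
    if booking_id not in _bookings:
        return 404, {"error": "booking not found"}
    return 200, _bookings[booking_id]


def update_booking(booking_id: int, body: dict) -> tuple[int, dict]:
    """PUT /bookings/:id: replace the whole booking; 404 if it does not exist."""
    if booking_id not in _bookings:
        return 404, {"error": "booking not found"}
    _bookings[booking_id] = {**body, "id": booking_id}
    return 200, _bookings[booking_id]


def delete_booking(booking_id: int) -> tuple[int, dict]:
    """DELETE /bookings/:id: 200 OK with the removed booking, or 404."""
    if booking_id not in _bookings:
        return 404, {"error": "booking not found"}
    return 200, _bookings.pop(booking_id)
```

In a real service these functions would sit behind a router that maps each HTTP method and path to a handler.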

Caching

Why Cache?

  • Reduce database load
  • Faster responses
  • Handle traffic spikes

Where?

  • Redis/Memcached (most common)
  • CDN (static files)
  • Application memory
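A minimal cache-aside (lazy-loading) sketch, assuming a plain dict in place of Redis/Memcached and a hypothetical `load_from_db` callback. Reads check the cache first, fall back to the database on a miss, and store the result with a TTL; writes invalidate the key.

```python
import time
from typing import Any, Callable


class CacheAside:
    """Cache-aside: read cache, fall back to the DB on a miss, repopulate with a TTL."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db  # hypothetical database read
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._store: dict[str, tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.db_reads = 0          # counts how often we fell through to the DB

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no database work
        # Miss or expired entry: read through to the database and repopulate.
        value = self._load(key)
        self.db_reads += 1
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read fetches fresh data."""
        self._store.pop(key, None)
```

The TTL bounds how stale a cached value can get; invalidating on write shortens that window further.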

Load Balancing

           ┌─────────────┐
           │Load Balancer│
           └──────┬──────┘
        ┌─────────┼─────────┐
        ▼         ▼         ▼
    ┌───────┐ ┌───────┐ ┌───────┐
    │Server1│ │Server2│ │Server3│
    └───────┘ └───────┘ └───────┘
    

Strategies:

  • Round Robin - Equal distribution
  • Least Connections - To least busy
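Both strategies fit in a few lines. This is an in-process sketch with hypothetical server names; production load balancers such as NGINX and HAProxy offer these policies as configuration options.

```python
import itertools


class RoundRobin:
    """Cycle through servers in order so each gets an equal share of requests."""

    def __init__(self, servers: list[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Send each request to the server with the fewest active connections."""

    def __init__(self, servers: list[str]) -> None:
        self.active = {server: 0 for server in servers}

    def pick(self) -> str:
        # min() returns the first server with the lowest count, so ties go to list order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a request on this server finishes."""
        self.active[server] -= 1
```

Round robin assumes requests cost roughly the same; least connections adapts when some requests (for example, a checkout) hold a connection much longer than others.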

Databases: SQL vs NoSQL

CAP Theorem

In a distributed system, you can guarantee at most two of three properties at once: Consistency, Availability, Partition Tolerance.

Advanced Concepts (Experts)

CAP Theorem

Interview Quote: "For payments, I choose CP (PostgreSQL) - better an error than a duplicate charge. For the activity feed, AP (Cassandra) is fine - users tolerate stale data."

Choice   Prioritizes    Best For             Examples
CP       Consistency    Payments, bookings   PostgreSQL, MongoDB
AP       Availability   Feeds, caching       Cassandra, DynamoDB

Since network partitions are unavoidable, the real choice is what to give up while a partition lasts: availability (CP) or consistency (AP).

Concurrency

Technique             Use When            Example
Idempotency Keys      Retries possible    Payments
Optimistic Locking    Low contention      Seat booking
Pessimistic Locking   High contention     Inventory
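Idempotency keys are easiest to see in code. In this sketch the client generates a unique key per logical payment and reuses it on every retry; the server stores the result per key, so a retried request returns the first result instead of charging again. `charge` is a hypothetical payment-provider call, and the dict stands in for a durable table.

```python
import uuid
from typing import Callable


def new_idempotency_key() -> str:
    """Client side: generate once per logical payment, reuse on every retry."""
    return str(uuid.uuid4())


class IdempotentPayments:
    """Server side: remember the result per key so retries never charge twice."""

    def __init__(self, charge: Callable[[str, int], str]) -> None:
        self._charge = charge  # hypothetical provider call returning a charge id
        self._results: dict[str, str] = {}  # idempotency key -> charge id

    def pay(self, idempotency_key: str, user_id: str, amount_cents: int) -> str:
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # retry: replay the stored result
        charge_id = self._charge(user_id, amount_cents)
        self._results[idempotency_key] = charge_id
        return charge_id
```

In production the check-and-store must be atomic (for example, a unique constraint on the key column); otherwise two concurrent retries could both miss the lookup and both charge.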

Code Example:

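-- Optimistic locking: succeeds only if the row still has the version we read (5)
UPDATE seats SET status = 'booked', version = version + 1
WHERE seat_id = 123 AND version = 5;
-- 0 rows updated means another transaction changed the seat first: re-read and retry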
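The same optimistic-locking update, runnable against an in-memory SQLite database. The `book_seat` and `setup_demo` helpers and the table layout are illustrative assumptions; the key point is checking the affected row count after the conditional UPDATE.

```python
import sqlite3


def setup_demo() -> sqlite3.Connection:
    """In-memory table matching the example: seat 123 currently at version 5."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seats (seat_id INTEGER PRIMARY KEY, status TEXT, version INTEGER)"
    )
    conn.execute("INSERT INTO seats VALUES (123, 'available', 5)")
    conn.commit()
    return conn


def book_seat(conn: sqlite3.Connection, seat_id: int, expected_version: int) -> bool:
    """Update only if nobody changed the row since we read it.

    Returns True if this caller won; False if the version moved (0 rows updated).
    """
    cur = conn.execute(
        "UPDATE seats SET status = 'booked', version = version + 1 "
        "WHERE seat_id = ? AND version = ?",
        (seat_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

If two users both read version 5, only the first UPDATE matches; the second affects 0 rows and should re-read the seat or report that it is taken.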

Partitioning - Single Database

  • Horizontal: Split rows across tables (e.g., orders_2023, orders_2024)
  • Vertical: Split columns (e.g., user_profile vs user_settings)

Trade-off: "Partitioning keeps queries simple because every table lives in one database, but it cannot scale writes beyond that single server."
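With horizontal partitioning by year, each row is routed to the table for its year. A minimal sketch of that routing, assuming the `orders_<year>` naming from the example above (the helper names are hypothetical):

```python
from datetime import date


def orders_partition(order_date: date) -> str:
    """Route a row to its yearly partition table, e.g. orders_2024."""
    return f"orders_{order_date.year}"


def partitions_for_range(start: date, end: date) -> list[str]:
    """A query over a date range must read every yearly partition it overlaps."""
    return [f"orders_{year}" for year in range(start.year, end.year + 1)]
```

Many databases can do this routing internally; PostgreSQL's declarative partitioning (PARTITION BY RANGE), for example, sends each row to the right partition without application code.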

Sharding - Multiple Databases

  • User-based: User IDs 1 to 1M → Shard 1, 1M+1 to 2M → Shard 2
  • Region-based: US users → US datacenter, EU users → EU datacenter

Trade-off: "Sharding adds complexity but unlocks write scalability beyond replicas."
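A sketch of user-based range sharding, assuming a hypothetical static shard map with inclusive upper bounds of 1M and 2M user IDs. A binary search over the bounds finds the owning shard:

```python
import bisect

# Hypothetical shard map: inclusive upper bound of user IDs for each shard.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000]
SHARD_NAMES = ["shard-1", "shard-2"]


def shard_for_user(user_id: int) -> str:
    """Range-based sharding: route a user ID to the shard that owns its range."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    # bisect_left finds the first bound >= user_id, i.e. the owning range.
    i = bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if i == len(SHARD_UPPER_BOUNDS):
        raise ValueError(f"no shard configured for user {user_id}")
    return SHARD_NAMES[i]
```

Adding a shard for users beyond 2M means appending a bound and a name; in practice the map usually lives in configuration or a lookup service rather than in code.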

Sharding & Partitioning

Partitioning        Sharding
Same database       Different databases
Simpler queries     Complex cross-shard queries
Limited scale       Massive scale

CQRS (Command Query Responsibility Segregation)

┌─────────────┐     ┌─────────────────┐
│   WRITES    │────▶│  Write Database │
│ (Commands)  │     │   (PostgreSQL)  │
└─────────────┘     └────────┬────────┘
                             │ Sync
┌─────────────┐     ┌────────▼────────┐
│   READS     │◀────│  Read Database  │
│  (Queries)  │     │  (Elasticsearch)│
└─────────────┘     └─────────────────┘

Use When: Read patterns differ significantly from write patterns.
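A toy in-process sketch of the split in the diagram: commands update the write store, which pushes each change to a read model shaped for one query ("events by city"). Both classes, and the booking fields they use, are hypothetical stand-ins for PostgreSQL and Elasticsearch; the synchronous callback stands in for the "Sync" arrow.

```python
from typing import Callable


class WriteStore:
    """Command side: the source of truth (stands in for PostgreSQL)."""

    def __init__(self) -> None:
        self.bookings: dict[int, dict] = {}
        self.subscribers: list[Callable[[dict], None]] = []

    def book(self, booking_id: int, event_name: str, city: str) -> None:
        record = {"id": booking_id, "event": event_name, "city": city}
        self.bookings[booking_id] = record
        for notify in self.subscribers:  # the "Sync" arrow in the diagram
            notify(record)


class ReadStore:
    """Query side: a denormalized view shaped for reads (stands in for Elasticsearch)."""

    def __init__(self) -> None:
        self.events_by_city: dict[str, list[str]] = {}

    def apply(self, record: dict) -> None:
        self.events_by_city.setdefault(record["city"], []).append(record["event"])

    def events_in(self, city: str) -> list[str]:
        return self.events_by_city.get(city, [])
```

Wire them with `writes.subscribers.append(reads.apply)`. In real systems that sync is often asynchronous (a message queue or change data capture), so the read side is eventually consistent with the write side.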

Event Sourcing

Store events, not state:

Event 1: SeatReserved { seat_id: 123, user_id: 456 }
Event 2: PaymentProcessed { booking_id: 789 }
Event 3: BookingConfirmed { booking_id: 789 }

Benefits:

  • Complete audit trail (compliance, debugging)
  • Can rebuild the state at any point in time by replaying events
  • Events can be published to other services, a natural fit for distributed systems

Interview Tip: "For audit requirements, I would use event sourcing."
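Replaying the event log rebuilds current state, and replaying only a prefix gives the state at an earlier point. In this sketch `SeatReserved` also carries a `booking_id` (an assumption, so that events can be grouped per booking); the `Event` and `replay` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    type: str   # "SeatReserved", "PaymentProcessed", or "BookingConfirmed"
    data: dict  # payload; every event is assumed to carry a booking_id


def replay(events: list[Event], upto: Optional[int] = None) -> dict[int, dict]:
    """Rebuild booking state by applying events in order.

    `upto` replays only the first N events, giving the state at that point in time.
    """
    bookings: dict[int, dict] = {}
    for event in events[:upto]:
        booking = bookings.setdefault(event.data["booking_id"], {})
        if event.type == "SeatReserved":
            booking.update(
                seat_id=event.data["seat_id"],
                user_id=event.data["user_id"],
                status="reserved",
            )
        elif event.type == "PaymentProcessed":
            booking["status"] = "paid"
        elif event.type == "BookingConfirmed":
            booking["status"] = "confirmed"
    return bookings
```

The log is append-only; current state is always derived from it, which is what makes the audit trail and point-in-time replay possible.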

Service Mesh

Infrastructure layer for microservices:

Feature            Benefit
mTLS               Secure service-to-service communication
Retries/Timeouts   Automatic resilience
Observability      Distributed tracing, metrics
Traffic control    Canary deployments, A/B testing

Interview Tip: "For microservices communication, a service mesh handles cross-cutting concerns."

Resilience

Circuit Breaker:

  • CLOSED → requests pass
  • OPEN → requests fail fast
  • HALF-OPEN → test recovery
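A single-threaded sketch of the three states. The threshold and timeout are arbitrary defaults, and the injectable `clock` exists only to make the behavior easy to test:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is OPEN and the call is rejected without trying."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay OPEN before a trial call
        self._clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self._clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open: failing fast")
            self.state = "HALF-OPEN"  # timeout elapsed: let a trial request through
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        self.state = "CLOSED"  # success (including a HALF-OPEN trial) closes the circuit
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial re-opens immediately; otherwise open once the threshold is hit.
        if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self._clock()
```

This sketch assumes one caller at a time; production implementations add thread safety and cap how many trial calls pass through while HALF-OPEN.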

Dead Letter Queue:

  • Failed messages → separate queue
  • Manual review or retry later
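A sketch of a consumer that retries a failing message a bounded number of times and then moves it to a dead letter queue. The deque and list stand in for real queues, and `max_attempts=3` is an arbitrary choice:

```python
from collections import deque
from typing import Callable


def drain(queue: deque, handler: Callable[[dict], None], dead_letters: list,
          max_attempts: int = 3) -> None:
    """Process every message; park repeat failures instead of blocking the queue."""
    while queue:
        message = queue.popleft()
        message["attempts"] = message.get("attempts", 0) + 1
        try:
            handler(message)
        except Exception as exc:
            if message["attempts"] >= max_attempts:
                message["error"] = repr(exc)
                dead_letters.append(message)  # park for manual review or a later retry
            else:
                queue.append(message)         # retry after the other messages
```

Because a message may be processed more than once, handlers should be idempotent.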

The Interview Framework

1. Requirements    2. High-Level     3. Deep Dive    4. Scale
   (5 min)            (10 min)         (15-20 min)     (10 min)
     │                   │                 │              │
     ▼                   ▼                 ▼              ▼
  Clarify            Big Picture        Database      Bottlenecks
  Constraints        Diagram            API           1M+ users

Step 1 - Clarify Requirements

Functional: What should it do?

  • "Can users search by location?"
  • "Do we need a seat map?"

Non-Functional: How well?

  • "How many users? 1K? 1M?"
  • "What latency is acceptable?"

Step 2 - High-Level Design

Typical request flow:

User → Load Balancer → API Gateway → Services → Database
                                        │
                                      Cache

Include:

  • Client/User
  • Load Balancer
  • API Gateway
  • Core Services
  • Database + Cache

Step 3 - Deep Dive

Database Design

  • Schema, indexes, replication

API Design

  • Key endpoints, error handling

Critical Path

  • The most important user journey
  • "What happens when a user clicks Book?"

Step 4 - Scale and Bottlenecks

Identify Bottlenecks:

  • "What if 1M users hit the site at a ticket drop?"
  • "How do we prevent overselling?"

Strategies:

  • Horizontal scaling
  • Read replicas
  • Caching layers
  • Queue-based processing
  • Sharding
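For the "1M users at a ticket drop" question, a quick back-of-envelope estimate turns scale into numbers. Every input below (1M users arriving within 60 seconds, 5 requests each, 2,000 requests/second per server) is a hypothetical assumption to state out loud, not a benchmark:

```python
import math


def peak_requests_per_second(users: int, window_seconds: float,
                             requests_per_user: int = 1) -> float:
    """Average rate if all requests spread evenly over the window.

    Real traffic is burstier, so treat this as a lower bound on the peak.
    """
    return users * requests_per_user / window_seconds


def servers_needed(rps: float, rps_per_server: float) -> int:
    """Round up: 41.7 servers' worth of load still needs 42 servers."""
    return math.ceil(rps / rps_per_server)


# Hypothetical inputs: 1M users within 60 s, 5 requests each, 2,000 req/s per server.
rps = peak_requests_per_second(1_000_000, 60, requests_per_user=5)  # about 83,333 req/s
servers = servers_needed(rps, rps_per_server=2_000)                  # 42
```

The exact numbers matter less than showing the reasoning; leave headroom above the estimate because the even-spread assumption understates bursts.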

Communication Tips

Do:

  • Lead with an overview, then zoom in
  • Think aloud - share reasoning
  • Acknowledge trade-offs
  • Use simple left-to-right diagrams

Don't:

  • Jump to database schema first
  • Stay silent while thinking
  • Ignore interviewer hints
  • Draw complex nested diagrams

Surviving System Design Interviews

By Andrés Santos
