PagerDuty: Production Support as Front End Developer

Xinjiang Shao



  • The Mindset
  • Past Incident Examples
  • Most Common Questions
  • Tools
  • Demo

The Mindset

  • Stay Calm
  • On-Call !== Bug Triage
  • Prefer not to make code changes while on call
    • Restart Service (e.g., Flip Blue/Green)
    • Content republish
    • Site config to disable feature
    • Data changes or direct the request to the right team

Incident #1

CMDM is down


  • Logged-in users cannot check out in GNTC and MRTN
  • New accounts cannot be created, and existing users cannot log in
  • Coupons cannot be clipped


We published a system-wide notification

Incident #2

Client Log 503/413 Status


An increasing amount of client log requests


- Added a rate limit in Cloudflare for client log requests

- Added a new site config to control the interval for sending client logs

- New way of sending logs in batches from Azure Event Hub
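A minimal sketch of the client-side half of this fix: queue log entries and send them in batches at an interval taken from site config. The config field names (`enabled`, `flushIntervalMs`, `maxBatchSize`), the `ClientLogBatcher` class, and the injected transport are all assumptions for illustration, not the actual implementation.

```typescript
// Hypothetical site-config fields; the real config keys are not given in the slides.
interface ClientLogConfig {
  enabled: boolean;        // kill switch: stop collecting and sending client logs
  flushIntervalMs: number; // interval between batch requests
  maxBatchSize: number;    // upper bound on entries per request
}

interface LogEntry {
  level: "info" | "warn" | "error";
  message: string;
  timestamp: number;
}

// The transport is injected so the batcher does not assume a specific endpoint.
type Transport = (batch: LogEntry[]) => Promise<void>;

class ClientLogBatcher {
  private queue: LogEntry[] = [];
  private timer: ReturnType<typeof setInterval> | undefined = undefined;
  private config: ClientLogConfig;
  private send: Transport;

  constructor(config: ClientLogConfig, send: Transport) {
    this.config = config;
    this.send = send;
  }

  // Number of entries waiting to be sent.
  get pending(): number {
    return this.queue.length;
  }

  log(level: LogEntry["level"], message: string): void {
    if (!this.config.enabled) return; // feature disabled from site config
    this.queue.push({ level, message, timestamp: Date.now() });
  }

  start(): void {
    if (!this.config.enabled || this.timer !== undefined) return;
    this.timer = setInterval(() => {
      void this.flush();
    }, this.config.flushIntervalMs);
  }

  stop(): void {
    if (this.timer === undefined) return;
    clearInterval(this.timer);
    this.timer = undefined;
  }

  // Sends at most maxBatchSize entries in one request.
  async flush(): Promise<void> {
    if (this.queue.length === 0) return;
    const batch = this.queue.splice(0, this.config.maxBatchSize);
    try {
      await this.send(batch);
    } catch {
      // Drop the batch on failure: retrying while the log endpoint is
      // returning 503 would add load during the incident.
    }
  }
}
```

Raising `flushIntervalMs` or setting `enabled` to false from site config reduces request volume without a code deploy, which fits the on-call mindset above.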

Incident #3

Missing PodBag


Podbag was missing for the delivery service, even though the location config indicated that podbag was enabled


- Ask bus-system dev to fix the data configuration

- After the incident: gracefully turn off podbag if the product info cannot be retrieved
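The graceful fallback could look roughly like this: treat podbag as available only when the location config enables it and the product info was actually retrieved. The type names, the `podBagEnabled` field, and both functions are hypothetical; the real config and product shapes are not shown in the slides.

```typescript
// Hypothetical shapes for illustration only.
interface LocationConfig {
  podBagEnabled: boolean;
}

interface PodBagProduct {
  sku: string;
  price: number;
}

// Wraps the product lookup so a failed request becomes null instead of
// an exception that breaks the delivery flow.
async function loadPodBagProduct(
  fetchProduct: () => Promise<PodBagProduct>
): Promise<PodBagProduct | null> {
  try {
    return await fetchProduct();
  } catch {
    return null;
  }
}

// Podbag is offered only when the config enables it AND the product data exists.
function shouldOfferPodBag(
  config: LocationConfig,
  product: PodBagProduct | null
): boolean {
  return config.podBagEnabled && product !== null;
}
```

With this check, a mismatch between the location config and the product data hides the podbag option instead of failing the order.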


Incident #4

iOS App Native Login


App users cannot log in


- Disable native login from site config
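A sketch of how a site-config switch can gate a feature like native login. The `nativeLogin` key, the `SiteConfig` shape, and the fallback to a web login flow are assumptions; the slides only say the feature was disabled from site config.

```typescript
// Hypothetical site config: a flat map of feature flags.
type SiteConfig = Readonly<Record<string, boolean | undefined>>;

// Missing or malformed flags resolve to the fallback, so an incomplete
// config never enables a feature by accident.
function isFeatureEnabled(
  config: SiteConfig,
  flag: string,
  fallback: boolean = false
): boolean {
  const value = config[flag];
  return typeof value === "boolean" ? value : fallback;
}

// Assumption: a web-based login flow exists as the fallback.
function chooseLoginFlow(config: SiteConfig): "native" | "web" {
  return isFeatureEnabled(config, "nativeLogin") ? "native" : "web";
}
```

Flipping `nativeLogin` to false in site config routes users to the fallback flow without shipping a new app build.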


Most Common Questions for Front End Devs

  • What functionality would be impacted if we restart service x (e.g., the loyalty account API)?
  • Have our users (customers) recovered from the incident?
  • Could we disable feature x (e.g., client log)?
  • When did a given incident start?

Common Root Causes for Other Teams

  • Copient/Quotient Offers
  • Informix DB perf
  • Too many concurrent users
  • MDM
  • Running out of disk
  • DNS server
  • Firewall outage
  • Expired SSL certificates
  • VMware in QTS datacenter

Tools


  • Datadog RUM
  • Client Log in Splunk
  • JIRACore
  • Sourcemap in Prod
  • SuperUser
  • Webbase (BusSys)
  • Cloudflare (Security Team)
  • Shape Security (Security Team)
  • Optimizely (Product Team)

Demo Time
