Q1 Review

HootSuite Ops


How it started



Q1 Priorities


- SJC Migration
- Multi AZ
- Documentation
- Cost Savings
- Monitoring (graphite, sensu)

...as of Jan 22, 2014

How did it go?

SJC Migration


- ow.ly 99% moved and STABLE
- streaming 75% moved
- kerberos 50% moved

Monitoring

- sensu in production, nagios OFF
- graphite & statsd (mostly) stable
- PagerDuty live
- Status page live

Multi AZ

- heavily delayed, scope unclear

Documentation

- cookbook useful
- confluence useful
- ongoing (evernote, other sources)
- social contract

Cost Savings

- on hold

What happened?



What ELSE we DID

- outages... (DDoS, etc.)
- nginx conversion
- 400+ PagerDuty alerts
- Mugu marketing site
- Graphite redone 3 times (gluster lol)
- CDB (us-west), backup
- Ansible (build pipeline, repos)
- on call process solid
- Jira ops board
- DNS naming
- snapshot manager
- ow.ly backups
- session DB
- ElastiCache for ow.ly

JIRA TICKETS


WE WORKED AS A TEAM


WE LEARNED A LOT

Q2 & BEYOND


- finish Q1

- Multi AZ
- SPOF
- security (IAM, kerb)

- stable ops tools
- stable monitoring

Q1 Review

By Jeff Oliver
