Tech from startup to enterprise
Miro Svrtan (@msvrtan)
15y PHP veteran
Now working at ZizooBoats (Vienna-based startup)
@Njuškalo guy for 5y
Technical analyst
Development team leader
Architect
Developer
Njuskalo.hr introduction
- online classifieds platform
- similar to willhaben.at / craigslist
- founded in 2007
- part of Styria Media Group AG
- startup inside a corporation
Team organization
2009 vs 2014 #1
Visitors
425.000 (06/2009)
1.000.000 (01/2014)
Source: Gemius Audience
PageViews
25.000.000 (06/2009)
295.000.000 (01/2014)
Source: Gemius Audience
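A quick sketch of the arithmetic behind these two slides, using only the Gemius figures quoted above. The variable and function names are made up for illustration.

```python
# Figures from the slides above (Gemius Audience): (06/2009, 01/2014)
visitors = (425_000, 1_000_000)
page_views = (25_000_000, 295_000_000)

def growth(before, after):
    """How many times larger the later value is."""
    return after / before

visitor_growth = growth(*visitors)      # about 2.35x
page_view_growth = growth(*page_views)  # 11.8x

# Page views per visitor for 2009 and 2014
views_per_visitor = [pv / v for pv, v in zip(page_views, visitors)]
```

Page views grew roughly five times faster than visitors, so the load generated per visitor rose from about 59 to 295 page views.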
2009 vs 2014 #2
Disk usage
< 30GB to ~10TB
Peak bandwidth
< 30Mb/s to > 450Mb/s
DB
~1GB to ~360GB
2009 to 2014 #1
2009 to 2014 #2
2014
Platform: software
- LAMP stack
- Nginx proxy
- Sphinx search
- Memcached + Redis
Platform: code
- in house (proprietary) PHP framework
- 2009: many flavours of open-source frameworks (OS FWs)
- 2014: IMHO losing the battle to OS FWs
- small team vs vast community
- learning best practices
- introduction to new developers
Platform: Physical location
- local data center
- low network latency -> 90% local traffic
- legal issues
Platform: Hardware
- bare metal servers
- pros
- dedicated resources
- constant performance
- cons
- no quick scaling
- months instead of minutes
- hard service isolation
- add new service
- e.g. Sphinx
Living on the edge
- developing features with no room to spare
- concentrating on performance instead of features
- hard & exhausting
- measure
- say no, change specs...
- measure
- develop
- optimize
- test
- deploy
We got room
- new shiny server is here
- vastly oversized -> easy living
- developers stop thinking about performance
- develop
- test
- deploy
- measure
- optimize
Architecture #1
Front servers
- 2 servers
- DNS round robin
- Nginx
- work
- load balancers
- SSL offloading
- caching of assets & images
- gzipping non-binary content
Application servers
- 6 boxes
- Apache
- work
- application
- images
- assets
- host
- memcached cluster
- Redis slaves
Main DB server
- MySQL master
- Redis master
Main search server
- Sphinx server
- MySQL slave
- Sphinx indexing
- backups
Banner server
- serves banners :)
In 2009
all of that was running on 1 server
Staging servers
- 2 servers
- same architecture as production
- virtualized environment
- smaller scale
- "used" for final testing
Test Server
- hosts >10 Njuškalo applications
- test1.example.com
- test2.example.com
- ..
- test10.example.com
- most locations serve only 1 new feature/bug-fix
- helps to cherry pick what is ready for production
- multi feature development
Multi feature development
- test locations
- test1.example.com
- test2.example.com
- ..
- test10.example.com
- git
- moved from subversion
- always working on
- main branch + that feature's changes
Maintenance vs development
- maintenance
- small change requests/features + bugfix
- kanban style
- loosely defined specs
- fast changing priorities + specs
- development
- new features
- waterfall model
- switching to scrum
- well-defined specs
Startup
Enterprise
Becoming an enterprise
- transformation of goals/mindsets
- not something in a road map
When?
- when mistakes cost more than:
- better/longer preparation
- more testing
- additional hardware
Journey
- started with ~0 scaling experience
- hard to come by
- there is no manual
- listen & read what/how others did it
Broaden your horizons!
Common performance pitfalls
- unindexed queries
- SELECT ... GROUP BY ...
- SELECT DISTINCT ...
- SELECT ... WHERE X IN (SELECT Y FROM ...)
use the slow query log
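A minimal sketch of spotting an unindexed query by reading the query plan. It uses SQLite (bundled with Python) as a stand-in for MySQL, where `EXPLAIN` plays the same role. The `ads` table, its columns and the index name are invented for the example.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # column 3 holds the human-readable detail

conn = sqlite3.connect(":memory:")
# Hypothetical table, not the real Njuskalo schema
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, category_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO ads (category_id, title) VALUES (?, ?)",
    [(i % 50, "ad %d" % i) for i in range(1000)],
)

sql = "SELECT title FROM ads WHERE category_id = 7"
plan_before = query_plan(conn, sql)  # full table scan: no index on category_id

conn.execute("CREATE INDEX idx_ads_category ON ads (category_id)")
plan_after = query_plan(conn, sql)   # lookup through idx_ads_category
```

The same before/after check works on queries pulled from the slow query log: read the plan, add the index, read the plan again.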
Low hanging fruit
- cache rarely changed content
- or implement cache busting
- implement versioning & long expires headers
- images
- javascript
- css
- separate database server
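One common way to combine versioning with long expires headers is to put a content hash in the asset URL, sketched below. The query-string scheme, function names and one-year lifetime are assumptions, not necessarily what Njuskalo used.

```python
import hashlib

def versioned_url(path, content):
    """Append a short content hash so a changed file gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return "%s?v=%s" % (path, digest)

def cache_headers(max_age_days=365):
    """Long-lived caching header; safe only because the URL changes with the content."""
    seconds = max_age_days * 24 * 60 * 60
    return {"Cache-Control": "public, max-age=%d" % seconds}

old_url = versioned_url("/css/site.css", b"body { color: black; }")
new_url = versioned_url("/css/site.css", b"body { color: navy; }")
```

Browsers keep the old file for as long as the header allows, but the page references `new_url` after a deploy, so visitors fetch the new file immediately.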
Horizontal scaling: application
- moving from 1 to 2 application servers
- hardest step
- shared resources
- sessions
- data
- cache
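Why the second application server is the hardest step, sketched with sessions: once a visitor can land on either box, session data has to live somewhere both can reach. The classes below are hypothetical, and the in-memory store stands in for a network service such as Memcached or Redis.

```python
class SharedSessionStore:
    """Stand-in for a network cache such as Memcached or Redis (in-memory here)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


class AppServer:
    """One application box; it keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle_login(self, session_id, user):
        session = self.store.load(session_id)
        session["user"] = user
        self.store.save(session_id, session)

    def current_user(self, session_id):
        return self.store.load(session_id).get("user")


store = SharedSessionStore()
app1 = AppServer("app1", store)
app2 = AppServer("app2", store)
app1.handle_login("abc123", "miro")  # login handled by the first box
```

A follow-up request routed to `app2` still sees the login, which would not be the case with sessions stored on each server's local disk.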
Horizontal scaling: database
- replication safe queries & engines
- moving from MyISAM to InnoDB
- migrating 250 GB of data
- implemented slave, ready for cluster
Test scaling results
- benchmark before & after
- used
- siege
- ab (Apache benchmark)
- short-running tests
- testing performance
- locating bottlenecks
- long-running tests
- testing environment
- verify stability
Monitoring
- started with
- dstat
- gemius
- added
- cacti
- Google Analytics real time
- Graphite
- NewRelic
Autoload
- irritated with requires/includes
- different naming schemes
- before PSR-0
- parsing PHP files to generate map
- file array vs APC user cache
- after 4 years found a CRITICAL bug
- logical flaw
- production crashes
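A rough sketch of the "parse PHP files to generate a map" idea, written in Python for illustration rather than the PHP of the original autoloader. The regular expression, file names and class names are assumptions, and a regex this simple also matches declarations inside comments or strings.

```python
import re

# Matches "class Foo", "abstract class Foo", "final class Foo", "interface Foo"
CLASS_RE = re.compile(
    r"^\s*(?:abstract\s+|final\s+)?(?:class|interface)\s+(\w+)", re.MULTILINE
)

def build_class_map(sources):
    """Map each declared class name to the file that declares it.

    `sources` is {file path: PHP source text}; a real generator would
    read the files from disk.
    """
    class_map = {}
    for path, code in sorted(sources.items()):
        for name in CLASS_RE.findall(code):
            if name in class_map:
                raise ValueError(
                    "%s declared in both %s and %s" % (name, class_map[name], path)
                )
            class_map[name] = path
    return class_map

# Hypothetical files, not the real code base
sources = {
    "lib/Ad.php": "<?php\nclass Ad {}\n",
    "lib/search/Client.php": "<?php\nabstract class Search_Client {}\ninterface Search_Result {}\n",
}
class_map = build_class_map(sources)
```

The autoloader then only needs a lookup in `class_map`, whether the map is stored as a generated file array or in a user cache.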
So when can you be sure your 100 lines of code are bug free?
If it worked the first 10.000 times?
.. 100.000 times?
... 1.000.000 times?
... 1.000.000.000 times?
Well 8.000.000.000 times was not enough for this one :)
Limited feature availability #1
- allow only some users to access new feature
- case study: redesign
- included
- new design: switch to responsive
- code refactor of most userland pages
Limited feature availability #2
- after exhaustive testing on staging
- Phase 1: using separate server
- hand picked IP addresses
- corporate network
- Phase 2: using limited feature
- corporate network
- started with 5% of visitors
- incremented to 100%
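The two phases can be sketched as an IP allow-list plus a stable percentage bucket. This is one common way to do it, not necessarily how Njuskalo implemented it. The addresses, visitor ids and function names are hypothetical.

```python
import hashlib

ALLOWED_IPS = {"192.0.2.10", "192.0.2.11"}  # hypothetical hand-picked addresses

def bucket(visitor_id):
    """Map a visitor id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def sees_new_feature(visitor_id, ip, percent):
    """Allow-listed IPs always get the feature; everyone else by bucket."""
    if ip in ALLOWED_IPS:
        return True
    return bucket(visitor_id) < percent
```

Because the bucket depends only on the visitor id, raising `percent` from 5 towards 100 only ever adds visitors; nobody who already has the new design is switched back.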
Tips 'n tricks: Linux desktop
- switched development team to Linux desktop
- easier environment to set up
- get to know terminal
- similar to production
- reduces fear when working on production
Tips 'n tricks: face-to-face discussions
- improves communication
- meetings outside the office whenever possible
- fewer interruptions
- more relaxed atmosphere
- change of scenery
- boosts creativity & engagement
Tips 'n tricks: analyze seasonality
- locate
- peaks
- low times
- predict user behavior -> scaling
- we found patterns based on
- hours in day
- day of week
- months in year
- weather!
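A minimal sketch of locating peaks by hour and weekday from request timestamps. The four sample hits are made up for illustration, and real input would come from access logs or analytics exports.

```python
from collections import Counter
from datetime import datetime

def traffic_profile(timestamps):
    """Count hits per hour of day and per weekday (0 = Monday)."""
    by_hour = Counter(ts.hour for ts in timestamps)
    by_weekday = Counter(ts.weekday() for ts in timestamps)
    return by_hour, by_weekday

def peak(counter):
    """Return the key with the highest count."""
    return counter.most_common(1)[0][0]

# Made-up sample data
hits = [
    datetime(2014, 1, 6, 20, 15),   # Monday evening
    datetime(2014, 1, 6, 20, 40),   # Monday evening
    datetime(2014, 1, 7, 9, 5),     # Tuesday morning
    datetime(2014, 1, 13, 20, 30),  # Monday evening
]
by_hour, by_weekday = traffic_profile(hits)
```

The same grouping extends to months of the year; weather needs an external data source joined on the date.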
Tips 'n tricks: deployment guidelines
- not after 4pm
- avoid Fridays
- upgrade during the night
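The first two guidelines are simple enough to enforce in a deploy script. A sketch, assuming a hypothetical `deploy_allowed` check and server-local time; night-time upgrades are a separate, planned activity and are not covered.

```python
from datetime import datetime

def deploy_allowed(now):
    """Apply the two rules of thumb: no Friday deploys, nothing from 16:00 on."""
    if now.weekday() == 4:  # Monday is 0, so 4 is Friday
        return False
    return now.hour < 16
```

A deploy script could call `deploy_allowed(datetime.now())` and refuse to continue, or ask for explicit confirmation, when it returns `False`.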
Tips 'n tricks: Attitude
Questions?
Thank you
Miro Svrtan
@msvrtan