Tech from startup to enterprise

Miro Svrtan (@msvrtan)


15y PHP veteran


Now working in ZizooBoats (Vienna-based startup)


@Njuškalo guy for 5y

Technical analyst
Development team leader
Architect
Developer

Njuskalo.hr introduction


  • online classifieds platform
    • similar to willhaben.at / Craigslist
  • founded in 2007
  • part of Styria Media Group AG
  • startup inside a corporation

Team organization



2009 vs 2014 #1

Visitors

425.000 (06/2009)

1.000.000 (01/2014)

Source: Gemius Audience


PageViews

25.000.000 (06/2009)

295.000.000 (01/2014)

Source: Gemius Audience


2009 vs 2014 #2


Disk usage

< 30GB to ~10TB


Peak bandwidth

< 30Mb/s to > 450Mb/s


DB

~1GB to ~360GB
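The two snapshots are easier to compare as ratios. A rough calculation from the figures on these slides, assuming the "<", ">" and "~" values can be treated as exact and that 1TB = 1000GB (both are simplifications, so the results are only approximate bounds):

```python
# Figures copied from the slides; "<", ">" and "~" bounds are treated as exact.
snapshots = {
    "visitors": (425_000, 1_000_000),
    "page views": (25_000_000, 295_000_000),
    "disk usage (GB)": (30, 10_000),       # < 30GB to ~10TB, with 1TB = 1000GB
    "peak bandwidth (Mb/s)": (30, 450),
    "database (GB)": (1, 360),
}

def growth_factor(before, after):
    """How many times larger the 2014 value is than the 2009 value."""
    return after / before

factors = {name: growth_factor(b, a) for name, (b, a) in snapshots.items()}

for name, factor in factors.items():
    print(f"{name}: x{factor:.1f}")

# Page views per visitor in each snapshot.
pages_per_visitor_2009 = 25_000_000 / 425_000
pages_per_visitor_2014 = 295_000_000 / 1_000_000
```

Page views grew about five times faster than visitors, so the load generated per visitor rose as well.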

2009 to 2014 #1




2009 to 2014 #2


2014



Platform: software


  • LAMP stack
  • Nginx proxy
  • Sphinx search
  • Memcached + Redis


Platform: code


  • in house (proprietary) PHP framework
    • 2009: many flavours of open-source frameworks (OS FWs)
    • 2014: IMHO losing the battle to OS FWs
      • small team vs vast community
      • learning best practices
      • introduction to new developers


Platform: Physical location


  • local data center
    • low network latency -> 90% local traffic
    • legal issues

Platform: Hardware


  • bare metal servers
    • pros
      • dedicated resources
      • constant performance
    • cons
      • no quick scaling
        • months instead of minutes
      • hard service isolation
        • add new service
        • e.g. the Sphinx example

Living on the edge


  • developing features with no room to spare
    • concentrating on performance instead of features
    • hard & exhausting
      • measure
      • say no, change specs...
      • measure
      • develop
      • optimize
      • test
      • deploy

We got room


  • new shiny server is here
    • vastly oversized -> easy living
    • developers stop thinking about performance
      • develop
      • test
      • deploy
      • measure
      • optimize

Architecture #1

Front servers


  • 2 servers
  • DNS round robin
  • Nginx
  • work
    • load balancers
    • SSL offloading
    • caching of assets & images
    • gzipping non-binary content

Application servers


  • 6 boxes
  • Apache
  • work
    • application
    • images
    • assets
  • host
    • memcached cluster
    • Redis slaves

Main DB server


  • MySQL master
  • Redis master

Main search server


  • Sphinx server
  • MySQL slave
    • Sphinx indexing 
    • backups

Banner server


  • serves banners :)

In 2009



all of that was running on 1 server

Staging servers


  • 2 servers
  • same architecture as production
    • virtualized environment
    • smaller scale 
  • "used" for final testing

Test Server


  • hosts >10 Njuškalo applications
    • test1.example.com
    • test2.example.com
    • ..
    • test10.example.com
  • most locations serve only 1 new feature/bug-fix
    • helps to cherry pick what is ready for production
    • multi feature development

Multi feature development


  • test locations
    • test1.example.com
    • test2.example.com
    • ..
    • test10.example.com
  • git
    • moved from subversion
    • always working on
      • main branch + that feature's changes

Maintenance vs development


  • maintenance
    • small change requests/features + bugfix
    • kanban style
    • loosely defined specs
    • fast changing priorities + specs
  • development
    • new features
    • waterfall model
      • switching to scrum
    • well-defined specs

Startup


Enterprise

Becoming an enterprise



  • transformation of goals/mindsets
  • not something in a road map

When?



  • mistakes cost more than
    • better/longer preparation
    • more testing
    • additional hardware

Journey



  • started with ~0 scaling experience
    • hard to come by
  • there is no manual
  • listen & read what/how others did it

Broaden your horizons!


Common performance pitfalls


  • unindexed queries
  • SELECT ... GROUP BY ...
  • SELECT DISTINCT ...
  • SELECT ... WHERE X in (SELECT Y FROM .... )


use the slow query log
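A minimal sketch of the first pitfall, using Python's built-in SQLite instead of MySQL so that it runs anywhere (MySQL's own `EXPLAIN` plays the same role). The `ads` table, its columns and the index name are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (id INTEGER PRIMARY KEY, category_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO ads (category_id, title) VALUES (?, ?)",
    [(i % 50, f"ad {i}") for i in range(5000)],
)

QUERY = "SELECT title FROM ads WHERE category_id = ?"

def query_plan(connection, sql, params):
    """Return the planner's description of how it will run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

# Without an index the planner has to scan the whole table.
plan_without_index = query_plan(conn, QUERY, (7,))

# With an index on the filtered column it can search the index instead.
conn.execute("CREATE INDEX idx_ads_category ON ads (category_id)")
plan_with_index = query_plan(conn, QUERY, (7,))

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

The slow query log points at the candidate queries; checking their plans like this shows which of them are full scans.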

Low hanging fruit


  • cache rarely changed content
    • or implement cache busting
  • implement versioning & long expires headers
    • images
    • javascript
    • css 
  • separate database server
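One possible way to combine versioning with long expires headers is sketched below. The `?v=` query parameter, the one-year lifetime and the helper names are assumptions for illustration, not details from the talk:

```python
import hashlib

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60

def versioned_url(path, content):
    """Append a short content hash so the URL changes whenever the file changes."""
    digest = hashlib.sha256(content).hexdigest()[:10]
    return f"{path}?v={digest}"

def long_expiry_headers():
    """Headers that let browsers and proxies keep the asset for a year."""
    return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}"}

# Hypothetical stylesheet before and after an edit.
old_url = versioned_url("/static/site.css", b"body { color: black; }")
new_url = versioned_url("/static/site.css", b"body { color: navy; }")

print(old_url)
print(new_url)
print(long_expiry_headers())
```

Because the URL changes with the content, clients can cache each version for a long time and still pick up a new release immediately.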

Horizontal scaling: application


  • moving from 1 to 2 application servers
    • hardest step
  • shared resources
    • sessions
    • data
    • cache
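Why the step from 1 to 2 servers is the hard one can be shown with sessions. In this sketch a plain dict stands in for the session store; in practice the shared store would be a networked service such as memcached or Redis (the slides do not say which one held the sessions). The class and method names are hypothetical:

```python
class AppServer:
    """A stand-in for one application server handling requests."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # a dict here; a networked store in practice

    def login(self, session_id, user):
        self.sessions[session_id] = {"user": user}

    def current_user(self, session_id):
        session = self.sessions.get(session_id)
        return session["user"] if session else None

# Local storage: each server keeps its own sessions.
local_a = AppServer("app1", {})
local_b = AppServer("app2", {})
local_a.login("abc", "ana")
seen_by_other_local = local_b.current_user("abc")    # the login is lost

# Shared storage: both servers talk to the same store.
shared_store = {}
shared_a = AppServer("app1", shared_store)
shared_b = AppServer("app2", shared_store)
shared_a.login("abc", "ana")
seen_by_other_shared = shared_b.current_user("abc")  # the login is visible

print(seen_by_other_local, seen_by_other_shared)
```

The same reasoning applies to uploaded data and to cache entries: anything a single box kept locally has to move somewhere every server can reach.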

Horizontal scaling: database



  • replication safe queries & engines
  • moving from MyISAM to InnoDB
    • migrating 250 GB of data
  • implemented slave, ready for cluster

Test scaling results


  • benchmark before & after
  • used
    • siege
    • ab (ApacheBench)
  • short running tests
    • testing performance
    • locating bottlenecks
  • long running test
    • testing environment
    • verify stability
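The before & after idea can be sketched with a tiny in-process timer. This is not a replacement for siege or ab, which measure the full HTTP stack under concurrency; the two handlers are hypothetical stand-ins for the same piece of work before and after an optimization:

```python
import statistics
import time

def benchmark(handler, runs=200):
    """Call `handler` repeatedly and summarise how long each call took."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        handler()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "runs": runs,
        "median_s": statistics.median(samples),
        "p95_s": samples[int(0.95 * (runs - 1))],
        "max_s": samples[-1],
    }

# Hypothetical "before" and "after" versions of the same piece of work.
def handler_before():
    return sorted(range(2000), reverse=True)[0]

def handler_after():
    return max(range(2000))

before = benchmark(handler_before)
after = benchmark(handler_after)
print("median before:", before["median_s"], "after:", after["median_s"])
```

Keeping the median next to the 95th percentile and the maximum helps separate a general slowdown from occasional spikes.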

Monitoring


  • started with
    • dstat
    • Gemius
  • added
    • Cacti
    • Google Analytics real time
    • Graphite
    • New Relic

Autoload


  • irritated with requires/includes
  • different naming schemes
  • before PSR-0
  • parsing PHP files to generate map
  • file array vs APC user cache
  • after 4 years found a CRITICAL bug
    • logical flaw
    • production crashes
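The "parsing PHP files to generate map" step could look roughly like the sketch below. It is written in Python for illustration and is not the Njuškalo implementation; the regular expression, the file paths and the class names are assumptions, and a regex will also match declarations that appear at the start of a line inside comments or strings:

```python
import re

# Matches "class X", "final class X", "abstract class X", "interface X", "trait X"
# at the start of a line.
CLASS_DECLARATION = re.compile(
    r"^\s*(?:abstract\s+|final\s+)?(?:class|interface|trait)\s+([A-Za-z_]\w*)",
    re.MULTILINE,
)

def build_class_map(sources):
    """Map each declared class name to the file that declares it.

    `sources` maps file path -> PHP source text.
    """
    class_map = {}
    for path, code in sources.items():
        for name in CLASS_DECLARATION.findall(code):
            if name in class_map:
                raise ValueError(f"{name} declared in {class_map[name]} and {path}")
            class_map[name] = path
    return class_map

# Hypothetical source files.
sources = {
    "lib/Ad.php": "<?php\nclass Ad {\n}\n",
    "lib/search/Client.php": (
        "<?php\nfinal class SearchClient {\n}\ninterface Searchable {\n}\n"
    ),
}
class_map = build_class_map(sources)
print(class_map)
```

A map like this does not care about naming schemes, which is what made the approach attractive before PSR-0; the generated map can then be kept in a file or in a user cache such as APC.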







So when can you be sure your 100 lines of code are bug free?


If it worked the first 10.000 times?

.. 100.000 times?

... 1.000.000 times?

... 1.000.000.000 times?

Well 8.000.000.000 times was not enough for this one :)

Limited feature availability #1


  • allow only some users to access new feature
  • case study: redesign
  • included
    • new design: switch to responsive
    • code refactor of most userland pages


Limited feature availability #2


  • after exhaustive testing on staging
  • Phase 1: using separate server
    • hand picked IP addresses
    • corporate network
  • Phase 2: using limited feature
    • corporate network
    • started with 5% of visitors
    • incremented to 100%
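A percentage rollout like Phase 2 is commonly built on a stable hash of the visitor. The sketch below assumes a visitor ID (for example a cookie value) and uses a documentation-only IP range as the "corporate network"; none of these names or values come from the talk:

```python
import hashlib
import ipaddress

# Hypothetical corporate range (TEST-NET-1 documentation addresses).
CORPORATE_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def bucket_for(visitor_id):
    """Map a visitor ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100

def sees_new_feature(visitor_id, client_ip, rollout_percent):
    """Corporate addresses always get the feature; others are chosen by bucket."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in CORPORATE_NETWORKS):
        return True
    return bucket_for(visitor_id) < rollout_percent

# The same visitor gets the same answer on every request.
print(sees_new_feature("visitor-42", "198.51.100.7", 5))
print(sees_new_feature("visitor-42", "192.0.2.10", 0))
```

Because the bucket depends only on the visitor ID, raising the percentage from 5 towards 100 only ever adds visitors; nobody who already saw the new design is switched back.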

Tips 'n tricks: Linux desktop


  • switched development team to Linux desktop
    • easier environment to setup
    • get to know terminal
      • similar to production
        • reduces fear when working on production

Tips 'n tricks: face 2 face discussions


  • improves communication
  • meetings outside the office whenever possible
    • fewer interruptions
    • more relaxed atmosphere
    • change of scenery
    • boosts creativity & engagement

Tips 'n tricks: analyze seasonality


  • locate
    • peaks
    • low times
  • predict user behavior -> scaling
  • we found patterns based on
    • hours in day
    • day of week
    • months in year
    • weather!
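Locating peaks and low times starts with grouping request timestamps. A minimal sketch, assuming the timestamps have already been pulled from an access log; the sample hits are made up for illustration:

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def traffic_profile(timestamps):
    """Count hits per hour of day, per day of week and per month."""
    by_hour = Counter(ts.hour for ts in timestamps)
    by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    by_month = Counter(ts.month for ts in timestamps)
    return by_hour, by_weekday, by_month

def busiest(counter):
    return counter.most_common(1)[0][0]

def quietest(counter):
    # Only considers slots with at least one hit; empty slots are not counted.
    return min(counter, key=counter.get)

# Made-up sample hits.
hits = [
    datetime(2014, 1, 6, 9, 15),
    datetime(2014, 1, 6, 20, 5),
    datetime(2014, 1, 6, 20, 40),
    datetime(2014, 1, 7, 9, 50),
    datetime(2014, 1, 7, 20, 10),
    datetime(2014, 1, 11, 11, 30),
]
by_hour, by_weekday, by_month = traffic_profile(hits)
print("peak hour:", busiest(by_hour), "quiet hour:", quietest(by_hour))
print("peak day:", busiest(by_weekday))
```

Weather needs an external data source joined on the date, so it is left out of this sketch.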

Tips 'n tricks: deployment guidelines


  • not after 4pm
  • avoid Fridays
  • upgrade during night
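The first two guidelines are simple enough to encode as a pre-deploy check. A sketch under a few assumptions: 4pm sharp counts as too late, server local time is used, and weekends are not treated specially because the slide does not mention them. Night-time upgrades are a separate activity and are not modelled:

```python
from datetime import datetime

FRIDAY = 4              # datetime.weekday(): Monday is 0
LAST_DEPLOY_HOUR = 16   # 4pm

def deploy_allowed(now):
    """Return True when a deployment fits the guidelines above."""
    if now.weekday() == FRIDAY:
        return False
    return now.hour < LAST_DEPLOY_HOUR

print(deploy_allowed(datetime(2014, 1, 6, 10, 0)))   # Monday morning
print(deploy_allowed(datetime(2014, 1, 10, 10, 0)))  # Friday morning
```

A check like this can run at the start of a deploy script and abort with a message instead of relying on everyone remembering the rule.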

Tips 'n tricks: Attitude






Questions?




Thank you



Miro Svrtan

@msvrtan