Njuskalo.hr

Miro Svrtan (@msvrtan)

15-year PHP veteran

Now working at ZizooBoats (Vienna-based startup)

@Njuškalo guy for 5 years

Technical analyst
Development team leader
Architect
Developer

Njuskalo.hr introduction

online classifieds platform

similar to willhaben.at / Craigslist

founded in 2007
part of Styria Media Group AG
startup inside a corporation

Team organization

2009 vs 2014 #1

Visitors

425,000 (06/2009)

1,000,000 (01/2014)

Source: Gemius Audience

PageViews

25,000,000 (06/2009)

295,000,000 (01/2014)

Source: Gemius Audience

2009 vs 2014 #2

Disk usage

< 30GB to ~10TB

Peak bandwidth

< 30Mb/s to > 450Mb/s

DB

~1GB to ~360GB

2009 to 2014 #1

2009 to 2014 #2

2014

Platform: software

LAMP stack
Nginx proxy
Sphinx search
Memcached + Redis

Platform: code

in-house (proprietary) PHP framework

2009: many flavours of open-source frameworks
2014: IMHO losing the battle to open-source frameworks

small team vs vast community
learning best practices
introducing new developers to the codebase

Platform: Physical location

local data center

low network latency: 90% of traffic is local
legal issues

Platform: Hardware

bare metal servers

pros

dedicated resources
constant performance

cons

no quick scaling

months instead of minutes

hard service isolation

add new service
e.g. the Sphinx example

Living on the edge

developing features with no room to spare

concentrating on performance instead of features
hard & exhausting

measure
say no, change specs...
measure
develop
optimize
test
deploy

We got room

a shiny new server is here

vastly oversized -> easy living
developers stop thinking about performance

develop
test
deploy
measure
optimize

Architecture #1

Front servers

2 servers
DNS round robin
Nginx
work

load balancers
SSL offloading
caching of assets & images
gzipping non-binary content
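The front servers' "gzip only non-binary content" rule can be sketched in Python. This is an illustration, not the actual Nginx configuration: the set of compressible MIME types (COMPRESSIBLE_TYPES) and the helper names are hypothetical.

```python
import gzip

# Hypothetical list of text-based MIME types worth compressing.
# JPEGs, PNGs, videos and archives are already compressed, so gzipping
# them costs CPU for little or no size gain.
COMPRESSIBLE_TYPES = {
    "text/html",
    "text/css",
    "text/plain",
    "application/javascript",
    "application/json",
    "image/svg+xml",
}


def should_gzip(content_type: str) -> bool:
    """True for non-binary content; parameters such as charset are ignored."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in COMPRESSIBLE_TYPES


def encode_body(body: bytes, content_type: str) -> tuple[bytes, dict]:
    """Compress the body only when its content type is text-like."""
    if should_gzip(content_type):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}
```

A real server would also check that the client sent `Accept-Encoding: gzip` before compressing.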

Application servers

6 boxes
Apache
work

application
images
assets

host

memcached cluster
Redis slaves

Main DB server

MySQL master
Redis master

Main search server

Sphinx server
MySQL slave

Sphinx indexing
backups

Banner server

serves banners :)

In 2009

all of that was running on 1 server

Staging servers

2 servers
same architecture as production

virtualized environment
smaller scale

"used" for final testing

Test Server

hosts >10 Njuškalo applications

test1.example.com
test2.example.com
...
test10.example.com

most locations serve only 1 new feature/bug-fix

helps to cherry pick what is ready for production
multi feature development

Multi feature development

test locations

test1.example.com
test2.example.com
...
test10.example.com

git

moved from subversion
always working on

main branch + that feature's changes

Maintenance vs development

maintenance

small change requests/features + bug fixes
kanban style
loosely defined specs
fast-changing priorities + specs

development

new features
waterfall model

switching to scrum

well-defined specs

Startup

Enterprise

Becoming an enterprise

transformation of goals/mindsets
not something in a road map

When?

mistakes cost more than:

better/longer preparation
more testing
additional hardware

Journey

started with ~0 scaling experience

hard to come by

there is no manual
listen & read what/how others did it

Broaden your horizons!

Common performance pitfalls

unindexed queries
SELECT ... GROUP BY ...
SELECT DISTINCT ...
SELECT ... WHERE X IN (SELECT Y FROM ...)

use the slow query log
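Whether a query is indexed can be checked with the database's EXPLAIN. The sketch below uses Python's built-in SQLite as a stand-in for MySQL (MySQL's EXPLAIN output looks different, but the idea is the same); the `ads` table and the `idx_ads_category` index are hypothetical. The same habit helps with the `IN (SELECT ...)` pattern: older MySQL releases often executed it as a dependent subquery, and rewriting it as a JOIN was a common fix.

```python
import sqlite3

# SQLite stands in for MySQL; table and index names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, category_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO ads (category_id, title) VALUES (?, ?)",
    [(i % 50, f"ad {i}") for i in range(1000)],
)

QUERY = "SELECT id, title FROM ads WHERE category_id = ?"


def query_plan(sql: str) -> str:
    """Return SQLite's plan description for one query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (7,)).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


before = query_plan(QUERY)  # no usable index: full table scan
conn.execute("CREATE INDEX idx_ads_category ON ads (category_id)")
after = query_plan(QUERY)   # lookup through idx_ads_category
print(before)
print(after)
```

The exact wording of the plan differs between SQLite versions, but the first plan reports a scan and the second names the index.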

Low hanging fruit

cache rarely changed content

or implement cache busting

implement versioning & long expires headers

images
javascript
css

separate database server
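The "versioning & long expires headers" item can be sketched as follows, assuming the version is a short content hash in the query string (a hash in the file name, such as main.3f2a1b9c.css, works the same way). The helper names and the one-year lifetime are illustrative assumptions.

```python
import hashlib

# One year in seconds: a common "far future" lifetime for versioned assets.
ONE_YEAR = 365 * 24 * 60 * 60


def versioned_url(path: str, content: bytes) -> str:
    """Append a short content hash so the URL changes whenever the file changes."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{path}?v={digest}"


def asset_headers() -> dict:
    """Headers for versioned assets: browsers may keep them for a year."""
    return {"Cache-Control": f"public, max-age={ONE_YEAR}"}


old = versioned_url("/css/main.css", b"body { color: black; }")
new = versioned_url("/css/main.css", b"body { color: navy; }")
print(old, new, asset_headers())
```

Because the URL changes with the content, browsers can cache images, JavaScript and CSS aggressively while a deploy still never serves a stale file; this is the cache busting mentioned above.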

Horizontal scaling: application

moving from 1 to 2 application servers

hardest step

shared resources

sessions
data
cache
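Moving from one to two application servers means no server may keep sessions, data or cache only on its own disk or memory. A minimal sketch, assuming an in-memory `SharedStore` class that stands in for memcached or Redis; the class and method names are hypothetical. In PHP this is typically done by switching `session.save_handler` from the default file storage to a memcached or Redis handler provided by the corresponding extension.

```python
from typing import Optional


class SharedStore:
    """In-memory stand-in for a networked store such as memcached or Redis.

    In production every application server talks to the same external
    store; here one Python object plays that role.
    """

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = dict(value)  # store a copy, as a network store would


class AppServer:
    """Hypothetical application server that keeps no session state locally."""

    def __init__(self, name: str, sessions: SharedStore) -> None:
        self.name = name
        self.sessions = sessions

    def login(self, session_id: str, user: str) -> None:
        self.sessions.set(f"session:{session_id}", {"user": user})

    def current_user(self, session_id: str) -> Optional[str]:
        session = self.sessions.get(f"session:{session_id}")
        return session["user"] if session else None


store = SharedStore()
app1, app2 = AppServer("app1", store), AppServer("app2", store)
app1.login("abc123", "miro")
print(app2.current_user("abc123"))  # the second server sees the session
```

With file-based sessions, the same request routed by round robin to the other server would find no session and log the user out.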

Horizontal scaling: database

replication-safe queries & engines
moving from MyISAM to InnoDB

migrating 250 GB of data

implemented slave, ready for cluster
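Once a slave exists, the application has to decide which server gets each query. A rough sketch of read/write splitting, assuming plain SQL strings and the hypothetical labels "master" and "slave"; a real router would also send to the master any read that must see a write made moments earlier, because replication lags behind.

```python
import re

# Statements that only read data; everything else must go to the master.
READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
# Locking reads take row locks, so they belong on the master as well.
LOCKING_READ = re.compile(r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE)


def route(sql: str) -> str:
    """Return 'master' or 'slave' for a single SQL statement."""
    if READ_ONLY.match(sql) and not LOCKING_READ.search(sql):
        return "slave"
    return "master"


print(route("SELECT * FROM ads WHERE id = 1"))
print(route("UPDATE ads SET title = 'x' WHERE id = 1"))
```

Matching on the first keyword is a heuristic; it is enough to show why queries must be replication-safe before reads can move to a slave.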

Test scaling results

benchmark before & after
tools used:

siege
ab (ApacheBench)

short-running tests

testing performance
locating bottlenecks

long-running test

testing environment
verify stability

Monitoring

started with

dstat
Gemius

added

Cacti
Google Analytics real time
Graphite
New Relic
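Graphite accepts metrics over a simple plaintext protocol, one `<path> <value> <timestamp>` line per metric, which makes it easy to push application counters. A minimal sketch; the metric path and host are hypothetical, and 2003 is Carbon's default plaintext port.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(path: str, value: float, host: str, port: int = 2003) -> None:
    """Send one metric to a Carbon daemon over TCP."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(graphite_line(path, value).encode("ascii"))


print(graphite_line("njuskalo.app1.requests", 42, 1389000000), end="")
```

Calling `send_metric("njuskalo.app1.requests", 42, "graphite.example.com")` would deliver the same line to a Carbon server on that (hypothetical) host.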

Autoload

irritated by require/include statements
different naming schemes
before PSR-0
parsing PHP files to generate map
file array vs APC user cache
after 4 years found a CRITICAL bug

logical flaw
production crashes
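The "parse PHP files to generate a map" approach can be sketched in Python. This is not the original generator: the regular expressions, helper names and output format are assumptions, and the regex approach ignores files with several namespaces and declarations inside comments or strings.

```python
import re
from pathlib import Path

NAMESPACE_RE = re.compile(r"^\s*namespace\s+([\w\\]+)\s*;", re.MULTILINE)
CLASS_RE = re.compile(
    r"^\s*(?:abstract\s+|final\s+)?(?:class|interface|trait)\s+(\w+)", re.MULTILINE
)


def classes_in_source(source: str) -> list[str]:
    """Fully qualified names of classes, interfaces and traits in one file."""
    ns_match = NAMESPACE_RE.search(source)
    prefix = ns_match.group(1) + "\\" if ns_match else ""
    return [prefix + name for name in CLASS_RE.findall(source)]


def build_class_map(root: str) -> dict[str, str]:
    """Scan every .php file under root and map class name -> file path."""
    class_map: dict[str, str] = {}
    for path in sorted(Path(root).rglob("*.php")):
        for cls in classes_in_source(path.read_text(encoding="utf-8")):
            if cls in class_map:
                raise ValueError(f"duplicate class {cls}: {class_map[cls]} and {path}")
            class_map[cls] = str(path)
    return class_map


def php_string(value: str) -> str:
    """Quote a value as a PHP single-quoted string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def render_php_map(class_map: dict[str, str]) -> str:
    """Render the map as a PHP file returning an array (the 'file array' option)."""
    lines = ["<?php", "return array("]
    for cls, path in sorted(class_map.items()):
        lines.append(f"    {php_string(cls)} => {php_string(path)},")
    lines.append(");")
    return "\n".join(lines) + "\n"
```

The autoloader then only needs one array lookup per class; the alternative on the slide keeps the same array in the APC user cache instead of a file.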

So when can you be sure your 100 lines of code are bug-free?

If it worked the first 10,000 times?

... 100,000 times?

... 1,000,000 times?

... 1,000,000,000 times?

Well, 8,000,000,000 times was not enough for this one :)

Limited feature availability #1

allow only some users to access a new feature
case study: redesign
included

new design: switch to responsive
code refactor of most userland pages

Limited feature availability #2

after exhaustive testing on staging
Phase 1: using a separate server

hand-picked IP addresses
corporate network

Phase 2: using limited feature availability

corporate network
started with 5% of visitors

incremented to 100%
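Phase 2 (corporate network always in, plus a growing share of visitors) can be sketched with a stable hash bucket per visitor, so a visitor who sees the new design keeps seeing it as the percentage grows. The corporate range, the visitor id (e.g. from a cookie) and all function names are hypothetical.

```python
import hashlib
import ipaddress

# Hypothetical corporate range (a documentation-only network).
CORPORATE_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def in_corporate_network(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CORPORATE_NETWORKS)


def bucket(visitor_id: str) -> int:
    """Map a visitor id to a stable bucket in 0..99."""
    digest = hashlib.sha1(visitor_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def sees_new_design(visitor_id: str, ip: str, rollout_percent: int) -> bool:
    """Corporate network always gets the feature; others by percentage."""
    if in_corporate_network(ip):
        return True
    return bucket(visitor_id) < rollout_percent


print(sees_new_design("visitor-1", "192.0.2.10", 0))
```

Raising `rollout_percent` from 5 to 100 only adds visitors; nobody who already has the new design is switched back.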

Tips 'n' tricks:

Linux desktop

switched development team to Linux desktop

easier environment to set up
get to know the terminal

similar to production

reduces fear when working on production

Tips 'n' tricks:

face-to-face discussions

improves communication
meetings outside the office whenever possible

fewer interruptions
more relaxed atmosphere
change of scenery
boosts creativity & engagement

Tips 'n' tricks:

analyze seasonality

locate

peaks
low times

predict user behavior -> scaling
we found patterns based on

hour of day
day of week
month of year
weather!
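Finding the hour, weekday and month patterns amounts to bucketing visit timestamps. A minimal sketch; the timestamps below are made-up sample data, not Njuškalo traffic, and correlating with weather would additionally need an external weather data source.

```python
from collections import Counter
from datetime import datetime


def traffic_profile(visits: list[datetime]) -> dict[str, Counter]:
    """Count visits per hour of day, day of week (0 = Monday) and month."""
    return {
        "hour": Counter(ts.hour for ts in visits),
        "weekday": Counter(ts.weekday() for ts in visits),
        "month": Counter(ts.month for ts in visits),
    }


def peak(counter: Counter) -> int:
    """Return the busiest bucket, e.g. the peak hour."""
    return counter.most_common(1)[0][0]


# Made-up sample timestamps for illustration only.
visits = [
    datetime(2014, 1, 6, 20, 5),
    datetime(2014, 1, 6, 20, 40),
    datetime(2014, 1, 7, 9, 15),
]
profile = traffic_profile(visits)
print(peak(profile["hour"]), peak(profile["weekday"]))
```

On real logs the same counters show both the peaks to scale for and the low times suited to maintenance.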

Tips 'n' tricks:

deployment guidelines

not after 4pm
avoid fridays
upgrade during night
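A deploy script could enforce the first two guidelines with a simple time check. This is a sketch under a strict reading ("avoid Fridays" treated as "never on Friday", 4 pm as a hard cutoff); the constant and function names are hypothetical, and night-time upgrades are a separate policy not modelled here.

```python
from datetime import datetime

FRIDAY = 4        # datetime.weekday(): Monday == 0 ... Sunday == 6
CUTOFF_HOUR = 16  # no deploys from 16:00 on


def deploy_allowed(now: datetime) -> bool:
    """Strict reading of the guidelines: no Friday deploys, nothing from 4 pm on."""
    if now.weekday() == FRIDAY:
        return False
    return now.hour < CUTOFF_HOUR


print(deploy_allowed(datetime.now()))
```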

Tips 'n' tricks: Attitude

Questions?

Thank you

Miro Svrtan

@msvrtan

Tech from startup to enterprise