Load Testing Your App

AustinPHP April 2019

Ian Littman / @iansltx

follow along at https://ian.im/loadaus19

Questions We'll Answer

What's the difference between
- A smoke test
- A load test
- A stress test
- A spike test
When should I load test?
How can I match my load test with (anticipated) reality for more useful results?
What bottlenecks should I be looking for when testing?
What tools can I use to throw load at my site?
How can I use one to test my application?

Questions We Won't Answer

How do I use {JMeter|Gatling|Molotov}?
How can I set up clustered load testing?
How can I simulate far-end users?
- Slow connections tie up server/load balancer resources for longer
- Solutions for slow connections (e.g. compression) may affect system capacity elsewhere
How can I do deep application profiling? (e.g. Blackfire)
What about single-user load testing? (e.g. running an import with a larger data set than usual)

A Challengr Appears

This will be our system under test

This is what we'll test with*

Siege - a quick, rather simple command line utility
- GitLab Large Staging Collider
k6 - write your tests in JS**, run via a Go binary***
- HAR import for in-browser recording

* More tools are listed at the end of this presentation.
** It's JS, but it uses goja, not V8 or Node, and doesn't have a global event loop yet.
*** I've used this on a project significantly more real than Challengr, so that's a big reason we're looking at it today.

#IFNDEF

Load Test

<= peak traffic
Your system shouldn't break
If it does, it's a stress test

Stress Test

Trying to break your system
Surfaces bottlenecks
Increase traffic above peak or decrease available resources
Capacity Test is a subset

Soak Test

Extended test duration
Watch behavior on ramp down as well as ramp up
Memory leaks
Disk space exhaustion (logs!)
Filled caches

Spike Test

Stress test with quick ramp-up
Woot.com at midnight
TV ad "go online"
System comes back online after downtime
Everyone hits your API via on-the-hour cron jobs

Source: https://twitter.com/troyhunt/status/1102312963401109504
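
One way to picture how the load, stress, soak, and spike tests differ is as ramp profiles. The sketch below assumes the `stages` array shape that k6 accepts in its options (a `duration` string plus a `target` VU count). The helper name `stagesFor`, the `peakVus` parameter, and every duration and multiplier are placeholders for illustration, not values from this talk.

```javascript
// Hypothetical helper: returns a k6-style `stages` array for each test type.
// `peakVus` stands for your anticipated peak; durations and multipliers
// are illustrative guesses, not recommendations.
function stagesFor(testType, peakVus) {
  switch (testType) {
    case 'load': // ramp to (at most) peak, hold, ramp down
      return [
        { duration: '5m', target: peakVus },
        { duration: '10m', target: peakVus },
        { duration: '5m', target: 0 },
      ];
    case 'stress': // keep pushing past peak until something gives
      return [
        { duration: '5m', target: peakVus },
        { duration: '5m', target: peakVus * 2 },
        { duration: '5m', target: peakVus * 3 },
        { duration: '5m', target: 0 },
      ];
    case 'soak': // same level as a load test, held for much longer
      return [
        { duration: '5m', target: peakVus },
        { duration: '4h', target: peakVus },
        { duration: '5m', target: 0 },
      ];
    case 'spike': // stress-level traffic with a very short ramp-up
      return [
        { duration: '10s', target: peakVus * 3 },
        { duration: '3m', target: peakVus * 3 },
        { duration: '10s', target: 0 },
      ];
    default:
      throw new Error('unknown test type: ' + testType);
  }
}
```

In a k6 script this could feed the options export, e.g. `export const options = { stages: stagesFor('spike', 100) };`.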

Smoke Test

An initial test to confirm the system operates properly without a large amount of generated load
May be integration tests in your existing test suite
May be your load test script, turned down to one (thorough) iteration and one Virtual User (VU)
Do this before you load test
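
"Turning down" a load test script can be as small as swapping its options. This is a minimal sketch, assuming a k6 script whose full-size options already exist: `vus` and `iterations` are k6 option names, while `withSmokeMode` and its `smoke` flag are hypothetical names introduced here.

```javascript
// Hypothetical helper: reduce a full load profile to a smoke test.
function withSmokeMode(loadOptions, smoke) {
  if (!smoke) return loadOptions;
  // Drop ramping/duration settings, keep everything else (e.g. thresholds),
  // and run a single pass with one Virtual User.
  const { stages, duration, ...rest } = loadOptions;
  return { ...rest, vus: 1, iterations: 1 };
}
```

In k6 this might be wired up as `export const options = withSmokeMode(fullOptions, __ENV.SMOKE === '1');`, where `fullOptions` and the `SMOKE` environment variable are assumed names.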

Now that we've defined our terms...

When?

When your application performance may change
- Adding/removing features
- Refactoring
- Infrastructure changes
When your load profile may change
- Initial app launch
- Feature launch
- Marketing pushes/promotions

What are your metrics?

Speed - response latency
Scalability - throughput, resource utilization
Stability - % failed calls/transactions/flows
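
As a rough illustration of the scalability and stability rows, throughput and failure percentage fall out of the raw results directly. The result shape (`{ ok: boolean }` per request) and the function name are assumptions made for this sketch; most tools report these figures for you.

```javascript
// Hypothetical result shape: one { ok: boolean } entry per request.
function summarizeRun(results, testSeconds) {
  const failed = results.filter((r) => !r.ok).length;
  return {
    // Scalability: how many requests per second the system completed.
    throughputPerSec: results.length / testSeconds,
    // Stability: share of requests that failed, as a percentage.
    failedPercent: results.length === 0 ? 0 : (100 * failed) / results.length,
  };
}
```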

How should I test?

Accurately.

What should I test?

Flows, not just single endpoints
Frequently used
Performance intensive
Business critical

Concurrent Requests != Concurrent Users

Think Time
API client concurrency
Caching (client-side or otherwise)
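
To see how far apart the two numbers can be, here is a back-of-the-envelope estimate. It assumes a closed loop in which every user sends one request, waits for the response, thinks, and repeats; it ignores caching and clients that issue several requests in parallel. The function name and the sample figures are made up for illustration.

```javascript
// Closed-loop estimate: each user repeats (request, then think) forever.
function estimateLoad(users, responseSeconds, thinkSeconds) {
  const cycleSeconds = responseSeconds + thinkSeconds;
  const requestsPerSecond = users / cycleSeconds;
  // Little's Law: requests in flight = arrival rate * time in system.
  const concurrentRequests = requestsPerSecond * responseSeconds;
  return { requestsPerSecond, concurrentRequests };
}

// 1000 users, 0.5 s responses, 9.5 s of think time between requests
console.log(estimateLoad(1000, 0.5, 9.5));
// → { requestsPerSecond: 100, concurrentRequests: 50 }
```

Under these assumptions, 1000 concurrent users translate to only about 50 concurrent requests.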

Oversimplification...It's a trap!

No starting data in database
No parameterization
No abandonment at each step in the process
No input errors
No think times
Static think times
Uniformly distributed think times; alternatives:
- Normal distribution
- Exponential distribution
Assuming you have one type of user
Assuming that a distribution is normal
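
A couple of these traps can be avoided with a few lines of script. The sketch below reads the Normal and Exponential sub-bullets as alternatives to static or uniform think times, and adds per-step abandonment. All function names, the `rng` parameter, and the `steps` shape are hypothetical; which distribution fits is something your own analytics have to tell you.

```javascript
// `rng` is any function returning a float in [0, 1); Math.random by default.
function exponentialThinkTime(meanSeconds, rng = Math.random) {
  // Inverse-CDF sampling for an exponential distribution.
  return -meanSeconds * Math.log(1 - rng());
}

function normalThinkTime(meanSeconds, stdDevSeconds, rng = Math.random) {
  // Box-Muller transform; clamp at zero because think time can't be negative.
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return Math.max(0, meanSeconds + stdDevSeconds * z);
}

// Walk a flow, letting the simulated user abandon before each later step.
// `steps` is an assumed shape: [{ name, continueProbability }].
function stepsReached(steps, rng = Math.random) {
  const reached = [];
  for (const step of steps) {
    if (reached.length > 0 && rng() >= step.continueProbability) break;
    reached.push(step.name);
  }
  return reached;
}
```

In a k6 script the sampled value could be passed to `sleep()` from the `k6` module, which takes seconds, e.g. `sleep(exponentialThinkTime(5))`.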

Let's see what that looks like with k6

Vary Your Testing

High-load Case: heavier endpoints get called more often
Anticipated Case
Low-load Case: validation failures + think time
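
One possible way to express these cases is as weighted traffic mixes that each virtual user draws from on every iteration. The action names and weights below are invented for illustration (they are not Challengr's endpoints), and `pickAction` is a hypothetical helper.

```javascript
// Hypothetical traffic mixes; weights are relative, not percentages.
const MIXES = {
  highLoad: { browse: 2, search: 5, checkout: 3 }, // heavier actions more often
  anticipated: { browse: 6, search: 3, checkout: 1 },
  lowLoad: { browse: 8, invalidSubmit: 2 }, // validation failures, light work
};

// Weighted random choice; `rng` returns a float in [0, 1).
function pickAction(mix, rng = Math.random) {
  const entries = Object.entries(mix);
  const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
  let roll = rng() * total;
  for (const [name, weight] of entries) {
    if (roll < weight) return name;
    roll -= weight;
  }
  return entries[entries.length - 1][0]; // guard against float rounding
}
```

Running the same script once per mix gives a best-case, expected-case, and worst-case picture instead of a single number.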

Understand your load test tool

e.g. arrival rate vs. looping
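
The difference matters most when the system slows down. In the simplified model below, a looping tool (a fixed number of VUs that each wait for a response before sending the next request) backs off as latency grows, while an arrival-rate tool keeps starting requests on schedule. The function names and figures are assumptions for illustration and do not describe any particular tool's internals.

```javascript
// Requests per second a looping (closed) model offers: a VU can't start
// its next request until the current one returns.
function loopingRate(vus, responseSeconds, thinkSeconds) {
  return vus / (responseSeconds + thinkSeconds);
}

// Requests in flight under an arrival-rate (open) model: the rate stays
// fixed, so slower responses pile up instead of throttling the load.
function inFlightAtArrivalRate(targetPerSecond, responseSeconds) {
  return targetPerSecond * responseSeconds;
}

// 100 looping VUs with 1 s think time vs. a fixed 50 requests/s target.
for (const responseSeconds of [0.1, 1, 5]) {
  console.log(
    'response ' + responseSeconds + 's:',
    'looping offers ' + loopingRate(100, responseSeconds, 1).toFixed(1) + ' req/s,',
    'arrival rate holds ' + inFlightAtArrivalRate(50, responseSeconds) + ' requests in flight'
  );
}
```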

Keep it real

Use logs/analytics to determine your usage patterns
Run your APM (e.g. New Relic, Tideways) on your load test env
- Better profiling info
- You'll have the same perf hit as production
Is your environment code-ified? (e.g. Terraform, CloudFormation)
- Easier to copy envs
- Cheaper to set up an env for an hour to run a load test
Decide whether testing from near your env is accurate enough
Test autoscaling/load-shedding facilities

Aggregate your metrics responsibly

~~Average~~
Median (~50th percentile)
90th, 95th, 99th percentile
Standard Deviation
Distribution of results
Explain your outliers
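
A small made-up sample shows why the average is struck out: one outlier drags it far from what most requests experienced. The sketch uses the nearest-rank percentile definition, which is one of several in common use, so a given tool may report slightly different values.

```javascript
// Nearest-rank percentile on a sorted copy of the samples (0 < p <= 100).
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Population standard deviation.
function stdDev(samples) {
  const m = mean(samples);
  return Math.sqrt(mean(samples.map((x) => (x - m) ** 2)));
}

// Illustrative latencies in milliseconds, with a single slow outlier.
const latenciesMs = [100, 110, 120, 130, 140, 150, 160, 170, 180, 5000];

console.log('mean  ', mean(latenciesMs)); // 626
console.log('median', percentile(latenciesMs, 50)); // 140
console.log('p90   ', percentile(latenciesMs, 90)); // 180
console.log('p99   ', percentile(latenciesMs, 99)); // 5000
console.log('stddev', stdDev(latenciesMs).toFixed(0));
```

Here the mean (626 ms) describes no request at all, while the median and upper percentiles show both the typical case and the outlier that needs explaining.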

Let's run a test!

Bottlenecks

Web Server + Database
- FPM workers/Apache processes
- DB Connections
- CPU + RAM utilization
- Network utilization
- Disk utilization (I/O or space)
Load balancer
- Network utilization/warmup
- Connection count
External Services
- Rate limits (natural or artificial)
- Latency
- Network egress

Queues
- Per-job spin-up latency
- Worker count
- CPU + RAM utilization
  - Workers
  - Broker
- Queue depth
Caches
- Thundering herd
- Churning due to cache evictions

Bottleneck Gotchas

Just because a request is heavy doesn't mean it's the biggest source of load
As a system reaches capacity you'll see nonlinear performance degradation
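
Both gotchas can be shown with simple arithmetic. The first function ranks endpoints by total cost (calls multiplied by cost per call), and the second uses the textbook M/M/1 queue formula as a stand-in for nonlinear degradation. Real systems are more complicated than a single queue, and the endpoint names and numbers are invented for the example.

```javascript
// Total load per endpoint = call volume * cost per call.
function rankByTotalCost(endpoints) {
  return endpoints
    .map((e) => ({ name: e.name, totalMs: e.callsPerMinute * e.msPerCall }))
    .sort((a, b) => b.totalMs - a.totalMs);
}

// M/M/1 approximation: response time = service time / (1 - utilization).
function responseTimeMs(serviceMs, utilization) {
  if (utilization >= 1) return Infinity;
  return serviceMs / (1 - utilization);
}

console.log(rankByTotalCost([
  { name: 'monthly report', callsPerMinute: 5, msPerCall: 2000 },
  { name: 'list items', callsPerMinute: 3000, msPerCall: 40 },
]));

for (const utilization of [0.5, 0.8, 0.9, 0.95, 0.99]) {
  console.log(utilization, responseTimeMs(50, utilization).toFixed(0) + ' ms');
}
```

In this example the cheap, frequently called endpoint accounts for twelve times the total work of the heavy one, and under the queue model response time doubles between 80% and 90% utilization.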

Let's fix some bottlenecks...

Bonus material: More Tools

Tsung
- Erlang (efficient, high volume from a single box)
- Flexible (not just HTTP)
- XML based config
The Grinder
- Java-based
- Java, Jython or Clojure scripts

Bees With Machine Guns
- Uses EC2 instances
- Python-based
Goad (Go inside Lambda)
Gatling (Java-based)
- Tests in Scala...
- ...or use the recorder
ab
httperf
Apache JMeter

Bonus Material: Even More Tools!

Artillery.io
- Node-based
- Simple stuff in YAML, can switch to JS (including npm)
Molotov (by Mozilla)
- Python 3.5+, uses async IO via coroutines

Locust
- Python based
- Can be run clustered
wrk2
- Built in C
- Scriptable via Lua

Thanks! Questions?

ian.im/loadaus19 - these slides
github.com/iansltx/challengr - this code
twitter.com/iansltx - me
github.com/iansltx - my code
Performance Testing Guidance for Web Applications (from Microsoft)
Blazemeter Blog

Load Testing Your App - AustinPHP March 2019

By Ian Littman

Want to find out which pieces of your site break down under load first, so you know how you'll need to scale before your systems catch fire? Load testing answers this question, and these days you can simulate full user behavior in a load test rather than merely hammering a single endpoint. In this talk, we'll cover the conceptual points your load tests need to get right to serve their purpose, then jump into implementation details, using the k6 load testing tool to build a test that exercises an application much as real users would.
