Load Testing Your App

MidwestPHP 2019

Ian Littman / @iansltx

follow along at https://ian.im/loadmw19

Questions We'll Answer

  • What's the difference between:
    • A smoke test
    • A load test
    • A stress test
    • A spike test
  • When should I load test?
  • How can I match my load test with (anticipated) reality for more useful results?
  • What bottlenecks should I be looking for when testing?
  • What tools can I use to throw load at my site?
  • How can I use one to test my application?

Questions we won't answer

  • How do I use {JMeter|Gatling|Molotov}?
  • How can I set up clustered load testing?
  • How can I simulate far-end users?
    • Slow connections tie up server/load balancer resources for longer
    • Solutions for slow connections (e.g. compression) may affect system capacity elsewhere
  • How can I do deep application profiling? (e.g. Blackfire)
  • What about single-user load testing? (e.g. running an import with a larger data set than usual)

A Challengr Appears

This will be our system under test

This is what we'll test with: k6*

* More tools are listed at the end of this presentation.
** It's JS, but it uses goja, not V8 or Node, and doesn't have a global event loop yet.
*** I've used this on a project significantly more real than Challengr, so that's a big reason we're looking at it today.

#IFNDEF

Load Test

  • <= peak traffic
  • Your system shouldn't break
  • If it does, it's a stress test

Stress Test

  • Trying to break your system
  • Surfaces bottlenecks
  • Increase traffic above peak or decrease available resources
  • Capacity Test is a subset

Soak Test

  • Extended test duration
  • Watch behavior on ramp down as well as ramp up
  • Memory leaks
  • Disk space exhaustion (logs!)
  • Filled caches

Spike Test

  • Stress test with quick ramp-up
  • Woot.com at midnight
  • TV ad "go online"
  • System comes back online after downtime
  • Everyone hits your API via on-the-hour cron jobs
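
The load, stress, soak, and spike tests defined above differ mainly in how the number of virtual users changes over time. A minimal sketch, assuming a hypothetical peak of 200 VUs and made-up durations: the `{ duration, target }` entries follow the shape of k6's `stages` option, and in a k6 script one of these arrays would go into the exported `options` object. `PEAK_VUS`, `profiles`, and `maxTarget` are illustrative names, not part of k6.

```javascript
// Hypothetical peak concurrency; derive yours from logs/analytics.
const PEAK_VUS = 200;

// Each entry ramps linearly to `target` VUs over `duration` (k6 `stages` shape).
const profiles = {
  // Load test: ramp to at most peak traffic, hold, ramp down.
  load: [
    { duration: '5m', target: PEAK_VUS },
    { duration: '10m', target: PEAK_VUS },
    { duration: '5m', target: 0 },
  ],
  // Stress test: keep climbing past peak to surface bottlenecks.
  stress: [
    { duration: '5m', target: PEAK_VUS },
    { duration: '5m', target: PEAK_VUS * 2 },
    { duration: '5m', target: PEAK_VUS * 3 },
    { duration: '5m', target: 0 },
  ],
  // Soak test: same shape as the load test, held for hours (assumed: 4h).
  soak: [
    { duration: '10m', target: PEAK_VUS },
    { duration: '4h', target: PEAK_VUS },
    { duration: '10m', target: 0 },
  ],
  // Spike test: stress-level traffic with a very quick ramp-up.
  spike: [
    { duration: '10s', target: PEAK_VUS * 3 },
    { duration: '2m', target: PEAK_VUS * 3 },
    { duration: '10s', target: 0 },
  ],
};

// Highest VU target a profile reaches.
function maxTarget(stages) {
  return Math.max(...stages.map((s) => s.target));
}
```

Only the load and soak profiles stay at or below peak; the stress and spike profiles deliberately exceed it.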

Smoke Test

  • An initial test to confirm the system operates properly without a large amount of generated load
  • May be integration tests in your existing test suite
  • May be your load test script, turned down to one (thorough) iteration and one Virtual User (VU)
  • Do this before you load test
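
Turning a load test script down to a smoke test can be as small as a settings change. A sketch, assuming the script reads a k6-style options object: `vus`, `iterations`, `stages`, and `thresholds` are k6 option names, while `loadOptions`, `toSmoke`, and the numbers are hypothetical.

```javascript
// Hypothetical full-size settings for a load test run.
const loadOptions = {
  stages: [
    { duration: '5m', target: 200 },
    { duration: '10m', target: 200 },
    { duration: '5m', target: 0 },
  ],
  thresholds: { http_req_duration: ['p(95)<500'] },
};

// Derive smoke-test settings: one VU, one iteration, same pass/fail thresholds.
function toSmoke(options) {
  const { stages, ...rest } = options; // drop the ramp profile
  return { ...rest, vus: 1, iterations: 1 };
}

const smokeOptions = toSmoke(loadOptions);
```

Keeping the thresholds means the smoke run still fails loudly if the one iteration is unexpectedly slow.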

Now that we've defined our terms...

When?

  • When your application performance may change
    • Adding/removing features
    • Refactoring
    • Infrastructure changes
  • When your load profile may change
    • Initial app launch
    • Feature launch
    • Marketing pushes/promotions

What are your metrics?

  • Speed - response latency
  • Scalability - throughput, resource utilization
  • Stability - % failed calls/transactions/flows
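
One way to see how the three metrics relate is to compute them from the same raw results. A minimal sketch, assuming one record per request with a latency and a success flag; the sample data, the 2-second run length, and the `summarize` helper are all made up for illustration.

```javascript
// Hypothetical results: one record per request, as a load tool might log them.
const samples = [
  { ms: 120, ok: true },
  { ms: 95, ok: true },
  { ms: 310, ok: true },
  { ms: 2050, ok: false },
  { ms: 140, ok: true },
];
const testDurationSeconds = 2; // assumed wall-clock length of the run

function summarize(results, durationSeconds) {
  const failed = results.filter((r) => !r.ok).length;
  const totalMs = results.reduce((sum, r) => sum + r.ms, 0);
  return {
    meanLatencyMs: totalMs / results.length, // speed
    throughputRps: results.length / durationSeconds, // scalability
    failureRate: failed / results.length, // stability
  };
}

const summary = summarize(samples, testDurationSeconds);
```

Resource utilization is not in this sketch because it comes from the system under test, not from the load tool's results.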

How should I test?

Accurately.

What should I test?

  • Flows, not just single endpoints
  • Frequently used
  • Performance intensive
  • Business critical

Concurrent Requests != Concurrent Users

  • Think Time
  • API client concurrency
  • Caching (client-side or otherwise)
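
Think time alone shows why user count and request count diverge. A rough sketch, assuming a simple closed-loop model where every user repeats "send a request, wait for the response, pause"; it ignores API client concurrency and caching, and `requestLoad` and its inputs are hypothetical.

```javascript
// Estimate server-side load from a user count, using a closed-loop model:
// each user spends responseSeconds waiting, then thinkSeconds idle, and repeats.
function requestLoad(users, responseSeconds, thinkSeconds) {
  const cycleSeconds = responseSeconds + thinkSeconds; // one request + one pause
  return {
    requestsPerSecond: users / cycleSeconds,
    // Fraction of each cycle spent in flight, times the number of users.
    concurrentRequests: (users * responseSeconds) / cycleSeconds,
  };
}

const estimate = requestLoad(1000, 0.5, 9.5);
```

With these assumed numbers, 1,000 concurrent users produce about 100 requests per second and only about 50 requests in flight at any moment.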

Oversimplification...It's a trap!

  • No starting data in database
  • No parameterization
  • No abandonment at each step in the process
  • No input errors
  • No think times
  • Static think times
  • Uniformly distributed think times
  • Assuming you have one type of user
  • Assuming that a distribution is normal
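
Two of the traps above, uniform think times and no abandonment, can be avoided with a few lines of script logic. A sketch under stated assumptions: the log-normal shape for think time is an assumption (check it against your own analytics), the step names and continue probabilities are invented, and the seeded generator exists only to make runs repeatable.

```javascript
// Small seeded linear congruential generator so runs are repeatable.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // in [0, 1)
  };
}

// Log-normal think time: mostly short pauses with a long tail.
function thinkTimeSeconds(random, medianSeconds, sigma) {
  const u1 = 1 - random(); // in (0, 1], avoids log(0)
  const u2 = random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2); // Box-Muller
  return medianSeconds * Math.exp(sigma * z);
}

// Hypothetical flow: probability that a user who reached a step continues.
const steps = [
  { name: 'list items', continueProbability: 0.6 },
  { name: 'view item', continueProbability: 0.5 },
  { name: 'submit form', continueProbability: 1.0 },
];

// Count how many simulated users reach each step of the flow.
function simulateFunnel(users, random) {
  const reached = steps.map(() => 0);
  for (let u = 0; u < users; u++) {
    for (let i = 0; i < steps.length; i++) {
      reached[i]++;
      if (random() >= steps[i].continueProbability) break; // user abandons here
    }
  }
  return reached;
}

const random = makeRandom(42);
const reached = simulateFunnel(1000, random);
const pause = thinkTimeSeconds(random, 5, 0.8);
```

In a load test script, the same two decisions (how long to pause, whether to continue) would be made per virtual user between requests.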

Vary Your Testing

  • High-load Case: heavier endpoints get called more often
  • Anticipated Case
  • Low-load Case: validation failures + think time
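
The three cases can share one script and differ only in how often each action is chosen. A minimal sketch, assuming hypothetical action names and weights; `pickWeighted` takes a random number in [0, 1) so the choice is easy to test.

```javascript
// Hypothetical request mixes; the weights are relative frequencies, not measured data.
const mixes = {
  highLoad: { home: 10, search: 30, report: 60 }, // heavier endpoints called more often
  anticipated: { home: 50, search: 40, report: 10 },
  lowLoad: { home: 60, invalidSubmit: 30, search: 10 }, // more validation failures
};

// Pick an action name in proportion to its weight; r must be in [0, 1).
function pickWeighted(weights, r) {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
  let threshold = r * total;
  for (const [name, weight] of entries) {
    if (threshold < weight) return name;
    threshold -= weight;
  }
  return entries[entries.length - 1][0]; // guard against floating-point rounding
}

const action = pickWeighted(mixes.highLoad, Math.random());
```

The low-load case would also use longer think times, which this sketch leaves out.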

Understand your load test tool

Keep it real

  • Use logs/analytics to determine your usage patterns
  • Run your APM (e.g. New Relic, Tideways) on your load test env
    • Better profiling info
    • You'll have the same perf hit as production
  • Is your environment code-ified? (e.g. Terraform, CloudFormation)
    • Easier to copy envs
    • Cheaper to set up an env for an hour to run a load test
  • Decide whether testing from near your env is accurate enough
  • Test autoscaling/load-shedding facilities

Aggregate your metrics responsibly

  • Average
  • Median (~50th percentile)
  • 90th, 95th, 99th percentile
  • Standard Deviation
  • Distribution of results
  • Explain your outliers
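
A small worked example shows why the average alone misleads. This sketch uses made-up latencies with one outlier, the nearest-rank method for percentiles (one of several common definitions), and the population standard deviation; the helper names are illustrative.

```javascript
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Nearest-rank percentile: smallest value with at least p% of samples at or below it.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Population standard deviation.
function standardDeviation(values) {
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) ** 2)));
}

// Hypothetical response times in milliseconds, including one outlier.
const latenciesMs = [100, 110, 120, 130, 140, 150, 160, 170, 180, 3000];

const stats = {
  mean: mean(latenciesMs),
  median: percentile(latenciesMs, 50),
  p90: percentile(latenciesMs, 90),
  p99: percentile(latenciesMs, 99),
  stdDev: standardDeviation(latenciesMs),
};
```

Here the mean is 426 ms while the median is 140 ms and the 90th percentile is 180 ms; the single 3,000 ms request only shows up in the 99th percentile, and it is the outlier that needs explaining.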

Bottlenecks

  • Web Server + Database
    • FPM workers/Apache processes
    • DB Connections
    • CPU + RAM utilization
    • Network utilization
    • Disk utilization (I/O or space)
  • Load balancer
    • Network utilization/warmup
    • Connection count
  • External Services
    • Rate limits (natural or artificial)
    • Latency
    • Network egress
  • Queues
    • Per-job spin-up latency
    • Worker count
    • CPU + RAM utilization
      • Workers
      • Broker
    • Queue depth
  • Caches
    • Thundering herd
    • Churning due to cache evictions

Bottleneck Gotchas

  • Just because a request is heavy doesn't mean it's the biggest source of load
  • As a system reaches capacity you'll see nonlinear performance degradation
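
Both gotchas can be illustrated with simple arithmetic. A sketch with invented numbers: the endpoint figures are hypothetical, and the degradation curve assumes a textbook single-server queue (M/M/1), which real systems only approximate.

```javascript
// Hypothetical per-minute numbers, as you might pull from logs or an APM.
const endpoints = [
  { name: 'monthly report', callsPerMinute: 50, avgMs: 2000 },
  { name: 'item list', callsPerMinute: 5000, avgMs: 40 },
];

// Total server time consumed per minute = call volume x average cost.
const byTotalLoad = endpoints
  .map((e) => ({ name: e.name, totalMs: e.callsPerMinute * e.avgMs }))
  .sort((a, b) => b.totalMs - a.totalMs);

// M/M/1 queue model (an assumption): mean response time = service time / (1 - utilization).
function responseTimeMs(serviceMs, utilization) {
  if (utilization < 0 || utilization >= 1) {
    throw new RangeError('utilization must be in [0, 1)');
  }
  return serviceMs / (1 - utilization);
}
```

With these numbers the light "item list" endpoint consumes 200,000 ms of server time per minute against 100,000 ms for the heavy report. In the queue model, a 100 ms request takes 200 ms at 50% utilization, 400 ms at 75%, and roughly 1,000 ms at 90%.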

Let's fix some bottlenecks...

Bonus material: More Tools

  • Tsung
    • Erlang (efficient, high volume from a single box)
    • Flexible (not just HTTP)
    • XML-based config
  • The Grinder
    • Java-based
    • Java, Jython or Clojure scripts

Bonus material: Even More Tools!

  • Artillery.io
    • Node-based
    • Simple stuff in YAML, can switch to JS (including npm)
  • Molotov (by Mozilla)
    • Python 3.5+, uses async IO via coroutines
  • Locust
    • Python-based
    • Can be run clustered
  • wrk2
    • Built in C
    • Scriptable via Lua

Thanks! Questions?

Load Testing Your App - MidwestPHP 2019

By Ian Littman

Want to find out which pieces of your site break down under load first, so you know how you'll need to scale before your systems catch fire? Load testing answers this question, and these days you can simulate full user behavior in a load test rather than merely hammering a single endpoint. In this talk, we'll cover the conceptual points your load tests need to get right to serve their intended purpose. Then we'll jump into implementation details, using the k6 load test tool to build a load test that exercises an application in a way that's similar to what we'd see in real life.