Load Testing Your App

MidwestPHP 2019

Ian Littman / @iansltx

follow along at https://ian.im/loadmw19

Questions We'll Answer

  • What's the difference between
    • A smoke test
    • A load test
    • A stress test
    • A spike test
  • When should I load test?
  • How can I match my load test with (anticipated) reality for more useful results?
  • What bottlenecks should I be looking for when testing?
  • What tools can I use to throw load at my site?
  • How can I use one to test my application?

Questions we won't answer

  • How do I use {JMeter|Gatling|Molotov}?
  • How can I set up clustered load testing?
  • How can I simulate far-end users?
    • Slow connections tie up server/load balancer resources for longer
    • Solutions for slow connections (e.g. compression) may affect system capacity elsewhere
  • How can I do deep application profiling? (e.g. Blackfire)
  • What about single-user load testing? (e.g. running an import with a larger data set than usual)

A Challengr Appears

This will be our system under test

This is what we'll test with*

 

* More tools are listed at the end of this presentation.
** It's JS, but it uses goja, not V8 or Node, and doesn't have a global event loop yet.
*** I've used this on a project significantly more real than Challengr, so that's a big reason we're looking at it today.

#IFNDEF

Load Test

  • <= peak traffic
  • Your system shouldn't break
  • If it does, it's a stress test
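
A minimal sketch of that traffic shape, assuming the load generator is k6 (the goja-based JS tool footnoted earlier); the target URL and stage sizes are placeholders:

    import http from 'k6/http';
    import { sleep } from 'k6';

    // Ramp up to the anticipated peak, hold there, then ramp back down.
    export let options = {
      stages: [
        { duration: '2m', target: 50 }, // ramp to peak (50 VUs is a placeholder)
        { duration: '5m', target: 50 }, // hold at peak
        { duration: '1m', target: 0 },  // ramp down
      ],
    };

    export default function () {
      http.get('https://challengr.example/feed'); // placeholder endpoint
      sleep(3); // think time between iterations
    }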

Stress Test

  • Trying to break your system
  • Surfaces bottlenecks
  • Increase traffic above peak or decrease available resources
  • Capacity Test is a subset
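
Same harness, but the stages keep climbing past the expected peak until something gives (k6 assumed; numbers are placeholders):

    import http from 'k6/http';
    import { sleep } from 'k6';

    // Step load upward until the system breaks, then let it recover.
    export let options = {
      stages: [
        { duration: '2m', target: 50 },  // anticipated peak
        { duration: '2m', target: 100 }, // 2x peak
        { duration: '2m', target: 200 }, // 4x peak: keep stepping up until it breaks
        { duration: '2m', target: 0 },   // ramp down and watch recovery
      ],
    };

    export default function () {
      http.get('https://challengr.example/feed'); // placeholder endpoint
      sleep(1);
    }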

Soak Test

  • Extended test duration
  • Watch behavior on ramp down as well as ramp up
  • Memory leaks
  • Disk space exhaustion (logs!)
  • Filled caches
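
The soak version is the same script with a much longer plateau, plus a ramp-down you actually watch (k6 assumed; durations are placeholders):

    import http from 'k6/http';
    import { sleep } from 'k6';

    export let options = {
      stages: [
        { duration: '10m', target: 50 }, // ramp up
        { duration: '4h', target: 50 },  // hold for hours: watch memory, disk, logs, caches
        { duration: '10m', target: 0 },  // ramp down: does the system recover cleanly?
      ],
    };

    export default function () {
      http.get('https://challengr.example/feed'); // placeholder endpoint
      sleep(3);
    }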

Spike Test

  • Stress test with quick ramp-up
  • Woot.com at midnight
  • TV ad "go online"
  • System comes back online
    after downtime
  • Everyone hits your API via
    on-the-hour cron jobs
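
For a spike, the ramp-up is nearly instantaneous (k6 assumed; numbers are placeholders):

    import http from 'k6/http';
    import { sleep } from 'k6';

    export let options = {
      stages: [
        { duration: '2m', target: 5 },    // quiet baseline
        { duration: '10s', target: 300 }, // the ad airs / the cron jobs fire
        { duration: '3m', target: 300 },  // hold the spike
        { duration: '1m', target: 0 },    // back to quiet
      ],
    };

    export default function () {
      http.get('https://challengr.example/feed'); // placeholder endpoint
      sleep(1);
    }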

Smoke test

  • An initial test to confirm the system operates properly without a large amount of generated load
  • May be integration tests in your existing test suite
  • May be your load test script, turned down to one (thorough) iteration and one Virtual User (VU), as sketched below
  • Do this before you load test
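
A minimal smoke-test sketch, assuming the tool is k6; the endpoint is a placeholder:

    import http from 'k6/http';
    import { check } from 'k6';

    // The same script you'll load test with, dialed down to 1 VU and 1 iteration.
    export let options = { vus: 1, iterations: 1 };

    export default function () {
      let res = http.get('https://challengr.example/feed'); // placeholder endpoint
      check(res, {
        'status is 200': (r) => r.status === 200,
        'body is not empty': (r) => r.body.length > 0,
      });
    }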

Now that we've defined our terms...

When?

  • When your application performance may change
    • Adding/removing features
    • Refactoring
    • Infrastructure changes
  • When your load profile may change
    • Initial app launch
    • Feature launch
    • Marketing pushes/promotions

What are your metrics?

  • Speed - response latency
  • Scalability - throughput, resource utilization
  • Stability - % failed calls/transactions/flows
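
One hedged way to encode those three families as pass/fail criteria, assuming k6 (thresholds and numbers are placeholders): percentile latency for speed, observed request throughput for scalability, and a custom failure rate for stability.

    import http from 'k6/http';
    import { check } from 'k6';
    import { Rate } from 'k6/metrics';

    let failures = new Rate('failed_requests'); // stability: % of failed calls

    export let options = {
      vus: 20,
      duration: '5m',
      thresholds: {
        http_req_duration: ['p(95)<500'], // speed: 95% of responses under 500ms
        failed_requests: ['rate<0.01'],   // stability: under 1% failures
        // scalability: compare the reported request rate and server resource
        // utilization against what you provisioned for
      },
    };

    export default function () {
      let res = http.get('https://challengr.example/feed'); // placeholder endpoint
      failures.add(!check(res, { 'status is 200': (r) => r.status === 200 }));
    }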

How should I test?

Accurately.

What should I test?

  • Flows, not just single endpoints
  • Frequently used
  • Performance intensive
  • Business critical
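
A flow-shaped script might look like this hypothetical sign-in / browse / create sequence (k6 assumed; every endpoint and payload below is a placeholder, not Challengr's real API):

    import http from 'k6/http';
    import { group, check, sleep } from 'k6';

    let BASE = 'https://challengr.example'; // placeholder host
    let JSON_HEADERS = { headers: { 'Content-Type': 'application/json' } };

    export default function () {
      group('log in', function () {
        let res = http.post(BASE + '/login',
          JSON.stringify({ email: 'user@example.com', password: 'hunter2' }),
          JSON_HEADERS);
        check(res, { 'logged in': (r) => r.status === 200 });
      });
      sleep(2); // think time between steps

      group('view feed', function () {
        check(http.get(BASE + '/feed'), { 'feed loaded': (r) => r.status === 200 });
      });
      sleep(5);

      group('create challenge', function () { // the business-critical step
        let res = http.post(BASE + '/challenges',
          JSON.stringify({ title: 'step count', target: 10000 }),
          JSON_HEADERS);
        check(res, { 'challenge created': (r) => r.status === 201 });
      });
    }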

Concurrent Requests != Concurrent Users

  • Think Time
  • API client concurrency
  • Caching (client-side or otherwise)
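
A quick sketch of the difference, assuming k6 (URLs and timings are placeholders): one user action fans out into a few concurrent requests, then the user stops and reads.

    import http from 'k6/http';
    import { sleep } from 'k6';

    export default function () {
      // An API client or browser may issue several requests in parallel...
      http.batch([
        ['GET', 'https://challengr.example/feed'],          // placeholder endpoints
        ['GET', 'https://challengr.example/notifications'],
        ['GET', 'https://challengr.example/me'],
      ]);

      // ...and then the human reads the page rather than hammering the server.
      sleep(Math.random() * 8 + 2); // 2-10s of think time
    }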

Oversimplification... It's a trap!

  • No starting data in database
  • No parameterization
  • No abandonment at each step in the process
  • No input errors
  • No think times
  • Static think times
  • Uniformly distributed think times
  • Assuming you have one type of user
  • Assuming that a distribution is normal
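
A few of those traps addressed in one hedged sketch (k6 assumed; endpoints, account naming, and probabilities are all placeholders): parameterized users, non-uniform think time, a validation-failure path, and abandonment partway through the flow.

    import http from 'k6/http';
    import { check, sleep } from 'k6';

    let BASE = 'https://challengr.example'; // placeholder host
    let JSON_HEADERS = { headers: { 'Content-Type': 'application/json' } };

    // Roughly exponential think time instead of a fixed or uniform one.
    function thinkTime(mean) {
      return -mean * Math.log(1 - Math.random());
    }

    export default function () {
      // Parameterize: each VU/iteration acts as a different (pre-seeded) user.
      let email = 'user' + __VU + '-' + __ITER + '@example.com'; // hypothetical accounts

      // Some users fat-finger their password first (input-error path).
      if (Math.random() < 0.2) {
        http.post(BASE + '/login', JSON.stringify({ email: email, password: '' }), JSON_HEADERS);
        sleep(thinkTime(3));
      }

      let res = http.post(BASE + '/login',
        JSON.stringify({ email: email, password: 'correct-password' }), JSON_HEADERS);
      check(res, { 'logged in': (r) => r.status === 200 });
      sleep(thinkTime(5));

      // Abandonment: plenty of users never finish the flow.
      if (Math.random() < 0.4) {
        return;
      }

      http.get(BASE + '/feed');
      sleep(thinkTime(8));
    }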

Vary Your Testing

  • High-load Case: heavier endpoints get called more often
  • Anticipated Case
  • Low-load Case: validation failures + think time
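
One way to build those cases from a single script, assuming k6: weight the user behaviors, then shift the weights (and think times) per case. Endpoints and proportions below are placeholders.

    import http from 'k6/http';
    import { sleep } from 'k6';

    export default function () {
      let roll = Math.random();
      if (roll < 0.7) {
        // 70%: lightweight browsing
        http.get('https://challengr.example/feed'); // placeholder endpoints
        sleep(5);
      } else if (roll < 0.95) {
        // 25%: heavier search
        http.get('https://challengr.example/search?q=steps');
        sleep(8);
      } else {
        // 5%: the heaviest flow (exports, uploads, reports)
        http.get('https://challengr.example/export');
        sleep(15);
      }
    }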

Understand your load test tool

Keep it real

  • Use logs/analytics to determine your usage patterns
  • Run your APM (e.g. New Relic, Tideways) on your load test env
    • Better profiling info
    • You'll have the same perf hit as production
  • Is your environment code-ified? (e.g. Terraform, CloudFormation)
    • Easier to copy envs
    • Cheaper to set up an env for an hour to run a load test
  • Decide whether testing from near your env is accurate enough
  • Test autoscaling/load-shedding facilities

Aggregate your metrics responsibly

  • Average
  • Median (~50th percentile)
  • 90th, 95th, 99th percentile
  • Standard Deviation
  • Distribution of results
  • Explain your outliers
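
If the tool is k6, you can ask for the wider distribution up front and gate on tail latency rather than the average (the summaryTrendStats option is an assumption about your k6 version; numbers are placeholders):

    import http from 'k6/http';
    import { sleep } from 'k6';

    export let options = {
      vus: 20,
      duration: '5m',
      // Report more than the average for timing metrics (option name assumed).
      summaryTrendStats: ['avg', 'med', 'p(90)', 'p(95)', 'p(99)', 'max'],
      thresholds: {
        // Gate on the median and the tail, not just the mean.
        http_req_duration: ['med<200', 'p(95)<500', 'p(99)<1000'],
      },
    };

    export default function () {
      http.get('https://challengr.example/feed'); // placeholder endpoint
      sleep(3);
    }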

Bottlenecks

  • Web Server + Database
    • FPM workers/Apache processes
    • DB Connections
    • CPU + RAM utilization
    • Network utilization
    • Disk utilization (I/O or space)
  • Load balancer
    • Network utilization/warmup
    • Connection count
  • External Services
    • Rate limits (natural or artificial)
    • Latency
    • Network egress
  • Queues
    • Per-job spin-up latency
    • Worker count
    • CPU + RAM utilization
      • Workers
      • Broker
    • Queue depth
  • Caches
    • Thundering herd
    • Churning due to
      cache evictions

Bottleneck Gotchas

  • Just because a request is heavy doesn't mean
    it's the biggest source of load
  • As a system reaches capacity you'll see
    nonlinear performance degradation

Let's fix some bottlenecks...

Bonus Material: More Tools

  • Tsung
    • Erlang (efficient, high volume from a single box)
    • Flexible (not just HTTP)
    • XML based config
  • The Grinder
    • Java-based
    • Java, Jython or Clojure scripts

Bonus Material: Even More Tools!

  • Artillery.io
    • Node-based
    • Simple stuff in YAML, can switch to JS (including npm)
  • Molotov (by Mozilla)
    • Python 3.5+, uses async IO via coroutines
  • Locust
    • Python based
    • Can be run clustered
  • Wrk2
    • Built in C
    • Scriptable via Lua

Thanks! Questions?