Speed.
Scalability.
Stability.
Questions We'll Answer
- What types of tests exist, and what sets each type apart?
- When should I build and run performance tests?
- How can I match my load test with (anticipated) reality?
- What does a real load test script look like on a small system?
- How do I properly analyze results during and after my test?
Questions We Won't Answer
- How do I use $otherPerfTestTool (!== 'k6')?
- How can I set up clustered load testing?
- How can I simulate far-end users?
- How can I test web page performance browser-side?
- How can I do deep application profiling? (Blackfire for PHP)
- What about single-user load testing?
We'll be testing with k6*
- Write your tests in JS**
- Run via a Go binary***
- HAR import for in-browser recording
* More tools are listed at the end of this presentation.
** Uses goja, not V8 or Node, and doesn't have a global event loop yet.
*** I've used this on a project significantly more real than the example in this presentation, so that's a big reason we're looking at it today.
#ifndef
- Smoke Test
- Load Test vs. Stress Test
- Soak Test vs. Spike Test
Smoke Test
- An initial test to confirm the system operates properly without a large amount of generated load
- Do this before you load test
- Pick your implementation...
- Integration tests in your existing test suite
- Load test script, turned down to one (thorough) iteration and one Virtual User (VU)
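With k6, that dial-down can be a couple of option lines; a minimal sketch (the option names are k6's, and the point is to reuse your full load test script unchanged):

export let options = {
    vus: 1,        // one Virtual User
    iterations: 1, // one thorough pass through the whole flow
};

The same thing works from the command line: k6 run --vus 1 --iterations 1 script.js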
Load Test
- <= expected peak traffic
- Your system shouldn't break
- If it does, it's a...
Stress Test
- Increase traffic above peak || decrease available resources
- Try to break your system
- Surface bottlenecks
Soak Test
- Extended test duration
- Watch behavior on ramp down as well as ramp up
- Memory leaks
- Disk space exhaustion (logs!)
- Filled caches
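In k6 terms, a soak configuration mostly just stretches the steady-state stage; a sketch with made-up durations and VU counts:

export let options = {
    stages: [
        {duration: "10m", target: 100}, // ramp up
        {duration: "4h", target: 100},  // extended steady state
        {duration: "10m", target: 0},   // ramp down: keep watching here
    ],
};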
Spike Test: A Stress Test with Quick Ramp-up
- Woot.com at midnight
- TV ad "go online"
- System comes back online after downtime
- Everyone hits your API via on-the-hour cron jobs
When should you run a load test?
- When your application performance may change
- Adding or removing features
- Refactoring
- Infrastructure changes
- When your load profile may change
- Initial app launch
- Feature launch
- Marketing pushes and promotions
How should I test?
Accurately.
What should I test?
- Flows (not just single endpoints)
- Frequently used
- Performance intensive
- Business critical
Concurrent Requests != Concurrent Users
- Think Time
- API client concurrency
- Caching (client-side or otherwise)
How not to model think time
- Ignore it
- Use a static amount
- Use a uniform distribution (use a normal distribution instead)
- Assume you have one type of user
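As a sketch, sampling a normally distributed think time in k6 might look like this; Normal is the browserified distributions library the main script below imports (the local path here is hypothetical), and inv() is its inverse CDF:

import {sleep} from "k6";
// hypothetical local copy of the browserified AndreasMadsen/distributions
// bundle the main script pulls from a gist
import {Normal} from "./distributions.js";

let thinkTime = new Normal(30, 10); // mean 30 seconds, stddev 10

export default function () {
    // ...make a request here...
    // sample a fresh think time each iteration; clamp at zero, since
    // a normal distribution's left tail can go negative
    sleep(Math.max(0, thinkTime.inv(Math.random())));
}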
Oversimplifications to avoid
- No starting data in database
- No parameterization
- No abandonment at each step in the process
- No input errors
Vary Your Testing
- High-load Case: more expensive endpoints get called more often
- Anticipated Case
- Low-load Case: validation failures + think time
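One lightweight way to switch between these cases, sketched below, is an environment variable; __ENV and the -e flag are k6's, while the profile names and probabilities are made up:

// k6 run -e PROFILE=high script.js
let profile = __ENV.PROFILE || "anticipated",
    pAddChallenge = {high: 0.25, anticipated: 0.05, low: 0.01}[profile],
    pInputError = {high: 0.05, anticipated: 0.1, low: 0.3}[profile];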
System under test: Challengr
Let's see what that looks like with k6
Yes, I should've used const instead of let everywhere.
import http from "k6/http";
import {check, fail, sleep} from "k6";
import {Trend} from "k6/metrics";
import {Normal} from [some gist URL]; // Browserified AndreasMadsen/distributions
// baseURL e.g. http://my-load-test-system.local/
const [baseURL, clientId, clientSecret] = open('./config.txt').split("\n"),
    // start on the second line of the document, one per line
    emails = open("./emails.csv").split("\n").slice(1),
    pCorrectCredentials = 0.8,
    pRetryAfterFailedCreds = 0.5,
    pAbandonAfterHomeLoad = 0.15,
    pAddChallenge = 0.05,
    pAddAnotherActivity = 0.05,
    pIncludeChallengeDuration = 0.5,
    pIncludeChallengeMileage = 0.5,
    // start with larger units for more accurate approximation
    // of what challenges look like
    challengeMinHalfHours = 1,
    challengeMaxHalfHours = 80,
    challengeMinTenMiles = 1,
    challengeMaxTenMiles = 20,
    activitySpeed = new Normal(15, 3),
    activityMinSeconds = 180,
    activityMaxSeconds = 10800,
    challengeThinkTime = new Normal(30, 10),
    activityThinkTime = new Normal(30, 10),
    secondActivityThinkTime = new Normal(10, 3),
challengeListResponseTime
= new Trend("challenge_list_response_time"),
activityListResponseTime
= new Trend("activity_list_response_time"),
userProfileResponseTime
= new Trend("user_profile_response_time");
export default function () {
let isIncorrectLogin = Math.random() > pCorrectCredentials,
email = emails[getRandomInt(0, emails.length)];
let resLogin = http.post(baseURL + "oauth/token", {
    "client_id": clientId,
    "client_secret": clientSecret,
    "grant_type": "password",
    "username": email,
    "password": isIncorrectLogin ? "seekrit" : "secret",
}, {headers: {"Content-Type": "application/x-www-form-urlencoded"}});
if (isIncorrectLogin) {
    check(resLogin, {"invalid login caught": (res) => res.status === 401})
        || fail("no 401 on invalid login");
    if (Math.random() > pRetryAfterFailedCreds) {
        return; // abandon on incorrect login
    }
    // log in the correct way this time
    resLogin = http.post(baseURL + "oauth/token", {
        "client_id": clientId,
// ...snip...
}
check(resLogin, {
    "login succeeded": (res) => res.status === 200
        && typeof res.json().access_token !== "undefined",
}) || fail("failed to log in");
let params = {
headers: {
"Content-Type": "application/json",
"Accept": "application/json",
"Authorization": "Bearer " + resLogin.json().access_token
}
}, makeGet = function (path) {
return {method: "GET", url: baseURL + path, params: params};
};
let homeScreenResponses = http.batch({
"me": makeGet("api/me"),
"challenges": makeGet("api/me/challenges"),
"activities": makeGet("api/me/activities")
});
check(homeScreenResponses["me"],
{"User profile loaded": (res) => res.json().email === email})
|| fail("user profile email did not match");
check(homeScreenResponses["challenges"],
{"Challenges list loaded": (res) => res.status === 200})
|| fail("challenges list GET failed");
check(homeScreenResponses["activities"],
{"Activities list loaded": (res) => res.status === 200})
|| fail("activities list GET failed");
activityListResponseTime
.add(homeScreenResponses["activities"].timings.duration);
challengeListResponseTime
.add(homeScreenResponses["challenges"].timings.duration);
userProfileResponseTime
.add(homeScreenResponses["me"].timings.duration);
let pNextAction = Math.random();
if (pNextAction > (1 - pAbandonAfterHomeLoad)) {
return; // abandon here
} else if (pNextAction >
(1 - pAbandonAfterHomeLoad - pAddChallenge)) {
// think time before creating challenge
sleep(fromDist(challengeThinkTime));
let startMonth = getRandomInt(1, 3),
    endMonth = startMonth + getRandomInt(1, 2),
challengeRes = http.post(baseURL + "api/challenges", JSON.stringify({
"name": "Test Challenge",
"starts_at": "2020-0" + startMonth + "-01 00:00:00",
"ends_at": "2020-" + (endMonth >= 10
? endMonth : ("0" + endMonth)) + "-01 00:00:00",
"duration": Math.random() > pIncludeChallengeDuration ? null
: secondsToTime(
getRandomInt(challengeMinHalfHours, challengeMaxHalfHours) * 1800),
"distance_miles": Math.random() > pIncludeChallengeMileage ? null
: getRandomInt(challengeMinTenMiles, challengeMaxTenMiles) * 10
}), params);
check(challengeRes, {"challenge was created":
(res) => res.status === 201 && res.json().id
}) || fail("challenge create failed");
let challengeListRes = http.get(baseURL + "api/me/challenges", params);
check(challengeListRes, {
"challenge is in user challenge list": (res) => {
let json = res.json();
for (let i = 0; i < json.created.length; i++)
if (json.created[i].id === challengeRes.json().id)
return true;
return false;
}
}) || fail("challenge was not in user challenge list");
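The script also leans on a few helpers that didn't fit on these slides (getRandomInt, fromDist, secondsToTime); here's a reconstruction of what they might look like, not the original code:

// random integer in [min, max) -- matches emails[getRandomInt(0, emails.length)]
function getRandomInt(min, max) {
    return Math.floor(Math.random() * (max - min)) + min;
}

// draw a sample from a distribution via its inverse CDF,
// clamped so think times never come out negative
function fromDist(dist) {
    return Math.max(0, dist.inv(Math.random()));
}

// format a second count as HH:MM:SS for the challenge duration field
function secondsToTime(seconds) {
    let pad = (n) => (n < 10 ? "0" : "") + n;
    return pad(Math.floor(seconds / 3600)) + ":"
        + pad(Math.floor((seconds % 3600) / 60)) + ":"
        + pad(seconds % 60);
}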
Understand your load test tool
For example, arrival rate vs. looping
k6 is working on it...slowly...
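At the time of this talk, k6 only offered a closed model: VUs loop through your script, so when responses slow down, your request rate drops with them, which can mask the very overload you're trying to create. The open-model alternative sketched below landed later in k6's scenarios API (v0.27+); the executor and option names are k6's, the numbers are illustrative:

export let options = {
    scenarios: {
        steady_arrivals: {
            // start iterations at a fixed rate no matter how
            // slowly the system under test responds
            executor: "constant-arrival-rate",
            rate: 50,             // iterations started per timeUnit
            timeUnit: "1s",
            duration: "5m",
            preAllocatedVUs: 100, // VU pool to draw from
        },
    },
};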
Aggregate your metrics responsibly
- Average
- Median (~50th percentile)
- 90th, 95th, 99th percentile
- Standard Deviation
- Distribution of results
- Explain (don't discard) your outliers
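k6 can enforce percentile targets during the run via thresholds, including on the custom Trend metrics the script above defines; a sketch with illustrative numbers:

export let options = {
    thresholds: {
        // pass/fail criteria expressed as percentiles, not averages
        "http_req_duration": ["p(95)<500", "p(99)<1500"],
        "challenge_list_response_time": ["p(95)<400"],
    },
};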
Keep it real
- Use logs and analytics to determine your usage patterns
- Run your APM (e.g. New Relic) on your system under test
- Better profiling info
- Same performance drop as instrumenting production
- Is your infrastructure code? (e.g. Terraform, CloudFormation)
- Easier to copy environments
- Cheaper to set up an environment for an hour to run a load test
- Decide whether testing from near your env is accurate enough
- Test autoscaling and/or load-shedding facilities
Warning: Tricky bottlenecks ahead
- Just because a request is expensive doesn't mean it's the biggest source of load
- As a system reaches capacity, you'll see nonlinear performance degradation
Bottlenecks: Web Server + Database
- Web workers (e.g. FPM)/Apache processes
- DB Connections
- CPU + RAM utilization
- Network utilization
- Disk utilization (I/O or space)
Bottlenecks: Load Balancer
- Network utilization/warmup
- Connection count
Bottlenecks: External Services
- Rate limits (natural or artificial)
- Latency
- Network egress
Bottlenecks: Queues
- Per-job spin-up latency
- Worker count
- CPU + RAM utilization
- Workers
- Broker
- Queue depth
Bottlenecks: Caches
- Thundering herd
- Churning due to cache evictions
Let's fix some bottlenecks...
Bonus: More Tools
- Apache JMeter (Java)
- Gatling (Java)
  - Tests in Scala...
  - ...or use the recorder
- ab
- httperf
- wrk2 (C)
  - Scriptable via Lua
- Artillery.io (Node)
  - Simple stuff in YAML
  - Can switch to JS (with npm)
- Molotov (by Mozilla, in Python)
  - Uses async I/O via coroutines
- Locust (Python)
  - Can be run clustered
- Siege
What We Learned
- What types of tests exist, and when you should use them
- How to match load tests with (anticipated) reality
- What a real performance test script looks like in k6
- How to analyze results during and after your test
Further Reading
- Performance Testing Guidance for Web Applications (from Microsoft)
- Blazemeter Blog - solid info on load testing topics
- ian.im/loadarch - an article version of this talk (php[architect] sub req'd)
- test-api.loadimpact.com - an API to load test against from the k6 folks
Thanks! Questions?
- ian.im/loadfoo20 - these slides
- github.com/iansltx/challengr - this code
- twitter.com/iansltx - me
- github.com/iansltx - my code
- Please leave feedback; thanks :)
Load Testing Your App - ConFoo Montreal 2020
By Ian Littman
Want to find out which pieces of your site break down under load first, so you know how you'll need to scale before your systems catch fire? Load testing answers this question, and these days you can simulate full user behavior in a load test rather than merely hammering a single endpoint. In this talk, we'll cover the conceptual points your load tests need to get right in order to serve their purpose, then jump into implementation details, using the k6 load testing tool to build a load test that exercises an application much as real users would.