How Amazon Web Services Uses Formal Methods
Paper
Authors:
Chris Newcombe, Tim Rath, Fan Zhang, Bogdan ,...
Released:
Communications of the ACM, April 2015
Amazon Web Services (AWS) is a collection of cloud computing services, also called web services, that make up the cloud-computing platform offered by Amazon.com.

PRODUCTS
S3
DynamoDB


EBS
Many Others...
Elastic Beanstalk, Auto Scaling, Virtual Private Cloud, Elastic Load Balancing...
FORMAL METHODS
A family of mathematically based techniques for the specification, development, and verification of software and hardware systems.
Formal methods have a reputation for requiring a huge amount of training and effort to verify a tiny piece of relatively straightforward code, so the return on investment is justified only in safety-critical domains (such as medical systems and aviation).
ALLOY
- Developed at MIT
- Taught at the University of Iceland
- Considered by Amazon
TLA+
Temporal Logic of Actions

N Queens Problem

- Place N queens on an NxN board so that no two attack each other
- Solvable for all N except 2 and 3
- Classic case: 8 queens on a chessboard
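As an illustration of the puzzle itself (not of how Alloy or TLA+ would state it), here is a minimal backtracking sketch in Python. The function names `solve_n_queens` and `is_valid` are made up for this example.

```python
def solve_n_queens(n):
    """Return one solution as a list of column indices (one per row), or None."""
    cols, diag_down, diag_up = set(), set(), set()
    placement = []

    def place(row):
        if row == n:
            return True  # every row has a queen
        for col in range(n):
            if col in cols or (row - col) in diag_down or (row + col) in diag_up:
                continue  # square is attacked by an earlier queen
            cols.add(col)
            diag_down.add(row - col)
            diag_up.add(row + col)
            placement.append(col)
            if place(row + 1):
                return True
            # Undo the move and try the next column.
            placement.pop()
            cols.remove(col)
            diag_down.remove(row - col)
            diag_up.remove(row + col)
        return False

    return placement if place(0) else None


def is_valid(placement):
    """Check that no two queens share a column or a diagonal."""
    n = len(placement)
    for r1 in range(n):
        for r2 in range(r1 + 1, n):
            c1, c2 = placement[r1], placement[r2]
            if c1 == c2 or abs(c1 - c2) == r2 - r1:
                return False
    return True


if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        print(n, solve_n_queens(n))
```

A specification language states only the constraint that `is_valid` checks and leaves the search to the tool; the explicit backtracking here is just one way to find a satisfying placement.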
To a first approximation, we can say that accidents are almost always the result of incorrect estimates of the likelihood of one or more things.
Human intuition is poor at estimating the true probability of supposedly "extremely rare" combinations of events in systems operating at a scale of millions of requests per second.
The number of reachable states in the code is astronomical.
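A rough back-of-the-envelope sketch of why "extremely rare" stops being rare at this scale. The request rate and the one-in-a-billion probability below are assumed numbers chosen for illustration, not figures from the paper, and the function names are hypothetical.

```python
def mean_seconds_between_events(requests_per_second, probability_per_request):
    """Average wait between occurrences, assuming independent requests."""
    return 1.0 / (requests_per_second * probability_per_request)


def expected_events_per_day(requests_per_second, probability_per_request):
    """Expected number of occurrences in 24 hours."""
    seconds_per_day = 24 * 60 * 60
    return requests_per_second * seconds_per_day * probability_per_request


if __name__ == "__main__":
    rate = 1_000_000   # assumed: one million requests per second
    p = 1e-9           # assumed: a one-in-a-billion combination of events
    print("seconds between events:", mean_seconds_between_events(rate, p))
    print("events per day:", expected_events_per_day(rate, p))
```

Under these assumptions the one-in-a-billion event shows up about every 1,000 seconds, which is roughly 86 times a day.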
Why do they need formal methods?
M.C. initially chose Alloy and wrote a specification for a non-trivial algorithm in it, then later wrote the same specification in TLA+.
Why they chose TLA+ over Alloy and others
Steps
"What needs to go right"
- Safety - what the system is allowed to do
- Liveness - what the system must eventually do
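To make the two kinds of property concrete, here is a small Python sketch that explores every reachable state of a toy two-worker lock. The model, state names, and function names are all hypothetical and are not taken from the paper. The safety check is an invariant over all reachable states. The second check is only a reachability approximation of liveness: real liveness properties, as written in TLA+, are temporal formulas that also need fairness assumptions.

```python
from collections import deque

IDLE, WAITING, CRITICAL = "idle", "waiting", "critical"


def next_states(state):
    """Yield every state reachable by letting one worker take one step."""
    for i in (0, 1):
        me, other = state[i], state[1 - i]
        if me == IDLE:
            new = WAITING
        elif me == WAITING:
            if other == CRITICAL:
                continue  # lock is held, this worker cannot move
            new = CRITICAL
        else:
            new = IDLE  # leave the critical section
        nxt = list(state)
        nxt[i] = new
        yield tuple(nxt)


def reachable(start):
    """Breadth-first search over the state graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in next_states(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def safety_holds(states):
    """Safety: the two workers are never in the critical section together."""
    return all(s != (CRITICAL, CRITICAL) for s in states)


def can_eventually_enter(states, worker):
    """Liveness approximation: a waiting worker always has a path to CRITICAL."""
    return all(
        any(t[worker] == CRITICAL for t in reachable(s))
        for s in states
        if s[worker] == WAITING
    )


if __name__ == "__main__":
    states = reachable((IDLE, IDLE))
    print("reachable states:", len(states))
    print("safety holds:", safety_holds(states))
    print("worker 0 can enter:", can_eventually_enter(states, 0))
    print("worker 1 can enter:", can_eventually_enter(states, 1))
```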
SIDE BENEFIT
Great documentation
What are formal methods not good for
Sustained emergent performance degradation
Java garbage collection causes timeouts to be breached on clients, causing clients to retry requests, thus adding load to the server and causing further slowdowns.
First Success
DynamoDB launched in January 2012
T.R. wrote TLA+ specifications for several components
Verified that a complex part of the algorithm was correct
Found a bug that could lead to data loss if a particular series of failures and recovery steps were interleaved with other processing
TLC model checker ran on 10 EC2 instances, each with 8 cores, hyperthreads and 23GB of RAM
A very subtle bug
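The DynamoDB bug itself is not described in these slides, so the following is only a toy analogue of what a model checker does: enumerate every interleaving of concurrent steps and report the ones that break a property. The two-process read-then-write counter and all names below are hypothetical.

```python
def explore(counter, local_values, pcs, trace, results):
    """Depth-first search over every interleaving of two processes.

    Each process runs two steps: read the shared counter into a local
    variable, then write local + 1 back. pcs[p] is the step index of
    process p (0 = about to read, 1 = about to write, 2 = done).
    """
    if all(pc == 2 for pc in pcs):
        results.append((counter, trace))
        return
    for p in (0, 1):
        if pcs[p] == 0:
            new_locals = local_values[:p] + (counter,) + local_values[p + 1:]
            new_pcs = pcs[:p] + (1,) + pcs[p + 1:]
            explore(counter, new_locals, new_pcs, trace + (f"P{p}.read",), results)
        elif pcs[p] == 1:
            new_pcs = pcs[:p] + (2,) + pcs[p + 1:]
            explore(local_values[p] + 1, local_values, new_pcs,
                    trace + (f"P{p}.write",), results)


def find_lost_updates():
    """Return every interleaving whose final counter is not 2."""
    results = []
    explore(0, (None, None), (0, 0), (), results)
    return [trace for final, trace in results if final != 2]


if __name__ == "__main__":
    for trace in find_lost_updates():
        print(" -> ".join(trace))
```

A test that happens to run the two processes one after the other never sees the problem; exhaustive exploration reports every schedule in which an update is lost.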
Convince Other Engineers
Testable Pseudo-code
Avoided terms like "formal" and "proof"
Don't mention formal methods
Incorrect impression of complexity
How did they trick others into using it?
More Success in S3
F.Z. found two bugs in an algorithm and verified the fixes
Tweak the specification to introduce optimizations
AWS management starts encouraging teams to adopt formal methods.
A TLA+ model finds, in seconds, a known and very subtle bug that had passed through multiple reviews.
Robustness
M.D. used PlusCal to find a critical bug in AWS's most important new distributed algorithm
C.N. wrote a spec for the same algorithm, quite different in style, and found the same bug.
Suggests TLA+ specifications are robust to variations among engineers.
Good For Data Modeling
Improve system scalability
How do we know that the executable code correctly implements the verified design?
We don't.
But formal methods help:
- At least get the design right
- Gain better understanding
- Write better code
CONCLUSION
Review and reflection of the paper
AWS Formal Methods Dark
By Tryggvi Gylfason