Authors:
Chris Newcombe, Tim Rath, Fan Zhang, Bogdan ,...
Released:
Communication Of The ACM, April 2015
Amazon Web Services (AWS), is a collection of cloud computing services, also called web services, that make up a cloud-computing platform offered by Amazon.com.
Many Others...
Elastic Beanstalk, Auto Scaling, Virtual Private Cloud, Elastic Load Ballancing...
A particular kind of mathematically based techniques for the specification, development and verification of software and hardware systems.
formal methods have a reputation for require a huge amount of training and effort to verify a tiny piece of relatively straightforward code, so the return on investment is justified only in safety-critical domains (such as medical systems and aviation)
Temporal Logic of Actions
To a first approximation, we can say that that accidents are almost always the result of incorrect estimates of likelihood of one or more things
Human intuition is poor at estmiating the true probability of supposedly "extremely rare" combinations of events in systems operating at a scale of millions of requests per second
The nuber of reachable states in the code is astronomical
Why do they need formal methods??
M.C. initially chose Alloy. Wrote specification for a non-trivial algo in Alloy. Later did the same in tla+.
Why they chose TLA+ over Alloy and others
Steps?? :/
Great documentation
Sustained emergent performance degradation
Java garbage collection causes timeouts to be breached on clients, causing clients to retry requests, thus addong load to the server, and further shutdowns.
DynamoDB launched in January 2012
T.R. wrote TLA+ specification for several components
Verified that a complex part of the algo was correct
Found a bug that could lead to dataloss if a perticular series of failures and recovery steps would be interleaved with other processing
TLC model checker ran on 10 EC2 instances, each with 8 cores, hyberthreads and 23GB of RAM
very subtle bug
Testable Pseudo-code
Avoided terms like "formal" and "proof"
Don't mention formal methods
Incorrect impression of complexity
Hvernig plötuðu þeir aðra að nota?
F.Z. found two bugs in an algo, verified fixes
Tweak specification to introduce optimizations
AWS managements starts encouraging teams to adopts formal methods.
TLA+ model finds a known very subtle bug that passed through mutliple reviews, in seconds.
M.D. used CalcPlus to find a critical bug in AWS's most important new distributed algorithm
C.N. wrote a spec for the same algo, quite different in style, found the same bug.
Suggests TLA+ specifications are robust to variations among engineers.
Improve system scalability
How do we know that the executable code correctly implements the verified design?
We don't.
But formal methods help: