COMP2511

🎨 9.1 - Risk Engineering

In this lecture

What is risk in Software Engineering?
Mitigating risk
Designing for Risk

The Flaw in the Plan

Why do plans (designs) not go according to plan? What went wrong?
- Flaws in the implementation / execution of the plan/design
- Flaws in the design/plan itself
We can't always plan for everything up front
Design flaws are often hard to spot; Risk is invisible
Can only tell through design smells / red flags
Over time, we learn to become better at recognising warning signs and identifying flaws earlier on

It's not what happened right before things went wrong that was the problem - it is what happened every step along the way that got us to that point

Design Debt, or Design Risk?

Risk - the probability of a bad outcome occurring
Design decisions come with a cost - "technical debt", the more technical debt, the more risk we accumulate
Greater software complexity leads to more risk
The design decisions and trade-offs we make are often the flaws in the plan - risks are inevitable
How does this manifest itself?
- Design problems often build in a "slow burn" fashion
- Incidents, defects, bugs
- Resistance to changes in software
- These in turn present Business Risks

Mitigating Risk

Risks are centred around events, e.g. software breaking.
Risk is often assessed in terms of probability and impact
Mitigations of probability
- Preventative measures that lower the chance of a bad outcome occurring
- E.g. Looking both ways before crossing the street
Mitigations of impact
- Reactive measures that decrease the negative outcome in the event that something bad does occur
- E.g. Wearing a bike helmet
This is often termed Quality Assurance

How do we design for risk?

Designing for Risk: Swiss Cheese Model

James Reason - Major accidents and catastrophes reveal multiple, smaller failures that allow hazards to manifest as risks
Each slice of cheese represents a barrier, each one of which can prevent a hazard from turning into consequences
No single barrier is foolproof - each slice of cheese has "holes"
When the holes all align, a risk event manifests as negative consequences

Designing for Risk: Swiss Cheese Model

Taking a layered approach to Software Safety
Testing at multiple levels:
- Static verification
- Unit and integration tests
- Usability tests
- Design and code reviews
- CI pipelines
Sometimes referred to as containment barriers
A defensive approach; multiple checks and balances in place
Probability is multiplicative (X AND Y AND Z = P(X) * P(Y) * P(Z))

Designing for Risk: Shifting Left

A waterfall / big design up front approach to quality assurance.

Designing for Risk: Shifting Left

Shift Left: A practice intended to find and prevent problems early in the engineering process.

Designing for Risk: Shifting Left

Shifting Left in principle: Moving risk forward in the software development timeline and designing systems and processes that are built for continuous testing
What does shifting left involve in practice?
- Automated testing over manual testing
- Continuous Integration
- Test-Driven Development

Shifting Left: An Example

Let's take an example - a python script which runs on a remote server
There is an error in the code, and the code fails when attempting to run a usability test

$ python3 -m svc.create_repo test
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/nicholaspatrikeos/Desktop/COMP2511-22T3/administration/svc/create_repo.py", line 11, in <module>
    PROJECT = gl.projects.get(f'{NAMESPACE}/{TERM}/STAFF/repos/{REPO}')
NameError: name 'REPO' is not defined

How could we shift left here?

Shifting Left: Dynamic Verification + CI

We can dynamically verify the correctness of the code and automatically run the tests in a pipeline:

$ pytest
============================= test session starts ==============================
platform darwin -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /Users/nicholaspatrikeos/Desktop/COMP2511-22T3/administration
plugins: hypothesis-6.1.1, xdist-2.2.1, timeout-1.4.2, forked-1.3.0
collected 1 item                                                               

create_repo_test.py F                                                    [100%]

Problem here - we are still having to run our tests in order to pick up a simple name error, this takes a long time to catch a small problem

Shifting Left: Static Verification + CI

We can statically verify the correctness of our code, which is faster than running all the tests using a linter or a type checker:

$ pylint svc/*.py
************* Module svc.create_repo
svc/create_repo.py:11:64: E0602: Undefined variable 'REPO' (undefined-variable)
svc/create_repo.py:19:31: E0602: Undefined variable 'REPO' (undefined-variable)
svc/create_repo.py:27:24: E0602: Undefined variable 'REPO' (undefined-variable)

-------------------------------------------------------------------
Your code has been rated at 9.61/10 (previous run: 10.00/10, -0.39)

Problem here - we are still having to push to the CI for our breaking changes to be contained. Can we enforce running them before?

Shifting Left: Local Configurations

Pre-commit hooks and IDE tools can give us more friendly experiences that detect these problems earlier in the development loop, e.g.

Ideally, static verification is "baked in" to our programming language rather than added on...

Shifting Left: Type Safety

Types are statically verifiable - meaning that we can ensure correctness earlier on in the development process, shifting left
In Java, code that doesn't adhere to the rules of the type system fails to compile - a significant containment barrier
Extensions like mypy and TypeScript allow for an add-on of type checking
Unlike Java however, type safety wasn't part of the Big Design Up Front for Python and JS
Modern software design is favouring statically typed languages for these reasons

def my_function(message):
    if message == 'hello':
        return 1

    return '0'

result = my_function('goodbye')

Shifting Left: Type Safety

Features of type systems:
- Ability to define custom types (typedefs)
- Inheritance, Subtypes and Supertypes
- Interfaces
- Generics
- Unit types
- Enums
Well-designed type systems allow us to verify more of our code statically

Shifting Left: More Static Verification &

Design by Contract

Some programming languages (e.g. Dafny) allow for more static verification than just type checking - they can prove or disprove code according to a declarative contract where preconditions, postconditions and invariants are specified
Dafny makes use of a theorem prover which checks how well the implementation matches the specification (contractual correctness)

Summary

Risk forms a large part of modern-day Software Engineering
Designing for risk:
- Considering risks in the design process;
- Designing processes to accomodate for risk.
Murphy's law: Anything that can go wrong, will.