Writing Robust Systems

Dom Finn, Lead Developer at UNiDAYS

@cleverfinn

UNiDAYS

Leading global student marketing provider

Agenda

  • Excuses
  • Diatribe
  • Hyperbole
  • Dissent
  • Questions
  • Pub

What This Isn't...

  • CAP Theorem
  • SOLID / General good design
  • CQRS / Architecture

Assumptions

  • You've tested it
  • Your domain makes sense

Robustness

  • What is Robustness
  • Designing for failure
  • Risk Evaluation
  • Implementing for failure
  • Panic

What Is Robustness

Robust

/roh-buhst/ - adjective

Strong and effective in all or most situations and conditions

Murphy's Law

  • Things break all the time
  • Don't deny it
  • Embrace it
  • Design for it

Graceful Degradation

It's Not This

try
{
    businessCriticalThing1();
    businessCriticalThing2();
}
catch (BadDevelopmentPractice b)
{
    b.sweepUnder(rug);
}

 

Graceful Degradation

Progressive Enhancement

Failure Handling

  • Is a business concern
  • Not a decision devs should be making (alone)
  • Involves UX, Dev, Ops, RoB

Code-Issue Failure

Who Cares?

Code-Issue Failure

Accounts - Code Management

Ops - Code Management

Support - On-Site Known Issues

Support - Support Team

Support - Social Team

Legal - SLAs

Commercial Analytics - Data loss

Business - KPI Impact

Fail-Well

  • Fail-Safe
  • Fail-Secure
  • Fail-Planned

Checklist

  • What can fail?
  • How many ways can it fail?
  • How can we (partially) recover?
  • What is the business impact?
  • What stakeholders are affected?
  • What work does a failure generate?

What Can Fail?

Everything (else)

CAP

What Can Fail?

Side Effect 1

Side Effect 2

Side Effect 3

Action

How many ways can that code be written / fail?

C(n,r) = {n! \over r!(n-r)!}

(Combinatorics)

MVC Example

    public ActionResult UpdateUser(UserViewModel viewModel)
    {
        var user = userRepository.Get(viewModel.Id);

        user.Email = viewModel.email;
        user.Password = viewModel.password;

        emailService.SendAccountUpdatedEmail(user);
        userRepository.Update(user);
        reportService.RecordEvent(new UserUpdatedEvent(user));

        return Redirect("/user-updated");
    }

Combinatorics

1) Email   2) Update  3) Report  Updated?
Pass       Pass       Pass       Yes
Pass       Pass       Fail       Yes
Pass       Fail       -          No
Fail       -          -          No
50%

1) Update  2) Report  3) Email   Updated?
Pass       Pass       Pass       Yes
Pass       Pass       Fail       Yes
Pass       Fail       -          Yes
Fail       -          -          No
75%

1) Email   2) Report  3) Update  Updated?
Pass       Pass       Pass       Yes
Pass       Pass       Fail       No
Pass       Fail       -          No
Fail       -          -          No
25%

1) Update  2) Email   3) Report  Updated?
Pass       Pass       Pass       Yes
Pass       Pass       Fail       Yes
Pass       Fail       -          Yes
Fail       -          -          No
75%

1) Report  2) Email   3) Update  Updated?
Pass       Pass       Pass       Yes
Pass       Pass       Fail       No
Pass       Fail       -          No
Fail       -          -          No
25%

1) Report  2) Update  3) Email   Updated?
Pass       Pass       Pass       Yes
Pass       Pass       Fail       Yes
Pass       Fail       -          No
Fail       -          -          No
50%
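
The percentages above can be reproduced mechanically. This is a rough sketch, assuming every step either passes or fails and that a failure aborts everything after it. `OrderingAnalysis`, `SurvivalRate` and `Permutations` are hypothetical names used for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingAnalysis
{
    // Hypothetical helper: for one ordering of steps, returns the fraction of
    // outcomes in which the critical step has completed. The outcomes are
    // "everything passed" plus "first failure at step k" for each step k.
    public static double SurvivalRate(IReadOnlyList<string> order, string critical)
    {
        int criticalIndex = -1;
        for (int i = 0; i < order.Count; i++)
            if (order[i] == critical) criticalIndex = i;
        if (criticalIndex < 0)
            throw new ArgumentException("Critical step is not in the ordering.");

        int outcomes = order.Count + 1;
        // The critical step is lost when the first failure happens at or
        // before its position: criticalIndex + 1 of the outcomes.
        int survived = outcomes - (criticalIndex + 1);
        return (double)survived / outcomes;
    }

    // Every ordering of the steps (n! of them).
    public static IEnumerable<string[]> Permutations(string[] items)
    {
        if (items.Length <= 1) { yield return items; yield break; }
        for (int i = 0; i < items.Length; i++)
        {
            var rest = items.Where((_, j) => j != i).ToArray();
            foreach (var tail in Permutations(rest))
                yield return new[] { items[i] }.Concat(tail).ToArray();
        }
    }
}
```

With the three steps Email, Update and Report there are six orderings. Calling `SurvivalRate` with "Update" as the critical step gives 0.75 when Update runs first, 0.5 when it runs second and 0.25 when it runs last.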

MVC Example

    public ActionResult UpdateUser(UserViewModel viewModel)
    {
        var user = userRepository.Get(viewModel.Id);

        user.Email = viewModel.email;
        user.Password = viewModel.password;

+       userRepository.Update(user);
        emailService.SendAccountUpdatedEmail(user);
-       userRepository.Update(user);
        reportService.RecordEvent(new UserUpdatedEvent(user));

        return Redirect("/user-updated");
    }

How Can We Recover?

  • Redrive / Retry
  • Deadletter
  • Blackhole / Ignore
  • Explode

Redrive / Retry

Requires:

  • Atomicity
  • Idempotence
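
A minimal sketch of redrive with a dead-letter fallback, assuming each attempt is atomic (it either applies fully or not at all) and reports success as a `bool`. `Redriver`, `Run` and `DeadLetters` are hypothetical names, and the in-memory sets stand in for durable storage. The sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public sealed class Redriver
{
    // Operations that have already succeeded. Redriving one of these is a
    // no-op, which is what makes the retry idempotent.
    private readonly HashSet<Guid> completed = new HashSet<Guid>();

    // Operations that exhausted their attempts and need manual attention.
    private readonly List<Guid> deadLetters = new List<Guid>();

    public IReadOnlyList<Guid> DeadLetters => deadLetters;

    // Runs `attempt` up to maxAttempts times. Returns true once the operation
    // has succeeded, now or on an earlier call with the same operationId.
    public bool Run(Guid operationId, Func<bool> attempt, int maxAttempts)
    {
        if (completed.Contains(operationId))
            return true;

        for (int i = 0; i < maxAttempts; i++)
        {
            if (attempt())
            {
                completed.Add(operationId);
                return true;
            }
        }

        // Out of attempts: park the operation instead of losing it.
        deadLetters.Add(operationId);
        return false;
    }
}
```

Keying on an operation id rather than on the payload is one design choice among several; it lets the same message be delivered twice without the side effect being applied twice.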

Side Effect 1

Side Effect 2

Side Effect 3

Action

Redrive / Retry

Action

Side Effect 1

Side Effect 2

Side Effect 3

Forgotten Password

Critical Path

(Immediately Consistent Path)

Critical Paths

  • Determine Critical Paths
  • Remove non-critical actions from CP
  • Determine recovery strategy for (non)critical actions
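
One way to picture the steps above is the rough sketch below. It assumes each action reports success as a `bool`, and it uses an in-memory queue where a real system would likely use a durable one. `CriticalPathRunner`, `Execute` and `Drain` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class CriticalPathRunner
{
    // Non-critical side effects wait here until Drain is called.
    private readonly Queue<Func<bool>> deferred = new Queue<Func<bool>>();

    // Runs only the critical action inline. Side effects are queued, and
    // only if the critical action succeeded.
    public bool Execute(Func<bool> critical, params Func<bool>[] sideEffects)
    {
        if (!critical())
            return false;

        foreach (var sideEffect in sideEffects)
            deferred.Enqueue(sideEffect);
        return true;
    }

    // Runs the queued side effects off the critical path and returns how
    // many of them failed, so a recovery strategy can be applied to each.
    public int Drain()
    {
        int failures = 0;
        while (deferred.Count > 0)
        {
            if (!deferred.Dequeue()())
                failures++;
        }
        return failures;
    }
}
```

In the UpdateUser example, the repository update would be the critical action and the email and report calls would be the queued side effects.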

Blackhole

  • Does it matter if Side Effect X doesn't work?
  • Can be a quick, interim solution

Explode

  • Do it nicely
    • 500 vs 503 vs 404 vs 403
  • Help the user recover / get help
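
A small sketch of choosing the status code deliberately, assuming failures have already been classified. `FailureKind` and `FailureStatus` are hypothetical names; the codes are the standard HTTP meanings.

```csharp
public enum FailureKind
{
    NotFound,              // the resource does not exist
    Forbidden,             // the caller is not allowed to do this
    DependencyUnavailable, // something we rely on is down; trying later may work
    Bug                    // an undesigned condition in our own code
}

public static class FailureStatus
{
    // Maps a classified failure to the HTTP status code that tells the
    // caller what happened and whether retrying makes sense.
    public static int ToHttpStatus(FailureKind kind)
    {
        switch (kind)
        {
            case FailureKind.NotFound: return 404;
            case FailureKind.Forbidden: return 403;
            case FailureKind.DependencyUnavailable: return 503;
            default: return 500;
        }
    }
}
```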

No!

What work does a failure generate?

  • Support Emails / Tweets
  • Account Management
  • Data Cleansing
  • Reporting alterations
  • Billing alterations

What problems does a failure hide?

  • Data corruption
  • Conversion failures
  • Loss of revenue
  • User disengagement

Help API Consumers Succeed, Not Fail

Encapsulation

try, try, try again

MVC Example

    public ActionResult UpdateUser(UserViewModel viewModel)
    {
        var user = userRepository.Get(viewModel.Id);

        user.Email = viewModel.email;
        user.Password = viewModel.password;

        userRepository.Update(user);
        emailService.SendAccountUpdatedEmail(user);
        reportService.RecordEvent(new UserUpdatedEvent(user));

        return Redirect("/user-updated");
    }

MVC Example

try {
  var user = userRepository.Get(resource.Id);
  try {
    userRepository.Update(user);
    try {
      emailService.SendAccountUpdatedEmail(user);
    } catch { }
    try {
      reportService.RecordEvent(new UserUpdatedEvent(user));
    } catch { }
  } catch { }
} catch { }

try { } catch {}

  • Is a hack*
  • Bad design, equivalent to GOTO
  • Propagation hard to follow
  • Indication of developer ignorance
  • Illegible

 

*mostly

When did you last write a try/catch and consult your PO / BA / Stakeholder?

try { } catch {}

try
{
    // do stuff
} catch {
    // just in case lolol
}

1) Bury your head

try { } catch {}

try
{
    // do stuff
} catch (Exception e) {
    // no idea what will be thrown
}

2) Catch all the things

try { } catch {}

try
{
    // do stuff
} catch {
    throw;
}

3) Hot potato

Exceptions

There's a time and a place...
(Probably not when and where you’re using them)

 

Cue sweeping generalisations...

Things that might / could / should throw

  • Low Level APIs
  • I/O
    • Disc
    • Network
  • Third Party Code (Including BCL)
    (Most of this doesn’t need to, but does)

 

Types of throw;

Development vs production

To throw or not to throw?

  • Protect APIs with helpful guard clauses
    • Don’t expect these to throw in production
  • Every unhandled exception logged to become a Bug Ticket
    • Pipe them into your bug tracker!
  • Don’t try and recover*
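
A minimal sketch of a helpful guard clause, assuming a small value type for email addresses. `UserEmail` is a hypothetical class, and the '@' check is a placeholder rather than real validation.

```csharp
using System;

public sealed class UserEmail
{
    public string Value { get; }

    public UserEmail(string value)
    {
        // Guard clauses: fail immediately, with a message that names the
        // offending parameter, so the mistake is found during development.
        if (value == null)
            throw new ArgumentNullException(nameof(value));
        if (value.IndexOf('@') < 0)
            throw new ArgumentException("Expected an address containing '@'.", nameof(value));

        Value = value;
    }
}
```

The caller does not wrap the constructor in a try/catch. If a guard fires in production, the unhandled exception is logged and becomes a bug ticket.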

 

Basic Example

Guid ConvertToGuid(string guid)
{
    return Guid.Parse(guid);
}

No

Guid ConvertToGuid(string guid)
{
    try
    {
        return Guid.Parse(guid);
    }
    catch
    {
        return Guid.Empty;
    }
}

No

Guid ConvertToGuid(string guid)
{
    Guid g;
    if (Guid.TryParse(guid, out g))
        return g;

    return Guid.Empty;
}

Yes

MVC Example

try {
  var user = userRepository.Get(resource.Id);
  try {
    userRepository.Update(user);
    try {
      emailService.SendAccountUpdatedEmail(user);
    } catch { }
    try {
      reportService.RecordEvent(new UserUpdatedEvent(user));
    } catch { }
  } catch { }
} catch { }

Solution

Enforce Boundaries

Solutions

public enum ExecutionResult
{
    Success = 1,
    Failure = 2
}

public sealed class ExecutionResult<T>
{
    public T Data;
    public ExecutionResult Result;
}
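
A rough sketch of where the one remaining try/catch could live: at the boundary, around the call that does the I/O. It repeats the two types above so that it compiles on its own. `Boundary.Call` is a hypothetical helper, and catching `IOException` is an assumption about how the underlying call signals failure.

```csharp
using System;
using System.IO;

public enum ExecutionResult
{
    Success = 1,
    Failure = 2
}

public sealed class ExecutionResult<T>
{
    public T Data;
    public ExecutionResult Result;
}

public static class Boundary
{
    // Wraps a call that may throw on I/O failure and turns that failure into
    // a value. Any other exception is a bug and is left to propagate.
    public static ExecutionResult<T> Call<T>(Func<T> io)
    {
        try
        {
            return new ExecutionResult<T> { Data = io(), Result = ExecutionResult.Success };
        }
        catch (IOException)
        {
            // Data keeps its default value; callers must check Result first.
            return new ExecutionResult<T> { Result = ExecutionResult.Failure };
        }
    }
}
```

Code above the boundary then branches on `Result` instead of nesting try blocks.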

Solutions

Before

User IUserRepository.Get (Guid id);

void IUserRepository.Update (User user);

void IEmailService.SendWelcomeEmail (User user);

void IReportService.RecordEvent<TEvent> (TEvent @event);

After

ExecutionResult<User> IUserRepository.Get (Guid id);

ExecutionResult IUserRepository.Update (User user);

ExecutionResult IEmailService.SendWelcomeEmail (User user);

ExecutionResult IReportService.RecordEvent<TEvent> (TEvent @event);

MVC Example

var userResult = userRepository.Get(resource.Id);
if (userResult.Result == ExecutionResult.Failure)
    return ErrorResult.ServiceUnavailable();

var user = userResult.Data;

// Update returns the plain ExecutionResult enum, so check its own result
var updateResult = userRepository.Update(user);
if (updateResult == ExecutionResult.Failure)
    return ErrorResult.ServiceUnavailable();

emailService.SendAccountUpdatedEmail(user); // ignore result status
reportService.RecordEvent(new UserUpdatedEvent(user)); // ignore result status

Other Anti-patterns

// double check, for safety

var thing = getThing(thingId);

// just in case, lol!
if(thing == null)
    return;

thing.doThing();

// null?

var thing = getThing(thingId);

var things = getThings();

Panic!

Summary

  • Progressive Enhancement instead of Graceful Degradation
  • Let code fail under undesigned conditions
  • Write APIs that help you succeed
  • Consult business when handling/designing failure