Evolution of Fault Tolerance in PostgreSQL

select * from me;

Postgres DBA & Automation Developer @ 2ndQuadrant

MSc Comp. & Systems Eng. @ Tallinn University of Technology

BSc Maths Eng. @ Yildiz Technical University

Writes blog on 2ndQuadrant blog

Does some childish paintings

Loves independent films

 

@apatheticmagpie

Skype: gulcin2ndq

Github: gulcin

Agenda

  • Overview of PostgreSQL Database
  • PostgreSQL Fault Tolerance: WAL
  • Transaction - Commit - Checkpoint
  • Replication Methods for PostgreSQL
  • Failover and Switchover
  • Managing Timeline Issues: pg_rewind
  • Synchronous Replication (synchronous_commit)
  • Logical Replication
  • Trigger-based Replication: Londiste, Slony
  • Logical Decoding : Postgres-BDR

Postgres is Robust!

  • Transactions + Transaction Log = Automatic Crash Recovery (Software failure)
  • Data block checksums (Hardware faults)
  • Multiple backup mechanisms (Full PITR)
  • Diagnostic tools (pageinspect)
  • Database replication is supported natively
  • Synchronous Replication "5 Nines" (99.999 percent) 

WAL

Write ahead logging mechanism is the main fault tolerance system for PostgreSQL which ensures durability of any db changes.

Transaction? Commit?

Database changes themselves are not written to disk at transaction commit. Those changes are written to disk sometime later by the background writer or checkpointer on a well-tuned server.

Standard SQL Transaction Isolation Levels

Checkpoint

Crash recovery replays the WAL, but from what point does it start to recover?

PostgreSQL Replication

Database replication is the term we use to describe the technology used to maintain a copy of a set of data on a remote system.

PostgreSQL Replication History

  • PostgreSQL 7.x (~2000)
    • Replication should not be part of core Postgres
    • Londiste – Slony (trigger based logical replication)
  • PostgreSQL 8.0 (2005)
    • Point-In-Time Recovery  (WAL)
  • PostgreSQL 9.0 (2010)
    • Streaming Replication (physical)
  • PostgreSQL 9.4 (2014)
    • Logical Decoding (changeset extraction)

Physical Replication

WAL over network from master to standby

  • Sending files: scp, rsync, ftp
  • Streaming changes: using internal protocol (sender and receiver processes)

Standby Modes

Warm

Hot

  • Can be activated almost immediately, but cannot perform useful work until activated.
  • Node is already active
  • Read-only queries only
WAL level Suitable for
minimal crash recovery
replica physical replication
file-based archiving
logical logical replication

WAL and Replication

wal_level parameter determines how much information is written to the WAL.

Failover and Switchover

Failover

Switchover

Timelines

Timelines provide protection from connecting to the wrong upstream after promotion (failover, switchover).

TL1

TL2

Master (Old master)

Standby (New master)

  • There are outstanding changes in the old master
  • Timeline increase represents new history of changes
  • Changes from the old timeline can't be replayed on the servers that switched to new timeline
  • The old master can't follow the new master

Failover

TL1

TL2

Master (Old master)

Standby (New master)

  • There are no outstanding changes in the old master
  • Timeline increase represents new history of changes
  • The old master can become standby for the new master

TL1

TL2

Standby (New master)

Switchover

TL1

TL2

Master (Old master)

  • Outstanding changes are removed using data from the new master
  • The old master can follow the new master

TL1

TL2

TL1

TL2

Standby (New master)

pg_rewind (9.5)

Synchronous Replication

Synchronous replication guarantees that data is written to at least two nodes before the user or application is told that a transaction has committed.

How Replication Works?

synchronous_commit = off

synchronous_commit=local

synchronous_commit = remote_write

synchronous_commit = on

synchronous_commit = remote_apply

Logical Replication

Logical replication allows us to stream logical data changes between two nodes.

Trigger-based Replication

Logical Decoding a.k.a Changeset Extraction

  • Extracts information from Write-Ahead-Log into logical changes (INSERT/UPDATE/DELETE)
  • Per row and commit ordered
  • No write amplification
  • C API for output plugin
  • No DDL
  • SQL Interface
  • Streaming Interface

Logical Streaming Replication

Postgres-BDR

  • The project is used for feeding logical replication development in PostgreSQL
  • Multi-master
  • Asynchronous
  • Optimistic conflict detection (after commit)
  • Does not prevent concurrent writes
  • Conflict resolution:
    • Happens automatically
    • Last update wins by default
    • Custom resolution triggers 
  • Eventually consistent (cluster)

Conclusion

There are several ways of how to look at fault tolerance. In this sense PostgreSQL provides different means of achieving fault tolerance and dependability either out of the box solutions or using extensions depending on what are the user’s priorities about their system.

 

General fault tolerance in PostgreSQL is improving over time and I expect this trend to continue. In my opinion, Logical Replication in core of PostgreSQL will be the next big leap of fault tolerance in PostreSQL.

Questions? 

 

Huge Thanks!

Evolution of Fault Tolerance in PostgreSQL

By Gülçin Yıldırım Jelínek

Evolution of Fault Tolerance in PostgreSQL

This presentation is prepared for PGConf.EU 2016 in Tallinn.

  • 3,674