Lock-Free Programming in a DB Connection Pool

whoami

  • Shay Rojansky
  • Long-time Linux/open source developer/enthusiast (installed Slackware somewhere in 1994/5)
  • Lead dev of Npgsql
    • .NET driver for PostgreSQL (took over in 2012)
    • Provider for Entity Framework Core
  • Joined Microsoft's data access team in January

whois Npgsql

  • Open-source .NET driver for PostgreSQL
  • Implements ADO.NET (System.Data.Common)
  • Supports fully async (but also sync)
  • Fully binary driver
  • Fully-featured (PostgreSQL special types, data import/export API, notification support...)

Focus on high performance...

"Database drivers are boring"

- Everyone, including me in 2012

Perf in .NET Core 2.1 and Npgsql 4.0

TechEmpower Fortunes results, round 17

Lock-Free Database Connection Pool

What is a DB connection pool?

  • Physical DB connections are very expensive to create - especially in PostgreSQL (process spawned in the backend).
  • If we open/close all the time, the app grinds to a halt.
  • So the driver includes a pool.
  • Pooled Open() and Close() must be very efficient (connection sometimes held for very short, e.g. to service one web request)
  • It's also the one place where synchronization happens: once a connection is taken out there's no thread-safety.

Identifying Contention Bottlenecks

  • Low local CPU usage (was only 91%)
  • Network bandwidth far below saturation (10Gbps Ethernet, very little iowait)
  • No resource starvation on the PostgreSQL side
  • Profiler will show Monitor.Enter()
  • Try reducing lock scopes if possible (already optimized)

Lock-Free Programming

  • Lock-free programming means, well, no locking.
  • All operations are interleaved, all the time, everywhere.
  • "Interlocked" CPU instructions are used to perform complex operations atomically.
  • It's really, really hard to get right.

Simple Rent

Connection[] _idle = new Connection[PoolSize];

public Connection Rent()
{
    for (var i = 0; _idle.Length; i++)
    {
        // If the slot contains null, move to the next slot.
        if (_idle[i] == null)
            continue;

        // If we saw a connection in this slot, atomically exchange it with a null.
        // Either we get a connection out which we can use, or we get null because
        // someone has taken it in the meantime. Either way put a null in its place.
        var conn = Interlocked.Exchange(ref _idle[i], null);
        if (conn != null)
            return conn;
    }
    // No idle connection was found, create a new one
    return new Connection(...);
}

Simple Return

public void Return(Connection conn)
{
    for (var i = 0; i < _idle.Length; i++)
        if (Interlocked.CompareExchange(ref _idle[i], conn, null) == null)
            return;

    // No free slot was found, close the connection...
    conn.Close();
}

This simple implementation was used as a proof-of-concept.

Let's take it up a notch

  • Since physical connections are very expensive, we need to enforce an upper limit (think multiple app instances, one DB backend).
  • Once the max limit has been reached, wait until a connection is released.
  • Release waiters in FIFO, otherwise some get starved.
  • So add a waiting queue next to our idle list: if no idle conn was found, enqueue a TaskCompletionSource.
  • When releasing, first check the waiting queue to release any pending attempts. Otherwise return to the idle list.

Two data structures, no locks

  • One idle list, one waiting queue. By the time we finish updating one, the other may have changed.
  • Example: rent attempt enters waiting queue, at the same time as a release puts a connection back in the idle list. Rent attempt waits forever.
  • Need a way to have a single, atomically-modified state (how many busy, idle, waiting)

Atomically manipulate state

[StructLayout(LayoutKind.Explicit)]
internal struct PoolState
{
    [FieldOffset(0)]
    internal short Idle;
    [FieldOffset(2)]
    internal short Busy;
    [FieldOffset(4)]
    internal short Waiting;
    [FieldOffset(0)]
    internal long All;
}
// Create a local copy of the pool state, in the stack
PoolState newState;
newState.All = _state.All;
// Modify it
newState.Idle--;
newState.Busy++;
// Commit the changes
_state.All = newState.All;

Problem: we have to have multiple counters (idle, busy, waiting), and manipulate them atomically.

But two operations may be racing, and we lose a change...

Handling race conditions

var sw = new SpinWait();
while (true)
{
    // Create a local copy of the pool state, in the stack
    PoolState oldState;
    oldState.All = Volatile.Read(ref _state.All); // Volatile prevents unwanted "optimization"
    var newState = oldState;
    // Modify it
    newState.Idle--;
    newState.Busy++;
    // Try to commit the changes
    if (Interlocked.CompareAndExchange(ref _state.All, newState.All, oldState.All) == oldState.All)
        break;
    // Our attempt to increment the busy count failed, Loop again and retry.
    sw.SpinOnce();
}
  • This is how a lot of lock-free code looks like: try until success.
  • SpinWait is a nice tool for spinning, and will automatically context switch after some time.
  • Beware of the ABA problem!

Very subtle issues

  • An open attempt is creating a new connection
    (idle == 0, total < max). It does busy++.
  • But a network error occurred during open. State has already been updated (busy++), need to decrement it.
  • But since we did busy++, maybe busy was max; someone may have gone into the waiting queue - so we need to un-wait them.

Be careful

  • Always keep perf in mind, but resist the temptation to over- (or pre-) optimize.
  • Reducing roundtrips and preparing commands is much more important than reducing an allocation or a doing fancy low-level stuff.
  • (even if it is much less sexy)

Conclusion

https://medium.com/@tylerneely/fear-and-loathing-in-lock-free-programming-7158b1cdd50c

8% perf increase

But probably the most dangerous 8% I've ever done.

Some Final Remarks

The ADO.NET API

  • The API hasn't received significant attention in a long time.
    • Batching API
    • Generic parameters to avoid boxing
    • Better support for prepared statements
  • However, it's still possible to do remarkably well on ADO.NET (no need to throw it out entirely).
  • The TechEmpower fortunes benchmark is one of the more realistic ones (e.g. involves DB)
  • But still, how many webapps out there send identical simple queries over a 10Gbps link to PostgreSQL, for a small dataset that is cached in memory?
  • Reducing contention is important as number of cores increases
  • Performance is usually about tradeoffs: code is more brittle/less safe (e.g. lock-free)

Is this all useful in the real world?

  • No good answer, especially difficult to decide when you're a library...
  • Your spouse & family may hate you, but...
  • You learn a ton of stuff and have lots of fun in the process.

So, is this all useful in the real world??

Thank you!

https://slides.com/shayrojansky/netherlands-dncse-2019-6-22

http://npgsql.org, https://github.com/npgsql/npgsql

Shay Rojansky, roji@roji.org, @shayrojansky

http://roji.org

Made with Slides.com