From Zero to Hero -

The Npgsql Optimization Story

whoami

  • Shay Rojansky
  • Long-time Linux/open source developer/enthusiast (installed Slackware somewhere in 1994/5)
  • Lead dev of Npgsql
    • .NET driver for PostgreSQL (took over in 2012)
    • Provider for Entity Framework Core
  • Joining Microsoft's data access team in January
  • Logistician for Doctors Without Borders (MSF)

whois Npgsql

  • Open-source .NET driver for PostgreSQL
  • Implements ADO.NET (System.Data.Common)
  • Supports fully async (but also sync)
  • Fully binary driver
  • Fully-featured (PostgreSQL special types, data import/export API, notification support...)

Focus on high performance...

"Database drivers are boring"

- Everyone, including me in 2012

Perf in .NET Core 2.1 and Npgsql 4.0

TechEmpower Fortunes results, round 17

The Raw Numbers

OS .NET Core Npgsql 3.2 RPS Npgsql 4.0 RPS
Windows 2.0 71,810 107,400
Windows 2.1 101,546 132,778
Linux 2.0 73,023 93,668
Linux 2.1 78,301 107,437
  • On Windows, 3.2 -> 4.0 gives 50% boost
  • .NET Core 2.0 -> 2.1 gives a further 24% (85% total)
  • Linux perf is lower...

Topics

  • Lock-free connection pool
  • Async perf techniques
  • Buffering / Network I/O
  • Command preparation

Breadth rather than depth

Lock-Free Database Connection Pool

What is a DB connection pool?

  • Physical DB connections are very expensive to create - especially in PostgreSQL (process spawned in the backend).
  • If we open/close all the time, the app grinds to a halt.
  • So the driver includes a pool.
  • Pooled Open() and Close() must be very efficient (connection sometimes held for very short, e.g. to service one web request)
  • It's also the one place where synchronization happens: once a connection is taken out there's no thread-safety.

Identifying Contention Bottlenecks

  • Low local CPU usage (was only 91%)
  • Network bandwidth far below saturation (10Gbps Ethernet, very little iowait)
  • No resource starvation on the PostgreSQL side
  • Profiler will show Monitor.Enter()
  • Try reducing lock scopes if possible (already optimized)

Lock-Free Programming

  • Lock-free programming means, well, no locking.
  • All operations are interleaved, all the time, everywhere.
  • "Interlocked" CPU instructions are used to perform complex operations atomically.
  • It's really, really hard to get right.

Simple Rent

Connection[] _idle = new Connection[PoolSize];

public Connection Rent()
{
    for (var i = 0; _idle.Length; i++)
    {
        // If the slot contains null, move to the next slot.
        if (_idle[i] == null)
            continue;

        // If we saw a connector in this slot, atomically exchange it with a null.
        // Either we get a connector out which we can use, or we get null because
        // someone has taken it in the meantime. Either way put a null in its place.
        var conn = Interlocked.Exchange(ref _idle[i], null);
        if (conn != null)
            return conn;
    }
    // No idle connection was found, create a new one
    return new Connection(...);
}

Simple Return

public void Return(Connection conn)
{
    for (var i = 0; i < _idle.Length; i++)
        if (Interlocked.CompareExchange(ref _idle[i], conn, null) == null)
            return;

    // No free slot was found, close the connection...
    conn.Close();
}

This simple implementation was used as a proof-of-concept.

Let's take it up a notch

  • Since physical connections are very expensive, we need to enforce an upper limit (think multiple app instances, one DB backend).
  • Once the max limit has been reached, wait until a connection is released.
  • Release waiters in FIFO, otherwise some get starved.
  • So add a waiting queue next to our idle list: if no idle conn was found, enqueue a TaskCompletionSource.
  • When releasing, first check the waiting queue to release any pending attempts. Otherwise return to the idle list.

Two data structures, no locks

  • One idle list, one waiting queue. By the time we finish updating one, the other may have changed.
  • Example: rent attempt enters waiting queue, at the same time as a release puts a connection back in the idle list. Rent attempt waits forever.
  • Need a way to have a single, atomically-modified state (how many busy, idle, waiting)

Atomically manipulate states

[StructLayout(LayoutKind.Explicit)]
internal struct PoolState
{
    [FieldOffset(0)]
    internal short Idle;
    [FieldOffset(2)]
    internal short Busy;
    [FieldOffset(4)]
    internal short Waiting;
    [FieldOffset(0)]
    internal long All;
}
// Create a local copy of the pool state, in the stack
PoolState newState;
newState.All = _state.All;
// Modify it
newState.Idle--;
newState.Busy++;
// Commit the changes
_state.All = newState.All;

Problem: we have to have multiple counters (idle, busy, waiting), and manipulate them atomically.

But two operations may be racing, and we lose a change...

Handling race conditions

var sw = new SpinWait();
while (true)
{
    // Create a local copy of the pool state, in the stack
    PoolState oldState;
    oldState.All = Volatile.Read(ref _state.All); // Volatile prevents unwanted "optimization"
    var newState = oldState;
    // Modify it
    newState.Idle--;
    newState.Busy++;
    // Try to commit the changes
    if (Interlocked.CompareAndExchange(ref _state.All, newState.All, oldState.All) == oldState.All)
        break;
    // Our attempt to increment the busy count failed, Loop again and retry.
    sw.SpinOnce();
}
  • This is how a lot of lock-free code looks like: try until success.
  • SpinWait is a nice tool for spinning, and will automatically context switch after some time.
  • Beware of the ABA problem!

Very subtle issues

  • An open attempt creates a new connection
    (idle == 0, total < max).
  • But a network error causes an exception. State has already been updated (busy++), need to decrement.
  • But we already did busy++, so maybe busy == max. Someone may have gone into the waiting queue - so we need to un-wait them.

Conclusion

https://medium.com/@tylerneely/fear-and-loathing-in-lock-free-programming-7158b1cdd50c

8% perf increase

But probably the most dangerous 8% I've ever done.

No standard connection pool API

  • Pooling is an internal concern of the provider, there's no user-facing API (unlike JDBC).
var conn = new NpgsqlConnection("Host=...");
conn.Open();
  • This imposes a lookup on every open - we need to get the pool for the given connection string. This is bad especially since an application typically has just one pool.
  • Npgsql implements this very efficiently (iterate over list, do reference comparison). But a first-class pooling API would obviate this entirely.

No standard connection pool API

  • A 1st-class pool API would allow us to take the pool out of the provider:
    • One strong implementation
    • Reuse across providers
    • Different pooling strategies
  • No need to lookup by connection string
  • https://github.com/dotnet/corefx/issues/26714

Async Perf Tips

Avoid async-related allocations

  • ValueTask allows us to avoid the heap allocation of a Task<T> when the async method completes synchronously.
  • Not necessary if the method returns void, bool and some other values.
  • Since 2.1, IValueTaskSource also allows us to eliminate the allocation when the method completes asynchronously. This is advanced.

Best async tip: avoid async

  • Conventional wisdom is that for asynchronously-completing async methods, any async-related CPU overhead will be dominated by I/O latency etc.
  • Not true for DbFortunesRaw (low-latency 10Gbps Ethernet, simple query with data in memory...).
  • Even synchronously-yielding method invocations need to be treated very aggressively. The cost of the state machine is noticeable.

Refactoring to Reduce Async Calls

var msg = await ReadExpecting<SomeType>();

async ValueTask<T> ReadExpecting<T>()
{
    var msg = await ReadMessage();
    if (msg is T expected)
        return expected;
    throw new NpgsqlException(...);
}
var msg = Expect<T>(await ReadMessage());

internal static T Expect<T>(IBackendMessage msg)
    => msg is T asT
        ? asT
        : throw new NpgsqlException(...);

async ValueTask<T> ReadMessage() { ... }

Reduced from ~7 layers to 2-3, ~3.5% RPS boost

Allows inlining

Creating Fast Paths

  • A more extreme way to eliminate async is to build fast paths for when data is in memory.
Task Open()
{
    // Attempt to get a connection from the idle list
    if (pool.TryAllocateFast(connString, out var connection))
        return connection;
    // No dice, do the slow path
    return OpenLong();

    async Task OpenLong() {
        ...
    }
}

  • This is similar to ValueTask: ValueTask eliminates the memory overhead of the async method, but we also want to get rid of the CPU overhead.
  • Of course, this causes code duplication...

Closing Remarks

The ADO.NET API

  • The API hasn't received significant attention in a long time.
    • Pooling API
    • Batching API
    • Better support for prepared statements
    • Generic parameters to avoid boxing
  • However, it's still possible to do remarkably well on ADO.NET (no need to throw it out entirely).

Be careful

  • Always keep perf in mind, but resist the temptation to over- (or pre-) optimize.
  • Reducing roundtrips and preparing commands is much more important than reducing an allocation or a doing fancy low-level stuff.
  • (even if it is slightly less sexy)

It's a Good Time to do Perf

  • Perf tools (BDN, profilers) are getting much better, but progress still needs to be made (e.g. Linux)
  • Microsoft has really put perf at the center.
  • Great open-source relationship with Microsoft data team, around EF Core and Npgsql perf.
  • Thanks to Diego Vega, Arthur Vickers, Andrew Peters, Stephen Toub...
  • The TechEmpower fortunes benchmark is one of the more realistic ones (e.g. involves DB)
  • But still, how many webapps out there send identical simple queries over a 10Gbps link to PostgreSQL, for a small dataset that is cached in memory?
  • Performance is usually about tradeoffs: code is more brittle/less safe (e.g. lock-free), duplication (e.g. fast paths).

Is this all useful in the real world?

  • No good answer, especially difficult to decide when you're a library...
  • Important for .NET Core evangelism: show the world what the stack is capable of, get attention.
  • Your spouse & family may hate you, but...
  • You learn a ton of stuff and have lots of fun in the process.

So, is this all useful in the real world??

Thank you!

https://slides.com/shayrojansky/dotnetos-2018-11-05/

http://npgsql.org, https://github.com/npgsql/npgsql

Shay Rojansky, roji@roji.org, @shayrojansky

http://roji.org

Dotnetos Presentation on Npgsql Performance

By Shay Rojansky

Dotnetos Presentation on Npgsql Performance

  • 2,349