.NET Data Access Performance

TechEmpower

Suite of standardized web tests
.NET Core on Linux in a good place without DB, bad place with DB.
Focused on the DbFortunesRaw scenario (raw DB read performance, no O/RM).
Microsoft contributed hardware, so results should be reproducible.

Npgsql

Open source ADO.NET provider for .NET
Close collaboration with the EF team (EF6 and EF Core providers also exist)

Peregrine

Minimal PostgreSQL driver that passes the benchmark.
Far from a full ADO.NET driver, but provides a baseline/upper limit and a source for ideas.

PostgreSQL Drivers (web)

Npgsql 3.2.6 on Linux	61,065
Npgsql 3.2.6 on Windows	85,845
Undertow/Java	119,638
NodeJS	22,287
Peregrine on Windows	154,396
Peregrine on Linux	123,010

Intel Xeon E5-1650 @ 3.50Ghz, 6 physical cores, 12 threads. 32GB DDR4

Database Comparison

Npgsql 3.2.6 on Linux	59,676
Npgsql 3.2.6 on Windows	68,708

PostgreSQL (Java/Undertow)	119,638
MySQL (Java/Undertow)	75,452
MongoDB (Java/Undertow)	105,984
SQL Server (.NET on Windows)	69,000

Database-only, sync vs. async

There's an issue with sync performance on Linux.
Linux in general performs significantly worse than Windows (even async).
Java - which is 100% sync - beats .NET doing both sync and async, even on high thread counts.
The JDBC PostgreSQL driver isn't amazingly optimized.

Lock-free Connection Pool

Ideally, the connection pool should be the only point of synchronization in an ADO.NET provider.
DbFortunesRaw immediately showed issues of contention: only 92% CPU usage.
Reimplemented the pool as lock-free for ~8% of RPS boost.

Pool Implementation

Single array (no resize needed - MaxPoolSize)
Scan from random point, use Interlocked.Exchange to get connection out and put a null in its place
If failed to get a connection:
- If below MaxPoolSize, create physical and return
- Otherwise enqueue a TaskCompletionSource in a ConcurrentQueue and wait until someone releases

SocketAsyncEventArgs

More efficient API than NetworkStream.{Read,Write}Async() - ~3% more RPS.
Can be wrapped to provide async/await (see https://blogs.msdn.microsoft.com/pfxteam/2011/12/15/awaiting-socket-operations/)
Does not provide Stream abstraction, so no SSL
In .NET Core 2.1, Span- and Memory-based overloads should provide similar performance.

Avoiding Async

Conventional wisdom is that for yielding async methods, any async CPU overhead will be dominated by I/O latency etc.
Not true for DbFortunesRow (low-latency 10Gbps Ethernet, simple query with data in memory...).
Even yielding method invocations need to be treated very aggressively.

Refactoring to Reduce Async Calls

var msg = await ReadExpecting<SomeType>();

async ValueTask<T> ReadExpecting<T>()
{
    var msg = await ReadMessage();
    if (msg is T expected)
        return expected;
    throw new NpgsqlException(...);
}

async ValueTask<T> ReadMessage() { ... }

var msg = Expect<T>(await ReadMessage());

internal static T Expect<T>(IBackendMessage msg)
    => msg is T asT
        ? asT
        : throw new NpgsqlException(...);

async ValueTask<T> ReadMessage() { ... }

Reduced from ~7 layers to 2-3, ~3.5% RPS boost

Creating Fast Paths

A more extreme way to eliminate async is to build fast paths for when data is in memory.
This is similar to ValueTask: ValueTask eliminates the memory overhead of the async method, but we also want to get rid of the CPU overhead.

Task Open() {
   ...
   if (pool.TryAllocateFast(connString, out var connection))
       return connection;
   return OpenLong();
}

async Task OpenLong() {
    ...
}

ADO.NET API Limitations

Pooling is an internal concern of the provider, there's no user-facing API.
This imposes a lookup on every open (connection string -> pool).
Mitigated by lock-free lookup, but a pool API would open important possibilities (provider-neutral optimized implementation, ability to choose between implementations...)
https://github.com/dotnet/corefx/issues/26714

Prepared Commands

Unlike SQL Server, in PostgreSQL preparation is vital.
ADO.NET has DbCommand.Prepare(), but for short-lived scenarios this isn't enough.
Npgsql persists prepared commands across pooled open/close, but still has to do a lookup based on the SQL.
Npgsql also autoprepares - important if you're using dapper/EF which don't. But a lookup is again necessary.

Allocations

Did the standard hunt to reduce allocations.
ADO.NET doesn't allow writing value types without boxing (DbParameter is non-generic). Introduced non-standard API in Npgsql.
https://github.com/dotnet/corefx/issues/8955.

Buffering

Each physical connection has 8KB read buffer, 8KB write buffer.
ADO.NET guarantees buffering at the row level (unless sequential mode is on, for huge rows).
This is great as it allows optimizations when reading (no need to pass through async).
Looking at Span/Memory as well as possibly Pipelines.

So where are we now?

Final Results

Npgsql 3.2.6 on Linux	61,065
Npgsql 3.2.6 on Windows	85,845
Npgsql vNext on Linux	97,906 (60%)
Npgsql vNext on Windows	134,648 (56%)
Undertow/Java	119,638
NodeJS	22,287
Peregrine on Windows	154,396
Peregrine on Linux	123,010

For the Curious:

Npgsql: http://npgsql.org, https://github.com/npgsql/npgsql
Branch perf: "stable" optimizations
Branch perfx: experimental (partially broken)
Data access perf repo: https://github.com/aspnet/DataAccessPerformance/
This presentation:
http://slides.com/shayrojansky/deck-npgsql-perf#/
Me: Shay Rojansky, roji@roji.org

Npgsql performance sprint

By Shay Rojansky

Npgsql performance sprint

Summary of Npgsql performance optimizations done at Microsoft in Jan/Feb 2018

1,960

Shay Rojansky

shayrojansky

.NET Data Access Performance

TechEmpower

Npgsql

Peregrine

PostgreSQL Drivers (web)

Database Comparison

Database-only, sync vs. async

Lock-free Connection Pool

Pool Implementation

SocketAsyncEventArgs

Avoiding Async

Refactoring to Reduce Async Calls

Creating Fast Paths

ADO.NET API Limitations

Prepared Commands

Allocations

Buffering

So where are we now?

Final Results

For the Curious:

Npgsql performance sprint

Npgsql performance sprint

Shay Rojansky

More from Shay Rojansky