.NET Data Access Performance

TechEmpower

  • Suite of standardized web tests
  • .NET Core on Linux in a good place without DB, bad place with DB.
  • Focused on the DbFortunesRaw scenario (raw DB read performance, no O/RM).
  • Microsoft contributed hardware, so results should be reproducible.

Npgsql

  • Open source ADO.NET provider for .NET
  • Close collaboration with the EF team (EF6 and EF Core providers also exist)

Peregrine

  • Minimal PostgreSQL driver that passes the benchmark.
  • Far from a full ADO.NET driver, but provides a baseline/upper limit and a source for ideas.

PostgreSQL Drivers (web)

Npgsql 3.2.6 on Linux 61,065
Npgsql 3.2.6 on Windows 85,845
Undertow/Java 119,638
NodeJS 22,287
Peregrine on Windows 154,396
Peregrine on Linux 123,010

Intel Xeon E5-1650 @ 3.50Ghz, 6 physical cores, 12 threads. 32GB DDR4

Database Comparison

Npgsql 3.2.6 on Linux 59,676
Npgsql 3.2.6 on Windows 68,708
PostgreSQL (Java/Undertow) 119,638
MySQL (Java/Undertow) 75,452
MongoDB (Java/Undertow) 105,984
SQL Server (.NET on Windows) 69,000

Database-only, sync vs. async

  • There's an issue with sync performance on Linux.
  • Linux in general performs significantly worse than Windows (even async).
  • Java - which is 100% sync - beats .NET doing both sync and async, even on high thread counts.
  • The JDBC PostgreSQL driver isn't amazingly optimized.

Lock-free Connection Pool

  • Ideally, the connection pool should be the only point of synchronization in an ADO.NET provider.
  • DbFortunesRaw immediately showed issues of contention: only 92% CPU usage.
  • Reimplemented the pool as lock-free for ~8% of RPS boost.

Pool Implementation

  • Single array (no resize needed - MaxPoolSize)
  • Scan from random point, use Interlocked.Exchange to get connection out and put a null in its place
  • If failed to get a connection:
    • If below MaxPoolSize, create physical and return
    • Otherwise enqueue a TaskCompletionSource in a ConcurrentQueue and wait until someone releases

SocketAsyncEventArgs

  • More efficient API than NetworkStream.{Read,Write}Async() - ~3% more RPS.
  • Can be wrapped to provide async/await (see https://blogs.msdn.microsoft.com/pfxteam/2011/12/15/awaiting-socket-operations/)
  • Does not provide Stream abstraction, so no SSL
  • In .NET Core 2.1, Span- and Memory-based overloads should provide similar performance.

Avoiding Async

  • Conventional wisdom is that for yielding async methods, any async CPU overhead will be dominated by I/O latency etc.
  • Not true for DbFortunesRow (low-latency 10Gbps Ethernet, simple query with data in memory...).
  • Even yielding method invocations need to be treated very aggressively.

Refactoring to Reduce Async Calls

var msg = await ReadExpecting<SomeType>();

async ValueTask<T> ReadExpecting<T>()
{
    var msg = await ReadMessage();
    if (msg is T expected)
        return expected;
    throw new NpgsqlException(...);
}

async ValueTask<T> ReadMessage() { ... }
var msg = Expect<T>(await ReadMessage());

internal static T Expect<T>(IBackendMessage msg)
    => msg is T asT
        ? asT
        : throw new NpgsqlException(...);

async ValueTask<T> ReadMessage() { ... }

Reduced from ~7 layers to 2-3, ~3.5% RPS boost

Creating Fast Paths

  • A more extreme way to eliminate async is to build fast paths for when data is in memory.
  • This is similar to ValueTask: ValueTask eliminates the memory overhead of the async method, but we also want to get rid of the CPU overhead.
Task Open() {
   ...
   if (pool.TryAllocateFast(connString, out var connection))
       return connection;
   return OpenLong();
}

async Task OpenLong() {
    ...
}

ADO.NET API Limitations

  • Pooling is an internal concern of the provider, there's no user-facing API.
  • This imposes a lookup on every open (connection string -> pool).
  • Mitigated by lock-free lookup, but a pool API would open important possibilities (provider-neutral optimized implementation, ability to choose between implementations...)
  • https://github.com/dotnet/corefx/issues/26714

Prepared Commands

  • Unlike SQL Server, in PostgreSQL preparation is vital.
  • ADO.NET has DbCommand.Prepare(), but for short-lived scenarios this isn't enough.
  • Npgsql persists prepared commands across pooled open/close, but still has to do a lookup based on the SQL.
  • Npgsql also autoprepares - important if you're using dapper/EF which don't. But a lookup is again necessary.

Allocations

  • Did the standard hunt to reduce allocations.
  • ADO.NET doesn't allow writing value types without boxing (DbParameter is non-generic). Introduced non-standard API in Npgsql.
  • https://github.com/dotnet/corefx/issues/8955. 

Buffering

  • Each physical connection has 8KB read buffer, 8KB write buffer.
  • ADO.NET guarantees buffering at the row level (unless sequential mode is on, for huge rows).
  • This is great as it allows optimizations when reading (no need to pass through async).
  • Looking at Span/Memory as well as possibly Pipelines.

So where are we now?

Final Results

Npgsql 3.2.6 on Linux 61,065
Npgsql 3.2.6 on Windows 85,845
Npgsql vNext on Linux 97,906 (60%)
Npgsql vNext on Windows 134,648 (56%)
Undertow/Java 119,638
NodeJS 22,287
Peregrine on Windows 154,396
Peregrine on Linux 123,010

For the Curious:

  • Npgsql: http://npgsql.org, https://github.com/npgsql/npgsql
  • Branch perf: "stable" optimizations
  • Branch perfx: experimental (partially broken)
  • Data access perf repo: https://github.com/aspnet/DataAccessPerformance/
  • This presentation:
    http://slides.com/shayrojansky/deck-npgsql-perf#/
  • Me: Shay Rojansky, roji@roji.org

Npgsql performance sprint

By Shay Rojansky

Npgsql performance sprint

Summary of Npgsql performance optimizations done at Microsoft in Jan/Feb 2018

  • 1,517