.NET Data Access Performance
TechEmpower
- Suite of standardized web tests
- .NET Core on Linux in a good place without DB, bad place with DB.
- Focused on the DbFortunesRaw scenario (raw DB read performance, no O/RM).
- Microsoft contributed hardware, so results should be reproducible.
Npgsql
- Open source ADO.NET provider for .NET
- Close collaboration with the EF team (EF6 and EF Core providers also exist)
Peregrine
- Minimal PostgreSQL driver that passes the benchmark.
- Far from a full ADO.NET driver, but provides a baseline/upper limit and a source for ideas.
PostgreSQL Drivers (web)
Npgsql 3.2.6 on Linux | 61,065 |
Npgsql 3.2.6 on Windows | 85,845 |
Undertow/Java | 119,638 |
NodeJS | 22,287 |
Peregrine on Windows | 154,396 |
Peregrine on Linux | 123,010 |
Intel Xeon E5-1650 @ 3.50Ghz, 6 physical cores, 12 threads. 32GB DDR4
Database Comparison
Npgsql 3.2.6 on Linux | 59,676 |
Npgsql 3.2.6 on Windows | 68,708 |
PostgreSQL (Java/Undertow) | 119,638 |
MySQL (Java/Undertow) | 75,452 |
MongoDB (Java/Undertow) | 105,984 |
SQL Server (.NET on Windows) | 69,000 |
Database-only, sync vs. async
- There's an issue with sync performance on Linux.
- Linux in general performs significantly worse than Windows (even async).
- Java - which is 100% sync - beats .NET doing both sync and async, even on high thread counts.
- The JDBC PostgreSQL driver isn't amazingly optimized.
Lock-free Connection Pool
- Ideally, the connection pool should be the only point of synchronization in an ADO.NET provider.
- DbFortunesRaw immediately showed issues of contention: only 92% CPU usage.
- Reimplemented the pool as lock-free for ~8% of RPS boost.
Pool Implementation
- Single array (no resize needed - MaxPoolSize)
- Scan from random point, use Interlocked.Exchange to get connection out and put a null in its place
- If failed to get a connection:
- If below MaxPoolSize, create physical and return
- Otherwise enqueue a TaskCompletionSource in a ConcurrentQueue and wait until someone releases
SocketAsyncEventArgs
- More efficient API than NetworkStream.{Read,Write}Async() - ~3% more RPS.
- Can be wrapped to provide async/await (see https://blogs.msdn.microsoft.com/pfxteam/2011/12/15/awaiting-socket-operations/)
- Does not provide Stream abstraction, so no SSL
- In .NET Core 2.1, Span- and Memory-based overloads should provide similar performance.
Avoiding Async
- Conventional wisdom is that for yielding async methods, any async CPU overhead will be dominated by I/O latency etc.
- Not true for DbFortunesRow (low-latency 10Gbps Ethernet, simple query with data in memory...).
- Even yielding method invocations need to be treated very aggressively.
Refactoring to Reduce Async Calls
var msg = await ReadExpecting<SomeType>();
async ValueTask<T> ReadExpecting<T>()
{
var msg = await ReadMessage();
if (msg is T expected)
return expected;
throw new NpgsqlException(...);
}
async ValueTask<T> ReadMessage() { ... }
var msg = Expect<T>(await ReadMessage());
internal static T Expect<T>(IBackendMessage msg)
=> msg is T asT
? asT
: throw new NpgsqlException(...);
async ValueTask<T> ReadMessage() { ... }
Reduced from ~7 layers to 2-3, ~3.5% RPS boost
Creating Fast Paths
- A more extreme way to eliminate async is to build fast paths for when data is in memory.
- This is similar to ValueTask: ValueTask eliminates the memory overhead of the async method, but we also want to get rid of the CPU overhead.
Task Open() {
...
if (pool.TryAllocateFast(connString, out var connection))
return connection;
return OpenLong();
}
async Task OpenLong() {
...
}
ADO.NET API Limitations
- Pooling is an internal concern of the provider, there's no user-facing API.
- This imposes a lookup on every open (connection string -> pool).
- Mitigated by lock-free lookup, but a pool API would open important possibilities (provider-neutral optimized implementation, ability to choose between implementations...)
- https://github.com/dotnet/corefx/issues/26714
Prepared Commands
- Unlike SQL Server, in PostgreSQL preparation is vital.
- ADO.NET has DbCommand.Prepare(), but for short-lived scenarios this isn't enough.
- Npgsql persists prepared commands across pooled open/close, but still has to do a lookup based on the SQL.
- Npgsql also autoprepares - important if you're using dapper/EF which don't. But a lookup is again necessary.
Allocations
- Did the standard hunt to reduce allocations.
- ADO.NET doesn't allow writing value types without boxing (DbParameter is non-generic). Introduced non-standard API in Npgsql.
- https://github.com/dotnet/corefx/issues/8955.
Buffering
- Each physical connection has 8KB read buffer, 8KB write buffer.
- ADO.NET guarantees buffering at the row level (unless sequential mode is on, for huge rows).
- This is great as it allows optimizations when reading (no need to pass through async).
- Looking at Span/Memory as well as possibly Pipelines.
So where are we now?
Final Results
Npgsql 3.2.6 on Linux | 61,065 |
Npgsql 3.2.6 on Windows | 85,845 |
Npgsql vNext on Linux | 97,906 (60%) |
Npgsql vNext on Windows | 134,648 (56%) |
Undertow/Java | 119,638 |
NodeJS | 22,287 |
Peregrine on Windows | 154,396 |
Peregrine on Linux | 123,010 |
For the Curious:
- Npgsql: http://npgsql.org, https://github.com/npgsql/npgsql
- Branch perf: "stable" optimizations
- Branch perfx: experimental (partially broken)
- Data access perf repo: https://github.com/aspnet/DataAccessPerformance/
- This presentation:
http://slides.com/shayrojansky/deck-npgsql-perf#/ - Me: Shay Rojansky, roji@roji.org
Npgsql performance sprint
By Shay Rojansky
Npgsql performance sprint
Summary of Npgsql performance optimizations done at Microsoft in Jan/Feb 2018
- 1,517