Focus on high performance...
"Database drivers are boring"
- Everyone, including me in 2012
OS | .NET Core | Npgsql 3.2 RPS | Npgsql 4.0 RPS |
---|---|---|---|
Windows | 2.0 | 71,810 | 107,400 |
Windows | 2.1 | 101,546 | 132,778 |
Linux | 2.0 | 73,023 | 93,668 |
Linux | 2.1 | 78,301 | 107,437 |
Breadth rather than depth
Connection[] _idle = new Connection[PoolSize];
public Connection Rent()
{
for (var i = 0; _idle.Length; i++)
{
// If the slot contains null, move to the next slot.
if (_idle[i] == null)
continue;
// If we saw a connector in this slot, atomically exchange it with a null.
// Either we get a connector out which we can use, or we get null because
// someone has taken it in the meantime. Either way put a null in its place.
var conn = Interlocked.Exchange(ref _idle[i], null);
if (conn != null)
return conn;
}
// No idle connection was found, create a new one
return new Connection(...);
}
public void Return(Connection conn)
{
for (var i = 0; i < _idle.Length; i++)
if (Interlocked.CompareExchange(ref _idle[i], conn, null) == null)
return;
// No free slot was found, close the connection...
conn.Close();
}
This simple implementation was used as a proof-of-concept.
[StructLayout(LayoutKind.Explicit)]
internal struct PoolState
{
[FieldOffset(0)]
internal short Idle;
[FieldOffset(2)]
internal short Busy;
[FieldOffset(4)]
internal short Waiting;
[FieldOffset(0)]
internal long All;
}
// Create a local copy of the pool state, in the stack
PoolState newState;
newState.All = _state.All;
// Modify it
newState.Idle--;
newState.Busy++;
// Commit the changes
_state.All = newState.All;
Problem: we have to have multiple counters (idle, busy, waiting), and manipulate them atomically.
But two operations may be racing, and we lose a change...
var sw = new SpinWait();
while (true)
{
// Create a local copy of the pool state, in the stack
PoolState oldState;
oldState.All = Volatile.Read(ref _state.All); // Volatile prevents unwanted "optimization"
var newState = oldState;
// Modify it
newState.Idle--;
newState.Busy++;
// Try to commit the changes
if (Interlocked.CompareAndExchange(ref _state.All, newState.All, oldState.All) == oldState.All)
break;
// Our attempt to increment the busy count failed, Loop again and retry.
sw.SpinOnce();
}
https://medium.com/@tylerneely/fear-and-loathing-in-lock-free-programming-7158b1cdd50c
8% perf increase
But probably the most dangerous 8% I've ever done.
var conn = new NpgsqlConnection("Host=...");
conn.Open();
var msg = await ReadExpecting<SomeType>();
async ValueTask<T> ReadExpecting<T>()
{
var msg = await ReadMessage();
if (msg is T expected)
return expected;
throw new NpgsqlException(...);
}
var msg = Expect<T>(await ReadMessage());
internal static T Expect<T>(IBackendMessage msg)
=> msg is T asT
? asT
: throw new NpgsqlException(...);
async ValueTask<T> ReadMessage() { ... }
Reduced from ~7 layers to 2-3, ~3.5% RPS boost
Allows inlining
Task Open()
{
// Attempt to get a connection from the idle list
if (pool.TryAllocateFast(connString, out var connection))
return connection;
// No dice, do the slow path
return OpenLong();
async Task OpenLong() {
...
}
}
public Task Flush(bool async)
{
if (async)
return Underlying.WriteAsync(Buffer, 0, WritePosition);
Underlying.Write(Buffer, 0, WritePosition);
return Task.CompletedTask;
}
using (var reader = cmd.ExecuteReader())
while (reader.Read())
Console.WriteLine(reader.GetString(1) + ": " + reader.GetInt32(0));
These two problems are obviated by the new Pipelines API
INSERT INTO...
OK
INSERT INTO...
OK
INSERT INTO...; INSERT INTO ...
OK; OK
The performance improvement is dramatic.
The further away the server is, the more the speedup (think cloud).
Supported for all CRUD operations, not just insert.
For bulk insert, binary COPY is an even better option.
Entity Framework Core takes advantage of this.
var cmd = new NpgsqlCommand("INSERT INTO ...; INSERT INTO ...", conn);
cmd.ExecuteNonQuery();
The batching "API" is shoving several statements into a single string.
This forces Npgsql to parse the text and split on semicolons...
(parsing needs to happen for other reasons)
var cmd = new NpgsqlCommand("INSERT INTO ...; INSERT INTO ...", conn);
cmd.ExecuteNonQuery();
var tx = conn.BeginTransaction(); // Sends BEGIN
// do stuff in the transaction
BEGIN
OK
INSERT INTO...
OK
BeginTransaction() simply writes the BEGIN into the write buffer, but doesn’t send. It will be sent with the next query.
Another example: when closing a (pooled) connection, we need to reset its state.
This is also prepended, and will be sent when the connection is “reopened” and a query is sent.
SELECT table0.data+table1.data+table2.data+table3.data+table4.data
FROM table0
JOIN table1 ON table1.id=table0.id
JOIN table2 ON table2.id=table1.id
JOIN table3 ON table3.id=table2.id
JOIN table4 ON table4.id=table3.id
Unprepared:
TPS: 953
Prepared:
TPS: 8722
Prepared statements + short-lived connections =
using (var conn = new NpgsqlConnection(...))
using (var cmd = new NpgsqlCommand("<some_sql>", conn) {
conn.Open();
cmd.Prepare();
cmd.ExecuteNonQuery();
}
In Npgsql, prepared statements are persisted.
Each physical connection has a prepared statement cache, keyed on the SQL.
The first time the SQL is executed on a physical connection, it is prepared. The next time the prepared statement is reused.
Prepared statements + O/RMs =
O/RMs (or Dapper) don't call DbCommand.Prepare()
Npgsql includes optional automatic preparation: all the benefits, no need to call Prepare().
Statement is prepared after being executed 5 times.
We eject least-recently used statements to make room for new ones.
https://slides.com/shayrojansky/dotnetos-2018-11-05/
http://npgsql.org, https://github.com/npgsql/npgsql
Shay Rojansky, roji@roji.org, @shayrojansky