Vladislav Shpilevoy
Database C developer at Tarantool. Backend C++ developer at VirtualMinds.
Algorithm of Massively Parallel Networking in C++
Vladislav Shpilevoy
Meetup 2025 Spring
Networking in C++
Existing solutions
New scheduler
Examples
Benchmarks
System is already C++
Super expertise in C++
Need ultra performance
socket(), send(), recv(), connect(),
accept(), bind(), listen()
epoll
io_uring
IOCP
IoRing
kqueue
boost::asio
thrift, gRPC
libev, libuv, libevent
userver
seastar
boost::asio
thrift, gRPC
- unreadable code, questionable performance
userver
- no Windows
seastar
- no Windows, no macOS
- enforces its own protocol
Your company lives in its own framework
Fairness
Coroutines
Events
Scheduler Kingdom
The Battle For The Key
The Front
Tavern
The Waiting Prison
The Map
The Ready
Guild
The Many Tasks
The Strong Workers
😭
😴
😴
The King Calls To Battle
😳
🤔
The Kernel
Castle
Less than 2k lines*
Lock-free*
Low memory
Simple code
Formally verified in TLA+
clang, gcc, msvc
macOS, Windows, Linux
epoll, io_uring, kqueue, IOCP
x86, ARM
>= C++17
int main() {
    mg::sch::TaskScheduler sched("tst",
        1, // Thread count.
        5  // Subqueue size.
    );
    sched.Post(new mg::sch::Task([&](mg::sch::Task *self) {
        std::cout << "Executed in scheduler!\n";
        delete self;
    }));
    return 0;
}
Create the scheduler
Post a task
The body is a lambda function
class MyTask : public mg::sch::Task {
public:
    MyTask() : Task([this](mg::sch::Task* aSelf) {
        TaskSendRequest(aSelf);
    }) {}

private:
    // Step 1
    void TaskSendRequest(mg::sch::Task* aSelf);
    // Step 2
    void TaskRecvResponse(mg::sch::Task* aSelf);
    // Step 3
    void TaskFinish(mg::sch::Task* aSelf);
};
Yield between the steps.
Can inherit to add context
void TaskSendRequest(mg::sch::Task* aSelf) {
    std::cout << "Send\n";
    aSelf->SetCallback([this](mg::sch::Task* aSelf) {
        TaskRecvResponse(aSelf);
    });
    mg::sch::TaskScheduler::This().Post(aSelf);
}

void TaskRecvResponse(mg::sch::Task* aSelf) {
    std::cout << "Receive\n";
    aSelf->SetCallback([this](mg::sch::Task* aSelf) {
        TaskFinish(aSelf);
    });
    mg::sch::TaskScheduler::This().Post(aSelf);
}

void TaskFinish(mg::sch::Task* aSelf) {
    std::cout << "Finish\n";
}
1. Do something.
2. Prepare next step.
3. Post self.
1. Do something.
2. Prepare next step.
3. Post self.
1. Finish the work.
2. Delete/destroy/reuse.
int main() {
    MyTask task;
    mg::sch::TaskScheduler scheduler("tst",
        1, // Thread count.
        5  // Subqueue size.
    );
    scheduler.Post(&task);
    return 0;
}
int main() {
    mg::sch::Task task;
    task.SetCallback([](
        mg::sch::Task& aSelf) -> mg::box::Coro {
        std::cout << "Sending request ...\n";
        co_await aSelf.AsyncYield();
        std::cout << "Received response!\n";
        co_await aSelf.AsyncYield();
        std::cout << "Finish\n";
        co_return;
    }(task));
    mg::sch::TaskScheduler scheduler("tst",
        1, // Thread count.
        5  // Subqueue size.
    );
    scheduler.Post(&task);
    return 0;
}
Coroutine enabler
Reschedule self. Let other tasks work
static void TaskSubmitRequest(mg::sch::Task& aSender) {
    mg::sch::Task* worker = new mg::sch::Task();
    worker->SetCallback([](
        mg::sch::Task& aSelf,
        mg::sch::Task& aSender) -> mg::box::Coro {
        aSelf.SetDelay(1000);
        co_await aSelf.AsyncYield();
        aSender.PostSignal();
        co_await aSelf.AsyncExitDelete();
    }(*worker, aSender));
    mg::sch::TaskScheduler::This().Post(worker);
}
Temporary worker task
"Work" for 1 sec, wake up the owner
int main() {
    mg::sch::Task task;
    mg::sch::TaskScheduler scheduler("tst",
        1, // Thread count.
        5  // Subqueue size.
    );
    task.SetCallback([](
        mg::sch::Task& aSelf) -> mg::box::Coro {
        TaskSubmitRequest(aSelf);
        do {
            aSelf.SetWait();
        } while (!co_await aSelf.AsyncReceiveSignal());
        co_return;
    }(task));
    scheduler.Post(&task);
    return 0;
}
Start work and wait for its completion
int main() {
    mg::aio::IOCore core;
    core.Start(3 /* threads */);
    MyServer server(core);
    uint16_t port = server.Bind();
    server.Start();
    for (int i = 0; i < theClientCount; ++i)
        new MyClient(i + 1, core, port);
    core.WaitEmpty();
    return 0;
}
IO task scheduler
Task as a server
Tasks as clients
class MyServer final : private mg::aio::TCPServerSubscription {
public:
    MyServer(mg::aio::IOCore& aCore)
        : myServer(mg::aio::TCPServer::NewShared(aCore)) {}

    uint16_t Bind() {
        myServer->Bind(mg::net::HostMakeAllIPV4(0));
        return myServer->GetPort();
    }

    void Start() {
        myServer->Listen(this);
    }

    // ... Some methods ...

    mg::aio::TCPServer::Ptr myServer;
};
Server receives task-events via the "subscription"
Server socket is created attached to IOCore
Bind + get the resulting port if it was random
After Listen(subscription) the socket is active, runs in IOCore, and delivers events
void MyServer::OnAccept(
    mg::net::Socket aSock,
    const mg::net::Host& aPeerAddress) {
    new MyPeer(myServer->GetCore(), aSock);
}
Invoked by IOCore workers
Spawn a new task to handle the peer socket
class MyClient final : private mg::aio::TCPSocketSubscription {
public:
    MyClient(mg::aio::IOCore& aCore, uint16_t aPort)
        : mySock(new mg::aio::TCPSocket(aCore)) {
        mySock->Open({});
        mg::aio::TCPSocketConnectParams connParams;
        connParams.myEndpoint =
            mg::net::HostMakeLocalIPV4(aPort).ToString();
        mySock->PostConnect(connParams, this);
    }

    // ... Some methods ...

    mg::aio::TCPSocket* mySock;
};
Client receives task-events via the "subscription"
Will work in the given IOCore
Choose connection parameters
Enter the IOCore with the async connect, start receiving events
void MyClient::OnConnect() {
    const char* msg = "hello handshake";
    mySock->SendRef(msg, mg::box::Strlen(msg) + 1);
    mySock->Recv(1);
}

void MyClient::OnRecv(mg::net::BufferReadStream& aStream) {
    MG_BOX_ASSERT(aStream.GetReadSize() > 0);
    mySock->PostClose();
}

void MyClient::OnClose() {
    mySock->Delete();
    delete this;
}
Async send "handshake" on connect and read an ack
Async close on confirmation
Delete self, when close is finished
class CalcClient final : private mg::aio::TCPSocketSubscription {
public:
    void Submit(
        char aOp,
        int64_t aArg1,
        int64_t aArg2,
        std::function<void(int64_t)>&& aOnComplete);

    mg::aio::TCPSocket* mySock;
};
Calculator frontend, operations executed in a "remote microservice"
Client for the remote calculator
Submit async operation
class MyRequest {
public:
    MyRequest(CalcClient& aCalcClient, mg::sch::TaskScheduler& aSched)
        : myCalcClient(aCalcClient)
        , myTask(Execute(this)) {
        aSched.Post(&myTask);
    }

private:
    mg::box::Coro Execute(MyRequest* aSelf);

    CalcClient& myCalcClient;
    mg::sch::Task myTask;
};
User request
It has a client to the calculator host, and a scheduler to work in
Start execution
Body
mg::box::Coro MyRequest::Execute(MyRequest* aSelf) {
    int64_t res = 0;
    myCalcClient.Submit('+', 10, myID,
        [aSelf, &res](int64_t aRes) {
            res = aRes;
            aSelf->myTask.PostSignal();
        });
    while (!co_await aSelf->myTask.AsyncReceiveSignal())
        aSelf->myTask.SetWait();
    std::cout << "Result: " << res << std::endl;
    // ... Do whatever else or delete the request.
}
Submit async request to the calculator client
Async wait for completion
On completion save the result and wake the request up
Debian 11, 2.5GHz, 32 cores
18.7 mln / sec
3.8 mln / sec
4.3 mln / sec
8.2 mln / sec
1 thread, empty tasks:
x5.0
2 threads, empty tasks:
x1.4
10 threads, empty tasks:
x7.7
50 threads, empty tasks:
x168.5
==
x1.8
1.8 mln / sec
x4.0
2.8 mln / sec
x118.8
7.5 mln / sec
<=3 threads, micro tasks:
5 threads, micro tasks:
10 threads, micro tasks:
50 threads, micro tasks:
Ubuntu 22.04.4, 0.8-4.8GHz, 24 cores
5 threads, 100 clients, 128 b:
x1.54
962'630 msg / sec
1 thread, 100 clients, 128 b:
x1.45
330'200 msg / sec
5 threads, 200 clients, 100 KB:
x4.44
51'100 msg / sec
3 threads, 100 clients, 128 b:
x1.49
Ubuntu 24.04.1, 0.8-4.8GHz, 24 cores
218'500 msg / sec
1 thread, 100 clients, 128 b:
x1.27
245'470 msg / sec
3 threads, 200 clients, 100 KB:
x1.48
13'254 msg / sec
3 threads, 100 clients, 128 b:
x1.10
Windows 11 Home, 3.70GHz, 12 cores
195'430 msg / sec
1 thread, 100 clients, 128 b:
x1.20
80'510 msg / sec
3 threads, 200 clients, 100 KB:
x2.11
28'368 msg / sec
3 threads, 100 clients, 128 b:
x1.50
macOS Big Sur 11.7.10, 2.5GHz, 4 cores
146'930 msg / sec
1 thread, 100 clients, 128 b:
x1.41
101'800 msg / sec
3 threads, 200 clients, 100 KB:
x1.45
12'750 msg / sec
Get notified about readiness events, then perform the operation on each socket individually.
Post many operations on many sockets at once, then get notified when they complete.
Why x2 faster?
By Vladislav Shpilevoy
My talk is dedicated to the performance of C++ servers using an alternative to boost::asio for massively parallel network operations. boost::asio is effectively the standard for network programming in C++, but in rare cases, it may be unavailable or insufficient for various reasons. Drawing from my experience in high-performance projects, I developed a new task scheduling algorithm, built a networking library based on it, and am presenting them in this talk. The most interesting aspects include fair CPU load distribution, support for C++ coroutines, formal verification with TLA+, and reproducible benchmarks demonstrating an N-times speedup over boost::asio. The project is open-source: https://github.com/Gerold103/serverbox.