# Benchmark network stack CPU usage #5220
We had a discussion with @AndreiEres & @lexnv, and to follow up, I'd like to summarize the understanding of the issue from the networking team's perspective:
> It would be really helpful if you could implement such benchmarks, as we are going to need them anyway.
There is a disadvantage here: currently, subsystem-bench is a good tool for estimating the load of a subsystem on real networks. It allows you to configure high-level properties of the network (num_validators, num_cores, num_parachains, num_candidates to validate), and it generates the equivalent messages a node would have to process. Currently, though, the setup is oversimplified.
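For illustration, here is a minimal, hypothetical sketch of the high-level knobs described above. The field names mirror the properties mentioned in this comment; the real configuration lives in polkadot's subsystem-bench tooling and may differ:

```rust
/// Hypothetical sketch of a subsystem-bench-style configuration; field
/// names follow the properties mentioned above, not the actual crate.
#[derive(Clone, Debug)]
pub struct NetworkBenchConfig {
    /// Number of validators in the emulated network.
    pub num_validators: usize,
    /// Number of availability cores.
    pub num_cores: usize,
    /// Number of parachains producing candidates.
    pub num_parachains: usize,
    /// Number of candidates each node has to validate.
    pub num_candidates: usize,
}

impl Default for NetworkBenchConfig {
    fn default() -> Self {
        // Arbitrary example topology, not real-network defaults.
        Self { num_validators: 500, num_cores: 60, num_parachains: 60, num_candidates: 60 }
    }
}
```

From such a configuration, the benchmark generates the equivalent message load an individual node would have to process for that topology.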
If I understand correctly, you are suggesting implementing a shorter pipeline. I think that's a good first step, and it allows for comparison between the litep2p and libp2p backends, but it doesn't cover benchmarking as much as possible of the node running all together. My suggestion would be, when we implement this, to also keep in mind the longer pipeline I suggested above, so that we can still glue everything together at some point in the future.
+1 on what @alexggh said above.
The subsystem benchmarks will report the networking stack usage as part of the tests we already have, so you can compare the two network backends.
Sounds like a plan! Thanks for the clarifications! 🙏 Indeed, the subsystem benchmarks bring a good improvement over the current state. We can handle the protocol-specific benchmarks as part of the networking team, since we'd need to know whether new changes impact the performance of the lower-level components, and make informed decisions about optimizations.
Sorry for being late to the party, and sorry for the delay in providing the collected information. Here is my understanding of the current situation. In Polkadot, we are replacing libp2p with litep2p. Thus, the actual work can be divided into two different tasks that are not connected to each other and can be implemented independently:

1. A high-level benchmark of the whole networking stack, e.g., as part of the subsystem benchmarks, measuring the networking CPU usage a real node would see.
2. Low-level benchmarks of the individual protocols (notifications and request-response), run against both the libp2p and litep2p backends so the two can be compared directly.
Using the analogy of testing, the first benchmark is an integration test, while the second consists of several unit tests; a sketch of the latter follows below. How do you guys find this approach? What should I change or add?
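To make the second task concrete, here is a minimal, hypothetical sketch of a backend-agnostic harness; `BenchNetworkBackend`, `setup_pair`, and `send_notification` are illustrative names, not real sc-network APIs. The idea is to write each per-protocol benchmark body once and instantiate it for both backends:

```rust
/// Hypothetical harness trait over the network backend under test.
/// (Uses async fn in traits, stable since Rust 1.75.)
pub trait BenchNetworkBackend: Sized {
    /// Bring up two connected in-memory peers speaking the benchmarked protocol.
    async fn setup_pair() -> Self;
    /// Send one notification of `size` bytes and wait until the peer receives it.
    async fn send_notification(&mut self, size: usize);
}

/// The benchmark body is written once and instantiated per backend,
/// e.g. `run_bench::<Libp2pBackend>` and `run_bench::<Litep2pBackend>`.
pub async fn run_bench<B: BenchNetworkBackend>(iterations: usize, size: usize) {
    let mut pair = B::setup_pair().await;
    for _ in 0..iterations {
        pair.send_notification(size).await;
    }
}
```

Running `run_bench` once per backend with identical parameters would then yield directly comparable numbers for the libp2p/litep2p comparison discussed above.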
# Description

Implemented benchmarks for the Notifications and RequestResponse protocols with the libp2p implementation. These benchmarks allow us to monitor regressions and implement fixes before they are observed in a real chain. In the future, they can be used for targeted optimizations of litep2p compared to libp2p.

Part of #5220

Next steps:
- Add benchmarks for the litep2p implementation
- Optimize the load to get better results
- Add the benchmarks to CI to catch regressions

## Integration

Benchmarks don't affect downstream projects.

---------

Co-authored-by: alvicsam <[email protected]>
Co-authored-by: GitHub Action <[email protected]>
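For reference, here is a minimal sketch of what such a benchmark could look like using criterion; `setup_connected_pair` and `send_and_ack` are hypothetical stand-ins for the real two-node setup, not the PR's actual code:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

/// Hypothetical helpers standing in for real node setup: connect two nodes
/// over the notifications protocol, then send `size` bytes and wait for an ack.
async fn setup_connected_pair() { /* elided */ }
async fn send_and_ack(size: usize) { let _ = size; /* elided */ }

fn notifications(c: &mut Criterion) {
    let rt = tokio::runtime::Runtime::new().expect("failed to build tokio runtime");
    rt.block_on(setup_connected_pair());

    let mut group = c.benchmark_group("notifications/libp2p");
    for size in [1024usize, 64 * 1024, 1024 * 1024] {
        // Report throughput in bytes/s so different payload sizes are comparable.
        group.throughput(Throughput::Bytes(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter(|| rt.block_on(send_and_ack(size)));
        });
    }
    group.finish();
}

criterion_group!(benches, notifications);
criterion_main!(benches);
```

Running an identical group against the litep2p backend would then give a like-for-like comparison in the criterion report, which is what the planned litep2p benchmarks from the next-steps list would enable.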
Currently, the only way to catch network stack performance regressions, or to compare litep2p with libp2p, is to actually run nodes in a network and look at the CPU usage metrics of the networking tasks.
Implementing such a network stack performance benchmark (perhaps as part of the subsystem benchmarks) would provide the following benefits:
- catching performance regressions before they are observed on a real chain;
- comparing the litep2p and libp2p backends without having to run a full network.
CC @AndreiEres @dmitry-markin