Performance Highlight
TCP loopback (tcp://127.0.0.1), 10-second window, release build on an AMD Ryzen 5 7640U
- 2.2 M msg/s — PushPull · 64 B · Linux · io_uring + cork · 4 workers
- 6.6 GB/s — PushPull · 32 KB · Linux · io_uring + cork + multishot + zerocopy · 8 workers
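The two figures use different units, so it can help to put them on the same scale. The sketch below is only arithmetic on the numbers above and is not part of rzmq or its bench crate. It assumes 1 M = 10^6 messages, 1 GB = 10^9 bytes, and 32 KB = 32 × 1024 bytes, and it counts payload bytes only (no ZMTP framing or TCP overhead). The function names are hypothetical.

```rust
/// Payload bytes per second implied by a message rate and a fixed payload size.
fn bytes_per_sec(msgs_per_sec: f64, payload_bytes: f64) -> f64 {
    msgs_per_sec * payload_bytes
}

/// Messages per second implied by a byte rate and a fixed payload size.
fn msgs_per_sec(bytes_per_sec: f64, payload_bytes: f64) -> f64 {
    bytes_per_sec / payload_bytes
}

fn main() {
    // Assumptions: 1 M = 1e6, 1 GB = 1e9 bytes, 32 KB = 32 * 1024 bytes.
    let small = bytes_per_sec(2.2e6, 64.0);
    let large = msgs_per_sec(6.6e9, 32.0 * 1024.0);

    println!("64 B case:  {:.1} MB/s of payload", small / 1e6);
    println!("32 KB case: {:.0} msg/s", large);
}
```

Under those assumptions the 64 B run moves roughly 141 MB/s of payload, and the 32 KB run corresponds to about 201 thousand messages per second. The small-message case is dominated by per-message cost, while the large-message case is dominated by bytes moved.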
I believe this is near saturation for TCP loopback, where the limit is mostly CPU and memory bandwidth. No other ZeroMQ library achieves this level of throughput. None.
The io_uring path was simplified: the write paths now use far more efficient techniques, and unnecessary flags were removed. Keep in mind that there is only one event loop (the uring worker); it works alongside Tokio and is completely opt-in per socket. No other ZeroMQ library offers this flexibility and speed. Maybe some day, but not today.
One day, I may add the ability to configure the number of io_uring workers to increase throughput.
I've recently added a bench crate so that anyone can reproduce the numbers. The CPU I used is not a particularly powerful one, so the numbers should be even higher on stronger hardware.
I should have added the bench here a while ago. I held off because XSMB was what I used to benchmark libzmq against rzmq, and the effort would not have mattered if XSMB had gained no performance over libzmq. XSMB v1 used libzmq; XSMB v2 uses rzmq. Hi Stakes Markets Game uses XSMB v2.
Anyway, the rumors of this library beating out libzmq are true. Try it yourself.
macOS can also reach 6.3 GB/s with 8 workers using PushPull, so io_uring is not required for high throughput. Without io_uring you will burn more CPU, though, so it is less efficient.
Pay attention to how much memory is used during the benchmarks and you will see that the footprint needed for high-throughput workloads is still very small. rzmq can be put on a Raspberry Pi and you'll be just fine blasting out messages like nothing. It SCREAMS! rzmq SCREAMS!
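One way to check memory from inside a process on Linux is to read the peak resident set size (the VmHWM field) from /proc/self/status. The sketch below uses only the Rust standard library; it is not part of rzmq or its bench crate, the function names are hypothetical, and it reports nothing useful on macOS, which has no /proc.

```rust
use std::fs;

/// Extracts the peak resident set size (VmHWM) in kB from the text of
/// /proc/<pid>/status. Returns None if the field is missing or malformed.
fn parse_peak_rss_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmHWM:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|num| num.parse().ok())
}

/// Reads the current process's peak RSS. Linux only: relies on /proc.
fn peak_rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_peak_rss_kib(&status)
}

fn main() {
    // Call this at the end of a benchmark run to see the high-water mark.
    match peak_rss_kib() {
        Some(kib) => println!("peak RSS: {} kB", kib),
        None => println!("peak RSS unavailable (no /proc/self/status)"),
    }
}
```

Calling `peak_rss_kib()` once at the end of a run gives the high-water mark for the whole process, which is usually the number that matters when sizing a small device.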
