I've been dedicating time to making rzmq much faster, more CPU and memory efficient across the board, and more correct. 0.5.17 was a step up, and 0.5.18 was a minor but good performance update.
0.5.19 is shaping up to be a huge performance and correctness update, not only because of fibre bug fixes and performance updates, but also because of a general rzmq restructuring. Many bugs were found and squashed. Data flows are being simplified and made less generalized. io-uring was substantially improved. There's more code reuse this time around.
The proof is in the benchmarks. 0.5.19 now delivers a 3 GB/s improvement over 0.5.18 when using push/pull: from 8 GB/s to 11 GB/s (32KB messages, 8 push sockets, 1 pull socket) on TCP loopback using Tokio. io-uring with 1 push / 1 pull socket and 32KB messages delivers an astonishing 6-7 GB/s on Linux, and it tops out at 8-9 GB/s with more push workers. This is all on a single io-uring thread, and all while replicating libzmq expectations and remaining CPU and memory efficient.
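To put the Tokio numbers in per-message terms, here is a quick back-of-the-envelope sketch. It assumes 1 GB = 10^9 bytes and 32KB = 32,768 bytes, which the benchmark figures don't spell out, and the constant and function names are made up for illustration.

```rust
// Assumption: "GB" means 10^9 bytes and "32KB" means 32 * 1024 bytes.
const GB: f64 = 1e9;
const MSG_SIZE: f64 = 32.0 * 1024.0;

/// Messages per second implied by a byte throughput and a fixed message size.
fn messages_per_second(bytes_per_second: f64, message_size_bytes: f64) -> f64 {
    bytes_per_second / message_size_bytes
}

/// Relative gain of `new` over `old`, as a percentage.
fn percent_gain(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}
```

Under those assumptions, `messages_per_second(8.0 * GB, MSG_SIZE)` is about 244,000 messages per second, `messages_per_second(11.0 * GB, MSG_SIZE)` is about 336,000, and `percent_gain(8.0, 11.0)` is 37.5.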
I experimented with offloading work such as handshaking and framing from the io-uring worker to Tokio and released that in 0.5.18, but that was a regrettable decision, so I switched back to the old design where the worker handles everything. The closer that work is kept to the buffers, the faster the buffers can be released to the kernel. I'm glad I tried the design, as it showed that code reuse is possible, so no more having two different implementations for Tokio vs io-uring.
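For anyone wondering what "framing" means here, this is a minimal sketch of encoding a single ZMTP 3.x message frame straight into an outgoing buffer. It is not rzmq's actual code. The `encode_frame` function and the constant names are hypothetical, and only the wire layout (a flags byte, then a 1-byte size for bodies up to 255 bytes or an 8-byte big-endian size for longer ones) comes from the ZMTP spec.

```rust
// Flag bits from the ZMTP 3.x frame format (names are my own).
const FLAG_MORE: u8 = 0x01; // more frames of the same message follow
const FLAG_LONG: u8 = 0x02; // size field is 8 bytes instead of 1

/// Appends one message frame (header + body) to `out`.
/// Hypothetical helper for illustration, not part of rzmq.
fn encode_frame(out: &mut Vec<u8>, body: &[u8], more: bool) {
    let mut flags = if more { FLAG_MORE } else { 0 };
    if body.len() > 255 {
        // Long frame: 8-byte size in network byte order.
        flags |= FLAG_LONG;
        out.push(flags);
        out.extend_from_slice(&(body.len() as u64).to_be_bytes());
    } else {
        // Short frame: size fits in a single byte.
        out.push(flags);
        out.push(body.len() as u8);
    }
    out.extend_from_slice(body);
}
```

A 32KB body therefore goes out with a 9-byte header in front of it. Doing this step right where the buffer lives means there is no extra hop between framing and submitting the buffer.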
With this upcoming update, rzmq remains the fastest and most efficient ZeroMQ implementation in the world. I'd argue the most correct too. io-uring is an optional feature and opt-in per socket. This is what makes it truly unique among all implementations: you are not locked into this choice at compile time. No conflicts.
There's still work to be done before releasing this bad boy, but many kinks have been worked out, which increases my confidence.
