Lately I've taken an interest in facilitating Remote Procedure Calls (RPC). It spawned from a service management (mgmt) application I've been building in Rust. Alongside that service management system, I built an in-memory db for it in Rust as well. Very interesting stuff for me, especially Rust. More on that some other time.
What is an RPC?
RPCs are used to communicate across applications. For example, if I wanted to get a user's email, I would submit a GetUserEmailRPC request and get back a response at some point (or time out if it takes too long). If you think about HTTP, it is what browsers use to communicate with a server in order to get the data that makes up what you see here.
My Use case
In my case, my service mgmt application will use RPCs to communicate with any application that connects to it. Most of the RPCs should be a PingRPC just to make sure that the other application is still communicating; otherwise it is considered down. To expand the use, I've always wanted a way to connect to any of my applications and draw some data out of them (pub/sub). Namely, metrics and maybe logging in a Linux `tail` fashion. I would create a persistent connection to the application and send an RPC to subscribe to the "metrics" channel, getting fine-grained information on what is going on beyond what is being sent to the TSDB. Really amazing.
How do you build an RPC system?
So what does an RPC system need? Two endpoints (client, server) that communicate through a socket and a protocol. The client connects to the server endpoint using a socket and then uses the protocol, as the language, to understand what the other side is saying. Simple enough!
Ok, what should I use for socket creation? There are TCP and UDP, but they're too low level. There are WebSockets, which are really good. gRPC pretty much solves everything out of the box, but it uses HTTP. ZeroMQ (ZMQ) is low level, but not as low level as TCP/UDP. Hmmm, well, I narrowed it down to WebSockets and ZMQ, and chose ZMQ.
It is like a toolbox of sockets with really, really low latency. I'm talking about 0.06ms (60 microseconds) of latency on the JVM 🙀. This is with serialization tacked on 👊. You just cannot reach that kind of speed with HTTP + serialization, so that eliminated gRPC and HTTP right off the bat.
Alright, how about that serialization..
I've looked at various serialization libraries and articles: ProtoBuf, FlatBuffers, Cap'n Proto, MessagePack, JSON, Colfer, etc. I went with MessagePack; it is not the fastest 🙀, but ease of use and language support mattered a lot to me.
Ease of use means that I should be able to create a request class or struct with my bare hands and not have to worry about serializer particulars. If I want to create a code generator, it should easily be able to generate the Plain Old Object (POO).
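To make that concrete, here is a minimal sketch of what such a "POO" looks like in Rust. The names (GetUserEmailRequest, user_id) are illustrative, not from any real API: the point is that it's a plain struct with no serializer-specific annotations or generated code.

```rust
// A plain request/response pair, written by hand. No serializer
// particulars leak in; any serializer that can walk plain fields
// (e.g. via derive) can pick these up later.
#[derive(Debug, Clone, PartialEq)]
pub struct GetUserEmailRequest {
    pub user_id: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct GetUserEmailResponse {
    pub email: String,
}

fn main() {
    let req = GetUserEmailRequest { user_id: "42".into() };
    println!("{:?}", req);
}
```

A code generator only has to emit structs of this shape, which keeps it trivial.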
Why not JSON? JSON is great, but it is bloated and slow.
Why not ProtoBuf? If you ask me, it does a lot and I don't want all of it. It is easy enough to learn, but I don't want all the code that comes with it. All those objects it creates come at a cost on the JVM side. Cap'n Proto cites generating fewer objects as an advantage over ProtoBuf, so meh.
Why not FlatBuffers? FlatBuffers was not made for RPC. It was made with game dev in mind and is slower to use on the JVM. Just like ProtoBuf, you'll have to learn its definition language, which I have no issue with.
Why not Cap'n Proto? I actually did try to use it for Rust at first as a test project. Then I tried to use it on the JVM and found that the support is lacking. The guides and toolchain lack the love they need. It is built with C++ serialization/deserialization in mind, so it is not great for me in terms of language support. While I could have continued and made some contributions to it, I threw out the test projects and moved on to Colfer.
Why not Colfer? Language support, primarily! It appears super fast, but it does not support Rust.
Why MessagePack? It supports Java, JS, and Rust. If you use Jackson in Java and rmp in Rust, then ease of use is fulfilled. JS is easy: one-liners. Remember that 0.06ms req/resp latency I talked about above? Yes, I used MessagePack as the serialization. Simple flat get request and response objects with strings and byte arrays, which account for the majority of RPC calls in my system, are serialized in microseconds. Simple and amazing.
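Part of why MessagePack is so fast is that its wire format is dead simple. As a taste, here is a std-only sketch of its "fixstr" encoding (per the MessagePack spec, strings up to 31 bytes get a single header byte, 0xa0 OR-ed with the length); real code would use rmp or Jackson rather than hand-rolling this.

```rust
// Hand-rolled MessagePack fixstr encoding, for illustration only.
// Per the MessagePack spec: header byte 0xa0 | length, then raw UTF-8.
fn encode_fixstr(s: &str) -> Vec<u8> {
    let bytes = s.as_bytes();
    assert!(bytes.len() <= 31, "fixstr only covers lengths 0..=31");
    let mut out = Vec::with_capacity(1 + bytes.len());
    out.push(0xa0 | bytes.len() as u8);
    out.extend_from_slice(bytes);
    out
}

fn main() {
    // "ping" costs 5 bytes on the wire, vs 6 for the quoted JSON string.
    let encoded = encode_fixstr("ping");
    assert_eq!(encoded, vec![0xa4, b'p', b'i', b'n', b'g']);
    println!("{:x?}", encoded);
}
```

One header byte per value is a big part of why flat request/response objects serialize in microseconds.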
Yeah, so now I have my serializer. Next I need a protocol, and to figure out how to use ZMQ 🤔.
The RPC Protocol
If you look at HTTP request and response bodies, you'll get a sense of what you want to include in your own RPC protocol. I'll summarize without giving too much info. You can wrap an RPC object with an RPCRequest and RPCResponse. Both would contain a request or response header. This header would include some identifying information about the RPC object (just like a MIME type). The header is small enough that it can be encoded and decoded with a serializer, or you can write a custom serializer/deserializer for it. And yeah, don't serialize the header and the RPC together; keep them separate, same as HTTP.
When you serialize data, it all ends up as bytes, right? When the other side gets these bytes, what should it do with them? Right: pop off the header bytes and deserialize them, just like HTTP. From the header, you can use the RPC id to find the deserializer, deserialize the RPC data bytes, and voilà, you now have the RPC data. Find the handler for that RPC object and execute!
How does ZMQ play into this?
It serves as the transport layer that moves data between machines: RPC client and RPC server. I want to send requests and receive responses using the client, and receive requests and send responses using the server.
This was probably the toughest part. I spent a week or more, off and on, trying to understand how ZeroMQ worked. I didn't really get it until I started playing around with it, trying different socket types and doing connect or bind. The idea that you can either connect or bind a socket was weird to me. A pub that binds and connects? A sub that binds and connects? Very foreign to me. Fast forward... I eventually arrived at the Client (Acceptor (Dealer), Sender (Dealers) <-> send_recv_proxy <-> Forward (Dealer)) <-> Server (Forward (Router) <-> send_recv_proxy <-> Acceptor (Dealer), Sender (Dealers)) architecture that I am using today.
The basic idea is that the Router socket attaches an identity to a message as it comes in so that the message can be responded to. The Dealer only handles distributing messages across all connected sockets.
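A std-only model of that Router behavior, with ZMQ multipart messages represented as lists of byte frames (real code would use a ZMQ binding; the identity value here is made up):

```rust
// A ZMQ multipart message modeled as a list of byte frames.
type Msg = Vec<Vec<u8>>;

// A ROUTER prepends the sender's identity frame on receive...
fn router_recv(identity: &[u8], incoming: Msg) -> Msg {
    let mut msg = vec![identity.to_vec()];
    msg.extend(incoming);
    msg
}

// ...and strips it on send, using it to pick which peer gets the reply.
fn router_send(outgoing: Msg) -> (Vec<u8>, Msg) {
    let mut it = outgoing.into_iter();
    let identity = it.next().expect("routed message must carry an identity");
    (identity, it.collect())
}

fn main() {
    let routed = router_recv(b"client-1", vec![b"ping".to_vec()]);
    let (id, reply_to) = router_send(routed);
    assert_eq!(id, b"client-1");
    assert_eq!(reply_to, vec![b"ping".to_vec()]);
}
```

That identity frame is the whole trick: it lets the server answer the right client without any routing table of its own.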
Why does the architecture look complicated? It really isn't. If you look at how an HTTP server works, it has acceptor threads to receive messages and a queue to send messages. HTTP clients tend to be synchronous, so they wait for a response. With the way I've gone about my RPC client, it is fully asynchronous. Fire and forget.
It is easy to build a sync client on top, because the message header already includes the data for this. 😀
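One way to sketch sync-over-async (an assumption on my part, since the post doesn't spell out the mechanism): the header carries a correlation id, and a pending-request map parks the caller on a channel until the receive loop delivers the matching response. All names here are illustrative.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Map of in-flight correlation ids to the channels their callers wait on.
struct Pending {
    waiters: HashMap<u64, mpsc::Sender<Vec<u8>>>,
}

impl Pending {
    fn new() -> Self {
        Pending { waiters: HashMap::new() }
    }

    // Register interest before firing the async request.
    fn register(&mut self, correlation_id: u64) -> mpsc::Receiver<Vec<u8>> {
        let (tx, rx) = mpsc::channel();
        self.waiters.insert(correlation_id, tx);
        rx
    }

    // Called by the receive loop when a response frame arrives.
    fn complete(&mut self, correlation_id: u64, body: Vec<u8>) {
        if let Some(tx) = self.waiters.remove(&correlation_id) {
            let _ = tx.send(body);
        }
    }
}

fn main() {
    let mut pending = Pending::new();
    let rx = pending.register(1);
    pending.complete(1, b"pong".to_vec()); // receive loop delivers response
    assert_eq!(rx.recv().unwrap(), b"pong"); // sync caller unblocks here
}
```

A sync call is then just: register, send, block on `rx.recv()` (with a timeout in practice).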
You might be wondering: what is the send_recv_proxy? It is a thread that does low-level polling on the three sockets (acceptor, senders, and the forward socket). Yeah, using zmq_poll. It is modeled after zmq_proxy, except that data coming in from the forward socket goes to the acceptor, while data coming from the sender sockets goes into the forward socket. I didn't know how deep into ZMQ internals I would have to go to make this part work, and I did not arrive at this solution immediately. Using Rust here taught me some things that I did not catch in Java, such as "Don't share ZMQ sockets across threads!"
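The proxy's routing rule can be modeled with std channels standing in for sockets (the real thing polls ZMQ sockets with zmq_poll; this is only a sketch of the forwarding logic): forward-side traffic goes to the acceptor, sender-side traffic goes out the forward side.

```rust
use std::sync::mpsc;

// One "poll" iteration of a send_recv_proxy-style loop, with channels
// standing in for ZMQ sockets. Drains whatever is ready on each side.
fn proxy_once(
    forward_in: &mpsc::Receiver<Vec<u8>>,  // inbound from the network
    to_acceptor: &mpsc::Sender<Vec<u8>>,   // hand-off to acceptor thread
    sender_in: &mpsc::Receiver<Vec<u8>>,   // outbound from handlers
    forward_out: &mpsc::Sender<Vec<u8>>,   // back out to the network
) {
    while let Ok(msg) = forward_in.try_recv() {
        to_acceptor.send(msg).unwrap();
    }
    while let Ok(msg) = sender_in.try_recv() {
        forward_out.send(msg).unwrap();
    }
}

fn main() {
    let (fwd_tx, fwd_rx) = mpsc::channel();
    let (acc_tx, acc_rx) = mpsc::channel();
    let (snd_tx, snd_rx) = mpsc::channel();
    let (out_tx, out_rx) = mpsc::channel();

    fwd_tx.send(b"request".to_vec()).unwrap();  // arrived from a peer
    snd_tx.send(b"response".to_vec()).unwrap(); // produced by a handler
    proxy_once(&fwd_rx, &acc_tx, &snd_rx, &out_tx);

    assert_eq!(acc_rx.recv().unwrap(), b"request");
    assert_eq!(out_rx.recv().unwrap(), b"response");
}
```

Keeping all socket I/O inside this one thread is also what satisfies the "don't share ZMQ sockets across threads" rule.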
I'm amazed at the speed of this library. It is also amazingly simple to get everything going. The workflow is: include the library, create a client/server, create the RPC POO, then create handlers on the client/server for the objects. I will build a sync client just so I can cover cases where sync is needed. I focused on the req/resp lifecycle, so I will look at adding PUB/SUB and PUSH/PULL support for the client and server.
Will I open source it? Probably not, since I would have to bring in some other libraries I created for myself for metrics and logging. If I create interfaces with default implementations and open source those, then this library would be on the table.
Basically, I now have a really fast way to talk between applications that are written in different languages, using their native data types.