MessagePack-RPC Design

This page describes the design of MessagePack-RPC, a Remote Procedure Call (RPC) system built on top of the MessagePack data format. MessagePack-RPC enables clients to call predefined server functions remotely.

Introduction of RPC System

A large application system generally consists of many components, which are normally written in the same programming language. However, it has become common for some components of such a system to be easier to write in other languages.

For example, in a modern internet service, the frontend is often written in a scripting language (Ruby, Python, etc.) and the backend components are written in languages that have higher runtime performance (C, C++, Java, etc.).

In such a case, Remote Procedure Call (RPC) is useful. RPC is an implementation technique with which a program (the RPC client) can delegate function calls to another program (the RPC server) as if they were dispatched locally. An RPC client initiates an RPC by sending the server a request that specifies the function to be dispatched and its arguments. The server then handles the request, dispatches the corresponding function, and sends back to the client a response that encapsulates the invocation result.
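
The request/response cycle described above can be sketched in a few lines of Python. This is a minimal in-process illustration, not the MessagePack-RPC wire protocol: the dict-based message layout, the `FUNCTIONS` table, and the `handle_request` and `call` helpers are hypothetical names chosen for the example, and the network transport is replaced by a direct function call.

```python
# Hypothetical table of functions the server exposes.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}


def handle_request(request):
    """Server side: look up the named function, run it, build a response."""
    func = FUNCTIONS.get(request["method"])
    if func is None:
        return {"error": "unknown method: " + request["method"], "result": None}
    try:
        return {"error": None, "result": func(*request["params"])}
    except Exception as exc:  # report failures to the caller instead of crashing
        return {"error": str(exc), "result": None}


def call(method, *params):
    """Client side: build a request, 'send' it, and unwrap the response."""
    request = {"method": method, "params": list(params)}
    response = handle_request(request)  # stands in for a network round trip
    if response["error"] is not None:
        raise RuntimeError(response["error"])
    return response["result"]


print(call("add", 1, 2))       # prints 3
print(call("upper", "hello"))  # prints HELLO
```

From the caller's point of view, `call("add", 1, 2)` looks like an ordinary local function call, which is the property that lets each component be written in whatever language suits it.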

This mechanism enables one to use a suitable language for each component.

Common Requirements for RPC System

These are the common requirements for RPC systems.

  • Fast: resources necessary to encode / decode the messages should be minimized.
  • Parallelism: requests and responses should be handled optimally in a parallel manner.
  • Compact: protocol overhead should be minimized, to reduce network bandwidth.
  • Interoperability: the RPC system should be designed so that it can be naturally integrated with many different hardware platforms, operating systems and programming languages.

MessagePack-RPC Approach

Fast

The MessagePack-RPC implementation is fast because its design takes advantage of modern hardware features (multi-core, multi-CPU, etc.). Stream deserialization combined with zero-copy handling lets deserialization overlap with the network transfer.

Parallelism

The MessagePack-RPC protocol takes request pipelining into account. For maximum parallelism, the server does not need to reply in the same order in which the requests arrived.

Some client implementations support asynchronous calls so that the user can handle multiple RPC calls simultaneously. This is useful when calling many functions at the same time.

Compact

Messages exchanged between the client and the server are packed in the MessagePack data format, which has less header overhead than other general-purpose data exchange formats such as JSON or XML. Using MessagePack can therefore reduce network bandwidth consumption considerably.
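
To make the size difference concrete, the sketch below encodes the same small message as compact JSON and as MessagePack and compares the byte counts. The `pack_small` function is a hypothetical, hand-written encoder that covers only a tiny subset of the format (nil, booleans, integers 0 to 127, short strings, and short lists and maps); it is for illustration only and is not a substitute for a real MessagePack library. The message contents are also made up for the example.

```python
import json


def pack_small(obj):
    """Encode a tiny subset of MessagePack (illustration only).

    Supported: None, bool, ints 0..127, str up to 31 UTF-8 bytes,
    lists and dicts with up to 15 entries.
    """
    if obj is None:
        return b"\xc0"
    if obj is True:
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])  # positive fixint: the value is the byte itself
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data  # one-byte header + payload
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack_small(x) for x in obj)
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack_small(k) + pack_small(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body
    raise ValueError("outside the subset handled by this sketch")


message = {"method": "add", "params": [1, 2]}  # made-up example message
as_json = json.dumps(message, separators=(",", ":")).encode("utf-8")
as_msgpack = pack_small(message)
print(len(as_json), len(as_msgpack))  # prints 31 22
```

The saving comes from the one-byte headers: a short string costs its length plus one byte, and a small integer costs a single byte, with no quotes, colons, commas or brackets.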

Interoperability

The language bindings of MessagePack-RPC are readily available, so you can integrate it into your program quickly by using the default packaging system for each language (e.g. gem for Ruby).

MessagePack-RPC Feature List

The following features are supported in MessagePack-RPC. Some language implementations still lack one or more of them, but each implementation is able to support them in a way appropriate to its language.

Asynchronous RPC

Synchronous RPC is easy to understand because it blocks until the server returns the result, just like an ordinary function call, but there are cases where multiple calls need to be initiated at the same time. Asynchronous RPC is useful in such cases. An asynchronous RPC returns immediately after the request has been sent, with a `Future` object that will be signaled when the client gets the response.

Our specification requires client implementations to be able to communicate with multiple servers in parallel. Consider the case below where you need to communicate with three servers.

The following diagrams depict the difference between synchronous and asynchronous RPC. The former sends a request to each server and waits for its reply one by one, while the latter, as in MessagePack-RPC, first sends all of the requests to the servers at once and then waits for them to complete. This feature enables the client to send the requests in parallel.
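
The difference between the two calling styles can be sketched with Python's standard `concurrent.futures` module. Its `Future` class is used here only as an analogue of the `Future` object mentioned above; it is not the MessagePack-RPC client API. The server names and the `fake_rpc` function, which imitates network latency with a short sleep, are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical server names


def fake_rpc(server, x):
    """Stand-in for a blocking remote call; the sleep imitates latency."""
    time.sleep(0.05)
    return f"{server}:{x * 2}"


def call_sync(x):
    # One call at a time: total time is roughly the sum of the latencies.
    return [fake_rpc(server, x) for server in SERVERS]


def call_async(x):
    with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        # Send every request first; each submit() returns a Future immediately.
        futures = [pool.submit(fake_rpc, server, x) for server in SERVERS]
        # Then wait for all of them; the waiting periods overlap.
        return [future.result() for future in futures]


print(call_sync(21))
print(call_async(21))
```

Both functions return the same results; the asynchronous version finishes in roughly the time of the slowest single call instead of the sum of all three.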

Parallel Pipelining

Each MessagePack-RPC request is given a unique `Message-ID` that distinguishes it from the others. The server sends each response with the same ID as the corresponding request. This enables pipelining (out-of-order transfer) between the clients and the server.

Suppose a client sends two requests, Request1 and Request2. Without pipelining, the server has to return the responses in the order in which the requests were submitted. With pipelining, the server is allowed to return them in the reverse order. This is possible because the requests have different IDs.

When Request1 takes longer to process than Request2, the server can even process Request2 and send its result without waiting for Request1 to complete.
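
The client-side bookkeeping that makes this work can be sketched as a table of pending requests keyed by message ID. The `PipelinedClient` class, its method names, and the dict-based messages below are hypothetical and exist only for this illustration; the standard library `Future` is constructed directly to keep the example short, and no network is involved.

```python
from concurrent.futures import Future


class PipelinedClient:
    """Tracks in-flight requests by message ID (illustrative sketch)."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}  # message ID -> Future awaiting that response

    def send(self, method, params):
        msgid = self._next_id
        self._next_id += 1
        future = Future()
        self._pending[msgid] = future
        request = {"msgid": msgid, "method": method, "params": params}
        return request, future

    def on_response(self, response):
        # The ID, not the arrival order, selects the waiting caller.
        future = self._pending.pop(response["msgid"])
        future.set_result(response["result"])


client = PipelinedClient()
req1, fut1 = client.send("heavy_task", [10])
req2, fut2 = client.send("light_task", [20])

# The server finishes Request2 first and replies out of order.
client.on_response({"msgid": req2["msgid"], "result": "light done"})
client.on_response({"msgid": req1["msgid"], "result": "heavy done"})

print(fut1.result(), "|", fut2.result())  # prints heavy done | light done
```

Each caller still receives the result of its own request even though the responses arrived in reverse order.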

IDL Support

Although MessagePack-RPC supports dynamic typing, it also supports IDL (Interface Definition Language). Dynamic typing is very handy for scripting languages, but in some languages, such as Java, the built-in types do not map well to the MessagePack types. For example, Java distinguishes strings from raw byte arrays while MessagePack doesn't.

IDL support eliminates this issue. Programmers need to define the interface and the types in advance, but in return the data can be mapped into the language's native types.

Dynamic Typing

Because every MessagePack message carries its type information alongside the data, clients and servers generally do not need schemas or interface definitions. This makes it convenient to use in both dynamically typed and statically typed languages.
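
The sketch below shows what carrying the type information alongside the data means in practice: the first byte of each value tells the decoder what follows, so no schema is needed. The `unpack_small` function is a hypothetical decoder for a tiny subset of MessagePack (nil, booleans, integers 0 to 127, short strings, and short arrays and maps). Decoding the short-string family as UTF-8 text is an assumption of this example.

```python
def unpack_small(data, pos=0):
    """Decode one value from a tiny MessagePack subset.

    Returns (value, position of the next unread byte).
    """
    tag = data[pos]
    if tag <= 0x7F:  # positive fixint: the byte is the value
        return tag, pos + 1
    if 0x80 <= tag <= 0x8F:  # small map: entry count in the low 4 bits
        result, pos = {}, pos + 1
        for _ in range(tag & 0x0F):
            key, pos = unpack_small(data, pos)
            value, pos = unpack_small(data, pos)
            result[key] = value
        return result, pos
    if 0x90 <= tag <= 0x9F:  # small array: item count in the low 4 bits
        items, pos = [], pos + 1
        for _ in range(tag & 0x0F):
            item, pos = unpack_small(data, pos)
            items.append(item)
        return items, pos
    if 0xA0 <= tag <= 0xBF:  # short string: byte length in the low 5 bits
        end = pos + 1 + (tag & 0x1F)
        return data[pos + 1:end].decode("utf-8"), end
    if tag == 0xC0:
        return None, pos + 1
    if tag == 0xC2:
        return False, pos + 1
    if tag == 0xC3:
        return True, pos + 1
    raise ValueError(f"tag 0x{tag:02x} is outside this sketch's subset")


raw = b"\x82\xa4name\xa3bob\xa3age\x1e"
value, consumed = unpack_small(raw)
print(value)  # prints {'name': 'bob', 'age': 30}
```

The receiver recovers a map, two strings and an integer without being told in advance what the message contains.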

Connection Pooling

If you use TCP as the transport layer, opening a connection between a client and the server can be expensive. The MessagePack-RPC library automatically reuses connections that are already established, so users do not need to manage connections themselves.
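
The idea of reusing established connections can be sketched as a cache keyed by server address. Everything here is hypothetical: `ConnectionPool` is not the library's class, and `FakeConnection` stands in for a real TCP connection so that the example runs without a network. A real pool would also have to handle broken or idle connections, which this sketch ignores.

```python
class FakeConnection:
    """Stand-in for a TCP connection so the sketch runs without a network."""

    opened = 0  # counts how many connections were ever created

    def __init__(self, host, port):
        FakeConnection.opened += 1
        self.address = (host, port)


class ConnectionPool:
    """Hands out one shared connection per (host, port) pair."""

    def __init__(self, connect=FakeConnection):
        self._connect = connect
        self._connections = {}  # (host, port) -> open connection

    def get(self, host, port):
        key = (host, port)
        if key not in self._connections:
            # The costly step: performed only on first use of an address.
            self._connections[key] = self._connect(host, port)
        return self._connections[key]


pool = ConnectionPool()
first = pool.get("127.0.0.1", 9090)
second = pool.get("127.0.0.1", 9090)  # reuses the existing connection
other = pool.get("127.0.0.1", 9091)   # different address, new connection
print(first is second, FakeConnection.opened)  # prints True 2
```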

Delayed Return

Event-driven I/O

To keep up with thousands of connections, the server must handle them concurrently and efficiently (see the C10K problem). The MessagePack-RPC implementations use an event-driven I/O architecture to address this.
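
As a rough illustration of the event-driven style, the sketch below serves several connections from a single thread using Python's standard `selectors` module. It is not taken from any MessagePack-RPC implementation: the `serve_until_idle` function and the uppercase "handler" are made up for the example, and `socket.socketpair()` replaces real network connections so that the code is self-contained.

```python
import selectors
import socket


def serve_until_idle(selector):
    """Single-threaded event loop: handle whichever sockets are readable."""
    handled = 0
    while True:
        events = selector.select(timeout=0.1)
        if not events:
            return handled  # nothing ready: stop (a real server keeps looping)
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(1024)
            if data:
                conn.sendall(data.upper())  # trivial stand-in for a handler
                handled += 1
            else:  # peer closed the connection
                selector.unregister(conn)
                conn.close()


selector = selectors.DefaultSelector()
clients, server_ends = [], []
for i in range(3):
    client_end, server_end = socket.socketpair()
    server_end.setblocking(False)
    selector.register(server_end, selectors.EVENT_READ)
    client_end.sendall(f"hello {i}".encode())
    clients.append(client_end)
    server_ends.append(server_end)

handled = serve_until_idle(selector)
replies = [client.recv(1024).decode() for client in clients]
print(handled, replies)

# Tidy up the sockets used by the demonstration.
selector.close()
for sock in clients + server_ends:
    sock.close()
```

The loop never blocks on one connection while another has data waiting, which is what lets a single thread keep up with many clients.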

The List of Related Projects

These are other cross-language RPC systems.

  1. Jun 09, 2011

    In the Parallel Pipelining picture, shouldn't the msgid's match the color?  I think they are mislabeled.  If the "light task" returns first, I would expect it to keep the same message id.

  2. Jun 22, 2011

    Thank you! The msgid was wrong.

  3. Mar 29, 2012

    Does the parallel pipelining work in ruby implementation of msgpack rpc?

  4. Dec 31, 2012

    Although it is asked as an add to the format specification, I believe it is more relevant here to add in a layer wrapping around the underlying RPC to do a checksum of the underlying packed message (CRC32 or MD5 would be easy formats to start with).  There is a real danger otherwise of corruption if you're using this as your library to pass millions of messages between your servers.  Is this anywhere on the roadmap?