
Notice

This page describes the MessagePack data format, which you need to understand in order to develop MessagePack language bindings.

MessagePack format specification

MessagePack stores type information in the serialized data, so each value is stored in either *type-data* or *type-length-data* form.
MessagePack supports the following types:

  • Fixed length types
    • Integers
    • nil
    • boolean
    • Floating point
  • Variable length types
    • Raw bytes
  • Container types
    • Arrays
    • Maps

Each type has one or more serialization formats:

  • Fixed length types
    • Integers
      • positive fixnum
      • negative fixnum
      • uint 8
      • uint 16
      • uint 32
      • uint 64
      • int 8
      • int 16
      • int 32
      • int 64
    • Nil
      • nil
    • Boolean
      • true
      • false
    • Floating point
      • float
      • double
  • Variable length types
    • Raw bytes
      • fix raw
      • raw 16
      • raw 32
  • Container types
    • Arrays
      • fix array
      • array 16
      • array 32
    • Maps
      • fix map
      • map 16
      • map 32

To serialize a string, encode it as UTF-8 and store the resulting bytes using a Raw type.

See this thread for the reasons msgpack doesn't have a string type: https://github.com/msgpack/msgpack/issues/121

Integers

positive fixnum

save an integer within the range [0, 127] in 1 byte. The byte is the value itself.

negative fixnum

save an integer within the range [-32, -1] in 1 byte. The byte is the value's two's-complement representation (0xe0 to 0xff).
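The two fixnum formats can be sketched as follows. This is a minimal illustration; `pack_fixnum` and `unpack_fixnum` are hypothetical helper names, not part of any MessagePack library.

```python
def pack_fixnum(n: int) -> bytes:
    """Encode an integer in [-32, 127] as a single fixnum byte."""
    if 0 <= n <= 127:
        # positive fixnum (0xxxxxxx): the byte is the value itself
        return bytes([n])
    if -32 <= n <= -1:
        # negative fixnum (111xxxxx): the byte is the low 8 bits of the
        # two's-complement value, so -1 -> 0xff and -32 -> 0xe0
        return bytes([n & 0xFF])
    raise ValueError("fixnum range is [-32, 127]")


def unpack_fixnum(b: int) -> int:
    """Decode a single fixnum byte back to an integer."""
    if b <= 0x7F:
        return b
    if b >= 0xE0:
        return b - 0x100  # undo the two's-complement wrap
    raise ValueError(f"0x{b:02x} is not a fixnum")


print(pack_fixnum(100).hex())  # 64
print(pack_fixnum(-1).hex())   # ff
```

Because the type and the value share one byte, no separate length or payload follows.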

uint 8

save an unsigned 8-bit integer in 2 bytes.

uint 16

save an unsigned 16-bit big-endian integer in 3 bytes.

uint 32

save an unsigned 32-bit big-endian integer in 5 bytes.

uint 64

save an unsigned 64-bit big-endian integer in 9 bytes.
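Each uint format is one type byte (from the type chart) followed by the value in big-endian order. A minimal sketch using Python's `struct` module; `pack_uint` is a hypothetical name, and a real encoder would normally prefer positive fixnum for values 0 to 127, which this sketch ignores.

```python
import struct

# (largest value, type byte, struct format); ">" means big-endian
UINT_FORMATS = [
    (0xFF, 0xCC, ">B"),                # uint 8
    (0xFFFF, 0xCD, ">H"),              # uint 16
    (0xFFFFFFFF, 0xCE, ">I"),          # uint 32
    (0xFFFFFFFFFFFFFFFF, 0xCF, ">Q"),  # uint 64
]


def pack_uint(n: int) -> bytes:
    """Encode a non-negative integer in the narrowest uint format that fits."""
    if n < 0:
        raise ValueError("uint formats hold non-negative values only")
    for limit, tag, fmt in UINT_FORMATS:
        if n <= limit:
            # one type byte followed by the big-endian value
            return bytes([tag]) + struct.pack(fmt, n)
    raise OverflowError("value does not fit in 64 bits")


print(pack_uint(200).hex())    # ccc8
print(pack_uint(65536).hex())  # ce00010000
```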

int 8

save a signed 8-bit integer in 2 bytes.

int 16

save a signed 16-bit big-endian integer in 3 bytes.

int 32

save a signed 32-bit big-endian integer in 5 bytes.

int 64

save a signed 64-bit big-endian integer in 9 bytes.
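The signed formats work the same way, with the value stored in two's complement. A minimal sketch (hypothetical helper names) that picks the narrowest int format and decodes it again:

```python
import struct

# (smallest value, largest value, type byte, struct format); ">" = big-endian
INT_FORMATS = [
    (-(2**7), 2**7 - 1, 0xD0, ">b"),    # int 8
    (-(2**15), 2**15 - 1, 0xD1, ">h"),  # int 16
    (-(2**31), 2**31 - 1, 0xD2, ">i"),  # int 32
    (-(2**63), 2**63 - 1, 0xD3, ">q"),  # int 64
]


def pack_int(n: int) -> bytes:
    """Encode an integer in the narrowest signed int format that fits."""
    for lo, hi, tag, fmt in INT_FORMATS:
        if lo <= n <= hi:
            return bytes([tag]) + struct.pack(fmt, n)
    raise OverflowError("value does not fit in a signed 64-bit integer")


def unpack_int(buf: bytes) -> int:
    """Decode a signed int value whose type byte is buf[0]."""
    for _, _, tag, fmt in INT_FORMATS:
        if buf[0] == tag:
            size = struct.calcsize(fmt)
            return struct.unpack(fmt, buf[1:1 + size])[0]
    raise ValueError(f"0x{buf[0]:02x} is not a signed int type byte")


print(pack_int(-100).hex())  # d09c
```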

Nil

nil

save a nil in 1 byte.

Boolean

true

save a true in 1 byte.

false

save a false in 1 byte.
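nil, true and false carry no payload: the type byte is the whole value. A minimal sketch, with hypothetical helper names, mapping Python's `None`, `True` and `False` to the bytes listed in the type chart:

```python
NIL, FALSE, TRUE = 0xC0, 0xC2, 0xC3  # type bytes from the type chart


def pack_singleton(value) -> bytes:
    """Encode None, False or True as its one-byte format."""
    if value is None:
        return bytes([NIL])
    if value is True:
        return bytes([TRUE])
    if value is False:
        return bytes([FALSE])
    raise TypeError("expected None, True or False")


def unpack_singleton(b: int):
    """Decode a nil/false/true type byte."""
    table = {NIL: None, FALSE: False, TRUE: True}
    if b not in table:
        raise ValueError(f"0x{b:02x} is not nil, false or true")
    return table[b]
```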

Floating point

float

save a big-endian IEEE 754 single precision floating point number in 5 bytes.

double

save a big-endian IEEE 754 double precision floating point number in 9 bytes.
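Both floating point formats are a type byte followed by the IEEE 754 bits in big-endian order, which `struct` produces directly with `">f"` and `">d"`. A minimal sketch with hypothetical helper names:

```python
import struct


def pack_float(x: float) -> bytes:
    """float: 0xca followed by a big-endian IEEE 754 single."""
    return b"\xca" + struct.pack(">f", x)


def pack_double(x: float) -> bytes:
    """double: 0xcb followed by a big-endian IEEE 754 double."""
    return b"\xcb" + struct.pack(">d", x)


def unpack_real(buf: bytes) -> float:
    """Decode a float (0xca) or double (0xcb) value whose type byte is buf[0]."""
    if buf[0] == 0xCA:
        return struct.unpack(">f", buf[1:5])[0]
    if buf[0] == 0xCB:
        return struct.unpack(">d", buf[1:9])[0]
    raise ValueError(f"0x{buf[0]:02x} is not float or double")


print(pack_float(1.0).hex())         # ca3f800000
print(unpack_real(pack_float(0.1)))  # 0.10000000149011612
```

The second line shows why an encoder must choose between the two formats deliberately: storing a double as a single-precision float loses precision.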

Raw bytes

fix raw

save raw bytes up to 31 bytes. The length is stored in the low 5 bits of the type byte.

raw 16

save raw bytes up to (2^16)-1 bytes. The length is stored as an unsigned 16-bit big-endian integer.

raw 32

save raw bytes up to (2^32)-1 bytes. The length is stored as an unsigned 32-bit big-endian integer.
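A minimal sketch (hypothetical helper names) of choosing the Raw header by length, plus the string convention from the start of this page: encode as UTF-8, then store as Raw. Note that the length counts bytes, not characters, so "é" becomes a 2-byte raw.

```python
import struct


def pack_raw(data: bytes) -> bytes:
    """Prefix data with the narrowest Raw header that fits its length."""
    n = len(data)
    if n <= 31:
        header = bytes([0xA0 | n])               # fix raw: 101xxxxx
    elif n <= 0xFFFF:
        header = b"\xda" + struct.pack(">H", n)  # raw 16
    elif n <= 0xFFFFFFFF:
        header = b"\xdb" + struct.pack(">I", n)  # raw 32
    else:
        raise OverflowError("raw data longer than (2^32)-1 bytes")
    return header + data


def pack_str(s: str) -> bytes:
    """Strings are stored as their UTF-8 bytes in a Raw value."""
    return pack_raw(s.encode("utf-8"))


print(pack_str("abc").hex())  # a3616263
```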

Arrays

fix array

save an array of up to 15 elements. The number of elements is stored in the low 4 bits of the type byte.

array 16

save an array of up to (2^16)-1 elements. The number of elements is stored as an unsigned 16-bit big-endian integer.

array 32

save an array of up to (2^32)-1 elements. The number of elements is stored as an unsigned 32-bit big-endian integer.
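An array is a header carrying the element count, followed by each element's own encoding in order. A minimal sketch of the header choice; `array_header` is a hypothetical name.

```python
import struct


def array_header(count: int) -> bytes:
    """Header for an array holding `count` elements."""
    if count <= 15:
        return bytes([0x90 | count])                 # fix array: 1001xxxx
    if count <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", count)    # array 16
    if count <= 0xFFFFFFFF:
        return b"\xdd" + struct.pack(">I", count)    # array 32
    raise OverflowError("array too large")


# [1, 2, 3]: the header, then three positive fixnums
encoded = array_header(3) + bytes([1, 2, 3])
print(encoded.hex())  # 93010203
```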

Maps

fix map

save a map of up to 15 key-value pairs. The number of pairs is stored in the low 4 bits of the type byte.

map 16

save a map of up to (2^16)-1 key-value pairs. The number of pairs is stored as an unsigned 16-bit big-endian integer.

map 32

save a map of up to (2^32)-1 key-value pairs. The number of pairs is stored as an unsigned 32-bit big-endian integer.
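A map header carries the number of key-value pairs; after it, each key is followed immediately by its value. A minimal sketch with a hypothetical `map_header` helper:

```python
import struct


def map_header(pairs: int) -> bytes:
    """Header for a map holding `pairs` key-value pairs."""
    if pairs <= 15:
        return bytes([0x80 | pairs])               # fix map: 1000xxxx
    if pairs <= 0xFFFF:
        return b"\xde" + struct.pack(">H", pairs)  # map 16
    if pairs <= 0xFFFFFFFF:
        return b"\xdf" + struct.pack(">I", pairs)  # map 32
    raise OverflowError("map too large")


# {"a": 1}: one pair, then the key "a" (fix raw 0xa1 0x61), then the value 1
encoded = map_header(1) + b"\xa1a" + b"\x01"
print(encoded.hex())  # 81a16101
```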

Type Chart

Type              Binary      Hex
Positive FixNum   0xxxxxxx    0x00 - 0x7f
FixMap            1000xxxx    0x80 - 0x8f
FixArray          1001xxxx    0x90 - 0x9f
FixRaw            101xxxxx    0xa0 - 0xbf
nil               11000000    0xc0
reserved          11000001    0xc1
false             11000010    0xc2
true              11000011    0xc3
reserved          11000100    0xc4
reserved          11000101    0xc5
reserved          11000110    0xc6
reserved          11000111    0xc7
reserved          11001000    0xc8
reserved          11001001    0xc9
float             11001010    0xca
double            11001011    0xcb
uint 8            11001100    0xcc
uint 16           11001101    0xcd
uint 32           11001110    0xce
uint 64           11001111    0xcf
int 8             11010000    0xd0
int 16            11010001    0xd1
int 32            11010010    0xd2
int 64            11010011    0xd3
reserved          11010100    0xd4
reserved          11010101    0xd5
reserved          11010110    0xd6
reserved          11010111    0xd7
reserved          11011000    0xd8
reserved          11011001    0xd9
raw 16            11011010    0xda
raw 32            11011011    0xdb
array 16          11011100    0xdc
array 32          11011101    0xdd
map 16            11011110    0xde
map 32            11011111    0xdf
Negative FixNum   111xxxxx    0xe0 - 0xff
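A decoder's first step is to look at the first byte and find its row in the chart above. A minimal sketch of that lookup; `classify` and `SINGLE_BYTE_TYPES` are hypothetical names, and the names returned follow this page's chart.

```python
# Type bytes in 0xc0-0xdf that have a single fixed meaning in the chart
SINGLE_BYTE_TYPES = {
    0xC0: "nil", 0xC2: "false", 0xC3: "true",
    0xCA: "float", 0xCB: "double",
    0xCC: "uint 8", 0xCD: "uint 16", 0xCE: "uint 32", 0xCF: "uint 64",
    0xD0: "int 8", 0xD1: "int 16", 0xD2: "int 32", 0xD3: "int 64",
    0xDA: "raw 16", 0xDB: "raw 32",
    0xDC: "array 16", 0xDD: "array 32",
    0xDE: "map 16", 0xDF: "map 32",
}


def classify(b: int) -> str:
    """Name the format selected by a first byte b (0-255)."""
    if b <= 0x7F:
        return "positive fixnum"
    if b <= 0x8F:
        return "fix map"
    if b <= 0x9F:
        return "fix array"
    if b <= 0xBF:
        return "fix raw"
    if b >= 0xE0:
        return "negative fixnum"
    # 0xc0-0xdf: look up the exact byte; anything unlisted is reserved
    return SINGLE_BYTE_TYPES.get(b, "reserved")
```

The range checks rely on the chart's layout: the four fixed-prefix groups occupy contiguous byte ranges, so ordered comparisons are enough before falling back to an exact lookup.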
  1. Jun 11, 2011

    With the proliferation of 128-bit UUIDs (IPv6, ext4/btrfs, MS COM objects, etc.), I could argue that it makes sense to include a dedicated UINT128 type. While it could be handled with prefix 0xB0 (16-byte FixRaw), the format then loses the type information that otherwise makes it so nice, forcing the application to understand the low-level transport mechanism (a bad thing). Because 128-bit numbers are so often used as identifiers (I'm having trouble thinking of numerical applications needing 128-bit integers), I would tentatively propose using 0xC1 as "128-bit UUID", packing to 17 bytes while retaining type information.

    Is there a place for this, or a reason against it?
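The 0xB0 workaround mentioned above can be sketched as follows (hypothetical helper names, using Python's standard `uuid` module). It also illustrates the commenter's point: the decoder sees only 16 raw bytes, and only the application knows they are a UUID.

```python
import uuid


def pack_uuid_as_fixraw(u: uuid.UUID) -> bytes:
    """Store a UUID's 16 bytes as a fix raw value (0xa0 | 16 == 0xb0)."""
    return bytes([0xA0 | 16]) + u.bytes


def unpack_uuid_from_fixraw(buf: bytes) -> uuid.UUID:
    """Rebuild the UUID; the caller must already know this raw is a UUID."""
    if buf[0] != 0xB0:
        raise ValueError("expected a 16-byte fix raw")
    return uuid.UUID(bytes=buf[1:17])


u = uuid.UUID("12345678-1234-5678-1234-567812345678")
packed = pack_uuid_as_fixraw(u)
print(len(packed))  # 17
```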

    1. Oct 04, 2011

      On the other hand, there is no mention of a "string" type or a "date/time" type, which would perhaps be more useful than a 128-bit integer.

      If a "string" type were added, its encoding would also have to be specified in order to keep interoperability. In that case I would prefer UTF-8.

    2. Jan 07, 2012

      Msgpack, protocol buffers, even json are basic serialization formats. You build on top of them.

      I've modified the .NET implementation of MessagePack to output the string representation, as my code talks to Ruby clients that love strings everywhere. But you could just write your own definition of what a GUID is.

      As I see it, it is a two-layer process: the first layer uses MessagePack primitives, and the second adapts them to your language/API.

      Another example is dates. I just use strings and write ISO dates there.

      Javier.

  2. Jan 20, 2012

    I am missing a way to easily detect whether a string is msgpack-serialized. With other formats, e.g. PHP's serialize(), you can easily detect PHP-serialized data by looking at the second character (: or ;).

    I can receive a string in my code that might be serialized with PHP's serialize, igbinary, or msgpack. Furthermore, the msgpack_unserialize() function of the PHP extension does not return false or raise an error on arbitrary strings.

    Any suggestions that don't require performing the unserialize, apart from checking the other formats first?

  3. Apr 20, 2012

    I'm missing an extension point. That is, a way to encode the usage of more complex abstract types built from the basic ones. What I'd like is something like

    (type, value)

    where both type and value can be values of any msgpack type. Interpreting those values beyond parsing them individually would be outside the scope of msgpack, and other specifications may define how to interpret a value given a certain type. Parsers would by default decode/encode this to/from an object of a specific type with two properties, "type" and "value", but could allow registration of encoding/decoding functions in whatever fashion makes sense in the implementation language/framework.

    Implementation would be very much like FixArray, but the length is always exactly two. I propose using 11000001 followed by exactly two objects.

    Note: It generally makes very little sense for the type to be anything but a string, array or map. However, I don't see any real gain in artificially restricting this.

    Example usages (outside of the scope of the msgpack specification):

    • A msgpack date specification could define ("date", [2011,12,01]) to be interpreted as the date 1st of December, 2011.
    • A msgpack crypto specification could define ("signed-pgp", [signature, value]) to be interpreted as a value and the digital signature on the msgpack representation of that value.
    • A msgpack object orientation specification could define (["object", package_and_classname], { member:value,... }) to be interpreted as the instantiation of the specified class, with the specified members (maybe allowing classes to declare if they can be deserialized in this fashion).
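The proposed layout (0xc1, then the type object, then the value object) can be sketched as below. This is the commenter's proposal, not part of the format on this page (the chart lists 0xc1 as reserved); `encode`, `encode_tagged` and `TAGGED` are hypothetical, and `encode` covers only the few types the date example needs.

```python
import struct

TAGGED = 0xC1  # hypothetical: the byte this comment proposes (reserved in the chart)


def encode(obj) -> bytes:
    """Tiny encoder covering only str, small non-negative ints and short lists."""
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) > 31:
            raise NotImplementedError("this sketch only emits fix raw")
        return bytes([0xA0 | len(data)]) + data
    if isinstance(obj, int) and not isinstance(obj, bool):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                      # positive fixnum
        if 0 <= obj <= 0xFF:
            return b"\xcc" + bytes([obj])            # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)  # uint 16
        raise NotImplementedError("this sketch only emits small non-negative ints")
    if isinstance(obj, list):
        if len(obj) > 15:
            raise NotImplementedError("this sketch only emits fix array")
        return bytes([0x90 | len(obj)]) + b"".join(encode(x) for x in obj)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def encode_tagged(type_, value) -> bytes:
    """Proposed layout: the tag byte, then exactly two objects."""
    return bytes([TAGGED]) + encode(type_) + encode(value)


packed = encode_tagged("date", [2011, 12, 1])
print(packed.hex())  # c1a46461746593cd07db0c01
```

A decoder unaware of any date specification could still parse this into a generic ("date", [2011, 12, 1]) pair, which is the point of the proposal.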
  4. Jul 03, 2012

    I'm missing an optional CRC tag for basic security and detection of data corruption. If pack provides a CRC checksum, unpack should check the result against it and report an error otherwise. The CRC tag must be the last tag in the buffer, and the checksum does not cover itself or its type tag.

    There are several reserved bytes free for this. For example 0xc6 is similar to 0xce, only one bit apart.

    0xc6 crc with uint32 (optional)

    I implemented this for c and perl here:

    https://github.com/msgpack/msgpack/pull/114

    https://github.com/msgpack/msgpack-perl/pull/7

    Our main reason for using the optional CRC is to be sure that the writer has finished writing, by checking that the fifth-to-last byte is 0xc6; if it is not, we skip parsing and unpacking. Some writers buffer their output, and the reader optimistically tries to parse whenever the ctime changes.
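The proposed trailer (0xc6, then a uint32 checksum of everything before it) can be sketched as follows. The comment does not name a CRC variant, so this sketch assumes standard CRC-32 as computed by Python's `zlib.crc32`, and assumes the checksum is stored big-endian like other uint32 values; `append_crc` and `unwrap_crc` are hypothetical names.

```python
import struct
import zlib
from typing import Optional

CRC_TAG = 0xC6  # proposed in this comment; marked reserved in the type chart


def append_crc(payload: bytes) -> bytes:
    """Append the trailer: 0xc6, then a big-endian uint32 checksum of payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF  # assumption: standard CRC-32
    return payload + bytes([CRC_TAG]) + struct.pack(">I", crc)


def unwrap_crc(buf: bytes) -> Optional[bytes]:
    """Return the payload if the trailer is present and matches, else None."""
    # The fifth-to-last byte must be the tag; if not, the writer may still be writing.
    if len(buf) < 5 or buf[-5] != CRC_TAG:
        return None
    payload = buf[:-5]
    (expected,) = struct.unpack(">I", buf[-4:])
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        return None  # corrupted data
    return payload


msg = b"\x93\x01\x02\x03"  # the array [1, 2, 3]
framed = append_crc(msg)
```

The fifth-to-last-byte check is only a heuristic for "writing finished": 0xc6 can still occur as a data byte inside a raw or integer payload, so the checksum comparison is what actually confirms the buffer.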

  5. Jun 07, 2013

    MsgPack is great, and I completely agree about strings.

    But could we get this wiki into GitHub? This site was down recently, and that's a major problem. I cannot find the spec anywhere else on the web.

    I've copied the bare minimum here: https://github.com/cdunn2001/msgpack-spec

    That's only a README. On GitHub, the wiki attached to a repo is itself available via git. That would be perfect.

    1. Jul 01, 2013

      I agree.

      I think Confluence is over-engineered, and some committers don't check this wiki.

      We should move to GitHub Pages (or a similar site), combined with the mailing list, issues, and pull requests.

  6. Jun 07, 2013

    See wiki:

    https://github.com/cdunn2001/msgpack-spec/wiki/Spec

    and README:

    https://github.com/cdunn2001/msgpack-spec/blob/master/README.md

    The wiki looks a little better because of the surprising syntax-highlighting. Feel free to clone.