<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.11"/>
<title>MPack: Protocol Clarifications</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { init_search(); });
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="doxygen-mpack-css.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname">MPack
&#160;<span id="projectnumber">0.8.2</span>
</div>
<div id="projectbrief">A C encoding/decoding library for the MessagePack serialization format.</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.11 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main&#160;Page</span></a></li>
<li class="current"><a href="pages.html"><span>Pages</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li>
<div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</li>
</ul>
</div>
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
</div><!-- top -->
<div class="header">
<div class="headertitle">
<div class="title">Protocol Clarifications </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>The MessagePack specification contains overlap between different types, allowing the same data to be encoded in many different representations. For example there are overlong sequences, signed/unsigned overlap for non-negative integers, different floating-point widths, raw/str/bin types, and more.</p>
<p>MessagePack also does not specify how types should be interpreted, such as whether maps are ordered, whether strings can be treated as binary data, whether integers can be treated as real numbers, and so on.</p>
<p>MPack currently implements the <a href="https://github.com/msgpack/msgpack/blob/0b8f5ac67cdd130f4d4d4fe6afb839b989fdb86a/spec.md?">v5/2.0 MessagePack specification</a>. This document describes MPack's implementation.</p>
<h2>Overlong Sequences</h2>
<p>MessagePack provides several different widths for many types. For example fixint, int8, int16, int32, int64 for integers; fixstr, str8, str16, str32 for strings; and so on. The number -20 could be encoded as <code>EC</code>, <code>D0 EC</code>, <code>D1 FF EC</code>, and more.</p>
<p>UTF-8 has similar overlap between codepoint widths. In UTF-8, inefficient representations of codepoints are called "overlong sequences", and decoders are required to treat them as errors. MessagePack on the other hand does not have any restrictions on inefficient representations.</p>
<ul>
<li>When encoding any value (other than floating point numbers), MPack always chooses the shortest representation. This is the case for integers and all compound types.</li>
<li>When encoding a string, MPack always uses the str8 type if possible (which is not mapped to a "raw" from the old version of the MessagePack spec, since there was no "raw8" type.)</li>
<li>When encoding an ext type, MPack always chooses a fixext type if available. This means if the ext size is 1, 2, 4, 8 or 16, the ext start tag will be encoded in two bytes (fixext and the ext type.) Otherwise MPack chooses the shortest representation.</li>
<li>When decoding any value, MPack allows any length representation. Inefficiently encoded sequences are not an error. So <code>EC</code>, <code>D0 EC</code> and <code>D1 FF EC</code> would all be decoded to the same value, <code>mpack_type_int</code> of value -20.</li>
</ul>
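<p>The decoding rule above can be illustrated without MPack itself. The following is a minimal sketch, assuming a hypothetical helper named <code>decode_small_int()</code> that handles only the fixint, int 8 and int 16 formats; it is not MPack's decoder, only a demonstration that all three encodings of -20 yield the same value.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of MPack): decodes one MessagePack integer
 * from buf into *out. Returns the number of bytes consumed, or 0 if the
 * input is not a complete fixint, int 8 or int 16. */
static size_t decode_small_int(const uint8_t *buf, size_t len, int64_t *out) {
    if (len < 1)
        return 0;
    uint8_t tag = buf[0];
    if (tag <= 0x7F) {                  /* positive fixint: 0..127 */
        *out = tag;
        return 1;
    }
    if (tag >= 0xE0) {                  /* negative fixint: -32..-1 */
        *out = (int64_t)tag - 256;
        return 1;
    }
    if (tag == 0xD0 && len >= 2) {      /* int 8 */
        *out = (buf[1] >= 0x80) ? (int64_t)buf[1] - 256 : (int64_t)buf[1];
        return 2;
    }
    if (tag == 0xD1 && len >= 3) {      /* int 16, big-endian payload */
        uint16_t raw = (uint16_t)((buf[1] << 8) | buf[2]);
        *out = (raw >= 0x8000) ? (int64_t)raw - 65536 : (int64_t)raw;
        return 3;
    }
    return 0;                           /* other formats omitted in this sketch */
}
```

<p>Feeding this helper the byte sequences <code>EC</code>, <code>D0 EC</code> and <code>D1 FF EC</code> stores -20 in <code>*out</code> each time; only the number of bytes consumed differs.</p>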
<p>As of this writing, all C and C++ libraries seem to write data in the shortest representation, and none forbid overlong sequences (including the reference implementation.)</p>
<h2>Integer Signedness</h2>
<p>MessagePack allows non-negative integers to be encoded both as signed and unsigned, and the specification does not say how a library should serialize them. For example, the number 1000 in the shortest representation could be encoded as either <code>CD 03 E8</code> (unsigned) or <code>D1 03 E8</code> (signed).</p>
<ul>
<li>When encoding, MPack writes all non-negative integers in the shortest unsigned integer type, regardless of the signedness of the input type. (The signedness of the type is discarded.)</li>
<li>When decoding as a dynamic tag or node, MPack returns the signedness of the serialized type. (This means you always need to handle both <code>mpack_type_int</code> and <code>mpack_type_uint</code>, regardless of whether you want a signed or unsigned integer, and regardless of whether you want a negative or non-negative integer.)</li>
<li>When retrieving an integer with the Expect or Node APIs, MPack will automatically convert between signedness without loss of data. (For example, if you call <code><a class="el" href="group__expect.html#gab217c0e2062b87129f948c6359c3825a" title="Reads an unsigned int. ">mpack_expect_uint()</a></code> or <code><a class="el" href="group__node.html#ga28f422427efed19ce0a292b135a13067" title="Returns the unsigned int value of the node. ">mpack_node_uint()</a></code>, MPack will allow both signed and unsigned data, and will flag an error if the type is signed with a negative value. Likewise, if you call <code><a class="el" href="group__expect.html#ga728dc9cb317871bbf3360361a713d471" title="Reads an 8-bit signed integer. ">mpack_expect_i8()</a></code> or <code><a class="el" href="group__node.html#ga5bc29e58ae03edd0d764eac30b6ab38a" title="Returns the 8-bit signed value of the node. ">mpack_node_i8()</a></code>, MPack will allow both signed and unsigned data, and will flag an error for values below <code>INT8_MIN</code> or above <code>INT8_MAX</code>.) The expect and node integer functions allow an integer of any size or signedness, and only check that it falls within the range of the requested type.</li>
</ul>
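<p>As an illustration of the encoding rule, here is a minimal sketch of a writer that discards the signedness of its input. The names <code>put_be()</code>, <code>encode_uint()</code> and <code>encode_int()</code> are hypothetical and are not MPack functions; the format bytes come from the MessagePack specification.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the low n bytes of v to out in big-endian order. */
static void put_be(uint8_t *out, uint64_t v, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Hypothetical encoder: shortest unsigned form. Returns bytes written (at most 9). */
static size_t encode_uint(uint8_t *out, uint64_t v) {
    if (v <= 0x7F)       { out[0] = (uint8_t)v; return 1; }            /* positive fixint */
    if (v <= 0xFF)       { out[0] = 0xCC; put_be(out + 1, v, 1); return 2; } /* uint 8 */
    if (v <= 0xFFFF)     { out[0] = 0xCD; put_be(out + 1, v, 2); return 3; } /* uint 16 */
    if (v <= 0xFFFFFFFF) { out[0] = 0xCE; put_be(out + 1, v, 4); return 5; } /* uint 32 */
    out[0] = 0xCF; put_be(out + 1, v, 8); return 9;                     /* uint 64 */
}

/* Signed input: non-negative values take the unsigned path, so the
 * signedness of the C type never reaches the wire. */
static size_t encode_int(uint8_t *out, int64_t v) {
    if (v >= 0)
        return encode_uint(out, (uint64_t)v);
    if (v >= -32)       { out[0] = (uint8_t)v; return 1; }              /* negative fixint */
    if (v >= INT8_MIN)  { out[0] = 0xD0; put_be(out + 1, (uint64_t)v, 1); return 2; }
    if (v >= INT16_MIN) { out[0] = 0xD1; put_be(out + 1, (uint64_t)v, 2); return 3; }
    if (v >= INT32_MIN) { out[0] = 0xD2; put_be(out + 1, (uint64_t)v, 4); return 5; }
    out[0] = 0xD3; put_be(out + 1, (uint64_t)v, 8); return 9;
}
```

<p>With this sketch, <code>encode_int()</code> given 1000 writes <code>CD 03 E8</code>, the same bytes <code>encode_uint()</code> would produce, while -20 is written as the single byte <code>EC</code>.</p>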
<p>A library could technically preserve the signedness of variables by writing any signed variable as int8/int16/int32/int64 or a negative fixint even if the value is non-negative. This does not seem to be the intent of the specification. For example there are no positive signed fixint values, so encoding the signed int with value 1 would take two bytes (<code>D0 01</code>) to preserve signedness. This is why MPack discards signedness.</p>
<p>As of this writing, all C and C++ libraries supporting the modern MessagePack specification appear to discard signedness and write all non-negative ints as unsigned (including the reference implementation.)</p>
<h2>Floating Point Numbers</h2>
<p>In addition to the integer types, the MessagePack specification includes "float 32" and "float 64" types for real numbers.</p>
<ul>
<li>When encoding, MPack writes real numbers as the original width of the data (so <code><a class="el" href="group__writer.html#ga285ce9cc180f3b623fbf07c67e2f9ca3" title="Writes a float. ">mpack_write_float()</a></code> writes a "float 32", and <code><a class="el" href="group__writer.html#ga2bc0c6c6416b9d808be718b41681fd25" title="Writes a double. ">mpack_write_double()</a></code> writes a "float 64".)</li>
<li>When decoding as a dynamic tag or node, MPack returns the width of the serialized type. (It is recommended to handle both <code>mpack_type_float</code> and <code>mpack_type_double</code> (or neither) since other libraries may write real numbers in any width.)</li>
<li>When expecting a real number with the Expect API, or when getting a float or double from a node in the Node API, MPack includes two different sets of functions:<ul>
<li>The lax versions are the default. These will allow any integer or real type and convert it to the expected type, which may involve loss of precision. These include <code><a class="el" href="group__expect.html#ga49c6bea0c4d7e14a636d703ffe304264" title="Reads a number, returning the value as a float. ">mpack_expect_float()</a></code>, <code><a class="el" href="group__expect.html#ga51a47ece249cd4d6d795bf3211fde745" title="Reads a number, returning the value as a double. ">mpack_expect_double()</a></code>, <code><a class="el" href="group__node.html#ga66fed30759650f65e7edaef79b2c73f4" title="Returns the float value of the node. ">mpack_node_float()</a></code> and <code><a class="el" href="group__node.html#ga22bfc52e19de4f1b5f9db6a9373cce06" title="Returns the double value of the node. ">mpack_node_double()</a></code>.</li>
<li>The strict versions, suffixed by <code>_strict</code>, will allow only real numbers, and only of a width of at most that of the expected type. So <code><a class="el" href="group__node.html#ga2b7dce2bf350091c232cf556ee50ace9" title="Returns the float value of the node. ">mpack_node_float_strict()</a></code> or <code><a class="el" href="group__expect.html#ga4d56f4be6f5376ebfa6fcd0ebac6cce1" title="Reads a float. ">mpack_expect_float_strict()</a></code> allow only "float 32", while <code><a class="el" href="group__node.html#ga74a690ab822846b19baa59f32cba766a" title="Returns the double value of the node. ">mpack_node_double_strict()</a></code> or <code><a class="el" href="group__expect.html#gad676ff64ce7933cade0c85da8a83799f" title="Reads a double. ">mpack_expect_double_strict()</a></code> allow both "float 32" and "float 64".<ul>
<li>If you want to allow only a "float 64", you would have to read a tag or check the node type and make sure it contains <code>mpack_type_double</code>.</li>
<li>If you want a <code>float</code> version that allows either "float 32" or "float 64" but not integers, you could use <code>(float)<a class="el" href="group__node.html#ga74a690ab822846b19baa59f32cba766a" title="Returns the double value of the node. ">mpack_node_double_strict()</a></code> or <code>(float)<a class="el" href="group__expect.html#gad676ff64ce7933cade0c85da8a83799f" title="Reads a double. ">mpack_expect_double_strict()</a></code>. But if you are using <code>float</code> you probably don't care much about precision anyway so you should just use <code><a class="el" href="group__node.html#ga66fed30759650f65e7edaef79b2c73f4" title="Returns the float value of the node. ">mpack_node_float()</a></code> or <code><a class="el" href="group__expect.html#ga49c6bea0c4d7e14a636d703ffe304264" title="Reads a number, returning the value as a float. ">mpack_expect_float()</a></code>.</li>
</ul>
</li>
</ul>
</li>
</ul>
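<p>The difference between the lax and strict getters can be sketched with a small tagged value. The <code>num</code> type and the three functions below are hypothetical stand-ins, not MPack's tag or node types; they only model which serialized types each kind of getter accepts.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a decoded number; MPack's real types differ. */
typedef enum { NUM_INT, NUM_UINT, NUM_FLOAT, NUM_DOUBLE } num_type;

typedef struct {
    num_type type;
    union { int64_t i; uint64_t u; float f; double d; } v;
} num;

/* Lax: any integer or real is accepted and converted, possibly losing precision. */
static bool get_float_lax(num n, float *out) {
    switch (n.type) {
        case NUM_INT:    *out = (float)n.v.i; return true;
        case NUM_UINT:   *out = (float)n.v.u; return true;
        case NUM_FLOAT:  *out = n.v.f;        return true;
        case NUM_DOUBLE: *out = (float)n.v.d; return true;
    }
    return false;
}

/* Strict float: only a serialized "float 32" is accepted. */
static bool get_float_strict(num n, float *out) {
    if (n.type != NUM_FLOAT)
        return false;
    *out = n.v.f;
    return true;
}

/* Strict double: "float 32" or "float 64" is accepted, never an integer. */
static bool get_double_strict(num n, double *out) {
    if (n.type == NUM_FLOAT)  { *out = n.v.f; return true; }
    if (n.type == NUM_DOUBLE) { *out = n.v.d; return true; }
    return false;
}
```

<p>In this model a strict getter never narrows: widening a "float 32" to <code>double</code> is exact, whereas narrowing a "float 64" to <code>float</code> is not, which is why only the lax getter performs it.</p>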
<p>MessagePack libraries in dynamic languages may support an option to generate floats instead of doubles for space efficiency. If you're converting data from JSON, you could use <a href="https://github.com/ludocode/msgpack-tools">json2msgpack -f</a> to convert to floats instead of doubles.</p>
<h2>Map Ordering</h2>
<p>MessagePack does not specify the ordering of map key/value pairs. Key/value pairs have a well-defined order when serialized, but the specification does not say whether implementations should observe that order when encoding, preserve it when decoding, or map it to an ordered associative array when deserialized.</p>
<p>MPack always preserves map ordering. Key/value pairs are written in the given order in the write API, read in the serialized order in the read and expect APIs, and provided in their original serialized order in the Node API. In particular this means <code><a class="el" href="group__node.html#ga74e16d4e1723a959ec90365078b9668c" title="Returns the key node in the given map at the given index. ">mpack_node_map_key_at()</a></code> and <code><a class="el" href="group__node.html#ga88b570192e47f589bc49cca2e88583b4" title="Returns the value node in the given map at the given index. ">mpack_node_map_value_at()</a></code> are always ordered as stored in the original serialized data. An application using only MPack can always assume a fixed map order.</p>
<p>However, MPack strongly recommends writing code that allows for map re-ordering. This is for two reasons:</p>
<ul>
<li>MessagePack is often used to interface with languages that do not preserve map ordering. For example the msgpack-python library in Python unpacks a map to a <code>dict</code>, not an <code>OrderedDict</code>. Many languages use hashtables to store keys, so MessagePack encoded by these languages will have map key/value pairs in a random order. The order may be different between compiler or interpreter versions even for identical map content.</li>
<li>MessagePack is designed to be at least partly compatible with JSON. It is sometimes converted from JSON, and is sometimes recommended as an efficient replacement for JSON. Unlike MessagePack, JSON explicitly allows map re-ordering. Two JSON documents that have re-ordered key/value pairs but are otherwise the same are considered equivalent.</li>
</ul>
<p>MPack contains functions to make it easy to parse messages with re-ordered map pairs. For the Node API, lookup functions such as <code><a class="el" href="group__node.html#ga6ce515ec366036b1602bafe65a56ef7e" title="Returns the value node in the given map for the given null-terminated string key. ...">mpack_node_map_cstr()</a></code> and <code><a class="el" href="group__node.html#gad85dcc667be1797e106fc106346ee828" title="Returns the value node in the given map for the given integer key. ">mpack_node_map_int()</a></code> will find the value for a given key regardless of ordering. For the expect API, the functions <code><a class="el" href="group__expect.html#gada27a479e6ad56faaa14528d1a3dfb26" title="Expects a string map key matching one of the strings in the given key list, marking it as found in th...">mpack_expect_key_cstr()</a></code> and <code><a class="el" href="group__expect.html#ga78f80d9fabe961d661eabe95732c2d59" title="Expects an unsigned integer map key between 0 and count-1, marking it as found in the given bool arra...">mpack_expect_key_uint()</a></code> can be used to switch on a key in a read loop, which allows parsing map pairs in any order.</p>
<h2>Duplicate Map Keys</h2>
<p>MessagePack has no restrictions against duplicate keys in a map, so MPack allows duplicate keys in maps. Iterating over a map with the Node or Reader API will provide key/value pairs in serialized order and will not flag any errors for duplicates. However, helper functions that compare keys (such as the "lookup" or "match" functions) do check for duplicates.</p>
<p>In the Node API, the MPack lookup functions that search for a given key to find its value always check for duplicates. They are meant to provide a unique value for a given key. For example <code><a class="el" href="group__node.html#ga6ce515ec366036b1602bafe65a56ef7e" title="Returns the value node in the given map for the given null-terminated string key. ...">mpack_node_map_cstr()</a></code> and <code><a class="el" href="group__node.html#gad85dcc667be1797e106fc106346ee828" title="Returns the value node in the given map for the given integer key. ">mpack_node_map_int()</a></code> will always check the whole map and will flag an error if a duplicate key is found. If you want to find multiple values for a given key, you will need to iterate over them manually with <code><a class="el" href="group__node.html#ga74e16d4e1723a959ec90365078b9668c" title="Returns the key node in the given map at the given index. ">mpack_node_map_key_at()</a></code> and <code><a class="el" href="group__node.html#ga88b570192e47f589bc49cca2e88583b4" title="Returns the value node in the given map at the given index. ">mpack_node_map_value_at()</a></code>.</p>
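<p>The lookup behaviour described above can be sketched independently of MPack. In the following example the <code>pair</code> type and <code>lookup_unique()</code> are hypothetical; the point is only that the search continues past the first match so that a second occurrence of the key can be reported.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical map entry; a real node map would hold arbitrary values. */
typedef struct { const char *key; int value; } pair;

/* Returns a pointer to the unique value stored under key, or NULL if the key
 * is missing. If the key occurs more than once, sets *dup and returns NULL.
 * The scan does not stop at the first match, so duplicates are detected. */
static const int *lookup_unique(const pair *map, size_t count,
                                const char *key, bool *dup) {
    const int *found = NULL;
    *dup = false;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(map[i].key, key) != 0)
            continue;
        if (found != NULL) {        /* second occurrence: report the duplicate */
            *dup = true;
            return NULL;
        }
        found = &map[i].value;
    }
    return found;
}
```

<p>The cost of this design is that every successful lookup visits every pair, which is acceptable for the small maps typical of messages but worth knowing for large ones.</p>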
<p>In the Expect API, the key match functions (such as <code><a class="el" href="group__expect.html#gada27a479e6ad56faaa14528d1a3dfb26" title="Expects a string map key matching one of the strings in the given key list, marking it as found in th...">mpack_expect_key_cstr()</a></code> and <code><a class="el" href="group__expect.html#ga78f80d9fabe961d661eabe95732c2d59" title="Expects an unsigned integer map key between 0 and count-1, marking it as found in the given bool arra...">mpack_expect_key_uint()</a></code>) check for duplicate keys, and will flag an error when a duplicate is found. If you want to use the match functions with duplicates, you can toggle off the <code>bool</code> flag corresponding to a found key to allow it to be matched again. This allows implicit checking of duplicate keys, with an opt-in to safely handle duplicates in order.</p>
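<p>The role of the <code>bool</code> flags can be modelled in a few lines. The function <code>match_key()</code> below is a hypothetical sketch of this matching scheme, not the signature or implementation of the MPack functions: it returns the index of a recognized key, or <code>count</code> for an unrecognized or duplicate key, and sets an error flag only in the duplicate case.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher: looks up key in keys[0..count-1]. On the first match
 * it marks found[i] and returns i. If found[i] was already set, the key is a
 * duplicate: *error is set and count is returned. Unknown keys return count
 * without setting *error. */
static size_t match_key(const char *key, const char *const keys[],
                        bool found[], size_t count, bool *error) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(key, keys[i]) != 0)
            continue;
        if (found[i]) {             /* seen before: flag the duplicate */
            *error = true;
            return count;
        }
        found[i] = true;
        return i;
    }
    return count;                   /* unrecognized key; caller may skip its value */
}
```

<p>Because the result is an index, a read loop can <code>switch</code> on it and handle pairs in whatever order they arrive. Clearing <code>found[i]</code> after handling a key is the opt-in that lets the same key match again.</p>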
<p>Despite the allowance for duplicate keys, MPack recommends against providing multiple values for the same key in order to more safely interface with other languages and formats (as with the Map Ordering recommendations above.) A safer and more explicit way to accomplish this is to simply use an array containing the desired values as the single value for a map key.</p>
<h2>v4 Compatibility</h2>
<p>The MessagePack <a href="https://github.com/msgpack/msgpack/blob/acbcdf6b2a5a62666987c041124a10c69124be0d/spec-old.md?">v4/1.0 spec</a> did not distinguish between strings and binary data. It only provided the "raw" type in widths of fixraw, raw16 and raw32, which was used for both. The <a href="https://github.com/msgpack/msgpack/blob/0b8f5ac67cdd130f4d4d4fe6afb839b989fdb86a/spec.md?">v5/2.0 spec</a> on the other hand renames the raw type to str, adds the bin type to represent binary data, and adds an 8-bit width for strings. This means that even when binary data is not used, the new specification is not backwards compatible with the old one that expects raw to contain strings, because a modern encoder will use the str8 type. The new specification also adds an ext type to distinguish between arbitrary binary blobs and MessagePack extensions.</p>
<ul>
<li>MPack always encodes with the str8 type for strings when possible. This means that MessagePack encoded with MPack is not backwards compatible with decoders that only understand the raw types from the old specification. This matches the behaviour of other C/C++ libraries that support the modern spec, including the reference implementation.</li>
<li>Since MPack allows overlong sequences, it does not require that the str8 type be used, so data encoded with an old-style encoder will be parsed correctly by MPack (with raw types parsed as strings.)</li>
</ul>
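<p>The incompatibility comes down to a single header byte. The sketch below uses a hypothetical function, <code>write_str_header()</code>, with a hypothetical <code>v4_compat</code> flag that skips the str 8 format; it is an illustration of the two wire formats, not an MPack feature.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the header for a string of len bytes and
 * returns the header size. With v4_compat set, str 8 (0xD9) is never used,
 * so the output contains only formats that existed as "raw" in the old spec. */
static size_t write_str_header(uint8_t *out, uint32_t len, bool v4_compat) {
    if (len <= 31) {                        /* fixstr (formerly fixraw) */
        out[0] = (uint8_t)(0xA0 | len);
        return 1;
    }
    if (len <= 0xFF && !v4_compat) {        /* str 8: new in the v5 spec */
        out[0] = 0xD9;
        out[1] = (uint8_t)len;
        return 2;
    }
    if (len <= 0xFFFF) {                    /* str 16 (formerly raw 16) */
        out[0] = 0xDA;
        out[1] = (uint8_t)(len >> 8);
        out[2] = (uint8_t)len;
        return 3;
    }
    out[0] = 0xDB;                          /* str 32 (formerly raw 32) */
    out[1] = (uint8_t)(len >> 24);
    out[2] = (uint8_t)(len >> 16);
    out[3] = (uint8_t)(len >> 8);
    out[4] = (uint8_t)len;
    return 5;
}
```

<p>For a 40-byte string the modern header is <code>D9 28</code>, which an old decoder does not recognize, while the compatible header is <code>DA 00 28</code>. A decoder that accepts overlong sequences reads both.</p>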
<p>However, other libraries typically also include functions to write an old-style raw in order to create backwards-compatible data, such as <code>msgpack_pack_v4raw()</code> in the reference implementation. MPack does not. There hasn't been any demand to create backwards-compatible data so far. If there were, I would be more likely to implement an option on a writer to always generate an old-style raw for both str and bin (and to flag an error if ext is used.) This is not implemented yet, and will hopefully never be implemented if libraries for the old specification can be phased out. If you need this feature, please let me know. </p>
</div></div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated by &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.11
</small></address>
</body>
</html>