Comparison of data serialization formats
This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.
Overview
| Name | Creator/Maintainer | Based on | Standardized? | Specification | Binary? | Human-readable? | Supports references?[e] | Schema/IDL? | Standard APIs |
|---|---|---|---|---|---|---|---|---|---|
| ASN.1 | ISO, IEC, ITU-T | N/A | Yes | ISO/IEC 8824; X.680 series of ITU-T Recommendations | Yes (BER, DER, PER, or custom via ECN) | Yes (XER, GSER, or custom via ECN) | Partial[f] | Yes (built-in) | N/A |
| Bencode | Bram Cohen (creator), BitTorrent, Inc. (maintainer) | N/A | Yes | Part of BitTorrent protocol specification | Partially (numbers and delimiters are ASCII) | Partially | No | No | No |
| BSON | MongoDB | JSON | Yes | BSON Specification | Yes | No | No | No | No |
| Candle Markup | Henry Luo | XML, JSON, JavaFX | Yes | Candle Markup Reference | No | Yes | Yes (XPointer, XPath) | Yes (Candle Pattern Reference) | Yes (XQuery, XPath) |
| Comma-separated values (CSV) | RFC author: Yakov Shafranovich | N/A | Partial (myriad informal variants used) | RFC 4180 (among others) | No | Yes | No | No | No |
| D-Bus Message Protocol | freedesktop.org | N/A | Yes | D-Bus Specification | Yes | Yes (Type Signatures) | No | No | Yes (see D-Bus) |
| JSON | Douglas Crockford | JavaScript syntax | Yes | RFC 4627 | No, but see BSON | Yes | Partial (JSONPath, JPath, JSPON, json:select()) | Partial (JSON Schema Proposal, Kwalify, Rx, Itemscript Schema) | Partial: Clarinet (like SAX), JSONQuery (like XQuery), JSONPath (like XPath) |
| MessagePack | Sadayuki Furuhashi | JSON (loosely) | Yes | MessagePack format specification | Yes | No | No | No | No |
| Netstrings | Dan Bernstein | N/A | Yes | netstrings.txt | Yes | Yes | No | No | No |
| OGDL | Rolf Veen | ? | Yes | 1.0 Working draft | Yes (Binary 1.0 Working draft) | Yes | Yes (Path 1.0 Working draft) | Yes (Schema WD) | |
| Property list | NeXT (creator), Apple (maintainer) | ? | Partial | Public DTD for XML format | Yes[a] | Yes[b] | No | ? | Cocoa, CoreFoundation, OpenStep, GnuStep |
| Protocol Buffers | Google | N/A | Partial | Developer Guide: Encoding | Yes | Partial[d] | No | Yes (built-in) | |
| S-expressions | Internet Draft author: Ron Rivest | Lisp, Netstrings | Partial (largely de facto) | "S-Expressions" Internet Draft | Yes ("Canonical representation") | Yes ("Advanced transport representation") | No | No | |
| Sereal | Yves Orton, Steffen Müller et al. | N/A | Yes | Sereal Specification | Yes | No | Yes | No | No |
| Structured Data eXchange Formats | Max Wildgrube | N/A | Yes | RFC 3072 | Yes | No | No | No | |
| Thrift | Facebook (creator), Apache (maintainer) | N/A | No | Original whitepaper | Yes | Partial[c] | No | Yes (built-in) | |
| eXternal Data Representation | Sun Microsystems (creator), IETF (maintainer) | N/A | Yes | RFC 4506 | Yes | No | Yes | Yes | Yes |
| XML | W3C | SGML | Yes | W3C Recommendations: 1.0 (Fifth Edition), 1.1 (Second Edition) | Partial (Binary XML) | Yes | Yes (XPointer, XPath) | Yes (XML schema) | Yes (DOM, SAX, XQuery, XPath) |
| XML-RPC | Dave Winer[1] | XML, SOAP[1] | Yes | XML-RPC Specification | No | Yes | No | No | No |
| YAML | Clark Evans, Ingy döt Net, and Oren Ben-Kiki | C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[2] | Yes | Version 1.2 | No | Yes | Yes | Partial (Kwalify, Rx, built-in language type-defs) | No |
- a. The current default format is binary.
- b. The "classic" format is plain text, and an XML format is also supported.
- c. Theoretically possible due to abstraction, but no implementation is included.
- d. The primary format is binary, but a text format is available.[3]
- e. Means that generic tools and libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but no more. Excludes custom, non-standardized referencing techniques.
- f. ASN.1 does offer OIDs, a standard format for globally unique identifiers. However, there is no standard for "marking" or "tagging" an arbitrary piece of data in a document with an OID. There is also no standard format for locally unique identifiers within a document. Therefore, a generic ASN.1 tool or library cannot automatically encode, decode, or resolve references within a document without help from custom-written program code.
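What reference support means in practice can be illustrated with plain JSON, which has no built-in reference syntax. The following is a minimal sketch using Python's standard `json` module; the variable names are illustrative only. Two keys that point at the same list are written out as two separate copies, so the shared identity is lost after a round trip.

```python
import json

shared = [1, 2, 3]
doc = {"a": shared, "b": shared}  # both keys refer to the same list object

text = json.dumps(doc)            # the shared list is written out twice
restored = json.loads(text)

print(text)                             # {"a": [1, 2, 3], "b": [1, 2, 3]}
print(doc["a"] is doc["b"])             # True: one object before encoding
print(restored["a"] is restored["b"])   # False: two independent copies after decoding
```

A format with standardized references would let a generic decoder restore the sharing without application-specific code.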
Syntax comparison of human-readable formats
| Format | Null | Boolean true | Boolean false | Integer | Floating-point | String | Array | Associative array/Object |
|---|---|---|---|---|---|---|---|---|
| ASN.1 (XML Encoding Rules) |
<foo /> |
<foo>true</foo> |
<foo>false</foo> |
<foo>685230</foo> |
<foo>6.8523015e+5</foo> |
<foo>A to Z</foo> |
<SeqOfUnrelatedDatatypes>
<isMarried>true</isMarried>
<hobby />
<velocity>-42.1e7</velocity>
<bookname>A to Z</bookname>
<bookname>We said, "no".</bookname>
</SeqOfUnrelatedDatatypes>
|
An object (the key is a field name):
<person>
<isMarried>true</isMarried>
<hobby />
<height>1.85</height>
<name>Bob Peterson</name>
</person>
A data mapping (the key is a data value):
<competition>
<measurement>
<name>John</name>
<height>3.14</height>
</measurement>
<measurement>
<name>Jane</name>
<height>2.718</height>
</measurement>
</competition>
|
| Candle Markup | (), "" |
true |
false |
685230, -685230 |
6.8523015e+5 |
"A to Z"""" |
(true, (), -42.1e7, "A to Z") |
_{%342=true A%20to%20Z=(1, 2, 3)}
or
_{
_{key=42 value=true}
_{key="A to Z" value=(1, 2, 3)}
}
|
| CSV[b] | null[a] (or an empty element in the row)[a] |
1[a], true[a] |
0[a], false[a] |
685230, -685230[a] |
6.8523015e+5[a] |
A to Z
or
"We said, ""no""." |
true,,-42.1e7,"A to Z" |
42,1
A to Z,1,2,3 |
| Netstrings[c] | 0:,[a] or 4:null,[a] |
1:1,[a] or 4:true,[a] |
1:0,[a] or 5:false,[a] |
6:685230,[a] |
9:6.8523e+5,[a] |
6:A to Z, |
29:4:true,0:,7:-42.1e7,6:A to Z,, |
41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,[a] |
| JSON | null |
true |
false |
685230, -685230 |
6.8523015e+5 |
"A to Z" |
[true, null, -42.1e7, "A to Z"] |
{"42": true, "A to Z": [1, 2, 3]} |
| OGDL | null[a] |
true[a] |
false[a] |
685230[a] |
6.8523015e+5[a] |
"A to Z", 'A to Z', NoSpaces |
true null -42.1e7 "A to Z"
|
42 true "A to Z" 1 2 3 42 true "A to Z", (1, 2, 3) |
| Property list (plain text format)[4] |
N/A | <*BY> |
<*BN> |
<*I685230> |
<*R6.8523015e+5> |
"A to Z" |
( <*BY>, <*R-42.1e7>, "A to Z" ) |
{
"42" = <*BY>;
"A to Z" = ( <*I1>, <*I2>, <*I3> );
}
|
| Property list (XML format)[5][6] |
N/A | <true /> |
<false /> |
<integer>685230</integer> |
<real>6.8523015e+5</real> |
<string>A to Z</string> |
<array>
<true />
<real>-42.1e7</real>
<string>A to Z</string>
</array>
|
<dict>
<key>42</key>
<true />
<key>A to Z</key>
<array>
<integer>1</integer>
<integer>2</integer>
<integer>3</integer>
</array>
</dict>
|
| S-expressions | NIL, nil |
T, #t[e], true |
NIL, #f[e], false |
685230 |
6.8523015e+5 |
abc, "abc", #616263#, 3:abc, {MzphYmM=}, |YWJj| |
(T NIL -42.1e7 "A to Z") |
((42 T) ("A to Z" (1 2 3))) |
| YAML | ~, null, Null, NULL[7] |
y, Y, yes, Yes, YES, on, On, ON, true, True, TRUE[8] |
n, N, no, No, NO, off, Off, OFF, false, False, FALSE[8] |
685230, +685_230, -685230, 02472256, 0x_0A_74_AE, 0b1010_0111_0100_1010_1110, 190:20:30[9] |
6.8523015e+5, 685.230_15e+03, 685_230.15, 190:20:30.15, .inf, -.inf, .Inf, .INF, .NaN, .nan, .NAN[10] |
A to Z, "A to Z", 'A to Z' |
[y, ~, -42.1e7, "A to Z"]
or
- y
-
- -42.1e7
- A to Z
|
{"John":3.14, "Jane":2.718}
42: y
A to Z: [1, 2, 3]
|
| XML[d] | <null />[a] |
<boolean val="true"/>[a] |
<boolean val="false"/>[a] |
<integer>685230</integer>[a] |
<float>6.8523015e+5</float>[a] |
A to Z |
[a]
<array> <element type="boolean">true</element> <element type="null"/> <element type="float">-42.1e7</element> <element type="string">A to Z</element> </array> |
[a]
<associative-array>
<entry>
<key type="integer">42</key>
<value type="boolean">true</value>
</entry>
<entry>
<key type="string">A to Z</key>
<value>
<array>
<element type="integer" val="1"/>
<element type="integer" val="2"/>
<element type="integer" val="3"/>
</array>
</value>
</entry>
</associative-array>
|
| XML-RPC | | <value><boolean>1</boolean></value> |
<value><boolean>0</boolean></value> |
<value><int>685230</int></value> |
<value><double>6.8523015e+5</double></value> |
<value><string>A to Z</string></value> |
<value><array> <data> <value><boolean>1</boolean></value> <value><double>-42.1e7</double></value> <value><string>A to Z</string></value> </data> </array></value> |
<value><struct>
<member>
<name>42</name>
<value><boolean>1</boolean></value>
</member>
<member>
<name>A to Z</name>
<value>
<array>
<data>
<value><int>1</int></value>
<value><int>2</int></value>
<value><int>3</int></value>
</data>
</array>
</value>
</member>
</struct></value>
|
- a. One possible encoding; the specification document does not specifically give an encoding for this datatype.
- b. The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming data structures.
- c. The netstrings specification only deals with nested byte strings; anything else is outside the scope of the specification.
- d. XML in and of itself is not a data serialization language, but many data serialization formats have been derived from it; as such, there are many different ways, in addition to those shown, to serialize programming data structures into XML.
- e. This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.
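The Netstrings entries in the table above follow mechanically from the length-prefix rule `<length>:<bytes>,`. The sketch below is one way to reproduce them in Python. The function names `netstring` and `encode` are illustrative, and encoding a list as concatenated netstrings wrapped in an outer netstring is an assumption matching the table's examples, since the specification itself only defines byte strings.

```python
def netstring(payload: bytes) -> bytes:
    """Encode one byte string as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def encode(value) -> bytes:
    """Encode a byte string, or a (possibly nested) list of them.

    Lists are an assumed convention: each item is encoded, the results are
    concatenated, and the concatenation is wrapped in one outer netstring.
    """
    if isinstance(value, (list, tuple)):
        return netstring(b"".join(encode(item) for item in value))
    return netstring(value)


array = encode([b"true", b"", b"-42.1e7", b"A to Z"])
mapping = encode([[b"42", b"1"], [b"A to Z", [b"1", b"2", b"3"]]])

print(array.decode())    # 29:4:true,0:,7:-42.1e7,6:A to Z,,
print(mapping.decode())  # 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,
```

Because every length counts bytes rather than characters, a decoder can skip over nested content without inspecting it.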
Comparison of binary formats
| Format | Null | Booleans | Integer | Floating-point | String | Array | Associative array/Object |
|---|---|---|---|---|---|---|---|
| ASN.1 (BER or PER encoding) | NULL type | BOOLEAN; BER as 1 byte in binary form | INTEGER; variable length big-endian binary representation up to 2^2^1024 bits | REAL; representation as IEEE double or as three integers (mantissa, base, exponent) | Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) | data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) | user definable type |
| BSON[11] | Null type - 0 bytes for value | True: one byte \x01; False: \x00 | int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement | double: little-endian binary64 | UTF-8 encoded, preceded by int32 encoded string length in bytes | BSON embedded document with numeric keys | BSON embedded document |
| MessagePack | \xc0 | True: \xc3; False: \xc2 | Single byte "fixnum" (values -32..127) or typecode (one byte) + big-endian (u)int8/16/32/64 | Typecode (one byte) + IEEE single/double | As "fixraw" (single-byte prefix + up to 31 raw bytes) or typecode (one byte) + 2-4 bytes length + raw bytes | As "fixarray" (single-byte prefix + up to 15 array items) or typecode (one byte) + 2-4 bytes length + array items | As "fixmap" (single-byte prefix + up to 15 key-value pairs) or typecode (one byte) + 2-4 bytes length + key-value pairs |
| Netstrings | 0:, | True: 1:1, False: 1:0, | | | | | |
| OGDL Binary | | | | | | | |
| Property list (binary format) | | | | | | | |
| Protocol Buffers[12] | | | Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31); variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 63) | floats: little-endian binary32 | UTF-8 encoded, preceded by varint-encoded integer length of string in bytes | Repeated value with the same tag | N/A |
| Sereal | 0x25 | True: 0x3b; False: 0x3a | Single byte POS/NEG (values -16..15) or typecode (one byte) + "varint" encoded variable length integer or typecode (one byte) + "zigzag" encoded variable length integer | Typecode (one byte) + IEEE single/double/quad | As "SHORT_BINARY" (single-byte prefix + up to 31 raw bytes) or typecode (one byte, including boolean UTF8-encoding flag) + "varint" encoded length + raw bytes | As "ARRAYREF" (single-byte prefix + up to 15 array items) or typecode (one byte) + "varint" encoded length + array items | As "HASHREF" (single-byte prefix + up to 15 key-value pairs) or typecode (one byte) + "varint" encoded length + key-value pairs. Distinguishes hashmaps from objects / class instances. |
| Thrift | | | | | | | |
| Structured Data eXchange Formats (SDXF) | | | big-endian signed 24-bit or 32-bit integer | big-endian IEEE double | either UTF-8 or ISO 8859-1 encoded | list of elements with identical ID and size, preceded by array header with int16 length | chunks can contain other chunks to arbitrary depth |
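The integer encodings in the table can be compared side by side. The following is a rough Python sketch, not a reference implementation: `zigzag32` and `varint` are illustrative names for the ZigZag mapping and base-128 varint used by Protocol Buffers for signed 32-bit values, and the `struct` calls show the byte order of a BSON int32 and a MessagePack 32-bit unsigned integer (typecode 0xce).

```python
import struct


def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to unsigned: (n << 1) XOR (n >> 31)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(value: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, least significant first."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more groups follow
        else:
            out.append(group)
            return bytes(out)


# ZigZag keeps small negative numbers short: 0, -1, 1, -2 map to 0, 1, 2, 3.
print([zigzag32(n) for n in (0, -1, 1, -2)])  # [0, 1, 2, 3]
print(varint(300).hex())                      # ac02
print(varint(zigzag32(-150)).hex())           # ab02

# The same value, 685230, as fixed-width integers in two byte orders.
print(struct.pack("<i", 685230).hex())            # BSON int32, little-endian: ae740a00
print((b"\xce" + struct.pack(">I", 685230)).hex())  # MessagePack uint32, big-endian: ce000a74ae
```

The varint forms trade a fixed width for shorter encodings of small magnitudes, while the fixed-width forms can be read without scanning for a terminating byte.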
References
- http://www.xml.com/pub/a/ws/2001/04/04/soap.html
- Ben-Kiki, Oren; Evans, Clark; Net, Ingy döt (2009-10-01). "YAML Ain’t Markup Language (YAML) Version 1.2". The Official YAML Web Site. Retrieved 2012-02-10.
- https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format
- http://www.gnustep.org/resources/documentation/Developer/Base/Reference/NSPropertyList.html
- http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man5/plist.5.html
- http://developer.apple.com/mac/library/documentation/CoreFoundation/Conceptual/CFPropertyLists/Articles/XMLTags.html#//apple_ref/doc/uid/20001172-CJBEJBHH
- "Null Language-Independent Type for YAML Version 1.1". YAML.org. 2005-01-18. Retrieved 2009-09-12.
- "Boolean Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. 2005-01-18. Retrieved 2009-09-12.
- "Integer Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. 2005-02-11. Retrieved 2009-09-12.
- "Floating-Point Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. 2005-01-18. Retrieved 2009-09-12.
- http://bsonspec.org
- https://developers.google.com/protocol-buffers/docs/encoding
External links
- XML-QL Proposal discussing XML benefits
- When to use XML
- XmlSucks at the Portland Pattern Repository
- Daring to Do Less with XML