Talk:Comparison of data-serialization formats

This article was nominated for deletion on 6 August 2009. The result of the discussion was Move to Comparison of data serialization formats.

Reason for this article

This content was a long list in the main XML article. I removed the list to put it here, because the XML article is already very long. Hervegirod (talk) 00:45, 6 August 2009 (UTC)

I don't think this article should be deleted, if it is still up for deletion. There are lots of these articles, and they are useful for finding information and comparing different things quickly.

SeanJA (talk) 05:31, 12 September 2009 (UTC)

Suggestions

A common term for "data serialization format" is encoding. You may want to include in this comparison:

XDR (from Sun): External_Data_Representation
CORBA CDR: Common_Data_Representation
The Ice encoding: see Internet_Communications_Engine and http://www.zeroc.com/doc/Ice-3.3.1/manual/Protocol.39.2.html —Preceding unsigned comment added by Bhjn (talk • contribs) 00:31, 31 October 2009 (UTC)
Rison [javascript & python](https://github.com/Nanonid/rison), [java](https://github.com/bazaarvoice/rison), [php](https://github.com/deceze/Kunststube-Rison) — Preceding unsigned comment added by 91.206.37.12 (talk) 14:29, 7 October 2015 (UTC)

I second the inclusion of XDR. Jann.poppinga (talk) 10:57, 19 March 2010 (UTC)

This section is in the wrong place

This section should be on the XML page...

XML

Advantages

XML provides a basic syntax that can be used to share information between different kinds of computers, different applications, and different organizations. XML data is stored in plain text format.^[1] This software- and hardware-independent way of storing data allows different incompatible systems to share data without needing to pass them through many layers of conversion. This also makes it easier to expand or upgrade to new operating systems, new applications, or new browsers, without losing any data.
It supports Unicode, allowing almost any information in any written human language to be communicated.
It can represent common computer science data structures: records, lists and trees.
Its self-documenting format describes structure and field names as well as specific values.
The strict syntax and parsing requirements make the necessary parsing algorithms extremely simple, efficient, and consistent.
Content-based XML markup enhances searchability, making it possible for agents and search engines to categorize data instead of wasting processing power on context-based full-text searches.
The hierarchical structure is suitable for most (but not all) types of documents.
It is platform-independent, thus relatively immune to changes in technology.
Its predecessor, SGML, has been in use since 1986, so there is extensive experience and software available.

Disadvantages

XML syntax is redundant or large relative to binary representations of similar data,^[2] especially with tabular data.
The redundancy may affect application efficiency through higher storage, transmission and processing costs.^[3]^[4]
XML syntax is verbose, especially for human readers, relative to other alternative 'text-based' data transmission formats.^[5]^[6]
The hierarchical model for representation is limited in comparison to an object oriented graph.^[7]^[8]
Expressing overlapping (non-hierarchical) node relationships requires extra effort.^[9]
XML namespaces are problematic to use and namespace support can be difficult to correctly implement in an XML parser.^[10]
XML is commonly depicted as "self-documenting" but this depiction ignores critical ambiguities.^[11]^[12]
The distinction between content and attributes in XML seems unnatural to some and makes designing XML data structures harder.^[13]
Transformations, even identity transforms, result in changes to format (whitespace, attribute ordering, attribute quoting, whitespace around attributes, newlines). These problems can make diff-ing the XML source very difficult except via Canonical XML.
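
To illustrate the last point, here is a minimal sketch using Python's standard library (`xml.etree.ElementTree.canonicalize`, available from Python 3.8, which implements C14N 2.0). The two sample documents are made up for this example; they differ only in attribute order, quoting, spacing inside the tag, and empty-element style, so a plain text diff reports a change while the canonical forms compare equal.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same data. They differ in
# attribute order, attribute quoting, whitespace inside the start tag,
# and how the empty element is written.
doc_a = '<item id="1"   name=\'x\'><v/></item>'
doc_b = '<item name="x" id=\'1\'><v></v></item>'

# A naive text comparison sees two different documents.
same_as_text = doc_a == doc_b

# Canonicalization normalizes those surface differences before comparing.
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)
same_as_canonical = canon_a == canon_b

print("equal as raw text:     ", same_as_text)
print("equal after canonicalize:", same_as_canonical)
```

Note that whitespace between elements is character data and is kept by default, so canonicalization alone does not undo re-indentation.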

References

^ "How Can XML be Used?". W3schools.com. Retrieved 2009-07-31.
^ Harold, Elliotte Rusty (2002). Processing XML with Java(tm): a guide to SAX, DOM, JDOM, JAXP, and TrAX. Addison-Wesley. ISBN 0201771861. XML documents are too verbose compared with binary equivalents.
^ Harold, Elliotte Rusty (2002). XML in a Nutshell: A Desktop Quick Reference. O'Reilly. ISBN 0596002920. XML documents are very verbose and searching is inefficient for high-performance large-scale database applications.
^ However, the Binary XML effort strives to alleviate these problems by using a binary representation for the XML document. For example, the parsing speed of the Java reference implementation of the Fast Infoset standard is better by a factor of 10 compared to Java Xerces, and by a factor of 4 compared to the Piccolo driver, one of the fastest Java-based XML parsers [1].
^ Bierman, Gavin (2005). Database Programming Languages: 10th international symposium, DBPL 2005 Trondheim, Norway. Springer. ISBN 3540309519. XML syntax is too verbose for human readers in certain applications. Proposes a dual syntax for human readability.
^ Although many purportedly "less verbose" text formats actually cite XML as both inspiration and prior art. See e.g., http://yaml.org/spec/current.html, http://innig.net/software/sweetxml/index.html, http://www.json.org/xml.html.
^ A hierarchical model only gives a fixed, monolithic view of the tree structure. For example, either actors under movies, or movies under actors, but not both.
^ Lim, Ee-Peng (2002). Digital Libraries: People, Knowledge, and Technology. Springer. ISBN 3540002618. Discusses some of the limitations of a fixed hierarchy. Proceedings of the 5th International Conference on Asian Digital Libraries, ICADL 2002, held in Singapore in December 2002.
^ Searle, Leroy F. (2004). Voice, text, hypertext: emerging practices in textual studies. University of Washington Press. ISBN 0295983051. Proposes an alternative system for encoding overlapping elements.
^ (See e.g., http://www-128.ibm.com/developerworks/library/x-abolns.html )
^ "The Myth of Self-Describing XML" (PDF). Retrieved 2007-05-12.
^ (See e.g., Use–mention distinction, Naming collision, Polysemy)
^ "Does XML Suck?". Retrieved 2007-12-15. (See "8. Complexity: Attributes and Content")

Human Readable?

XML should only be tagged as partially human-readable. Simple, basic XML files can be human-readable, but once xmlns and XSD come into play, it quickly becomes not human-readable. Another factor is that it's not always possible to properly reformat or indent XML for readability without affecting content. 81.220.246.44 (talk) 14:20, 24 October 2014 (UTC)
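
A small sketch of the reformatting point, using Python's standard `xml.etree.ElementTree`. The element names and text are invented for illustration. In mixed content, the whitespace added by indentation becomes part of the character data, so the indented document no longer carries the same text as the compact one.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content element, written compactly and then indented.
compact = "<name>Ada <b>Lovelace</b></name>"
indented = "<name>\n  Ada\n  <b>Lovelace</b>\n</name>"

def text_of(xml_source):
    """Return all character data of the document, in document order."""
    return "".join(ET.fromstring(xml_source).itertext())

compact_text = text_of(compact)
indented_text = text_of(indented)

# repr() makes the extra newlines and spaces visible.
print(repr(compact_text))
print(repr(indented_text))
print("same content:", compact_text == indented_text)
```

Whether that extra whitespace matters depends on the application or schema, which is exactly why a generic pretty-printer cannot always be applied safely.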

JSON Associative Array Error

The JSON associative array sample - {42: true, "A to Z": [1, 2, 3]} - looks wrong to me (and also to JSONLint). In JSON the property names ("keys" if you will) must be double-quoted strings. Neither numbers nor unquoted strings are valid, hence 42 cannot be a property name, although "A to Z" can, as can "42".

See JSON.org, as follows:

An object is an unordered set of name/value pairs
pair: string : value
A string is a sequence of zero or more Unicode characters, wrapped in double quotes

--Mikepeat (talk) 15:30, 10 January 2011 (UTC)
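
For anyone who wants to check this without JSONLint, a quick sketch with Python's standard `json` module, which follows the JSON grammar on this point. The `parses` helper is just a name chosen for this example; the first string is the sample from the article and the second is the same sample with the numeric key quoted.

```python
import json

# The sample as it appears in the article, and a corrected variant.
article_sample = '{42: true, "A to Z": [1, 2, 3]}'
quoted_sample = '{"42": true, "A to Z": [1, 2, 3]}'

def parses(text):
    """Return True if text is accepted by the json module, else False."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print("article sample parses:", parses(article_sample))
print("quoted sample parses: ", parses(quoted_sample))
```

The article sample is rejected because 42 is not a double-quoted string; with the key written as "42" the same data parses.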

Missing Apache Avro

Especially for the subsection about "binary formats", but also for the "Overview", I would expect some information about Apache Avro: http://en.wikipedia.org/wiki/Apache_Avro ... so far I don't know enough about it to write something myself --217.24.206.242 (talk) 11:10, 18 September 2012 (UTC)

Missing Java Serialization

As I understand the intention of this article, Java Serialization should be a part of it. It is one of the commonly used object serialization formats (e.g. for RMI communication). — Preceding unsigned comment added by 217.18.178.110 (talk) 12:52, 20 June 2014 (UTC)

Missing Python Pickle

As I understand it, the suggestion has been made that Java Serialization should be part of this article. What about other language-specific serialization formats, such as Python's pickle? 195.212.29.89 (talk) 07:00, 25 September 2014 (UTC)

Missing Microsoft Bond

https://github.com/Microsoft/bond/

From the github: "Bond is a cross-platform framework for working with schematized data. It supports cross-language de/serialization and powerful generic mechanisms for efficiently manipulating data. Bond is broadly used at Microsoft in high scale services." — Preceding unsigned comment added by 82.136.100.19 (talk) 10:11, 30 January 2015 (UTC)

Missing other Protocol Buffers flavors

To be complete, FlatBuffers (http://google.github.io/flatbuffers/) and Cap'n Proto (https://capnproto.org/) could be mentioned. 128.237.28.16 (talk) 16:00, 24 February 2015 (UTC)

Misleading "Standardized?" Column

The term Standardized leads to a page describing National and International Standards. Many of the entries are misleadingly listed as "Standardized" when in fact they are not standardized protocols, never having been approved by a due-process ANSI or ISO approved standards development organization. For example, Apache is not an ANSI or ISO approved standards development organization and therefore Avro is not a standardized protocol unless it is submitted and approved by such a body. — Preceding unsigned comment added by Posicks (talk • contribs) 15:45, 9 May 2015 (UTC)

Something is standardized if a useful specification is publicly available. --195.14.219.99 (talk) 21:01, 3 November 2015 (UTC)

The same can be said for Protocol Buffers - the link just refers to its own documentation.

EDN is missing

https://github.com/edn-format/edn — Preceding unsigned comment added by 164.144.252.29 (talk) 18:56, 11 November 2015 (UTC)

EDN seems to have an article here: Extensible Data Notation. 50.53.1.21 (talk) 22:04, 29 October 2017 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified one external link on Comparison of data serialization formats. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

Added archive https://web.archive.org/web/20081210064322/http://docs.sun.com/app/docs/doc/802-2112/6i63mn65o?a=view to http://docs.sun.com/app/docs/doc/802-2112/6i63mn65o?a=view

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 16:54, 11 August 2017 (UTC)
