Marshalling (computer science)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 2a02:8071:3eb5:5300:9a0:d9e9:b4d5:4e1f (talk) at 15:39, 23 January 2021 (Fixed typo). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, marshalling or marshaling (US spelling) is the process of transforming the memory representation of an object into a data format suitable for storage or transmission.[citation needed] It is typically used when data must be moved between different parts of a computer program or from one program to another. Marshalling is similar to serialization and is used to communicate with remote objects by passing an object, in this case a serialized one. It simplifies complex communication by allowing composite objects, rather than only primitives, to be exchanged. The inverse of marshalling is called unmarshalling (or demarshalling, similar to deserialization). An unmarshalling interface takes the serialized object and transforms it into an internal data structure, which can be referred to as executable.

The definition of marshalling differs across Python, Java, and .NET. In some contexts, it is used interchangeably with serialization.

Comparison with serialization

To "serialize" an object means to convert its state into a byte stream in such a way that the byte stream can be converted back into a copy of the object.
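The round trip described above can be sketched with Java's built-in object serialization. This is a minimal illustration, not a reference implementation: the class `SerializationRoundTrip` and its nested `Point` type are hypothetical names introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {
    // Hypothetical value type used only for this illustration.
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Serialize: convert the object's state into a byte stream.
    public static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Deserialize: rebuild a copy of the object from the byte stream.
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point original = new Point(3, 4);
        Point copy = (Point) deserialize(serialize(original));
        // The copy carries the same state but is a distinct instance.
        System.out.println(copy.x + "," + copy.y + " sameInstance=" + (copy == original));
    }
}
```

The deserialized `Point` has the same field values as the original but is a separate object, which is the "copy of the object" the definition refers to.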

The term "marshal" is used for a specific type of "serialization" in the Python standard library,[1] namely storing internal Python objects:

The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files.

...

If you’re serializing and de-serializing Python objects, use the pickle module instead

— The Python Standard Library[2]

In the Java-related RFC 2713, marshalling is used when serializing objects for remote invocation. A marshalled object records the state of the original object together with its codebase (here, a list of URLs from which the object code can be loaded, not source code). Hence, unmarshalling must be done in order to convert the object state and codebase(s) back into an object. The unmarshaller interface automatically converts the marshalled data containing codebase(s) into an executable Java object in JAXB. Any object that can be deserialized can be unmarshalled. However, the converse need not be true.

To "marshal" an object means to record its state and codebase(s) in such a way that when the marshalled object is "unmarshalled," a copy of the original object is obtained, possibly by automatically loading the class definitions of the object. You can marshal any object that is serializable or remote (that is, implements the java.rmi.Remote interface). Marshalling is like serialization, except marshalling also records codebases. Marshalling is different from serialization in that marshalling treats remote objects specially.

...

Any object whose methods can be invoked [on an object in another Java virtual machine] must implement the java.rmi.Remote interface. When such an object is invoked, its arguments are marshalled and sent from the local virtual machine to the remote one, where the arguments are unmarshalled and used.

— Schema for Representing Java(tm) Objects in an LDAP Directory (RFC 2713)[3]
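In the JDK, this RMI-style marshalling is exposed through the `java.rmi.MarshalledObject` class, which wraps the serialized form of an object and returns a fresh copy from its `get` method. The sketch below is illustrative only; the class name `MarshalledObjectDemo` and the helper `roundTrip` are hypothetical, and the example does not exercise codebase annotations or remote objects.

```java
import java.io.IOException;
import java.rmi.MarshalledObject;
import java.util.ArrayList;
import java.util.List;

public class MarshalledObjectDemo {
    // Marshal a serializable value, then unmarshal it again.
    // MarshalledObject stores the serialized bytes; get() rebuilds a copy.
    public static <T> T roundTrip(T value) throws IOException, ClassNotFoundException {
        MarshalledObject<T> marshalled = new MarshalledObject<>(value);
        return marshalled.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> original = new ArrayList<>(List.of("alpha", "beta"));
        List<String> copy = roundTrip(original);

        System.out.println("equal contents: " + copy.equals(original));
        System.out.println("same instance:  " + (copy == original));
    }
}
```

The unmarshalled list is equal to the original in content but is a different instance, matching the RFC's description of obtaining "a copy of the original object".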

In Microsoft .NET, marshalling is also used to refer to serialization when using remote calls:

When you marshal an object by value, a copy of the object is created and serialized to the server. Any method calls made on that object are done on the server

— How To Marshal an Object to a Remote Server by Value by Using Visual Basic .NET (Q301116)[4]

Usage

Marshalling is used within implementations of different remote procedure call (RPC) mechanisms, where it is necessary to transport data between processes and/or between threads. In Microsoft's Component Object Model (COM), interface pointers must be marshalled when crossing COM apartment boundaries.[5][6] In the .NET Framework, the conversion between an unmanaged type and a CLR type, as in the P/Invoke process, is also an example of an action that requires marshalling to take place.[7]

Additionally, marshalling is used extensively within scripts and applications that use the XPCOM technologies provided within the Mozilla application framework. The Mozilla Firefox browser is a popular application built with this framework, which additionally allows scripting languages to use XPCOM through XPConnect (Cross-Platform Connect).

Example

In the Microsoft Windows family of operating systems, all device drivers for Direct3D are kernel-mode drivers. The user-mode portion of the API is handled by the DirectX runtime provided by Microsoft.

This is an issue because calling kernel-mode operations from user mode requires a system call, which forces the CPU to switch to kernel mode. This is a slow operation, taking on the order of microseconds to complete.[8] During this time, the CPU cannot do useful work for the calling application. Minimizing the number of such switches therefore improves performance substantially.

Linux OpenGL drivers are split in two: a kernel driver and a user-space driver. The user-space driver does all the translation of OpenGL commands into machine code to be submitted to the GPU. To reduce the number of system calls, the user-space driver implements marshalling. If the GPU's command buffer is full of rendering data, the API can simply store the requested rendering call in a temporary buffer; when the command buffer is close to empty, it switches to kernel mode and submits a number of stored commands all at once.
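The batching idea can be sketched in a few lines. This is a simplified model, not driver code: `CommandBatcher` and `FakeKernel` are hypothetical names, each call to `submitBatch` stands in for one user-to-kernel transition, and the flush is triggered by a fixed batch size rather than by the state of a real GPU command buffer.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandBatcher {
    // Hypothetical stand-in for the kernel-mode side; it counts mode switches.
    public static class FakeKernel {
        public int modeSwitches = 0;
        public final List<String> submitted = new ArrayList<>();

        // One call here models one user-to-kernel transition.
        public void submitBatch(List<String> commands) {
            modeSwitches++;
            submitted.addAll(commands);
        }
    }

    private final FakeKernel kernel;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    public CommandBatcher(FakeKernel kernel, int batchSize) {
        this.kernel = kernel;
        this.batchSize = batchSize;
    }

    // Queue a command in user space; cross into the kernel only when the batch is full.
    public void draw(String command) {
        pending.add(command);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Submit everything queued so far in a single transition.
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        kernel.submitBatch(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        FakeKernel kernel = new FakeKernel();
        CommandBatcher batcher = new CommandBatcher(kernel, 4);
        for (int i = 0; i < 10; i++) {
            batcher.draw("draw#" + i);
        }
        batcher.flush();
        // Ten commands are delivered with three transitions instead of ten.
        System.out.println(kernel.submitted.size() + " commands, "
                + kernel.modeSwitches + " mode switches");
    }
}
```

With a batch size of 4, ten draw calls cost three simulated mode switches (two full batches plus one final flush) rather than one per call.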

Formats

XML is one means of transferring data between systems. Microsoft, for example, uses it as the basis of the file formats of the various components (Word, Excel, Access, PowerPoint, etc.) of the Microsoft Office suite: see Office Open XML. While this typically results in a lengthier (i.e., more verbose) wire format, XML's fully bracketed "start-tag", "end-tag" syntax allows more accurate diagnostics and easier recovery from transmission or disk errors. In addition, because the tags occur repeatedly throughout the object, standard compression methods can shrink it: the Office file formats are created by applying ZIP compression to the raw XML.[9]

Alternative formats such as JSON (JavaScript Object Notation) are more concise (JSON uses curly braces instead of start and end tags) but correspondingly less robust for error recovery.

Once the data is transferred back to a program or an application, it needs to be converted back to an executable object for use. Hence, unmarshalling is generally used at the receiving end of implementations of Remote Method Invocation (RMI) and remote procedure call (RPC) mechanisms to convert transmitted objects into an executable form.

JAXB

JAXB (Java Architecture for XML Binding) is a framework commonly used by developers to marshal Java objects to XML and unmarshal them back. JAXB provides for the interconversion between fundamental data types supported by Java and standard XML schema data types.[10]

XmlSerializer

XmlSerializer is the framework used by C# developers to marshal and unmarshal C# objects. One of the advantages of C# over Java is that C# natively supports marshalling through the inclusion of the XmlSerializer class. Java, on the other hand, requires non-native glue code in the form of JAXB to support marshalling.[11]

XML and executable representation

An example of unmarshalling is the conversion of an XML representation of an object to the default representation of the object in any programming language. Consider the following class.

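public class Student
{
    // A String replaces the C-style char array, which is not valid Java
    // and did not match the String-based accessors below.
    private String name;
    private int ID;

    public String getName()
    {
        return this.name;
    }
    public int getID()
    {
        return this.ID;
    }
    public void setName(String name)
    {
        this.name = name;
    }
    public void setID(int ID)
    {
        this.ID = ID;
    }
}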
  • XML representation of Student objects:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Code Snippet: 1 -->
<students>
    <student id="11235813">
        <name>Jayaraman</name>
    </student>
    <student id="21345589">
        <name>Shyam</name>
    </student>
</students>
  • Executable representation of Student object:
//Code Snippet: 2

Student s1 = new Student();
s1.setID(11235813);
s1.setName("Jayaraman");
Student s2 = new Student();
s2.setID(21345589);
s2.setName("Shyam");

Converting the XML representation shown in code snippet 1 into the executable Java objects created by code snippet 2 is called unmarshalling.
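To make that conversion concrete, the sketch below unmarshals the student data by hand using the DOM parser that ships with the JDK. It is deliberately not JAXB, which automates this mapping. The class `StudentUnmarshaller` is a hypothetical name, its nested `Student` is a minimal stand-in for the class above, and the `<students>` wrapper element is an assumption made because an XML document needs a single root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class StudentUnmarshaller {
    // Minimal stand-in for the Student class from the text.
    public static class Student {
        private String name;
        private int id;

        public String getName() { return name; }
        public int getID() { return id; }
        public void setName(String name) { this.name = name; }
        public void setID(int id) { this.id = id; }
    }

    // Turn every <student> element in the document into a Student object.
    public static List<Student> unmarshal(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("student");
        List<Student> students = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            Student s = new Student();
            // The id attribute and the <name> child map onto the two fields.
            s.setID(Integer.parseInt(element.getAttribute("id")));
            s.setName(element.getElementsByTagName("name").item(0).getTextContent());
            students.add(s);
        }
        return students;
    }

    public static void main(String[] args) throws Exception {
        // Assumed single-root version of the XML from code snippet 1.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<students>"
                + "<student id=\"11235813\"><name>Jayaraman</name></student>"
                + "<student id=\"21345589\"><name>Shyam</name></student>"
                + "</students>";
        for (Student s : unmarshal(xml)) {
            System.out.println(s.getID() + " " + s.getName());
        }
    }
}
```

The resulting list holds two objects equivalent to `s1` and `s2` from code snippet 2.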

Unmarshalling in Java

Unmarshaller in JAXB

The process of unmarshalling XML data into an executable Java object is handled by JAXB's Unmarshaller interface. It can also validate the XML data as it is unmarshalled. The unmarshal method declared by Unmarshaller is overloaded for different kinds of XML input. Some of the important overloads are:[12]

  • Unmarshalling from an XML File:
JAXBContext jcon = JAXBContext.newInstance( "com.acme.foo" );
Unmarshaller umar = jcon.createUnmarshaller();
Object obj = umar.unmarshal( new File( "input.xml" ) );
  • Unmarshalling from an InputStream:
InputStream istr = new FileInputStream( "input.xml" );
JAXBContext jcon = JAXBContext.newInstance( "com.acme.foo" );
Unmarshaller umar = jcon.createUnmarshaller();
Object obj = umar.unmarshal( istr );
  • Unmarshalling from a URL:
JAXBContext jcon = JAXBContext.newInstance( "com.acme.foo" );
Unmarshaller umar = jcon.createUnmarshaller();
URL url = new URL( "http://merrilllynch.employee/input.xml" );
Object obj = umar.unmarshal( url );

Unmarshalling XML Data

Unmarshal methods can deserialize an entire XML document or a small part of it. When the XML root element is globally declared, these methods use the JAXBContext's mapping of XML root elements to JAXB mapped classes to initiate the unmarshalling. If the mappings are not sufficient and the root elements are declared locally, the unmarshal methods that take a declaredType parameter are used instead. The two approaches are described below.[12]

Unmarshal a global XML root element

When the root element is globally declared, the unmarshal method uses the JAXBContext to unmarshal the XML data. The JAXBContext object always maintains a mapping from each globally declared XML element name to a JAXB mapped class. If the XML element name or its @xsi:type attribute matches a JAXB mapped class, the unmarshal method transforms the XML data using that class. However, if the XML element name has no match, the unmarshal process aborts and throws an UnmarshalException. This can be avoided by using the unmarshal methods that take a declaredType parameter.[13]

Unmarshal a local XML root element

When the root element is not declared globally, the application assists the unmarshaller by supplying a mapping through the declaredType parameter. In order of precedence, the declaredType overrides the mapping even if the root element name is mapped to an appropriate JAXB class. However, if the @xsi:type attribute of the XML data is mapped to an appropriate JAXB class, it takes precedence over the declaredType parameter. The unmarshal methods that take a declaredType parameter always return a JAXBElement<declaredType> instance. The properties of this JAXBElement instance are set as follows:[14]

JAXBElement Property    Value
name                    XML element name
value                   instance of declaredType
declaredType            unmarshal method declaredType parameter
scope                   null (actual scope is not known)
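The precedence rules described in this section (a mapped @xsi:type first, then declaredType, then the root element name) can be modelled with a small lookup routine. The following is a simulation for illustration only and is not part of the JAXB API: `RootTypeResolver`, its methods, and the placeholder classes are all hypothetical, and `IllegalStateException` stands in for JAXB's UnmarshalException.

```java
import java.util.HashMap;
import java.util.Map;

public class RootTypeResolver {
    // Hypothetical placeholder classes standing in for JAXB mapped classes.
    public static class Person {}
    public static class Student extends Person {}
    public static class Graduate extends Student {}

    // Plays the role of a context's mappings from names to classes.
    private final Map<String, Class<?>> elementToClass = new HashMap<>();
    private final Map<String, Class<?>> xsiTypeToClass = new HashMap<>();

    public void mapElement(String elementName, Class<?> type) {
        elementToClass.put(elementName, type);
    }

    public void mapXsiType(String typeName, Class<?> type) {
        xsiTypeToClass.put(typeName, type);
    }

    // Pick the class for a root element; xsiType and declaredType may be null.
    public Class<?> resolve(String elementName, String xsiType, Class<?> declaredType) {
        // 1. A mapped xsi:type takes precedence over everything else.
        if (xsiType != null && xsiTypeToClass.containsKey(xsiType)) {
            return xsiTypeToClass.get(xsiType);
        }
        // 2. A declaredType overrides any mapping of the element name.
        if (declaredType != null) {
            return declaredType;
        }
        // 3. Otherwise the root element must be globally mapped.
        Class<?> mapped = elementToClass.get(elementName);
        if (mapped == null) {
            throw new IllegalStateException("No mapping for root element: " + elementName);
        }
        return mapped;
    }

    public static void main(String[] args) {
        RootTypeResolver resolver = new RootTypeResolver();
        resolver.mapElement("student", Student.class);
        resolver.mapXsiType("graduate", Graduate.class);

        // Global root element: resolved through the element-name mapping.
        System.out.println(resolver.resolve("student", null, null).getSimpleName());
        // declaredType overrides the element-name mapping.
        System.out.println(resolver.resolve("student", null, Person.class).getSimpleName());
        // A mapped xsi:type overrides declaredType, even for an unmapped element.
        System.out.println(resolver.resolve("local", "graduate", Person.class).getSimpleName());
    }
}
```

Calling `resolve` with an unmapped element name and neither an xsi:type nor a declaredType throws, mirroring the failure case described for globally declared root elements.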

References

  1. ^ "marshal — Internal Python object serialization". Python Software Foundation. Retrieved 4 November 2016.
  2. ^ "marshal — Internal Python object serialization". Python Software Foundation. Retrieved 9 October 2019.
  3. ^ "Schema for Representing Java(tm) Objects in an LDAP Directory". IETF. October 1999. Retrieved 4 November 2016.
  4. ^ "How To Marshal an Object to a Remote Server by Value by Using Visual Basic .NET". Microsoft. July 2004. Archived from the original on 2004-11-15. Retrieved 4 November 2016.
  5. ^ "Apartments and COM Threading Models". Archived from the original on 2015-09-23. Retrieved 2009-06-19.
  6. ^ "CoInitializeEx function (COM)". Windows Desktop App Development. Retrieved 2013-02-22.
  7. ^ "Interop Marshaling Overview".
  8. ^ Code Quality: The Open Source Perspective.
  9. ^ "What is a DOCX file?". https://docs.fileformat.com/word-processing/docx/. Retrieved 13 October 2020.
  10. ^ "Binding XML Schemas - The Java EE 5 Tutorial". docs.oracle.com. Retrieved 2016-09-14.
  11. ^ "Using the XmlSerializer Class". msdn.microsoft.com. Retrieved 2016-09-23.
  12. ^ a b "Unmarshaller (JAXB 2.2.3)". jaxb.java.net. Retrieved 2016-09-14.
  13. ^ "JAXBContext (JAXB 2.2.3)". jaxb.java.net. Retrieved 2016-09-23.
  14. ^ "JAXBElement (JAXB 2.2.3)". jaxb.java.net. Retrieved 2016-09-23.