
Protocol Buffers

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by YumOooze (talk | contribs) at 03:21, 16 April 2012 (fix date). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Protocol Buffers
Developer(s): Google
Initial release: July 7, 2008
Stable release: 2.4.1 / April 30, 2011
Operating system: Any
Platform: Cross-platform
Type: Serialization format and library, IDL compiler
License: BSD
Website: http://code.google.com/apis/protocolbuffers/

Protocol Buffers are a serialization format with an interface description language developed by Google. The original Google implementation for C++, Java and Python is available under a free software, open source license. Various other language implementations are either available or in development.[1]

The design goals for Protocol Buffers emphasized simplicity and performance. In particular, they were designed to be smaller and faster than XML.[2]

Protocol Buffers are widely used at Google for storing and interchanging all kinds of structured information. Protocol Buffers serve as a basis for a custom remote procedure call (RPC) system that is used for nearly all inter-machine communication at Google.[3]

Protocol Buffers are very similar to the Apache Thrift protocol (used, for example, by Facebook), except that Protocol Buffers do not include a concrete RPC stack to use for the services they define.

Data structures (called "messages") and services are defined in a proto definition file (.proto), which is then compiled with protoc. The compiler generates code that matches the message definitions. For example, compiling example.proto for C++ produces example.pb.cc and example.pb.h, which define C++ classes for each message and service that example.proto declares.

Canonically, Protocol Buffers are serialized into a binary wire format that is compact, forwards-compatible, and backwards-compatible, but not self-describing: there is no way to tell the names, meaning, or full datatypes of fields without an external specification, and there is no defined way to include or refer to such a schema within a Protocol Buffer file. The officially supported implementation includes an ASCII serialization format,[4] but this format, though self-describing, loses the forwards-and-backwards-compatibility behavior and is thus not a good choice for applications other than debugging.
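The forwards-compatibility described above can be illustrated with a hand-written reader. The following is only a sketch, not the code protoc generates: it assumes a hypothetical message with two int32 fields numbered 1 and 2, and the names ReadVarint, SkipField, OldPoint and ParseOldPoint are invented for this illustration. It relies on the documented wire layout, in which each field is preceded by a varint key equal to (field number << 3) | wire type, so a reader can skip fields it does not know using the wire type alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reads a base-128 varint starting at *pos. Returns false on truncated input.
bool ReadVarint(const std::string& data, size_t* pos, uint64_t* value) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64 && *pos < data.size(); shift += 7) {
    const uint8_t byte = static_cast<uint8_t>(data[(*pos)++]);
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {  // high bit clear: this was the last byte
      *value = result;
      return true;
    }
  }
  return false;
}

// Skips the payload of an unrecognised field using only its wire type.
bool SkipField(const std::string& data, size_t* pos, uint32_t wire_type) {
  uint64_t n = 0;
  switch (wire_type) {
    case 0: return ReadVarint(data, pos, &n);  // varint
    case 1: n = 8; break;                      // fixed 64-bit
    case 2:                                    // length-delimited
      if (!ReadVarint(data, pos, &n)) return false;
      break;
    case 5: n = 4; break;                      // fixed 32-bit
    default: return false;                     // groups are not handled in this sketch
  }
  if (n > data.size() - *pos) return false;
  *pos += static_cast<size_t>(n);
  return true;
}

// Hypothetical "old" view of a message: it knows only fields 1 and 2.
struct OldPoint {
  int32_t x = 0;
  int32_t y = 0;
};

// Parses fields 1 and 2 and silently skips every other field.
bool ParseOldPoint(const std::string& data, OldPoint* out) {
  assert(out != nullptr);
  size_t pos = 0;
  while (pos < data.size()) {
    uint64_t key = 0;
    if (!ReadVarint(data, &pos, &key)) return false;
    const uint32_t tag = static_cast<uint32_t>(key >> 3);
    const uint32_t wire_type = static_cast<uint32_t>(key & 7);
    uint64_t v = 0;
    if (tag == 1 && wire_type == 0) {
      if (!ReadVarint(data, &pos, &v)) return false;
      out->x = static_cast<int32_t>(v);
    } else if (tag == 2 && wire_type == 0) {
      if (!ReadVarint(data, &pos, &v)) return false;
      out->y = static_cast<int32_t>(v);
    } else if (!SkipField(data, &pos, wire_type)) {
      return false;
    }
  }
  return true;
}
```

Given the bytes 08 0A 10 14 1A 01 61, which carry an extra string field numbered 3, this reader still recovers the values 10 and 20 and ignores the field it does not know. Note that nothing in the bytes says what fields 1 and 2 are called, which is the sense in which the format is not self-describing.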

Though the primary purpose of Protocol Buffers is to facilitate network communication, their simplicity and speed make them an alternative to data-centric C++ classes and structs, especially where interoperability with other languages or systems might be needed in the future.

Example

A schema for a particular use of protocol buffers associates data types with field names, using integers to identify each field. The serialized protocol buffer data then contains only the numbers, not the field names; this amounts to compressing the field separators, though not the data itself:

message Point {
  required int32 x = 1;
  required int32 y = 2;
  optional string label = 3;
}

message Line {
  required Point start = 1;
  required Point end = 2;
  optional string label = 3;
}

message Polyline {
  repeated Point point = 1;
  optional string label = 2;
}

The "Point" message defines two mandatory data items, x and y. The data item label is optional. Each data item has a tag. The tag is defined after the equals sign; for example, x has the tag 1.
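To show how the tags end up in the serialized data, here is a minimal hand-written encoder for Point. It is a sketch for illustration only, not what protoc generates; the helper names AppendVarint, AppendKey and EncodePoint are invented here. It assumes the documented binary encoding, in which each field is written as a varint key equal to (tag << 3) | wire type, followed by the value; int32 fields use wire type 0 (varint) and strings use wire type 2 (length-delimited).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends v as a base-128 varint: 7 bits per byte, least-significant group
// first, with the high bit set on every byte except the last.
void AppendVarint(std::string* out, uint64_t v) {
  while (v >= 0x80) {
    out->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<char>(v));
}

// Appends the key that precedes every field: (tag << 3) | wire_type.
void AppendKey(std::string* out, uint32_t tag, uint32_t wire_type) {
  assert(tag > 0 && wire_type < 8);
  AppendVarint(out, (static_cast<uint64_t>(tag) << 3) | wire_type);
}

// Hand-encodes a Point. Pass nullptr for label to leave the optional field out.
std::string EncodePoint(int32_t x, int32_t y, const std::string* label) {
  std::string out;
  AppendKey(&out, 1, 0);  // x = 1, varint
  // Negative int32 values are sign-extended to 64 bits before encoding.
  AppendVarint(&out, static_cast<uint64_t>(static_cast<int64_t>(x)));
  AppendKey(&out, 2, 0);  // y = 2, varint
  AppendVarint(&out, static_cast<uint64_t>(static_cast<int64_t>(y)));
  if (label != nullptr) {
    AppendKey(&out, 3, 2);  // label = 3, length-delimited
    AppendVarint(&out, label->size());
    out.append(*label);
  }
  return out;
}
```

With these assumptions, EncodePoint(10, 20, nullptr) yields the four bytes 08 0A 10 14: the tags 1 and 2 appear only inside the key bytes 08 and 10, and the names x and y do not appear at all.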

The "Line" and "Polyline" messages demonstrate how composition works in Protocol Buffers (they both use Point). Polyline has a repeated field, which behaves like a vector.
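Composition and repetition also show up in the serialized form. The sketch below, with invented helper names (PutVarint, PutLengthDelimited, EncodeXY, EncodePolyline) and coordinates assumed to be non-negative for brevity, hand-encodes a Polyline under the documented binary encoding: an embedded message is written as a length-delimited field (wire type 2), and a repeated field is simply written once per element with the same tag. It is not the code protoc generates.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Appends v as a base-128 varint (7 bits per byte, low group first).
void PutVarint(std::string* out, uint64_t v) {
  while (v >= 0x80) {
    out->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<char>(v));
}

// Wraps an already-encoded payload as a length-delimited field (wire type 2).
void PutLengthDelimited(std::string* out, uint32_t tag, const std::string& payload) {
  assert(tag > 0);
  PutVarint(out, (static_cast<uint64_t>(tag) << 3) | 2);
  PutVarint(out, payload.size());
  out->append(payload);
}

// Encodes the x and y fields of a Point (tags 1 and 2, wire type 0).
std::string EncodeXY(uint32_t x, uint32_t y) {
  std::string out;
  PutVarint(&out, (1u << 3) | 0);
  PutVarint(&out, x);
  PutVarint(&out, (2u << 3) | 0);
  PutVarint(&out, y);
  return out;
}

// Encodes a Polyline: one length-delimited record with tag 1 per point.
std::string EncodePolyline(const std::vector<std::pair<uint32_t, uint32_t> >& points) {
  std::string out;
  for (const auto& p : points) {
    PutLengthDelimited(&out, 1, EncodeXY(p.first, p.second));
  }
  return out;
}
```

For the points (10, 10) and (30, 40) this produces 0A 04 08 0A 10 0A 0A 04 08 1E 10 28: the key 0A (tag 1, wire type 2) occurs twice, once per element, and an empty polyline encodes to zero bytes.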

This is subsequently compiled with protoc, which writes out C++ source code that can then read and write the data. Fields within a protocol buffer file cannot be identified by name without the schema, or without a program equivalent to the code that protoc generates from it. A C++ program can then use the generated classes like so:

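#include <string>

#include "polyline.pb.h"  // generated by running: protoc --cpp_out=. polyline.proto (defined above)

// Builds a Line from (10, 20) to (30, 40) with the given label.
// The caller owns the returned object and must delete it.
Line* createNewLine(const std::string& name) {
  Line* line = new Line;
  // mutable_start() returns a pointer to the embedded Point, creating it if needed.
  line->mutable_start()->set_x(10);
  line->mutable_start()->set_y(20);
  line->mutable_end()->set_x(30);
  line->mutable_end()->set_y(40);
  line->set_label(name);
  return line;
}

// Builds a Polyline with two points.
// The caller owns the returned object and must delete it.
Polyline* createNewPolyline() {
  Polyline* polyline = new Polyline;
  // add_point() appends a new element to the repeated field and returns it.
  Point* point1 = polyline->add_point();
  point1->set_x(10);
  point1->set_y(10);
  Point* point2 = polyline->add_point();
  point2->set_x(10);
  point2->set_y(10);
  return polyline;
}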

Notes and references

  1. ^ http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns
  2. ^ Eishay Smith. "jvm-serializers Benchmarks". Retrieved 2010-07-12.
  3. ^ Kenton Varda. "A response to Steve Vinoski". Retrieved 2008-07-14.
  4. ^ "text_format.h - Protocol Buffers - Google Code". Retrieved 2012-03-02.