OpenDocument
The OpenDocument format (ODF), short for the OASIS Open Document Format for Office Applications, is an open document file format for saving and exchanging editable office documents such as text documents (including memos, reports, and books), spreadsheets, charts, and presentations. This standard was developed by the OASIS industry consortium, based upon the XML-based file format originally created by OpenOffice.org.
The standard was publicly developed by a variety of organizations, is publicly accessible, and can be implemented by anyone without restriction. The OpenDocument format is intended to provide an open alternative to proprietary document formats including the popular DOC, XLS, and PPT formats used by Microsoft Office, as well as Microsoft Office Open XML format (this latter format has various licensing requirements that prevent some competitors from using it). Organizations and individuals that store their data in an open format such as OpenDocument avoid being locked in to a single software vendor, leaving them free to switch software if their current vendor goes out of business, raises their prices, changes their software, or changes their licensing terms to something less favorable.
OpenDocument is the only standard for editable office documents that has been vetted by an independent recognized standards body, has been implemented by multiple vendors, and can be implemented by any supplier (including proprietary software vendors and developers using the GNU GPL).
Public policy implications
Since one objective of open formats like OpenDocument is to guarantee long-term access to data without legal or technical barriers, governments have become increasingly aware of open formats as a public policy issue. For example, in 2002, Dr. Edgar David Villanueva Nuñez, a lawyer and Congressman of the Republic of Perú, wrote a letter to Microsoft Peru raising questions about free and permanent document access with proprietary formats. Europe and Massachusetts in particular have been examining the ramifications of selecting a document format.
Europe
European governments have, since at least 2003, been investigating various options for storing documents in an XML-based format, commissioning technical studies such as the "Valoris Report" (Valoris). In March 2004, European governments asked an OpenOffice team and a Microsoft team to present on the relative merits of their XML-based office document formats (Bray, September 29, 2004).
In May 2004, the Telematics between Administrations Committee (TAC) issued a set of recommendations, in particular noting that, "Because of its specific role in society, the public sector must avoid [a situation where] a specific product is forced on anyone interacting with it electronically. Conversely, any document format that does not discriminate against market actors and that can be implemented across platforms should be encouraged. Likewise, the public sector should avoid any format that does not safeguard equal opportunities to market actors to implement format-processing applications, especially where this might impose product selection on the side of citizens or businesses. In this respect standardisation initiatives will ensure not only a fair and competitive market but will also help safeguard the interoperability of implementing solutions whilst preserving competition and innovation." It then issued recommendations, including:
- Industry actors not currently involved with the OASIS Open Document Format consider participating in the standardisation process in order to encourage a wider industry consensus around the format;
- Microsoft considers issuing a public commitment to publish and provide non-discriminatory access to future versions of its WordML specifications;
- Microsoft should consider the merits of submitting XML formats to an international standards body of their choice;
- The public sector is encouraged to provide its information through several formats. Where by choice or circumstance only a single revisable document format can be used this should be for a format around which there is industry consensus, as demonstrated by the format's adoption as a standard. (TAC, May 25, 2004)
OpenDocument is already a standard by a recognized independent standards body (OASIS), and is being submitted to ISO for standardization, while there is no evidence that the Microsoft XML formats or the older DOC/PPT/XLS formats will go through such a process. Many expect ISO will accept and approve OpenDocument using its fast-track process, and that once ISO ratifies the standard, the European Union will require OpenDocument as the office suite standard for the European Union. (Marson, October 18, 2005)
Massachusetts
Massachusetts has also been examining its options for implementing XML-based document processing. In early 2005, Eric Kriss, Secretary of Administration and Finance in Massachusetts, was the first government official in the United States to publicly connect open formats to a public policy purpose: "It is an overriding imperative of the American democratic system that we cannot have our public documents locked up in some kind of proprietary format, perhaps unreadable in the future, or subject to a proprietary system license that restricts access." [1]
At a September 16, 2005 meeting with the Mass Technology Leadership Council, Kriss stated that he believes this is fundamentally an issue of sovereignty. [2] While supporting the principle of private intellectual property rights, he said sovereignty trumped any private company's attempt to control the state's public records through claims of intellectual property. [3]
Subsequently, in September 2005, Massachusetts became the first state to formally endorse OpenDocument formats for its public records and, at the same time, reject Microsoft's proprietary XML format, now named Microsoft Office Open XML format (see WordprocessingML). This decision was made after a two-year examination of file formats, including many discussions with Microsoft, other vendors, and various experts. Microsoft Office, which has a nearly 100% market share among the state's employees, does not currently support OpenDocument formats. Microsoft has indicated that OpenDocument formats will not be supported in new versions of Office, even though they support many other formats (including ASCII, RTF, and WordPerfect), and analysts believe it would be easy for Microsoft to implement the standard. If Microsoft chooses to not implement OpenDocument, Microsoft will disqualify themselves from future consideration. Several analysts (such as Ovum) believe that Microsoft will eventually support OpenDocument.
After this announcement by Massachusetts supporting OpenDocument, a large number of people and organizations spoke up about the policy, both pro and con (see the references section). Adobe, Corel, IBM, and Sun all sent letters to Massachusetts supporting the measure. In contrast, Microsoft sent in a letter highly critical of the measure. A group named "Citizens Against Government Waste" (CAGW) also opposed the decision. The group claimed that Massachusetts' policy established "an arbitrary preference for open source," though both open source software and proprietary software can implement the specification, and both kinds of developers were involved in creating the standard (CAGW, 2005). Many considered this group's statement to be simply a paid statement by Microsoft; InternetNews and Linux Weekly News noted that CAGW has received funding from Microsoft, and that in 2001 CAGW was caught running an astroturfing campaign on behalf of Microsoft when two letters it submitted supporting Microsoft in Microsoft's anti-trust case were found to bear the signatures of deceased persons (Linux Weekly News). James Prendergast, executive director of a coalition named "Americans for Technology Leadership" (ATL), also criticized the state's decision in a Fox News article (Prendergast 2005). In the article, Prendergast failed to disclose that Microsoft is a founding member of ATL. Fox News later published a follow-up article disclosing that fact (FOX News, 2005; Jones, September 29, 2005).
Other countries
According to OASIS' OpenDocument datasheet, "Singapore's Ministry of Defense, France's Ministry of Finance and its Ministry of Economy, Finance, and Industry, Brazil's Ministry of Health, the City of Munich, Germany, UK's Bristol City Council, and the City of Vienna in Austria are all adopting applications that support OpenDocument."
Standardization
Process
Version 1.0 of the OpenDocument specification was developed after lengthy development and discussion by multiple organizations. The first official OASIS meeting to discuss the standard was December 16, 2002; OASIS approved OpenDocument as an OASIS standard on May 1, 2005. The group decided to build on an earlier version of the OpenOffice.org format, since this was already an XML format with most of the desired properties, and had been in use since 2000 as the program's primary storage format (demonstrating its utility). Note, however, that OpenDocument is not the same as the older OpenOffice.org format; many changes and lessons learned were incorporated based on the feedback from many different individuals and companies.
According to Gary Edwards, a member of the OpenDocument TC, the specification was developed in two phases. Phase one, which lasted from November 2002 through March 2004, had the goal of ensuring that the OpenDocument format could capture all the data from a vast array of older legacy systems. Edwards expressed this goal as perfecting "the Open Document XML as a transformation layer" (a universal intermediate format) where "interoperability with legacy information systems was our primary concern." This considered "at least 30 years of legacy information systems that cross an incredible spectrum of information and file format types," including various versions of Microsoft Office and many other products and formats as well. Phase two focused on open, Internet-based collaboration (Einfeldt, 2005).
Participants
The standardization process included the developers of many office suites or related document systems, including (in alphabetical order):
- Adobe (Framemaker, Distiller)
- Arbortext (Arbortext Enterprise Publishing System)
- Corel (WordPerfect)
- IBM (Lotus 1-2-3, Workplace)
- KDE (KOffice)
- SpeedLegal (SmartPrecedent enterprise document assembly system); both product and company later changed names to Exari.
- Sun Microsystems / OpenOffice.org (StarOffice/OpenOffice.org)
Notably absent from the group of active participants was Microsoft, especially since Microsoft is a member of OASIS and is the dominant vendor of office suite software. This absence was in spite of the European Union's TAC (Telematics between Administrations Committee) 2004 request for all industry actors to consider participating in the OASIS Open Document Format work (TAC, 2004). Instead, Microsoft decided to only develop their own incompatible format, without external input or review. Due to this lack of widespread independent and public review of Microsoft's format, many are concerned that Microsoft's format will be harder for others to implement or that Microsoft's format lacks important capabilities compared to OpenDocument. For example, the European Union commissioned a report (Valoris, 2004) which noted that, "It is quite trivial to add elements to an XML document that place processing requirements and restrictions on the document, thus preventing cross-platform processing capability... While properly developed XML should in theory be platform-neutral, experience has shown that vendors who wish to maintain and protect their platform's market will go to extents to encode elements that are capable of being processed only by their own application suites. The only counter-balance to this natural force is the development of open, cross-industry, widely adopted standards that serve to block the inclusion of application or platform specific encoding." Microsoft also imposes additional license conditions on users of their format; many believe these additional license conditions inhibit competition, as discussed below.
The OpenDocument standardization process also included many document users, especially those with the need to handle complex documents or to be able to retrieve documents for long periods of time after their development. Document-using organizations who initiated or were involved in the standardization process included (alphabetically):
- Boeing (complex large documents)
- CSW Informatics
- Drake Certivo
- Intel (complex large documents; they are developing sample documents as a test suite) (Bastian, 2005)
- National Archive of Australia (retrieve documents long after development)
- New York State Office of the Attorney General (complex large documents retrieved long after development)
- Society of Biblical Literature (large multilingual documents, long-term retrieval)
- Sony
- Stellent
As well as having many formal members, draft versions of the specification were released to the public and subject to worldwide review. Many others, who were not formal members of the standardization committee, submitted comments to the committee. These external comments were then adjudicated publicly by the committee.
Next Steps
OASIS has submitted the OpenDocument standard to a joint technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) for approval as an international ISO/IEC standard. ISO spokesman Roger Frost stated that the committee will send the specification out to its members, probably at the end of this month, and they will have five months to study and vote on it (Sayer, 2005). Many expect that OpenDocument's broad support and demonstrated open development process will result in quick passage of OpenDocument as an ISO/IEC standard. OASIS is one of the few organizations which has been granted the right to propose standards directly to ISO as a proposed "publicly available specification" (PAS). This process is specifically designed to fast-track public specifications into becoming ISO standards when they have already been developed in an open manner. OpenDocument advocates note that, in contrast, there is no evidence that the competing Microsoft XML formats or the older DOC/PPT/XLS formats will go through an independent standardization process. The older DOC/PPT/XLS formats are not even publicly specified, which is one reason why documents written in these formats sometimes cannot be read by later versions of the same office suite.
Gary Edwards, a member of the OpenDocument TC, says that after ISO standardization, "there is no doubt in my mind that OpenDocument is heading to the W3C for ratification as the successor to HTML and XHTML." (Einfeldt, 2005). The W3C has not made any public statements supporting or denying this statement, however.
Licensing
The OpenDocument specification is available for free download and use [4]. An irrevocable intellectual property covenant made by key contributor Sun Microsystems [5] is the only IPR Statement connected with the specification, providing all implementers with the guarantee that it contains no material that necessitates licensing from any author. Reciprocal, royalty-free licensing terms are being promoted by some standards developing organizations, such as the W3C and OASIS, as a method for avoiding conflict over intellectual property concerns while still promoting innovation. See also software patent debate. In short, anyone can implement OpenDocument, without restraint, and as shown below both proprietary and open source software programs implement the format.
All of this is in contrast with the competing "Microsoft Office Open XML" developed by Microsoft. Microsoft has released their format royalty-free, but with additional conditions not imposed by OpenDocument. Independent analysts have stated that Microsoft's licensing requirements will prevent many competitors from ever implementing Microsoft's format. The extent of this incompatibility is the source of significant controversy between Microsoft and other parties. The text below attempts to capture these differences, since they are often one of the reasons people consider using OpenDocument.
Microsoft states, in their FAQ, that they believe that some open source software licenses are compatible with their license, and that if a developer believes that some license is in conflict, they must "choose other forms of open source licenses." Microsoft has not publicly issued its opinion about the compatibility of any particular open source software license. However, several independent analysts have determined that the legal obligations for the Microsoft format are such that it cannot be used by competing programs licensed under the GNU General Public License (GPL), and possibly many other open source software / Free-libre software licenses as well. This is important because the GPL is the most popular license by far for open source software. In particular, the GPL is used by many competing office applications such as the entire KOffice office suite, the Gnumeric spreadsheet program, and the Abiword word processor. Microsoft is well aware of widespread use of the GPL license by many of its competitors; at one time Microsoft CEO Ballmer referred to Linux as a "cancer" because of the effects of the GPL (the license the Linux kernel uses) (Greene, 2001). Thus, many independent analysts believe that Microsoft's license terms are designed to inhibit competition, in spite of Microsoft's claims otherwise. Some of these concerns are described as follows:
- Richard Stallman, president of the Free Software Foundation and the author of the GPL, states that Microsoft's license was "designed to prohibit all free software. It covers only code that implements, precisely, the Microsoft formats, which means that a program under this license does not permit modification... The freedom to modify the software for private use and the freedom to publish modified versions are two of the essential components in the definition of free software. If these freedoms are lacking, the program is not free software." Thus, it would violate the GPL. (Galli, 2005)
- Jean Paoli, senior director of XML architecture for Microsoft, acknowledged that their attribution requirement might preclude any program that uses the file formats from being used in Linux and other open-source software licensed under the GPL. Microsoft's license requires developers who use Office Open XML Formats to attribute the use of the file format in their code. Paoli admitted, "The GPL may not allow code that is attributable to another company like Microsoft to be included." (Galli, 2005)
- Dan Ravicher, executive director of the Public Patent Foundation, says that "If [Microsoft has] rights and a license is needed, then the term in the license that requires attribution by the licensee of all of its downstream licenses is, in fact, not compatible with the GPL." (Galli, 2005)
- User "gustl" on Brian Jones' blog stated on September 6, 2005, that OpenDocument was far more open than Microsoft's format. He stated that OpenDocument can be implemented by any implementor, even using the GPL or BSD licenses. He argued that the "may not sublicense" clause covering Microsoft's format "effectively prohibits any open source project from using [their] specifications," making Microsoft's XML license prohibitively restrictive, while OpenDocument's license permits any competitor to implement the format. See (Jones, 2005).
- Groklaw posted a legal analysis by Marbux, a retired lawyer, whose detailed analysis found that Microsoft's specification excluded competition, in contrast with Microsoft's public claims. "Competitors are... effectively precluded from bidding against Microsoft or its suppliers for any... contract specifying use of Microsoft's software file formats." He first noted that the patent license for the format "is structured to be read restrictively, in Microsoft's favor... it states that: 'All rights not expressly granted in this license are reserved by Microsoft. No additional rights are granted by implication or estoppel or otherwise.' This is not the customary 'all rights reserved' phrase more commonly encountered... If you cannot find words in the license explicitly stating that you have the right to do something, you don't get that right." Then, by examining the patent license in detail, he found a number of omissions and conditions that suppress competition: no integration clause; no license for the schemas themselves; no grant of copyright in the patent license; no commitment to delivering any future changes to the schemas, or right to develop software implementing them under the same or more liberal license (this particular issue may have been resolved later by Microsoft); no identification of the Microsoft patents involved; no identification of third-party patents; no right to sell or sublicense implementing software; a prohibition against sale and licensing of implementing software; a prohibition against software having functions other than to read and write files using the specification without modification; no license to convert files to and from other formats; no right to write files using the schemas; vagueness and ambiguities that will deter implementation by developers and adoption by end users; a discriminatory incompatibility with F/OSS licensing; and a discriminatory incompatibility with proprietary software competitors. In short, he believes Microsoft's license prevents competitors from effectively using the format. (Marbux, 2005)
- David Berlind of ZDNet notes that the technical proviso in Microsoft's license that says, "You are not licensed to sublicense or transfer your rights" is a deal breaker. "Included in the notion of state sovereignty is the right of the state's agencies, employees, contractors and citizens to choose any type of software they want to read or write public documents. By not allowing its license to be transferred or sublicensed, Microsoft's patent license automatically prevents just about all open source software -- including OpenOffice.org -- from supporting Microsoft's XML formats." Berlind notes that the Internet Engineering Task Force (IETF) e-mail sender authentication standards (to combat spam) and the OASIS WS-Security specification have both foundered because some organizations would not permit sublicensing or transfer. (Berlind, October 17, 2005)
- Microsoft's Yates wrote, "Our license may not be compatible with the GPL, but it is compatible with many other open source licenses." (Berlind, October 17, 2005)
- Larry Rosen, author of a book on licensing of open source software, states that provisions that prevent sub-licensing and transferability are antithetical to open source. "[The Microsoft license] not only prevents transfer or sublicensing of the patent rights," said Rosen, "but it also requires that open source developers put Microsoft's patent notices in our licenses." These are terms that open source developers find to be unacceptable. For example, Rosen disputes Microsoft's claims of broad compatibility, stating that, "Open source depends on the right to sub-license... Among the licenses that are explicitly sublicenseable are the MIT, MPL, CPL, Apache 2.0, OSL/AFL, and all licenses derived from them. That's most, I believe. Microsoft's patent license is incompatible with all of them." He also stated, "The Microsoft license is incompatible with any open source license that explicitly authorizes sublicensing and is incompatible with open source processes that as of matter of practice do sublicensing. Every open source project operates on the basis that sublicensing is allowed. That's how open source works, even if not every license says so explicitly." (Berlind, October 17, 2005)
Microsoft has stated that it has been granted a number of patents related to its format, and that it may have more pending. Microsoft states that it offers royalty-free rights both to its issued patents and patents that may be issued in the future as an outcome of the patent process in order to implement the Office 2003 XML Reference Schemas. However, these patents can be used to force anyone to strictly adhere to their license, and as noted above, many people have analyzed the license in detail and concluded that the license inhibits competition. The most common open source software license (the GPL) forbids these kinds of limitations; if software is included, it must be usable for any purpose. There is also concern by some that Microsoft could change its licensing terms at any time; no contract actually binds Microsoft to these terms. Microsoft did restate in a clarification that their terms were offered in perpetuity, but since no enforceable contract was signed, there appears to still be some suspicion. These concerns about patents were raised in part because formerly secret Microsoft documents (known as Halloween documents I and II), which were developed in collaboration with key people in Microsoft, recommended that Microsoft suppress competition by "de-commoditizing" protocols (creating proprietary formats that could not be used by others) and by attacking competitors through patent lawsuits.
Dan Ravicher argues that Microsoft's licenses may not be valid, saying, "we should not presume Microsoft has any valid rights here." For example, one of the relevant patents was a patent Microsoft was granted covering the conversion of programming objects into XML files, based on a filing by Microsoft in June 2001. However, only a week after the announcement of the patent, independent analysts found that SXP, an open source software library for converting C++ programming objects into XML files, was made available on Sourceforge in February 2000. Since SXP's release predates Microsoft's filing, many believe Microsoft's patent could be invalidated in court due to the existence of prior art. Ravicher and others speculate this may be true for all the patents; patent offices have no database for examining software patent claims, and spend very little time examining patent claims, so there is general consensus that many invalid software patents are granted (Galli, 2005). However, since software patent litigation typically costs millions of dollars, patents that could be invalidated can still be used to intimidate and inhibit competition if the patent-holder chooses to do so.
After discussions with the European Union and Massachusetts, Microsoft issued a clarification. In particular, in the clarification Microsoft stated that, "We are acknowledging that end users who merely open and read government documents that are saved as Office XML files within software programs will not violate the license." However, observers quickly noted that this exception only applied to government documents (not other documents) and only for opening and reading them (not for writing them, and possibly not for printing them or translating them to another format). Neither governments nor software developers want formats that are limited for use only by governments; it is much better to have a single format for any such data. This exemption would not by itself permit open source software implementations, since the Open Source Definition forbids discrimination against persons (including non-government personnel), groups, or fields of endeavor; this exemption also contradicts the Free Software definition, which requires as freedom 0 the "freedom to run the program for any purpose". Also, the whole point of these formats is to permit editing, not just reading them; for read-only documents, other formats such as PDF tend to be used instead. If the term "reading" is interpreted as applying only to humans, then this grant is even more limited (prohibiting printing and transforming), but even a broad interpretation is limiting since it does not grant the privilege to write the format. Thus, independent analysts reported that none of these clarifications addressed the concern that Microsoft's XML format cannot be used by many of Microsoft's competitors, while OpenDocument can be used by anyone -- both Microsoft and its competitors.
Promotion
OASIS promotes OpenDocument (since it is their work). In October 2005 the Open Document Fellowship was founded with the aim of "[supporting] the work of community volunteers in promoting, improving and providing user assistance for the OASIS Open Document Format for Office Applications (OpenDocument) and software designed to operate on data in this format." It was founded by Friends of OpenDocument Inc., an incorporated association in the State of Queensland, Australia. [6] Some early reports incorrectly stated that it was founded by OASIS [7]. Other promotional websites include friendsofopendocument.org and spreadopendocument.org.
Applications supporting OpenDocument
Current support
A number of applications currently support OpenDocument; listed alphabetically they include:
- Abiword 2.4 (reading)
- Aukyla Document Management System 2.0, lightweight web-based document management system. Has OpenDocument viewer and indexing functions [8]
- DocMgr 0.53.3, full featured document management system. Included search engine indexes OpenDocument files. [9]
- docvert, web service software takes multiple word processor files (typically .doc) and converts them to OpenDocument (builds on OpenOffice.org) [10]
- eZ publish 3.6, with OpenOffice extension
- IBM Workplace
- Knomos case management 1.0 [11]
- KOffice 1.4.2, released on October 11th 2005
- ooo-word-filter, a plugin for Microsoft Word 2003 XML to open OpenOffice XML documents (alpha stage)
- OpenOffice.org 1.1.5 (reading) and 2.0 (reading and writing)
- Scribus 1.2.2, imports OpenDocument Text and Graphics
- Sun StarOffice 8, proprietary commercially-supported product that reads and writes OpenDocument; based on OpenOffice.org
- TextMaker 2005 beta [12]
- Visioo Writer 0.6 [13]
- Gnumeric, incomplete support for reading and writing OpenDocument Spreadsheet
Microsoft's letter to Massachusetts claimed that all current OpenDocument implementations were based on OpenOffice.org and its derivatives. However, this turns out to be untrue. For example, KOffice is a completely independent implementation of OpenDocument not based on OpenOffice.org -- its main functions have been implemented independently, and even its code for reading and writing the OpenDocument format was developed independently. This is important, because independent implementations from the same specification are generally considered the best way to find and fix any problems in a specification. For example, the IETF even requires two independent implementations for its final stage of standardization.
The first application to implement OpenDocument was KOffice. OpenDocument was developed starting from an XML format developed for OpenOffice.org; OpenOffice.org has since been updated so that it also supports OpenDocument.
Corel WordPerfect status
Corel may add OpenDocument support to its WordPerfect office suite, although it has not yet made a formal announcement. Corel is an original member of the OASIS Technical Committee on the Open Document Format, and Paul Langille, a senior Corel developer, is one of the original four authors of the OpenDocument specification. Also, Corel sent a letter to Massachusetts supporting their selection of OpenDocument, saying, "Corel strongly supports the broad adoption of the open standards Massachusetts has outlined, including XML, the OASIS Open Document Format and PDF.... Corel remains committed to working alongside OASIS and other technology vendors to ensure the continued evolution of the ODF standard and the adoption of open standards industry-wide." [14] Many find it improbable that Corel would invest so much effort, and say that they will work to ensure adoption, without implementing it themselves.
At the September 16, 2005 "Town Meeting," an IBM representative said that they were implementing OpenDocument and that Corel was also actively implementing OpenDocument. Steven J. Vaughan-Nichols's eWeek article of September 26, 2005, states without caveats that Corel is actively implementing OpenDocument in their WordPerfect suite. On September 28, 2005, he clarified further that Corel's WordPerfect "will soon be supporting the OpenDocument format", noting that while "Corel won't commit to a date for adding OpenDocument to WordPerfect, the company made it clear that it is working towards that goal."
A month later, on October 18, 2005, Corel detailed their position in an interview for BetaNews [15]: they do not see OpenDocument format support as a priority for them just now, and cannot even evaluate the time it would need for them to support it, if ever.
Programmatic Support
OpenDocument is an ordinary Java archive (JAR) containing standard XML files. JAR files are simply a set of files compressed together using zip. Thus, any of the vast number of tools for handling zip/jar files and XML data can be used to handle OpenDocument. Nearly all programming languages have libraries (built-in or available) for processing XML files and zip files. The XSLT language was specifically designed to process XML files.
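As a rough illustration of the point above, the following Python sketch uses only the standard zipfile and xml.etree.ElementTree modules. The helper names (build_sample_package, list_entries, read_xml) and the tiny sample package are assumptions made for this example; they are not part of any OpenDocument toolkit, and a real document contains more entries.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def build_sample_package() -> bytes:
    """Build a tiny, hypothetical OpenDocument-like package in memory."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        "<text:p>Hello, OpenDocument.</text:p>"
        "</office:text></office:body>"
        "</office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", content, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def list_entries(package: bytes) -> list:
    """Return the names of all files stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()


def read_xml(package: bytes, name: str) -> ET.Element:
    """Read one XML entry from the archive and parse it."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return ET.fromstring(zf.read(name))


if __name__ == "__main__":
    package = build_sample_package()
    print(list_entries(package))
    print(read_xml(package, "content.xml").tag)
```

To inspect a real file, the same two reader functions could be given the bytes of an .odt file read from disk instead of the in-memory sample.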
J. David Eisenberg has developed and released the Java class com.catcode.odf.OpenDocumentTextInputStream, which extracts the text information from an OpenDocument text file. It extracts only the text within <text:p> and <text:h>, unless they are in <text:tracked-changes> (i.e., it automatically handles tracked changes). The lists of "capture" and "omit" elements are user-selectable.
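The capture/omit idea can be sketched in a few lines of Python. This is not Eisenberg's Java implementation, only an illustration of the same approach; the function name extract_text and the sample fragment are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Default "capture" and "omit" element sets; callers may pass their own.
CAPTURE = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
OMIT = {f"{{{TEXT_NS}}}tracked-changes"}

# Hypothetical fragment: one tracked deletion, one heading, one paragraph.
SAMPLE = f"""<office:text xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <text:tracked-changes>
    <text:changed-region>
      <text:deletion><text:p>deleted words</text:p></text:deletion>
    </text:changed-region>
  </text:tracked-changes>
  <text:h>Title</text:h>
  <text:p>First <text:span>paragraph</text:span>.</text:p>
</office:text>"""


def extract_text(root, capture=CAPTURE, omit=OMIT):
    """Collect the text of captured elements, skipping omitted subtrees."""
    lines = []

    def walk(elem):
        if elem.tag in omit:
            return  # ignore everything below an omitted element
        if elem.tag in capture:
            # itertext() also picks up text inside nested elements
            lines.append("".join(elem.itertext()).strip())
            return
        for child in elem:
            walk(child)

    walk(root)
    return lines


if __name__ == "__main__":
    print(extract_text(ET.fromstring(SAMPLE)))
```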
Some free Perl extensions for OpenDocument file processing are available at CPAN, such as OpenOffice::OODoc, OpenOffice::OOCBuilder, OpenOffice::OOSheets, PBib::Document::OpenOffice, and others.
Microsoft
Microsoft has publicly stated that it does not plan to support OpenDocument. Its stated rationale is that OpenDocument is missing some important functionality. Many are very sceptical of this claim; ZDNet said, "Does OpenDocument, which is the result of a lot of hard work from people fully versed in contemporary corporate computing, really fail at the very things it was designed to provide?" InfoWorld's Neil McAllister noted that even if OpenDocument were missing important functionality, this statement is inconsistent; Microsoft Office already supports formats with far less functionality than OpenDocument (such as HTML and ASCII text). Instead, he believes that the real reason Microsoft will not support OpenDocument (so far) is that "An open document standard won't help Microsoft lock in its loyal addicts -- excuse me, customers -- so an open standard isn't in Microsoft's business interests. Microsoft refuses to support OpenDocument; it doesn't get more bald-faced than that." ZDNet urges Microsoft to add support for OpenDocument.
A Boston Globe article quoted Peter Quinn of Massachusetts saying that the state could implement OpenDocument without abandoning Microsoft Office: "We are not asking anybody to take anything off their desktop." Instead, they plan to modify an estimated 50,000 computers with software that would let Office users store their files in the OpenDocument format, instead of Microsoft's proprietary format, if Microsoft continues to refuse to support the format.
Recent reports suggest that Microsoft is considering supporting OpenDocument in the future; it has not committed itself either way. Nick Tsilas, a Senior Attorney at Microsoft, said that, "features are dictated by customer demand and, until the Massachusetts-related activity occurred, Open Document was not even on our radar screens." Microsoft General Manager of Information Worker Business Strategy Alan Yates confirmed that this was the company position; "For us this has been, and will continue to be a matter of evaluating the flow of customer requirements, and this is a new issue." (Updegrove, 2005) On Sep. 25, 2005, Alan Joch of Federal Computer Week reported that Microsoft has changed its stance and that its next Office release will support OpenDocument, though not natively. This "means users would have to select that format option every time they save a file." (Joch, 2005) As of this time this report has not been independently confirmed, however, and other reports suggest this is merely being considered.
Other
Phase-n is developing OpenOpenOffice ("O3"), an open source software plug-in for Microsoft Office that lets it open OpenDocument documents. Instead of installing a complete office application or even a large plug-in, users will install a tiny plug-in to the Microsoft Office system. This plug-in will automatically send the file to a server, which will perform the conversion and send the result back. The server can be local or on the Internet. A version is expected to be available by the end of November 2005. OpenOpenOffice is a partnership between Open Source Victoria, Phase N Australia and the Open Source community.
File types
The recommended file extensions and MIME types are included in the official standard (OASIS, May 1, 2005).
Documents
The most common file extensions used for OpenDocument documents are .odt for text documents, .ods for spreadsheets, .odp for presentations, .odg for graphics and .odb for databases. These are easily remembered by considering ".od" as being short for "OpenDocument", and then noting that the last letter indicates the more specific type (such as t for text). Here is the complete list of document types, showing the type of file, the recommended file extension, and the MIME type:
File type | Extension | MIME type |
---|---|---|
Text | .odt | application/vnd.oasis.opendocument.text |
Spreadsheet | .ods | application/vnd.oasis.opendocument.spreadsheet |
Presentation | .odp | application/vnd.oasis.opendocument.presentation |
Drawing | .odg | application/vnd.oasis.opendocument.graphics |
Chart | .odc | application/vnd.oasis.opendocument.chart |
Formula | .odf | application/vnd.oasis.opendocument.formula |
Database | .odb | application/vnd.oasis.opendocument.database |
Image | .odi | application/vnd.oasis.opendocument.image |
Master Document | .odm | application/vnd.oasis.opendocument.text-master |
Templates
OpenDocument also supports a set of template types. Templates represent formatting information (including styles) for documents, without the content itself. The recommended filename extension begins with ".ot" (which can be viewed as short for "OpenDocument template"), with the last letter indicating the kind of template (such as "t" for text). The supported set is:
File type | Extension | MIME type |
---|---|---|
Text | .ott | application/vnd.oasis.opendocument.text-template |
Spreadsheet | .ots | application/vnd.oasis.opendocument.spreadsheet-template |
Presentation | .otp | application/vnd.oasis.opendocument.presentation-template |
Drawing | .otg | application/vnd.oasis.opendocument.graphics-template |
Chart template | .otc | application/vnd.oasis.opendocument.chart-template |
Formula template | .otf | application/vnd.oasis.opendocument.formula-template |
Image template | .oti | application/vnd.oasis.opendocument.image-template |
Web page template | .oth | application/vnd.oasis.opendocument.text-web |
Capabilities
As noted above, the OpenDocument format can describe text documents (e.g., those typically edited by a word processor), spreadsheets, presentations, drawings/graphics, images, charts, mathematical formulas, databases, and "master documents" (which can combine them). It can also represent templates for many of them.
The official OpenDocument standard (OASIS, May 1, 2005) defines OpenDocument's capabilities. Haumacher (2005) provides a hyperlinked formal specification derived from the official standard. Eisenberg's book (2005) describes the format in more detail. The text below provides a brief summary of the format's capabilities.
Metadata
The OpenDocument format supports storing metadata (data about the data) by having a set of pre-defined metadata elements, as well as allowing user-defined and custom metadata. The predefined metadata are: Generator, Title, Description, Subject, Keywords, Initial Creator, Creator, Printed By, Creation Date and Time, Modification Date and Time, Print Date and Time, Document Template, Automatic Reload, Hyperlink Behavior, Language, Editing Cycles, Editing Duration, and Document Statistics.
Content
OpenDocument's text content format supports both typical and advanced capabilities. Headings of various levels, lists of various kinds (numbered and not), numbered paragraphs, and change tracking are all supported. Page sequences and section attributes can be used to control how the text is displayed. Hyperlinks, ruby text (which provides annotations and is especially critical for some languages), bookmarks, and references are supported as well. Text fields (for autogenerated content), and mechanisms for automatically generating tables such as tables of contents, indexes, and bibliographies, are included as well.
In the OpenDocument format, spreadsheets are an example of a set of tables. Thus, there are extensive capabilities for formatting the display of tables and spreadsheets. Database ranges, filters, and data pilots (known to Excel users as "pivot tables") are also supported. Change tracking is available for spreadsheets as well.
The graphics format supports a vector graphic representation, in which a set of layers and the contents of each layer are defined. Available drawing shapes include Rectangle, Line, Polyline, Polygon, Regular Polygon, Path, Circle, Ellipse, and Connector. 3D Shapes are also available; the format includes information about the Scene, Light, Cube, Sphere, Extrude, and Rotate (it is intended for office data exchange, however, and is not sufficient to represent movies or other extensive 3D scenes). Custom shapes can also be defined.
Presentations are supported. Animations can be included in presentations, with control over sound, showing or hiding a shape or text, or dimming something, and these can be grouped. In OpenDocument, many of the format's capabilities are reused from the text format, simplifying implementations.
Charts define how to create graphical displays from numerical data. They support titles, subtitles, a footer, and a legend to explain the chart. The format defines the series of data that is to be used for the graphical display, and a number of different kinds of graphical displays (such as line charts, pie charts, and so on).
Forms are specially supported, building on the existing XForms standard.
Formatting
The style and formatting controls are numerous, giving fine-grained control over how information is displayed.
Page layout is controlled by a variety of attributes. These include page size, number format, paper tray, print orientation, margins, border (and its line width), padding, shadow, background, columns, print page order, first page number, scale, table centering, maximum footnote height and separator, and many layout grid properties.
Headers and footers can have defined fixed and minimum heights, margins, border, border line width, padding, background, shadow, and dynamic spacing.
There are many attributes for specific text, paragraphs, ruby text, sections, tables, columns, lists, and fills. Specific characters can have their fonts, sizes, and other properties set. Paragraphs can have their vertical space controlled through attributes on keep together, widow, and orphan, and have other attributes such as "drop caps" to provide special formatting. The list is extremely extensive; see the references (in particular the actual standard) for details.
Spreadsheet formulas issue
OpenDocument is fully capable of describing mathematical formulas that are displayed on the screen. It is also fully capable of exchanging spreadsheet data, formats, pivot tables, and other information typically included in a spreadsheet. OpenDocument can exchange spreadsheet formulas (formulas that are recalculated in the spreadsheet); formulas are exchanged as values of the attribute table:formula.
However, some believe that the allowed syntax of table:formula is not defined in sufficient detail. The OpenDocument version 1.0 specification defines spreadsheet formulas using a set of simple examples which show, for example, how to specify ranges and the SUM() function. Some critics argue that a more detailed, precise specification for spreadsheet functions, including syntax and semantics, should be created to augment these examples. The OpenDocument committee argued that this was outside their scope, since the syntax of such formulas is not in XML. Others have argued that, while the specification is less specific than one might like, the intent is fairly clear, both because formulas tend to follow decades-long traditions and because the vast majority of spreadsheets use only a small set of functions (such as SUM) which are supported by all spreadsheet implementations anyway. In practice, many developers look to OpenOffice.org as a "canonical implementation"; since its code is public for anyone to review, and its XML output can be trivially inspected, this can resolve many questions. There is draft work proposing a more detailed specification for spreadsheet formulas (e.g. OpenFormula). Such work is expected simply to clarify in more detail what is acceptable in a spreadsheet formula; no one expects such work to invalidate any of the current OpenDocument standard. For more information, see the OpenFormula article.
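To make the role of the table:formula attribute concrete, here is a minimal Python sketch that lists the formulas found in a spreadsheet fragment. The sample row and the formula string in it are illustrative assumptions written in the style of the specification's simple examples, and the function name find_formulas is hypothetical.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hypothetical row: two plain numbers and one cell computed from them.
SAMPLE_ROW = f"""<table:table-row xmlns:table="{TABLE_NS}" xmlns:office="{OFFICE_NS}">
  <table:table-cell office:value-type="float" office:value="2"/>
  <table:table-cell office:value-type="float" office:value="3"/>
  <table:table-cell table:formula="=SUM([.A1:.B1])"
                    office:value-type="float" office:value="5"/>
</table:table-row>"""


def find_formulas(root):
    """Return the table:formula value of every cell that carries one."""
    cell_tag = f"{{{TABLE_NS}}}table-cell"
    formula_attr = f"{{{TABLE_NS}}}formula"
    return [
        cell.get(formula_attr)
        for cell in root.iter(cell_tag)
        if cell.get(formula_attr) is not None
    ]


if __name__ == "__main__":
    print(find_formulas(ET.fromstring(SAMPLE_ROW)))
```

Note that the sketch only reads the formula text. Interpreting it (the syntax and semantics discussed above) is exactly the part that the version 1.0 specification leaves to implementations.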
Format internals
An OpenDocument file is a JAR (zip-compressed) archive containing a number of files and directories. Because of this compression, OpenDocument files are normally significantly smaller than equivalent Microsoft ".doc" or ".ppt" files. The smaller size matters to organizations that store vast numbers of documents for long periods of time, and to those that must exchange documents over low-bandwidth connections. Once uncompressed, most data is contained in simple text-based XML files, so the contents can be modified and processed as easily as any other XML file.
The zipped set of files and directories includes the following:
- XML files
- content.xml
- meta.xml
- settings.xml
- styles.xml
- Other files
- mimetype
- layout-cache
- Directories
- META-INF/
- Thumbnails/
- Pictures/
- Configurations2/
The OpenDocument format provides a strong separation between content, layout and metadata. The most notable components of the format are described in the subsections below. The files in XML format are further defined using the RELAX NG language for defining XML schemas. RELAX NG is itself defined by an OASIS specification, as well as by part two of the international standard ISO/IEC 19757: Document Schema Definition Languages (DSDL).
content.xml
content.xml is the most important file. It carries the actual content of the document (except for binary data, like images). The base format is inspired by HTML, and though far more complex, it should be reasonably legible to humans:
<text:h text:style-name="Heading_2">This is a title</text:h>
<text:p text:style-name="Text_body"/>
<text:p text:style-name="Text_body">
This is a paragraph. The formatting information is in the Text_body style. The empty text:p tag above is a blank paragraph (an empty line).
</text:p>
styles.xml
styles.xml contains style information. OpenDocument makes heavy use of styles for formatting and layout. Most of the style information is here (though some is in content.xml). Style types include:
- Paragraph styles.
- Page styles.
- Character styles.
- Frame styles.
- List styles.
The OpenDocument format is somewhat unusual in that you cannot avoid using styles for formatting. Even "manual" formatting is implemented through styles (the application dynamically makes new styles as needed).
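The following Python sketch illustrates how "manual" formatting can be resolved through a style. The sample document, including the automatic style name T1 and its bold property, is a hypothetical example of what an application might generate, and the helper names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Hypothetical content.xml: the word "bold" was made bold by hand, which
# is recorded as an automatically generated style named T1.
SAMPLE = f"""<office:document-content xmlns:office="{OFFICE_NS}"
    xmlns:style="{STYLE_NS}" xmlns:text="{TEXT_NS}" xmlns:fo="{FO_NS}">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p text:style-name="Text_body">Some <text:span
        text:style-name="T1">bold</text:span> text.</text:p>
  </office:text></office:body>
</office:document-content>"""


def text_properties_by_style(root):
    """Map each style name to the attributes of its text properties."""
    styles = {}
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        props = style.find(f"{{{STYLE_NS}}}text-properties")
        attrs = {}
        if props is not None:
            # Drop the "{namespace}" prefix to keep only local names.
            attrs = {key.split("}")[1]: value for key, value in props.attrib.items()}
        styles[style.get(f"{{{STYLE_NS}}}name")] = attrs
    return styles


def span_formatting(root):
    """Pair each text:span with the properties of the style it names."""
    styles = text_properties_by_style(root)
    style_attr = f"{{{TEXT_NS}}}style-name"
    return [
        (span.text, styles.get(span.get(style_attr), {}))
        for span in root.iter(f"{{{TEXT_NS}}}span")
    ]


if __name__ == "__main__":
    print(span_formatting(ET.fromstring(SAMPLE)))
```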
meta.xml
meta.xml contains the file metadata, such as the author, the person who last modified the document, and the date of last modification. The contents look somewhat like this:
<meta:creation-date>2003-09-10T15:31:11</meta:creation-date>
<dc:creator>Daniel Carrera</dc:creator>
<dc:date>2005-06-29T22:02:06</dc:date>
<dc:language>es-ES</dc:language>
<meta:document-statistic meta:table-count="6" meta:object-count="0" meta:page-count="59" meta:paragraph-count="676" meta:image-count="2" meta:word-count="16701" meta:character-count="98757"/>
The names of the <dc:...> tags are taken from the Dublin Core XML standard.
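As a small sketch of how such metadata might be read programmatically, the Python below parses a meta.xml document using the standard library. The sample reuses values from the excerpt above, wrapped in office:document-meta and office:meta elements with namespace declarations so that it parses on its own; the function name read_metadata and the keys of the returned dictionary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
DC_NS = "http://purl.org/dc/elements/1.1/"

SAMPLE_META = f"""<office:document-meta xmlns:office="{OFFICE_NS}"
    xmlns:meta="{META_NS}" xmlns:dc="{DC_NS}">
  <office:meta>
    <meta:creation-date>2003-09-10T15:31:11</meta:creation-date>
    <dc:creator>Daniel Carrera</dc:creator>
    <dc:date>2005-06-29T22:02:06</dc:date>
    <dc:language>es-ES</dc:language>
    <meta:document-statistic meta:page-count="59" meta:word-count="16701"/>
  </office:meta>
</office:document-meta>"""


def read_metadata(root):
    """Pull a few well-known fields out of a parsed meta.xml tree."""
    ns = {"meta": META_NS, "dc": DC_NS}
    stats = root.find(".//meta:document-statistic", ns)

    def stat(name):
        # Statistics are stored as namespaced attributes holding integers.
        if stats is None or stats.get(f"{{{META_NS}}}{name}") is None:
            return None
        return int(stats.get(f"{{{META_NS}}}{name}"))

    return {
        "creator": root.findtext(".//dc:creator", namespaces=ns),
        "modified": root.findtext(".//dc:date", namespaces=ns),
        "language": root.findtext(".//dc:language", namespaces=ns),
        "page_count": stat("page-count"),
        "word_count": stat("word-count"),
    }


if __name__ == "__main__":
    print(read_metadata(ET.fromstring(SAMPLE_META)))
```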
settings.xml
settings.xml includes settings such as the zoom factor or the cursor position. These are properties that are not content or layout.
Pictures/
Pictures/ is a folder containing all images in the document. They are referred to from content.xml using a <draw:image> tag, similar to the HTML <img> tag:
<draw:image xlink:href="Pictures/10000000000005E80000049F21F631AB.tif" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>
The layout information (width, anchor, etc) is provided by a <draw:frame> tag that contains the <draw:image> tag.
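The relationship between <draw:frame> and <draw:image> can be illustrated with a short Python sketch that lists each image reference together with the width of its enclosing frame. The sample fragment, including its frame size and anchor type, is a hypothetical example, and list_images is a name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical paragraph holding one framed image.
SAMPLE = f"""<text:p xmlns:text="{TEXT_NS}" xmlns:draw="{DRAW_NS}"
    xmlns:svg="{SVG_NS}" xmlns:xlink="{XLINK_NS}">
  <draw:frame text:anchor-type="paragraph" svg:width="10cm" svg:height="7.5cm">
    <draw:image xlink:href="Pictures/10000000000005E80000049F21F631AB.tif"
                xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>
  </draw:frame>
</text:p>"""


def list_images(root):
    """Return (href, frame width) for every image inside a draw:frame."""
    images = []
    for frame in root.iter(f"{{{DRAW_NS}}}frame"):
        image = frame.find(f"{{{DRAW_NS}}}image")
        if image is not None:
            images.append(
                (image.get(f"{{{XLINK_NS}}}href"), frame.get(f"{{{SVG_NS}}}width"))
            )
    return images


if __name__ == "__main__":
    print(list_images(ET.fromstring(SAMPLE)))
```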
Most images are kept in their original format (GIF, JPEG, PNG) but bitmap images are converted to PNG for size considerations.
mimetype (file)
mimetype is just a one-line file containing the MIME type of the document. One implication of this is that the file extension is actually immaterial to the format. The file extension is only there for the benefit of the user.
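A minimal Python sketch of identifying a document from its mimetype entry rather than its extension follows. The lookup table holds a few entries copied from the document types table, while make_package and detect_type are hypothetical helpers written for this example.

```python
import io
import zipfile

# A few MIME types from the document types table, mapped to their names.
DOCUMENT_TYPES = {
    "application/vnd.oasis.opendocument.text": "Text",
    "application/vnd.oasis.opendocument.spreadsheet": "Spreadsheet",
    "application/vnd.oasis.opendocument.presentation": "Presentation",
    "application/vnd.oasis.opendocument.graphics": "Drawing",
}


def make_package(mimetype: str) -> bytes:
    """Hypothetical helper: an in-memory archive with only a mimetype entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", mimetype)
    return buf.getvalue()


def detect_type(package: bytes) -> str:
    """Identify the document type from the archive's mimetype entry."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
    return DOCUMENT_TYPES.get(mimetype, "Unknown")


if __name__ == "__main__":
    # No file name is involved at all, so the extension cannot matter.
    sample = make_package("application/vnd.oasis.opendocument.spreadsheet")
    print(detect_type(sample))
```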
Reuse of existing formats
OpenDocument is designed to reuse existing open XML standards whenever they are available, and it creates new tags only where no existing standard can provide the needed functionality. Thus, OpenDocument uses Dublin Core for metadata, MathML for displayed formulas, SVG for vector graphics, SMIL for multimedia, and so on.
References
These references were used to justify the article text above, but not all of them are specifically cited.
General:
- Bastian, Waldo (July 15, 2005). "Fwd: OpenDocument Sample Documents" posted by Michael Brauer. (Waldo Bastian works for Intel; this message describes Intel's work to create OpenDocument sample documents for use as a test suite).
- Berlind, David (October 17, 2005). Microsoft: We were railroaded in Massachusetts on ODF.
- Bray, Tim (September 24-26, 2004). SmartEC Accessed on October 17, 2005. (Discussing Open Office XML ISO Certification).
- Carrera, Daniel (January 30, 2005). The Future Is Open: What OpenDocument Is And Why You Should Care. Groklaw.
- Darrow, Barbara (September 27, 2005). StarOffice 8 Ships With Boost From OpenDocument Format. CRN.
- Eisenberg, David J. (2005). OASIS OpenDocument Essentials. To be published by O'Reilly. (A book describing the OpenDocument format.)
- Galli, Peter (June 20, 2005). "Open XML Incompatible With GPL". eWeek.
- Einfeldt, Christian (October 11, 2005). Gary Edwards: OpenOffice.org 2.0 leaping over legacy lockdown with clean XML.
- Greene, Thomas C. (June 2, 2001). "Ballmer: 'Linux is a cancer'". The Register.
- Haumacher, Bernhard (2005). RelaxNG with generated cross-references. Accessed on October 17, 2005. (The Relax-NG specification of OpenDocument, with all explanatory text removed and a hyperlinked cross-reference added).
- Jones, Brian (June 13, 2005). Brian Jones: Office XML Formats (Brian Jones is a program manager in Microsoft Office working on the XML functionality and file formats).
- Joch, Alan (September 26, 2005). "5 stars of open-source products: If you're not using these tools, you may be missing out". Federal Computer Week.
- Marbux (June 2, 2005). The Great Massachusetts Legal Donnybrook. Section 4 (Dissecting Microsoft's Patent License). Groklaw.
- Marson, Ingrid (June 2, 2005). Possible prior art for Microsoft XML patent found. ZDNet UK.
- Marson, Ingrid (October 18, 2005). ISO crunch time for OpenDocument. ZDNet.
- Microsoft (January 27, 2005). Office 2003 XML Reference Schemas Frequently Asked Questions (FAQ). Published November 17, 2003, Updated January 27, 2005. Accessed on October 17, 2005.
- OASIS (May 1, 2005). Open Document Format for Office Applications (OpenDocument) v1.0 (The OpenDocument version 1.0 specification).
- OASIS (May 23, 2005). Members Approve OpenDocument as OASIS Standard: IBM, Sun Microsystems, and Others Develop Royalty-Free Standard for Office Applications Document Format (Announcing OpenDocument's approval by OASIS).
- OASIS (October 4, 2005). Sun Patent Non-Assertion Covenant for OpenDocument Offers Model for Standards. OASIS Coverpages. Accessed on October 17, 2005.
- OASIS (2005a). OpenDocument FAQ (formally the OASIS Open Document Format for Office Applications (OpenDocument) TC FAQ). Accessed on October 17, 2005.
- OASIS (2005b). OASIS OpenDocument datasheet. Accessed on October 17, 2005.
- Olavsrud, Thor (August 23, 2001). "Microsoft Supported by Dead People". InternetNews.com.
- Phipps, Simon. Raising the Bar on Patents (Sun's Simon Phipps announces and explains their Patent Covenant).
- Sayer, Peter (October 12, 2005). ISO to review OpenDocument as a standard. IDG News Service/ComputerWorld.
- Sutor, Bob (June 7, 2005). Open Document Formats: "Open" must be more than a marketing term.
- TAC aka the European Union's Telematics between Administrations Committee (May 25, 2004). TAC approval on conclusions and recommendations on open document formats.
- Updegrove, Andy (October 10, 2005). Microsoft Says "Maybe Someday" on OpenDocument.
- Valoris (2004). Comparative Assessment of Open Documents Formats Market Overview aka the "Valoris Report".
- Vaughan-Nichols, Steven J. (September 28, 2005). WordPerfect Will Support OpenDocument - Someday. eWeek.
Official Information from the Commonwealth of Massachusetts:
- Enterprise Technical Reference Model (ETRM) Version 3.5, effective September 21, 2005.
- Final ETRM Version 3.5 Open Document Format Standard: Frequently Asked Questions (FAQ).
Formal comments to Massachusetts on their decision for Open Formats and posted by Massachusetts (alphabetical order):
- Adobe Systems, Inc.
- Corel Corporation.
- IBM Corporation.
- Microsoft Corporation
- Sun Microsystems, Inc.
- Sam Hiser (Managing Director of Hiser + Adelstein).
- Statement from Peter Quinn on ETRM v.3.5 Public Review and Data Formats
- Open Formats Summit Notes - June 9, 2005.
Other commentary specifically about Massachusetts' decision to use OpenDocument, besides those posted by Massachusetts (note that the length of this list justifies the claim in the main text that many people and organizations discussed the Massachusetts decision):
- Berlind, David (September 22, 2005). Microsoft vs Mass.: What ever happened to 'The customer is always right'?. ZDNet.
- Berlind, David (September 26, 2005). "Did Microsoft send the wrong guy to Massachusetts' ODF hearing?". ZDNet.
- Bradshaw, David (September 28, 2005). Sun gives OpenDocument format a Windows boost Ovum.
- Bray, Hiawatha (September 23, 2005). Policy deals blow to Microsoft: State adopting a new format for documents. Boston Globe. (Hiawatha Bray is on the Boston Globe Staff).
- Bray, Tim (September 10, 2005). ongoing · Massachusetts Back-Room.
- Bray, Tim (September 20, 2005). New England Town Meeting (summarizing the September 16, 2005 "town hall meeting").
- Brooks, Jason (September 9, 2005). Massachusetts vs. Microsoft?. eWeek.
- Carr, Nicholas (September 19, 2005). Massachusetts and Microsoft. Rough Type.
- CAGW (Citizens Against Government Waste) (September 21, 2005). CAGW Criticizes Open Source Mandate in Massachusetts.
- Coursey, David (September 16, 2005). Microsoft Exec Weighs In on Massachusetts Flap. eWeek.
- Demerjian, Charlie (September 23, 2005). Geriatric Microsoft scuppered by file formats The Inquirer.
- Edwards, Gary (September 25, 2005). Comments on the Massachusetts Decision / "You're Kidding, Right?". Groklaw. (With an introduction by Pamela Jones).
- FOX News (October 12, 2005). Your Mail: Open Debate About OpenDocument.
- Hiser, Sam (September 22, 2005). What has Microsoft done for Massachusetts lately?. NewsForge.
- Jones, Brian (September 22, 2005). More on the royalty-free licenses for the Microsoft Office Open XML formats (Jones is a program manager at Microsoft in Office who works on the Microsoft Office XML functionality and file formats).
- Jones, Pamela (September 23, 2005). It's Final - MA Goes With Open Document. Groklaw.
- Jones, Pamela (September 29, 2005). FOX's Anti-MASS FUD is a Dud. Groklaw. Accessed on October 17, 2005.
- LaMonica, Martin (September 23, 2005). Massachusetts moves ahead sans Microsoft. CNET News.com.
- Linux Weekly News. The return of Citizens Against Government Waste.
- Marson, Ingrid (September 30, 2005). OpenDocument could 'turn the world inside out' ZDNet UK.
- Massachusetts Technology Leadership Council (September 16, 2005). Open Format Meeting September 2005. (Audio recording of the September 16, 2005 ("Town Hall") meeting).
- McAllister, Neil (September 12, 2005). Kicking the Microsoft Office habit. InfoWorld.
- Phipps, Simon (September 12, 2005). A Study in Framing.
- Prendergast, James (September 28, 2005). "Massachusetts Should Close Down OpenDocument" FoxNews.com Views (Posted opinions). (Prendergast is Executive Director of Americans for Technology Leadership).
- Rooney, Paula (September 2, 2005). Microsoft Blasts Massachusetts' New XML Policy. InformationWeek.
- Samuel, Stephen. "Hoist by their own petard: Judging MS Office by Microsoft's criticism."
- Smith, Michael (September 21, 2005). You know "Citizens Against Government Waste" is a corporate front group, right?.
- Source Watch. "Americans for Technology Leadership".
- Sutor, Bob (September 22, 2005). IBM Letter to the Boston Globe supporting the Massachusetts decision to require OpenDocument support. Boston Globe.
- Vaughan-Nichols, Steven J. (September 26, 2005). Massachusetts Makes Smart Move Official. eWeek.
- Wagner, Marc (September 27, 2005). "Microsoft and public access".
- Wallin, Inge (September 23, 2005). Open Letter to Alan Yates of Microsoft KDE.news. (Noting that KOffice is a completely independent implementation of OpenDocument).
- Walli, Steven (September 15, 2005). Microsoft, Massachusetts and a Standards Primer. (Walli is a former Microsoft employee).
- Wheeler, David A. (September 2-15, 2005). Why OpenDocument Won (and Microsoft Office Open XML Didn’t).
- ZDNet UK (September 2, 2005). Microsoft must drop its Office politics.
External links
- Organizations
- OASIS Open Document Format Technical Committee coordinates the OpenDocument development and is the official source for specifications, schemas, etc.
- OpenDocument Fellowship is an industry coalition that provides information about OpenDocument and advocates its deployment.
- friendsofopendocument.org advocates OpenDocument
- spreadopendocument.org advocates OpenDocument
- Deployment in Europe
- Documentation on the Promotion of Open Document Exchange Format; this page gathers all available information regarding the European programme’s activities for supporting the uptake of open document formats, including the "Valoris report on Open Document Formats".
See also
- List of applications supporting OpenDocument
- Comparison of applications supporting OpenDocument
- WordprocessingML
- List of document markup languages
- Comparison of document markup languages
- Open Document Architecture - An older standard file format that failed to gain acceptance.
- Open format
- OpenFormula