Document file format
A document file format is a text or binary file format for storing documents on a storage medium, especially for use by computers. A multitude of incompatible document file formats currently exists.
A rough consensus has been established that XML is to be the technical basis for future document file formats, although PDF is likely to remain the format of choice for fixed-layout documents. Examples of XML-based open standards are DocBook, XHTML, and, more recently, the ISO/IEC standards OpenDocument (ISO/IEC 26300:2006) and Office Open XML (ISO/IEC 29500:2008).
In 1993, the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. The effort did not succeed.
Page description languages such as PostScript and PDF have become the de facto standard for documents that a typical user should only be able to create and read, not edit. In 2001, a series of ISO/IEC standards for PDF began to be published, including the specification for PDF itself, ISO 32000.
The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.
Common document file formats
- ASCII, UTF-8 — plain text formats
- .doc for Microsoft Word — Structural binary format developed by Microsoft (specifications available since 2008 under the Open Specification Promise)
- DjVu — file format designed primarily to store scanned documents
- DocBook — an XML format for technical documentation
- HTML (.html, .htm) (open standard, ISO standard from 2000), possibly combined with the image files it references
- FictionBook (.fb2) — open XML-based e-book format
- Office Open XML — .docx (XML-based standard for office documents, ISO standard from 2008)
- OpenDocument — .odt (XML-based standard for office documents, ISO standard from 2006)
- OpenOffice.org XML — .sxw (open, XML-based format for office documents)
- OXPS — Open XML Paper Specification
- PalmDoc — common document format for handheld devices
- Plucker — widely used navigable document format for handheld devices
- .pages for Pages
- PDF — Open standard for document exchange. ISO standards include PDF/X (eXchange), PDF/A (Archive), PDF/E (Engineering), ISO 32000 (PDF), PDF/UA (Accessibility) and PDF/VT (Variable data and transactional printing). PDF is readable on almost every platform with free or open source readers. Open source PDF creators are also available.
- PostScript - .ps
- Rich Text Format (RTF) — document format developed by Microsoft since 1987 for Microsoft products and cross-platform document interchange
- SYmbolic LinK (SYLK)
- TeX — Popular open-source typesetting program and format. First successful mathematical notation language.
- TEI — XML format for digital publication
- Uniform Office Format — Chinese standard
- WordPerfect (.wpd, .wp, .wp7, .doc) (note: the .doc extension can be confused with the Microsoft Word format)
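Several entries in the list above are tied to file extensions, and one extension (.doc) is shared by two formats. A minimal sketch of how software might map an extension to its candidate formats follows. The table is built only from the extensions named in the list, and the names `FORMATS_BY_EXTENSION` and `candidate_formats` are hypothetical, chosen for illustration. An extension is only a hint; a real application would also need to inspect the file contents.

```python
from pathlib import PurePath

# Hypothetical lookup table, limited to extensions named in the list above.
# ".doc" has two candidates because both Microsoft Word and WordPerfect
# have used that extension.
FORMATS_BY_EXTENSION = {
    ".doc": ["Microsoft Word binary format", "WordPerfect"],
    ".docx": ["Office Open XML"],
    ".odt": ["OpenDocument"],
    ".sxw": ["OpenOffice.org XML"],
    ".fb2": ["FictionBook"],
    ".html": ["HTML"],
    ".htm": ["HTML"],
    ".pages": ["Pages"],
    ".ps": ["PostScript"],
    ".wpd": ["WordPerfect"],
    ".wp": ["WordPerfect"],
    ".wp7": ["WordPerfect"],
}


def candidate_formats(filename):
    """Return the formats the extension could denote (empty list if unknown)."""
    suffix = PurePath(filename).suffix.lower()  # case-insensitive match
    return FORMATS_BY_EXTENSION.get(suffix, [])
```

Calling `candidate_formats("report.DOC")` returns both the Word and the WordPerfect entries, which illustrates why the extension alone cannot settle which format a file uses.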