Document Content Architecture
Type of format: Document file format
The Document Content Architecture, or DCA for short, is a standard for text documents developed by IBM in the early 1980s. DCA was used on mainframe and IBM i systems, and formed the basis of DisplayWrite's file format. DCA was later extended as MO:DCA (Mixed Object Document Content Architecture), which added embedded data files, such as graphics.
The original purpose of DCA was to provide a common document format that could be used across multiple IBM word processing platforms, such as the IBM PC, IBM mainframes, the Displaywriter dedicated word processor, and the IBM 5520 Administrative System.
DCA defines two forms of document:
- Revisable-Form Text (DCA/RFT), which is editable.
- Final-Form Text (DCA/FFT), which is "formatted for a particular output device and cannot be changed."
DCA defines a data stream representing a document.
Documents may contain fonts, overlays, and other resource objects required at presentation time to present the data properly. They may also contain resource objects, such as a document index and tagging elements, that support searching and navigating document data for a variety of application purposes.
MO:DCA is the wrapper or container for the various objects that can make up a document. Each object is defined by its own subordinate architecture. The architectures are:
- Presentation Text Object Content Architecture (PTOCA) describes formatted text, including text attributes such as font or color.
- Image Object Content Architecture (IOCA) describes resolution-independent images.
- Graphics Object Content Architecture (GOCA) describes vector graphic images. A variation of GOCA, AFP GOCA, is used in Advanced Function Presentation environments.
- Bar Code Object Content Architecture™ (BCOCA™) describes bar codes in a number of different formats.
- Font Object Content Architecture (FOCA) describes fonts to be used in the document.
- Color Management Object Content Architecture™ (CMOCA™) describes required color management information.
Each architecture uses a series of binary structured fields to describe its corresponding object.
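As a rough illustration of what "binary structured fields" means in practice, the sketch below walks a byte string one field at a time. The layout it assumes is not stated in this article and should be checked against the MO:DCA reference: a 0x5A prefix byte, then an 8-byte introducer made up of a 2-byte big-endian length (counting the introducer but not the prefix), a 3-byte identifier, a flag byte, and 2 reserved bytes, followed by the field's data. The names `StructuredField`, `iter_structured_fields`, and `build_field` are hypothetical helpers, and the identifier bytes in the sample are placeholder values only.

```python
import struct
from typing import Iterator, NamedTuple

# Assumption: each structured field is preceded by this prefix byte.
PREFIX = 0x5A
# Assumed introducer layout: length (2 bytes), identifier (3), flags (1), reserved (2).
INTRODUCER = ">H3sBH"
INTRODUCER_SIZE = struct.calcsize(INTRODUCER)  # 8 bytes


class StructuredField(NamedTuple):
    identifier: bytes  # 3-byte code naming the kind of field
    flags: int
    data: bytes        # payload interpreted by the owning architecture


def iter_structured_fields(stream: bytes) -> Iterator[StructuredField]:
    """Yield the structured fields found in `stream` under the assumed layout."""
    pos = 0
    while pos < len(stream):
        if stream[pos] != PREFIX:
            raise ValueError(f"expected prefix byte at offset {pos}")
        if pos + 1 + INTRODUCER_SIZE > len(stream):
            raise ValueError(f"truncated introducer at offset {pos}")
        length, identifier, flags, _reserved = struct.unpack_from(
            INTRODUCER, stream, pos + 1
        )
        end = pos + 1 + length  # length covers introducer + data, not the prefix
        if length < INTRODUCER_SIZE or end > len(stream):
            raise ValueError(f"bad field length {length} at offset {pos}")
        yield StructuredField(identifier, flags, stream[pos + 1 + INTRODUCER_SIZE:end])
        pos = end


def build_field(identifier: bytes, data: bytes = b"") -> bytes:
    """Encode one field in the same assumed layout (used here to make sample input)."""
    header = struct.pack(INTRODUCER, INTRODUCER_SIZE + len(data), identifier, 0, 0)
    return bytes([PREFIX]) + header + data


# Three sample fields; the identifier values are placeholders for illustration.
sample = (
    build_field(b"\xd3\xa8\xa8")
    + build_field(b"\xd3\xee\xee", b"hi")
    + build_field(b"\xd3\xa9\xa8")
)
fields = list(iter_structured_fields(sample))
for field in fields:
    print(field.identifier.hex(), len(field.data))
```

Because every field carries its own length, a reader built this way can skip over fields belonging to an object architecture it does not understand and still locate the next field, which fits the container role described above.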
Revisable-Form Text (abbreviated RFT or RFT-DCA) is part of DCA and is sometimes called Revisable Format Text. It was used by the IBM DisplayWrite 4 and 5 word processors, on System/360 and System/370 mainframe computers, and by OfficeVision/400 to transfer formatted documents to other systems.
RFT has a counterpart, Final-Form Text (abbreviated FFT or FFT-DCA), which was output-only and not intended to be edited.
The drive toward international standards for the DCAs began in 1980 at the IBM Rochester facility, where a team of two MODCA architects, an RTOCA architect, and a PTOCA architect was assembled. These architects, as they were called, were responsible for building an IBM consensus on the design of the data streams and for taking the work into the international standards arena. There was a concerted effort to bring the international community into the development, a decision based in part on the experience of getting GML accepted as the international SGML standard. SGML standardization had taken many years and man-hours, and the team wanted to avoid a similar delay by involving everyone early. IBM's work on document content had been driven by the needs of mainframe computers, where GML and DCA were in use, and that experience pointed to a need for standardized component architectures, for revisable and non-revisable text in particular.
In 1981, shortly after its inception, the group was moved, along with the IBM 5280 Distributed Data System, to IBM Austin near Round Rock, Texas, where the work continued with mixed success. As the architectures became more firmly positioned on the international stage, the team was moved again in 1987, to the IBM Dallas Programming Center near Roanoke, Texas (Westlake). There, in 1998, the team was disbanded and work on the DCA architectures was discontinued, mainly because the PC community had, of necessity, gone in a different direction. After 18 years the DCA architectures were complete, but their details were not fully agreed upon, and no active implementations were in sight.
The PC world had settled on HTML (believed to be an application of the SGML international standard) and used portions of it for its own purposes. Microsoft Word eventually used a similar data stream as its internal working format for storing editable content. Although the SGML standard was available, a full SGML parser was impractical to implement, so a subset of it became the de facto standard for revisable text used in the PC arena today.
At about the same time, Adobe Systems designed and produced PDF, a printable document encoding that has become the standard for PC-produced printable documents. It became an international standard in 2008, without input from anyone except the users, who adopted the products on offer in greater numbers than the managers of the data stream architects had ever dreamed possible. The choice was driven by the need for a product, and the solution found was far more acceptable than anything the standards committees could have designed in the time available. More than 10 years of work had not produced an acceptable method, and the PC computing community created what it needed in less time.
The attempt to reach a consensus document data stream was quickly outflanked by companies that did not try to share their work with others but instead built workable solutions and sold them to users, who liked them. As a result, the output of word processing software is now 'printed' into the PDF format supplied by the most widely used presentation product. Microsoft Word, for example, offers the printer selection 'Microsoft Print to PDF' to produce a PDF document, a very acceptable solution for most users. A similar method could have been used to produce the international standard format, had one eventually arrived.
When IBM disbanded its Dallas Programming Center in 1998, the entire staff of architects retired and left the company, except for the manager, who was moved to another location and position. This ended the DCA architecture project at IBM for the foreseeable future.
- Advanced Function Presentation
- Rich Text Format (RTF) – another formatting code system that is sometimes confused with Revisable-Form Text.
- List of document markup languages
- Henkel, Tom (21 May 1984), "IBM taking the standardization route to DPP", Computerworld, IDG Enterprise, 18 (21), p. 7, ISSN 0010-4841
- "PC Magazine Encyclopedia". Retrieved July 25, 2012.
- de la Beaujardière, Jean Marie (1988). "Well-established document interchange formats". Document Manipulation and Typography: Proceedings of the International Conference on Electronic Publishing, Document Manipulation and Typography, Nice (France) April 20-22 1988. CUP Archive. p. 83. ISBN 978-0-521-36294-8.
- IBM Corporation (May 2006). Mixed Object Document Content Architecture Reference (PDF). Retrieved Feb 7, 2020.
- AFP Consortium homepage
- Advanced Function Presentation Consortium (April 2017). Graphics Object Content Architecture for Advanced Function Presentation Reference (PDF) (Fourth ed.). Retrieved Feb 7, 2020.
- Williams; et al. (1996). Method and Apparatus for Multistage Document Format Transformation in a Data Processing System (PDF). United States patent number 5,513,323