Document conversion

From Wikipedia, the free encyclopedia

Document conversion is the process of converting a document from one file format to another so that it can be read in more applications. Documents can be converted into:

  • other source document formats
  • consumer formats
  • structured data

How it works

The conversion is usually performed by the application the file was created with, though various third-party tools can also perform it. Most file formats can be inspected with a hex editor. Alternatively, conversions can be provided automatically by Web services that connect to a document storage or delivery system, such as a file directory or a document or content management application. These content transformation services can run on a local server, on the Web, or in the cloud. Conversion tools can also be combined with a delivery component that publishes the converted data to a database, file system, or other system.
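As an illustration of the inspection step, a conversion tool typically needs to know what format it has been handed before choosing a converter, and the leading bytes a hex editor would show are often enough to tell. The following is a minimal Python sketch, not the method of any particular product. The names `SIGNATURES`, `sniff_format`, `sniff_file` and `hex_preview` are hypothetical, and the signature table is a small hand-picked subset.

```python
# Leading-byte signatures ("magic numbers") for a few common formats.
# Illustrative subset only; real tools use far larger tables.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"{\\rtf", "RTF"),
    (b"PK\x03\x04", "ZIP container (e.g. DOCX, XLSX, ODT)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g. legacy DOC, XLS, PPT)"),
    (b"BM", "BMP"),  # shortest signature, so checked last
]


def sniff_format(data: bytes) -> str:
    """Return a format name based on the first bytes of a file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 16 bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))


def hex_preview(data: bytes, width: int = 8) -> str:
    """Show the first `width` bytes the way a hex editor would."""
    return " ".join(f"{byte:02x}" for byte in data[:width])


print(hex_preview(b"%PDF-1.7"), "->", sniff_format(b"%PDF-1.7"))
```

A signature identifies only the container. Telling a DOCX from an XLSX, for example, requires looking inside the ZIP archive.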

Examples

Paper documents conversion

Converting scanned paper documents to useful electronic formats is one of the most important applications of document conversion. Documents scanned to image formats have many limitations, such as large file size and the impossibility of searching or reusing their content. Conversion to more useful formats should therefore be considered.

Extracting content from a document image is the task of optical character recognition (OCR) or intelligent character recognition (ICR) technologies. Modern OCR applications convert image files to various document formats, preserving not just the content but also the structure of the document (ADRT).
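To make the idea of preserving structure concrete, the sketch below assumes that an OCR engine has already returned recognized words with pixel coordinates, and rebuilds text lines from them. It is a simplified illustration in Python, not how ADRT or any named product works. The `Word` type, the `group_into_lines` function and the 5-pixel tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with the position of its top-left corner (hypothetical OCR output)."""
    text: str
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels


def group_into_lines(words, tolerance=5):
    """Rebuild text lines from word boxes.

    Words are visited top to bottom. A word joins the current line when its
    top edge is within `tolerance` pixels of the first word on that line;
    otherwise it starts a new line. Each line is then ordered left to right.
    """
    lines = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(word.y - lines[-1][0].y) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


words = [
    Word("conversion", 80, 12),
    Word("Document", 10, 10),
    Word("Examples", 10, 40),
]
print(group_into_lines(words))
```

Real layout analysis also has to handle columns, tables, fonts and reading order, which this sketch ignores.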

Paper documents conversion applications

Company | Product | Import formats | Export formats
Expervision | TypeReader 2008 | BMP, PCX, DCX, JPEG, PNG, TIFF, PDF | DOC, XLS, DOCX, XLSX, RTF, TXT, HTML, DBF, CSV, PDF, ASCII (Comma Delimited or Tab Delimited), WordPerfect, TypeReader Native Format, TypeReader Text Only
ABBYY | FineReader 9.0 | BMP, PCX, DCX, JPEG, JPEG 2000, PNG, TIFF, PDF, GIF, XPS, DjVu | DOC, XLS, DOCX, XLSX, PPT, RTF, TXT, HTML, DBF, CSV, PDF/A, PDF, MRC-PDF, LIT, WordML
Coextant Systems | Hyper.Net Version 6 | TXT, TIFF, JPEG, BMP, PCX, GIF, PDF | PDF, PDF/A, Flash, HTML
ExactCODE | ExactScan Pro 2 | TIFF, JPEG, JPEG 2000, PNG, BMP, PCX, GIF, PDF | PDF, RTF, HTML, TXT
I.R.I.S. Group | Readiris 12 | JPEG, BMP, TIFF, PDF, DjVu, JPEG 2000 | DOC, DOCX, XLS, XLSX, PDF, ODT, XPS, PDF/A, HTML, RTF, WPD
Nuance Communications | OmniPage Professional 17 | TXT, TIFF, JPEG, BMP, PCX, GIF, PDF, MAX | DOC, DOCX, XML, XLS, XLSX, PPTX, PDF, RTF, HTML, XSN, XPS, WordML
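A table like the one above can be treated as a lookup structure when choosing a tool for a given conversion. The Python sketch below copies four rows from the table into a dictionary and returns the products that list both a given import format and a given export format. The names `TOOLS` and `tools_for` are hypothetical, and the data is only as current as the table itself.

```python
# Product -> (import formats, export formats), copied from four rows of the table.
TOOLS = {
    "ABBYY FineReader 9.0": (
        {"BMP", "PCX", "DCX", "JPEG", "JPEG 2000", "PNG", "TIFF", "PDF", "GIF", "XPS", "DjVu"},
        {"DOC", "XLS", "DOCX", "XLSX", "PPT", "RTF", "TXT", "HTML", "DBF", "CSV",
         "PDF/A", "PDF", "MRC-PDF", "LIT", "WordML"},
    ),
    "Hyper.Net Version 6": (
        {"TXT", "TIFF", "JPEG", "BMP", "PCX", "GIF", "PDF"},
        {"PDF", "PDF/A", "Flash", "HTML"},
    ),
    "ExactScan Pro 2": (
        {"TIFF", "JPEG", "JPEG 2000", "PNG", "BMP", "PCX", "GIF", "PDF"},
        {"PDF", "RTF", "HTML", "TXT"},
    ),
    "Readiris 12": (
        {"JPEG", "BMP", "TIFF", "PDF", "DjVu", "JPEG 2000"},
        {"DOC", "DOCX", "XLS", "XLSX", "PDF", "ODT", "XPS", "PDF/A", "HTML", "RTF", "WPD"},
    ),
}


def tools_for(source: str, target: str) -> list:
    """Return the products, sorted by name, that import `source` and export `target`."""
    return sorted(
        name
        for name, (imports, exports) in TOOLS.items()
        if source in imports and target in exports
    )


print(tools_for("TIFF", "PDF/A"))
print(tools_for("DjVu", "ODT"))
```

Format names are matched as exact strings, so "JPEG" and "JPG" would count as different formats unless they are normalized first.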

Special PDF conversion applications:

Company | Product | Convert PDF from (formats) | Convert PDF to (formats)
ABBYY | PDF Transformer 3.0 | DOC, XLS, DOCX, XLSX, PPT, RTF, PPTX, VSD, VSDX and any application via printing function | DOC, XLS, DOCX, XLSX, PPT, RTF, TXT, HTML, DBF, searchable PDF/A, searchable PDF
Ascertia | PDF Sign&Seal | DOC, DOCX, XLS, XLSX, PPT, RTF, or any file using File > Print | JPG files
Coextant Systems | Hyper.Net Version 6 | DOC, XLS, DOCX, PPT, RTF, PPTX, VSD, VSDX and any application via printing function | searchable PDF/A, searchable PDF, Flash, MP3, Combined PDF
ExactCODE | OCRKit 2 | TIFF, JPEG, JPEG 2000, PNG, BMP, PCX, GIF, PDF | PDF, RTF, HTML, TXT
Nitro PDF Software | Nitro PDF Professional | DOC, XLS, DOCX, XLSX, PPT, RTF and others | DOC, DOCX, RTF, image files
Software Depot Online | Docsmartz PDF Converter Professional | - | DOC, RTF, image files, XLSX, Postscript, Text
Software Depot Online | Docsmartz PDF Creator | DOC, XLS, DOCX, XLSX, PPT, RTF, PPTX, VSD, VSDX and any application via printing function or right click | -
Nuance Communications | PDF Converter 6 | - | DOC, DOCX, XML, XLS, XLSX, PPTX, WDP, XPS, PDF, MRC-PDF

Consumer format conversion applications:

Company | Product | Input formats | Output formats
Cobynsoft | Cobynsoft's Review | TXT, RTF, DOC, DOCX, ODT, ePUB | TXT, RTF, DOC, DOCX, ODT, ePUB, PDF
Coextant Systems | Hyper.Net Version 6 | DOC, XLM, DOCX, PPT, RTF, PPTX, VSD, VSDX, DWG, XLS, XLSX, OpenOffice.org formats | Hypertext, HTML, Flash, MP3, PDF, PDF/A, Combined PDF, XLM
