Apache PDFBox

From Wikipedia, the free encyclopedia
PDFBox
Developer(s): Apache Software Foundation
Stable release: 1.8.6 / June 22, 2014
Written in: Java
Operating system: Cross-platform
Type: Portable Document Format (PDF)
License: Apache License 2.0
Website: https://pdfbox.apache.org

Apache PDFBox is an open-source, pure-Java library for creating, rendering, printing, splitting, merging and modifying PDF files, for verifying them, and for extracting their text and metadata.

Ohloh reports almost 2,000 commits (since the project joined Apache) by 17 contributors, representing more than 100,000 lines of code. Ohloh describes PDFBox as having a well-established, mature codebase maintained by a large development team, with increasing year-over-year commits.[1]

Applying the COCOMO model to this code base gives an estimated development effort of 30 person-years.[2]
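A COCOMO figure is derived from code size by formula, not from records of actual work. The following minimal Java sketch shows the basic COCOMO calculation, Effort = a × KLOC^b person-months. It assumes the textbook organic-mode coefficients (a = 2.4, b = 1.05); whether Ohloh used exactly these values is an assumption, and the class and method names are hypothetical.

```java
import java.util.Locale;

/**
 * Basic COCOMO effort estimate (organic mode): Effort = A * KLOC^B person-months.
 * ASSUMPTION: textbook organic-mode coefficients; Ohloh's exact parameters may differ.
 */
final class CocomoEstimate {
    // Assumed organic-mode coefficients for basic COCOMO.
    static final double A = 2.4;
    static final double B = 1.05;

    /** Estimated effort in person-months for a code base of the given size in KLOC. */
    static double effortPersonMonths(double kloc) {
        if (kloc <= 0) {
            throw new IllegalArgumentException("kloc must be positive");
        }
        return A * Math.pow(kloc, B);
    }

    /** The same estimate converted from person-months to person-years. */
    static double effortPersonYears(double kloc) {
        return effortPersonMonths(kloc) / 12.0;
    }

    public static void main(String[] args) {
        // "More than 100,000 lines of code": use 100 KLOC as a lower bound.
        double kloc = 100.0;
        // Locale.ROOT keeps the decimal point independent of the system locale.
        System.out.println(String.format(Locale.ROOT,
                "%.0f KLOC -> %.1f person-years", kloc, effortPersonYears(kloc)));
    }
}
```

With 100 KLOC as input, the sketch prints `100 KLOC -> 25.2 person-years`. Under the same assumed coefficients, about 30 person-years corresponds to roughly 118 KLOC, which is consistent with a code base of "more than 100,000 lines".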

Structure

Apache PDFBox has these components:

  • PDFBox: the core library
  • FontBox: handles font information
  • JempBox: handles XMP metadata
  • Preflight (optional): checks PDF files for conformance with PDF/A.

History

PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to extract text from PDF files for use with Lucene.[3] It became an Apache Incubator project in 2008 and an Apache top-level project in 2009.[4]

Preflight was originally named PaDaF. It was developed by Atos Worldline and donated to the project in 2011.[5]

See also

References

External links