Apache PDFBox

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Tilman at 16:25, 20 September 2016 (2.0.3 released).

PDFBox
Developer(s): Apache Software Foundation
Stable release: 2.0.3 / September 18, 2016
Written in: Java
Operating system: Cross-platform
Type: Portable Document Format (PDF)
License: Apache License 2.0
Website: https://pdfbox.apache.org

Apache PDFBox is an open-source, pure-Java library that can be used to create, render, print, split, merge, alter and verify PDF files, and to extract text and metadata from them.

Open Hub reports over 4,000 commits (since the project's start at Apache) by 17 contributors, representing more than 120,000 lines of code. It describes PDFBox as having a well-established, mature codebase maintained by an average-sized development team with stable year-over-year commits. Using the COCOMO model, Open Hub estimates the development effort at 34 person-years.[1]
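
To show how an effort estimate of this kind can be derived from a line count, the following is a minimal sketch of the basic COCOMO formula, effort = a × KLOC^b person-months. The constants a = 2.4 and b = 1.05 are the textbook "organic mode" values; that Open Hub uses exactly these values is an assumption, and the class and method names are hypothetical.

```java
/**
 * Illustrative sketch of the basic COCOMO effort formula.
 * Not taken from Open Hub or PDFBox; names are hypothetical.
 */
final class CocomoEstimate {
    // Basic COCOMO "organic mode" constants.
    // Assumption: the source does not state which parameters Open Hub uses.
    static final double A = 2.4;
    static final double B = 1.05;

    /** Estimated effort in person-months for a size given in thousands of lines of code. */
    static double personMonths(double kloc) {
        if (kloc < 0) {
            throw new IllegalArgumentException("kloc must be non-negative");
        }
        return A * Math.pow(kloc, B);
    }

    /** Estimated effort in person-years (12 person-months per person-year). */
    static double personYears(double kloc) {
        return personMonths(kloc) / 12.0;
    }

    public static void main(String[] args) {
        double kloc = 120.0; // "more than 120,000 lines of code"
        System.out.printf("%.0f KLOC -> %.1f person-years%n", kloc, personYears(kloc));
    }
}
```

Under these assumed constants, 120,000 lines give roughly 30 person-years, and a codebase of about 133,000 lines gives roughly 34, which is in line with the reported figure for "more than 120,000" lines.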

Structure

Apache PDFBox has these components:

  • PDFBox: the main part
  • FontBox: handles font information
  • XmpBox: handles XMP metadata
  • Preflight (optional): checks PDF files for PDF/A-1b conformity.

History

PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to be able to extract text from PDF files for Lucene.[2] It became an Apache Incubator project in 2008 and an Apache top-level project in 2009.[3]

Preflight was originally named PaDaF and developed by Atos Worldline; it was donated to the project in 2011.[4]

In February 2015, Apache PDFBox was named an Open Source Partner Organization of the PDF Association.[5]

References

  1. ^ "The Apache PDFBox Open Source Project on Open Hub". openhub.net. 2016-03-21. Retrieved 2016-03-21.
  2. ^ Apache PDFBox and FontBox 1.0.0 released, The H Open, 16 February 2010
  3. ^ PDFBox Project Incubation Status
  4. ^ PaDaF Preflight Codebase Intellectual Property (IP) Clearance Status
  5. ^ Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association, February 3, 2015