Apache PDFBox: Difference between revisions

From Wikipedia, the free encyclopedia
added Ohloh stats (inspired by itext article)
Line 17: Line 17:
}}
}}


Apache PDFBox is a pure-[[Java]] library that can be used to create, render, print, split, merge, alter, verify and extract text and meta-data of [[PDF]] files.
Apache PDFBox is an open source pure-[[Java]] library that can be used to create, render, print, split, merge, alter, verify and extract text and meta-data of [[PDF]] files.

[[Ohloh]] reports almost 2,000 commits (since the project joined Apache) by 17 contributors, representing more than 100,000 lines of code. According to Ohloh, PDFBox has a well-established, mature codebase maintained by a large development team, with commits increasing year over year.<ref>{{cite web|author=&nbsp; |url=https://www.ohloh.net/p/pdfbox/ |title=The Apache PDFBox Open Source Project on Ohloh |publisher=Ohloh.net |date=2014-06-25 |accessdate=2014-06-25}}</ref>

Using the [[COCOMO]] model, Ohloh estimates that the codebase represents about 30 [[person-year]]s of effort.<ref>{{cite web|author=&nbsp; |url=https://www.ohloh.net/p/pdfbox/estimated_cost |title=Ohloh Estimated development cost |publisher=Ohloh.net |date=2014-06-25 |accessdate=2014-06-25}}</ref>


==Structure==
==Structure==

Revision as of 19:26, 25 June 2014

PDFBox
Developer(s): Apache Software Foundation
Stable release: 1.8.6 / June 22, 2014
Written in: Java
Operating system: Cross-platform
Type: Portable Document Format (PDF)
License: Apache License 2.0
Website: https://pdfbox.apache.org

Apache PDFBox is an open source pure-Java library that can be used to create, render, print, split, merge, alter, verify and extract text and meta-data of PDF files.

Ohloh reports almost 2,000 commits (since the project joined Apache) by 17 contributors, representing more than 100,000 lines of code. According to Ohloh, PDFBox has a well-established, mature codebase maintained by a large development team, with commits increasing year over year.[1]

Using the COCOMO model, Ohloh estimates that the codebase represents about 30 person-years of effort.[2]
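
The arithmetic behind a COCOMO estimate of this kind can be sketched as follows. This is a minimal illustration, not Ohloh's actual calculation: it assumes the Basic COCOMO "organic" coefficients (2.4 and 1.05) and takes 100,000 lines as a lower bound from the "more than 100,000 lines" figure above. The class and method names are hypothetical.

```java
public class CocomoEstimate {
    // Assumed Basic COCOMO coefficients for an "organic" project.
    // The source does not state which parameters Ohloh used.
    static final double A = 2.4;
    static final double B = 1.05;

    // Effort in person-months for a codebase of the given size,
    // where size is in thousands of lines of code (KLOC).
    static double personMonths(double kloc) {
        return A * Math.pow(kloc, B);
    }

    // Same effort expressed in person-years.
    static double personYears(double kloc) {
        return personMonths(kloc) / 12.0;
    }

    public static void main(String[] args) {
        double kloc = 100.0; // assumption: lower bound of "more than 100,000 lines"
        System.out.printf("%.0f KLOC: about %.1f person-years%n",
                kloc, personYears(kloc));
    }
}
```

Under these assumptions, 100 KLOC works out to roughly 25 person-years. Because the exponent is greater than 1, the estimate grows slightly faster than the line count, so a codebase somewhat larger than 100,000 lines lands in the neighborhood of the 30 person-years cited.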

Structure

Apache PDFBox has these components:

  • PDFBox: the main part
  • FontBox: handles font information
  • JempBox: handles XMP metadata
  • Preflight (optional): checks PDF files for PDF/A conformity.

History

PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to be able to extract text from PDF files for Lucene.[3] It became an Apache Incubator project in 2008 and an Apache top-level project in 2009.[4]

Preflight was originally named PaDaF. It was developed by Atos Worldline and donated to the project in 2011.[5]

References

  1. ^ "The Apache PDFBox Open Source Project on Ohloh". Ohloh.net. 2014-06-25. Retrieved 2014-06-25.
  2. ^ "Ohloh Estimated development cost". Ohloh.net. 2014-06-25. Retrieved 2014-06-25.
  3. ^ Apache PDFBox and FontBox 1.0.0 released, The H Open, 16 February 2010
  4. ^ PDFBox Project Incubation Status
  5. ^ PaDaF Preflight Codebase Intellectual Property (IP) Clearance Status