
EPUB: Difference between revisions

From Wikipedia, the free encyclopedia
Quillaja (talk | contribs)
Undid revision 371989293 by AlisteRrr (talk) Adobe InDesign is already listed as editing software.

Revision as of 18:29, 6 July 2010

Electronic Publication (EPUB)
Filename extension: .epub
Internet media type: application/epub+zip (unofficial[1])
Developed by: International Digital Publishing Forum (IDPF)
Initial release: September 2007
Latest release: 2.0.1, May 25, 2010[2]
Type of format: e-book file format
Contained by: OEBPS Container Format (OCF) (ZIP)
Extended from: Open eBook, XHTML, CSS, DTBook
Website: IDPF Home Page

EPUB (short for electronic publication; alternatively capitalized as ePub, EPub, or epub, with "EPUB" preferred by the vendor) is a free and open e-book standard by the International Digital Publishing Forum (IDPF). Files have the extension .epub. EPUB is designed for reflowable content, meaning that the text display can be optimized for the particular display device used by the reader of the EPUB-formatted book. The format is meant to function as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale. It supersedes the Open eBook standard.[3]

History

EPUB became an official standard of the International Digital Publishing Forum (IDPF) in September 2007, superseding the older Open eBook standard.[4]

In August 2009, the IDPF announced that it would begin maintenance work on the EPUB standard.[5] The working group defined two broad objectives: "One set of activities governs maintenance of the current EPUB Standards (i.e. OCF, OPF, and OPS), while another set of activities addresses the need to keep the Standards current and up-to-date." The working group was expected to be active through 2010, publishing updated standards throughout its lifetime.[6] On April 6, 2010, it was announced that the working group would complete its update in April 2010. The result was to be a minor revision, EPUB 2.0.1, which "corrects errors and inconsistencies and does not change functionality".[7] On July 2, 2010, drafts of the version 2.0.1 standards appeared on the IDPF website.[2]

On April 6, 2010, it was also announced that a working group would be formed to revise the EPUB specification to version 2.1.[7] The working group's draft charter identifies 14 main problems with EPUB that the group will address. The group is chartered through May 2011 and is to submit a final draft on May 15, 2011.[8]

Features

  • Free and open
  • Re-flowable (word wrap) and re-sizable text
  • Inline raster and vector images
  • Embedded metadata
  • DRM support
  • CSS styling
  • Support for alternative renditions in the same file
  • Use of out-of-line and inline XML islands to extend the functionality of EPUB

File format

EPUB consists of three specifications:

  • Open Publication Structure (OPS) 2.0, contains the formatting of its content.[9]
  • Open Packaging Format (OPF) 2.0, describes the structure of the .epub file in XML.[10]
  • OEBPS Container Format (OCF) 1.0, collects all files as a ZIP archive.[11]

Internally, EPUB uses XHTML or DTBook (an XML standard provided by the DAISY Consortium) to represent the text and structure of the content document, and a subset of CSS to provide layout and formatting. XML is used for the document manifest, table of contents, and EPUB metadata. Finally, the files are bundled in a ZIP archive, which serves as the packaging format.

Open Publication Structure 2.0

As of version 2.0, an EPUB file uses XHTML 1.1 (or DTBook) to construct the content of a book. This differs from previous versions (OEBPS 1.2 and earlier), which used a subset of XHTML. There are, however, a few restrictions on certain elements. The mimetype for XHTML documents in EPUB is application/xhtml+xml.[9] For a table of the required XHTML modules and a description of the restrictions, see Section 2.2 of the specification.

Styling and layout are performed using a subset of CSS 2.0, referred to as OPS Style Sheets. Reading systems are required to support only a portion of the CSS properties, plus a few custom ones such as oeb-page-head, oeb-page-foot, and oeb-column-number. Fonts can be embedded by using the @font-face rule and listing the font file in the OPF's manifest (see below). The mimetype for CSS documents in EPUB is text/css.[9] For a table of supported properties and detailed information, see Section 3.0 of the specification.

EPUB also requires that PNG, JPEG, GIF, and SVG are supported for image types. These use the mimetypes image/png, image/jpeg, image/gif, image/svg+xml respectively. Other media types are allowed, but creators must include alternative renditions in supported types.[9] For a table of all required mimetypes, see Section 1.3.7 of the specification.
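The fallback rule for images can be sketched as a small lookup. This is an illustration only: the names CORE_IMAGE_TYPES and needs_fallback are hypothetical, and the set covers just the four image types named above, not the full table in Section 1.3.7.

```python
# Image media types that the text above says every reading system must support.
CORE_IMAGE_TYPES = {
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/svg+xml",
}


def needs_fallback(media_type: str) -> bool:
    """Return True if an image of this media type needs an alternative rendition."""
    return media_type.strip().lower() not in CORE_IMAGE_TYPES


if __name__ == "__main__":
    for media_type in ("image/png", "image/tiff"):
        print(media_type, "needs fallback:", needs_fallback(media_type))
```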

Unicode is required, and content producers must use either UTF-8 or UTF-16 encoding.[9] This is to support international and multilingual books. However, reading systems are not required to provide the fonts necessary to display every Unicode character, though they are required to display at least a placeholder for characters that cannot be displayed fully.[9]
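A reading system or checking tool might apply the encoding rule roughly as follows. This is a simplified sketch, not the procedure from the specification: the function name decode_ops_document is hypothetical, and it assumes UTF-16 documents begin with a byte order mark, treating everything else as UTF-8.

```python
import codecs


def decode_ops_document(raw: bytes) -> str:
    """Decode a content document as UTF-16 (if it starts with a BOM) or UTF-8.

    Raises UnicodeDecodeError if the bytes are valid in neither encoding.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return raw.decode("utf-16")
    # "utf-8-sig" also accepts (and drops) an optional UTF-8 BOM.
    return raw.decode("utf-8-sig")
```

Bytes in a legacy single-byte encoding such as Latin-1 will usually fail to decode, which is the desired outcome for a document that is required to be UTF-8 or UTF-16.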

An example skeleton of an XHTML file for EPUB looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
    <title>Pride and Prejudice</title>
    <link rel="stylesheet" href="css/main.css" type="text/css" />
  </head>
  <body>
    ...
  </body>
</html>
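Because such a file is well-formed XML, it can be read with a generic XML parser. The following Python sketch uses the standard library's xml.etree.ElementTree to pull the title, language, and stylesheet links out of a copy of the skeleton; the function name describe and the placeholder paragraph in the body are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SKELETON = b"""<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Pride and Prejudice</title>
    <link rel="stylesheet" href="css/main.css" type="text/css" />
  </head>
  <body><p>placeholder</p></body>
</html>
"""


def describe(xhtml: bytes) -> dict:
    """Return the title, language, and stylesheet hrefs of an XHTML document."""
    root = ET.fromstring(xhtml)  # the external DTD is not fetched
    stylesheets = [
        link.get("href")
        for link in root.findall("x:head/x:link", XHTML_NS)
        if link.get("rel") == "stylesheet"
    ]
    return {
        "title": root.findtext("x:head/x:title", namespaces=XHTML_NS),
        "lang": root.get(XML_LANG),
        "stylesheets": stylesheets,
    }


if __name__ == "__main__":
    print(describe(SKELETON))
```

Note that this parser does not know XHTML's named entities (such as &nbsp;), so real content documents that use them may need a different parser.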

Open Packaging Format 2.0

The OPF specification's purpose is to "[define] the mechanism by which the various components of an OPS publication are tied together and provides additional structure and semantics to the electronic publication."[10] This is accomplished by two XML files with the extensions .opf and .ncx.

.opf file

The .opf file houses the EPUB book's metadata, file manifest, and linear reading order. This file has a root element, package, and four child elements: metadata, manifest, spine, and guide. All of these except guide are required. Furthermore, the package node must have the unique-identifier attribute. The .opf file's mimetype is application/oebps-package+xml.[10]

The metadata element contains all the metadata information for a particular EPUB file. Three metadata tags are required, though there are many more available: title, language, identifier. title contains the title of the book. language contains the language of the book's contents in RFC 3066 format or its successors such as the newer RFC 4646. identifier contains a unique identifier for the book, such as its ISBN or a URL. The identifier's id attribute should equal the unique-identifier attribute from the package element.[10] For a full listing of EPUB metadata, please see Section 2.2 of the specification.

The manifest element lists all the files contained in the package. Each file is represented by an item element, and has the attributes id, href, media-type. All XHTML (content documents), stylesheets, images or other media, embedded fonts, and the .ncx file should be listed here. Only the .opf file, container.xml, and mimetype files should not be included.[10] Note that in the example below, an arbitrary media-type is given to the included font file, even though no mimetype exists for fonts.

The spine element lists all the XHTML content documents in their linear reading order. Also, any content document that can be reached through linking or the table of contents must be listed as well. The toc attribute of spine must contain the id of the .ncx file listed in the manifest. Each itemref element's idref is set to the id of its respective content document.[10]
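The relationship between spine and manifest can be checked mechanically. The sketch below is a rough illustration, not a substitute for a real validator: the function name spine_problems and its messages are hypothetical, and it tests only the two rules stated in this paragraph (the toc attribute and the itemref targets).

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
NCX_MEDIA_TYPE = "application/x-dtbncx+xml"


def spine_problems(opf_bytes: bytes) -> list:
    """Return a list of human-readable problems found in the spine."""
    package = ET.fromstring(opf_bytes)
    # Index manifest items by id so spine references can be resolved.
    items = {
        item.get("id"): item
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    spine = package.find("opf:spine", OPF_NS)
    if spine is None:
        return ["missing spine element"]

    problems = []
    toc_item = items.get(spine.get("toc"))
    if toc_item is None or toc_item.get("media-type") != NCX_MEDIA_TYPE:
        problems.append("spine toc attribute does not reference the .ncx manifest item")
    for itemref in spine.findall("opf:itemref", OPF_NS):
        idref = itemref.get("idref")
        if idref not in items:
            problems.append(f"itemref {idref!r} is not in the manifest")
    return problems
```

An empty list means that neither problem was detected; it does not mean the package is otherwise valid.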

The guide element is an optional element for the purpose of identifying fundamental structural components of the book. Each reference element has the attributes type, title, href. Files referenced in href must be listed in the manifest, and are allowed to have an element identifier (e.g. #figures in the example).[10] A list of possible values for type can be found in Section 2.6 of the specification.

An example .opf file:

<?xml version="1.0"?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">

  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Pride and Prejudice</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId" opf:scheme="ISBN">123456789X</dc:identifier>
    <dc:creator opf:file-as="Austen, Jane" opf:role="aut">Jane Austen</dc:creator>
  </metadata>

  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="stylesheet" href="css/style.css" media-type="text/css"/>
    <item id="ch1-pic" href="ch1-pic.png" media-type="image/png"/>
    <item id="myfont" href="css/myfont.otf" media-type="application/x-font-opentype"/>
    <item id="ncx" href="book.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>

  <spine toc="ncx">
    <itemref idref="chapter1" />
  </spine>

  <guide>
    <reference type="loi" title="List Of Illustrations" href="appendix.html#figures" />
  </guide>

</package>
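To show how the pieces of an .opf file relate to one another, the following Python sketch reads a shortened copy of the example and resolves the spine's idref values against the manifest. It uses only the standard library; the function name read_package and the shape of the returned dictionary are choices made for this illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

OPF = b"""<?xml version="1.0"?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Pride and Prejudice</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId" opf:scheme="ISBN">123456789X</dc:identifier>
  </metadata>
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ncx" href="book.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1" />
  </spine>
</package>
"""


def read_package(opf_bytes: bytes) -> dict:
    """Extract the required metadata and the linear reading order."""
    package = ET.fromstring(opf_bytes)
    uid_ref = package.get("unique-identifier")

    # Map manifest ids to hrefs so the spine's idrefs can be resolved.
    manifest = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    spine = package.find("opf:spine", NS)
    reading_order = [
        manifest[itemref.get("idref")]
        for itemref in spine.findall("opf:itemref", NS)
    ]

    # The book identifier is the dc:identifier whose id equals unique-identifier.
    identifier = next(
        (
            el.text
            for el in package.findall("opf:metadata/dc:identifier", NS)
            if el.get("id") == uid_ref
        ),
        None,
    )
    return {
        "title": package.findtext("opf:metadata/dc:title", namespaces=NS),
        "language": package.findtext("opf:metadata/dc:language", namespaces=NS),
        "identifier": identifier,
        "reading_order": reading_order,
        "toc": manifest.get(spine.get("toc")),
    }


if __name__ == "__main__":
    print(read_package(OPF))
```

The sketch assumes a well-formed package: a missing spine or an itemref that points outside the manifest raises an exception rather than being reported gracefully.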


.ncx file

The .ncx file (Navigation Control file for XML) contains the hierarchical table of contents for the EPUB file. The specification for .ncx was developed for Digital Talking Book (DTB), is maintained by the DAISY Consortium, and is not a part of the EPUB specification. The .ncx file has a mimetype of application/x-dtbncx+xml.

The values of the docTitle, docAuthor, and meta name="dtb:uid" elements should match their analogs in the .opf file. The meta name="dtb:depth" element is set to the depth of the navMap element. navPoint elements can be nested to create a hierarchical table of contents. The content of navLabel is the text that appears in the table of contents generated by reading systems that use the .ncx. A navPoint's content element points to a content document listed in the manifest and can also include an element identifier (e.g. #section1).[10][12]

A description of certain exceptions to the NCX specification as used in EPUB can be found in Section 2.4.1 of the specification. The complete specification for NCX can be found in Section 8 of the Specifications for the Digital Talking Book.[12]

An example .ncx file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
"http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">

<ncx version="2005-1" xml:lang="en" xmlns="http://www.daisy.org/z3986/2005/ncx/">

  <head>
<!-- The following four metadata items are required for all NCX documents,
including those conforming to the relaxed constraints of OPS 2.0 -->

    <meta name="dtb:uid" content="123456789X"/> <!-- same as in .opf -->
    <meta name="dtb:depth" content="1"/> <!-- 1 or higher -->
    <meta name="dtb:totalPageCount" content="0"/> <!-- must be 0 -->
    <meta name="dtb:maxPageNumber" content="0"/> <!-- must be 0 -->
  </head>

  <docTitle>
    <text>Pride and Prejudice</text>
  </docTitle>

  <docAuthor>
    <text>Austen, Jane</text>
  </docAuthor>

  <navMap>
    <navPoint class="chapter" id="chapter1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>

</ncx>
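The nesting of navPoint elements can be walked recursively to produce a table of contents. The Python sketch below does this for a variant of the example that has one nested section (added here for illustration) and compares the deepest level found with the declared dtb:depth; the function names walk and table_of_contents are hypothetical.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

NCX = b"""<?xml version="1.0" encoding="UTF-8"?>
<ncx version="2005-1" xml:lang="en" xmlns="http://www.daisy.org/z3986/2005/ncx/">
  <head>
    <meta name="dtb:uid" content="123456789X"/>
    <meta name="dtb:depth" content="2"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle><text>Pride and Prejudice</text></docTitle>
  <navMap>
    <navPoint class="chapter" id="chapter1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
      <navPoint class="section" id="section1" playOrder="2">
        <navLabel><text>Section 1</text></navLabel>
        <content src="chapter1.xhtml#section1"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>
"""


def walk(parent, level=1):
    """Yield (level, label, src) for each navPoint under parent, in document order."""
    for point in parent.findall("ncx:navPoint", NCX_NS):  # direct children only
        label = point.findtext("ncx:navLabel/ncx:text", namespaces=NCX_NS)
        src = point.find("ncx:content", NCX_NS).get("src")
        yield level, label, src
        yield from walk(point, level + 1)


def table_of_contents(ncx_bytes: bytes):
    """Return (entries, declared_depth) for an NCX document."""
    root = ET.fromstring(ncx_bytes)
    metas = {
        meta.get("name"): meta.get("content")
        for meta in root.findall("ncx:head/ncx:meta", NCX_NS)
    }
    entries = list(walk(root.find("ncx:navMap", NCX_NS)))
    return entries, int(metas["dtb:depth"])


if __name__ == "__main__":
    entries, declared_depth = table_of_contents(NCX)
    for level, label, src in entries:
        print("  " * (level - 1) + f"{label} -> {src}")
    print("declared depth matches:", declared_depth == max(lvl for lvl, _, _ in entries))
```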

OEBPS Container Format 1.0

An EPUB file is a group of files conforming to the OPS/OPF standards that is wrapped in a ZIP file.[3] The OCF specifies how these files should be organized in the ZIP, and defines two additional files that must be included.

The mimetype file must be a text document in ASCII and must contain the string application/epub+zip. It must also be uncompressed, unencrypted, and the first file in the ZIP archive. The purpose of this file is to provide a more reliable way for applications to identify the mimetype of the file than just the .epub extension.[11]
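These requirements map directly onto options of common ZIP libraries. The sketch below uses Python's standard zipfile module to write a mimetype entry first and uncompressed, and then to check an archive for the same properties. The function names are hypothetical, the container.xml content is a placeholder rather than a valid file, and the check inspects the order of entries in the ZIP directory, which is an approximation of "first file in the archive".

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def write_minimal_container(fileobj) -> None:
    """Write a ZIP whose first entry is an uncompressed mimetype file."""
    with zipfile.ZipFile(fileobj, "w") as zf:
        # Must come first and be stored (no compression), with the exact string.
        zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # Later entries may be compressed; this content is only a placeholder.
        zf.writestr(
            "META-INF/container.xml",
            "<container/>",
            compress_type=zipfile.ZIP_DEFLATED,
        )


def has_valid_mimetype(fileobj) -> bool:
    """Check that the first entry is an unencrypted, stored mimetype file."""
    with zipfile.ZipFile(fileobj) as zf:
        entries = zf.infolist()
        if not entries:
            return False
        first = entries[0]
        return (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and not (first.flag_bits & 0x1)  # bit 0 marks an encrypted entry
            and zf.read(first) == EPUB_MIMETYPE
        )


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_minimal_container(buffer)
    print("mimetype entry valid:", has_valid_mimetype(buffer))
```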

Also, there must be a folder named META-INF which contains the required file container.xml. This XML file points to the file defining the contents of the book. This will be the .opf file, though additional alternative rootfile elements are allowed.[11]

An example file structure:

--ZIP Container--
mimetype
META-INF/
  container.xml
OPS/
  book.opf
  book.ncx
  chapter1.xhtml
  ch1-pic.png
  css/
    style.css
    myfont.otf

An example container.xml, given the above file structure:

<?xml version="1.0" encoding="UTF-8" ?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OPS/book.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
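A reading system typically starts from container.xml to locate the .opf file. The Python sketch below shows one way to do that with the standard library; the function name find_opf_path is hypothetical, and the sketch simply returns the first rootfile with the OPF media type.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_MEDIA_TYPE = "application/oebps-package+xml"

CONTAINER_XML = b"""<?xml version="1.0" encoding="UTF-8" ?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OPS/book.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def find_opf_path(container_bytes: bytes):
    """Return the full-path of the first OPF rootfile, or None if there is none."""
    root = ET.fromstring(container_bytes)
    for rootfile in root.findall("c:rootfiles/c:rootfile", CONTAINER_NS):
        if rootfile.get("media-type") == OPF_MEDIA_TYPE:
            return rootfile.get("full-path")
    return None


if __name__ == "__main__":
    print(find_opf_path(CONTAINER_XML))
```

The returned path is relative to the root of the ZIP container, so it can be passed directly to a ZIP library to read the package file.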

Digital Rights Management

An EPUB file can optionally contain DRM as an additional layer, but it is not required by the specifications.[13] In addition, the specification does not name any particular DRM system to use, so publishers can choose a DRM scheme to their liking. However, future versions of EPUB (specifically OCF) may specify a format for DRM.[11]

When present, DRMed EPUB files must contain a file called rights.xml within the META-INF directory at the root level of the ZIP container.[11]
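Given that rule, a tool can look for the rights file as a first hint that an EPUB is DRM-protected. The following Python sketch does only that; the function name has_rights_file is hypothetical, and the check says nothing about which DRM scheme is in use.

```python
import io
import zipfile

RIGHTS_PATH = "META-INF/rights.xml"


def has_rights_file(fileobj) -> bool:
    """Return True if the ZIP container has META-INF/rights.xml at its root level."""
    with zipfile.ZipFile(fileobj) as zf:
        return RIGHTS_PATH in zf.namelist()


if __name__ == "__main__":
    # Build a tiny in-memory archive without a rights file to exercise the check.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
    print("rights.xml present:", has_rights_file(buffer))
```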

Validation

An open source tool called epubcheck validates EPUB files, detecting errors in the structural markup (OPS, OPF, OCF) as well as in the XHTML and image files. The tool can be run from the command line or used as a library in web applications and other software. A large part of the original work on the tool was done at Adobe Systems.[14]

Criticism

One criticism of EPUB is that, while good for text-centric books, it may be unsuitable for publications which require precise layout or specialized formatting. One example of such a publication is a comic book.[15]

The EPUB specification does not enforce or suggest a particular DRM scheme. This could affect the level of support for various DRM systems on devices and the portability of purchased e-books. Consequently, such DRM incompatibility may prove to segment the EPUB format along the lines of DRM systems, negating the advantages of a single standard format and confusing the consumer.[16][17][18][19][20][21]

Another criticism of EPUB concerns the specification's lack of detail on linking into, between, or within EPUB books, as well as its lack of a specification for annotation. Such linking is hindered by the use of a ZIP file as the container for EPUB. Furthermore, it is unclear whether it would be better to link by using EPUB's internal structural markup (the OPF specification mentioned above) or directly to files through the ZIP's file structure.[22] The lack of a standardized way to annotate EPUB books could make annotations difficult to share and transfer, and could therefore limit the usefulness of EPUB, particularly in educational settings, because it cannot provide a level of interactivity comparable to the web.[23]

Software

Software reading systems

Software that reads and displays EPUB files is called a reading system. The specification defines an EPUB reading system as:

“A combination of hardware and/or software that accepts OPS Publications and makes them available to consumers of content. Great variety is possible in the architecture of Reading Systems. A Reading System may be implemented entirely on one device, or it may be split among several computers....”[3]

Reading Systems and Software[3]
Software | Platform | DRM formats supported | Notes
Adobe Digital Editions | Windows, Mac OS X | Adobe Content Server |
Aldiko | Android | ? |
BookGlutton | Web | ? | Free, online ePub reader with a focus on the social aspects of reading.
Bookworm | Web | ? | Free, open source, online ePub reader.
Calibre | Windows, Mac OS X, Linux | ? | More often used for library management, conversion, and transferring to devices than reading.
EPUBReader | Firefox add-on (Windows, Mac OS X, Linux) | ? | Free Firefox addon, with which you can read ePub-files in Firefox.
FBReader | Windows, Linux, PDAs | ? | Incomplete ePub support.[citation needed]
FBReaderJ | Android | ? | Open source
Freda | Windows Mobile | None |
i2Reader | Apple iPhone | ? |
iBooks | Apple iPad, Apple iPhone, Apple iPod Touch | FairPlay[24] |
Lexcycle Stanza | iPad, iPhone, Windows, Mac OS X | ? |
Lucidor | Windows, Mac OS X, Linux | ? |
Mobipocket | Windows, BlackBerry, Symbian, Windows Mobile | ? |
Okular | Linux | ? |
Openberg Lector | Firefox add-on | ? | Openberg
Ouiivo Reader | Apple iPhone | ? |
Talking Clipboard | Windows | ? | Text-to-speech software, that can read ebooks.
WordPlayer | Android | ? |
eBook Reader | Opera widget | None | eBook Reader is not available for Opera for mobiles or TV
URead | Windows | ? | Free Universal Reader, text-to-speech, a large free e-library.

Editing systems

Creation Software
Software | Platform | Notes
Adobe InDesign | Windows, Mac OS X | Commercial license.
Atlantis Word Processor | Windows, Portable app | Converts any document to EPUB; supports multilevel TOCs, font embedding, and batch conversion. Shareware.
BookGlutton Converter | Web | Conversion tool
eBooksWriter | Windows | Can also produce MobiPocket files. Commercial license.
eCub | Windows, Mac OS X, Linux, FreeBSD, Solaris, Portable app | Non-encrypted files only, can also produce mobi. Free, proprietary license.
eLML | Windows, Mac OS X, Linux | The eLesson Markup Language is a platform-independent XML-based open source framework to create eLearning content. It supports various output formats like SCORM, HTML, PDF and also eBooks based on the ePub format.
ePub Bud | Web | Free ePub publishing and distribution social networking. Users either upload existing formats or use online WYSIWYG editor to create DRM-free ebooks in the EPUB format. For children's books especially.
ePubExport (Mediawiki extension) | Web | Experimental Mediawiki extension to export wiki pages in EPUB format.
Feedbooks | Web | Free cloud service for downloading public domain works and for self-publishing
iStudio Publisher | Mac OS X | Desktop publishing and page layout application. Commercial license.
Jutoh | Windows, Mac OS X, Linux, FreeBSD, Solaris, Portable app | Non-encrypted files only, imports from ODT, Epub, HTML and text, can also produce Mobipocket, ODT and text. Shareware.
Sigil | Windows, Linux, Mac OS X | Free, Open source under GPLv3. Go to Sigil at Google Project page. Currently the only application that can also open and edit EPUB books, instead of just converting from other formats to EPUB.

Hardware reading systems

The boundary between hardware and software is not clear-cut. Some of these devices are dedicated to e-book tasks, while others are platforms that include e-book readers or can have them added. See Comparison of e-book readers for details of dedicated devices (not all support EPUB).

References

  1. ^ application/epub+zip has not been registered with IANA as of June 2010.
  2. ^ a b "Specifications". IDPF.
  3. ^ a b c d Conboy, Garth (May 11, 2009). "EPUB 101" (PDF). IDPF. eBook Technologies.
  4. ^ IDPF (October 15, 2007). "OPS 2.0 Elevated to Official IDPF Standard". IDPF.
  5. ^ "IDPF Launches EPUB Standards Maintenance Work". IDPF. August 16, 2009.
  6. ^ "Charter for EPUB Standards Maintenance WG". IDPF. August 12, 2009.
  7. ^ a b "Draft Charter for revision to EPUB Standard for IDPF Comment". IDPF. April 6, 2010.
  8. ^ "EPUB 2.1 Working Group Charter – DRAFT 0.11". IDPF. May 7, 2010.
  9. ^ a b c d e f IDPF (September 11, 2007). "Open Publication Structure (OPS) 2.0 - Recommended Specification". IDPF.
  10. ^ a b c d e f g h IDPF (September 11, 2007). "Open Packaging Format (OPF) 2.0 - Recommended Specification". IDPF.
  11. ^ a b c d e IDPF (September 11, 2006). "OEBPS Container Format (OCF) 1.0 - Recommended Specification". IDPF.
  12. ^ a b "Specifications for the Digital Talking Book". April 21, 2005.
  13. ^ IDPF (November 20, 2006). "IDPF's Digital Book Standards FAQs". IDPF.
  14. ^ "epubcheck: Validation tool for Epub". Google Code. Retrieved January 29, 2010.
  15. ^ Rothman, David (July 27, 2008). "The ePub torture test: Starring 'Three Shadows,' a graphic novel". TeleRead: Bring the E-Books Home.
  16. ^ Gelles, David (January 29, 2010). "Walls close in on e-book garden". Financial Times.
  17. ^ Rothman, David (August 13, 2009). "Adobe-DRMed ePub isn't 'open': Why the New York Times urgently needs to clarify its Sony eBook Store article". TeleRead: Bring the E-Books Home.
  18. ^ Biba, Paul (December 21, 2009). "Does the Nook use its own incompatible DRM scheme?". TeleRead: Bring the E-Books Home.
  19. ^ Biba, Paul (January 28, 2010). "iPad adds to the DRM mess? Apple ebook DRM exclusive to Apple hardware". TeleRead: Bring the E-Books Home.
  20. ^ Kendrick, James (January 28, 2010). "Who Really Needs an iPad?". jkOnTheRun.
  21. ^ Dickson, Dave (January 27, 2010). "EPUB, iPad and Content Interoperability". Digital Editions.
  22. ^ "Links, pointers, bookmarks, highlights: How should .epub do it?". FrontMatters. BookGlutton. March 29, 2008.
  23. ^ Rothman, David (November 5, 2007). "'Social annotation and the marketplace of ideas': Time for an IDPF annotation standard for books and other e-pubs!". TeleRead: Bring the E-Books Home.
  24. ^ Pham, Alex (February 15, 2010). "Apple to wrap digital books in FairPlay copy protection". Los Angeles Times.