A note to remind myself of what I have learned about EPUB files. It does not cover every detail, only the parts I was not familiar with. For the original source, please refer to https://zhuanlan.zhihu.com/p/29954562
XXX.epub
|  mimetype
|- META-INF
|    container.xml
|- OEBPS
|    content.opf
|    toc.ncx
|    |- Audio
|    |    xxx.mp3
|    |- Fonts
|    |    xxx.ttf
|    |- Images
|    |    xxx.png
|    |- Styles
|    |    Stylesheet.css
|    |- Text
|    |    xxx.html
|    |- Video
|    |    xxx.mp4
Both mimetype and META-INF are required.
mimetype stands for MIME type, now officially called media type.
A media type (formerly known as MIME type or content type) is a two-part identifier for file formats and format contents transmitted on the Internet. The Internet Assigned Numbers Authority (IANA) is the official authority for the standardization and publication of these classifications. Media types were originally defined in Request for Comments 2045 in November 1996 as part of the MIME (Multipurpose Internet Mail Extensions) specification, to denote the type of email message content and attachments.
The mimetype file tells the reading system that the archive is in EPUB format. Its content should be exactly application/epub+zip.
Here is a reference for different available mimetypes: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Complete_list_of_MIME_types
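To make the mimetype rule concrete, here is a minimal sketch using Python's standard zipfile module. Besides the content application/epub+zip, the EPUB container format also expects mimetype to be the first entry in the ZIP archive and to be stored without compression. The function names build_skeleton and has_valid_mimetype are my own, and the container.xml body is only a placeholder, not a valid file.

```python
import io
import zipfile

EPUB_MIMETYPE = "application/epub+zip"


def build_skeleton() -> bytes:
    """Build an in-memory archive whose first entry is an uncompressed mimetype."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # mimetype goes in first and is stored (not deflated).
        zf.writestr(
            zipfile.ZipInfo("mimetype"),
            EPUB_MIMETYPE,
            compress_type=zipfile.ZIP_STORED,
        )
        # Placeholder only; a real container.xml needs proper content.
        zf.writestr("META-INF/container.xml", "<container/>")
    return buf.getvalue()


def has_valid_mimetype(data: bytes) -> bool:
    """Check that the first entry is an uncompressed, correct mimetype file."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if not infos:
            return False
        first = infos[0]
        return (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and zf.read(first) == EPUB_MIMETYPE.encode("ascii")
        )


print(has_valid_mimetype(build_skeleton()))  # True
```

The check reads the entry as bytes and compares it exactly, so a trailing newline in the mimetype file would be reported as invalid.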
container.xml directs the parser to the package file, here content.opf. Every EPUB must have a package file (conventionally named content.opf, although the actual path is whatever container.xml points to). It specifies everything needed for the book, including all content files, related resources, metadata, navigation information, reading order and so on.
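As a rough illustration of how a parser follows container.xml to the package file, the sketch below reads the full-path attribute of the rootfile element with Python's xml.etree.ElementTree. The sample XML and the function name find_package_path are assumptions for illustration; the path OEBPS/content.opf matches the layout shown earlier in this note.

```python
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical sample matching the directory layout in this note.
SAMPLE_CONTAINER = b"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
           xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def find_package_path(container_xml: bytes) -> str:
    """Return the path of the package (.opf) file declared in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile element")
    return rootfile.attrib["full-path"]


print(find_package_path(SAMPLE_CONTAINER))  # OEBPS/content.opf
```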
As mentioned before, OPF is short for Open Packaging Format; the package file is essentially an XML file. It normally contains 5 parts:
- package element: the outermost container, which holds all other elements in the file. It has an important attribute, unique-identifier (for example unique-identifier="book-id"), whose value is the id of the identifier element in the metadata that identifies the book.
- metadata element: as the name suggests, it stores metadata, including:
  - language
  - title
  - author
  - date: with options like creation, modification and publication
  - identifier: could be a UUID
  - subject
  - description
  - contributor
  - source
  - rights
- manifest element: provides a complete list of the source files needed for the EPUB. Each file is listed in an item element, and the order of the item elements does not matter. Each item has three attributes: id (a unique identifier for the item, often derived from the file name), href (the path of the file) and media-type (its MIME type).
- spine element: defines the reading order of the EPUB. Each itemref in the spine points to a manifest item through its idref attribute.
- guide element: provides semantic information that marks structural parts of the book, such as the cover or the foreword.
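The parts of the package file described above can be tied together with a short sketch: look up the book identifier through unique-identifier, then resolve the spine's reading order against the manifest. The sample OPF, the UUID in it, and the function names book_identifier and reading_order are made up for illustration; only xml.etree.ElementTree from the standard library is used.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical package file; manifest order deliberately differs from spine order.
SAMPLE_OPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch2" href="Text/ch2.html" media-type="application/xhtml+xml"/>
    <item id="ch1" href="Text/ch1.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
  <guide>
    <reference type="cover" title="Cover" href="Text/ch1.html"/>
  </guide>
</package>
"""


def book_identifier(opf_xml: bytes):
    """Return the text of the dc:identifier that unique-identifier points to."""
    root = ET.fromstring(opf_xml)
    uid = root.attrib["unique-identifier"]
    for el in root.findall("opf:metadata/dc:identifier", NS):
        if el.attrib.get("id") == uid:
            return el.text
    return None


def reading_order(opf_xml: bytes) -> list:
    """Return the hrefs of the content files in spine (reading) order."""
    root = ET.fromstring(opf_xml)
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    return [
        manifest[ref.attrib["idref"]]
        for ref in root.findall("opf:spine/opf:itemref", NS)
    ]


print(book_identifier(SAMPLE_OPF))
print(reading_order(SAMPLE_OPF))  # ['Text/ch1.html', 'Text/ch2.html']
```

Note that ch2 is listed before ch1 in the manifest, yet the result follows the spine, which is the point of keeping the two elements separate.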
In EPUB 3, toc.ncx is superseded by the navigation document (commonly named nav.xhtml); for compatibility with older EPUB 2 reading systems we can still include toc.ncx.
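A small sketch of how the two navigation files can be told apart in the manifest, assuming the usual conventions: the EPUB 3 navigation document is the item whose properties attribute contains nav, while the NCX is the item with media type application/x-dtbncx+xml. The sample manifest and the function name navigation_files are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical EPUB 3 package that keeps toc.ncx for compatibility.
SAMPLE_PACKAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="book-id">
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml"
          properties="nav"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="Text/ch1.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
  </spine>
</package>
"""


def navigation_files(opf_xml: bytes) -> dict:
    """Return the hrefs of the nav document and the NCX, or None if absent."""
    root = ET.fromstring(opf_xml)
    found = {"nav": None, "ncx": None}
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        # properties is a space-separated list, so split before testing.
        if "nav" in item.attrib.get("properties", "").split():
            found["nav"] = item.attrib["href"]
        elif item.attrib.get("media-type") == "application/x-dtbncx+xml":
            found["ncx"] = item.attrib["href"]
    return found


print(navigation_files(SAMPLE_PACKAGE))  # {'nav': 'nav.xhtml', 'ncx': 'toc.ncx'}
```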