determining the EPUB version of an ebook

EPUB testing discussion for the SDPP-D Grant project 2018-19

Post by Heidi » Thu Oct 25, 2018 2:07 pm
Good afternoon, everyone. I hope all of you are enjoying a great week.
EPUB2 and EPUB3 support differing feature sets. EPUB3 is more up-to-date, offering full-fledged ARIA support and more modern HTML5 markup. This article outlines critical differences between EPUB versions 2 and 3.
When recommending accessibility enhancements that could improve an ebook, it's important to know what EPUB version you're working with and what accessibility features it supports. In an ideal world, all new ebooks would be in the EPUB3 format. However, NNELS still receives a lot of new material published using EPUB2.
Determining what EPUB version an ebook is encoded with is not always straightforward. An EPUB document's property sheet does not contain this information. Sometimes, unzipping an EPUB file and looking at the underlying HTML syntax still doesn't produce enough information. The Mantano ebook reader website contains a great support article with detailed instructions to help you with this task.
I hope these resources will be of help to you as you work. If you know of an easier way to determine what version an EPUB book uses, please let us know!
Heidi
Posts: 9
Joined: Wed Oct 03, 2018 12:29 pm

Re: determining the EPUB version of an ebook

Post by Danny » Mon Nov 05, 2018 10:22 am
Hi Heidi,

This is fascinating! I was just poking around on the same topic, and was extremely surprised that it isn't easier to tell which version a given ePub has been produced in. Thanks for clarifying that it is truly more difficult than checking a particular file or tag.

The question has been raised about converting ePub 2 to ePub 3. In researching it, I found some background info that quickly outlines the differences on StackOverflow.

"EPUB3 adds, changes or improves:
• Added HTML5 support
• SVG documents can now appear in the spine in EPUB 3
• Support for MathML
• Semantic Inflection
• Content switching has been simplified by having its processing model defined so that it does not require document preprocessing
• Navigation
• Linking
• Scripting
• Triggers
• Bindings
• Added modules from CSS3
• EPUB 3 requires Reading Systems to support the OpenType and WOFF font formats for embedded fonts in conjunction with the CSS @font-face rules.
• Audio and video
• Media overlays
• Publication Metadata and Identity
• Resource Metadata
• Text-to-speech
• Manifest Fallbacks
• Remote Resources
• Whitespace in MIMETYPE file
• Disallowed characters in OCF list has been extended
Things that have been removed:
• DTBook
• Out-of-Line XML Islands
• Tours
• Filesystem Container
• Guide
• NCX
• 2.0.1 meta element

From a production standpoint, there aren't really that many differences at a bare-bones level--there's the switch from toc.ncx to toc.xhtml to provide navigational information, a few new metadata elements required in the content.opf file (like dcterms:modified, which indicates when the file was last modified), and a new header to use. While you can leave out the old toc.ncx and still get a valid epub 3.0 file, for backwards compatibility it's best practice to include both toc files--epub 3 readers will ignore the old one, and epub 2 readers will ignore the new one.
As far as bells and whistles go, there are some neat new things: HTML 5 support, including things like <canvas> and Javascript support, <video>, and <audio>. Fixed Layout is a pretty big deal, of course. MathML is great if you're doing anything that involves equations, and you can use epub:type to include semantic information that reading systems can do some neat things with (like the popup footnotes in iBooks).
Popularity of the formats is a bit of a non-issue; epub 3.0 files are viewable on every epub 2.0.1 reading system and device that I've tried them on, and I've been making epub 3.0 files for over two years at this point. The new additions that epub 3.0 bring don't work on the older devices, of course, but the books themselves don't break and are perfectly readable, even if the videos and javascript don't show up. Older epub files are, as you would expect, readable on the new reading systems out there. There's no reason not to produce epub 3.0 files--doing so will only add functionality to your books as more reading systems catch up to epub 3."

So, it would seem that, if the book's archive contains a toc.xhtml, it's ePub 3. That's one thing to look for, anyway!

Now, getting back to the conversion question. I'm not sure if it would really be a very large service improvement to convert ePub 2 to ePub 3 with an automated process. It doesn't look like it would improve compatibility, and I wouldn't expect an automated engine to add features that take advantage of the new ePub 3 styles. But I'm fairly new to all this. What do all of you think? Are there enough benefits to make converting ePub 2 to ePub 3 worthwhile? I know there are lots of new features offered in version 3 that are terrific, but I doubt a computer program could detect and make use of them. If benefits exist, do you know of a program that can do this in an automated way? If no program is currently available, do you think one could be written to accomplish this task?
Danny
Posts: 31
Joined: Thu Oct 04, 2018 9:17 am

Re: determining the EPUB version of an ebook

Post by Danny » Mon Nov 05, 2018 11:35 am
Hi again,

Well, surprise surprise, even this appears to be non-standard. I just cracked open "Empire and Environment", and found only toc.ncx, no toc.xhtml. A clear indication of a version 2 ePub, right? However, just to be sure, I then tried Heidi's trick of taking a quick peek in content.opf, and was stunned to see the package version of 3.0.

Huh?

Okay, so it's a V3 ePub with no toc.xhtml. No wonder my poor tablet can't navigate by page number!!! That is supposed to be how it works, right? I still haven't found a book my tablet can navigate by page marker, so I'm just guessing on that.

Anyway, long story short, Heidi's trick (content.opf) seems to be a lot better than mine (toc.xhtml)!
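
For anyone who would rather script the content.opf check than unzip each book by hand, here is a minimal Python sketch. It assumes the book is a standard zipped EPUB, reads META-INF/container.xml to find the package document (which is not always named content.opf), and returns the version attribute of its package element. The function name epub_version is just something made up for this example, and the sketch does no validation beyond that.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def epub_version(epub_file):
    """Return the version attribute of the OPF <package> element.

    epub_file may be a path or a binary file-like object.
    The result is a string such as "2.0" or "3.0", or None if the
    attribute is missing. (Hypothetical helper for illustration only.)
    """
    with zipfile.ZipFile(epub_file) as zf:
        # container.xml points at the package document (often content.opf).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("no rootfile entry in META-INF/container.xml")
        opf_path = rootfile.get("full-path")

        # The root element of the package document carries the version.
        package = ET.fromstring(zf.read(opf_path))
        return package.get("version")
```

Calling epub_version("some-book.epub") on a book like the one above should give back "3.0" even when the archive has no toc.xhtml, since it only looks at the package version.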
Danny
Posts: 31
Joined: Thu Oct 04, 2018 9:17 am

Re: determining the EPUB version of an ebook

Post by Karoline » Mon Nov 05, 2018 11:59 am
Thank you Heidi for posting so much helpful information. I was not sure how to find out these details. I will now go try it out.
Karoline
Posts: 52
Joined: Sun Feb 04, 2018 9:31 am

Re: determining the EPUB version of an ebook

Post by Danny » Sun Nov 11, 2018 9:12 pm
Hey team,

I finally came across a book with page navigation that actually worked in Dolphin EasyReader! There was a page list included after the TOC section of navigation in the toc.xhtml file. I was so excited to find this actually done correctly for a change.
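
If you want to check for a page list without opening the navigation file by hand, something like the following Python sketch might help. It assumes a standard zipped EPUB 3, finds the manifest item marked with properties="nav" (the file is not always called toc.xhtml), and lists the epub:type of each nav element in it. The name nav_types is made up for this example, and the sketch assumes the href is a plain relative path and that the navigation file parses as well-formed XML.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}
XHTML_NAV = "{http://www.w3.org/1999/xhtml}nav"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def nav_types(epub_file):
    """Return the epub:type values found on <nav> elements in the
    EPUB 3 navigation document, or [] if the book has none.
    (Hypothetical helper for illustration only.)
    """
    with zipfile.ZipFile(epub_file) as zf:
        # Locate the package document via container.xml.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        if rootfile is None:
            return []
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))

        # The navigation document is the manifest item whose
        # properties attribute includes "nav".
        for item in package.findall("opf:manifest/opf:item", NS):
            if "nav" in (item.get("properties") or "").split():
                # hrefs are relative to the package document's folder.
                nav_path = posixpath.normpath(
                    posixpath.join(posixpath.dirname(opf_path), item.get("href"))
                )
                nav_doc = ET.fromstring(zf.read(nav_path))
                types = []
                for nav in nav_doc.iter(XHTML_NAV):
                    types.extend((nav.get(EPUB_TYPE) or "").split())
                return types
    return []
```

If "page-list" shows up in the result of nav_types("some-book.epub"), the book at least declares a page list, though whether a given reading app uses it is a separate question.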

Just thought it might help someone else who is wondering why something does - or does not - work!
Danny
Posts: 31
Joined: Thu Oct 04, 2018 9:17 am
