ePub windows and widgets – a proposal

Ebooks have one major advantage over other forms of interactive media: They are extremely late to the game. One of the benefits of arriving after everybody else is that you get the chance to avoid their mistakes and learn from their experiments.

Interactivity in ebooks today sucks. Implementations are non-standard and diverge from their web-based parallels in both subtle and substantial ways. They are buggier than Internet Explorer 6 on a pirated, compromised Windows 98 machine. Javascript is inconsistently supported in the few cases where it is supported at all.

Many of those involved in ebook development today don’t come from web development but are often print-oriented people who are doing an admirable job of expanding their skills.

What they can’t pick up from howto pages, manuals, and tutorials are the battle scars that come from having to deal with insane browser developers. This is the only reason I can imagine why they aren’t terrified of some of the current developments in ebook reading systems and the fragmentation of the epub market.

The horrors a vendor can inflict are enormous

No support is sometimes superior to buggy support. The effort of having to deal with and hack around a browser’s buggy implementation is far greater than using CSS’s cascade or javascript’s feature detection to gracefully degrade in the absence of a feature. Dealing with iBooks’ buggy support of javascript, which changes and moves around every release, is much more expensive and time-consuming than implementing the same features for a web browser, especially when you consider iBooks’ non-existent developer tools (such as a web inspector).

Fragmentation in the features ebook reading systems support, and in how they implement those features, will reintroduce a new generation of developers to the nightmare the web world went through during the browser war between IE and Netscape. When the only browsers in wide use were two very broken browsers, IE and Netscape 4, and only a tiny minority used sane, standards-compliant browsers, only the insane enjoyed dealing with the real-world practice of web development.

(The extra work involved was often called ‘the browser compatibility tax’. As Tantek Çelik says, “web standards are an agreement between authors and browsers”. They aren’t a private agreement between browser vendors.)

In a world where ebooks, based on web tech as they are, don’t support the solutions the web community has hammered together for the problems they share with the web, sanity is in short supply.


The differences between ebooks and the web, in terms of how the same basic tech is applied in the two different contexts, fall into roughly two groups: rendering and javascript.

The rendering differences, ostensibly due to the use of a reflowable, paginated, view of the text, are in fact often arbitrary and damaging. The limitations of reflowable ebooks are wholly unjustified. (More on that later.)

The decision to support javascript, however, is one that has consequences and it’s understandable that some vendors have decided to avoid those consequences. Allowing javascript means that the vendor needs to deal with security issues and implications that match those of any general purpose computing platform. When ebooks become apps, maintaining an ebook ecosystem is maintaining an application platform. A javascript platform will also, inevitably, require debugging tools that at least match those of most web browsers. The toolkits that javascript developers need to maintain a basic hold on their sanity are quite clearly beyond the capabilities of vendors who can’t even publicly provide basic documentation of their software’s capabilities. A vendor who can’t exhaustively document the features they support, in detail, version by version, change by change, has no business even attempting to implement javascript support.

Consider the differences between the anaemic and vague iBookstore Asset Guide and the Mozilla Developer Network or even Apple’s own Safari Dev Center. The Asset Guide – secret as it is – is an afterthought that lists Apple’s own needs and requirements. It is not documentation.

The only vendor I know of who provides public documentation, with attempts to list all supported HTML and CSS features, is Amazon. Their documentation needs much improvement (see the MDN or SDC for how it should be done), but the rest don’t even seem to be trying. [1]

Javascript is the web’s general purpose tool for interactivity. There is no simple declarative markup for creating dynamic interactivity because there hasn’t been the need. (Actually, CSS does have a few, if a bit limited. More on that later.) In its absence we lack the basic tools to implement dynamic ebooks. This means we will have to reconstruct some of the basic primitives using the declarative and semantic tools that are available.

To do that we first need to figure out what those primitives are. And to do that we need to examine the conceptual mechanics that underlie some of the technology we are using.

And so, the mechanics

Scrolling layout. The default rendering in all web browsers today. It supports every feature available in web tech because this is the context where they were invented. Any ebook vendor who supports a scrolling layout in some context (say, a popup) but then cripples it is insane. Beyond the obvious case of javascript, there is no reason to remove features from a scrolling layout.

Reflowable layout. The default rendering of most ebooks today. For all intents and purposes this should be nothing more complicated than a paginated version of the more standard scrolling layout. Instead, it is crippled. Even the most advanced implementations today lack a basic feature like the ability to set full-bleed backgrounds. Neither assigning a background to the HTML tag nor assigning one to the BODY tag does the trick; there’s always an ugly white margin. (See my note on full-bleed backgrounds for more detail on this issue.) Some even require an obscure, non-standard XML file to be included to turn on basic features such as the designer’s ability to set fonts. Others override the ebook’s stylesheet unless the reader specifically chooses to load it.

I have no idea why vendors think non-standard trickery like this is acceptable. Not supporting a feature at all is one thing. There we have the hope that it will eventually be implemented. But disregarding the ebook designer’s decisions by default is arrogant and hostile. Either you support CSS or you don’t. Don’t be coy and require developers to jump through hoops.

Fixed layout. Apple’s invention, copied by others. Essentially, it’s a way of using web technology to design an image of a fixed size. Apple’s method uses a combination of the insane and the standard. It requires an obscure XML file to be included but relies on <meta name="viewport" content="width=something, height=something"/> in each page’s header. The viewport meta tag is a de facto standard, and now well supported across the web. Amazon’s implementation, unfortunately, doesn’t use the viewport at all (it should).

Media queries. The web has been dealing with varying devices and browser capabilities for a long time but it’s only in the last few years that this problem area has been properly addressed. Media queries enable the designer to deliver different CSS statements depending on what the browser (or, in this case, the ebook reader) supports. The ability to have the design of an ebook adapt itself depending on whether it’s being read on a phone, tablet, desktop device, or eink reader is obviously a vital capability. Media queries let us do that based on colour depth, screen size, orientation, and, as in Amazon’s KF8, file format.

We need media queries. Full support, please. We’d be ever so grateful.
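
As a rough sketch of what this looks like in practice, a stylesheet embedded in an XHTML file might combine standard queries with Amazon’s format-specific ones. The class name, the 480px breakpoint, and the style values are made up for illustration; amzn-kf8 and amzn-mobi are the media types Amazon documents for telling KF8 apart from the older Mobi format.

```xml
<style type="text/css">
  /* Baseline rule that every reading system gets. */
  .figure { width: 100%; }

  /* Narrow screens held upright, such as a phone (arbitrary example breakpoint). */
  @media screen and (max-width: 480px) and (orientation: portrait) {
    .figure { float: none; }
  }

  /* Monochrome displays, such as eink readers. */
  @media (monochrome) {
    .figure { border: 1px solid black; }
  }

  /* Amazon's file-format queries: KF8 versus the older Mobi format. */
  @media amzn-kf8 {
    .figure { float: right; width: 40%; }
  }
  @media amzn-mobi {
    .figure { float: none; }
  }
</style>
```

A reading system that ignores every query still gets the baseline rule, which is the whole point: the design degrades instead of breaking.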

OBJECT tag. For the longest time web developers had to deal with two different ways of embedding interactive objects in their web pages (usually either flash or some sort of video): The EMBED and the OBJECT tags. OBJECT is the one that was standardised. The principles of the OBJECT tag, the ones relevant to ebooks, are that you point at the object’s data using the data attribute and set the parameters using a series of PARAM elements. Any other child elements to the OBJECT tag are ignored if the browser supports the media type or used as a fallback if it doesn’t.
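
A minimal sketch of an OBJECT tag used the way the spec describes. The file names, the media type, and the parameter names are hypothetical; only the structure (data, type, PARAM children, fallback content) comes from the standard.

```xml
<object data="widgets/timeline.dat" type="application/x-example-widget"
        width="600" height="400">
  <!-- Configuration goes in PARAM elements, not in the tag's contents. -->
  <param name="start" value="1900"/>
  <param name="end" value="1950"/>
  <!-- Everything else is fallback: shown only if the media type is unsupported. -->
  <img src="images/timeline.png" alt="Timeline of events from 1900 to 1950"/>
</object>
```

A reading system that understands the media type renders the widget and ignores the image; one that doesn’t shows the image and ignores the rest.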

The way Apple uses the OBJECT tag to implement no-code widgets in iBooks 2.0 breaks this in several ways. The contents of the tag are clearly used as either data or as parameters and not ignored as they should be. It uses the HTML5 data-* attributes to configure the object instead of PARAM tags. Both render Apple’s format unusable in a cross-platform context. If you are going to implement widgets using the OBJECT tag (which, in and of itself, seems to be a good idea) you should at the very least use it according to the standard. Both standards, in fact. The HTML4 and HTML5 specs are in agreement on how the tag should behave.

The epub:type attribute. From the ePub3 spec: “The epub:type attribute inflects semantics on the element on which it appears”. It doesn’t change or add to the meaning of the element but it clarifies, specialises, or narrows the semantics inherent in the carrying element. Reading systems can do one of two things with an epub:type attribute. They can associate specialised behaviours with some of the terms that appear as the attribute’s values. Or, they can ignore it.

The implication of epub:type is that you can assign a role to an element and that way trigger behaviours in reading systems that understand it. This seems ideal as a method for indicating interactivity.

There are even precedents in the default epub:type vocabulary. The vocabulary, as defined, is full of print-oriented design features that have attained implicit meanings through repeated use: sidebar, footnote, rearnote, pagebreak, copyright-page, titlepage.
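
To make that concrete, here is a small sketch that uses two terms from the default vocabulary, noteref and footnote. The text and ids are invented; the epub namespace URI is the one the ePub3 spec defines.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Chapter 1</title></head>
  <body>
    <p>The claim is disputed.<a epub:type="noteref" href="#n1">1</a></p>
    <!-- A reading system may treat this as a popup note, or simply ignore the attribute. -->
    <aside epub:type="footnote" id="n1">
      <p>See the second edition for the counter-argument.</p>
    </aside>
  </body>
</html>
```

Either way the markup stays a plain link to a plain aside, so nothing breaks in a reading system that has never heard of the vocabulary.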

These are print conventions and print terms that only have connotations due to tradition.

One obvious solution for how to implement interactivity in the absence of javascript is to put together a vocabulary of interaction conventions that have connotations due to repeated practice in interactive projects, websites, applications, UI design, etc. I’m going to outline a suggestion for the start of such a vocabulary below.

By focusing on primitives that can be combined and layered we can enable intricate interactive projects at a fraction of the complexity that javascript requires.

Non-linear HTML files (linear="no"). For any of this to work, reading systems must provide authors with a way of including XHTML files in the ebook while keeping them out of the document’s reading flow.

If I want to include an extended example using an XHTML file that is only reachable by clicking on a link I include in the text, I must be able to do so. That means the XHTML file must not appear anywhere in the ebook’s main text, not before it, not after it, not interspersed with it. I must be allowed to keep it out of the NCX file’s or the EPUB Navigation Document’s main flow.

Since, according to the ePub 2 specification, all XHTML documents that are included and reachable in the ebook must be listed in the book’s <spine>, any reading system that hopes to support any involved interactivity must support the linear="no" attribute (which ePub3 retains).

Short version: Remove any file that has the linear attribute set to “no” from the ebook’s main flow. You just gotta.

Without support for non-linear chapters the main text of even a moderately ambitious ebook will be cluttered and appended with oodles of material that completely destroys the coherence of the work. By including ancillary material in the main body (or after it, which is just as ruinous) the reading system gives the reader an extremely false indication of the book’s true length.

Not supporting linear="no" is hostile towards the ebook developer and the reader.
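
For reference, a sketch of what this looks like in the package file. The ids and file names are placeholders and the metadata section is left out; the linear attribute on itemref is straight from the spec.

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <!-- metadata omitted for brevity -->
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="example" href="extended-example.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
    <!-- Listed in the spine, as required, but kept out of the main reading flow. -->
    <itemref idref="example" linear="no"/>
  </spine>
</package>
```

The extended example is then only reachable through a link in one of the chapters, which is exactly the behaviour the rest of this proposal depends on.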

The :target CSS pseudo-class. One of the most underrated features of CSS today is the :target pseudo-class. Quite simply, when you link to a specific part of an HTML document using a fragment identifier (#something) the :target pseudo-class allows you to style the target specifically.

It may not sound like much but, combined with things like display, opacity, and CSS Transitions, this lets you use CSS for a large part of the most common uses of javascript today such as tabbed interfaces, zooms, accordion interfaces, and much much more.

Any ebook reading system that doesn’t intend to implement javascript should seriously consider the :target pseudo-class as it addresses a large number of use cases without any of javascript’s drawbacks.
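
A bare-bones sketch of a tabbed interface built this way, without a line of javascript. The ids, class name, and text are placeholders.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>:target sketch</title>
    <style type="text/css">
      /* Panels stay hidden until one of them is the target of the current link. */
      .panel { display: none; }
      .panel:target { display: block; }
    </style>
  </head>
  <body>
    <p>
      <a href="#summary">Summary</a>
      <a href="#details">Details</a>
    </p>
    <div class="panel" id="summary"><p>The short version.</p></div>
    <div class="panel" id="details"><p>The long version.</p></div>
  </body>
</html>
```

Following either link shows its panel and hides the other. Swapping display for opacity, plus a transition on opacity, would give a fade instead of a hard switch, although the hidden panels would then still take up space unless they are positioned on top of each other.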

The HTML5 data-* attributes. One of the features that the HTML5 spec provides is custom data attributes. Any attribute that starts with ‘data-’ will be treated as private data storage. This is obviously useful for javascript programming as it means you can include all of the relevant data (for configuration, etc.) in the page itself.

Apple used these attributes on the <object> tag for configuration. Which is exactly where they aren’t needed. The main value of arbitrary data-* attributes is to add configuration data to elements with no standard facilities for configuration. The object tag has the <param> tags for configuration. Using private data attributes for configuring an object tag is an exercise in wilful obfuscation.

There are also interesting proposals at the W3C for leveraging data attributes to add variables to CSS.

I think these attributes can be useful when we need to attach behaviours to pre-existing elements.


Here are three basic patterns of interactivity that can be used as building blocks for more complex behaviours. They are primitives that are in frequent use in web and UI design and they have a long history and many precedents.

Explanatory window. This is a feature we see every day: a window that pops up over the main window, with a pointer to its original context, that explains or expands upon the item it points at. Google maps labels, Mac OS X’s and iOS’s dictionary lookups, tooltips, tutorials, wherever you find them, they have been a part of our computing lives since we first invented GUIs.

These windows are usually rather smaller than the main window. Their general design can vary a lot but the basic feature remains the same: A box that points at an element.

My suggestion is that our interactivity epub:type vocabulary includes ‘window:explanatory’ (the window prefix being an abbreviation for whatever URI that people end up settling on, possibly something PURL related). The basic idea is simple.

You would use the epub:type="window:explanatory" attribute on links (A tags) that are supposed to open explanatory windows. The file it links to would then be opened in exactly that window. Reading systems that don’t support the vocabulary would treat it as a simple link.

If the reading system properly supports media queries in the explanatory window context then we as authors don’t need to be able to dictate the window’s exact size. We could simply have the HTML file adapt its design according to the @media rules we’ve written.
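
Roughly, the markup might look like this. The prefix URI and file names are placeholders, since no such vocabulary exists yet; the epub:prefix attribute is the ePub3 mechanism for declaring additional vocabulary prefixes.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      epub:prefix="window: http://example.org/vocab/window#">
  <head><title>Chapter 2</title></head>
  <body>
    <p>
      The press used
      <!-- Opens in an explanatory window where supported; a plain link everywhere else. -->
      <a epub:type="window:explanatory" href="notes/movable-type.xhtml">movable type</a>
      from the start.
    </p>
  </body>
</html>
```

The linked file would be listed in the spine with linear="no" and would carry its own @media rules for whatever size of window it ends up in.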

Overlay window. Another basic interaction type, especially on the web, is the overlay. You click on a link and it blacks out the page and loads the target page or image in an overlay. This overlay can be full-screen – which it generally is when it’s done in native code – or it can be in the more web-style lightbox with a transparent background.

I think the exact details would be best left up to the ebook reading systems, provided they deliver on the intent and meaning the author has signified by attaching an epub:type="window:overlay" to a link.

Embedded window. A classic, used on almost every major website in existence, is an embedded window. Generally implemented as either an iframe (for HTML files) or object tags, the only thing we need here is for ebook reading systems to support the spec. There is no major reason to prevent a chapter in a book from embedding an HTML file in an iframe as long as it’s a file properly included in the epub.

The same principle applies to objects. It’s reasonable to look at the object tag for our implementations of interactive widgets.

You should be able to combine these windows, like using explanatory windows with an overlay window to create an interactive image (e.g. touch parts of an image to see it explained).
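
One way that combination could be marked up, as a sketch only. The file names, coordinates, and dimensions are invented, and the two fragments live in two different files.

```xml
<!-- In the chapter: open the annotated image in an overlay. -->
<a epub:type="window:overlay" href="figures/engine.xhtml">
  <img src="figures/engine-thumb.png" alt="Cutaway of the engine"/>
</a>

<!-- In figures/engine.xhtml (listed in the spine with linear="no"):
     each hotspot is an ordinary link that asks for an explanatory window. -->
<div style="position: relative; width: 800px; height: 600px;">
  <img src="engine.png" alt="Cutaway of the engine" width="800" height="600"/>
  <a epub:type="window:explanatory" href="piston.xhtml"
     style="position: absolute; left: 120px; top: 200px; width: 80px; height: 80px;">Piston</a>
  <a epub:type="window:explanatory" href="crankshaft.xhtml"
     style="position: absolute; left: 400px; top: 420px; width: 160px; height: 60px;">Crankshaft</a>
</div>
```

In a reading system that supports neither window type, this is still a thumbnail that links to a page with an image and two working links.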

Vendors should also provide a way for an embedded view, whether it’s an iframe or an object, to transition to a full screen view, something that’s currently done on the iPad with gestures.


Scrolled layout file. An HTML file that, once loaded in an overlay, explanatory window, or embedded, renders its contents in a scrolling layout like most web browsers do. While it may be problematic to allow authors to designate arbitrary files in the ebook’s flow to have a web-style layout, I don’t see the problem with designating non-linear files, loaded in one of the epub:type windows, as web layout. This could be done as simply as adding an epub:type="window:scroll" to the file’s meta name="viewport" tag.

Like so:

  <meta name="viewport" content="width=something, height=something" epub:type="window:scroll"/>

The idea being that the epub:type ‘inflects’ the semantics of exactly what type of viewport is being loaded. In this case the reading system would treat the viewport dimensions the same way browsers do, either ignore them, or use the width to scale the layout down to fit the width of the reading system’s window (depending on whichever one of the three basic window types it is).

Scrolling layout files are useful as they are more appropriate than the reflowable paginated views for quite a few contexts, such as any substantial supportive hypertext (a detailed glossary, for instance).

Fixed layout file. My suggestion is the same as with web layout files: add an epub:type attribute to the viewport meta tag to indicate that the file loaded has fixed dimensions.

Like so:

  <meta name="viewport" content="width=something, height=something" epub:type="window:fixed"/>

This file is then treated essentially like an image with fixed proportions, zoomed and scaled in the manner that the reading system treats all fixed layout XHTML files.

Vendors should also consider letting us mix them in with paginated files in the ebook’s flow. There’s no reason to stick to conventions from print. One chapter in an ebook can easily have different dimensions from another and we should have the capability to do so if it’s appropriate for the book.

One interesting possibility crops up when you consider embedding fixed layout HTML files. If the iframe is of a fixed size (dictated by its styles) then the reading system should let the reader pan and zoom (or scroll and zoom if not in a touch UI) the embedded fixed layout file much in the same way you would pan and zoom a map widget.

Paginated file. In the same way that a scrolled web layout is more appropriate for some contexts and fixed layout for others, a reflowable paginated layout should be an option for the designer.

Designated like so:

  <meta name="viewport" content="width=something, height=something" epub:type="window:paginated"/>

The idea with these basic window types is that you can combine them to create complex interactions. (See the “Breaking Google Maps down to interaction primitives” note for more details.)

Alternatively, if the file has no viewport tag, the epub:type attribute could be placed on the body tag.
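
For a file without a viewport tag, that might look like the following sketch. The prefix URI is a placeholder, since no such vocabulary exists yet, and the glossary content is invented.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      epub:prefix="window: http://example.org/vocab/window#">
  <head><title>Glossary</title></head>
  <!-- No viewport tag, so the layout type is declared on the body instead. -->
  <body epub:type="window:scroll">
    <dl>
      <dt>Reflowable</dt>
      <dd>Text that is re-paginated to fit the reader's screen and settings.</dd>
    </dl>
  </body>
</html>
```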


Solutions have to be considered along several dimensions.

One temptation with a specialised widget is to build it out in enough detail for it to solve the entirety of its potential problem area.

But complexity has diminishing returns, especially when a more general-purpose, if complex, tool already exists.

The web stack already has extensive, if redundant, technologies for animations. When creating a widget for galleries or slideshows we are solving problems that can be largely addressed by SVG+SMIL or a combination of CSS Animations and :target.

The only valid rationale for specialised widgets is that they make life easier for everybody involved.

That means simple to author.

And it means that it has to be relatively easy for reading system vendors to create implementations of those widgets that are richer and of a higher quality than anything that can be done in a more general purpose solution.

Simple. Easy. Beautiful.

My suggestion is for two of the widgets, slideshow and gallery, to be an experimental media type combined with a very bare-bones JSON file.

The JSON format is simple. You describe objects like so:

{ "property1":"value", "property2":"value"}

An object is enclosed in curly brackets and is a comma-separated list of colon-separated property and value statements. It has several standard value types: strings, numbers, objects, arrays, as well as boolean values (true, false).

It’s supported in too many languages to count.

The benefit of using a JSON file is twofold:

  1. Its object model is a much closer match to that of most programming languages, which makes implementations easier.
  2. It places an upper limit on complexity. A JSON file cannot easily accommodate the same intricate and complex structures XML can represent.

What we have so far is this: A simple file format that addresses a subset of the problem in a very clean way.

Slideshow. In this case, this should be nothing more than a set of fixed layout XHTML files with an optional transition. The transition property on each slide object defines how that slide exits and the next enters. The duration property defines the duration of that transition in seconds or milliseconds.

The file would look something like this:

    {
        "type": "slideshow",
        "title": "This slideshow has a title",
        "slides": [
            {"src":"example.html", "transition":"slideRight", "duration":"500ms"},
            {"src":"example2.html", "transition":"slideUp"},
            {"src":"example3.html", "transition":"fadeOutIn"},
            {"src":"example4.html", "transition":"crossfade"},
            {"src":"example5.html", "transition":"pushRight"}
        ]
    }

The URLs are relative to the JSON file, not the XHTML document.

The widget spec would define a set of standard transitions but reading systems could optionally implement more. If the transition is missing or unrecognised (specific to another reading system) the duration would apply to the default transition, which in my book should be a simple fade out of the current slide, followed by a fade in of the next.

The background of the slideshow is whatever background style the object tag has.

The slideshow title is optional.
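
Embedding the slideshow in a chapter could then be as simple as the sketch below. The media type is a stand-in for whatever experimental type ends up being agreed on, and the paths and styles are made up.

```xml
<!-- The background set here is the background of the slideshow. -->
<object data="slideshows/tour.json" type="application/x-example-slideshow+json"
        style="background: #000; width: 100%; height: 20em;">
  <!-- Fallback for reading systems that don't know the media type. -->
  <p><a href="slideshows/example.html">View the first slide</a></p>
</object>
```

Note that the fallback link has to spell out the slideshows/ directory because it is resolved against the XHTML document, whereas "example.html" inside the JSON file is resolved against the JSON file itself.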

Every slide is a fully interactive fixed layout XHTML file with something like this in its header:

  <meta name="viewport" content="width=something, height=something" epub:type="window:fixed"/>

The reading system should have a relatively free hand in what sort of UI to implement. No point in dictating any details as the needs of the various platforms differ.

Gallery. Just a set of images with optional captions and an optional title.

Like so:

    {
        "type": "gallery",
        "title": "This gallery has a title",
        "images": [
            {"src":"example.jpg", "caption":"A fancy caption"},
            {"src":"example2.jpg", "caption":"This is already repetitive"},
            {"src":"example3.jpg", "caption":"I'm sure the image is interesting"},
            {"src":"example4.jpg", "caption":"Or maybe not"},
            {"src":"example5.jpg", "caption":"I don't know"}
        ]
    }

Again, the URLs are relative to the JSON file, not the XHTML document.

And again, no point in specifying UIs. Whatever is best for each platform is best.

Quiz. A problem that can’t be solved by basic widgetery. Instead HTML5 rides to the rescue.

Any questionnaire or quiz needs to implement a form so my proposal is to attach validation and evaluation data to the form itself. The <form> tag would be marked with an epub:type="widget:quiz" and each input tag would be marked with the following data-* attributes:

  • data-answer is required and must contain the correct answer to the question. In the absence of pattern (HTML5’s standard validation attribute) or data-answer-contains attributes the value of data-answer is checked against the input tag’s value. If the values are exactly the same, the question is deemed to have been answered correctly.

  • data-answer-contains takes precedence over data-answer for checking correctness, if present. It contains a space-separated list of words that the input tag’s value must contain for the answer to be correct. No order is implied, so if the order of the words is different than it is in the data-answer-contains attribute it is still deemed correct. A partial match must be calculated against the weight given in the data-weight attribute for a partial score.

  • data-success contains the id of the element to display in case the reader gives a correct answer. The display state or visibility of the element is orthogonal to its use in the report since the reading system must provide their own report UI. The designer should be free to make the decision whether to display or hide the answer irrespective of the widget’s basic functionality.

  • data-failure contains the id of the element to display in case the reader gives a wrong answer.

  • data-weight is used to calculate the reader’s overall score.

The widget works as follows:

  1. The ebook developer includes a form anywhere in the book where HTML is allowed. It is structured using bog-standard HTML tags. Its input elements are marked up with the above attributes.

  2. The reader fills out a form and presses the submit button (which can be named anything, really, it’s up to the form’s designer).

  3. The reading system adds the combined weights of the inputs to get the full score used to calculate the reader’s percentage score. If an input doesn’t have a data-weight attribute, the default value of 1 is used instead.

  4. For every input, it checks the value submitted against the pattern, data-answer-contains, or data-answer attributes, in that order, using the first one that it encounters. For the pattern and data-answer attributes, if the values match, the reader’s score for that answer is the input’s full weight; if they mismatch, it is zero. For the data-answer-contains attribute, the reader’s score is a percentage of the input’s data-weight value, or of 1 if that attribute isn’t present.

  5. The reading system then adds together the reader’s score total and calculates their final score as a percentage of the full score.

  6. The final step is where the reading system presents the reader with a report. This report must list the reader’s score, how many questions they got wrong and right, and should present the reader with a list of correct answers and the elements referred to by the data-success and data-failure attributes.
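
Put together, a three-question quiz might be marked up like the sketch below. The questions, ids, and feedback text are invented, and I’m assuming the form sits in a file that declares the epub namespace and the widget prefix.

```xml
<form epub:type="widget:quiz">
  <p>
    <label for="q1">What is the capital of France?</label>
    <!-- Exact match against data-answer; no data-weight, so the weight is 1. -->
    <input type="text" id="q1" name="q1" data-answer="Paris"
           data-success="q1-right" data-failure="q1-wrong"/>
  </p>
  <p id="q1-right">Correct.</p>
  <p id="q1-wrong">Not quite. Think of the Eiffel Tower.</p>

  <p>
    <label for="q2">Which two elements make up water?</label>
    <!-- data-answer-contains takes precedence over data-answer; weight is 2. -->
    <input type="text" id="q2" name="q2" data-answer="hydrogen and oxygen"
           data-answer-contains="hydrogen oxygen" data-weight="2"/>
  </p>

  <p>
    <label for="q3">In which year did the American Civil War end?</label>
    <!-- pattern is checked first: it accepts "1865" as well as "65". -->
    <input type="text" id="q3" name="q3" pattern="(18)?65" data-answer="1865"/>
  </p>

  <p><input type="submit" value="Check my answers"/></p>
</form>
```

Walking through the steps: the full score is 1 + 2 + 1 = 4. Say the reader answers “Paris”, “hydrogen and carbon”, and “1866”. The first answer earns its full weight of 1. The second contains one of the two listed words, so, assuming the partial score is simply the share of words matched, it earns half of its weight of 2, which is 1. The third fails the pattern and earns 0. That makes 2 out of 4, a final score of 50%.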

A few rationales:

  • Adding a weight to the answers allows for some granularity in evaluating the quiz’s success or failure, even in a model as simple as this.

  • The three different methods of checking the answers should provide the questionnaire author with enough flexibility to design a wide variety of questions.

  • The reading system should have quite a bit of freedom when it comes to designing the report UI.

  • Vendors can, optionally, store the scores and completed questionnaires, let the reader review them all, study their improvement history if they retake them, etc.

In my opinion, this should both meet a large proportion of the ebook developer’s needs in terms of questionnaires and give vendors the freedom to implement native UIs that outmatch anything done in javascript.

3D objects. This should, hopefully, be as conceptually simple as linking to a file for a 3D object in the object tag’s data attribute.

(The details are beyond any of my expertise. I know absolutely nothing about 3D file formats or standards.)

The actual implementation is, of course, much more complicated than that, possibly even the most complex implementation detail of all of those mentioned here.

One thing to note, however, is that the background of the actual 3D object should be transparent so that the designer can set the background by styling the object tag, as with the other widgets.

Video and audio. These would be a largely solved problem if it weren’t for the fact that several of the major implementors can’t agree on formats. This isn’t even a ‘closed’ versus ‘open’ issue as it is on the web: both Amazon and Apple use patent-laden codecs. They just somehow manage to require different file formats, just to make things problematic.

Stop that.

Just grow up and support what the other guy supports. That goes for all of you reading system vendors. The sanity of ebook developers is at stake here.

The proposals

I make no claim of ownership over the ideas presented here.

I am not a spec writer, nor do I pretend to know much about the nuances and detail required in the art of specification writing. Instead, I’ve stuck to plain and descriptive language where I can in these proposals.

I’ve tried to outline the problem area and the solutions as simply and as clearly as I can, in the hopes of starting a discussion of where we can go from here. I think that my proposals – combined with :target and CSS transitions – are both more flexible and simpler than anything else I’ve seen suggested or implemented, but I obviously can only see the problem from my perspective and am sure to have missed out on some issues (although I’ve outlined a few more issues in my notes linked below).

I hope these proposals help, even if they don’t end up getting used.

(The next thing I’m pondering is a set of conceptual suggestions for authoring tools.)

Additional notes

  1. Starting a developer wiki, like the Mozilla Developer Network, would be a useful start. But it wouldn’t be a replacement for proper documentation unless the vendor assigns staff to maintain and add to it. Doing it as a wiki might lower costs but it doesn’t remove them. Also, Mozilla has the advantage of being a community-led non-profit, something that doesn’t apply to any of the current ebook vendors.