Over the past few days I’ve had several interesting conversations on ebooks, interactivity, widgets, standardisation, and other issues that have cropped up as a result of Apple forking the ePub3 format.
Most of them have been with people making very good points that have forced me to clarify my thoughts and reconsider some of my ideas.
One such conversation was the following email exchange with Grant Sutherland (posted with his permission).
We didn’t quite manage to convince each other, but I think we each made a good enough case for our respective approaches for the discussion to be useful and educational to others, no matter which side you take.
Both of our respective principles and approaches to developing more interactive ebooks are, I think, much healthier than the one-sided and opaque approach Apple has decided to take.
I’m a writer, currently published by Macmillan.
I’ve been interested in the development of ebooks for some while, and have been enjoying your posts.
(I wrote something last year along similar lines, probably out of date: Traditional Publishing is a Burning Platform.)
In relation to your recent exchange with Joseph Pearson, I agree with you both about the widgets being authored declaratively (with bindings).
However, I strongly disagree that microformats are the appropriate solution for books (however useful they might be on the web). The ‘vocabularies’ extension in epub3 allows for any number of XML dialects to be dropped into the basic xhtml5. Why not use them? They’ve already been developed rigorously (e.g. ChemML, KML) and something like TEI seems to me ideally suited as a basis for interactivity/enhancement in a variety of non-fiction ebooks. Microformats are meant to evolve towards useful common standards; but in this case, the common standards already exist.
Why reinvent the wheel?
Anyway, just a thought.
To which I replied:
That’s a valid point and it’s one I debated a lot when I was an academic studying and authoring interactive projects. Generally speaking, only a minority thought it was a good idea to reuse these vocabularies because they simply weren’t designed for our purposes (creating interactive texts). I was a part of that minority in favour of reuse. I changed my mind as soon as I gained more practical experience in authoring interactive projects.
There’s only one, widely accepted, open format that suits our purposes: HTML+CSS+JS. There hasn’t been a need to create a new format for interactive hypertexts (which is what we are talking about, since text is a subset of hypertext) because we already have one. The problem arises when we remove JS from that equation; we need simple methods of getting back some of the interactive functionality we’ve lost. None of the other pre-existing formats suit our purposes because we’ve just given up on the only one that does.(^1)
So, short answer: Because it’s not reinventing the wheel. Not really.
Most of those XML formats, like TEI, are both specialised and largely focus on meaning (that is, preserving a fidelity of the information it carries). But when you are creating an interactive project you need to explicitly declare your intent: this is how this element is supposed to behave. Using existing XML formats for this purpose can get extremely complex and awkward because it’s not what they were designed to do. Relying on a reading system to infer a behaviour based on semantics is an extremely limited way to author interactive ebooks.
(TEI is also bloody complicated.)
Microformats is a term that Joseph Pearson brought up and I’m not quite sure I agree with it being applicable in this circumstance, because microformats generally describe semantics and meaning and not behaviour and intent.
A simple OBJECT tag, configured by linking to something like a json file using the PARAM tag, is much easier to author than any of these pre-existing XML formats, and has the added benefit of being both explicit in terms of the behaviour that’s desired, and flexible in terms of how the reading system wants to implement it. It’s also something that’s very easy to implement in almost any programming language. None of these apply to a complex XML vocabulary, even if it really did what we need it to do.
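To make the idea concrete, here is a minimal sketch in Python of how a reading system might pick up such a widget. Everything specific in it is an assumption made up for illustration: the MIME type, the param name `config`, the file name `gallery.json` and the shape of the JSON are not taken from any specification or existing reading system.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical markup: the MIME type, the param name "config" and the
# file name are illustrative assumptions, not part of any specification.
SNIPPET = """
<object type="application/x-widget-gallery">
  <param name="config" value="gallery.json"/>
  <p>Fallback text for reading systems without widget support.</p>
</object>
"""

# Stand-in for the files inside the ebook package (also invented).
PACKAGE_FILES = {
    "gallery.json": '{"images": ["a.jpg", "b.jpg"], "captions": true}',
}


def read_widget(markup, files):
    """Return (widget type, parsed config) for a single <object> element."""
    obj = ET.fromstring(markup.strip())
    # Collect every <param name="..." value="..."/> child into a dict.
    params = {p.get("name"): p.get("value") for p in obj.findall("param")}
    # The "config" param names a JSON file that holds the widget settings.
    config = json.loads(files[params["config"]])
    return obj.get("type"), config


widget_type, config = read_widget(SNIPPET, PACKAGE_FILES)
print(widget_type, config)
```

The markup states the desired behaviour explicitly through the object type, while the reading system stays free to decide how to render it, or to show the fallback paragraph instead.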
A small set of simple but flexible widgets (behavioural objects) is very much outside the remit of any of the preexisting formats.
The only exception I can think of is SVG+SMIL, which could cover a lot of bases but even that is still quite complicated both to author and implement (which is why support is so sparse and buggy, even among web browsers).
And even then SVG+SMIL doesn’t cover some of the basic behaviours we need.
I’d love to have it, though.
For the record, I don’t think the iBooks widgets cover exactly our needs either. They are a starting point.
So, that’s my reasoning. I hope it makes some sort of sense :-)
Thanks for the thoughtful reply. I take your well-made points.
I think your key statement is this one:
My own belief is that certain classes of texts are well-suited to solution (a), whereas others - and I would guess that among these are the type of interactive texts you’re working on - are more suited to solution (b).
As an example of a text of type (a), take any classic history. By impregnating the text at appropriate points with the various <geo>-related tags of the TEI, a work like ‘Decline and Fall…’ could be gently enhanced by the reading system with a sidebarred timeline, map and bio reference. The primary purpose of this type of enhancement (and, in my view, probably the only useful kind of enhancement possible in this type of work) is navigational. It is clearly not a hugely ambitious aim, but it has the inestimable advantage, over the many ‘snake-oil’-type ‘enhancements’ we’ve lately seen, of actually being useful. Because this is the kind of text I tend to read, my view is admittedly skewed toward this solution.
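A rough sketch, in Python, of how a reading system might derive that kind of navigational sidebar from semantic tags alone. The sample passage is invented, and the markup is a heavily simplified, TEI-style fragment (no namespace, only `date`, `persName` and `placeName`); the function name `build_sidebar` and the sidebar structure are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Invented sample passage; real TEI documents use a namespace and far
# richer markup, both omitted here for brevity.
PASSAGE = """
<p>In <date when="0410">410</date> the Goths under
<persName>Alaric</persName> sacked <placeName>Rome</placeName>, while
<persName>Honorius</persName> remained at <placeName>Ravenna</placeName>.</p>
"""


def build_sidebar(markup):
    """Collect dates, places and people in document order."""
    root = ET.fromstring(markup.strip())
    return {
        "timeline": [d.get("when") for d in root.iter("date")],
        "map": [p.text for p in root.iter("placeName")],
        "bios": [p.text for p in root.iter("persName")],
    }


sidebar = build_sidebar(PASSAGE)
print(sidebar)
```

Nothing in the text declares a behaviour here; the reading system infers the timeline, map and bio references purely from what the tags mean.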
(In a way, the Kindle’s X-Ray facility is something like this, but prone to error because it’s using text-mining to make the semantic enhancements. And it’s ugly.)
As to texts of type (b), I think you are absolutely right.
There is obviously a massive problem here with the stability of this type of book. And I think the lessons to be learned about ‘maintainability’ will be learned from the past experiences and current best-practices of programmers.
The big lesson I see is that ‘unstable’ is the natural state in this world. I’m reminded of the old joke in biology: ‘We biologists have a special word for objects in a stable state - dead.’
Publishers want ‘stable’, but I don’t think they’re ever going to get it in this area. The best they can hope for is to manage the ‘instability’ of these books in a commercially viable manner.
My own guess is that books will come to exist on a (lumpy) spectrum of stability that looks something like this:
- Plain-text, stored on clay tablets (the first, and still the best for longevity)
- Plain-text, stored on paper
- Plain-text semantically enhanced, stored on computers. Behaviours (if any) determined by reading system.
- Plain-text semantically+programmatically enhanced, stored on computers. Behaviours (if any) determined by some combination of reading system and program embedded in text.
With regard to your remarks about the widgetization of behaviours, I agree that SVG+SMIL is the only sane long-term answer. In the meantime, here are a couple of links that might interest you (if you’re not aware of them already):
you’ll see that they use a similar mechanism to that
Hope some of this is of some use to you.
There is a subset of interactive patterns that can’t be tied into semantics since the behaviour is the primary carrier of meaning (understanding is derived from the actions the reader takes). In this case you have to author a non-semantic widget into the text. Texts full of this behaviour lend themselves to solution (b), as you call it. They are also, in my opinion, generally only appropriate to non-fiction; like you I’m skeptical of the benefits of ‘enhancing’ narrative text.
Then there’s the issue of supporting true interactive narratives, creating a platform that can support hypertext and non-linear stories.
Which, really, is the origin of my conundrum.
On the stability of these texts:
Very true, and it’s a problem I and other academics who were researching and teaching in this field ten years ago have had to tackle again and again, and, believe it or not, what we have now is much better than it was.
Most of the interactive works I studied ten years ago and cited in my research are unplayable now. HyperCard projects, Director-authored CD-ROMs, old Flash files, and plain old executable programs: many of the big, influential texts are now hard to access, locked in a dead platform (Mac Classic). So, this has been a subject of debate for decades now. There was a lot of worrying and scaremongering about the issue at the time, but it ended up not being as much of an issue as people expected.
Of course, some works have been lost. It’s hard to find and play a Voyager Expanded Book today. But the solution was articulated by Mark Bernstein (of Eastgate Systems, publisher of hypertext, maker of tools, etc.): The only thing that preserves works is interest.
When there is interest, someone will make sure that the work is available and accessible (see, for example the conversion and update of the “If Monks Had Macs” CD-ROM).
When there isn’t interest, no amount of open formats or standardisation will save the work from oblivion.
Of course, there are exceptions such as the rights situation with many of the Voyager Expanded Books, but that kind of legal bind won’t be solved by any amount of standardisation.
And, as I wrote, we’re in a much better place now than we were then. We have an open standard, ePub, which, even with books published on the Kindle, will remain an archival and authoring format.
Extending that format with a set of documented, standardised (even if just a de facto standard) objects will not threaten their archivability in any way, provided we do it in a sensible manner.
Which brings me to Apple’s new iBooks 2.0 format. That format runs the risk of instability and obsolescence. It’s undocumented, intentionally incompatible with the standard, and extends it in odd ways. It’s a very problematic approach to creating interactive texts.
I think the next step for me would be to write up a description of the forms of interactivity I think are needed for ebooks (there are a few basic actions that can be combined to create 90% of what authors of interactive works need) followed by a list of widgets, with suggested format extensions, that would implement those forms of interactivity.
I wouldn’t be surprised if, with a little bit of thought, those of us who want this can come up with conventions that are closer to the semantic ideal than the hodgepodge of opaque data that Apple is using.
I think, for example, that epub:type can be put to effective use. One approach would be to create a vocabulary of interactivity for epub:type which might give us the best of both worlds—if we assume that an interactive act has meaning in and of itself, that is.
The analogy would be ‘footnote’, which is the name of a print design feature that has attained an added meaning derived from its common use. If we define an epub:type vocabulary for commonly used interactive design features, especially those that have attained some added meaning from repeated practice, then I think we would have something that is sustainable, usable, and would satisfy the qualms of most.
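As a sketch of what that could look like in practice, the Python below reads epub:type values from a fragment and maps them to behaviours. The `footnote` term exists in the EPUB 3 Structural Semantics Vocabulary; the terms `reveal` and `quiz`, the behaviour descriptions and the function name `plan_behaviours` are hypothetical and only stand in for whatever vocabulary might eventually be agreed on.

```python
import xml.etree.ElementTree as ET

EPUB_NS = "http://www.idpf.org/2007/ops"  # namespace of the epub:type attribute

# "footnote" is an existing term; "reveal" and "quiz" are invented here.
CHAPTER = """
<section xmlns="http://www.w3.org/1999/xhtml"
         xmlns:epub="http://www.idpf.org/2007/ops">
  <aside epub:type="footnote" id="n1">A note.</aside>
  <div epub:type="reveal" id="r1">Hidden answer.</div>
  <div epub:type="quiz reveal" id="q1">Question.</div>
</section>
"""

# Hypothetical mapping from vocabulary terms to reading-system behaviours.
KNOWN_BEHAVIOURS = {"footnote": "show in popup", "reveal": "toggle on tap"}


def plan_behaviours(markup):
    """Map element ids to the behaviours the reading system recognises."""
    root = ET.fromstring(markup.strip())
    plan = {}
    for el in root.iter():
        # epub:type holds a space-separated list of terms.
        types = el.get(f"{{{EPUB_NS}}}type", "").split()
        if types:
            # Unknown terms are skipped, so the text still reads as plain content.
            plan[el.get("id")] = [KNOWN_BEHAVIOURS[t] for t in types
                                  if t in KNOWN_BEHAVIOURS]
    return plan


plan = plan_behaviours(CHAPTER)
print(plan)
```

A reading system that does not know a term simply ignores it, which is what keeps this kind of extension archivable.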
I’ve really enjoyed this exchange. :-) Would you mind if I posted this conversation on my website? I won’t do it if you’d rather I didn’t, but I think a lot of people would find it interesting to see an exchange that presented both sides equally like this.
- epub:type in the ePub3 specification
- EPUB 3 Structural Semantics Vocabulary
- The iBooks 2.0 textbook format
- The iBooks 2.0 built-in widgets
- The pros and cons of the iBooks 2.0 textbook format
- A favour from Goliath: How Apple does ebook widgets right