Is it safe?

27 September 2012 – Baldur Bjarnason

Web formats are too complicated for the publishing industry.

My father was of the firm opinion that it was an essential part of any child’s upbringing to teach them about film and film history.

Unsurprisingly, me and my sister have been watching movies with him ever since we were small children. Then, when we got a VCR, our education began in earnest.

Throughout the years, our dad covered all bases (except for horror, he’s never been able to watch horror movies beyond the stuff Hitchcock made).

Actors. Charlie Chaplin. Laurel and Hardy. Buster Keaton. James Cagney. Bogart. John Wayne. James Stewart. Cary Grant. Steve McQueen. James Dean. Marlon Brando. Marlene Dietrich. Greta Garbo. Directors. Howard Hawks. Frank Capra. Orson Welles. Billy Wilder.

The list is endless. Full of fun stuff.

As we grew older (although earlier than you’d expect, he took me to see Lethal Weapon when I was ten) he introduced us to the crazy but fun stuff from the seventies. French Connection. The Getaway. Towering Inferno.

And Marathon Man.

The scene in Marathon Man where Dustin Hoffman is being tortured with a dentist’s drill, constantly being asked questions he doesn’t understand and can’t hope to answer, is one of those scenes that burns itself into your mind, no matter what age you were when you first saw it.

It’s also a handy dandy metaphor for what it feels like to put together ebooks for current ebook platforms.

You’re in constant pain and none of what your torturers shout at you makes any sort of sense whatsoever.

Mark Zuckerberg’s announcement that HTML5 hadn’t worked out that well for Facebook caused a bit of a ruckus in the web dev world, but as it turns out, a lot of the context had been left out of the quote.

In fact, when a Facebook engineer left a message on the Core Mobile Web Platform Community Group mailing list, their concerns turned out to be downright reasonable:

The lack of tooling in mobile browsers makes it very difficult to dig down and find out what the real issues are. Hence tooling, or rather, lack-thereof is a key issue.

You can read through the list of issues facing a major HTML5 app developer like Facebook and others and it boils down to one thing: they need the web to be an app platform while today it’s a mishmash of a publishing platform with some app features.

The good news, if you’re a web person, is that these things are being worked on. The bad news is that the web is steadily becoming more and more complicated to develop for and is diverging further and further away from being a publishing platform.

I was asked ages ago to put together an overview of how the features of iBooks Author’s multi-touch ebooks could be implemented in EPUB3.

I discovered very quickly that they can’t and so never wrote it up. (Yeah, I completely changed my mind on this one from when I originally wrote about the .ibooks format. I had to, I can’t deny the facts.)

The reasons are simple:

In iBooks Author pagination is dictated by the style system and so can change with the orientation (and in theory, with other media queries, allowing in the future for iPhone versions that just scroll). In EPUB3 FXL pagination is in the ebook’s metadata which doesn’t allow for the same chapter to be non-paginated in one orientation and paginated in another.
The standard CSS equivalents to the design and layout features of iBooks Author are unstable. Any widespread implementation today is likely to become incompatible with the final spec (which is why test versions are disabled by default, by the way) and I’m not convinced they would do everything iBooks Author does, anyway.
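To make the first reason concrete, here is a minimal Python sketch. It assumes a package document that declares its layout with the `rendition:layout` metadata property from the EPUB3 fixed-layout vocabulary; the sample OPF and the `publication_layout` helper are invented for illustration and are not taken from any real ebook or reading system. The thing to notice is that the lookup has no orientation parameter: the metadata yields one value however the device is held, whereas a style-driven system could vary it with a media query.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical, heavily trimmed package document (illustration only).
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         prefix="rendition: http://www.idpf.org/vocab/rendition/#">
  <metadata>
    <meta property="rendition:layout">pre-paginated</meta>
  </metadata>
</package>
"""


def publication_layout(opf_text):
    """Return the layout declared in the package metadata.

    Falls back to "reflowable" when no rendition:layout property is present.
    Note that there is no orientation argument: the answer is the same in
    portrait and in landscape.
    """
    root = ET.fromstring(opf_text.strip())
    for meta in root.iter("{%s}meta" % OPF_NS):
        if meta.get("property") == "rendition:layout":
            return (meta.text or "").strip()
    return "reflowable"


print(publication_layout(SAMPLE_OPF))
```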

There are more reasons but those two are the biggies. They mean that the only way to implement something like multitouch ebooks is to:

Fork the Open Container Format or extend it to implement a different pagination model.
Make a proprietary extension to CSS to add things like pagination control, or adopt an early, unstable, spec that is likely to change, anyway, leaving you with a proprietary CSS fork.
Adopt early, unstable specs for layout, exclusions, etc. which are likely to change, leaving you with a proprietary CSS fork.

Basically, you’d have to extend and fork the CSS and extend and fork the EPUB3 format.

Which is exactly what Apple did. Ergo, you can’t do iBooks Author and multi-touch ebooks without doing it, roughly, the way Apple did. Q.E.D.

The web is, very messily, becoming a fully fledged app platform. We’ve got a few competing module systems. Package managers. Build systems. Javascript is being nudged into proper shape.

If you’re into software development it’s exciting. If you aren’t, it’s horrifying.

As you may know, me and my sister just officially launched Studio Tendra and its first project, Heartpunk (a sword and sorcery ebook thing).

Preparing this project has been very educational. Most of my other ebook projects have been focussed on art or research but this one was a chance to experiment with a few approaches to fiction ebooks.

For starters, I think we’re the only publisher (self- or no) who is planning on selling backwards-compatible EPUB3 files direct. Not an EPUB2/EPUB3 bundle. Just an EPUB3 file that, in theory, should work everywhere. The EPUB3 that’s free to download from the site is basically what we have planned for the rest of the series.

The reason why nobody is doing this becomes abundantly clear when you’re implementing it: EPUB3 gets you nothing but pain. No added benefits (especially not for fiction) and a whole lotta more complexity.

It also gave me a chance to try out some of the ideas I’d described earlier on this blog. I set up a template-based system using an app called Tinderbox (I’ve been a heavy Tinderbox user for many many years). If I want to edit or change the ebook, the only thing I have to do is change the text, press publish, and three ebook files will automatically be generated (EPUB2, EPUB3, and Kindle). Tinderbox gives me a rich text editor so I don’t have to touch or look at markup at all once the templates are set up. Bonus: Tinderbox imports OPML files, Scrivener files, and more, so I can write the first drafts in Scrivener and finalise in Tinderbox if I want to.

Kobo (and others, if they open up their self-publishing platforms to the UK/Iceland) get the stripped down EPUB2. There’s no point in giving them anything with styles because they’ll ignore it by default and make a hash out of it when they don’t.

The problem with all of this added work is that I get nothing of value in return. I seriously doubt anybody outside of the ebook production industry cares about how the ebooks look. Everybody else just wants it to be error free. Most fiction ebook publishers would be much better off if they just stuck to Scrivener’s EPUB and MOBI exports.

In fact, it would be irresponsible of them not to, if they can standardise on Scrivener among their writers and editors.

Now, normally I don’t comment on what Bill McCoy writes. He’s got a tough enough job herding the cats that are the IDPF member organisations and evangelising EPUB3 without me quibbling details and arguing with him. But something he wrote the other day contains a couple of extremely misleading statements.

His series on Portable Documents for the Open Web is full of odd things that don’t quite gel that well with anybody’s experience of how EPUB3 works so far, even if you’re generous enough to include Readium as a viable platform. The only way I have to explain these statements is that he’s being much more optimistic than anything in recent history warrants.

You might say that a .epub file is a website “in a box”, one that’s been “domesticated” so that it can be distributed through channels, and used online as well as offline. (Bill McCoy – Tools of Change for Publishing)

Uh, no. Not unless ‘domesticated’ means ‘broken’. The way this is worded implies that you could take any website and make an EPUB3 version of it that is a functional and faithful representation of the original.

You can’t. It won’t work. It won’t even come close to working. Not even in ereader apps like Readium. There are too many differences between the basic model of EPUB and the web (paged versus scrolling, limited positioning versus full positioning, limited if any javascript versus full javascript, XHTML parsing versus HTML5 parsing, CSS overrides, etc. etc.).
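The parsing difference alone is easy to demonstrate. The sketch below is a rough stand-in, not a model of any actual reading system: Python's `xml.etree.ElementTree` plays the part of a strict XML parser of the kind XHTML requires, and the standard library's `html.parser` plays the part of a forgiving HTML parser (it is not a full HTML5 parser). The `SLOPPY` sample and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup of a kind that is common on the web: nothing is closed.
SLOPPY = "<p>An unclosed paragraph<br>with a bare line break"


class TagCollector(HTMLParser):
    """Records every start tag the lenient parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup):
    """True if a strict XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def tags_seen_by_lenient_parser(markup):
    """Feed the markup to the forgiving parser and list the tags it found."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parses_as_xml(SLOPPY))                # strict parser rejects it
print(tags_seen_by_lenient_parser(SLOPPY))  # lenient parser carries on
```

The same input that the forgiving parser shrugs off is a hard failure for the strict one, so the markup has to be repaired before it can go into an EPUB at all.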

The amount of work involved in converting an existing website to an EPUB is enormous because there is, actually, a massive divergence between EPUB3 and the web.

Critically, you don’t add to a user’s security exposure beyond what they already get with a web browser, and the browser stack is heavily scrutinized by both vendors and 3rd parties, and any exploits are rapidly patched. (Bill McCoy – Tools of Change for Publishing)

Not true either. Javascript in EPUB very much adds to a user’s security exposure beyond what they already get with a browser. (Interesting how I didn’t get any response to that post whatsoever when it was originally published. Not even a ‘you’re wrong’ snipe or anything.)

Also, by the very nature of most ereading devices (either bespoke hardware or Android-based tablets) they will not be rapidly patched at all. Judging by the history of Android and ereading devices so far, exploits will remain unpatched for years, or months at the very least.

I did a talk at a conference in Milan, Italy a while back on the ebook format transition. I tried my best to describe the issues publishers would be facing as the industry migrated towards file formats that are more directly based on the web stack of technologies.

Basically, KF8 and EPUB3 bring ebooks closer to how the web works and have some of the same dynamics.

One of my points – the point, really – was that a key characteristic of the web platform is constant change and EPUB3 embraces that as a feature. (Bill McCoy even says as much in his TOC piece.)

I don’t think any of them got what I was getting at because they didn’t look nearly as scared as they should have. Constant change at the scale we experience on the web platform is like a meat-grinder for the staid, conservative, companies that dominate publishing, especially academic publishing. Unless they re-architect radically, none of them will survive the decade. They’ll be chewed up, ground down, and mashed into pulp.

I even ended the talk with a black slide with a single white word on it, all caps: “SURVIVAL”, before going into a WW2 anecdote to try and hammer the message in, but they still didn’t look scared enough.

Web and ebook developers are arrogant bastards who often don’t know what they’re talking about.

It is de rigueur among web developers to badmouth WYSIWYG tools while themselves churning out templates and themes bloated with markup and often several megabytes worth of javascript, CSS, and HTML.

‘But our code is more semantic!’

No. It isn’t. It’s a bloated mess.

Code generated by WYSIWYG tools is often odd, sure. It does things a hand coder would never do. But that’s okay. It does the job. And even though I’d never use one, I don’t resent how useful and enabling those tools can be to those without the skills (or insanity) to dive into hand coding markup and CSS.

The situation is even worse in ebooks.

A few weeks ago I got into a massive argument with several ereader app developers about CSS overrides. (And, yes, I was loud and obnoxious, as I’m wont to do in Twitter debates.) I maintained that their insistence on CSS overrides was the single biggest issue that ebook developers are facing. It increases costs, complicates development, makes testing next to impossible. It is a nightmare.

But, as it happens, one of the biggest issues ereader app developers are facing is ebook developers.

To be specific: a large proportion of the ebooks they get are such utter rubbish that loading them without CSS overrides would have a dramatic negative effect on their business. Support costs would skyrocket, returns would spike, reputations would crater, etc.

Huh?

That’s humbling.

Especially since it’s fairly easy to confirm. Pretty much everybody I’ve approached who has been involved in ereader app development confirmed this: they get a lot of utter rubbish that’s only salvageable by overriding the CSS. The stuff in these ebooks makes the Geocities pages of yore look like the work of professional web development outfits.

Just look at what happened when J.K. Rowling’s “The Casual Vacancy” turned out to be unreadable on the Kindle. It was initially pitched as an Amazon issue when the fault was entirely Hachette’s.

So, ereader app developers can’t implement apps that don’t override the book’s CSS and so can’t, ever, implement full and unfettered support for CSS. Which is, y’know, exactly the thing that’s required for EPUB3 to become the portable document version of the web that Bill McCoy is trying to pitch it as.

Of course, if they all agreed on the specific overrides to implement and standardised how they were implemented, that would simplify life for well-behaving ebook developers. Maybe that’s something the IDPF can look at.

But that doesn’t give us full and unrestricted CSS back.

When you have a platform that nobody seems to be able to use, you have to question whether it was an appropriate choice. A significant portion of publishers churn out files that would be broken—unreadable—in ereaders that are fully standards-compliant. When the producers can’t produce and the platform makers dare not accept the goods, something is broken.

Ereader platform owners don’t feel they can give publishers access to the full, unfettered, capabilities of EPUB.

Publishers are split into two groups. One is incapable of producing unbroken files. The other is crippled by the limitations imposed on ereader platforms to compensate for the first group.

Ereader platforms offer none of the tools the web platform offers, but remain just as complicated.

I don’t think anybody could be blamed at this point for asking whether basing ebook formats on web tech was such a good idea in the first place.

The web is based on four foundation technologies: HTML, CSS, Javascript, and HTTP.

Shit happens when you remove HTTP because all of our tools and all of our standards assume that it’ll remain there, right in the foundation, all nice and steady to build on.

Every web developer’s work process involves a local server of some kind, at least if it’s a non-trivial project. Every web inspector tool assumes HTTP. Every tool in every step assumes that all assets will be loaded using HTTP. A lot of them become useless when you switch into a non-networked model of development such as with EPUB.

One of the problems Facebook and other major developers are facing with the mobile web is that the web doesn’t lend itself to offline. It just likes HTTP too much. The offline specs churned out by various HTML5 folks are all a mess, tricky to use, and buggy in practice. Even when they work perfectly they don’t really match the use cases of outfits like Facebook.

But the biggest issue is updates. The web was the first app platform to have an update mechanism built into its heart. Change something and the user will have the latest version on the next reload. It’s the fastest, most agile, update system in any app platform available.

This is vital because the rest of the web platform has traditionally sucked balls. Javascript lacks modules, packages, tooling, and most of the niceties you expect in a modern programming language. HTML is messy and prone to breaking. CSS is a non-deterministic hellhole that consumes your sanity faster than Pinhead from Hellraiser.

The key to making these flawed formats work is HTTP. It breaks. You fix it. Everybody gets the fix. The agility and speed of updates is what makes it workable.

EPUB removes that. If you discover a bug in a live book, you’re sunk. This isn’t so bad when you’re doing a fiction ebook that maybe uses italics a couple of times, but it is suicide if you’re doing anything interactive, anything with complex layouts, anything ambitious. If you make a complex ebook that breaks because of an ereader’s security update, you’re sunk. If you make a complex ebook that is rolled out but turns out to break in an edge case, you’re sunk. There is no standard way of pushing updates out to the client and the web stack requires that level of agility.

Every other ‘packaging’ of HTML includes an update mechanism. Extensions, addons, etc., all build in updates from the start because web tech needs it.

The web just isn’t viable as an app platform without it.

Which brings me to…

Does anybody seriously think that it’s a good idea for EPUB to become an app platform?

It’s always going to be a second-rate platform compared to the web (where web tech hasn’t been compromised) and to native app platforms (which continue to improve at a steady rate and won’t stop and wait for EPUB to catch up).

EPUB platforms are extremely unlikely to iterate at a pace as fast as web platforms and are never going to have all of the features that native platforms offer.

EPUB could do with some javascript for adding interactivity to some books, sure, but a fully fledged app platform? One that aspires to be a peer to browsers and native apps?

Whatever makes anybody think that EPUB can compete with those two on a level playing field?

Daniel Glazman, CSS WG co-chair and WYSIWYG tool software entrepreneur, has been working on an EPUB3 tool and has been documenting some of the issues he’s encountered.

They make for a fun read. Especially when you couple them with pieces like this one: The truth about structuring an HTML5 page.

Fun facts from that article:

The definitions of elements added in HTML5 (such as header and footer) run completely counter to standard practice, i.e. they completely ignored what every web developer has been doing and invented their own shit.
The outlining algorithm isn’t supported by screen readers at all, and if you follow the recommendations accessibility will suffer, not improve, because screen readers use an outlining algorithm based on header levels.
The new elements don’t replace or simplify anything, just massively increase the complexity of your markup and your CSS.
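The second point can be illustrated with a small Python sketch. It is a deliberately simplified model, assuming a reader that builds its outline purely from heading levels (h1 to h6); real screen readers differ in detail. The `SAMPLE` markup and the `heading_outline` helper are invented for the example. The sample follows the sectioning style where every nested section starts again at h1.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Builds an outline from heading levels only, ignoring section nesting."""

    def __init__(self):
        super().__init__()
        self.outline = []   # list of (level, text) pairs in document order
        self._level = None  # level of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def heading_outline(markup):
    parser = HeadingOutline()
    parser.feed(markup)
    parser.close()
    return parser.outline


# Hypothetical markup: nesting is carried by <section>, every heading is h1.
SAMPLE = """
<article>
  <h1>Book title</h1>
  <section>
    <h1>Chapter one</h1>
    <section><h1>A scene</h1></section>
  </section>
</article>
"""

for level, text in heading_outline(SAMPLE):
    print("  " * (level - 1) + text)
```

Because every heading reports level 1, the level-based outline comes out completely flat: the book title, the chapter, and the scene all look like siblings, and the hierarchy that the nested sections were meant to express is lost.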

Is there a secret charter for standards organisations where they all promise to make my life a fucking pain in the arse? Because they’re being way too consistent for it to be a coincidence.

Is web-tech just too complicated for publishing?

Exhibit A: Overrides. We can’t use unfettered CSS. As described above, this particular CSS pool has been poisoned by an abundance of bad actors in the publishing industry.

Exhibit B: The design of EPUB and other web formats leads to overly complex WYSIWYG tools (just see some of the issues Daniel Glazman was tackling above) and markup full of errors (just view source on almost any website, anywhere).

Exhibit C: The markup (HTML5) is badly specified, diverges wildly from existing best practice, is getting too complex, and remains largely unsupported by screen readers. The XML flavour used by EPUB combines the complexity of HTML5 with an unforgiving attitude towards errors which means things keep breaking, all the time.

Exhibit D: Web tech is turning into an app platform, which is great for the web. But EPUB is following it and trying to become an app platform as well. Publishing needs a publishing platform, not second rate knockoffs of existing web platforms. Publishers already have an app platform. They have several, even: the web and a host of native platforms. They don’t need a new one, especially not one that will never be able to compete with existing platforms. They need an agile and easy to use publishing platform. Current ebook platforms are neither.

I have no idea what alternative solutions are viable. Apple’s tactic with iBooks Author is to take measures to make sure that all .ibooks files come from iBooks Author, which means that a base level of correctness is maintained by the authoring tool. That could work.

Another standard tactic, used by web people since time immemorial, is to switch to a minimal markup language for authorship. This is pretty much what markdown was invented for by Gruber. It’d be interesting to see Amazon, for example, embrace multimarkdown as a primary authoring format and migrate away from web-based formats for authorship. I think that’d solve a lot of problems.

Or we could all just keep at the web-based ebook format thing, as it grows more and more complex, less and less manageable, and more and more error-prone.

Fun.
