Just say no to ebook CSS and JS

You think I’m joking?

One of the biggest issues publishers face with ebook production is the somewhat adversarial attitude ereader and app vendors have taken towards publisher stylesheets.

Publisher styles are largely overridden by default at Kobo, B&N, and in Aldiko. Even iBooks requires you to slot in a set of proprietary meta tags before it respects your font and image decisions.

Problems with vendor stylesheet overrides:

Vendor overrides are one of the biggest time sinks in ebook development. Because they are only partial, we have to include our own styles, but the mix is often unpredictable. The number of basic things that break, seemingly randomly, in Kobo, iBooks, or whatever random RMSDK-based ereader you have this week is too high to be disregarded. So, we test, and fix, and test, and fix.

(The simpler your stylesheet is, the less you have to test, obviously. Which is a decent motivation to make your styles as minimal as you can. Unfortunately, too many publishers and authors are dead set on forcing ebooks to mimic the styles of their print edition, filling them with all sorts of stylistic crap that’s patently inappropriate to digital. I draw the line at drop caps, which just can’t be done properly in an accessible way in digital. They are a trite Victorian affectation that compromises readability.)

To make matters worse, a lot of the ebooks publishers are releasing are full of insane crap that even the worst hack web developer wouldn’t dream of trying to pull off. Like making every element of a book either a P or a SPAN.

That practice makes the noise people at the IDPF make about non-XML HTML5 being tag soup pretty ridiculous: XHTML is just as capable of non-semantic tag soup as HTML. Oh, and it also makes any claim of EPUB3’s superior accessibility rather silly, as there’s no way for screen readers to tell which P is supposed to be a heading and which is actually a paragraph. Complex accessibility features are meaningless if all the publisher gives you is an indistinct blob of tags.
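A made-up fragment of the kind of markup I mean, followed by the same content with actual structure. The class names and the text are invented for illustration, not lifted from any real book:

    <!-- Tag soup: everything is a P or a SPAN (hypothetical class names) -->
    <p class="chap-title"><span class="bold-large">Chapter One</span></p>
    <p class="quote-indent"><span class="ital">A quoted passage.</span></p>
    <p class="body-text">An ordinary paragraph of body text.</p>

    <!-- The same content, structured -->
    <h1>Chapter One</h1>
    <blockquote><p>A quoted passage.</p></blockquote>
    <p>An ordinary paragraph of body text.</p>

Take the stylesheet away from the first version and the heading and the quote become ordinary paragraphs. The second version still says what it is with no publisher CSS at all.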

Then there’s the tendency of some systems to output ebooks where the only styles come in the form of style attributes on every single element, making any attempt to work with the styles of the ebook impossible.
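An invented example of what that output tends to look like (the property values are made up):

    <!-- Hypothetical export: every element carries its own style attribute -->
    <p style="font-size: 1.6em; font-weight: bold; margin: 2em 0 1em 0">Chapter One</p>
    <p style="font-size: 1em; text-indent: 1.5em; margin: 0">First paragraph.</p>
    <p style="font-size: 1em; text-indent: 1.5em; margin: 0">Second paragraph.</p>

There is no class or element name to tell a stylesheet which P is which, and the inline declarations beat any normal rule you write against them.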

The biggest problem with these ebooks is that everybody thinks they are okay because they look okay when opened. Headings are bold and large, because that’s a bit of CSS most vendors respect. Quotes are indented. Italics are italicised. The basic structure of the ebook looks preserved and the stupid crap in the ebook is ignored. Vendor overrides basically work for crap ebooks. From both the publisher’s and the vendor’s perspective this is a success. The vendor is happy because an atrocious ebook file is made readable and a large portion of their inventory remains sellable. The publisher is happy because they are short-sighted fucks who just got away with not giving a flying toss about ebooks and feel fine about making zero investment in the biggest growth area in publishing since the introduction of the paperback (yes, they are morons).

But, they are both wrong. That ebook is broken and needs to be fixed. It’s inaccessible to screen readers. It’s an opaque blob to text analysis like Amazon’s X-Ray. It’s an indecipherable mess to search engines (which are going to be damn important in the future). An ebook that doesn’t have structure is broken and unacceptable.

I propose a conditional surrender

The more I discover about existing publisher ebook production processes, the more I talk to people ‘on the inside’, the clearer it becomes that a substantial portion of existing ebook inventory is quite simply rubbish. No structure. Crap stylesheet. Broken markup.

So I propose that ereader vendors simply turn all publisher styles off and never even consider enabling JavaScript. Considering how much of a mess these clowns are making of basic markup and CSS, how likely do you think it is that they can do JavaScript safely?

Not bloody likely at all.

In exchange, what we need you to do is to improve your built-in stylesheets. We need you to support common markup practices like figures and captions, headings and subheadings, horizontal rules that don’t look like a 90s flashback and so on. Best if you support them both in markup patterns and as class-based microformats.

WordPress’s classes for captions and images, .alignleft, .alignright, and the like, are a good start. As are common microformats such as hAtom and hNews.
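A rough sketch of what built-in support for those classes could look like. The class names are the ones I remember from WordPress’s classic caption output; the rules, the values, and the file name are my own guesses at sensible defaults, not anything a vendor actually ships:

    <style>
      /* Sketch of built-in rules keyed on WordPress-style classes.
         Every value here is an assumption. */
      .alignleft   { float: left;  margin: 0 1em 0.5em 0; max-width: 50%; }
      .alignright  { float: right; margin: 0 0 0.5em 1em; max-width: 50%; }
      .aligncenter { display: block; margin: 0.5em auto; }
      .wp-caption img  { max-width: 100%; height: auto; }
      .wp-caption-text { font-size: 0.85em; font-style: italic; text-align: center; }
    </style>

    <!-- cover.jpg is a placeholder file name -->
    <div class="wp-caption alignright">
      <img src="cover.jpg" alt="The book's cover" />
      <p class="wp-caption-text">The image's caption</p>
    </div>

The point is that the publisher ships only the markup and the classes. How a right-aligned captioned image actually looks is the reading system’s call.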

It would also help if you supported basic HTML5 structures such as:

    <p>Tag line</p>


    <figure>
      <img blablabla />
      <figcaption>The image's caption</figcaption>
    </figure>

Rendering every bit of those patterns appropriately (subheadings, tag lines, captions, and so on) would be nice as well.

Oh, and don’t forget some nice styles for tables. Standard syntax highlighting for CODE elements would be a bonus.
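None of this needs to be elaborate. A rough sketch of the sort of defaults I have in mind, written as an ordinary stylesheet. All the values are my own guesses, and a real reading system would tune them to its own fonts and margins:

    <style>
      /* Sketch of built-in reader defaults; every value is an assumption. */
      figure     { margin: 1em 0; text-align: center; page-break-inside: avoid; }
      figure img { max-width: 100%; height: auto; }
      figcaption { font-size: 0.85em; font-style: italic; margin-top: 0.5em; }

      /* A short, quiet rule instead of the default bevelled line */
      hr { border: 0; border-top: 1px solid; width: 30%; margin: 2em auto; }

      table  { border-collapse: collapse; margin: 1em auto; }
      th, td { padding: 0.3em 0.6em; border-bottom: 1px solid; text-align: left; }
      th     { font-weight: bold; }

      /* Monospace code that wraps instead of running off the screen */
      pre, code { font-family: monospace; }
      pre       { white-space: pre-wrap; }
    </style>

Syntax highlighting isn’t something a stylesheet can do on its own, since the reading system has to tokenise the code first, so it is left out of the sketch.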

–You aren’t serious?

I absolutely am. The key here is a full-featured built-in stylesheet that correctly styles all major structural elements of the book. This would mean that the only thing you need to do to make sure an ebook is okay is to load it and see. If it looks like a heading it will be a heading, etc. Everything will be what it looks like. Books with crap, inaccessible structure will have crap, inaccessible styles and so be exposed immediately. Books that are properly structured will look as great as the vendor and reader (with their chosen settings) intended.

It would do to ebooks what RSS and SEO did to websites.

(In case you weren’t around in the web industry a decade or so ago: the structural quality of web development tools and CMSes didn’t begin to improve until client apps that required structural quality began to be important, namely RSS/Atom readers and search engine crawlers. Before that most tools generated markup that was an atrocious mess of tables and font tags. Any publisher who thinks search engines won’t be important to ebooks is very mistaken.)

Ebook production would be dramatically cheaper and simpler, largely consisting of making sure that the structure of the ebook is preserved throughout the editorial process. Little to no testing required and people can focus on bikeshedding the cover design instead.

—You can do this already by just not including any styles in your book.

That only solves my problem (production costs) and it only solves it if my book is only plain text with a few headings, italics, bold, and maybe some quotes. Existing built-in stylesheets are inadequate to the job.

It doesn’t solve the problem of how to motivate publishers to improve their ebooks without making them unreadable. By robbing the faux-headings and the like of their styles you surface the blobby soupy nature of the book without destroying it. And knowing how crap the ebook in general will be, it’d probably still look a lot nicer than if you enabled all styles and let the publisher’s incompetence shine through.

The built-in stylesheets provided by vendors cover too little of what an ebook needs. Add in figures, captions, table styles, code highlighting, some structural awareness—headers, footers, article, and the like—a few microformats, and some nice horizontal rules and we’d be mostly sorted.

Then, once you’ve got your built-in stylesheet in order, just turn off all publisher CSS completely and tell everybody to go and fix their fucking ebooks.