EPUB javascript security

Every industry has its apocryphal tales, stories that are presented as true but demonstrate the truths of day-to-day work too neatly, and are too tailored to their message, to be entirely true.

Stories are a big part of every industry’s true education. They highlight dangers, reset expectations, and define boundaries. Most stories told about work over coffee or at lunch contain essential information related to the work and tasks at hand.

The moment when the younger employees are open to listening to the old fogeys’ war stories and legends is a teachable moment.

The anti-malware industry has a few stories like this. A lot of them are banal, as in any other industry. “There was an emergency and we had to work late.” Some of them are brilliant comedy pieces about the expectations of executives and non-programmers. “Then he opened the case, to reveal a cheese sandwich in a birdcage.”[1]

One such story demonstrates a singular truth about anti-malware work. It’s probably fiction but contains enough grains of truth for it to be presented as plausible fact.

This is a story of one time when the anti-malware industry actually won.

Back in the nineties, the early days of the web, around the time when the dot-com bubble was being inflated, there wasn’t much money in writing malware. Most practitioners were hobbyists—people who liked to demonstrate how clever and superior they were by destroying other people’s work and wasting everybody’s time.

This was only a few years after the collapse of the Eastern Bloc and, as with other computer industries, the antivirus industry had a lot of programmers from Eastern Europe, people who had fled the chaos and problems of their home countries. Many of those guys are legends in the industry now.

Some of the antivirus labs noticed regular and increasingly frequent spikes in virus activity. Whoever was responsible kept churning out virus after virus. Virus labs couldn’t keep up.

A few people in the industry managed to trace these viruses back to a single individual from one of the Eastern European countries. They even discovered his name and who he was, but there was no way of getting the local authorities involved to stop him. These countries were having problems just keeping the peace, let alone hunting down a guy for something they hadn’t even made illegal yet.

(A lot of the virus writers back then were programmers stuck in the economic turmoil of the Eastern Bloc. A lot of present day malware developers are programmers stuck in the economic turmoil of the Eastern Bloc.)

This guy, for a while, was responsible for a substantial portion of most antivirus companies’ workload. Most antivirus companies responded to this in the same way they always do: try and keep up and keep trying new methods to increase detection rates.

But one of the AV programmers, one who was originally from the same country as our prolific malware developer, kept thinking about the problem, before announcing at a meeting that he had an idea of how to solve this problem, but that he needed a few days to work it out.

Days, then weeks passed with no new viruses from the virus writer. Months passed. Finally, the AV programmer’s colleagues had to find out how he had done it and asked.

The AV programmer just smiled and said that he had sent some money back home and gotten a friend of his to hire a couple of thugs to go to the virus writer’s house and break all of his fingers. And they threatened to come again and do worse if he wrote more viruses.

The virus writer never wrote another piece of malware again.

Now, if you tell this story to people outside of the anti-malware industry, most reactions are simply vague shock. But it serves another purpose in the industry.

This is a story that looks like a fairly traditional story about how the good guys won over the bad, possibly with a bit of excessive force. The anti-malware industry won. The moment when you begin to find this story funny, hilarious even, is the moment when you, as an employee in the anti-malware industry, have accepted the truth that the anti-malware industry never wins. The moment you laugh is the moment where you become one of us.

The creators of anti-malware software either don’t lose, or lose. They never win. Software security never wins. When they are successful, nobody notices them and when they fail, they get blamed for the sometimes catastrophic results of the actions of criminals. They never ever win.

And that makes a story about winning funny.

There are two kinds of malware:

The user’s device is the target.
The user’s device is a resource.

(There are many more kinds, but these two categories will do the job for now.)

Most people assume that malware is targeting them specifically, that the goal is to break into your computer and do something bad that impacts you. Malware that holds your files for ransom or hacks your bank account gets a lot of attention but doesn’t account for most malware in circulation.

The majority is malware that treats the user’s device as a resource to be used for perpetrating criminal acts elsewhere.

You have Russian cartels blackmailing gambling websites by threatening repeated Denial of Service attacks. Attacks perpetrated by botnets that link compromised computers together into a coordinated resource.

You have computer crimes, hacks, that are untraceable because they’re perpetrated while proxied through a computer that sits unattended in somebody’s bedroom.

Virus writers build botnets that are then sold on the open market as a commodity resource for perpetrating computer crime of various sorts. That’s where the money is.

In many parts of the world, a new computer science graduate will earn more by going into malware development than by going into the anti-malware industry. Anti-malware never wins.

If software exploits were the only distribution vector for malware, our lives would be much easier. Browsers today are scarred – battle-tested – pieces of software. The tactics required to exploit them are becoming less and less practical. The security model of the browser applications themselves is sound and hard to break. Provided they are up-to-date.

(The security model of the web isn’t sound. I’ll get into that later.)

Most drive-by infections happen because users never update their OSes, browsers, or anti-malware software. Windows XP is unsafe at any speed. IE 6 and 7 are unsafe at any speed. Old versions of Firefox and Chrome are unsafe at any speed. A computer with out-of-date software is a hazard to the internet.

If drive-by infections were the only way of getting malware onto your computer, the computer world would be a pretty safe place by now. It wouldn’t be perfect, but it would be much safer.

It isn’t safe. A lot of malware is installed by the users themselves.

Some gets installed because the user is an idiot and installs a fake app (some even pretend to be anti-malware apps) that appears in a popup on a porn site.

Some because the user is uninformed and thinks that the download site they’re on is a legitimate one despite the prevalence of ads featuring topless women.

Some because the user is cheap and doesn’t know that pirated software is a common way of spreading malware.

Some because the user actually believes what they read in emails from random strangers.

People are dumb and software security never wins.

Before we can get into the mechanics of how malware ebooks would get implemented we need to consider the goals of a malware developer.

There are three scenarios for ebook javascript malware:

The goal of the malware ebook is to target the user.
The goal of the malware ebook is to target others (turn the user’s device into a resource).
And as a special circumstance: the malware book is trying to take over an ereader implemented using javascript.

The first scenario isn’t particularly likely. The ebook environment doesn’t have access to particularly sensitive data and vulnerabilities in an ereader’s social networking code are unlikely to persist for a long time. Holding ebooks for ransom or using an ebook to compromise a reader’s Facebook page isn’t a convincing malware scenario.

The latter two, however, are. The third scenario is especially interesting as a compromised javascript-based ereader addon could be leveraged into compromising the entire browsing experience and become a fantastic resource for the malware developer.

Javascript-based ereaders also have the advantage, from the attacker’s point of view, of being easily reverse-engineered (view source, web inspectors).

As far as I can tell, browsers do not have the infrastructure to let a javascript-based ereader safely run ebooks with unvetted javascript. HTML5 sandbox features are useful to make sure that whatever scripts you’ve missed don’t run or for limiting network access, but if you’re planning to support ebook javascript it’s difficult to come up with a scheme that is truly safe. Caja seems essential, even if it means incompatibility with some javascript code.
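To make the sandbox idea concrete, here is a minimal sketch of how a javascript-based ereader might frame a chapter. The helper name `sandboxedFrameMarkup` and the file names are made up for illustration; the only real pieces are the iframe `sandbox` attribute and its `allow-scripts` token. It assumes a browser that honours the attribute, and it is not a complete defence on its own.

```javascript
// Hypothetical helper for a script-based reading system: returns iframe
// markup for one chapter. An empty sandbox attribute applies every
// restriction; each token listed lifts one of them.
function sandboxedFrameMarkup(chapterUrl, options = {}) {
  const tokens = [];
  if (options.allowScripts === true) {
    tokens.push("allow-scripts");
  }
  // "allow-same-origin" is deliberately never added: together with
  // "allow-scripts" it would let same-origin book content reach the
  // reader app's own page and remove the sandbox.
  const safeUrl = String(chapterUrl)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
  return `<iframe src="${safeUrl}" sandbox="${tokens.join(" ")}"></iframe>`;
}

console.log(sandboxedFrameMarkup("chapter-01.xhtml"));
console.log(sandboxedFrameMarkup("widget.xhtml", { allowScripts: true }));
```

The default here is the strictest setting, with scripts switched on only where the reader app explicitly asks for them.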

If you’re planning on cross-platform browser support, you’re sunk, because major browsers don’t yet support the iframe sandbox attribute. And, don’t forget, there are hordes of readers who never update any of their software. Many of them even dislike updates intensely, seeing them as an intrusion that risks ruining their setup.

Even with most safeguards, due to the many security flaws of HTML, javascript, and the web in general, you’re still going to find it difficult to thwart a dedicated malware writer. (Anti-malware never wins.)

It isn’t paranoia if you have a knife in your back

The web is unsafe at any speed. The browsers themselves are fine, for the most part. It’s the web that’s the problem.

The web is open to clickjacking, DNS rebinding, history leaks, man-in-the-middle attacks, XSS, and CSRF. To top that off most websites and web apps are badly made and allow session-hijacking and fixation, SQL injection, HTTP response splitting and smuggling, and various access control bugs.

Even a major site like Gmail that has been hardened by experts and offers two-factor authentication has been exploited due to silly bugs in the weakest component, mistakes on the part of the user, and corporate incompetence (not Google’s, mind you). (Anti-malware never wins.)

These attacks and more render the same origin policy largely meaningless except as a handy way to frustrate novice web developers. Even SSL can’t be trusted due to how frequently SSL/TLS resellers are compromised, implementation issues, and how readily users click through on untrusted certificates. (The big red background on the warning isn’t enough. Anti-malware never wins.)

Web apps are unsafe. The web is unsafe. Ebooks with javascript are unsafe.

And, as it turns out, ebook readers that fully support javascript are even less safe than your average browser.

The malware ebook distribution vector

The key problem facing a malware writer, any software writer, is distribution. Malware writers are capable of considerable creativity when it comes to coming up with new ways of spreading their wares, from finely crafted spam emails to well designed software websites distributing trojans.

If javascript gets widely supported in ereaders, ebooks represent a lucrative new option for malware writers because they combine a fundamentally insecure platform (the web) with a built-in distribution vector.

It’s easy to spread malware ebooks:

A popular title gets released. It is a smash success.
The malware writer acquires a copy, either by buying and breaking the DRM or by pirating, and infects the copy with malware javascript.
They post the malware ebook onto a download site popular with ebook pirates.
Wait as people infect their own computers.

The blockbuster nature of ebooks today, combined with high prices, results in a neat, built-in, distribution vector for malware ebooks. It’s easy for the malware writer to figure out what titles are popular enough to have reach. It’s also easy to find websites that will distribute pirated books. It’s easy to find people who will download pirated books and don’t really care if they are harming other people (whether it’s harming the writer through lost revenue or because their ebook copy is part of a Denial of Service network overloading somebody’s website).

It’s all very very easy.

The damage an unconstrained malware ebook can do

Ebooks have one unique component that the web platform generally doesn’t have: time.

Most web pages, especially the less secure ones, only have a few moments, minutes at most, to do their work. But ebooks take time to read. Individual sessions can easily stretch into hours. There is a host of attacks and problems that are unfeasible on a normal website, because the attacker simply doesn’t get enough time to execute, but are feasible in an ebook. An unconstrained malware ebook becomes a resource that the malware developer can tap into.

Unconstrained javascript is a bad idea and unconstrained network access – even without javascript – is a bad idea since you can do things like port scan without javascript.

Malware ebooks are useful for Denial of Service attacks, using port scans to discover vulnerable services, complex session attacks, and more. Given time, you can do much more complex things and solve much more complex problems than you can with malware on a regular website. Like using complex tactics to bypass a website’s security countermeasures, for example.

Providing a comprehensive overview of what an unconstrained malware ebook can do is impossible. It’s the malware web, double plus ungood.

A malware ebook that has network access is a very effective tool for attacking other websites. It’s much more effective than your regular hostile website.

If we accept that unconstrained javascript in an ebook is a bad idea that opens the reader up to malware infection, what would a secure js-enabled ereader look like?

Remarkably like iBooks, as it happens, which is coincidentally the only major js-enabled ereader available.

iBooks has two different security models for javascript: EPUB and Multi-Touch Ebooks (.ibooks files).

The EPUB security model is simple: no network, no persistence. The ebook doesn’t get any network access and doesn’t get to store anything for later except in very limited ways. This very effectively prevents every single attack I can think of. They all hinge on network access and most of them are much less effective without some sort of persistent storage.

This cripples iBooks as an app platform but since Apple already has two excellent app platforms (the web and native apps) you can hardly blame them for not caring.

Their other security model is that of iBooks Author books where javascript is disabled in the main body of the book, neutralising the ebook’s time advantage, but enabled with relatively few limitations in widgets which require user interaction to activate.

Both of these mitigation tactics work very well. It’s next to impossible to implement a malware ebook that is effective in iBooks.

The risk is that as EPUB3 gets implemented, more ereader vendors will implement javascript support to enable some of the features unique to EPUB3, but fail to restrict the javascript like iBooks does. Their marketing departments might well see it as a selling point.

Unless they cripple their javascript support in some way, those ereader apps will become malware vectors.

What needs to be done?

Once you have decided to develop and maintain an app platform, the only workable strategy in the long run is a whitelist of allowed executables.

(It’s the only workable strategy for defaults. Letting experts override the defaults and install apps is pretty safe and they might even benefit from herd immunity.)

Blacklists lag. Malware development iterates much faster than a blacklist of banned apps and executables can be updated.

Heuristics result in false positives. You have the choice of either blocking a bunch of legitimate apps or letting malware through.

Whitelists work. A central authority maintains a list of allowed apps and apps not on the list won’t launch by default. Apple’s iOS app store is one such whitelist, although I prefer the Mac OS X Gatekeeper or Android model where expert users can easily run any old app at their discretion.
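The difference in defaults can be sketched in a few lines. Everything here is illustrative: the identifiers, the two lists, and the function names are invented, and a real platform would more likely check signatures or hashes than plain names.

```javascript
// Illustrative identifiers only; a real list would more likely hold
// signatures or hashes issued by the central authority.
const blockList = new Set(["known-trojan"]);
const allowList = new Set(["vetted-reader-app"]);

// Blacklist default: anything not yet known to be bad may launch.
function blockListPermits(appId) {
  return !blockList.has(appId);
}

// Whitelist default: only vetted apps launch, unless an expert user
// has explicitly chosen to override the default.
function allowListPermits(appId, expertOverride = false) {
  return allowList.has(appId) || expertOverride === true;
}

const freshMalware = "trojan-released-this-morning";
console.log(blockListPermits(freshMalware)); // true
console.log(allowListPermits(freshMalware)); // false
```

Malware that nobody has catalogued yet passes the blacklist but is stopped by the whitelist, while the `expertOverride` flag stands in for the Gatekeeper-style escape hatch.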

The problem then becomes one of maintaining a good whitelist, and of not giving large portions of your market reasons to override the whitelist and expose themselves to malware. There is a debate to be made about whether Apple and Google are doing a good job of maintaining their whitelists, but that’s an argument for another day.

Whitelists don’t work for ebooks because the only way to implement a whitelist would be to create an ereader that only runs ebooks that have DRM from the authority that maintains the whitelist. Standardised DRM wouldn’t work as the very concept of standardisation means that the whitelist can be bypassed. Giving control over the world’s knowledge and literature to a few corporate entities is much more problematic than letting an app platform vendor control their platforms. And there are simply too many ebooks out there without DRM and too many free speech issues with mandating DRM on every ebook file for that ever to work.

The only alternative for ebooks is to limit the functionality of javascript, or to leave out javascript altogether.

If you decide to limit javascript then there are things that need to be standardised.

From an EPUB perspective, the javascript security limitations come in only a few varieties:

No network access allowed.
No persistent storage allowed.
No javascript in main text (enabled only in HTML files that aren’t included in the spine or have the ‘linear’ attribute set to ‘no’).
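The third restriction could be checked roughly like this. It is a sketch under assumptions: the spine is taken to be already parsed from the package document into plain objects, and `scriptsAllowedFor` is a name I made up, not part of any reading system.

```javascript
// Hypothetical check a reading system might run before enabling scripts
// in a content document. `spine` is assumed to be the package document's
// itemref elements, already parsed into { idref, linear } objects.
function scriptsAllowedFor(itemId, spine) {
  const ref = spine.find((entry) => entry.idref === itemId);
  if (ref === undefined) {
    return true; // not referenced from the spine at all
  }
  // A missing linear attribute counts as "yes", i.e. main text.
  return ref.linear === "no";
}

const spine = [
  { idref: "chapter-1" },
  { idref: "chapter-2", linear: "yes" },
  { idref: "quiz-widget", linear: "no" },
];

console.log(scriptsAllowedFor("chapter-1", spine));   // false
console.log(scriptsAllowedFor("quiz-widget", spine)); // true
console.log(scriptsAllowedFor("popup-map", spine));   // true
```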

It’s easy for an ereader app to let the ebook know that no network access is available. There are well-supported offline APIs and events. As long as ereaders make sure that navigator.onLine returns false when network access isn’t available, ebook developers can test for network availability in their code.

Which they should be doing anyway, since ebooks are much more likely to be used offline than your average website, even if the ereader allows network access.
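On the ebook side, the test could look something like the sketch below. The helper names and the file paths are hypothetical; only `navigator.onLine` is real. Treating anything other than an explicit `true` as offline is my own conservative assumption.

```javascript
// Hypothetical helper: `nav` is whatever navigator object the ereader
// exposes to the book. Anything other than an explicit `true` is treated
// as offline, which is a conservative assumption.
function networkLooksAvailable(nav) {
  return Boolean(nav) && nav.onLine === true;
}

// Picks between a resource bundled in the ebook and a live one.
function chooseSource(nav, bundledPath, remoteUrl) {
  return networkLooksAvailable(nav) ? remoteUrl : bundledPath;
}

console.log(
  chooseSource({ onLine: false }, "data/map.json", "https://example.com/map.json")
);
// prints "data/map.json"
```

In a real book you would pass `window.navigator`, and could listen for the `online` and `offline` events on `window` to re-check. Keep in mind that a `true` value only says a network connection exists, not that any particular server is reachable.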

There is a risk, however, that the various DOM storage objects will exist in an ereader app, but fail silently, depending on how the limitation on the API is implemented. Ereader vendors should always make sure that javascript object detection works as expected in the ebook environment.
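Until vendors get that right, ebook developers can defend themselves by testing that storage actually stores something instead of only checking that the object exists. A minimal sketch, with a made-up function name and probe key:

```javascript
// Hypothetical probe: writes a value, reads it back, and cleans up.
// Returns false if the object is missing, throws, or silently drops
// the write.
function storageActuallyWorks(storage) {
  if (!storage) {
    return false;
  }
  const probeKey = "__ebook_storage_probe__";
  const probeValue = String(Date.now());
  try {
    storage.setItem(probeKey, probeValue);
    const readBack = storage.getItem(probeKey);
    storage.removeItem(probeKey);
    return readBack === probeValue;
  } catch (error) {
    return false;
  }
}

// A stand-in for an ereader whose storage exists but does nothing.
const silentlyBrokenStorage = {
  setItem() {},
  getItem() { return null; },
  removeItem() {},
};

console.log(storageActuallyWorks(silentlyBrokenStorage)); // false
```

In a browser-like environment you would pass `window.localStorage` or `window.sessionStorage`. Merely reading those properties can throw in some configurations, so the lookup itself may also need a try/catch.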

The problem lies in detecting whether the ereader app allows javascript in the main text or has relegated it to non-linear HTML files only. The iBooks EPUB and iBooks Author security models differ substantially and would require different designs—you can’t easily bridge the gap with graceful degradation or progressive enhancement.

Either EPUB reading system vendors need to agree to use just one security model, or they need to agree on a way to let the ebook developer discover which model is in use at any given time. Standardising this would be nice.

My suggestion would be to only support javascript in non-linear files (not in the spine) to begin with, mainly because my guess is that it would be easier to implement. The ereader could just have the file launch in a popover webview and not worry about integrating javascript support into their main EPUB rendering code. The vendors themselves have a better idea of what tactic is the easiest and which is the most viable. I could be wrong.

Or they can just not implement javascript support and be free of the headache.

The argument for javascript is simple: we do not know what ebook interactivity looks like. Until we do know, we cannot easily implement declarative widgets and other interactivity features that don’t resort to scripting. We simply do not know what is needed so we need javascript in ebooks.

The counter-argument, beyond that of security, implementation, and business issues, is also simple: we already have apps and websites for interactivity. If a story or text benefits from interactivity, then it isn’t an ebook. It might complement and be complemented by an ebook. But it isn’t one itself so we don’t need javascript in ebooks.

[1] And now that I’ve given away the punchline, I won’t ever be able to tell you the cheese sandwich in a birdcage story. Which is good; I’m crap at telling it, anyway.