prefer-online? Not so much

The standards process can be really frustrating, especially for developers when a problem is ‘fixed’ in a way that’s unacceptable. We end up continuing to use crazy hacks and workarounds, and inventing new ones, rather than using something that was intended as a precise solution to that very problem. I fear this is what we’re about to see with the prefer-online flag that was just added to appcache.

In recent months much has been made of the limitations of the HTML5 app cache, especially certain common use cases that are badly served by the current behaviour. Tobie Langel highlighted some of these scenarios in the fixing appcache community group. The one that caught my eye was:

A blog engine (e.g. WordPress) can have a very basic offline mode that caches its index page and the _n_ entries listed on it for offline use. This wouldn’t modify its behavior when online, so that visiting the index page would always display the last entries and not need a page refresh to do so.

You want the very latest available content to be displayed when a user visits your site: if they’re online, that’s the latest content published, and if they’re offline, it’s the most recent content that was available when they were last online. Conceptually this does not seem very hard. The problem is this: in order to make a page available offline you must list it in your manifest, and by doing so, due to the current behaviour of the app cache, it will always be served from the cache, regardless of whether the user is online or not.

Many possible solutions to this exist. The one that has made it into the spec is to add this to the cache manifest file format:

SETTINGS:
prefer-online

This seems a simple solution to the problem. If we’re offline, we get the cached resource, and if we’re online, we get the live one. Hurrah.
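To make the difference concrete, here is a rough TypeScript model of where a manifest-listed page comes from, with and without the flag. It is only a sketch of the behaviour as described in this post, not the spec’s algorithm, and the names sourceFor and LookupInput are made up for illustration.

```typescript
// Simplified model of how a request for a manifest-listed URL is answered.
// Illustration only: the names here are hypothetical, not part of any browser API.
type Source = "appcache" | "network";

interface LookupInput {
  online: boolean;       // can the network be reached right now?
  preferOnline: boolean; // is prefer-online present in the SETTINGS section?
}

function sourceFor({ online, preferOnline }: LookupInput): Source {
  // Current behaviour: a listed resource always comes from the app cache,
  // even when the user is online.
  if (!preferOnline) {
    return "appcache";
  }
  // With prefer-online: use the live copy when the network is available,
  // and the cached copy otherwise.
  return online ? "network" : "appcache";
}
```

Calling sourceFor({ online: true, preferOnline: false }) gives "appcache", which is exactly the stale-index-page problem from the blog engine use case; switching preferOnline to true gives "network" instead.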

Oh wait. Doesn’t that mean that for every single request made while offline, we have to wait for a network connection timeout before we are able to get it from the cache?! Possibly. What about resources that we know are not going to change? The spec isn’t terribly clear on whether prefer-online makes the browser attempt a network connection for all types of cached resource (though from Ian Hickson’s definition of the feature it appears the answer is yes; see the second bullet point under ‘New Proposal’).

The reality isn’t necessarily quite that bad, since even if the browser does go to the network first in every case, that request might get served from the ordinary browser cache, if the resource was originally served with a positive cache TTL. But the ordinary browser cache has no persistence guarantees, so we can’t rely on it having anything in it at all (it’s true that the app cache does not have a persistence guarantee either, but in practice it is far less volatile).

This problem was raised by Bug #14702 (Always up to date web applications). A number of solutions were proposed, including the prefer-online one Hickson added to the spec. But many contributors called for something completely different, and posted increasingly detailed use cases explaining why. I agree with them.

A better solution

Rather than suffering network latency on every request, we should be able to be much smarter about this. We are trying to be responsible developers, creating websites that retain the URL metaphor for addressing unique pieces of content. Therefore, there will be many (millions?) unique URLs describing different pieces of content or distinct views of content on our website. We cannot feasibly put all of these in the app cache, so we use the FALLBACK section to define a single generic client-side router page that can generate the appropriate view by reference to data stored elsewhere (localStorage, IndexedDB etc). Since the home page of the website is no special case in this regard, it could easily be generated by the fallback router. Ideally, therefore, we would simply not include any content pages in the manifest at all.
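As a rough idea of what such a fallback router does, here is a minimal TypeScript sketch. It assumes the manifest’s FALLBACK section points every URL at one router page, and it uses a plain Map as a stand-in for localStorage or IndexedDB. The Article shape and the function names are hypothetical.

```typescript
// Stand-in for localStorage / IndexedDB: URL path -> data saved while online.
// The Article shape and all names below are hypothetical.
type Article = { title: string; body: string };
type Store = Map<string, Article>;

// Escape text before placing it into markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the view for whatever path the fallback page was loaded for.
// The home page ("/") is handled like any other path.
function renderRoute(path: string, store: Store): string {
  const article = store.get(path);
  if (article === undefined) {
    // Nothing was stored for this URL while the user was online.
    return "<h1>Not available offline</h1>";
  }
  return `<h1>${escapeHtml(article.title)}</h1><p>${escapeHtml(article.body)}</p>`;
}
```

In a real page the path would come from the address bar and the result would be written into the document; both are left out here so the routing idea stays visible.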

Unfortunately, we’re forced to cache our content pages, because by adding the manifest attribute to a page’s html element, we mark that page as a master entry, which will get cached regardless of whether it appears in the cache manifest.

So, why not add a flag that instructs the app cache not to cache master entries? Then we can avoid adding any additional complex and performance-sapping logic to cache behaviour while offline: every page request would attempt one network request and, on failure, load the fallback, while all other resources would load from the cache without touching the network.

The prefer-online solution added a new SETTINGS section to the cache manifest language, so rather than using that section for the prefer-online flag, we should be considering something like no-master-cache.
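The cost difference while offline can be put into a few lines of TypeScript. This is only a back-of-the-envelope model: it assumes prefer-online really does apply to every cached resource (which, as noted above, the spec leaves unclear), and no-master-cache is my proposal, not anything in the spec. The function name is made up.

```typescript
// The two strategies compared in this post.
// "no-master-cache" is a proposal, not a real manifest setting.
type Strategy = "prefer-online" | "no-master-cache";

// Network attempts made for ONE page view while offline,
// where the page pulls in `subresources` assets listed in the manifest.
function offlineNetworkAttempts(strategy: Strategy, subresources: number): number {
  if (strategy === "prefer-online") {
    // Assumption: the page and every cached asset each try the network first.
    return 1 + subresources;
  }
  // Proposal: only the page request tries the network, then loads the fallback;
  // the assets come straight from the app cache.
  return 1;
}
```

For a page with 20 cached assets, that is 21 attempts that may each have to fail before anything is shown under prefer-online, against a single attempt under no-master-cache.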

Still, as I write, prefer-online is a fait accompli and it seems unlikely that there will be any move to change it. I think this is a shame.