Batching decoupled API requests

If you’re developing websites for mobile, by now you probably have a healthy hatred for HTTP requests. The growth in the number of people wanting to use the web on cellular connections has introduced developers to a new kind of network: one with high bandwidth, but very high latency and low reliability. The more HTTP requests your site requires to load, the longer your page will take to become interactive, regardless of how small they are.

We realised that to really cut down on our HTTP requests, we needed to batch very disparate requests together. Ideally, a single request should be able to fetch an image, log some analytics, and get the content of an article. Another important realisation is that we don’t need to send every request immediately. Some, like analytics, can not only be performed asynchronously, but can also be deferred until there’s an opportunity to send a request for some other reason.

The main problem with all of this is that it threatens to play havoc with both your server-side and client-side application architecture. We don’t want the need for this behaviour to affect the way we write our code, or make disparate parts of our application needlessly aware of one another.

Client side

Looking at the client side first, we solved this problem by creating an API object that can internally queue requests and intelligently marshal several different queues. The interface is accessed like this:

api.add('getStory', {'path': '/story1.html'}).done(callback1);
api.add('getStory', {'path': '/story2.html'}).done(callback2);
api.send().done(callback3);
api.add('healthcheck', params).done(callback4);
api.send().done(callback5);

First, we separate the queuing and sending of API requests into two calls. No great pain there, and it provides the application with a more flexible tool – it can attach a callback to the response from each individual getStory request, and also attach a roll-up callback to the entire queue, which is notified when all the queued requests have completed. Both add and send return deferred objects, so the callbacks are added by chaining the done method. We have our own deferred implementation, but if you want to learn about deferreds and promises, this HTML5 Rocks tutorial, which uses the jQuery Deferred implementation, is a good place to start.

Now, given this simple piece of code, you’d assume that the API object is now going to make two HTTP requests, but it doesn’t. It makes one. Why? When you call send, we set a setTimeout with a zero delay, which delays the actual send process until the next break in the execution thread. This means any further calls to send that are made in the same thread (ie. before the JavaScript engine has a chance to return to the queued setTimeout) will simply re-queue the same send.
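The zero-delay coalescing can be sketched like this, using native Promises in place of our in-house deferred implementation. The `createApi` and `transport` names are hypothetical; `transport` stands in for whatever actually performs the batched HTTP request.

```javascript
// Minimal sketch of zero-delay batching. `transport` takes an array of
// [method, params] pairs and returns a Promise of an array of responses.
function createApi(transport) {
  let queue = [];      // requests added but not yet sent
  let pending = null;  // the Promise for the currently scheduled send

  return {
    add(method, params) {
      const entry = { method, params };
      entry.promise = new Promise((resolve, reject) => {
        entry.resolve = resolve;
        entry.reject = reject;
      });
      queue.push(entry);
      return entry.promise;
    },
    send() {
      // Any further send() calls in the same execution thread simply
      // re-queue the same send: they get the same pending Promise back.
      if (pending) return pending;
      pending = new Promise((resolve) => {
        setTimeout(() => {       // zero delay: fires at the next break
          const batch = queue;   // in the execution thread
          queue = [];
          pending = null;
          transport(batch.map((e) => [e.method, e.params])).then((responses) => {
            responses.forEach((r, i) =>
              r.status === 'ok' ? batch[i].resolve(r.data) : batch[i].reject(r.detail)
            );
            resolve(responses);
          });
        }, 0);
      });
      return pending;
    },
  };
}
```

A usage pattern matching the earlier example would be two add/send pairs in the same thread, producing a single call to `transport`.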

Why not simply call send once? The example above is a bit contrived – obviously you can easily see you’re calling send twice and refactor – but in many cases that’s not possible. We might call a function in another part of our code which needs to do an API request, and decides it needs to be sent immediately. When control returns to the parent function, if we also want to send an API request there, we don’t know that one has already just been invoked. The API object takes care of rolling these up for us without the various parts of our application needing to know about one another’s API antics.

With this aggressive grouping of requests, we need to add more intelligence to ensure we don’t run into problems. First, it’s possible that an application will add so many requests to the queue that the resulting batched HTTP request will be too large (never mind the size of the potential response!), or will take too long for the server to process. So when a send is invoked on a very large queue, the API object will create a separate, special send-queue, move the main queue into it, and start sending it in manageable batches. This means optimally sized HTTP requests and a nice controlled pace that doesn’t send too many requests at once.
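A sketch of that send-queue behaviour, with hypothetical names throughout (the real batch size limit and queue internals aren’t published here):

```javascript
// Move an oversized main queue into its own send-queue, then dispatch it
// in fixed-size batches, one after another. `transport` sends one batch
// and returns a Promise of its responses.
const MAX_BATCH = 25; // illustrative limit, not the real production value

function sendInBatches(queue, transport, maxBatch = MAX_BATCH) {
  const sendQueue = queue.splice(0, queue.length); // frees the main queue
  const results = [];

  function next() {
    if (sendQueue.length === 0) return Promise.resolve(results);
    const batch = sendQueue.splice(0, maxBatch);
    // Sending each batch only after the previous one completes keeps a
    // controlled pace rather than firing every batch at once.
    return transport(batch).then((responses) => {
      results.push(...responses);
      return next();
    });
  }
  return next();
}
```

Note that the main queue is emptied synchronously, which is what lets a later send start its own independent send-queue.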

Another problem that comes up when you have a lot of requests in the queue is prioritisation. A function that adds 100 requests to the queue is going to be pretty content to wait a while for them to complete, but if a function just adds one and says it should be sent immediately, having to wait for 100 others ahead of it is going to make for a potentially painful user experience. A practical example of this is an app downloading a long list of content in the background, and then suddenly needing to load a piece of content in response to a user action. That new request is much higher priority than the background stuff, but the background stuff is already in the send-queue and is being downloaded right now.

In our implementation this is fine, because as soon as send is invoked, the transfer of the main queue into a separate send-queue frees up the main queue, and when send is invoked the next time, a second send-queue, independent of the first, is created and starts processing. The result is that the queue of one item will get processed almost immediately even though there’s a background task which is still churning through a giant queue of stuff. Some checks and balances are needed to ensure we don’t have too many send-queues being processed at the same time, but the automatic batching tends to avoid that.

So, in summary:

  • The origin of API requests within the application and what actually gets batched together are completely disconnected
  • It’s possible for two utterly decoupled parts of an application to make separate immediately-executed API calls and have them actually sent in the same HTTP request
  • It’s equally possible for an app to make lots of API calls from the same procedure but actually have them sent in multiple batches
  • Spawning separate send-queues for each send allows parallelising of urgent API requests alongside low priority ones, and prevents low priority requests swamping all network resources

One final point about the use of the API interface in the app JavaScript – there are some requests (notably analytics) which are such low priority that we really don’t care when they get sent. So for those, we can just call add, and let them sit in the queue until there’s some other part of the app that wants some response right now.

On the wire

When the API object actually comes to make a request, it sends it to a generic API URL endpoint (eg /api), wrapping the package in JSON and sending it as a single query param (or POST field). The topmost level is an array of API requests. Each request is a two-element array, the first element being the method name, and the second an object with the arguments to that method call:

[
	["getStory", {"path": "/story1.html"}],
	["getStory", {"path": "/story2.html"}],
	["healthcheck", {"lastupdate": 1234567890, "jsver": 1.76}]
]
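Packing the batch into a single query param is then just a matter of JSON-encoding and URL-escaping it. The param name `batch` below is hypothetical; the article doesn’t specify it.

```javascript
// Wrap the batch as JSON in a single query param on the generic endpoint.
function buildApiUrl(requests, endpoint = '/api') {
  return endpoint + '?batch=' + encodeURIComponent(JSON.stringify(requests));
}
```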

When the response is returned, the responses are in the same order as their respective requests. The top level is an array of responses, each of which is an object with a status key indicating the success or otherwise of that part. If it was successful, the value of the status property is ‘ok’ and there is a data property containing the response data. If not, there may be a detail property containing error information.

[
	{
		"status": "ok",
		"data": {
			"id": 545366,
			"title": "Historic treaty agreement in sight",
			"pubdate": "2012-12-01T10:04:45Z",
			...
		}
	},
	{
		"status": "error",
		"detail": "Story not found"
	},
	{
		"status": "ok",
		"data": true
	}
]

The API client object can therefore easily unpack this response, process each subresponse, and either resolve or reject each deferred object as appropriate, notifying disparate parts of our application JavaScript that their various requests have been fulfilled.
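The unpacking step might look like this sketch, where `deferreds` is an array of resolve/reject pairs held in the same order as the original requests (the function name is hypothetical):

```javascript
// Pair each subresponse with its deferred and settle it, relying on the
// guarantee that responses come back in request order.
function settleResponses(responses, deferreds) {
  responses.forEach((response, i) => {
    if (response.status === 'ok') {
      deferreds[i].resolve(response.data);
    } else {
      deferreds[i].reject(response.detail || 'unknown error');
    }
  });
}
```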

Server side

That leaves the small matter of routing this on the server. Sending API requests in this way presents a number of challenges:

  • Without conventional URL-based routing you risk creating a controller class that ‘does everything’
  • Including multiple batched requests in a single HTTP call makes it less likely that the response will be cacheable (either in the browser or in edge caches such as Varnish), and less likely that a given request will be a cache hit due to increased fragmentation

We deal with the first of these by creating a class for each API method, and a generic API controller that unpacks and routes each part of the request independently to each specific API method class. The content returned can then be aggregated by the generic API controller. Effectively we just add a packing/unpacking layer above a fairly conventional set of controller classes.
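The routing layer can be sketched as a registry of method handlers plus a generic controller that dispatches each part of the batch independently and aggregates the results. Everything here – the handler bodies, the registry shape – is illustrative, not the production code:

```javascript
// Hypothetical handler registry: one entry per API method.
const handlers = {
  getStory: (params) => {
    if (params.path === '/story1.html') return { id: 545366, title: 'Historic treaty agreement in sight' };
    throw new Error('Story not found');
  },
  healthcheck: () => true,
};

// Generic API controller: routes each subrequest, and turns per-part
// failures into per-part error responses rather than failing the batch.
function handleBatch(requests) {
  return requests.map(([method, params]) => {
    try {
      if (!handlers[method]) throw new Error('Unknown method: ' + method);
      return { status: 'ok', data: handlers[method](params) };
    } catch (err) {
      return { status: 'error', detail: err.message };
    }
  });
}
```

Keeping failure local to each subresponse is what lets one bad request in a batch return an error object while the rest succeed.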

Caching is more difficult. The value you can get from network caches like Varnish is vastly reduced by this technique, so instead, it’s important to cache each individual API response. Then a batched request might be fulfilled mostly from cache with maybe a couple of parts having to be generated. Memcached is pretty much made for this (be sure to use a parallel multi-get if your language’s memcached bindings support it, eg PHP’s getMulti function).
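Per-subrequest caching can be sketched like this, with an in-memory Map standing in for memcached and a hypothetical `generate` fallback for cache misses; the key scheme is also invented for illustration:

```javascript
// Fulfil a batch mostly from cache, generating only the misses.
// `cache` is a Map here; with real memcached you'd fetch all keys in one
// parallel multi-get (e.g. PHP's Memcached::getMulti) instead of a loop.
function cachedBatch(requests, cache, generate) {
  const keys = requests.map(([method, params]) => method + ':' + JSON.stringify(params));
  const hits = keys.map((k) => cache.get(k));
  return requests.map((request, i) => {
    if (hits[i] !== undefined) return hits[i]; // served from cache
    const response = generate(request);        // only misses are generated
    cache.set(keys[i], response);
    return response;
  });
}
```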

With this approach, and with a primed cache, we can often load and display a page with just one API request for all non-cacheable items or update checks. The Economist HTML5 app, for example, starts up with just two: a request for the appcache manifest, and an API batch. And for now, that’s as efficient as it gets.