2013/10/22

The boss issue with distributing (computational) work

Here in France (your mileage may vary elsewhere), the boss gives jobs to subordinates. That's what subordinates are for, and that's what a boss is for.

Now, as some of you might have noticed, there are a few glitches in that scenario. Most of the time, the boss cannot correctly estimate the time needed for a task on his own; and if he asks, it means hours of meetings. Not very efficient.

From the boss's point of view, there is no doubt that the task must take the estimated time: if it takes more, the worker is lazy; if it takes less, good, time to drown the guy in more work. End of story.

So far, this all looks like the usual ranting about a pyramidal work organization. But here comes the real stuff: when dealing with distributing computational workload between machines, the same approach is, amazingly, actually used. And that does not make sense. Let me explain where the trouble is.

When running computations on a set of computers (let's call it a computing farm for big data, to follow the hype), it is highly unlikely that each and every machine will have exactly the same computational power, memory, or storage.
And even if they do, there will be different hop times between machines, network congestion, and so on.
In a few words: the platform is bound to be heterogeneous to a noticeable degree.

Yet, most distributed systems fail to take that into account. There is always a grand master (or a boss, to fall back to the teamwork metaphor) which decides everything.
Not to mention that most will assume equal RAM, link speed, etc. on every node.
That model has a major flaw hidden at its root: the boss needs to know the status of each and every machine, and to estimate where to push jobs on each one. In other words, it has to do a lot of tracking, guessing, ordering, and bookkeeping of dead machines.

What about reversing the model: the workers ask for work. What happens then?
The only things the boss has to handle are a job queue and keeping track of dead tasks. Workers keep asking for jobs.
And that's it. The work is consumed at the full available computing power, with very little managerial overhead. The fine tuning boils down to finding the right slice size of work.


This model is resilient and simple. It works well with multithreading too.

So, pile up tasks, and have your workers ask for the next task when they're done.

A simple libevent HTTP GET server example

I recently tried to serve JSON queries in the "lightest" way: a simple standalone C program with as little overhead as possible (no FastCGI module, no scripting).

It happens that libevent already has all one needs to do just that: answer HTTP queries with very little overhead, on top of an efficient networking library.

The trouble is, there is some documentation, but not that much, and a few pieces are missing to put together a simple HTTP GET server.

So, here is a skeleton of a simple HTTP GET server with libevent.
It's handy for a very simple JSON-over-GET server, for instance.
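Along those lines, here is a minimal sketch using libevent 2's evhttp API. The port (8080) and the JSON body echoing the request URI are arbitrary choices for illustration; error handling is kept to the bare minimum.

```c
#include <stdio.h>
#include <event2/event.h>
#include <event2/http.h>
#include <event2/buffer.h>

/* Generic callback: reject anything but GET, answer with a small JSON body. */
static void handle_request(struct evhttp_request *req, void *arg)
{
    struct evbuffer *buf;

    if (evhttp_request_get_command(req) != EVHTTP_REQ_GET) {
        evhttp_send_error(req, HTTP_BADMETHOD, "Only GET is supported");
        return;
    }

    buf = evbuffer_new();
    if (buf == NULL) {
        evhttp_send_error(req, HTTP_INTERNAL, "Out of memory");
        return;
    }

    evhttp_add_header(evhttp_request_get_output_headers(req),
                      "Content-Type", "application/json");
    /* Echo the requested URI back as JSON, just as a placeholder payload. */
    evbuffer_add_printf(buf, "{\"path\": \"%s\"}",
                        evhttp_request_get_uri(req));
    evhttp_send_reply(req, HTTP_OK, "OK", buf);
    evbuffer_free(buf);
}

int main(void)
{
    struct event_base *base = event_base_new();
    struct evhttp *http = evhttp_new(base);

    if (evhttp_bind_socket(http, "0.0.0.0", 8080) != 0) {
        fprintf(stderr, "could not bind to port 8080\n");
        return 1;
    }
    evhttp_set_gencb(http, handle_request, NULL);

    /* Run the event loop forever; one thread handles all connections. */
    event_base_dispatch(base);

    evhttp_free(http);
    event_base_free(base);
    return 0;
}
```

Compile with something like `cc server.c -levent`, then try `curl http://localhost:8080/status`. Everything runs in one thread on one event loop, which is exactly the low-overhead setup described above.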

Just in case that's of any help.