Deciding Without the Future: An Introduction to Online Algorithms
Deciding Without the Future: An Introduction to Online Algorithms
Almost every system you have ever built makes decisions before it is allowed to know what comes next. A cache has to evict something now, without knowing which pages you will ask for in the next hour. An autoscaler adds a server now, not knowing whether the traffic spike will last two minutes or two days. You decide whether to rent or buy, whether to accept or reject, whether to keep or discard, and the rest of the sequence only reveals itself afterward, one request at a time. This is the normal condition of real software, and it is exactly the condition that classical algorithm analysis quietly ignores.
Most of algorithms is offline: the whole input sits in front of you, and you are free to look at all of it before committing to a single move. Sorting, shortest paths, dynamic programming, they all assume the data is already there. Online algorithms drop that assumption. The input arrives as a stream, you must respond to each piece immediately and irrevocably, and you are judged on the choices you made without the benefit of hindsight. The natural question, and the one this whole series is about, is disarmingly simple: how much does it cost you to not know the future?
The competitive ratio: measuring the price of no crystal ball
To answer that, we need something to compare against, and the fair yardstick is the best you could have done with perfect foresight. Call it OPT, the offline optimum: an omniscient algorithm that sees the entire sequence in advance and plays it perfectly. You will never beat OPT, because it knows everything you do not. The competitive ratio measures how close you can stay to it anyway.
An online algorithm A is c-competitive if there is a constant α such that, for every possible input sequence σ,
If α = 0, we call A strictly c-competitive. Strip away the notation and the meaning is clean: no matter what sequence arrives, A never pays more than c times what the all-knowing optimum pays, plus some fixed slack that stops mattering as the input grows. A 2-competitive algorithm, the number that shows up again and again, is one that in the worst case pays at most double what OPT pays. Never more than double, no matter how the input is arranged.
That last clause is the heart of it. The competitive ratio is a worst-case guarantee, taken over every sequence, including the most hostile one imaginable. The clean way to picture this is as a game against an adversary who knows your strategy completely and gets to choose the input specifically to hurt you. If your algorithm is c-competitive, then even that adversary, doing its absolute worst, cannot make you pay more than c times OPT. There are no assumptions about the input being random, or typical, or well-behaved. The guarantee holds against everything. That is what makes it so strong, and, as we will see, sometimes so pessimistic.
The cleanest example: renting versus buying skis
The problem that makes all of this click is ski rental. You are going skiing, but you have no idea for how many days. Renting skis costs $1 per day. Buying them outright costs $B. Each morning you must decide, rent again or buy, and once you buy you are done paying. If you knew the total number of days in advance, the choice would be trivial: ski fewer than B days and you rent the whole time, ski more and you buy on day one. That is OPT, and it needs the future to make the call.
You do not have the future. So here is the online strategy: rent until you have spent $B on rentals, and then buy. That simple rule is 2-competitive. If the season turns out short, you only ever rented, exactly matching OPT. If it turns out long, the worst case is that you rent for B - 1 days and then buy, paying about $2B, while OPT would have bought on day one for $B. Twice the optimum, never worse, and no crystal ball required. Even better, you cannot do meaningfully better: one can prove that no deterministic online algorithm for ski rental beats a ratio of 2. The break-even rule is not just good, it is optimal.
Why a ratio of 2 is worth celebrating
Coming from offline algorithms, a factor of 2 might sound like a lot to give up. In the online world it is often a triumph. For a great many problems, a small constant competitive ratio is the best achievable, and there are matching lower bounds, adversary arguments that prove no online algorithm, however clever, can do better. Ski rental's lower bound of 2 is the first of many we will meet. And when the deterministic ceiling frustrates us, a surprising escape hatch appears: randomization. An algorithm that flips coins the adversary cannot predict can sometimes beat the deterministic bound, and randomized ski rental pushes the ratio down from 2 to about 1.58. The interplay between what an adversary can force and what a little randomness can buy back is one of the most beautiful threads in the whole subject.
What this series covers
Each installment takes one classic online problem and pulls the competitive ratio out of it, usually with a matching lower bound so you see both what is achievable and what is impossible:
- Ski rental and rent-or-buy, the archetype, deterministic and randomized.
- Paging and caching, where LRU, FIFO, and their friends turn out to be k-competitive, and randomization buys an exponential improvement.
- Load balancing, where Graham's dead-simple greedy scheduler is (2 − 1/m)-competitive.
- Self-organizing lists, where move-to-front is 2-competitive, proved with a lovely potential argument that connects straight back to caching.
- Online matching, the RANKING algorithm and its 1 − 1/e ratio, the theory underneath ad auctions.
- Lower bounds and randomization, the adversary's side of the table, and Yao's principle for proving even randomized algorithms cannot do too well.
An honest word on what it does and does not tell you
Competitive analysis is worst-case and adversarial, and that is both its strength and its blind spot. It gives you a guarantee with zero assumptions, which is priceless when you genuinely do not know the input distribution. But the adversary it imagines is often far nastier than reality. LRU is only k-competitive on paper, yet it is beloved in practice because real access patterns are nothing like the worst case. That gap has spawned a whole refinement toolkit, resource augmentation, distributional and beyond-worst-case models, that we will touch on when the pure ratio feels too gloomy. Treat the competitive ratio as the first question to ask about an online decision, not the last.
The thread running through everything ahead is the one worth holding onto: the competitive ratio is the price of uncertainty, the measurable cost of having to act before the future arrives. Learn to see it, and you start designing systems that stay graceful against whatever an adversary, or an unpredictable world, decides to throw at them next.