# On the German Tank / Taxicab Problem

I always get giddy when I can apply real statistics and math to problems in my life. Recently, I had an opportunity to apply the ‘Taxicab Problem’ to something that came up at work. Given that I work for a ridesharing platform and I was quite literally counting “taxis” (or at least cars meant to drive others around), this was doubly exquisite.

For the uninitiated, the Taxicab / Germany Tank problem is as follows:

*Viewing a city from the train, you see a taxi numbered x. Assuming taxicabs are consecutively numbered, how many taxicabs are in the city?*

This was also applied to counting German tanks in World War II to know when/if to attack. Statstical methods ended up being accurate within a few tanks (on a scale of 200-300) while “intelligence” (unintelligence) operations overestimated numbers about 6-7x. Read the full details on Wikipedia here (and donate while you’re over there).

Somebody suggested using $2 * \bar{x}$ where $\bar{x}$ is the mean of the current observations. This made me feel all sorts of weird inside but I couldn’t quite put my finger on what was immediately wrong with this (outside of the glaringly obvious, touched upon shortly). On one hand, it makes some sense intuitively. If the distribution of the numbers you see is random, you can be somewhat sure that the mean of your subset is approxiately the mean of the true distribution. Then you simply multiply by 2 to get the max since the distribution is uniform. However, this doesn’t take into account that $2 * \bar{x}$ might be lower than $m$ where $m$ is the max number you’ve seen so far. An easy fix to this method is to simply use $max(m, 2 * \bar{x})$.

Read on →