Analysing the COVID-19 “Binary Search” testing algorithm via Monte Carlo method (Part 1)

Vineet
5 min read · Apr 4, 2020


A few days ago, a tweet popped up in my feed. Here it is:

As a Computer Science undergraduate, this piqued my interest. How much of an improvement would this algorithm be? Testing N people the normal way, one by one, takes O(N) time and N testing kits. Clearly, the worst case for the above algorithm is also O(N): if all the patients are positive, no amount of pooling helps.
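To make "the above algorithm" concrete, here is a quick sketch of how I read the tweet's idea: pool a group's samples and test the pool with one kit; a negative clears the whole group, a positive means we split the group in half and recurse on both halves. The helper names and the 2% infection rate in the demo are my own choices, not the tweet's.

```python
import random

def pooled_test(group, infected):
    """One testing kit applied to a pooled sample: positive iff anyone in the group is infected."""
    return any(person in infected for person in group)

def find_positives(group, infected, counter):
    """Recursively narrow down the positives, counting kits used along the way."""
    counter["kits"] += 1
    if not pooled_test(group, infected):
        return []                        # one kit cleared the whole group
    if len(group) == 1:
        return list(group)               # isolated a single positive person
    mid = len(group) // 2
    return (find_positives(group[:mid], infected, counter)
            + find_positives(group[mid:], infected, counter))

# Demo: 1000 people, 2% actually infected (an arbitrary rate I picked)
people = list(range(1000))
infected = set(random.sample(people, 20))
counter = {"kits": 0}
found = find_positives(people, infected, counter)
print(sorted(found) == sorted(infected), counter["kits"], "kits vs", len(people), "individual tests")
```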

However, given that only a percentage of the tested population is actually infected, say X%, what kind of improvement does this algorithm bring? The tweet claims 10x, which is pretty interesting. Since we are in probability land now, we will have to calculate the expected running time E[T(N)] rather than a fixed T(N), because T(N) is itself a random variable that depends on the population being tested.

So how do we calculate this expectation, and other interesting facts about the distribution of T(N)? As a lazy person who wants to avoid putting pen to paper, I propose a simple system for estimating the expected complexity class of a probabilistic algorithm using Monte Carlo simulations, and we'll use it to analyse how the given algorithm performs for a given X.

Monte Carlo Simulation, explained with Minecraft mining strategies

muh diamonds

Lately I've been playing a lot of Minecraft. I was desperately looking for better mining strategies so I could get more diamonds and make progress in the game. The world in Minecraft is procedurally generated, which means it's sort of, kind of, random. The blocks generated are random variables, that is to say.

So now I thought: well, I would like to know the expected number of diamond blocks I would find at a given depth, so I can mine only at that level and not waste time in low-diamond areas. I am not a hacker, I have no access to the code generating the worlds, and I have anti-Google syndrome. What do I do? How do I find the best depth for mining?

So here's an idea. Make Minecraft generate many worlds. For each world, at a given depth, say level 18, we record the number of diamond blocks generated at that depth. Now we take the average across all worlds, and by the law of large numbers, that sample mean gives us an estimate of the expected number of diamond blocks at that depth.

Now do that for each depth, and we have a function Diamond(D), with depth as a parameter!
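A minimal sketch of that recipe, assuming we could call the world generator from code. We can't, so `count_diamonds` below is a made-up stand-in for "count the diamond blocks at this depth in a freshly generated world" (its fake distribution, peaking around depth 12, is purely my assumption); only the averaging logic is the point.

```python
import random

def count_diamonds(world_seed, depth):
    """Stand-in for "how many diamond blocks does this world have at this depth?".
    We have no API into Minecraft's generator, so this fake just lets the sketch run."""
    rng = random.Random(world_seed * 1_000_003 + depth)
    rate = max(0.0, 3.0 - 0.2 * abs(depth - 12))          # made-up expected diamonds per slice
    return sum(1 for _ in range(64) if rng.random() < rate / 64)

def estimate_diamonds(depth, n_worlds=2_000):
    """Monte Carlo estimate of E[diamonds at `depth`]: average over many generated worlds."""
    counts = [count_diamonds(random.getrandbits(32), depth) for _ in range(n_worlds)]
    return sum(counts) / len(counts)

# Diamond(D), tabulated for the depths we care about
diamond = {d: estimate_diamonds(d) for d in range(1, 31)}
print(max(diamond, key=diamond.get))                      # depth with the highest estimated mean
```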

Sidenote: Mean is not all that matters

Being a wise miner, I would want to mine at the depth where the expectation of diamond blocks is highest, right? But wait. There’s way more to this story than you would realise.

Would you mine a depth with mean 10 and variance 6 or a depth with mean 8 but variance 1? The first one is more profitable on average, but it involves more risk too. The second is safer, but less profitable on average. Any more, and we would enter risk management territory.

Monte Carlo is an expensive place

But there are issues here. How many worlds do we need before the sample averages actually converge to the expected diamond counts? That's a lot of compute we're going to burn generating worlds. And what if the world generator doesn't even behave the same way from world to world? Questions, questions. We'll see how this pans out later on.

The actual Monte Carlo

Monte Carlo, in more ways than one, is an expensive place. In fact, the method is named after the Monte Carlo Casino in Monaco. Good trivia.

But anyway, what we pulled off here is called going to Monte Carlo. Let's make it concrete. We have a function that lets us generate a sample from the distribution we care about. We have no information about how it works internally, but we can draw samples at will. We generate samples, take sample statistics, analyse as required, and stay home.
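The whole recipe fits in a few lines. Here `sample` stands for whatever black-box generator we happen to have; the Gaussian in the demo is just a stand-in, since in real life we wouldn't know what's inside the box.

```python
import random
import statistics

def monte_carlo(sample, n=100_000):
    """Draw n values from the black-box `sample` function and report sample statistics."""
    draws = [sample() for _ in range(n)]
    return statistics.mean(draws), statistics.variance(draws)

print(monte_carlo(lambda: random.gauss(10, 2)))   # ≈ (10, 4) for this particular black box
```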

Another example of Monte Carlo

A square with a closed loop (circle in this case) inside it. Find the area.

Let's say I give you a unit square (side length 1, so unit area), and inside that square is a closed loop. I want you to find the area of that loop. I will only provide you with one function, nothing else. Call it bool sampler(). Sampler picks a point from the square (uniformly at random, mind you) and gives you true if that point is inside the closed loop, false if not.
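For concreteness, here's what such a sampler could look like if the closed loop happened to be a circle of radius 0.4 centred in the square (my choice of shape and radius, not part of the puzzle). What you do with its true/false answers is still the exercise.

```python
import random

def sampler():
    """Pick a uniformly random point in the unit square and report whether it
    lands inside the closed loop (here: a circle of radius 0.4 at (0.5, 0.5))."""
    x, y = random.random(), random.random()
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.4 ** 2
```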

How would you find the area enclosed by the closed loop? Left as an exercise to the reader 😉.

Designing a system for complexity class inference of probabilistic algorithms

So, to analyse the average complexity of the COVID-19 “binary search” algorithm above, we would like a system that takes a function and its input space as inputs, and spits out a graph or complexity class for the T(N) function.

We apply the same idea. We generate random samples from the given input space, run the algorithm on them, record the number of costly operations used, and then visualise the distribution we obtain.
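A sketch of the harness, under the assumption that the algorithm under test reports its own count of costly operations (that's my instrumentation convention, nothing deeper); we'll point it at an actual algorithm in a moment.

```python
import statistics

def estimate_cost(algorithm, make_input, n, trials=500):
    """Monte Carlo estimate of E[T(n)] and its spread.
    `make_input` draws one random input of size n from the input space;
    `algorithm` runs on it and returns how many costly operations it performed."""
    costs = [algorithm(make_input(n)) for _ in range(trials)]
    return statistics.mean(costs), statistics.stdev(costs)
```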

Now we should probably pick an easy starting function so that we know the system actually works before using it on the “binary search” algorithm. So we start with actual binary search 😄

A simple start

Notebook Link

I’ve made a starter notebook for this system, and it seems to work well on vanilla binary search, giving a very pretty logarithmic curve when run with a huge number of samples.
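For a self-contained flavour of what that experiment looks like (counting comparisons in a plain binary search is my choice of “costly operation”, not necessarily the notebook's exact setup):

```python
import random
import statistics

def binary_search(xs, target):
    """Ordinary binary search over a sorted list, returning how many comparisons it made."""
    lo, hi, comparisons = 0, len(xs) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if xs[mid] == target:
            return comparisons
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return comparisons

def mean_cost(n, trials=1_000):
    """Estimate E[T(n)] by searching for uniformly random targets in a sorted list of size n."""
    xs = list(range(n))
    return statistics.mean(binary_search(xs, random.randrange(n)) for _ in range(trials))

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(n, mean_cost(n))   # grows roughly like log2(n), hence the logarithmic-looking curve
```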

muh logarithms

Update! Part 2 is out!

