Man: Let's first refresh what is median. The median is the middle element in a sorted list. Given a list of numbers, if we sort them, the median will be the middle element. If we have an even number of elements, we would simply average the middle two elements.
The median maintenance problem is also called the continuous median problem. You are given numbers one by one, and at each point, you have to find the new median of all the numbers you have seen so far. A nice solution would be to track the numbers internally as a sorted array and at each new addition, put the new number in its rightful place. Let's code this up.
We create a class that will maintain the sorted array as data. Then within our add method, we will insert each new item into its sorted position within this data array. The implementation of our add method is very similar to the implementation of the add methods in min and max maintenance algorithms. We simply loop through all the existing items and compare the item against the items at the given index.
Finally, once we have the rightful place, we would return the new median. Within the median method, we simply check if the number of items is even. If so, we simply return the average of the two middle elements.
If the number of items is odd, then we only need to return the middle element. Since the add method needs to potentially loop through all the existing elements, it takes o of n time. We can do better than that. The trick is similar to what we did for the minimum and maximum maintenance algorithms.
This time, we use two heaps instead of one, one to maintain the lower set of elements and one heap to maintain the larger set of elements. The peak method of the smaller max heap will give us the n by twoth smallest element, and peeking into the root of the big items min heap will give us the n by twoth largest element. The median is simply the value among these two roots.
Let's code it up. We start by bringing in our heap data structure that we have written before. You will create a class to maintain these two heaps. We initialize the two heaps, one low max heap for the smaller items and one high min heap for the larger items.
Now we create our add method which will maintain these two heaps. It takes a new item. If there are no items in the low heap or the item is smaller than its root, then it belongs in the low heap. Otherwise, we add the item to the high heap.
The number of items in the two heaps might be imbalanced. We simply determine which of the heaps is bigger if any. If the difference of size between the two heaps is greater than one, then we can simply balance them by extracting from the larger heap and putting the item into the smaller heap.
Finally, we calculate the new median. If the heaps are of equal size, it means we have an even number of elements and the median is simply the average of the roots of the two heaps. Otherwise, it is the root of the heap that has more elements. That's it.
Since the only complex operations in the add method are not direct calls to our heap's add and extract root methods, we can do this in overflow and time. Let's test our algorithm with a quick example. We start off by creating this new data structure, add a few items, and at each point, log out the returned median.
We expect the median to be four, then between four and six to be five, and then between three, four, and six to be four. As you can see, it works as expected.