Published 6 years ago

Updated 2 years ago

Heap is a data structure that can fundamentally change the performance of fairly common algorithms in Computer Science.

The heap data structure is called a heap because it satisfies the heap property. The heap property states, that if P is a parent node of C, then the value of node P is less than the value of node C.

In this lesson we discuss the key operations of the Heap data structure along with their performance. We then implement the data structure using TypeScript / JavaScript.

[00:00] Here we have a graph of nodes A, B, C, D, E. It is nice to have a mental model of a heap as a complete binary tree. The tree would satisfy the heap property if A is less than its children, B and C. Similarly, B is less than its children, D and E.

[00:22] Note that if the heap property is satisfied for direct children, it is automatically satisfied for any indirect children. For example, A is less than D, and A is less than E. If the heap property is satisfied by each parent, it implies that the smallest item in the tree has to be the root node. The heap data structure is called a heap because it has this heap property.

[00:48] Also worth mentioning is the fact that since it is a complete binary tree, the maximum number of levels will be of the order log N, where N is the number of items in the tree.

[01:01] Note that a given set of values can be arranged in multiple ways to satisfy the heap property. For example, given the numbers 4, 4, 5, 6, 9, we can have the following two trees. Both are perfectly valid, as each node is smaller than all of its children.

[01:20] The only position that is guaranteed to be the same among all possible heaps with a given set of values is the root node. The root will always be the item with the smallest value.

[01:33] Even though it's nice to have a mental model of a heap as a complete binary tree with pointers, they are normally implemented as an array. An array provides very low memory overhead, and the left and right child nodes can be calculated quite easily.

[01:50] Consider this simple array of indexes 0to 13. It can be conceptualized as a tree with indexes corresponding to the items of the array.

[02:00] When we are done with the left and right nodes at any given level, we start a new level, and consider these indexes as the left and right of the items of the previous level. We stop whenever we run out of items in the array.

[02:16] Given a node at index N, let's try and figure out a formula to find the index for its left child node. As always, it's a great idea to write down a few examples so we can use these to test our formula.

[02:30] The left node of the zero to index is one. The left node of the first index is three. The left node of the third index is seven.

[02:41] The key realization for the formula is that for an item at index N, we will essentially have to skip N spaces on its right, and one more to arrive at its left node. We arrive at the formula for left of N as simply 2n + 1. Our examples are all satisfied by this formula.

[03:09] Now the right child node is simply one plus the left node, so its formula would be 2n + 2. Not only can we traverse the children of such a binary tree by simple math, we can also traverse the parent of a given index simply by going back from our left and right formulas.

[03:31] We start off with different parent formulas for left and right children. From 2n + 1, we can see that the left node indexes must be an odd number, and 2n + 2 means that right node indexes must be an even number.

[03:50] At a given index, if it is even, we use the parent right formula. Otherwise, we use the parent left formula. With these left, right, and parent equations, we can easily traverse this array-backed binary tree without needing to have pointers.

[04:07] This usage of an array as the backing storage is one of the reasons why heaps are extremely popular. Furthermore, pointer traversal can be done with simple math, which can be done very efficiently with bit-shifting breaks for powers of two.

[04:23] We now have enough context to jump into the implementation of the heap's excellent O(log n) add and extract root methods. These are the raison d'être of the heap data structure.

[04:38] To allow us to compare two items of type T, we need a compare function. This compare function will follow the same semantics as offered by the built-in JavaScript arrays sort method.

[04:54] We start off by creating a generic class for a heap for items of type T. We create a backing data storage as an array for items of type T. We have a constructor that accepts a compare function for items of type T, and then we go ahead and write a private node through our cell functions.

[05:20] One for the left child node, one for the right child node, and finally, a traversal to the parent from a given node index which has different implementation depending on the index being even or odd, as we discussed before.

[05:38] Now let's go ahead and create our add method that adds a given element into the heap in O(log n). It takes a new item of type T, and we just go ahead and add the item to the end of the array.

[05:56] As soon as we do that, there may be a violation of the heap property between this new node and its parent, so we will go ahead and add a sift-up operation that makes sure that the smallest node rises to the top of the heap again.

[06:13] Within the sift-up operation, we simply load the parent, and in each attrition, we check if the item at the given index is smaller than its parent. We simply swap the item with its parent, restoring the heap property.

[06:36] In the next attrition, we check this new item against its parent. The loop terminates when the item is no longer violating the heap property with respect to its parent, or release the top, in which case the item becomes the new root.

[06:51] Since we only traverse the depth of the tree with the sift-up operation, its run time complexity will be log N, and therefore, the add operation will also have a run time complexity of log N. The other key operation of the heap is the extract root method, which retrieves and removes the root element of the heap in log N.

[07:16] The root element will be the smallest element in the heap, forum in heap. Of course, we can only remove an item if there are any items. Therefore, the method might return undefined.

[07:28] We check if there are any items in the array. If there are, we fetch the root. If we remove this root item, there will be a hole in the head of the internal array. The best way to really fill this hole is with the last item in the array. That would result in the minimum number of array index changes.

[07:49] Now if the array still contains any items, we move this last item to the head of the array, and then, just like add, we might have an inconsistency that the new root which we blindly fetched from the end of the array might be larger than one of its children.

[08:06] This would violate the heap property. We fix this inconsistency by moving it down using a sift-down routine. Finally, we would return the root.

[08:18] Now let's create the sift-down routine. The main objective is to move the biggest value of the triangle parent left, right, down to remove a heap violation. We will use a utility main index method, which simply returns the index that contains the smaller value among a given left, right.

[08:43] If the right child index is out of range, we check if the left child index is still in range. If not, we return minus one to signal the end. Otherwise, we return the left child index. Else, we simply compare the left and the right child index values and return whichever one is smaller.

[09:10] We kick off by comparing the left and the right nodes of the input index. At each point, we switch the current node with the smaller of its two children. This ensures that the new parent will be the smallest in the current parent left, right triangle.

[09:34] We simply continue until the heap violation is removed. That is, the parent is smaller than its children, or we reach the leaves, in which case there are no children, and therefore, no more heap violation.

[09:46] Again, the maximum amount of work will be equal to the depth of the tree, which is of the order log N. As a final exercise, we can add a size method like we do for many of our other data structures. In this case, we can simply return the length of the underlying array.

[10:05] Another useful method to have is peak, which, instead of removing the item, only looks at its value. We can do that quite easily by simply returning the item at the zero to index. Now let's look at an example of this data structure in action. We go ahead and create a heap of numbers, passing in an ascending order of comparison function.

[10:32] We add in a few numbers in some random order. At this point, the size of the heap should be three, and if you go ahead and run the demo, it works as expected.

[10:47] Now we can extract the root value one by one from the main heap. We simply go ahead and call the extract root method a few times, and the numbers come out in ascending order. If you run the demo, it works as expected.

[11:10] Worth mentioning is the fact that a heap can easily be changed into a max heap by simply changing the order of occupants in the comparison function. Now when we extract the root, it should come out in descending order.

[11:26] There are quite a few use cases where this O(log n) insertion and extract can greatly improve the efficiency of simple programming challenges. Basically, whenever you see an algorithm requiring repeated minimum or maximum computations, consider using a heap.