Tuesday, March 5, 2013

Heaps

A heap is a data structure that contains objects which have comparable keys. Some uses for a heap include: keeping track of employee records where the social security number is the key, network edges, events where there is a priority, etc.

There are two basic operations for a heap.
(1) Insert - add a new object to the heap
(2) Extract-min - remove an object in the heap with a minimum key value (with ties broken arbitrarily). You could always makes this an extract-max operation or negate every key and you will get the maximum when you extract-min.

The running time for both operations is O(logn) where n is the number of objects in the heap.

Another operation is heapify, which inserts a batch of objects into the heap. This seems like it should run in O(nlogn) but it can actually be done in O(n). Deletion can also be done in O(logn) time.

Applications
(1) One application is for heap sort. In selection sort, you compute the minimum multiple times, resulting in O(n2) running time. However, you can always heapify the set and then extract-min n times to get the elements in order. The running time for this algorithm is O(nlogn).

(2) A heap can also be called a priority queue. It can be used to schedule events and to check which event is going to occur next, using the timestamp as key. This makes it much faster than keeping an unordered list and finding the max each time.

(3) Thirdly we can use a heap for "median maintenance." You can given a sequence x1, ... xn and at each time step i, you need to give the median of {x1, ..., xi}. Use O(log i) time at each step i. We can solve this by maintaining two heaps, Hlow and Hhigh to keep track of the smallest half of the elements and the largest half of the elements respectively. Hlow supports extract-max while Hhigh supports extract-min. The key idea is to maintain the invariant that about i/2 smallest (largest) elements in Hlow (Hhigh).

When you add a new element, if it is smallest than max element in Hlow then add it in there. Otherwise, add to Hhigh. What if there is an imbalance? That is, if Hlow has two more elements than Hhigh? Then you simply extra the max of Hlow and add to the other heap.

So how do we compute the median? It is the max of the Hlow or the min of Hhigh. If i is even, then they are both medians. Otherwise, the median is just from the heap that has one more element than the other.

No comments:

Post a Comment