Wednesday, February 13, 2013

Quicksort

Quicksort is a sorting algorithm that runs on average O(nlogn) and does in-place sorting. This saves space because it does not require a temporarily array to store values like in MergeSort.

The key concept behind Quicksort is the partitioning around a pivot. Choose one of the elements in the array as a pivot. Then rearrange the array so that everything less than the pivot is to its left and everything greater is to its right. Note that this rearrangement does not mean the entire array has to be sorted after this one partition. However, the pivot does end up in the "right" spot. That is, the pivot will not be moved anymore because it is already in the spot where it will be in the sorted array.

Partitioning can be done in linear time without allocating any extra memory and reduces the problem size.

QuickSort(Array a, length n)

  • if n = 1 return
  • p = ChoosePivot(a, n)
  • Parition a around p
  • QuickSort(first part)
  • QuickSort(second part)
Partitioning
The easy to partition the array would be to allocate a temporary array using O(n) space. This will still partition around the pivot in O(n) time. Assuming the pivot is the first element in the array, you traverse the array and keep a partitioned part and an unpartitioned part. You use one pointer to traverse the array and another to point to the first element in the partitioned part larger than the pivot. When you find an element smaller than the pivot, you simply swap it with the pointer's element. When you are finished, swap the element one before the pointer with the pivot to finish partitioning.

Finding the Optimal Pivot
So far, we have used the first element in the array as the pivot. What if the array were already sorted, or reverse sorted? Then partitioning would leave the pivot either in the same spot or in the back. Then we'd recursively call QuickSort on the (n-1) length subarray. We do this over and over until the array is sorted. This takes O(n2) time.

What would be a better way? If we want to balance the two subarrays so that they are about equal in length, then we would want to find the median. Thus, if we are lucky enough to be able to get the median as the pivot every time, then the algorithm would be much like MergeSort without the extra memory allocation, running in O(nlogn).

What if finding the median is too hard? What other choices do we have. Well, it is established that if we are able to split the subarrays just 25%/75%, then we will still have O(nlogn) running time. If we had 100 elements, we just have to pick an element that is anywhere from 25th smallest to 75th smallest. This is half the total elements, so we have 50% chance of picking this pivot.

As a side note, this is an instance of adding randomness to our algorithm. Randomness is actually really useful in many applications because it makes things easier, more elegant, and efficient, but more on this later.

In conclusion, we have established that a randomized QuickSort has an average running time of O(nlogn), a blazingly fast algorithm without the extra space allocation.

No comments:

Post a Comment