Finding k largest elements - revisit
2014.08.22
Jingchao Hu
Caigen100 Corp.
The Problem
- Find the largest number from 10000 numbers?
- Find 2 largest numbers from 10000 numbers?
- Find 100 largest numbers from 10000 numbers?
- Find k largest numbers from N numbers?
What's the best time complexity for it?
O(Nlogk) ?
No, it's O(N)
Method 1: sort
- Sort all the N elements -> O(NlogN)
- Pick the last k elements -> O(k)
- Total cost O(NlogN)
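As a minimal sketch of Method 1 in Python (the function name `k_largest_by_sorting` is only illustrative, not from the original slides):

```python
def k_largest_by_sorting(nums, k):
    """Return the k largest values in ascending order by sorting everything."""
    k = max(0, min(k, len(nums)))        # clamp k into [0, N]
    ordered = sorted(nums)               # O(N log N)
    return ordered[len(ordered) - k:]    # the last k elements, O(k)
```

For example, `k_largest_by_sorting([5, 1, 9, 3, 7], 2)` yields `[7, 9]`.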
Method 2: Temporary array
- Store the first k elements in a temporary array temp[0..k-1].
- For each element x in arr[k] to arr[n-1]
- Find the smallest element in temp[], let the smallest element be min -> O(k)
- If x is greater than the min then remove min from temp[] and insert x.
- Print final k elements of temp[]
- Total cost: O((n-k)*k)
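The steps of Method 2 could be sketched as follows. The helper name `k_largest_temp_array` is hypothetical, and the sketch returns the temporary array instead of printing it:

```python
def k_largest_temp_array(nums, k):
    """Keep a k-sized temp array; replace its minimum whenever a larger x arrives."""
    if k <= 0:
        return []
    temp = list(nums[:k])                # first k elements
    for x in nums[k:]:
        # Linear scan for the position of the smallest element in temp -> O(k)
        min_idx = min(range(len(temp)), key=temp.__getitem__)
        if x > temp[min_idx]:
            temp[min_idx] = x            # remove min, insert x
    return temp                          # the k largest, in no particular order
```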
Method 3: using heap
- Pick first k elements, build a minheap -> O(klogk)
- For each remaining element x (the (k+1)-th through the n-th)
- heap push x -> O(logk)
- heap pop minimum -> O(logk)
- The final heap contains the k largest elements
- Total Cost:
- O(klogk + (n-k)logk)
- O(Nlogk)
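A rough sketch of Method 3 using Python's standard `heapq` module, which implements a min-heap. The function name `k_largest_min_heap` is an assumption for illustration; `heapq.heappushpop` combines the push and the pop-minimum steps listed above into one call:

```python
import heapq

def k_largest_min_heap(nums, k):
    """Maintain a min-heap of size k holding the k largest values seen so far."""
    if k <= 0:
        return []
    heap = list(nums[:k])
    heapq.heapify(heap)                  # build a min-heap from the first k elements
    for x in nums[k:]:
        heapq.heappushpop(heap, x)       # push x, then pop the minimum -> O(log k)
    return heap                          # the k largest, in heap order
```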
Method 3 seems good enough!
Candidates usually need a lot of tips and advice before they actually come to this conclusion
However, it's still not the best answer
Revisit Heap
- Structure: complete binary tree
- Insert: O(logN)
- Pop: O(logN)
- buildheap by inserting N times = O(NlogN)
- buildheap bottom-up = O(N)
Why buildheap is O(N)
- The first n/2 elements go on the bottom row of the heap. h=0, so heapify is not needed.
- The next n/4 elements go on the row 1 up from the bottom. h=1, heapify filters 1 level down.
- In general, n/2^(i+1) elements go in row i up from the bottom. h=i, heapify filters i levels down.
- The last 1 element (the root) goes in row log(n) up from the bottom. h=log(n), heapify filters log(n) levels down.
- Total work: O(n*(1/4+2/8+3/16+...+i/2^(i+1)+...+log(n)/n))
- = O(n), because the series i/2^(i+1) sums to 1
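To make the bound concrete, here is a small sketch of a bottom-up buildheap that counts how many swaps it performs. The name `build_max_heap` and the swap counter are illustrative additions, not part of the original slides:

```python
def build_max_heap(a):
    """Heapify list a in place, bottom-up; return the number of swaps performed."""
    n = len(a)
    swaps = 0
    # Leaves (the last n/2 elements) need no work; start at the last internal node.
    for i in range(n // 2 - 1, -1, -1):
        j = i
        while True:
            left, right, largest = 2 * j + 1, 2 * j + 2, j
            if left < n and a[left] > a[largest]:
                largest = left
            if right < n and a[right] > a[largest]:
                largest = right
            if largest == j:
                break                    # heap property holds at j
            a[j], a[largest] = a[largest], a[j]
            swaps += 1
            j = largest                  # keep filtering down
    return swaps
```

Each node is swapped at most as many times as its height, so the total number of swaps stays below n even for an ascending input, where every internal node has to move.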
Method 3*: using heap*
- Build a maxheap of all -> O(N)
- Heappop for k times -> O(klogN)
- Total Cost: O(N+klogN)
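A possible sketch of Method 3* in Python. Since `heapq` only provides a min-heap, this version negates the values to simulate a max-heap, which assumes numeric input; the name `k_largest_max_heap` is hypothetical:

```python
import heapq

def k_largest_max_heap(nums, k):
    """Heapify all N values, then pop the maximum k times."""
    heap = [-x for x in nums]            # negate so the min-heap acts as a max-heap
    heapq.heapify(heap)                  # O(N)
    count = max(0, min(k, len(heap)))
    # Each pop costs O(log N); results come out largest first.
    return [-heapq.heappop(heap) for _ in range(count)]
```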
Anything better than that?
Method 4: Quick Select
- Quick select - Find the k-th largest element in an array in O(N) on average
- Choose a pivot, divide the array into (left, right) partitions, where left < pivot <= right
- If length of right > k, do the partition again in right
- Otherwise, right holds the len(right) largest elements; then find the (k-len(right))-th largest element in left
- Along the way, quickselect not only locates the k-th element, but also gathers the k largest elements.
- Time Complexity: O(N+N/2+N/4+...+1) on average; a run of bad pivots degrades it to O(N^2)
- => O(N)
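One way to sketch the idea in Python is shown below. It deviates from the slide in two labeled ways: it picks the pivot at random, and it splits into three groups (less than, equal to, greater than the pivot) so that duplicates cannot stall the loop. It also copies sublists instead of partitioning in place, which keeps the code short at the cost of extra memory. The name `k_largest_quickselect` is illustrative:

```python
import random

def k_largest_quickselect(nums, k):
    """Return the k largest values (unordered) in expected O(N) time."""
    items = list(nums)
    k = max(0, min(k, len(items)))
    result = []
    while k > 0:
        pivot = random.choice(items)
        left = [x for x in items if x < pivot]
        mid = [x for x in items if x == pivot]
        right = [x for x in items if x > pivot]
        if len(right) > k:
            items = right                # the answer lies entirely inside right
        elif len(right) + len(mid) >= k:
            # right plus enough copies of the pivot completes the answer
            result += right + mid[:k - len(right)]
            k = 0
        else:
            # everything >= pivot belongs to the answer; keep searching in left
            result += right + mid
            k -= len(right) + len(mid)
            items = left
    return result
```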
Lessons learned
- Our memory can be wrong: buildheap is cheaper than I remembered
- Our intuition can be wrong: it seems to us a linear solution is too good to be true, so we stopped at O(Nlogk)
- We are not as good as we think: keep learning, you always have room to improve
- Interviews are good, thinking is good.