Finding k largest elements - revisit

2014.08.22

Jingchao Hu

Caigen100 Corp.

The Problem

  • Find the largest number among 10,000 numbers?
  • Find the 2 largest numbers among 10,000 numbers?
  • Find the 100 largest numbers among 10,000 numbers?
  • Find the k largest numbers among N numbers?

What's the best time complexity for it?

O(Nlogk)?

No, it's O(N)

Method 1: sort

  • Sort all N elements in ascending order -> O(NlogN)
  • Pick the last k elements -> O(k)
  • Total cost: O(NlogN)
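Method 1 fits in a few lines of Python. This is a minimal sketch; the function name `k_largest_sort` is hypothetical. The `k <= 0` guard matters because `sorted(arr)[-0:]` would return the whole list rather than nothing.

```python
def k_largest_sort(arr, k):
    """Method 1: sort everything, keep the tail. O(N log N) time."""
    if k <= 0:
        return []  # guard: a slice of [-0:] would return the whole list
    return sorted(arr)[-k:]  # ascending sort, so the last k are the largest

print(k_largest_sort([3, 1, 4, 1, 5, 9, 2, 6], 3))  # [5, 6, 9]
```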

Method 2: Temporary array

  • Store the first k elements in a temporary array temp[0..k-1].
  • For each element x in arr[k] to arr[n-1]:
    • Find the smallest element in temp[]; call it min -> O(k)
    • If x is greater than min, remove min from temp[] and insert x.
  • Print the final k elements of temp[].
  • Total cost: O((n-k)*k)
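A sketch of Method 2, assuming `arr` is a Python list; `k_largest_temp_array` is a hypothetical name. Replacing min in place is equivalent to "remove min, insert x", since temp[] is unordered anyway.

```python
def k_largest_temp_array(arr, k):
    """Method 2: keep k candidates in a plain list. O((n-k)*k) time."""
    if k <= 0:
        return []
    temp = list(arr[:k])  # temp[0..k-1] holds the first k elements
    for x in arr[k:]:
        # Linear O(k) scan for the index of the smallest candidate
        i_min = min(range(len(temp)), key=temp.__getitem__)
        if x > temp[i_min]:
            temp[i_min] = x  # drop min, keep x in its slot
    return temp  # the k largest, unordered

print(sorted(k_largest_temp_array([3, 1, 4, 1, 5, 9, 2, 6], 3)))  # [5, 6, 9]
```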

Method 3: using heap

  • Pick the first k elements and build a min-heap -> O(klogk)
  • For each remaining element x (elements k+1 to n):
    • heap push x -> O(logk)
    • heap pop the minimum -> O(logk)
  • The final heap contains the k largest elements
  • Total Cost:
    • O(klogk + (n-k)logk)
    • O(Nlogk)
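Method 3 can be sketched with Python's standard `heapq` module, which implements a min-heap on a plain list. `heapq.heappushpop` performs the "push x, then pop the minimum" pair from the slide in a single O(logk) call. The function name is hypothetical; the standard library also offers `heapq.nlargest(k, iterable)`, which returns the same elements sorted in descending order.

```python
import heapq

def k_largest_minheap(arr, k):
    """Method 3: min-heap holding the k largest seen so far. O(n log k) time."""
    if k <= 0:
        return []
    heap = list(arr[:k])
    heapq.heapify(heap)  # min-heap of the first k elements; heap[0] is the smallest
    for x in arr[k:]:
        # Push x, then pop the minimum, as one O(log k) operation
        heapq.heappushpop(heap, x)
    return heap  # the k largest, in heap order (not sorted)

print(sorted(k_largest_minheap([3, 1, 4, 1, 5, 9, 2, 6], 3)))  # [5, 6, 9]
```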

Method 3 seems good enough!

Candidates usually need many hints before they actually reach this conclusion


However, it's still not the best answer

Revisit Heap

  • Structure: complete binary tree
  • Insert: O(logN)
  • Pop: O(logN)
  • Build heap by inserting N times: O(NlogN)
  • Build heap bottom-up (heapify): O(N)
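To see the gap between the two build strategies concretely, here is a sketch that builds the same max-heap both ways and counts swaps. All function names are hypothetical. The input is ascending, which forces every insert to climb all the way to the root; the bottom-up build sifts each node down at most its height, so its swap count stays below N.

```python
def build_by_insertion(arr):
    """Max-heap built by N sift-up inserts; returns (heap, swap_count)."""
    heap, swaps = [], 0
    for x in arr:
        heap.append(x)
        i = len(heap) - 1
        while i > 0:  # sift the new element up while it beats its parent
            parent = (i - 1) // 2
            if heap[parent] >= heap[i]:
                break
            heap[parent], heap[i] = heap[i], heap[parent]
            swaps += 1
            i = parent
    return heap, swaps

def build_bottom_up(arr):
    """Max-heap built by sifting down every internal node, last to first."""
    heap, swaps = list(arr), 0
    n = len(heap)
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:  # sift down: swap with the larger child until in place
            left, right, largest = 2 * i + 1, 2 * i + 2, i
            if left < n and heap[left] > heap[largest]:
                largest = left
            if right < n and heap[right] > heap[largest]:
                largest = right
            if largest == i:
                break
            heap[i], heap[largest] = heap[largest], heap[i]
            swaps += 1
            i = largest
    return heap, swaps

def is_max_heap(h):
    return all(h[(i - 1) // 2] >= h[i] for i in range(1, len(h)))

data = list(range(1024))  # ascending input
print(build_by_insertion(data)[1], build_bottom_up(data)[1])
```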

Why is buildheap O(N)?

  • Row 0: about n/2 elements sit on the bottom row of the heap. h=0, so no heapify is needed.

  • Row 1: the next n/4 elements sit one row up from the bottom. h=1, heapify sifts each down at most 1 level.

  • Row i: the next n/2^(i+1) elements sit i rows up from the bottom. h=i, heapify sifts each down at most i levels.

  • Row log(n): the last element is the root. h=log(n), heapify sifts it down at most log(n) levels.

  • Total work: O(n*(1/4 + 2/8 + 3/16 + ... + i/2^(i+1) + ...))
    = O(n), because the series 1/4 + 2/8 + 3/16 + ... converges to 1
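The series step can be made explicit. Treating row i as holding about n/2^(i+1) nodes (n/2 on the bottom row, n/4 on the next), each sifting down at most i levels, the standard identity for the sum of i·x^i gives the bound directly:

```latex
\begin{aligned}
\text{work} &\le \sum_{i=0}^{\log n} \frac{n}{2^{i+1}}\, i
  \;\le\; \frac{n}{2} \sum_{i=0}^{\infty} \frac{i}{2^{i}} \\
\sum_{i=0}^{\infty} i x^{i} &= \frac{x}{(1-x)^{2}} \quad (|x|<1)
  \;\Rightarrow\; \sum_{i=0}^{\infty} \frac{i}{2^{i}} = \frac{1/2}{(1/2)^{2}} = 2 \\
\text{work} &\le \frac{n}{2} \cdot 2 = n = O(n)
\end{aligned}
```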

Method 3*: using heap, revisited

  • Build a max-heap of all N elements -> O(N)
  • Pop k times -> O(klogN)
  • Total cost: O(N+klogN)
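A sketch of Method 3* with `heapq`. Since `heapq` only provides a min-heap, this assumes numeric inputs and stores negated values to get max-heap behavior; `heapq.heapify` does the O(N) bottom-up build. The function name is hypothetical. A side benefit of this design is that the k results come out already sorted in descending order.

```python
import heapq

def k_largest_heapify_all(arr, k):
    """Method 3*: heapify all N elements, then pop k times. O(N + k log N)."""
    # heapq is a min-heap, so store negated values to act as a max-heap
    heap = [-x for x in arr]
    heapq.heapify(heap)  # bottom-up build: O(N)
    k = max(0, min(k, len(heap)))
    # Each pop removes the current maximum in O(log N)
    return [-heapq.heappop(heap) for _ in range(k)]

print(k_largest_heapify_all([3, 1, 4, 1, 5, 9, 2, 6], 3))  # [9, 6, 5]
```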

Anything better than that?

Method 4: Quick Select

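  • Quickselect: find the k-th largest element in expected O(N) time
    • Choose a pivot and partition the array into (left, right), where left < pivot <= right
    • If right has more than k elements, partition again within right
    • Otherwise, right holds the len(right) largest elements; then find the (k-len(right))-th largest element in left
  • Along the way, quickselect not only finds the k-th largest element but also gathers the k largest elements (unordered) on one side of it.
  • Time complexity: O(N+N/2+N/4+...+1) on average, when each pivot roughly halves the range
  • => O(N) expected; a consistently bad pivot degrades this to O(N^2), while median-of-medians pivot selection guarantees O(N) in the worst case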
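A list-based sketch of Method 4 (not the in-place version). Two assumptions differ from the slide: the pivot is chosen at random, and the partition is three-way (greater / equal / less) instead of left < pivot <= right, because a two-way split never shrinks when every element equals the pivot. The function name is hypothetical.

```python
import random

def k_largest_quickselect(arr, k):
    """Method 4: quickselect-based top-k. Expected O(N) time; result is unordered."""
    k = min(k, len(arr))
    result = []             # elements already known to be among the k largest
    candidates = list(arr)  # the range still being partitioned
    while k > 0:
        pivot = random.choice(candidates)  # random pivot: expected linear time
        greater = [x for x in candidates if x > pivot]
        equal = [x for x in candidates if x == pivot]
        if len(greater) >= k:
            # All remaining answers lie strictly above the pivot
            candidates = greater
        elif len(greater) + len(equal) >= k:
            # The k-th largest equals the pivot: take greater plus enough copies
            result += greater + equal[: k - len(greater)]
            k = 0
        else:
            # Everything >= pivot is in the answer; continue in the lower part
            result += greater + equal
            k -= len(greater) + len(equal)
            candidates = [x for x in candidates if x < pivot]
    return result

print(sorted(k_largest_quickselect([3, 1, 4, 1, 5, 9, 2, 6], 3)))  # [5, 6, 9]
```

Each round drops the pivot value from `candidates`, so the loop always terminates; the expected cost per round shrinks geometrically, matching the O(N+N/2+N/4+...) estimate.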

Lessons learned

  • Our memory can be wrong: building a heap is cheaper than I remembered
  • Our intuition can be wrong: a linear solution seemed too good to be true, so we stopped at O(Nlogk)
  • We are not as good as we think: keep learning, there is always room to improve
  • Interviews are good; thinking is good.
