Sorting the strings in 20GB file with one string in each line.

Problem

Imagine you have a 20 GB file with one string per line. Explain how you would sort strings in the file.

This is quite similar to "K way merge problem" except for the fact that we first need to bring 20 GB file to the K partially sorted files to perform the K way merge.

Here is how we do it.

1. Split 20 GB files into K files 

Since we have limited memory, we first need to split 20 GB files into K sub-arrays, each of which contains M integers that can fit into the memory available.

2. Sort K files individually, $O(K \times M \log M)$

Now, we can sort each chunk with size of M either through quick sort or merge sort. In this case, in place Quick Sort will be more efficient. Running time of this step will be $O(K \times M \log M)$, were K is the number of sub-arrays and M is the number of integers in each sub-array.

3. Perform the K way merge, $O(N \log K)$

Now, we have K partially sorted array of integers, which can be merged through K way merge.
The details algorithm of K way merge can be found in "K way merge problem"
Running time of K way merge will be $O(N \log K)$, where N is the number of integers in 20 GB file and K is the number of partially sorted array.

Overall time complexity of this sorting algorithm will be $$O(K \times M \log M)$$.


Comments

Popular posts from this blog

Planting flowers with no adjacent flower plots

Stock price processing problem