Produce the non-repeating string from a given string [updated]

Problem  

Rearrange characters in a string so that no character repeats twice.

  Input: abcccddaaabc

  output: cacacbadcdab


Finding the pattern


First all, given string "abcccddaaabc", we need to understand that we first need to find the longest repeat string, which are "aaaa" or "cccc".

After that we need to put shorter repeated characters between longest string, so that we can produce the non-repeated string.

One more thing we need to keep in mind is that "longest repeated" string is the string is not used to produce the resultant string yet.

This means we need to find new longest repeated string each time. Look at the example in Figure 1.
Figure 1: Sample algorithm
In iteration 3, the longest repeated strings changes to 'b' from 'a'.
This means we need to keep track of the current longest repeated string in each iteration, which is quite expensive.

Here is the rough algorithm.

1. Sort the input string in ascending order (descending order is ok as well.) - $O(N \log N)$

2. Count the occurrence of each character and put { char, occurrence count} pair into Max Heap, which put the node with largest occurrence count on the time of the tree.

3. Take two nodes from heap and put two characters into the result string avoiding the repetition.

4. Decrease the occurrence counts of two nodes by 1.

4. put back the nodes if their occurrence count is still positive.

Step 2 - 3 will take $O(N \log N)$, $O(N)$ for iterating over the Max Heap, and $O(\log N)$ for extracting the max node and putting back the nodes.

Overall running time is $O(N \log N)$, where N is the length of the input string.

Here is the complete code.



Practice statistics

13:00 to think up the algorithm
14:12 to hand-wrote the solution.
14:12: to write the solution in the IDE.
13:05: to write up the max heap code. (got the bubbleUp() method wrong).

UPDATE(2019-07-20): solved again and first attempt used the hashtable to record the occurrence but
 tried to reconstruct the solution in round robin fashion regardless of each character's repetition count.

Problem of this approach is that it is possible that shorter chars can be left last instead of the longest char.

In the second attempt, I used the MaxHeap to keep the {char: repeat count} in descending order.
this approach ensures the longer chars are filled with shorter chars thereby we will end up with the longer chars left last.

Here is python solution:

First attempt:

51:50: failed to come up with a correct and working solution.

Second attempt:

Used a MaxHeap and took 30 mins to write the working code. 

Comments

Popular posts from this blog

Planting flowers with no adjacent flower plots

Stock price processing problem