Finding the minimum sub-string
Problem
Given the input string str1 and the seed string str 2, write the program to find the minimum sub-string of str1 that is inclusive of str 2.
Example)
Case 1) str1 = "abbcbcba" and str2 = "abc", answer = "cba"
Case 2) str1 = "caaababc" and str2 = "abc", answer = "abc"
Similiarity
This question is very similar to the "finding the minimum sub-sequence given three strings" except for the additional string that should not be included.Brute-force recursive approach
Inefficient but simplest way is to search from the beginning of the input string, str1 and try the all the possible permutation which contains str2. Algorithm will be like as below:
1. create the map containing the letters in str2 along with its occurrence count
2. for i to str1.len then ==> O(N)
2-1. try all substrings starting from str1[i] and find the sub string contains str2. ==>O(N)
2-2. check if found string is the smaller then the previously found string
3. return the smallest substring.
Running time of this approach will be O(N2)
Pervious algorithm does get the results but it is not efficient. It examines the same substring multiple times.
Better way is what I call, moving window approach. See the following illustration.
In this algorithm, the window start with the one character and increase each iteration. In doing so, the count of matching letters, matching count is also counted. If the matching count becomes 3 as in the last iteration in Figure 3. We take the sub string from s to i and store it.
After that, the algorithm reduces the window from the left as long as the matching count is 3. If matching count become 2 as in Figure 2, the window start growing to the right again following the same rules in Figure 1 until the next substring is found.
Each time new substring is found, only shortest sub string is stored or ignored otherwise.
In this algorithm, we reduce the previous letters after finding the candidate substring, which removes the chances of examining the same substrings as in the brute force approach.
Running time of this approach will be O(N).
Here is complete code of DFS approach
1. create the map containing the letters in str2 along with its occurrence count
2. for i to str1.len then ==> O(N)
2-1. try all substrings starting from str1[i] and find the sub string contains str2. ==>O(N)
2-2. check if found string is the smaller then the previously found string
3. return the smallest substring.
Running time of this approach will be O(N2)
Better approach with moving window
Pervious algorithm does get the results but it is not efficient. It examines the same substring multiple times.
Better way is what I call, moving window approach. See the following illustration.
![]() |
Figure 1. Algorithm illustration. |
After that, the algorithm reduces the window from the left as long as the matching count is 3. If matching count become 2 as in Figure 2, the window start growing to the right again following the same rules in Figure 1 until the next substring is found.
Each time new substring is found, only shortest sub string is stored or ignored otherwise.
In this algorithm, we reduce the previous letters after finding the candidate substring, which removes the chances of examining the same substrings as in the brute force approach.
Running time of this approach will be O(N).
Here is complete code of DFS approach
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#include<string> | |
#include<map> | |
#include<vector> | |
#include<iostream> | |
using namespace std; | |
void dfs(string& input, int pos, string cur, map<char, int>&hashtable, string &found) | |
{ | |
if (hashtable.size() == 0) | |
{ | |
found = cur; | |
return; | |
} | |
if (pos == input.length()) | |
{ | |
return; | |
} | |
int c = input[pos]; | |
if (hashtable.find(c) != hashtable.end()) | |
{ | |
hashtable[c]--; | |
if (hashtable[c] == 0) | |
hashtable.erase(c); | |
} | |
cur.push_back(c); | |
dfs(input, pos+1, cur, hashtable, found); | |
} | |
void search(string & input, int pos, map<char, int> hashtable, string& found) | |
{ | |
dfs(input, pos, "", hashtable, found); | |
} | |
string find_min_str(string input, string seed) | |
{ | |
string min; | |
map<char, int> hashtable; | |
for (int i = 0; i < seed.length(); i++) | |
{ | |
if (hashtable.find(seed[i])!= hashtable.end()) | |
hashtable[seed[i]]++; | |
else | |
hashtable[seed[i]]=1; | |
} | |
for (int i = 0; i < input.length(); i++) | |
{ | |
string found; | |
search(input, i, hashtable, found); | |
if(found.length() == 0) | |
continue; | |
if (min.length()==0) | |
min = found; | |
else if(min.length() > found.length()) | |
min = found; | |
} | |
return min; | |
} | |
int main() | |
{ | |
string input = "abbcbcba"; | |
string input2 = "caaababc"; | |
string s ="abc"; | |
cout << "found = " <<find_min_str(input, s)<<endl; | |
cout << "found = " <<find_min_str(input2, s)<<endl; | |
return 1; | |
} |
Here is the complete code of moving window approach
Python version of moving window scanning
Python version of brute-force search
Practice statistics
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#include<map> | |
#include<string> | |
#include<set> | |
#include<iostream> | |
using namespace std; | |
string find_min_str(string input, string s) | |
{ | |
set<char> seed; | |
map<char, int> charMap; | |
int i =0; | |
int start = 0, count = 0; | |
int min_len = INT_MAX; | |
string minstr; | |
for(i = 0; i < s.length(); i++) | |
{ | |
if (seed.find(s[i])== seed.end()) | |
seed.insert(s[i]); | |
} | |
for (i = 0; i < input.length(); i++) | |
{ | |
char cur = input[i]; | |
if (seed.find(cur) != seed.end()) | |
{ | |
if (charMap.find(cur) != charMap.end()) | |
{ | |
if (charMap[cur] == 0) | |
count++; | |
charMap[cur]++; | |
} else { | |
count++; | |
charMap[cur] = 1; | |
} | |
} | |
while(count == seed.size()) | |
{ | |
int c = input[start]; | |
if (charMap.find(c) != charMap.end()) | |
{ | |
if (charMap[c] == 1) | |
count--; | |
charMap[c]--; | |
} | |
if (i-start+1 < min_len) | |
{ | |
minstr = input.substr(start, i-start+1); | |
min_len = i - start+1; | |
} | |
start++; | |
} | |
} | |
return minstr; | |
} | |
int main() | |
{ | |
string input = "abbcbcba"; | |
string input2 = "caaababc"; | |
string s ="abc"; | |
cout << "found = " <<find_min_str(input, s)<<endl; | |
cout << "found = " <<find_min_str(input2, s)<<endl; | |
return 1; | |
} |
Python version of moving window scanning
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
''' | |
Given the input string str1 and the seed string str 2, write the program to find the minimum sub-string of str1 that is inclusive of str 2. | |
Example) | |
Case 1) str1 = "abbcbcba" and str2 = "abc", answer = "cba" | |
Case 2) str1 = "caaababc" and str2 = "abc", answer = "abc" | |
''' | |
class FindMinimumString: | |
def findMinimum(self, longer, shorter): | |
str_map = { c : 0 for c in shorter } | |
score = 0 | |
minstr = longer | |
start = end = 0 | |
len_longer = len(longer) | |
while end < len_longer: | |
#add next string | |
if longer[end] in str_map: | |
score = score + 1 if str_map[longer[end]] == 0 else score | |
str_map[longer[end]] += 1 | |
#check if we can trim left | |
while longer[start] not in str_map or str_map[longer[start]] > 1: | |
if longer[start] in str_map and str_map[longer[start]] > 1: | |
str_map[longer[start]] -= 1 | |
start += 1 | |
if score == 3 and (end - start +1) < len(minstr): | |
minstr = longer[start: end + 1] | |
end += 1 | |
return minstr | |
longer = "abbcbcba" | |
shorter = "abc" | |
search = FindMinimumString() | |
print ("longer: %s shorter: %s , answer = %s" % (longer, shorter, search.findMinimum(longer, shorter))) | |
longer = "caaababc" | |
shorter = "abc" | |
search = FindMinimumString() | |
print ("longer: %s shorter: %s , answer = %s" % (longer, shorter, search.findMinimum(longer, shorter))) | |
longer = "rcdabcdf" | |
shorter = "abc" | |
search = FindMinimumString() | |
print ("longer: %s shorter: %s , answer = %s" % (longer, shorter, search.findMinimum(longer, shorter))) |
Python version of brute-force search
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
''' | |
Given the input string str1 and the seed string str 2, write the program to find the minimum sub-string of str1 that is inclusive of str 2. | |
Example) | |
Case 1) str1 = "abbcbcba" and str2 = "abc", answer = "cba" | |
Case 2) str1 = "caaababc" and str2 = "abc", answer = "abc" | |
''' | |
class FindMinimumString: | |
def findMinimum(self, longer, shorter): | |
str_map = { c : 0 for c in shorter } | |
score = 0 | |
minstr = longer | |
start = end = 0 | |
len_longer = len(longer) | |
while end < len_longer: | |
#add next string | |
if longer[end] in str_map: | |
score = score + 1 if str_map[longer[end]] == 0 else score | |
str_map[longer[end]] += 1 | |
#check if we can trim left | |
while longer[start] not in str_map or str_map[longer[start]] > 1: | |
if longer[start] in str_map and str_map[longer[start]] > 1: | |
str_map[longer[start]] -= 1 | |
start += 1 | |
if score == 3 and (end - start +1) < len(minstr): | |
minstr = longer[start: end + 1] | |
end += 1 | |
return minstr | |
longer = "abbcbcba" | |
shorter = "abc" | |
search = FindMinimumString() | |
print ("longer: %s shorter: %s , answer = %s" % (longer, shorter, search.findMinimum(longer, shorter))) | |
longer = "caaababc" | |
shorter = "abc" | |
search = FindMinimumString() | |
print ("longer: %s shorter: %s , answer = %s" % (longer, shorter, search.findMinimum(longer, shorter))) | |
longer = "rcdabcdf" | |
shorter = "abc" | |
search = FindMinimumString() | |
print ("longer: %s shorter: %s , answer = %s" % (longer, shorter, search.findMinimum(longer, shorter))) |
Practice statistics
33:00 to write up the DFS approach
20:00 to write up the moving window approach
Comments
Post a Comment