Guessing the word from a dictionary [reviewed]
Problem
Assuming you're playing one game that you need guess a word from a dictionary. You're given a machine you can try to guess the word, the machine will return how many characters has been matched by your guess. Design a system to crack the word.
Brute-force approach O(N×N!)
My first approach back in 2016 was to find the length and characters of the secret words by using the repeated words such as "aaaaaaaaaaaaaaaaaaaaaaaaaaa" with the target word.
Longest word in the dictionary is 45 characters-long. So, putting 100 of 'a' and comparing with the tar get word will tell us whether 'a' exists in the target word and how many 'a' is in the word if presents.
For example,
target word is 'beautiful', comparing with "aaaaaaaaaaaaaaaaaaaaaaaaaaa" will return 1.
By comparing from repeated 'a' to 'z', we can get the length of target word and list of alphabet characters.
Previous solution used these information to produce possible permutations and match the words.
This approach takes too long. Especially for 45 chars-longs words. (5.3829999e+57 times of comparison).
Here is the complete code running in O(N×N!)
Longest word in the dictionary is 45 characters-long. So, putting 100 of 'a' and comparing with the tar get word will tell us whether 'a' exists in the target word and how many 'a' is in the word if presents.
For example,
target word is 'beautiful', comparing with "aaaaaaaaaaaaaaaaaaaaaaaaaaa" will return 1.
By comparing from repeated 'a' to 'z', we can get the length of target word and list of alphabet characters.
Previous solution used these information to produce possible permutations and match the words.
This approach takes too long. Especially for 45 chars-longs words. (5.3829999e+57 times of comparison).
Here is the complete code running in O(N×N!)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#include<iostream> | |
#include<vector> | |
#include<map> | |
#include<string> | |
using namespace std; | |
class Tester { | |
public: | |
Tester(string target) :_guessWord(target) | |
{ | |
memset(cmap, 0, 512 * sizeof(int)); | |
create_map(_guessWord); | |
} | |
int CountMatched(string guess) | |
{ | |
int count = 0; | |
for (int i = 0; i < guess.length(); i++) | |
{ | |
count += (int)cmap[guess[i]]; | |
} | |
return count; | |
} | |
bool IsAnswer(string guess) | |
{ | |
return guess == _guessWord; | |
} | |
private: | |
void create_map(string target) | |
{ | |
for (int i = 0; i< target.length(); i++) | |
{ | |
cmap[target[i]]++; | |
} | |
} | |
string _guessWord; | |
int cmap[512]; | |
}; | |
string permutate_and_find(Tester& t, int len, vector<char>& set, char* freqenency, string cur) | |
{ | |
if (cur.length() == len) | |
{ | |
return (t.IsAnswer(cur)) ? cur : ""; | |
} | |
for (int i = 0; i < set.size(); i++) | |
{ | |
char c = set[i]; | |
if ((int)freqenency[c]>0) | |
{ | |
cur.push_back(c); | |
freqenency[c]--; | |
string found = permutate_and_find(t, len, set, freqenency, cur); | |
if (found.length()) | |
return found; | |
cur.pop_back(); | |
freqenency[c]++; | |
} | |
} | |
return ""; | |
} | |
string crack_by_permutation(Tester& t, int len, vector<char>& set, char* freqenency) | |
{ | |
string answer = permutate_and_find(t, len, set, freqenency, ""); | |
return answer; | |
} | |
string crackPassword(Tester & tester) | |
{ | |
int len = 0; | |
string answer; | |
vector<char> subset; | |
char freqenency[512]; | |
memset(freqenency, 0, 512 * sizeof(char)); | |
for (char i = 'a'; i <= 'z'; i++) | |
{ | |
string tmp; | |
tmp.push_back(i); | |
int count = tester.CountMatched(tmp); | |
if (count) | |
{ | |
subset.push_back(i); | |
len += count; | |
freqenency[i]=count; | |
} | |
} | |
//now we have set of chars and the length | |
//1. permutation K*K^l, where K is # of chars, l is the length of the word | |
answer = crack_by_permutation(tester, len, subset, freqenency); | |
/* 2. using the dictionary words. | |
- filter the words with length O(N) | |
- match the words one by one O(N) | |
*/ | |
return answer; | |
} | |
int main() | |
{ | |
string word = "beautiful"; | |
Tester t(word); | |
cout << " answer = " << crackPassword(t) << endl; | |
return 1; | |
} |
Practice statistics
40:00 to write up the code and debug the logical flaw
40:00 to write up the code and debug the logical flaw
More advanced approach O(N)
This approach still takes advantage of previous solution finding the length of the target word and the characters consisting of it.
Given that we found the length of the word, N and set of chars, CHARS.
With that information, it takes the following steps:
- create empty array, Result with length of N filled with None
- iterate for characters in CHARS, for each char, C and position, I
- create a guess word, G filled with C where Result[i] is None.
- compare G with the target word and store the matching char count, M
- increment I by 1 to get the next char C′.
- for each positions, J in N:
- replace G[J] with C′
- compare G again and store a new matching char count, M′.
- if M′ > M, C′ is in the right position J. Set Result[i] to C′
- if M′ < M, old value, C was in the correct position.
- Set Result[J] to C and G[J] to C (restore old value)
Time complexity
In the algorithm above, the outer for loop runs up to 26 times in the worst case and inner for loop runs N times, where N is the length of the target word.
Therefore, running time is close to O(N).
Here is the solution in python.
Practice statistics
60:07: did not come up with the complete algorithm
26:39: to write up the complete code
15:43: fixing the bugs. There was one logic bug. alphabet string had a duplicate characters. (- -)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class GuessWord: | |
def __init__(self): | |
self.alphabet = 'abcdefghijklmnopqrstuvwxyz' | |
def guess(self, target): | |
self.target = target | |
found = self.find_target_length() | |
length = found[0] | |
target_chars = found[1] | |
result = [None for i in range(length)] | |
pos = 0 | |
while pos < len(target_chars): | |
c = target_chars[pos] | |
guess = self.fill_guess_word(result, c) | |
matched = self.check_guess(guess) | |
pos += 1 | |
next_char = target_chars[pos] | |
for i in range(length): | |
old_char = guess[i] | |
next_guess = self.replace_char(guess, next_char, i) | |
new_matched = self.check_guess(next_guess) | |
if new_matched > matched: | |
#new char is in a correct position | |
result[i] = next_char | |
elif new_matched < matched: | |
#old char was in the correct postion | |
result[i] = old_char | |
next_guess = self.replace_char(guess, old_char, i) | |
pos += 1 | |
return "".join(result) | |
def fill_guess_word(self, partial_answer, c): | |
guess="" | |
for i in range(len(partial_answer)): | |
guess += c if partial_answer[i] == None else partial_answer[i] | |
return guess | |
def check_guess(self, guess): | |
count = 0 | |
for i in range(len(self.target)): | |
if guess[i] == self.target[i]: | |
count += 1 | |
return count | |
def is_same (self, guess): | |
return self.target == guess | |
def find_target_length(self): | |
length = 0 | |
target_chars = {} | |
MAX_LENGTH = 100 | |
for c in self.alphabet: | |
guess = c * MAX_LENGTH | |
count = self.check_guess(guess) | |
if count > 0: | |
target_chars[c] = count | |
length += count | |
return [length, target_chars.keys()] | |
def replace_char(self, string, c, pos): | |
return string[:pos] + c + string[pos+1:] | |
guesser = GuessWord() | |
input ="beautiful" | |
print("input = {} output = {}".format(input, guesser.guess(input))) | |
input ='pneumonoultramicroscopicsilicovolcanoconiosis' | |
print("input = {} output = {}".format(input, guesser.guess(input))) |
Practice statistics
60:07: did not come up with the complete algorithm
26:39: to write up the complete code
15:43: fixing the bugs. There was one logic bug. alphabet string had a duplicate characters. (- -)
Comments
Post a Comment