Write serialize and deserialize functions for an array of strings
Problem
Write two functions to serialize and deserialize an array of strings. The strings can contain any Unicode character.
Do not worry about string overflow.
input = ['abdcd', '4agasd-dsfafdas', 'hi there I love you']
output = serialize(input)
deserialize(output) = ['abdcd', '4agasd-dsfafdas', 'hi there I love you']
Basically, you need to decide how to encode the serialized message so that you can deserialize it later.
For simplicity, I decided to encode the messages as below:
[meta_data_length]>[meta_data with ',' delimiter][concatenated strings]
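For simplicity, I use a special delimiter to separate meta_data_length from the rest: '-' in the original code and '>' in the updated code. We could avoid the delimiter entirely by fixing the first length field to a fixed size, such as 64 bits. Again, for simplicity, the original code assumes the delimiter is not used in the data.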
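The fixed-size length field mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions that are not part of the original solution: it works on UTF-8 bytes rather than on str, it stores the metadata length as an unsigned 64-bit big-endian integer via the struct module, and the names HEADER, serialize_fixed and deserialize_fixed are made up for this example.

```python
import struct

# Assumption for this sketch: the first field is exactly 8 bytes (64 bits),
# so no delimiter is needed to find where the metadata starts.
HEADER = struct.Struct(">Q")  # one unsigned 64-bit big-endian integer


def serialize_fixed(strings):
    # Lengths are counted in bytes because the payload is UTF-8 encoded.
    encoded = [s.encode("utf-8") for s in strings]
    meta = ",".join(str(len(e)) for e in encoded).encode("ascii")
    return HEADER.pack(len(meta)) + meta + b"".join(encoded)


def deserialize_fixed(blob):
    (meta_len,) = HEADER.unpack_from(blob, 0)
    meta_start = HEADER.size
    meta = blob[meta_start:meta_start + meta_len].decode("ascii")
    # An empty metadata field means an empty input list.
    lengths = [int(n) for n in meta.split(",")] if meta else []
    result = []
    pos = meta_start + meta_len
    for n in lengths:
        result.append(blob[pos:pos + n].decode("utf-8"))
        pos += n
    return result
```

Because the header has a known size, characters such as '-' or '>' in the data need no special treatment in this variant.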
To ensure the data never contains a literal '>', we use html.escape() and html.unescape() to encode and decode '>' in the data. (UPDATE: 2022-06-13) The original code had a bug: it could not deserialize the data correctly when a string contained the delimiter itself.
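To make the delimiter problem concrete, here is a small sketch. The serialized string is written out by hand for the sample input, using '-' as the delimiter the way the original code does; the variable names are only for illustration.

```python
import html

# Hand-written result of serializing the sample input with '-' as delimiter.
# The second string contains '-' itself.
serialized = "7-5,15,19abdcd4agasd-dsfafdashi there I love you"

tokens = serialized.split("-")
# The split produces three tokens instead of two, so tokens[1] is cut off
# in the middle of the second string.
print(len(tokens))
print(tokens[1])

# html.escape() rewrites '>' as '&gt;', so an escaped string can never
# contain a literal '>' and cannot collide with a '>' delimiter.
escaped = html.escape("4agasd-dsf>afdas")
print(escaped)
print(html.unescape(escaped))
```

Note that html.escape() also rewrites '&', '<' and quote characters, so the lengths stored in the metadata have to be the lengths of the escaped strings.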
Once the serialization format is defined, we can write the two functions to match it.
Here is the original working Python code. It uses '-' as the delimiter, so the test input contains no '-'.
def serialize(input):
    len_list = []
    for i in input:
        len_list.append(str(len(i)))
    data = "".join(input)
    meta_data = ",".join(len_list)
    meta_len = len(meta_data)
    return "{}-{}{}".format(str(meta_len), meta_data, data)

def deserialize(input):
    tokens = input.split("-")
    meta_len = int(tokens[0])
    meta_and_data = tokens[1]
    meta_data = meta_and_data[:meta_len]
    data = meta_and_data[meta_len:]
    len_list = [int(i) for i in meta_data.split(',')]
    # extract data
    result = []
    s = 0
    for l in len_list:
        result.append(data[s:s+l])
        s += l
    return result

input = ['abdcd', '4agasddsfafdas', 'hi there I love you']
serialized = serialize(input)
print("input = {} after serialization = {}".format(input, serialized))
print("after deserialization = {}".format(deserialize(serialized)))
Practice statistics:
15:00: to write up the code
8:00: to fix the logical error. Had to debug the code by executing it. The end value for reading data was calculated incorrectly: it should be s+l instead of l itself.
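The slicing mistake described above is easy to reproduce in isolation. The sketch below uses made-up data and the same variable names s and l as the code, and compares the incorrect end value with the corrected one.

```python
data = "abdcdhello"
lengths = [5, 5]

# Buggy version: the slice end is the length itself, not start + length.
wrong = []
s = 0
for l in lengths:
    wrong.append(data[s:l])
    s += l

# Fixed version: the slice end is s + l.
right = []
s = 0
for l in lengths:
    right.append(data[s:s + l])
    s += l

print(wrong)  # ['abdcd', '']
print(right)  # ['abdcd', 'hello']
```

The first string comes out right in both versions because s is 0, which is why the error only shows up from the second string onward.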
UPDATE(2022-06-13): Solved the problem again. Had to spend time figuring out how to avoid the deserialization failure when a data string contains the delimiter that separates meta_data_length from the rest.
After trying several things, I decided to escape the delimiter with html.escape().
'''Write 2 functions to serialize and deserialize an array of strings. strings can contain any unicode character.
Do not worry about string overflow.
input = ['abdcd', '4agasd-dsf>afdas', 'hi there I love you']
output = serialize(input)
deserialize(output) = ['abdcd', '4agasd-dsf>afdas', 'hi there I love you']'''
import html

def serialize(input):
    input_lens = []
    escaped_data = []
    for i in input:
        # escape first so the data never contains a literal '>'
        escaped = html.escape(i)
        # store the length of the escaped string, since that is what gets written
        input_lens.append(str(len(escaped)))
        escaped_data.append(escaped)
    meta_data = ",".join(input_lens)
    meta_data_len = len(meta_data)
    data = "".join(escaped_data)
    return "{}>{}{}".format(meta_data_len, meta_data, data)

def deserialize(serialized):
    # split once: everything before the first '>' is the metadata length
    head, rest = serialized.split('>', 1)
    meta_data_len = int(head)
    meta_data = rest[:meta_data_len]
    data = rest[meta_data_len:]
    # empty metadata means the input list was empty
    data_len = meta_data.split(',') if meta_data else []
    output = []
    start = 0
    for length in data_len:
        end = start + int(length)
        token = data[start:end]
        output.append(html.unescape(token))
        start = end
    return output

# test
input = ['abdcd', '4agasd-dsf>afdas', 'hi there I love you']
output = serialize(input)
print("serialized = {}".format(output))
print("deserialized = {}".format(deserialize(output)))
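As a possible alternative to escaping (not the approach taken above), note that the meta_data_length field consists only of digits, so the first '>' in the serialized string is always the real delimiter. The sketch below relies on that observation and splits only at the first '>' with str.partition(). The names serialize_plain and deserialize_plain are made up for this example.

```python
def serialize_plain(strings):
    # Same layout as above, but the strings are written unescaped.
    meta_data = ",".join(str(len(s)) for s in strings)
    return "{}>{}{}".format(len(meta_data), meta_data, "".join(strings))


def deserialize_plain(serialized):
    # partition() splits at the first '>' only; any later '>' belongs to
    # the data and is left untouched.
    head, _, rest = serialized.partition(">")
    meta_data_len = int(head)
    meta_data = rest[:meta_data_len]
    data = rest[meta_data_len:]
    lengths = [int(n) for n in meta_data.split(",")] if meta_data else []
    output = []
    start = 0
    for length in lengths:
        output.append(data[start:start + length])
        start += length
    return output


tricky = ['abdcd', '4agasd-dsf>afdas', 'a,b', '12>3', 'hi there I love you']
print(deserialize_plain(serialize_plain(tricky)) == tricky)
```

With this variant the stored lengths are those of the original strings, and the serialized output stays the same size as the unescaped data plus the header.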