Write serialize and deserialize functions for an array of strings

July 21, 2019

Problem

Write two functions to serialize and deserialize an array of strings. The strings can contain any Unicode character.
Do not worry about string overflow.

input = ['abdcd', '4agasd-dsfafdas', 'hi there I love you']

output = serialize(input)

deserialize(output) = ['abdcd', '4agasd-dsfafdas', 'hi there I love you']

Basically, you need to decide how to encode the serialized message so that you can deserialize it later.

For simplicity, I decided to encode the messages as below:

[meta_data_length]>[meta_data with ',' delimiter][concatenated strings]
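
As a small worked example of this layout, the sketch below builds the encoded string by hand for three short strings. The variable names are illustrative only and are not taken from the solution code.

```python
# Hand-built illustration of the layout:
# [meta_data_length]>[comma-separated lengths][concatenated strings]
strings = ["ab", "c", "hello"]

lengths = ",".join(str(len(s)) for s in strings)  # "2,1,5"
encoded = "{}>{}{}".format(len(lengths), lengths, "".join(strings))

print(encoded)  # 5>2,1,5abchello
```

The leading 5 is the length of the metadata "2,1,5", which tells the reader where the metadata ends and the string data begins.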

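The format needs a delimiter between the metadata length and the rest of the message. The format above uses '>'; the first version of the code below uses '-' and, for simplicity, assumes that the delimiter never appears in the data. We could avoid the delimiter entirely by giving the first length field a fixed size, such as 64 bits.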
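
The fixed-size alternative could look roughly like the sketch below. It is not the approach used in this post: it assumes the output is bytes rather than a string, that lengths are counted in UTF-8 bytes, and the names serialize_fixed and deserialize_fixed are hypothetical.

```python
import struct

HEADER_SIZE = 8  # 64-bit unsigned integer, big-endian

def serialize_fixed(strings):
    # Encode each string to UTF-8 and record its byte length.
    encoded = [s.encode("utf-8") for s in strings]
    meta = ",".join(str(len(b)) for b in encoded).encode("ascii")
    # The fixed-width header replaces the delimiter.
    return struct.pack(">Q", len(meta)) + meta + b"".join(encoded)

def deserialize_fixed(blob):
    (meta_len,) = struct.unpack(">Q", blob[:HEADER_SIZE])
    meta = blob[HEADER_SIZE:HEADER_SIZE + meta_len].decode("ascii")
    data = blob[HEADER_SIZE + meta_len:]
    # An empty list produces empty metadata, so there is nothing to parse.
    lengths = [int(x) for x in meta.split(",")] if meta else []
    result, start = [], 0
    for length in lengths:
        result.append(data[start:start + length].decode("utf-8"))
        start += length
    return result
```

Because the header always occupies the first 8 bytes, no character has to be reserved or escaped in the data.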

To allow the data to contain '>', we use html.escape() and html.unescape() to encode and decode '>' in the data. (UPDATE: 2022-06-13) The original code had a bug: it could not deserialize the data correctly when a string contained the delimiter.
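
A quick check of why escaping helps, as a minimal sketch with a made-up sample string: after html.escape() the text no longer contains a literal '>', and html.unescape() restores the original.

```python
import html

original = "a>b & c"
escaped = html.escape(original)

print(escaped)                             # a&gt;b &amp; c
print(">" in escaped)                      # False
print(html.unescape(escaped) == original)  # True
print(len(original), len(escaped))         # 7 14
```

Escaping changes the length of the string, so the lengths stored in the metadata have to be those of the escaped strings.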

Once the serialization format is defined, we can write the two functions according to it.

Here is the working Python code.

	def serialize(input):
	    # Record each string's length so the concatenated data can be split later.
	    len_list = []

	    for i in input:
	        len_list.append(str(len(i)))

	    data = "".join(input)
	    meta_data = ",".join(len_list)
	    meta_len = len(meta_data)
	    return "{}-{}{}".format(str(meta_len), meta_data, data)

	def deserialize(input):
	    # Limitation of this first version: split("-") misparses data that contains '-'.
	    tokens = input.split("-")
	    meta_len = int(tokens[0])
	    meta_and_data = tokens[1]
	    meta_data = meta_and_data[:meta_len]
	    data = meta_and_data[meta_len:]
	    len_list = [int(i) for i in meta_data.split(',')]

	    # extract data: s is the start offset of the current string
	    result = []
	    s = 0
	    for l in len_list:
	        result.append(data[s:s+l])
	        s += l
	    return result

	input = ['abdcd', '4agasddsfafdas', 'hi there I love you']
	serialized = serialize(input)
	print("input = {} after serialization = {}".format(input, serialized))
	print("after deserialization = {}".format(deserialize(serialized)))


Practice statistics:

15:00: to write up the code

8:00: to fix the logical error. Had to debug the code by executing it. The end index for reading the data was calculated incorrectly: it should be s+l instead of l itself.
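
The slicing mistake is easy to reproduce in isolation. The sketch below uses made-up data and compares the incorrect end index (l) with the correct one (s+l).

```python
data = "abcde"
lengths = [2, 3]

s = 0
wrong, right = [], []
for l in lengths:
    wrong.append(data[s:l])      # bug: the length is used as the end index
    right.append(data[s:s + l])  # fix: end index is start offset plus length
    s += l

print(wrong)  # ['ab', 'c']
print(right)  # ['ab', 'cde']
```

The first string comes out right either way because its start offset is 0, which is why the bug only shows up from the second string onward.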

UPDATE(2022-06-13): Solved the problem again. Had to spend time figuring out how to avoid the deserialization failure when the data string contains the delimiter for meta_data_length separation.

After trying several things, I decided to escape the delimiter with html.escape().

	'''Write 2 functions to serialize and deserialize an array of strings. strings can contain any unicode character.
	Do not worry about string overflow.

	input = ['abdcd', '4agasd-dsf>afdas', 'hi there I love you']

	output = serialize(input)

	deserialize(output) = ['abdcd', '4agasd-dsf>afdas', 'hi there I love you']'''

	import html

	def serialize(input):
	    input_lens = []
	    escaped_data = []

	    for i in input:
	        # Escape first so the data never contains a literal '>'.
	        escaped = html.escape(i)
	        # Store the length of the escaped string, since that is what gets written.
	        input_lens.append(str(len(escaped)))
	        escaped_data.append(escaped)

	    meta_data = ",".join(input_lens)
	    meta_data_len = len(meta_data)
	    data = "".join(escaped_data)

	    return "{}>{}{}".format(meta_data_len, meta_data, data)

	def deserialize(serialized):
	    # The only literal '>' is the one after the metadata length, so split once.
	    meta_data_len_str, rest = serialized.split('>', 1)
	    meta_data_len = int(meta_data_len_str)
	    meta_data = rest[:meta_data_len]
	    data = rest[meta_data_len:]
	    # An empty input list produces empty metadata, so there are no lengths to parse.
	    data_len = meta_data.split(',') if meta_data else []

	    output = []
	    start = 0
	    for length in data_len:
	        end = start + int(length)
	        token = data[start:end]
	        output.append(html.unescape(token))
	        start = end

	    return output

	# test

	input = ['abdcd', '4agasd-dsf>afdas', 'hi there I love you']

	output = serialize(input)
	print("serialized = {}".format(output))

	print("deserialized = {}".format(deserialize(output)))



Peter's CodeCrushing
