Design a service that provides end-of-day stock prices

Problem

  Imagine you are building some sort of service that will be called by up to 1000 client applications to get simple end of day stock price information (open, close, high, low). You may assume that you already have the data, and you can store it in any format you wish. How would you design the client-facing service which provides the information to the client applications?

  You are responsible for the development, rollout, and ongoing monitoring and maintenance of the feed. Describe the different methods you considered and why you would recommend your approach. Your service can use any technologies you wish, and can distribute the information to the client applications by any mechanism you choose.


This is my first post on system design. I have often come across system design questions in interviews but never prepared for them thoroughly.

It is a good idea to prepare for these questions in advance.
When designing a system, the first thing to understand is its requirements.
There are two types of requirements: functional and non-functional (or quality attributes).
We will examine the question from both perspectives.

Functional requirements

In this question, the system provides one simple service: returning stock price information when a client requests it. It is worth asking: "How does the client specify which stock it wants? By company name or by stock symbol?"

The simplest way is to request the information by stock symbol. However, it would also be useful to let clients request stock information by company name.

Let's say we support both the stock symbol and the company name (e.g. Google and GOOG).

Now, let's look at the stock information. Our system should provide four values: open, close, high, and low.
This is a small amount of data, so it is a good idea to simply map both the company name and the stock symbol to the stock information. One option is to keep the data in JSON format, as below:

{
  "Open": 728.29,
  "High": 728.29,
  "Low": 728.29,
  "Close": 728.29
}

This is possible because we do not support querying by the values of the stock information; we always return the four values as a set.
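The mapping idea above can be sketched as a small in-memory index where both keys point at one shared record. This is a minimal sketch; the `Quote` and `QuoteIndex` names and the case-insensitive lookup are assumptions for illustration, not part of the original design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Quote:
    """End-of-day price record; the four values are always returned together."""
    open: float
    high: float
    low: float
    close: float


class QuoteIndex:
    """Hypothetical index mapping both the stock symbol and the company name to one Quote."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Quote] = {}

    @staticmethod
    def _normalize(key: str) -> str:
        # Assumed case-insensitive lookup, so "GOOG", "goog" and "Google" all resolve.
        return key.strip().lower()

    def put(self, symbol: str, company: str, quote: Quote) -> None:
        # Both keys reference the same object, so the record is stored once.
        self._by_key[self._normalize(symbol)] = quote
        self._by_key[self._normalize(company)] = quote

    def get(self, key: str) -> Optional[Quote]:
        return self._by_key.get(self._normalize(key))


index = QuoteIndex()
index.put("GOOG", "Google", Quote(728.29, 728.29, 728.29, 728.29))
print(index.get("Google") == index.get("goog"))  # True
```

Keeping symbols and names in one namespace is the simplest choice; if a company name could ever collide with another company's symbol, two separate maps would avoid the ambiguity.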

Quality Attributes (a.k.a Non-functional requirements)

I will use the term "Quality Attributes" to refer to Non-functional requirements. Quality attributes are the ones related to the quality of the system such as performance, availability, reliability etc.

It is our job to determine which quality attributes are relevant to the question.

Performance

As stated in the question, the service should serve up to 1000 clients. What does this mean? We should expect the service to respond to 1000 concurrent requests within the agreed response time (say, 1 second).

Is a traditional SQL store a good fit? As stated earlier, we do not need rich search capabilities.
We may want to go with NoSQL stores such as Redis or Memcached instead. From personal experience, Redis provides both low response times and persistence.

With a web service placed in front of the Redis server, the first design is complete.
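The web tier in this first design can be sketched as a minimal WSGI app using only the standard library. This is a sketch under stated assumptions: the `/quote/<symbol-or-name>` path is hypothetical, and a plain dict stands in for Redis (in a real deployment the lookup would be a Redis `GET`).

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-in for Redis; keys are lower-cased symbols or company names.
STORE = {
    "goog": {"Open": 728.29, "High": 728.29, "Low": 728.29, "Close": 728.29},
}


def app(environ, start_response):
    """Minimal WSGI service: GET /quote/<symbol-or-name> returns the JSON record."""
    path = environ.get("PATH_INFO", "")
    prefix = "/quote/"
    if environ.get("REQUEST_METHOD") != "GET" or not path.startswith(prefix):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    key = path[len(prefix):].strip().lower()
    record = STORE.get(key)
    if record is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown stock"]

    body = json.dumps(record).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(path: str):
    """Invoke the app in-process, the way a WSGI server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills REQUEST_METHOD=GET and other WSGI keys
    environ["PATH_INFO"] = path
    captured = []
    body = b"".join(app(environ, lambda status, headers: captured.append(status)))
    return captured[0], body


status, body = call("/quote/GOOG")
print(status, json.loads(body)["Close"])  # 200 OK 728.29
```

To serve it over HTTP for a quick local test, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough; production would use a proper WSGI server behind Nginx.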

Figure 1. Initial design
Redis can handle well over 80,000 simple requests per second, and a single web server can typically handle a few thousand (say, around 3,000) requests per second. The design is simple and easy to understand. But is it enough?

Not quite yet. We have two potential single points of failure: the web server and Redis.

If either component fails, our stock service stops.
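A quick back-of-the-envelope check makes the trade-off concrete. The sketch below assumes the figures used above (1000 concurrent requests answered within 1 second, roughly 3,000 requests per second per web server) and applies Little's law (throughput = concurrency / response time). Capacity alone needs only one server; the extra "spare" is what removes the single point of failure.

```python
import math


def required_rps(concurrent_requests: int, response_time_s: float) -> float:
    """Little's law: throughput = number of requests in flight / time each one takes."""
    return concurrent_requests / response_time_s


def servers_needed(peak_rps: float, per_server_rps: float, spares: int = 1) -> int:
    """Servers needed to absorb the peak load, plus spares so one failure is survivable."""
    return math.ceil(peak_rps / per_server_rps) + spares


peak = required_rps(1000, 1.0)    # 1000.0 requests per second
print(servers_needed(peak, 3000))  # 2: one for capacity, one spare for redundancy
```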

The design in Figure 2 addresses these single points of failure.

Figure 2. Better design with redundancy
In this new design, we put a reverse proxy/load balancer in front of the web servers. Nginx is a good candidate; a dedicated hardware load balancer can also do the job. A backup Nginx server can take over if the primary Nginx server fails.

This way, the requests from 1000 clients can be distributed across multiple web servers. 
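The distribution step can be sketched as round-robin selection that skips unhealthy backends, which is roughly what an Nginx upstream or a hardware load balancer does. This is a toy model for illustration; the class and the backend names (`web1`, ...) are hypothetical, and health checking is reduced to explicit `mark_down`/`mark_up` calls.

```python
import itertools
from typing import List, Set


class RoundRobinBalancer:
    """Toy round-robin load balancer that skips backends marked unhealthy."""

    def __init__(self, backends: List[str]) -> None:
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)
        self._down: Set[str] = set()

    def mark_down(self, backend: str) -> None:
        self._down.add(backend)

    def mark_up(self, backend: str) -> None:
        self._down.discard(backend)

    def pick(self) -> str:
        # Try each backend at most once per call; fail if all of them are down.
        for _ in range(len(self._backends)):
            candidate = next(self._cycle)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")
print([lb.pick() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```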

Second, Redis replicas improve the redundancy of the Redis store by keeping copies of the data on multiple Redis servers.
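The effect of replication on availability can be illustrated with a toy model: writes go to the primary and are copied to the replicas, and reads can be served by any live node. This is a simplification for illustration only; the `Node` and `ReplicatedStore` classes are hypothetical, and real Redis replication is asynchronous, whereas this sketch copies synchronously.

```python
from typing import Dict, List, Optional


class Node:
    """In-memory stand-in for one Redis server (hypothetical, for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.alive = True


class ReplicatedStore:
    """Writes go to the primary and are copied to every replica; reads use any live node.

    Real Redis replication is asynchronous; this toy copies synchronously for simplicity.
    """

    def __init__(self, primary: Node, replicas: List[Node]) -> None:
        self.primary = primary
        self.replicas = replicas

    def set(self, key: str, value: str) -> None:
        self.primary.data[key] = value
        for replica in self.replicas:
            replica.data[key] = value

    def get(self, key: str) -> Optional[str]:
        # Read from replicas first to offload the primary, then fall back to it.
        for node in self.replicas + [self.primary]:
            if node.alive:
                return node.data.get(key)
        raise RuntimeError("all nodes are down")


primary, replica = Node("redis-primary"), Node("redis-replica")
store = ReplicatedStore(primary, [replica])
store.set("goog", '{"Close": 728.29}')
replica.alive = False  # simulate a replica failure
print(store.get("goog"))  # {"Close": 728.29}
```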

Updated design using commercial cloud services [2019]

For faster implementation and lower cost, we can use existing cloud services to make scaling and deployment easier.


Assumptions:

 - the service provides the stock information for a given stock symbol or keyword
 - for simplicity, the service does not support searching by date range and symbol


If so, a simple key-value store can handle 1000 TPS. The data set is relatively small, so I can store it in simple CSV format, or in JSON format for more sophisticated search.
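Loading a CSV file into such a key-value map takes only a few lines. This is a minimal sketch; the column layout (`symbol,company,open,high,low,close`) is a hypothetical format, not one given in the post.

```python
import csv
import io
from typing import Dict

# Hypothetical CSV layout: one row per stock for the latest trading day.
SAMPLE_CSV = """symbol,company,open,high,low,close
GOOG,Google,728.29,728.29,728.29,728.29
"""


def load_quotes(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the CSV into a key-value map keyed by lower-cased symbol and company name."""
    quotes: Dict[str, Dict[str, float]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        record = {field: float(row[field]) for field in ("open", "high", "low", "close")}
        # Both keys share one record, matching the symbol-or-name lookup requirement.
        quotes[row["symbol"].lower()] = record
        quotes[row["company"].lower()] = record
    return quotes


quotes = load_quotes(SAMPLE_CSV)
print(quotes["goog"]["close"])  # 728.29
```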


 Possible storage options:


 - Memcached
 - Redis

 If Memcached is used, we need durable backend storage, because Memcached keeps data only in memory. However, Memcached can be more efficient than Redis in that it is multithreaded, while Redis executes commands on a single thread.

If deployed in a local environment, this difference does not matter.

DynamoDB can provide durable and scalable storage for our stock data.

Using the stock symbol as the partition key and the company name as the key of a global secondary index makes lookups by either stock symbol or company name efficient.
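The table layout above can be expressed as the parameters for boto3's DynamoDB `create_table` call. This is a sketch; the table, attribute, and index names are hypothetical, and the dict is only defined here (not sent to AWS), so boto3 is not imported.

```python
# Parameters for boto3's DynamoDB client.create_table(); all names are hypothetical.
TABLE_SPEC = {
    "TableName": "EndOfDayQuotes",
    # Only key attributes are declared; open/high/low/close are ordinary item attributes.
    "AttributeDefinitions": [
        {"AttributeName": "symbol", "AttributeType": "S"},
        {"AttributeName": "company_name", "AttributeType": "S"},
    ],
    # Partition key: the stock symbol.
    "KeySchema": [{"AttributeName": "symbol", "KeyType": "HASH"}],
    # Global secondary index so lookups by company name avoid a full table scan.
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "company_name-index",
            "KeySchema": [{"AttributeName": "company_name", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    # On-demand capacity: no read/write capacity units to provision.
    "BillingMode": "PAY_PER_REQUEST",
}

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   from boto3.dynamodb.conditions import Key
#   boto3.client("dynamodb").create_table(**TABLE_SPEC)
#   table = boto3.resource("dynamodb").Table("EndOfDayQuotes")
#   table.get_item(Key={"symbol": "GOOG"})
#   table.query(IndexName="company_name-index",
#               KeyConditionExpression=Key("company_name").eq("Google"))
```

Unlike the primary key, GSI keys do not need to be unique, so a query by company name returns a list of items rather than a single one.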

Caching

It is a very good idea to cache information for hot (frequently requested) stocks in memory. Memcached can serve as the caching layer.

Redis is also an alternative, but Memcached can be simpler and slightly faster, given that we will have a separate storage layer.

Business logic layer

The business logic is simple. Upon a request:

- the business logic checks the caching layer first and searches the backend storage if the item is not present.
- after retrieving the record from storage, it stores the stock information in the cache with a TTL.
- when the cache is full, the least recently used items are evicted.
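The three steps above are the cache-aside pattern. The sketch below models the cache in-process with an LRU-ordered dict and per-entry TTL as a stand-in for Memcached; the `TTLCache` and `get_quote` names and the injectable clock are assumptions for illustration, and `backend` represents whatever storage lookup is used (for example a DynamoDB `GetItem`).

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TTLCache:
    """In-process LRU cache with per-entry TTL (stand-in for Memcached/ElastiCache)."""

    def __init__(self, capacity: int, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl_s = ttl_s
        self.clock = clock
        self._items: "OrderedDict[str, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: str) -> Any:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (value, self.clock() + self.ttl_s)
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def get_quote(key: str, cache: TTLCache,
              backend: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Cache-aside read: check the cache, fall back to storage, then populate the cache."""
    quote = cache.get(key)
    if quote is None:
        quote = backend(key)
        if quote is not None:
            cache.set(key, quote)
    return quote


calls = []


def fake_backend(key: str) -> Optional[dict]:
    # Hypothetical storage lookup that records how often it is hit.
    calls.append(key)
    return {"Close": 728.29} if key == "goog" else None


cache = TTLCache(capacity=2, ttl_s=60)
get_quote("goog", cache, fake_backend)
get_quote("goog", cache, fake_backend)
print(len(calls))  # 1: the second request is served from the cache
```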

Deployment


I would host the app on an existing cloud service.

For easier scaling and deployment, I will put a load balancer in front of a fleet of Nginx servers.

Elastic Load Balancing (ELB) can serve as the load balancer, with the Nginx fleet placed in an Auto Scaling group.

AWS CodeDeploy can deploy new business logic to the EC2 instances, and AWS CloudFormation can manage structural changes to the deployment.

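Amazon ElastiCache for Memcached can serve as the cache. It supports scaling the number of nodes up or down; with a client that uses consistent hashing, only a fraction of the keys are remapped when nodes are added or removed.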
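Why consistent hashing limits remapping can be seen in a small hash-ring sketch with virtual nodes, similar in spirit to ketama-style Memcached clients. This is an illustrative sketch, not the algorithm of any particular client library; the node names and the 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring: each cache node owns many points (virtual nodes) on the ring."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


keys = [f"stock-{i}" for i in range(10000)]
before = HashRing(["cache-a", "cache-b", "cache-c"])
after = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(moved / len(keys))
```

When going from three to four nodes, only keys that now fall on `cache-d`'s points change owner, so ideally about a quarter of the keys move. With naive `hash(key) % n` placement, most keys would move to a different node.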

Optionally, we can use Route 53 to expose the service under a single domain name.

Overall architecture

Given these cloud services, my final architecture is as below:

[TODO: add a better diagram]

[ELB] - [Nginx on EC2] - [ElastiCache] - [DynamoDB]

[CloudFormation] and [CodeDeploy] to deploy new logic or structural changes
