Intro

This blog post is a continuation of the Frontend Interview Cheatsheet. If you haven’t seen that yet, check it out first πŸ‘‡

The list here will give you a fine grain introduction to System Design Interview concepts that you may need to master. Hope you will enjoy the reading and don’t forget to follow me on social media: medium and twitter

If this article collects 1000 claps πŸ‘, I will post the System Design Interview Template that will give you an easy time for interview preparation

Client-server model

  • DNS β€” Domain Name System redirects the domain name to the IP address. The client sends a query to DNS and returns IP address;
  • IP address β€” numeric address for each machine connected to the internet; a.b.c.d (0–255). 192.0.0.1 β€” localhost. 192.169..c.d
  • Port β€” listening to multiple processes without colliding. 0–65525. 2¹⁢. 0–1023 ports are taken by the system (22: Secure Shell, 53: DNS lookup; 80: HTTP; 443: HTTPS)

Network Protocols

  • IP β€” Internet protocol β€” rules to communicate between machines on the internet. Using IP Packets
  • TCP β€” Transmission Control Protocol β€” guarantee packets are delivered in order, will know if the failure happened. By additional TCP header (order information), build on top of IP
  • IP Packet β€” minimal data unit to communicate between machines. Consist of the header (source, recipient, size of the packet, version of protocol IPv4) and payload. Size from packet ²¹⁢ bytes. Can not guarantee all packets are received and ordered HTTP -> TCP -> IP

Storage

  • Database β€” servers that store data in disk or memory. Record and query;
  • Disk β€” stores and keeps data if the process dies. HDD β€” slower and cheaper. SSD β€” faster and more expensive, non-volatile;
  • Memory β€” RAM fast access, volatile, erase data when the process dies;
  • Persistent data β€” keeps data if the process dies (outage);

Latency and Throughput

  • Latency β€” time requires to complete the operation (ms):

1) Reading 1 Mb from RAM β€” 0.25 ms;

2) Reading 1 Mb from SSD β€” 1 ms;

3) Transfer 1 Mb through the network (1 Gb/s) β€” 10 ms;

4) Reading 1 Mb from HDD β€” 20 ms;

5) Transfer 1 Mb intercontinental round trip β€” 150 ms;

  • Throughput β€” the number of the operating systems can handle per sec (RPS, QPS)

Availability β€” reliability

  • Availability β€” Access of the service in point of time (per year); 1) 99% (87.8 hours); 2) 99.9% (8.8 hours); 3) 99.99% (52.6 min) 4) 99.999% (5.3 min) (5 nines);
  • Redundancy β€” replication of the system to increase availability (make sure you don’t have a single point of failure β€” use redundancy). Passive redundancy (2 machines if one dies other works), active redundancy (if one dies another takes control of the function)
  • SLA (service β€” level β€” agreement) guarantees of the service, consists of multiple SLO
  • SLO (service β€” level β€” objective) guarantees of the service (availability)

Cache

  • What can be cached β€” Client, Server, In Between (Server-Database ), API, Hardware (CPU). Operations that used frequently, heavy operation/computation;
  • Cache β€” Hardware or software that stores the data to provide faster access (the result of the heavy computation, API responses, API requests);
  • Write through cache β€” when the cache is updated in cache and memory synchronously, recover data if the process dies. Pros β€” simpler and reliable; cons β€” better works when not so many writes in cache;
  • Write back cache β€” when cache updated in cache and memory asynchronously;
  • If we care of staleness of cache (accuracy of data) -> we use local cache; if accuracy is important (views, likes) β€” use third party cache (Redis)
  • Cache Hitβ€” find requested data in the cache
  • Cache miss β€” did not find stored cache because of pure design
  • Cache eviction policy β€” rules when cached will be removed (LRU (recent), LFU (frequent), FIFO (queue))
  • Content Delivery Network (CDN) β€” third party service that caches your server (based on region), always better latency. PoP (Point of Presence). Cloudflare, Google Cloud CDN.

Proxy β€” security, reliability

  • Forward proxy β€” client-side (hide IP) VPN;
  • Reverse proxy β€” server interceptor (load balancing, leader election) (logging, caching, filter requests, load balancing, security shield);
  • Proxy tool β€” Nginx;

Load Balancer β€” security, reliability, performance

  • Distribute traffic across the servers. Multiple load balancers with different selection strategies;
  • Server selection strategy β€” (weight-if some servers more powerful) round-robin, random selection, performance-based, IP based, API path based;
  • Hot-spot β€” distribute too much traffic to one machine, reason suboptimal sharding key/hashing function
  • LB Library β€” Nginx;

Hashing

  • Hashing β€” transforming a piece of data to hash β€” usually integer;
  • Consistent hashing β€” minimize the number of keys to be remapped if the hash table is resized. (load balancer during adding or removing server) (Rendezvous β€” highest random hash);
  • SHA β€” secure hash algorithm, the collection of hash functions. Popular SHA-3.

Database

  • ACID β€” database (of database transaction): ATOMICITY β€” all operations in a transaction are all successful or failed, not an intermediate state; CONSISTENCY β€” each transaction guarantees that data is in the updated state (strong consistency), (eventual consistency) will be updated after some time; ISOLATION β€” execution of multiple transactions will be the same as they executed sequentially; DURABILITY β€” transactions are in the non-volatile state, they can not crash
  • Relational database β€” database organized in tables, rows, and columns, and the relation between them.
  • Nonrelational database β€” NOT tabular, specially organized for specific purposes.
  • SQL β€” structured query language;
  • SQL database β€” a relational database that supports SQL;
  • NoSQL β€” a database that does not support SQL
  • Database index β€” a data structure that improves the data retrieval, and slows the data write (because create an index for columns);
  • Strong consistency β€” relates to ACID. (every transaction guarantee updated data) β€” PostgreSQL;
  • Eventual consistency β€” (transaction will eventually (later) know the updated data);

Key-value store

  • NoSQL database used for caching and dynamic configuration (Etcd β€” strong consistency, used for leader election), (Redis β€” caching, rate limiter), Zookeeper (strong consistency, leader election, highly available);

Specialized storage paradigms

  • Blob Storage β€” key-value storage of Blob (binary large object) usually media: images, audio, video (Google Cloud Storage, Amazon S3);
  • Time Series Database β€” store and analyze time-indexed data, created continuously (logging, internet of things, stocks ): Prometheus, InfluxDB, Graphite;
  • Graph Database β€” when data has a deep relationship connection (social network) (Neo4j)
  • Cypher β€” graph query language created by Neo4j, simpler than SQL
  • Spatial database β€” location, map; uses quadtree for fast query (the region/location); (hotel, map)

Replication and sharding

  • Replication β€” duplication of data from one server to another; (reduce single point of failure, reduce latency), main database and replica have to be updated (sync, async);
  • Sharding β€” data partitioning (when the database is huge), splitting the database into shards to increase throughput. (based on client region, based on the type of data (payment), based on the hash of column). Can have Hot Spot if suboptimal sharding strategy

Leader Election β€” reliability

  • The process when servers elect a leader server that performs primary operations (system important operations). And other nodes know who is the leader also after reelection;
  • Consensus algorithms β€” choose the leader and synchronize all servers Paxos, Raft. Etcd and Zookeeper β€” key-value databases that implement leader election

Peer to Peer

  • Machines divide the workload, increase system throughput. (file distribution)
  • Gossip Protocol β€” the process of communication between machines to share the workload. Split the file into chunks of data (blobs), create a hash map to inform which peer owns what chunk. Distributed Hash Table. Kraken

Polling and Streaming

  • Use cases: Chat, stock, temperature
  • Polling β€” fetching the data with interval (API);
  • Stream β€” create a connection that continuously retrieves (listening) the data (API) Web Socket

Configuration

  • JSON or YAML
  • Can be static β€” constant, pros constant hard to break system, cons β€” require rebuilding
  • Dynamic (stored in KV β€” Redis) pros β€” flexible, fast to apply, cons β€” can break system easily if no test

Rate limiting β€” security, reliability

  • Reduce spamming system, avoid DoS and DDoS (Distributed) Denial of Service attack;
  • Implemented by the cache, hard to use server cache because Load Balancer will not know how to redirect;
  • Use external key-value databases (Redis) in reverse proxy (Load Balancer);
  • Rate limiting rules can be simple or more complex (base on tiers) depending on the system;

Logging β€” reliability, growing

  • Logging β€” collecting info from events. All the events will log in STDOUT, STDERR. Save them to Time-series database: InfluxDB, Prometheus
  • Monitor β€” visualizing with Grafana;
  • Alert β€” If the monitoring system has a spike, create the alert, for example, post in slack;
  • Logging is important during system growing

Pub/sub pattern β€” Stream

  • Pub/Sub model;
  • Use cases: chat messages, a news feed, stock prices, notification;
  • Guarantees: at-least-one-delivery, persistent storage, ordering, replayability;
  • Publisher Server (P) β€” Topic Channel (T) β€” Message β€” Subscriber (P) -> (T) β€” m1 β€” m2 -> (S)
  • Idempotent operation β€” have the same result regardless of how many times it called;
  • Apache Kafka β€” streaming system by LinkedIn, Google Cloud Sub/Pub guarantee at-least-one-delivery;

MapReduce β€” Scalability Big Data

  • File System β€” abstraction of organizing data (hierarchy, tree β€” Unix file system). Distributed File System β€” the same but split data between machines β€” Google File System, Hadoop Distributed File System;
  • MapReduce β€” distribute data: fast, efficient, and fail-tolerant; Dataset split between multiple machines -> (Map) for each chunk to key:val -> (Shuffle) reorganize -> (Reduce) transform to more meaningful data;
  • Prerequisite β€” we have distributed file system, we have dataset split into chunks, divided between multiple machines, have central control (know what is going on and communicate with map/reduce workers); servers send mapped data;
  • If a failure occurs, central control rerun map/reduce (idempotent operation)

Security

  • Man in the middle attack β€” intercept and mutate private IP packet from client to server. Encryption and HTTPS protection;
  • Symmetric Encryption β€” Use one key to encrypt and decrypt data. Faster than Asymmetric. Its algorithms are part of the Advanced Encryption Standard (AES);
  • Advanced Encryption Standard β€” standard symmetric encryption algorithms (AES β€” 128, AES β€” 192, AES β€” 256);
  • Asymmetric Encryption β€” public and private key to encrypt and decrypt. Data encrypted with a public key (can be shared), maybe only decrypted with a private key (need to be secure). Slower than symmetric;
  • HTTPS β€” secure connection. Requires server to have SSL Certificate and uses TLS to communicate (encrypt data) between server-clients;
  • TLS β€” transport layer security protocol built on top of TCP;
  • SSL Certificate β€” server was granted a digital certificate by Certificate Authority. Contains servers public key. To establish TLS handshake in HTTPS connection;
  • Certificate Authority β€” an entity that is verifying the certificate source of public key;
  • TLS Handshake β€” establishing a secure connection between client and server. The client sends a string of random bytes (client hello) -> server sends another string of random bytes (server hello) + SSL Certificate with public key -> client verifies the certificate with Certificate Authority and sends premaster secret encrypted with a public key string of random bytes -> client and server use client hello, server hello and premaster secret to generate session key (using symmetric encryption), and encrypt all the data during communication;

--

--

Writer for

πŸ”₯ SR Frontend Dev from Amazon πŸ›’ Interviews 🎁 Web 3.0 πŸ”¬ Micro Frontend πŸ…°οΈ Angular πŸ“Š d3 βš›οΈ React 🧐 Monorepo