Networking Basics for System Design: A Complete Beginner's Guide

Every internet application starts with a simple question: how do two programs talk to each other? This post covers five essential networking topics every system designer must know. You will start with Client & Server, the foundation of every web and mobile application. Then you will learn how machines find and reach each other through IP Address & DNS, the addressing and naming system of the internet. From there you will explore HTTP / HTTPS, the communication protocol that clients and servers use to exchange data, followed by TCP vs UDP, the two transport protocols that trade reliability against speed. Finally, you will understand Latency & Throughput, the two performance metrics that determine how fast your system responds and how much load it can handle. Each topic is explained with real-world analogies, step-by-step examples, and clear diagrams, starting from zero.

🌐 1. Client & Server

The client-server model is where every system design journey begins. Before you can understand load balancers, databases, CDNs, or APIs, you need to understand this one idea: a client asks, a server answers. In this section, we will build that understanding from the ground up, with analogies, step-by-step examples, and clear diagrams, so that every concept that follows makes intuitive sense.

Client and Server: how clients send requests and servers send responses

1.1 🎯 Introduction

Imagine you type netflix.com into your browser. Within moments, your browser has contacted Netflix's servers, authenticated your account, fetched a personalised list of movies, and started streaming a video, all without you doing anything beyond pressing Enter. That entire sequence is the client-server model in action.

Your browser is the client: the program that asks for something. Netflix's backend systems are the servers: the programs that listen for requests and send back replies. Every time you open a website, send a WhatsApp message, order on Amazon, or call an Uber, this exact exchange is happening behind the scenes.

1.2 💡 Why It Matters

Every single system design problem starts with the same question: how does the user's device communicate with the backend? Whether you are designing Instagram (2 billion users), Uber (150 million users), or a simple URL shortener, the answer always begins with the client-server model. Everything else (load balancers, databases, caches, CDNs) exists to make this basic exchange faster, more reliable, and capable of handling millions of users at once.

  • Without understanding clients and servers, you cannot reason about how requests reach your system.
  • Load balancers only make sense when you understand that many clients send requests to multiple servers.
  • CDNs only make sense when you understand that static files are served from servers closer to the client.
  • Microservices only make sense when you understand that a server can itself be a client to another server.

Foundation first: In system design, strong answers always start with the simple request-response path and then evolve. Never jump straight to "Kafka, Redis, sharding, microservices"; always start with "a client sends a request to a server."

1.3 🏠 Real-world Analogy

Think of a restaurant. You walk in, sit at a table, and look at the menu. When you are ready, you call the waiter over and place your order. The waiter goes to the kitchen, the kitchen prepares your food, and the waiter brings it back to your table.

Restaurant World | Software World | Role
👤 Customer | 💻 Client (browser / app) | Asks for something
🧑‍💼 Waiter | 🌐 API / Server interface | Receives the request, coordinates work
👨‍🍳 Kitchen | ⚙️ Application server | Runs the actual business logic
🗄️ Storage room | 🗃️ Database / storage | Keeps data, files, and records
🍽️ Meal served | 📦 HTTP Response | The result sent back to the client

Notice a key point: you never go to the kitchen yourself. You send a request through the waiter, and the waiter brings back the result. This is exactly how a client and server communicate: the client never directly touches the database or business logic; it only talks to the server.

1.4 📖 Key Terms

Term | Simple Definition | Quick Example
Client | A program or device that initiates a request | Your browser, your mobile app, a CLI tool
Server | A program that listens for requests and sends responses | Amazon's backend, YouTube's API service
Request | The message the client sends to ask for data or an action | "Get me the homepage" / "Log me in"
Response | The server's reply: either the requested data or an error | HTML page, JSON data, 404 Not Found
Protocol | An agreed-upon set of rules for how two programs communicate | HTTP, HTTPS, TCP, WebSocket
Port | A number that identifies a specific service running on a server | Port 80 = HTTP, Port 443 = HTTPS, Port 5432 = PostgreSQL
IP Address | The unique address of a device or server on a network | 142.250.80.14 (a Google server)
Network | The infrastructure connecting clients to servers | The internet, a company's private network

Remember: A server is a program, not a physical machine. Your laptop can run a server. One physical machine can run dozens of server programs at the same time on different ports.

1.5 🔢 How It Works

Let us walk through exactly what happens when you type amazon.com in your browser and press Enter. This is the most important request path to understand in system design.

Step | What Happens
① Type URL | Your browser (the client) is ready to make a request. It needs to find where Amazon's server lives on the internet.
② DNS Lookup | Browser asks the DNS system: "What is the IP address of amazon.com?" DNS responds with something like 205.251.242.103. (DNS is covered fully in Section 2.)
③ Connect | Browser opens a connection to that IP address on port 443 (HTTPS).
④ Send Request | Browser sends an HTTP GET request: the request line GET / HTTP/1.1 followed by the header Host: amazon.com
⑤ Server Processes | Amazon's server receives the request, runs business logic, and queries the database for product listings and your session data.
⑥ Send Response | The server builds an HTTP response containing the HTML for the Amazon homepage and sends it back through the internet.
⑦ Browser Renders | Your browser receives the HTML and displays the Amazon page. Done, typically in under 500 ms.

Key insight: The entire exchange above, from Step 1 to Step 7, typically happens in under 500 milliseconds. For large-scale systems like Amazon, this same process happens for millions of users simultaneously, which is why concepts like load balancers, caches, and CDNs become necessary.
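
The request path above can be tried on a single machine. The sketch below is only an illustration, not how Amazon serves pages: it starts a tiny HTTP server on localhost and then plays the client by sending `GET /`. The names `HomeHandler` and `fetch_homepage` and the response body are made up for this example, and it uses plain HTTP on an OS-chosen port (no DNS lookup, no HTTPS on 443).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HomeHandler(BaseHTTPRequestHandler):
    """Hypothetical server logic: answer every GET with a small HTML page."""

    def do_GET(self):
        body = b"<h1>Hello from the server</h1>"
        self.send_response(200)                          # status line
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                           # response body

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_homepage():
    # Server side: port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), HomeHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: connect, send "GET /", read the response.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        resp = conn.getresponse()
        status, body = resp.status, resp.read().decode()
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_homepage())
```

Both roles run in one process here purely for convenience; the client code would look the same if the server lived on another machine.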

1.6 🔀 Types & Variations

"Client" and "server" are roles, not fixed things. The same program can be a server to some callers and a client to others. Here are the most common types you will encounter in system design.

Types of Clients

  • 🌐 Web Browser: Chrome, Firefox, Safari. Renders HTML/CSS/JS from web servers. The most common client type.
  • 📱 Mobile App: Instagram, WhatsApp, Uber. Calls APIs on backend servers over HTTPS to fetch and send data.
  • 🖥️ Desktop App: Slack, Spotify, VS Code. Connects to cloud servers in the background for data, sync, and updates.
  • 🤖 IoT Device: Smart thermostat, security camera. Sends sensor readings to cloud servers and receives commands from them.
  • ⚙️ Server as Client: In microservices, services routinely call other services. A Payment Service is a client when calling the Fraud Detection Service.
  • 💻 CLI Tool: curl, wget. Makes HTTP requests directly from the command line. Used by developers and automation scripts.

Types of Servers

  • 🌐 Web Server: Nginx, Apache. Serves static files: HTML, CSS, JavaScript, images. Fast and simple.
  • ⚙️ Application Server: Node.js, Django, Spring Boot. Runs business logic: login, payments, recommendations, order processing.
  • 🗄️ Database Server: PostgreSQL, MySQL, MongoDB. Stores and retrieves structured application data persistently.
  • ⚡ Cache Server: Redis, Memcached. Stores frequently accessed data in memory so the database doesn't need to be queried every time.
  • 📦 File / Object Storage: Amazon S3, Google Cloud Storage. Stores large files (images, videos, backups, documents) at massive scale.
  • ⚖️ Load Balancer: AWS ELB, Nginx. Distributes incoming client requests across multiple servers to prevent any one from being overwhelmed.

The "server as client" pattern: In modern microservices architectures, almost every service acts as both a server (to the services that call it) and a client (to the services it calls). For example, Instagram's Feed Service is a server to the mobile app, but it's a client to the User Service, Media Service, and Recommendation Service.
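
To make the "server as client" pattern concrete, here is a minimal sketch with two local HTTP services. Everything in it is hypothetical (the `UserService` and `FeedService` classes, the paths, and the payloads) and it is not a description of Instagram's real services. The point is that the feed handler answers a request (server role) by first making a request of its own (client role).

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def start(handler_cls):
    """Start an HTTP server on a free localhost port; return (server, port)."""
    server = HTTPServer(("127.0.0.1", 0), handler_cls)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def get_json(port, path):
    """Act as a client: GET a path from a local service and decode its JSON."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


class JsonHandler(BaseHTTPRequestHandler):
    def reply(self, payload):
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


class UserService(JsonHandler):
    def do_GET(self):
        self.reply({"user_id": 123, "name": "alice"})


def make_feed_service(user_service_port):
    class FeedService(JsonHandler):
        def do_GET(self):
            # Server role: handling the app's request.
            # Client role: calling the User Service in order to answer it.
            user = get_json(user_service_port, "/users/123")
            self.reply({"feed_for": user["name"], "posts": ["post-1", "post-2"]})
    return FeedService


def load_feed():
    user_srv, user_port = start(UserService)
    feed_srv, feed_port = start(make_feed_service(user_port))
    try:
        return get_json(feed_port, "/feed")  # stands in for the mobile app
    finally:
        for srv in (feed_srv, user_srv):
            srv.shutdown()
            srv.server_close()
```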

1.7 🎨 Illustrated Diagram

The diagram below shows the core client-server request-response cycle: a client sends a request, the server processes it and queries the database, and the response travels back. This is the fundamental pattern behind every internet application.

%%{init: {"theme": "base", "themeVariables": {"lineColor": "#64748b", "edgeLabelBackground": "#fff"}}}%%
flowchart LR
    C["💻 Client\n(Browser / App)"]
    S["⚙️ Server\n(Business Logic)"]
    DB["🗄️ Database\n(Data Storage)"]
    C -->|"① HTTP Request"| S
    S -->|"② Query"| DB
    DB -->|"③ Data"| S
    S -->|"④ HTTP Response"| C
    style C fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
    style S fill:#d1fae5,stroke:#059669,color:#064e3b
    style DB fill:#fff3e0,stroke:#d97706,color:#92400e

Reading the diagram: The client sends an HTTP Request ① to the server. The server queries ② the database for the data it needs, the database returns ③ the result, and the server sends an HTTP Response ④ back to the client. Most request-response interactions on the internet follow this four-step cycle.

1.8 ✅ When to Use

The client-server model is the default choice for virtually every internet application. You should use it whenever you have a centralized resource to share, business logic to protect, or data that needs to be consistent across users.

Use client-server when… | Avoid (consider P2P) when…
You have shared data that many users need to access | You need true decentralisation with no central authority (e.g. blockchain)
You want centralised access control and authentication | Users need to share files directly with each other (BitTorrent-style)
You need to update business logic without touching clients | You want to eliminate the server cost entirely
You need to scale the backend independently of clients | You need low-latency real-time communication between two specific peers
You want to monitor, log, and secure all traffic centrally | You require censorship resistance by design

Rule of thumb: If you are designing any application for users (social media, e-commerce, banking, streaming, messaging), use client-server. If you are designing a decentralised protocol or file-sharing network, consider Peer-to-Peer. In practice, the vast majority of system design problems use client-server.

1.9 🏗️ Real-world Example: Instagram

When you open Instagram on your phone and scroll through your feed, here is what happens behind the scenes:

Step | Actor | What Happens
① | 📱 Your Phone (Client) | Sends GET /feed?user_id=123&page=1 to Instagram
② | ⚖️ Load Balancer | Receives the request and routes it to one of many available API servers
③ | ⚙️ API Server | Checks who you follow, runs the ranking algorithm, decides which posts to show
④ | ⚡ Cache Server | Checked first: if your feed was recently built, it is returned straight from memory
⑤ | 🗄️ Database Server | On a cache miss, returns post metadata (captions, like counts, timestamps); images are stored separately
⑥ | ⚙️ API Server | Builds a JSON response with post data and image URLs, sends it back to your phone
⑦ | 📱 Your Phone (Client) | Receives JSON, makes separate requests to CDN servers to download the actual images
⑧ | 🌐 CDN Server | Delivers image files from the edge location nearest to you, with low latency

Notice: A single feed load touched five different server types (Load Balancer, API Server, Cache Server, Database Server, and CDN Server), even though your phone talks directly only to the load balancer and the CDN. This is how real large-scale systems work: many specialised servers working together to serve one client request.

New terms above? Load Balancer, Cache Server, and CDN will each get their own dedicated post in Phase 2 of this series. For now, just notice that a single client request touches multiple server types. That is the key insight from this example.
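
The cache and database steps in the table above follow a common "check the cache first, fall back to the database" pattern. Here is a minimal sketch of that idea, assuming plain in-memory dictionaries as stand-ins for the cache and database servers; the `FeedStore` class and its data are hypothetical, and a real cache would also expire entries, which is left out here.

```python
class FeedStore:
    """Hypothetical feed lookup: cache first, database on a miss."""

    def __init__(self, database):
        self.database = database   # stand-in for the database server
        self.cache = {}            # stand-in for the cache server
        self.db_reads = 0          # counts how often the database is hit

    def get_feed(self, user_id):
        # Fast path: the feed was built recently and is still in memory.
        if user_id in self.cache:
            return self.cache[user_id]
        # Slow path: read from the database, then remember the result.
        self.db_reads += 1
        feed = self.database.get(user_id, [])
        self.cache[user_id] = feed
        return feed


if __name__ == "__main__":
    store = FeedStore({123: ["post-9", "post-8", "post-7"]})
    store.get_feed(123)    # first call reads the database
    store.get_feed(123)    # second call is served from the cache
    print(store.db_reads)
```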

1.10 ⚖️ Trade-offs

✅ Advantages | ❌ Disadvantages
Centralised control: update the server and all clients get the update immediately, with no app store release needed | Single point of failure: if the server goes down, no client can work; requires redundancy and high-availability design
Security: sensitive business logic, API keys, and data stay on the server; clients never see internals | Server cost: running servers 24/7 at scale is expensive; requires infrastructure investment
Scalability: add more servers to handle more clients without changing client code | Network dependency: clients need a working internet connection; offline mode requires extra engineering
Consistency: all clients read from the same data source, so everyone sees the same information | Latency: every action requires a network round-trip to the server; cannot be fully instant
Maintainability: bugs are fixed in one place (server), not in millions of client devices | Bottleneck risk: a poorly designed server becomes a bottleneck under high traffic

1.11 🚫 Common Mistakes

# | ❌ Common Mistake | ✅ The Reality
1 | Server = physical machine | A server is a program, not a box. You can run a web server on your laptop right now. One physical machine can run dozens of server programs simultaneously on different ports.
2 | A server can never be a client | In microservices, services constantly switch roles. The Payment Service is a server to the frontend but a client to the Fraud Detection Service. Roles are relative, not fixed identities.
3 | Web server = application server | A web server (Nginx, Apache) serves static files. An application server (Node.js, Django) runs business logic. Most production systems have both doing different jobs.
4 | One server handles all requests | Large systems like Instagram run on thousands of servers across multiple data centres. Designing for a single server is the most common beginner mistake in system design.
5 | The client sees the server's internals | Clients only know the server's address and protocol. All internal logic (databases, services, business rules) is hidden. This is called encapsulation and is a security best practice.
6 | Start with complex architecture | Always start with the simple path: Client → Server → Database. Add load balancers, caches, and CDNs only when a specific problem justifies the complexity.

1.12 📝 Summary

  • Client initiates, Server responds: the client always makes the first move; the server waits and reacts.
  • A server is a program, not a physical machine: it can run on any hardware, including your laptop.
  • A server can be a client: in microservices, services call each other; roles are relative, not fixed.
  • Multiple server types work together: a single user request typically touches several specialised servers, such as an application server, a database, and more.
  • Always start simple: Client → Server → Database is the baseline; add complexity only when justified by scale or requirements.

1.13 🏋️ Design Challenge

🍕 Challenge: Design a food delivery app

You are designing a system like Uber Eats or DoorDash. Think through the following:
  • What are the different types of clients in your system? (Hint: there is more than one kind of user.)
  • What are the different types of servers you would need? List at least four.
  • Draw a simple diagram showing how a customer places an order, tracing the request from the customer's phone to the restaurant and back.
  • What happens if your main application server goes down while someone is placing an order?
Answer

Types of Clients (3 distinct roles):

  • 📱 Customer app (iOS/Android): places orders, tracks delivery in real time
  • 🍔 Restaurant dashboard (tablet/web app): receives new orders, marks them as ready
  • 🚗 Driver app (mobile): receives delivery assignments, navigates to pickup and drop-off

Types of Servers needed:

  • ⚙️ API Server: the main application server; handles all requests from all three client types
  • 💳 Payment Server: processes card charges securely when an order is placed
  • 🔔 Notification Server: sends real-time alerts to the restaurant and driver apps
  • 🗄️ Database Server: stores users, restaurants, menus, orders, and delivery status

Request flow when a customer places an order:

  1. Customer app (client) → sends POST /orders request to the API Server
  2. API Server validates the order and writes it to the Database Server
  3. API Server calls the Payment Server to charge the customer's card
  4. API Server tells the Notification Server to alert the restaurant
  5. Notification Server pushes the order to the Restaurant app (client)
  6. Restaurant accepts → API Server updates order status in the Database
  7. API Server responds to the customer app: order confirmed ✅

If the application server goes down:
Orders cannot be placed โ€” customers see an error. The fix is to run multiple application servers so if one fails, others continue handling requests. We will cover exactly how this works when we study Load Balancers in Phase 2.
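
The "run multiple application servers" idea can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: `AppServer` and `send_with_failover` are hypothetical names, servers are plain Python objects instead of networked processes, and the retry logic lives in the caller, whereas in practice a load balancer usually does this job.

```python
class AppServer:
    """Hypothetical application server that may be up or down."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def send_with_failover(servers, request):
    """Try each server in order and return the first successful response."""
    for server in servers:
        try:
            return server.handle(request)
        except ConnectionError:
            continue  # this server is down, try the next one
    raise RuntimeError("no healthy application server available")


if __name__ == "__main__":
    pool = [AppServer("app-1", healthy=False), AppServer("app-2")]
    print(send_with_failover(pool, "POST /orders"))
```

With a single server in the pool, one failure means every order fails; with two or more, the request is simply answered by a healthy one.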

1.14 ☁️ Cloud Service Mapping

In the cloud, a "server" is any service that receives and processes requests. The three main ways to run server code on any cloud platform are:

How to Run a Server | AWS (Primary) | GCP | Azure
Virtual machine: full control over the server environment | Amazon EC2 | Compute Engine | Azure VMs
Managed app hosting: deploy your code, cloud manages the server | Elastic Beanstalk / App Runner | App Engine / Cloud Run | Azure App Service
Serverless: a function that acts as a server, runs only when called | AWS Lambda | Cloud Functions | Azure Functions

Simplest AWS picture: A browser (client) sends a request → an EC2 instance or Lambda function (server) receives and processes it → the server sends a response back. That is the client-server model running in the cloud.

🌐 2. IP Address & DNS

Every device on the internet has a unique numeric address, an IP address, just like every house has a street address. But humans don't think in numbers. We use friendly names like youtube.com. DNS is the system that bridges this gap, translating the names we type into the addresses machines actually use. In this section you will learn what IP addresses are, how public and private addresses differ, how DNS resolves names step by step, and why both concepts are foundational to every system design decision you will make.

IP Address and DNS: how domain names are translated to IP addresses

2.1 🎯 Introduction

Imagine you type youtube.com into your browser. You know the name, but your computer does not know where YouTube's servers are located on the internet. It needs a numeric address. An IP address is that numeric address: a unique identifier assigned to every device connected to a network, from your laptop to YouTube's servers.

But here is the challenge: IP addresses look like 142.250.80.14. No human is going to memorise that. So the internet uses a naming system called DNS (Domain Name System) that automatically translates youtube.com into 142.250.80.14 every time you press Enter. Without IP addresses, devices cannot communicate. Without DNS, humans cannot use the internet practically.

2.2 💡 Why It Matters

IP addresses and DNS are not optional infrastructure: they are the foundation on which every internet system runs. Cloudflare's public DNS resolver (1.1.1.1) alone handles over 1 trillion DNS queries per month. Google's DNS (8.8.8.8) processes billions of queries daily. Every website visit, API call, and app request begins with a DNS lookup.

  • In system design, DNS is how traffic is routed to the right servers: load balancers, CDN edge nodes, and multi-region endpoints all use DNS.
  • When you add a new server or replace a failed one, you update a DNS record, not every client application.
  • Private vs public IP addressing determines what parts of your system are reachable from the internet, which is a critical security decision.
  • DNS TTL directly controls how quickly your system can recover from failures and how smoothly you can migrate servers.

Key insight: DNS is where system design meets the internet. Every load balancer, CDN, and API gateway in this series is ultimately reached through a DNS record. Understanding DNS now means every future topic will make more sense.

2.3 🏠 Real-world Analogy

Think of a city's postal system. Every building has a street address (the IP address): a precise numeric location that delivery services use to physically find it. But people don't walk around saying "I'm going to 221B Baker Street"; they say "I'm going to Sherlock Holmes' house." The phonebook or directory is what translates that name into the actual address.

Real World | Internet / Software | Role
🏠 Street address (221B Baker St) | IP address (142.250.80.14) | The actual numeric location machines use to connect
🏷️ Person or place name (Sherlock's house) | Domain name (youtube.com) | The human-friendly name people remember and type
📖 Phonebook / directory | DNS (Domain Name System) | Translates names into addresses automatically
📬 Speed-dial / recent calls list | DNS cache (browser/OS/resolver) | Stores recently looked-up addresses for quick re-use

Just as you would look up a name in a phonebook to find the phone number, your browser looks up a domain name in DNS to find the IP address, every single time, unless the answer is already cached.

2.4 📖 Key Terms

Term | Simple Definition | Quick Example
IP Address | A unique numeric address identifying any device on a network | 142.250.80.14 (a YouTube server)
IPv4 | 32-bit address written as four dotted decimal numbers; supports ~4.3 billion addresses | 8.8.8.8 (Google DNS), 192.168.1.1 (home router)
IPv6 | 128-bit address written in hexadecimal; virtually unlimited addresses | 2001:db8::7334
Public IP | Reachable from the internet; your server's external address | Load balancer, CDN, API gateway endpoint
Private IP | Internal-only, not routable on the internet | 10.0.0.5 (database inside a VPC)
Domain Name | Human-readable name for a server or service | youtube.com, api.stripe.com
DNS | Domain Name System, the internet's distributed phonebook | Translates youtube.com → 142.250.80.14
DNS Resolver | The component that performs the full DNS lookup on a client's behalf | 8.8.8.8 (Google), 1.1.1.1 (Cloudflare)
DNS Record | A specific entry in the DNS system mapping a name to a value | A record, CNAME record, MX record
TTL | Time To Live: how long a DNS answer can be cached before it must be re-fetched | TTL = 300 means cache for 5 minutes
Authoritative DNS | The final DNS server that has the definitive answer for a domain | YouTube's own nameservers have youtube.com records

2.5 🔢 How It Works

Here is the exact sequence of events when you type youtube.com in your browser and press Enter. This process completes in milliseconds, but involves up to 9 steps behind the scenes.

Step | What Happens
① Browser cache | Browser checks if it already has a cached answer for youtube.com. If yes, use it immediately; no DNS query is needed.
② OS cache | If not in browser cache, the operating system checks its own DNS cache. If found, return it.
③ Ask DNS Resolver | If no cached answer, the OS asks the configured DNS Resolver (e.g. 8.8.8.8 or your ISP's resolver).
④ Resolver → Root DNS | Resolver asks a Root DNS server: "Who manages .com domains?" Root returns the address of the .com TLD servers.
⑤ Resolver → TLD DNS | Resolver asks the .com TLD server: "Who manages youtube.com?" TLD returns the address of YouTube's authoritative nameservers.
⑥ Resolver → Authoritative DNS | Resolver asks YouTube's own authoritative DNS: "What is the IP address of youtube.com?" Authoritative returns: 142.250.80.14 (TTL: 300s).
⑦ Resolver caches + responds | Resolver caches the answer for 300 seconds, then returns the IP address to your browser.
⑧ Browser connects | Browser now knows the IP address and opens a TCP connection to 142.250.80.14 on port 443 (HTTPS).
⑨ YouTube responds | YouTube's server receives the request and sends back the homepage HTML. You see YouTube.

Fast path: Steps ④ to ⑥ are skipped whenever the resolver already has a cached answer, which is most of the time for popular domains. Caching is what makes DNS fast enough to be invisible to users.
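
The lookup sequence above can be simulated without touching the network. The sketch below is a toy model, not a real DNS client: the `ROOT`, `TLD`, and `AUTHORITATIVE` dictionaries and the server labels inside them are invented stand-ins for the three levels of the hierarchy, and the `Resolver` class only shows how a cached answer short-circuits the walk until its TTL runs out.

```python
import time

# Hypothetical stand-ins for the three levels of the DNS hierarchy.
ROOT = {"com": "tld-com"}                          # root: who runs each TLD
TLD = {"tld-com": {"youtube.com": "ns-youtube"}}   # TLD: each domain's nameserver
AUTHORITATIVE = {"ns-youtube": {"youtube.com": ("142.250.80.14", 300)}}


class Resolver:
    """Toy caching resolver: walks root -> TLD -> authoritative on a miss."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.cache = {}             # domain -> (ip, expires_at)
        self.upstream_lookups = 0   # how many full walks were needed

    def resolve(self, domain):
        cached = self.cache.get(domain)
        if cached and cached[1] > self.clock():
            return cached[0]                          # fast path: still fresh
        self.upstream_lookups += 1
        tld_server = ROOT[domain.rsplit(".", 1)[-1]]  # ask root about ".com"
        nameserver = TLD[tld_server][domain]          # ask the TLD about the domain
        ip, ttl = AUTHORITATIVE[nameserver][domain]   # ask the authoritative server
        self.cache[domain] = (ip, self.clock() + ttl)
        return ip


if __name__ == "__main__":
    resolver = Resolver()
    print(resolver.resolve("youtube.com"), resolver.upstream_lookups)
    print(resolver.resolve("youtube.com"), resolver.upstream_lookups)
```

The clock is injectable so that TTL expiry can be exercised without waiting five minutes.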

2.6 🔀 Types & Variations

Types of IP Addresses

  • 4️⃣ IPv4: 4 numbers separated by dots, each 0–255. Example: 8.8.8.8. Supports ~4.3 billion addresses, which are largely exhausted. Still the most widely used format today.
  • 6️⃣ IPv6: 128-bit hex format. Example: 2001:db8::7334. Supports 340 undecillion addresses, which is effectively unlimited. Growing adoption for new infrastructure.
  • 🌐 Public IP: Assigned by your internet or cloud provider and reachable from the internet. Every internet-facing entry point (load balancer, CDN, API gateway) needs one. Example: 203.0.113.5.
  • 🔒 Private IP: Not routable on the internet. Used for internal services such as databases, caches, and backend APIs. Common ranges: 10.x.x.x, 172.16.x.x through 172.31.x.x, and 192.168.x.x.
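
The size difference between the two address formats is plain arithmetic: 32 bits versus 128 bits. As a small illustration, the sketch below uses Python's standard `ipaddress` module to parse the two example addresses from this section; the helper name `describe` is made up for the example.

```python
import ipaddress


def describe(address):
    """Return (IP version, address length in bits, size of the address space)."""
    ip = ipaddress.ip_address(address)
    bits = ip.max_prefixlen          # 32 for IPv4, 128 for IPv6
    return ip.version, bits, 2 ** bits


if __name__ == "__main__":
    print(describe("8.8.8.8"))          # IPv4: 2**32, about 4.3 billion
    print(describe("2001:db8::7334"))   # IPv6: 2**128, about 3.4 x 10**38
```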

DNS Record Types

Record | What It Does | Example
A | Maps a domain name to an IPv4 address | youtube.com → 142.250.80.14
AAAA | Maps a domain name to an IPv6 address | youtube.com → IPv6 address
CNAME | Maps a domain name to another domain name (alias) | www.example.com → example.com
MX | Specifies the mail server for a domain | @example.com → mail.example.com
TXT | Stores text for verification or security policies | SPF, DKIM, domain ownership proof
NS | Specifies the authoritative nameservers for a domain | Delegates DNS management to a provider
2.7 🎨 Illustrated Diagram

The diagram below shows the full DNS resolution journey, from your browser typing a domain name to connecting to the actual server.

%%{init: {"theme": "base", "themeVariables": {"lineColor": "#64748b", "edgeLabelBackground": "#fff"}}}%%
flowchart TD
    C["💻 Browser\n(types youtube.com)"]
    Res["🔍 DNS Resolver\n(e.g. 8.8.8.8)"]
    Root["🌐 Root DNS\n(knows .com, .org, .net)"]
    TLD["📋 .com TLD DNS\n(knows youtube.com nameservers)"]
    Auth["📌 Authoritative DNS\n(YouTube's own nameservers)"]
    S["🖥️ YouTube Server\n(142.250.80.14)"]
    C -->|"① Query: youtube.com?"| Res
    Res -->|"② Where is .com?"| Root
    Root -->|"③ Ask .com TLD servers"| Res
    Res -->|"④ Where is youtube.com?"| TLD
    TLD -->|"⑤ Ask YouTube nameservers"| Res
    Res -->|"⑥ IP of youtube.com?"| Auth
    Auth -->|"⑦ 142.250.80.14 (TTL 300s)"| Res
    Res -->|"⑧ Here's the IP"| C
    C -->|"⑨ Connect!"| S
    style C fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
    style Res fill:#fff3e0,stroke:#d97706,color:#92400e
    style Root fill:#f3e5f5,stroke:#8e24aa,color:#4a148c
    style TLD fill:#f3e5f5,stroke:#8e24aa,color:#4a148c
    style Auth fill:#f3e5f5,stroke:#8e24aa,color:#4a148c
    style S fill:#d1fae5,stroke:#059669,color:#064e3b

Reading the diagram: Your browser asks ① the DNS Resolver for youtube.com. The Resolver doesn't know the answer, so it asks ② Root DNS, which points it ③ to the .com TLD servers. The Resolver asks ④ the TLD, which points it ⑤ to YouTube's own nameservers. The Resolver asks ⑥ those nameservers, which return ⑦ the final IP address with a TTL of 300 seconds. The Resolver caches the answer and returns it ⑧. Your browser then connects directly to YouTube's server ⑨.

2.8 ✅ When to Use

Scenario | Use This | Why
Internet-facing entry points (load balancer, CDN, API gateway) | Public IP | External clients need to reach this endpoint over the internet
Internal services (database, cache, backend API) | Private IP | These services should never be directly reachable from the internet; this is a security best practice
Stable services that rarely change | High TTL (3600s+) | Reduces DNS query volume and improves response speed for users
Before a planned server migration or failover setup | Low TTL (60–300s) | Changes propagate quickly, so users switch to the new IP within minutes instead of hours
New infrastructure (greenfield projects) | IPv6 (with IPv4 fallback) | Future-proof; IPv4 addresses are exhausted and increasingly expensive

Golden rule: In production systems, only your load balancers, CDNs, and API gateways have public IPs. Everything behind them (databases, caches, internal services) uses private IPs and is never exposed to the internet.
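
A quick way to check which side of that line an address falls on is to test it against the three private IPv4 blocks defined in RFC 1918: 10.0.0.0/8, 172.16.0.0/12 (which spans 172.16.x.x through 172.31.x.x), and 192.168.0.0/16. The sketch below does this with the standard `ipaddress` module; the helper name `is_rfc1918` is made up for the example.

```python
import ipaddress

# The three private IPv4 blocks from RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 through 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the address lies in one of the private IPv4 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


if __name__ == "__main__":
    for addr in ("10.0.0.5", "172.31.255.1", "172.32.0.1", "192.168.1.1", "8.8.8.8"):
        print(addr, is_rfc1918(addr))
```

The explicit network list is used instead of the module's `is_private` attribute because `is_private` also covers other reserved ranges, such as loopback and documentation addresses.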

2.9 🏗️ Real-world Example: How Instagram Routes Global Traffic

When you open the Instagram app from Tokyo, here is how DNS and IP addressing route your request to the nearest server:

Step | Actor | What Happens
① | 📱 Instagram App (Client) | Sends a DNS query: "What is the IP address of api.instagram.com?"
② | 🔍 DNS Resolver | Asks Instagram's authoritative DNS; the query can carry a hint about where the user is (for example, part of the client's IP subnet via EDNS Client Subnet)
③ | 📌 Instagram Authoritative DNS | Returns the IP of Instagram's nearest CDN/edge server: a Tokyo edge location, not a US server
④ | 📱 Instagram App | Connects to the Tokyo edge IP (public IP). This edge server is internet-facing.
⑤ | 🌐 Tokyo Edge Server | Forwards the request to Instagram's backend using internal private IPs (10.x.x.x); the backend is never exposed publicly
⑥ | ⚙️ Instagram Backend | Fetches feed data from databases (private IPs), builds a JSON response, returns it through the edge server back to your phone

New term above? Step ③ uses GeoDNS: DNS that returns different IPs based on where the user is located, routing them to the nearest data center. This will be covered in full when we reach Data Centers & Multi-Region in Phase 2.

2.10 ⚖️ Trade-offs

✅ Advantages | ❌ Disadvantages
IPv4: universally supported, simple 4-part notation, compatible with all existing tools | IPv4: address space exhausted (~4.3 billion total), prices rising, NAT workarounds add complexity
IPv6: virtually unlimited addresses, security (IPsec) designed in from the start, future-proof | IPv6: slower ecosystem adoption, some older systems and tools don't fully support it
Public IP: directly reachable from anywhere, so it is easy for clients to connect | Public IP: exposed to the internet, so it requires firewalls, DDoS protection, and regular security hardening
Private IP: hidden from the internet, with no direct exposure | Private IP: not directly reachable externally; requires NAT, VPN, or a gateway for external access
High TTL: fewer DNS queries, faster responses for users, lower DNS server load | High TTL: DNS changes propagate slowly, which is a problem during migrations, incidents, or failovers
Low TTL: DNS changes take effect quickly, which is good for dynamic systems and fast failover | Low TTL: more DNS queries per minute, which increases load on DNS infrastructure
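
The TTL trade-off can be put into numbers with a simplified model. Assume a single caching resolver that re-fetches a record at most once per TTL window, and that a stale answer can survive for at most one TTL after the record changes. Under those assumptions (and with `ttl_tradeoff` being a name invented for this example), the arithmetic looks like this:

```python
SECONDS_PER_DAY = 86_400


def ttl_tradeoff(ttl_seconds):
    """Return (max refetches per resolver per day, worst-case stale seconds)."""
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    refetches_per_day = SECONDS_PER_DAY // ttl_seconds   # query load
    worst_case_stale = ttl_seconds                       # propagation delay
    return refetches_per_day, worst_case_stale


if __name__ == "__main__":
    for ttl in (60, 300, 3600, 86_400):
        print(ttl, ttl_tradeoff(ttl))
```

Dropping the TTL from 3600 to 60 seconds cuts the worst-case stale window from an hour to a minute, at the price of up to 60 times as many upstream queries per resolver.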

2.11 🚫 Common Mistakes

# | ❌ Common Mistake | ✅ The Reality
1 | DNS sends the website content | DNS only resolves names to IP addresses. It does not send any data, HTML, or API responses. That is the server's job, after DNS has finished.
2 | Changing a DNS record is instant | DNS changes can take minutes to hours to propagate globally depending on TTL. Old answers remain cached until their TTL expires.
3 | One domain = one IP address | Production systems often have one domain pointing to dozens or hundreds of IPs: CDN edge nodes, load balancer cluster IPs, regional endpoints.
4 | Private IP = secure IP | Private IPs are just not internet-routable; they still need firewall rules, encryption, and access controls. "Private" does not mean "automatically secure."
5 | 192.168.x.x is a server's real IP | This is a private IP range used for internal networks. Internet-facing servers have public IPs. When you see 192.168.x.x it means you're looking at an internal address.

2.12 📝 Summary

  • IP address is the unique numeric identifier of any device on a network; machines use it to reach each other.
  • IPv4 (4.3B addresses, largely exhausted) vs IPv6 (virtually unlimited): new infrastructure should prefer IPv6.
  • Public IPs face the internet; private IPs are for internal communication and should never be exposed directly.
  • DNS translates human-readable domain names into IP addresses through a 4-level hierarchy: Resolver → Root → TLD → Authoritative.
  • TTL controls how long DNS answers are cached: low TTL for fast changes, high TTL for fewer queries.
  • DNS records (A, CNAME, MX, TXT, NS) each serve a specific purpose: A records map domains to IPs, CNAME creates aliases, MX handles email.

2.13 Design Challenge

Challenge: Design a global web application

Your company is launching a web application with servers in 3 regions: US East, Europe (Frankfurt), and Asia Pacific (Tokyo). Answer the following:
  • European users should connect to Frankfurt servers, Asian users to Tokyo servers. How do you configure DNS to achieve this?
  • Your TTL is set to 86400 seconds (24 hours). Your primary server fails. How long before users fail over to the backup? What should you have done differently?
  • Your backend databases must never be reachable from the internet. How do you configure IP addressing to enforce this?
Answer:

1. Route users to nearest region:
Use DNS-based geographic routing. Configure your DNS provider to return different IPs based on the user's location: Frankfurt's load balancer IP for European users, Tokyo's IP for Asian users. AWS Route 53 offers latency-based and geolocation routing policies for exactly this.

2. The TTL problem:
With TTL = 86400 seconds, clients cache the old IP for up to 24 hours after your DNS record changes. During a server failure, those clients can't reach the new server until their cache expires, meaning up to 24 hours of downtime for some users.

Fix: Always lower TTL to 60-300 seconds before a planned migration. For emergency failover, use DNS health checks (e.g. Route 53 Health Checks) that automatically update DNS records when a server fails, but these only propagate quickly if TTL is low.
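
To see why the TTL value matters during a failover, here is a small simulation of a resolver cache. It is a sketch under stated assumptions, not how any particular resolver is implemented: the `DnsCache` class, the domain `app.example.com`, and the IP address (taken from a range reserved for documentation) are all hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class DnsCache:
    """Toy resolver cache: an answer is reused until its TTL expires."""

    def __init__(self):
        self._entries = {}  # name -> (ip, expiry timestamp in seconds)

    def put(self, name, ip, ttl_seconds, now):
        self._entries[name] = (ip, now + ttl_seconds)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None:
            return None          # nothing cached: a real resolver would query DNS
        ip, expires_at = entry
        if now >= expires_at:
            del self._entries[name]
            return None          # expired: the next lookup sees the updated record
        return ip


OLD_IP = "203.0.113.10"  # hypothetical failed server (documentation address range)

slow = DnsCache()
slow.put("app.example.com", OLD_IP, ttl_seconds=86400, now=0)
print(slow.get("app.example.com", now=3600))  # still the old IP an hour after the failure

fast = DnsCache()
fast.put("app.example.com", OLD_IP, ttl_seconds=60, now=0)
print(fast.get("app.example.com", now=61))    # None: the cache expired, so the client re-resolves
```

With the 24-hour TTL the cached (dead) address keeps being served; with the 60-second TTL the client asks DNS again almost immediately and can pick up the failover record.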

3. Protect your databases:
Give all backend databases private IPs only (e.g. 10.0.0.5). Place them in a private subnet inside a VPC with no internet gateway attached. Only your application servers, which have both a public IP and a private IP, can communicate with the databases on their private IP addresses. The databases are invisible to the internet.

2.14 Cloud Service Mapping

DNS management and IP routing are provided as managed services on every major cloud platform:

Concept | AWS (Primary) | GCP | Azure
DNS hosting & record management | Amazon Route 53 | Cloud DNS | Azure DNS
GeoDNS & latency-based routing | Route 53 routing policies (latency, geolocation, failover) | Cloud DNS + Traffic Director | Azure Traffic Manager
Health checks & DNS failover | Route 53 Health Checks | Cloud Monitoring + uptime checks | Azure Traffic Manager health probes
AWS-first picture: youtube.com is managed in Route 53. Route 53 returns different IPs based on the user's region (latency-based routing). Each region's load balancer has a public IP; backend servers use private IPs inside a VPC.

3. HTTP / HTTPS

You now know that DNS translates youtube.com into an IP address, but what happens next? Once your browser has the server's address, it needs a common language to ask for data and receive responses. That language is HTTP. When that communication is encrypted, it becomes HTTPS. In this section you will learn how HTTP requests and responses are structured, the five HTTP methods every engineer must know, what status codes mean, why HTTP is stateless, and why HTTPS is non-negotiable in production systems.

HTTP and HTTPS: the protocol for client-server communication

3.1 Introduction

Imagine you search for "laptop" on Amazon. Your browser sends a precisely structured message: GET /search?q=laptop HTTP/1.1. That is an HTTP request. Amazon's server processes it and sends back an HTTP response with product data. Every web page you visit, every API call your app makes, every file you download: all of it travels as HTTP or HTTPS.

HTTP (HyperText Transfer Protocol) defines how clients and servers communicate: what a request looks like, what a response contains, and what each side can expect. HTTPS is HTTP with TLS encryption, so anyone who intercepts the data in transit cannot read or silently alter it.

3.2 Why It Matters

When system designers draw an arrow between a client and a server, that arrow IS HTTP/HTTPS. Every REST API, web application, mobile app, and most microservice-to-microservice calls use HTTP as the communication protocol.

  • HTTP methods (GET, POST, PUT, PATCH, DELETE) are how you design clean, predictable APIs that developers can understand instantly.
  • Status codes (200, 404, 500) are how clients know whether a request succeeded or failed; without them, every error looks the same.
  • HTTP is stateless: every request must carry its own authentication. This single property shapes how you design sessions and scalability in every distributed system.
  • HTTPS is non-negotiable in production: passwords, payment details, tokens, and personal data must always be encrypted in transit.
In system design: Always say "clients communicate over HTTPS". Never draw an arrow without knowing that arrow means an HTTP/HTTPS call. This shows you understand both the protocol and the security requirement.

3.3 Real-world Analogy

Think of HTTP like placing a phone order at a restaurant. There is a structured format both sides agree on: you say what you want (request), the restaurant confirms and gives you the result (response). Both sides follow the same script, and that script is the protocol.

Phone Order World | HTTP World | Role
Calling the restaurant | Opening an HTTP connection | Initiating the conversation
"I'd like a pizza, deliver to 5 Main St" | HTTP Request (POST /orders) | The client's structured ask
"Confirmed, #ORD123, 30 minutes" | HTTP Response (201 Created + JSON) | The server's structured reply
The pizza itself | Response body (JSON data) | The actual content returned
Calling on an encrypted private line | HTTPS (HTTP over TLS) | Securing the conversation from eavesdroppers

3.4 Key Terms

Term | Simple Definition | Quick Example
HTTP | Protocol defining how clients and servers communicate | All web requests use HTTP or HTTPS
HTTPS | HTTP over TLS: encrypted, secure HTTP | https://amazon.com (the padlock in your browser)
Request | Message from client → server asking for data or an action | GET /products (give me the product list)
Response | Server's reply; contains status, headers, and body | 200 OK + JSON product data
HTTP Method | The type of action the client wants to perform | GET (read), POST (create), DELETE (remove)
Status Code | A 3-digit number indicating success or failure | 200 = OK, 404 = Not Found, 500 = Server Error
Header | Extra metadata attached to a request or response | Authorization: Bearer token, Content-Type: application/json
Body / Payload | The actual data content of a request or response | JSON object with login credentials or product list
Stateless | Server does not remember previous requests; every request is independent | Every API call must include an auth token
TLS | Transport Layer Security: the encryption layer that makes HTTPS secure | The padlock icon; encrypts all data in transit
REST API | API design style using HTTP methods and URLs to represent resources | GET /users/123 (fetch user 123)
Port 80 / 443 | Default ports: HTTP uses 80, HTTPS uses 443 | Servers listen on these ports for incoming requests

3.5 How It Works

An HTTP exchange has two halves: a request (client → server) and a response (server → client). Each has a defined, structured format that every client and server in the world understands.

HTTP Request Structure

Every HTTP request has three parts: a request line (method + URL + HTTP version), headers (metadata), and an optional body (data for POST/PUT/PATCH). Here is a real search request to Amazon:

    GET /search?q=laptop HTTP/1.1
    Host: amazon.com
    Authorization: Bearer eyJhbGci...
    Accept: application/json
    User-Agent: Mozilla/5.0 (Chrome/120)

In plain English: "Hey Amazon (Host), please give me (GET) the search results for 'laptop' (/search?q=laptop). Here is my login token (Authorization). I want the response as JSON (Accept)."

HTTP Response Structure

The server replies with a status line (version + status code + text), headers, and a body containing the actual data returned.

    HTTP/1.1 200 OK
    Content-Type: application/json
    Cache-Control: max-age=60

    {
      "products": [
        {"name": "Laptop Pro", "price": 999},
        {"name": "Laptop Air", "price": 799}
      ]
    }

In plain English: "Request successful (200 OK). Here is the data as JSON (Content-Type). You can cache this for 60 seconds (Cache-Control)."

Key insight: The request line tells the server WHAT to do. The headers add context (who you are, what format you accept). The body carries data (typically only in POST/PUT/PATCH). The response status code tells you the outcome before you even read the body.
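
To make the three-part structure concrete, the sketch below splits a raw HTTP/1.1 request into its request line, headers, and body. The function name `parse_http_request` is made up for this example, and the parser is deliberately simplified (it assumes well-formed input and ignores details such as repeated headers); real servers rely on a proper HTTP library.

```python
def parse_http_request(raw: str) -> dict:
    """Split a raw HTTP/1.1 request into request line, headers, and body."""
    # A blank line (CRLF CRLF) separates the head from the optional body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: METHOD TARGET VERSION
    method, target, version = lines[0].split(" ", 2)

    # Headers: "Name: value" pairs; header names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


raw = (
    "GET /search?q=laptop HTTP/1.1\r\n"
    "Host: amazon.com\r\n"
    "Accept: application/json\r\n"
    "\r\n"
)
request = parse_http_request(raw)
print(request["method"], request["target"], request["headers"]["host"])
```

On the wire, each line of an HTTP/1.1 message ends with a carriage return plus line feed (`\r\n`), which is why the parser splits on that sequence rather than on a bare newline.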

3.6 Types & Variations

HTTP has several key building blocks: methods (action to perform), status codes (what happened), headers (metadata), body/payload (data), the critical stateless property, and the HTTPS/TLS security layer. Each is explained below.

A. HTTP Methods: The Five Actions

Method | Meaning | Has Body? | Changes Server Data?
GET | Read / fetch data | No | No (safe to repeat)
POST | Create new data | Yes | Yes (creates something new)
PUT | Replace entire resource | Yes | Yes (replaces completely)
PATCH | Update part of a resource | Yes | Yes (partial update only)
DELETE | Remove a resource | No | Yes (deletes permanently)

GET: Read data. Fetches data without changing anything on the server. Safe to repeat: refreshing a page just sends the same GET request again.

    GET /products/123 HTTP/1.1
    Host: amazon.com

Action | GET Request
View YouTube video details | GET /videos/abc123
Load Instagram profile | GET /users/james
Search products | GET /search?q=laptop
Read post comments | GET /posts/10/comments

POST: Create new data. Sends data in the body to create something new. Unlike GET, POST is not idempotent: repeating a POST order request creates two separate orders.

    POST /orders HTTP/1.1
    Content-Type: application/json

    {
      "items": ["laptop_123"],
      "address": "5 Main St, Tokyo"
    }

Action | POST Request
Create account | POST /users
Login | POST /login
Place order | POST /orders
Post comment | POST /posts/10/comments

PUT: Replace entire resource. Replaces the full resource with a new version. You must send ALL fields; any field not included is removed.

    PUT /users/123 HTTP/1.1
    Content-Type: application/json

    {
      "name": "James Fernando",
      "email": "james.new@example.com",
      "city": "Osaka"
    }

PATCH: Update part of a resource. Updates only the fields you send. More efficient than PUT when you only need to change one or two fields.

    PATCH /users/123 HTTP/1.1
    Content-Type: application/json

    { "city": "Osaka" }

PUT vs PATCH: PUT = replace the whole object (must send everything). PATCH = change only what you specify (send only changed fields). In practice, PATCH is often preferred for small updates because it sends less data and cannot accidentally drop a field that was left out.
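
The difference between the two update styles is easy to see on a plain dictionary. This is a minimal sketch of the semantics only: `apply_put` and `apply_patch` are hypothetical helper names, and the stored user record is invented for the example.

```python
def apply_put(stored: dict, payload: dict) -> dict:
    """PUT semantics: the payload becomes the whole resource."""
    return dict(payload)


def apply_patch(stored: dict, payload: dict) -> dict:
    """PATCH semantics: only the supplied fields change."""
    updated = dict(stored)
    updated.update(payload)
    return updated


user = {"name": "James Fernando", "email": "james@example.com", "city": "Tokyo"}

print(apply_put(user, {"city": "Osaka"}))    # name and email are gone
print(apply_patch(user, {"city": "Osaka"}))  # name and email are kept
```

Sending only `{"city": "Osaka"}` with PUT silently drops the other fields, which is exactly the mistake the "must send everything" rule warns about.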

DELETE: Remove a resource. Permanently removes the identified resource.

    DELETE /comments/987 HTTP/1.1
    Authorization: Bearer eyJhbGci...

Action | DELETE Request
Delete comment | DELETE /comments/987
Cancel order | DELETE /orders/ORD123
Remove saved address | DELETE /addresses/5

B. HTTP Status Codes: What Happened?

Status codes are three-digit numbers in every HTTP response. They tell the client immediately, before reading the body, whether the request succeeded or failed. Memorise these eight codes: they cover most of what you will encounter in real systems.

Code | Meaning | Typical Cause | Example
200 OK | Request succeeded | Successful GET, PUT, PATCH | GET /products/123 → product found
201 Created | New resource created | Successful POST | POST /orders → order placed
400 Bad Request | Client sent invalid data | Missing field, wrong format | Email format wrong, required field empty
401 Unauthorized | Not authenticated | No token, expired token | GET /my-orders without login → 401
403 Forbidden | Authenticated but not allowed | Valid login, wrong permission | Normal user tries DELETE /admin/users/55
404 Not Found | Resource does not exist | Wrong ID, deleted resource | GET /products/999999 → not found
429 Too Many Requests | Rate limit exceeded | Too many calls in short time | Repeated login attempts blocked
500 Internal Server Error | Server failed while handling the request | Unhandled exception, bug | GET /orders → server database crashed
401 vs 403: 401 = "I don't know who you are, so log in first." 403 = "I know who you are, but you're not allowed to do this." A request with no token → 401. A normal user trying an admin action → 403.
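
The 401 vs 403 rule can be written as a two-step check: first identify the caller, then check what they may do. The sketch below assumes a hypothetical in-memory `sessions` dictionary that maps tokens to users; the function name `check_access` and the role names are illustrative only.

```python
def check_access(token, required_role, sessions):
    """Return the HTTP status code an API would send for this request."""
    user = sessions.get(token) if token else None
    if user is None:
        return 401   # no token, or a token the server does not recognise
    if required_role not in user["roles"]:
        return 403   # identity is known, but the permission is missing
    return 200


# Hypothetical session store: token -> user record
sessions = {
    "token-user": {"name": "james", "roles": {"user"}},
    "token-admin": {"name": "root", "roles": {"user", "admin"}},
}

print(check_access(None, "admin", sessions))           # 401
print(check_access("token-user", "admin", sessions))   # 403
print(check_access("token-admin", "admin", sessions))  # 200
```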

C. HTTP Headers: Metadata on Every Request

Headers are key-value pairs that carry metadata. Think of them like labels on a package: the package contains the main item (the body), but the labels tell the receiver what type of item it is, who sent it, and how it should be handled.

Header | Meaning | Example
Host | The domain the client is requesting | amazon.com
Authorization | Login token, Bearer token, or API key | Bearer eyJhbGci...
Content-Type | Format of the request body being sent | application/json
Accept | Format the client wants in the response | application/json
Cache-Control | Caching instructions | max-age=60 (cache 60s)
User-Agent | Browser or client app info | Mozilla/5.0 (Chrome/120)
Cookie | Session or tracking info sent by browser | session_id=abc123

Here is what a real POST request with authentication headers looks like:

    POST /orders HTTP/1.1
    Authorization: Bearer eyJhbGci...
    Content-Type: application/json
    Accept: application/json

    {
      "items": ["laptop_123"],
      "address": "5 Main St, Tokyo"
    }

D. HTTP Body / Payload

The body is the actual data content. GET and DELETE requests usually have no body because the URL carries all the information. POST, PUT, and PATCH carry data in the body; this is how you send new or updated data to the server.

In modern APIs, the body is almost always JSON because it is readable by both humans and machines. Example login request body:

{ "email": "james@example.com", "password": "mypassword" }

And the server's response body (after placing an order):

{ "order_id": "ORD-20260530-123", "status": "confirmed", "estimated_delivery": "30 minutes", "total": 4500 }

E. HTTP Is Stateless: Critical for Scalability

This single property shapes every scalability decision you will make: HTTP is stateless. The server does not automatically remember anything about a previous request. Every request is treated as completely independent.

Real-world analogy: Imagine calling a customer support center. Every time you call, a different agent answers. That agent has no memory of your previous calls, so you must re-identify yourself every time: "Hi, my name is James, customer ID 12345, calling about order ORD-123." HTTP works the same way: every request must carry enough information for the server to understand who you are and what you are allowed to do.

Because the server remembers nothing, the client includes an authentication token, cookie, or session ID in every request header:

    GET /my-orders HTTP/1.1
    Authorization: Bearer eyJhbGci...    ← identity proof on EVERY request

Why statelessness is great for scalability:

  • Any server in a cluster can handle any request, because the request contains all the information the server needs
  • Load balancers can route requests to any available server; no "sticky sessions" needed
  • If a server crashes, another server picks up the next request, and no session state is lost with the crashed machine
  • Auto-scaling works cleanly: new servers are immediately ready to handle requests
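
One common way to let any server verify a request without shared session memory is a signed token. The sketch below illustrates the idea with an HMAC signature from Python's standard library. It is a simplified stand-in for real token formats such as JWT (it has no expiry, for example), and `SECRET`, `issue_token`, and `verify_token` are names invented for this example.

```python
import hashlib
import hmac

SECRET = b"demo-shared-secret"  # assumption: every server in the cluster holds this key


def issue_token(user_id: str) -> str:
    """Create a token of the form '<user_id>.<signature>'."""
    signature = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{signature}"


def verify_token(token: str):
    """Return the user id if the signature is valid, otherwise None."""
    user_id, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    if user_id and hmac.compare_digest(signature, expected):
        return user_id
    return None


token = issue_token("user-123")      # issued by server A at login
print(verify_token(token))           # verified by server B without any shared session store
```

Because verification needs only the token and the shared key, a load balancer can send each request to a different server and the result is the same.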

F. HTTP vs HTTPS: Why Encryption Matters

Feature | HTTP | HTTPS
Security | ❌ Plaintext: anyone on the path can intercept and read | ✅ TLS encrypted: unreadable in transit
Default port | 80 | 443
URL prefix | http:// | https://
Safe for passwords, payments, tokens | ❌ Never | ✅ Yes
Browser padlock shown | No (warning shown instead) | Yes
Production use | Only internal services in private networks | Always for external-facing APIs and websites
Without HTTPS: Anyone between the client and server (on the same Wi-Fi, at the ISP, or a malicious middle actor) can read everything: passwords, tokens, credit card numbers, personal messages. This is called a man-in-the-middle attack. HTTPS makes all of this data unreadable to anyone who intercepts it.

G. TLS / SSL: How HTTPS Encrypts

TLS (Transport Layer Security) is the security layer under HTTPS. You may hear "SSL", which is the older name; modern systems use TLS. TLS provides three guarantees for every HTTPS connection:

TLS Guarantee | What It Means | Analogy
Encryption | Data is scrambled; only client and server can read it | Sending a locked box where only the receiver has the key
Authentication | Browser verifies the server is who it claims to be (via TLS certificate) | Checking the ID of the person before handing over the package
Integrity | Data cannot be silently modified in transit | Tamper-evident seal: any modification is detected

The TLS handshake (happens automatically in milliseconds before the first HTTP request):

Step | What Happens
① | Browser connects to server on port 443 and says "I want a secure connection"
② | Server sends its TLS certificate (issued by a trusted Certificate Authority like Let's Encrypt or DigiCert)
③ | Browser verifies the certificate: checks it is valid, not expired, and issued by a trusted authority
④ | Browser and server agree on shared encryption keys using public-key cryptography (the shared session key itself is never sent over the network)
⑤ | Secure encrypted channel established; all HTTP data from here is encrypted
⑥ | Normal HTTP request-response begins, now running inside the encrypted tunnel
In production: TLS is usually terminated at the load balancer or CDN layer, not at the backend server. The load balancer handles TLS encryption/decryption, and backend servers receive unencrypted HTTP on the internal private network (protected by private IPs and firewall rules). This is called TLS termination.
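
From a client's point of view, the whole handshake is usually a few library calls. The following is a minimal sketch with Python's standard `ssl` and `socket` modules; the helper names `make_client_context` and `negotiated_tls_version` are made up here, and the second function needs real network access, so it is defined but not called.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS settings: verify the certificate chain and the hostname."""
    return ssl.create_default_context()


def negotiated_tls_version(host: str, port: int = 443) -> str:
    """Open a TCP connection, run the TLS handshake, and report the TLS version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as tcp_sock:
        # wrap_socket performs the handshake: certificate check plus key agreement
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Calling `negotiated_tls_version("example.com")` on a machine with internet access would return a string such as "TLSv1.3"; if the certificate is expired or issued for a different hostname, the handshake raises an error instead of silently connecting.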

3.7 Illustrated Diagram

The diagram below shows the difference between HTTP and HTTPS, and the structure of the request-response cycle.

    %%{init: {"theme": "base", "themeVariables": {"lineColor": "#64748b", "edgeLabelBackground": "#fff"}}}%%
    flowchart TD
        subgraph HTTP["HTTP: Port 80 (Unencrypted)"]
            direction LR
            C1["Client"] -->|"Plaintext: anyone can intercept"| S1["Server"]
            S1 -->|"Plaintext response: data exposed"| C1
        end
        subgraph HTTPS["HTTPS: Port 443 (TLS Encrypted)"]
            direction LR
            C2["Client"] -->|"Encrypted request: only server reads it"| S2["Server"]
            S2 -->|"Encrypted response: only client reads it"| C2
        end
        style HTTP fill:#fef2f2,stroke:#ef4444,color:#991b1b
        style HTTPS fill:#f0fdf4,stroke:#22c55e,color:#14532d
        style C1 fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
        style S1 fill:#fee2e2,stroke:#ef4444,color:#991b1b
        style C2 fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
        style S2 fill:#d1fae5,stroke:#059669,color:#064e3b

Reading the diagram: HTTP sends data as plaintext, so anyone who intercepts the traffic between client and server can read passwords, tokens, and personal data. HTTPS wraps the same HTTP communication in TLS encryption, so the data is unreadable to anyone except the intended client and server.

3.8 When to Use

Scenario | Use This | Why
Any production application (login, payments, personal data, APIs) | HTTPS always | Sensitive data must never travel unencrypted over the internet
Fetching data with no state change on the server | GET | Read-only, safe to retry, can be cached
Creating a new resource (order, account, post) | POST | Sends data in the body; creates something new on the server
Updating a small part of a resource (change city, update photo) | PATCH | More efficient than PUT because it only sends changed fields
Replacing a full resource with a completely new version | PUT | Sends the entire object; replaces everything
Removing a resource permanently | DELETE | Removes the identified resource from the server
Golden rule: Use HTTP (not HTTPS) only for local development or internal service-to-service calls inside a private VPC. Every external-facing endpoint (login, API, CDN, admin panel) must use HTTPS.

3.9 Real-world Example: Placing an Order on Uber Eats

When you place a food order on Uber Eats, here are the HTTP calls happening behind the scenes:

Step | HTTP Call | What Happens
① | GET /restaurants?city=tokyo | App fetches nearby restaurants; server returns list as JSON. Response: 200 OK
② | GET /restaurants/123/menu | User taps a restaurant; app fetches its menu. Response: 200 OK
③ | POST /orders + body: {items, address, payment} | User confirms order; app creates a new order. Response: 201 Created
④ | GET /orders/ORD123/status | App polls order status; returns "accepted", "preparing", "on the way". Response: 200 OK
⑤ | PATCH /orders/ORD123/address | User changes delivery address before driver picks up. Response: 200 OK
⑥ | DELETE /orders/ORD123 | User cancels order. Response: 200 OK or 204 No Content
Notice: Four of the five HTTP methods appear in a single user session (PUT is the only one not needed here). Each call has the right method for the action: GET for reading, POST for creating, PATCH for partial update, DELETE for removal. This is clean REST API design.

3.10 Trade-offs

✅ Advantages | ❌ Disadvantages
Stateless design: any server can handle any request; scales horizontally with load balancers | Stateless overhead: every request must carry auth tokens/cookies, adding bytes to every call
HTTPS security: data is encrypted; users and browsers trust HTTPS sites | TLS handshake latency: a new connection costs one extra round trip with TLS 1.3 and two with TLS 1.2 (mitigated by keep-alive and session resumption)
Widely supported: HTTP/HTTPS works across every platform, language, and device | Not ideal for real-time: HTTP is request-response; not suited for live bidirectional streams (WebSockets are better)
Simple caching: GET responses can be cached by CDNs, browsers, and proxies | Text-based overhead: HTTP headers add significant bytes per request (HTTP/2 header compression helps)

3.11 Common Mistakes

# | ❌ Common Mistake | ✅ The Reality
1 | Using POST for everything | Use the right method: GET to read, POST to create, PUT/PATCH to update, DELETE to remove. Wrong methods make your API unpredictable and break client expectations.
2 | Returning 200 for all responses including errors | Return the correct status code: 400 for bad input, 401 for unauthenticated, 404 for not found, 500 for server error. Returning 200 for everything forces clients to parse every response body to detect errors.
3 | HTTP and HTTPS are completely different protocols | HTTPS is HTTP over TLS: the same protocol with an encryption layer added. The request/response structure, methods, and status codes are identical.
4 | Forgetting HTTP is stateless | The server does not remember you between requests. Always include authentication (Bearer token, cookie, session ID) in every request that requires it.
5 | Using HTTP in production | Always use HTTPS for any public-facing endpoint. HTTP exposes passwords, tokens, and personal data to anyone on the network, which is unacceptable in production.

3.12 Summary

  • HTTP is the protocol defining how clients and servers communicate; every web request is an HTTP request-response pair.
  • HTTPS = HTTP + TLS encryption. Always use HTTPS in production for any data that matters.
  • 5 methods: GET (read) · POST (create) · PUT (replace) · PATCH (partial update) · DELETE (remove). Use the right one for each action.
  • Status codes: 2xx success · 3xx redirect · 4xx client error · 5xx server error. Return meaningful codes, never 200 for everything.
  • HTTP is stateless: every request is independent. Authentication tokens or cookies must be included with every request that needs them.
  • REST APIs are built on HTTP: resources are URLs, actions are methods, results are status codes.

3.13 Design Challenge

Challenge: Design a food delivery app REST API

For each of the following actions, choose the correct HTTP method, design the endpoint URL, and state the expected success status code:
  • Browse available restaurants near the user
  • Place a new food order
  • Change the delivery address on an existing order
  • Cancel an order before it is picked up
  • A user tries to cancel an order that doesn't exist. What status code should the server return?
Answer:
Action | Method | Endpoint | Success Code
Browse restaurants | GET | /restaurants?city=tokyo | 200 OK
Place new order | POST | /orders | 201 Created
Change delivery address | PATCH | /orders/{id}/address | 200 OK
Cancel order | DELETE | /orders/{id} | 200 OK or 204 No Content
Cancel non-existent order | DELETE | /orders/{id} | 404 Not Found
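
The order-related rows of the answer table can be sketched as a tiny in-memory handler that maps a method and path to a status code. This is an illustration only: the `handle` function, the `orders` dictionary, and the generated IDs are all hypothetical, and a real service would use a web framework and a database.

```python
import itertools

orders = {}                      # in-memory store: order_id -> order dict
_next_id = itertools.count(1)    # simple ID generator for the sketch


def handle(method, path, body=None):
    """Return (status_code, response_body) for a tiny /orders API."""
    parts = path.strip("/").split("/")
    if parts[0] != "orders":
        return 404, None
    if len(parts) == 1:
        if method == "POST":
            order_id = f"ORD{next(_next_id)}"
            orders[order_id] = {**(body or {}), "status": "accepted"}
            return 201, {"order_id": order_id}
        return 405, None         # only creation is sketched for the collection
    order_id = parts[1]
    if order_id not in orders:
        return 404, None         # unknown order, whatever the method
    if method == "GET":
        return 200, orders[order_id]
    if method == "PATCH":
        orders[order_id].update(body or {})
        return 200, orders[order_id]
    if method == "DELETE":
        del orders[order_id]
        return 204, None
    return 405, None


status, created = handle("POST", "/orders", {"items": ["ramen"], "address": "5 Main St, Tokyo"})
print(status, created)
```

Cancelling the same order twice shows the last row of the table in action: the first DELETE returns 204, the second returns 404 because the order no longer exists.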

3.14 Cloud Service Mapping

In cloud production systems, HTTP/HTTPS traffic flows through these managed services:

Concept | AWS (Primary) | GCP | Azure
TLS certificates | AWS Certificate Manager (ACM): free, auto-renews | Certificate Manager | Azure Key Vault / App Service Certificates
HTTP/HTTPS traffic routing | Application Load Balancer (ALB) | Cloud Load Balancing (HTTP(S)) | Azure Application Gateway
CDN with HTTPS | Amazon CloudFront | Cloud CDN | Azure Front Door / Azure CDN
HTTPS API entry point | Amazon API Gateway | API Gateway / Apigee | Azure API Management
AWS flow: Client → Route 53 (DNS) → CloudFront (CDN + HTTPS) → Application Load Balancer → EC2/Lambda (backend). ACM automatically provides and renews the TLS certificate for CloudFront and ALB, so no manual certificate management is needed.

4. TCP vs UDP

You now know that HTTP/HTTPS is the language clients and servers use to communicate. But how does that data actually travel across the internet? That is the job of the transport layer, and there are two main protocols to choose from: TCP (reliable, ordered, slower) and UDP (fast, lightweight, no guarantees). Every system design decision involving real-time communication (video calls, online gaming, live location tracking) ultimately comes down to choosing between these two.

TCP vs UDP: reliable vs fast transport protocols

4.1 Introduction

Imagine you are on a Zoom call. At the same moment, your browser downloads your bank statement. Both use the internet, but they behave very differently: the Zoom video stream keeps going even if a few frames are lost, so your call stays smooth. But your bank statement absolutely cannot have a single byte missing or corrupted; every number must be exact.

This difference comes down to TCP vs UDP. TCP (Transmission Control Protocol) is the careful, reliable choice: it guarantees every byte arrives in order. UDP (User Datagram Protocol) is the fast, lightweight choice: it sends data as quickly as possible without waiting for confirmations.

Understanding where TCP and UDP sit in the network stack is essential:

    HTTP/HTTPS → What message format is used?
    TCP / UDP  → How is that message transported?
    IP         → Where should the packet go?
    Network    → The physical cables and wireless signals

4.2 Why It Matters

Every system you design has components that communicate over a network. The choice of TCP vs UDP directly affects reliability, latency, and user experience. Getting this wrong can mean lost payments, broken file downloads, or laggy video calls.

  • HTTP/HTTPS (every web page and REST API) runs on TCP, because reliable delivery is non-negotiable for web content.
  • DNS lookups commonly use UDP: queries are tiny, and a quick retry is cheaper than setting up a connection.
  • Zoom, Google Meet, and Discord voice use UDP-based protocols, because a lost video frame is better ignored than waited for.
  • WhatsApp text messages use TCP, but WhatsApp voice/video calls switch to UDP-based transport.
  • Modern HTTP/3 uses QUIC over UDP, an attempt to get TCP-like reliability with UDP-like speed.
Core decision: Use TCP when correctness matters more than speed. Use UDP when speed matters more than perfect delivery.

4.3 Real-world Analogy

TCP is like sending an important contract via registered mail with tracking and signature confirmation. The courier confirms delivery, tracks every step, resends if something goes missing, and ensures pages arrive in the right order. Slower, but nothing is lost.

UDP is like a sports commentator shouting live updates. They keep talking regardless of whether every word reaches every listener. Some words may be lost to background noise, but the commentary stays current and keeps moving forward.

Analogy | TCP | UDP
Registered mail with tracking | ✅ TCP: confirmed delivery | -
Sports commentary shouted live | - | ✅ UDP: keeps moving, no confirmation
Queue at a counter (ordered) | ✅ TCP: serves in strict order | -
Leaflets dropped from a plane | - | ✅ UDP: fast, no confirmation of who received them

4.4 Key Terms

Term | Simple Definition | Quick Example
TCP | Reliable, ordered transport; guarantees every byte arrives correctly | HTTP, file downloads, payments
UDP | Fast, lightweight transport; sends quickly, no delivery guarantee | Video calls, DNS queries, online gaming
Packet | A small chunk of data sent across the network | A single 1500-byte unit of your download
3-Way Handshake | TCP's connection setup process: SYN → SYN-ACK → ACK | Like "Hello → Hello back → OK, let's talk"
SYN / ACK | SYN = "I want to connect". ACK = "I received your message" | TCP's connection handshake signals
Retransmission | TCP resending a packet that was lost in transit | Lost packet 3 → TCP requests and resends it
Ordered Delivery | Data arrives in the same sequence it was sent | Packets 1, 2, 3 arrive as 1, 2, 3 (not 3, 1, 2)
Head-of-Line Blocking | One lost packet blocks all later packets from being delivered | Packet 2 lost → packets 3, 4, 5 wait on hold
Connection-oriented | A connection is established before data is sent (TCP) | TCP 3-way handshake before HTTP request
Connectionless | Data is sent without establishing a connection first (UDP) | DNS query sent immediately, no handshake
QUIC | Modern protocol over UDP that adds reliability features; used by HTTP/3 | HTTP/3 → QUIC → UDP → IP

4.5 How It Works

TCP: Reliable, Step by Step

Step 1: The 3-Way Handshake. Before any data is sent, TCP establishes a connection:

    Client → Server : SYN      "I want to connect. Are you ready?"
    Server → Client : SYN-ACK  "Yes, I'm ready. Are you ready?"
    Client → Server : ACK      "Yes. Let's communicate."

Only after all three steps does data transfer begin. This adds one round-trip of latency before any content is sent.
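
A quick back-of-the-envelope calculation shows what these setup round trips cost on a new connection. The function below is a rough sketch with a made-up name (`time_to_first_response_ms`); it assumes one round trip for the TCP handshake, one for TLS 1.3 (two for TLS 1.2), and one for the HTTP request itself, and it ignores DNS lookup and server processing time.

```python
def time_to_first_response_ms(rtt_ms: float, tls_round_trips: int = 0) -> float:
    """Estimate the delay before the first HTTP response arrives on a new connection."""
    tcp_handshake = rtt_ms                      # SYN / SYN-ACK, then data can follow the ACK
    tls_handshake = tls_round_trips * rtt_ms    # 0 for plain HTTP, 1 for TLS 1.3, 2 for TLS 1.2
    request_response = rtt_ms                   # the HTTP request and its response
    return tcp_handshake + tls_handshake + request_response


rtt = 100  # assumed round-trip time in milliseconds (e.g. a distant region)
print(time_to_first_response_ms(rtt))                     # plain HTTP
print(time_to_first_response_ms(rtt, tls_round_trips=1))  # HTTPS with TLS 1.3
print(time_to_first_response_ms(rtt, tls_round_trips=2))  # HTTPS with TLS 1.2
```

This is why connection reuse (keep-alive) matters: the handshake cost is paid once, and later requests on the same connection cost roughly one round trip each.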

Step 2: Ordered Delivery. TCP numbers every packet. Even if they arrive out of order, TCP reorders them before handing data to the application:

    Network delivers: Packet 1, Packet 3, Packet 2
    TCP reassembles:  Packet 1, Packet 2, Packet 3   ← always correct order

Step 3: Retransmission. If a packet is lost, TCP detects the gap (the sender never receives an acknowledgement for it) and the packet is resent. The application waits until the complete data arrives:

    Packet 1 → received ✓
    Packet 2 → LOST ✗
    Packet 3 → received but WAITS (head-of-line blocking)
    Packet 2 → resent ✓
    Packet 3 → now delivered (in order)

Head-of-Line Blocking: Because TCP delivers data IN ORDER, one missing packet blocks all later packets from being delivered, even if they've already arrived. Like a queue where one person drops something and nobody behind them can move forward until it's picked up.
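
The reordering and head-of-line blocking behaviour can be simulated with a small receive buffer. This is a conceptual sketch, not how a real TCP stack is written (real TCP numbers bytes rather than whole packets); the function name `deliver_in_order` and the sample data are invented for the example.

```python
def deliver_in_order(arrivals):
    """Simulate in-order delivery.

    arrivals: list of (sequence_number, data) in the order the network delivers them.
    Returns, for each arrival, the list of data items handed to the application.
    """
    buffer = {}      # packets that arrived early and are waiting
    next_seq = 1     # the next sequence number the application expects
    log = []
    for seq, data in arrivals:
        buffer[seq] = data
        delivered = []
        # Release every packet that is now contiguous with what was already delivered.
        while next_seq in buffer:
            delivered.append(buffer.pop(next_seq))
            next_seq += 1
        log.append(delivered)
    return log


# Packet 2 is delayed: packet 3 arrives first and has to wait for it.
print(deliver_in_order([(1, "A"), (3, "C"), (2, "B")]))
# → [['A'], [], ['B', 'C']]
```

The empty list in the middle is the head-of-line blocking moment: packet 3 has arrived, but nothing can be delivered until packet 2 shows up.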

UDP: Fast, Step by Step

No handshake. UDP just sends packets immediately. No connection setup, no waiting:

    Client → Server : Packet 1 (sent immediately)
    Client → Server : Packet 2 (sent immediately)
    Client → Server : Packet 3 (sent immediately)
    ← no acknowledgement, no confirmation

No ordering, no retransmission. If a packet is lost, UDP ignores it and keeps going. The application receives whatever arrives, in whatever order:

    Sent:     Frame 1, Frame 2, Frame 3, Frame 4, Frame 5
    Received: Frame 1, Frame 3, Frame 4, Frame 5   ← Frame 2 lost, ignored
    App sees: Shows Frame 1, 3, 4, 5 (tiny glitch, call continues)
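
The "no handshake" property is visible directly in code: a UDP sender calls `sendto` without ever connecting. The sketch below uses Python's standard `socket` module on the loopback interface, where loss is very unlikely; across a real network the datagram could simply vanish, which is why the receiver sets a timeout. The function name `udp_roundtrip` is made up for this example.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to a local receiver and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # UDP gives no delivery guarantee, so never wait forever
        # No connect(), no handshake: the datagram is sent immediately.
        sender.sendto(message, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(2048)
        return data


print(udp_roundtrip(b"frame-1"))
```

Notice what is missing compared with TCP: there is no `listen`, `accept`, or `connect` step, and nothing tells the sender whether the datagram arrived.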

4.6 🔀 Types & Variations

Feature | TCP | UDP
Connection setup | ✅ 3-way handshake required | ❌ No handshake, just send
Delivery guarantee | ✅ Every packet confirmed | ❌ No guarantee, may drop
Ordering | ✅ Always in sequence | ❌ May arrive out of order
Retransmission | ✅ Lost packets are resent | ❌ Lost packets are ignored
Speed | 🐢 Slower (overhead of guarantees) | 🚀 Faster (minimal overhead)
Overhead | Higher: header + ACKs + flow control | Lower: minimal 8-byte header
Best for | Payments, APIs, file downloads, login | Video calls, gaming, DNS, live streaming

Where each protocol sits in real stacks:

    HTTP/1.1 and HTTP/2 → TCP (most web traffic)
    HTTPS               → TLS + TCP
    DNS queries         → UDP (fast small lookups)
    Video calls (Zoom)  → RTP/SRTP over UDP
    HTTP/3              → QUIC over UDP (modern, reliability built in)

HTTP/3 & QUIC: HTTP/3 runs on QUIC, which is built on top of UDP. QUIC adds reliability features similar to TCP (ordering, retransmission) but avoids TCP's head-of-line blocking between independent streams: a lost packet only stalls the stream it belongs to. For most system design discussions, remember: classic HTTP = TCP, HTTP/3 = QUIC/UDP.

Can This Use Case Tolerate Packet Loss?

One of the most useful ways to decide between TCP and UDP is to ask: "If a packet is lost, can the application continue correctly, or does it break?"

Use Case | Can Tolerate Packet Loss? | Protocol Choice | Why
💳 Payment transaction | ❌ No | TCP | A missing byte could mean the wrong amount is charged
🔑 Login request | ❌ No | TCP | Dropped credentials = authentication failure or security hole
📁 File download | ❌ No | TCP | A missing packet = corrupted file that cannot be opened
📧 Send email | ❌ No | TCP | Email must arrive complete and in order
🎬 Live video frame | ✅ Sometimes yes | UDP | One dropped frame = tiny glitch; call continues normally
🎮 Game position update | ✅ Often yes | UDP | Old position is stale anyway; next update arrives in milliseconds
🎙️ Voice call audio sample | ✅ Sometimes yes | UDP | A tiny gap in audio is less disruptive than a delayed call
🌐 DNS query | ✅ Yes | UDP | If lost, the resolver simply retries; the query is tiny

Two questions to decide:
① Do I need reliable, ordered delivery? → Yes → TCP
② Can I tolerate some packet loss in exchange for lower latency? → Yes → UDP

4.7 🎨 Illustrated Diagram

The diagram below compares the TCP and UDP flows side by side, showing the handshake, ordered delivery, and retransmission of TCP versus the fire-and-forget simplicity of UDP.

%%{init: {"theme": "base", "themeVariables": {"lineColor": "#64748b", "edgeLabelBackground": "#fff"}}}%%
flowchart TD
    subgraph TCP["✅ TCP: Reliable & Ordered"]
        direction LR
        T1["💻 Client"] -->|"① SYN"| T2["🖥️ Server"]
        T2 -->|"② SYN-ACK"| T1
        T1 -->|"③ ACK + Data"| T2
        T2 -->|"④ ACK (confirmed)"| T1
        T1 -->|"⑤ Resend if lost"| T2
    end
    subgraph UDP["⚡ UDP: Fast & Lightweight"]
        direction LR
        U1["💻 Client"] -->|"Packet 1 (no confirm)"| U2["🖥️ Server"]
        U1 -->|"Packet 2 (no confirm)"| U2
        U1 -->|"Packet 3 → LOST"| U2
        U1 -->|"Packet 4 (continues anyway)"| U2
    end
    style TCP fill:#eff6ff,stroke:#2563eb,color:#1e40af
    style UDP fill:#fff7ed,stroke:#d97706,color:#92400e
    style T1 fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
    style T2 fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
    style U1 fill:#fed7aa,stroke:#fb923c,color:#7c2d12
    style U2 fill:#fed7aa,stroke:#fb923c,color:#7c2d12

Reading the diagram: TCP (blue) requires a 3-step handshake before any data, confirms every packet, and resends losses. UDP (orange) just fires packets one after another with no confirmation: faster, but the loss of Packet 3 is simply ignored.

4.8 ✅ When to Use

Ask two questions: Do I need every byte to arrive correctly? and Can I tolerate losing some data if it means lower latency?

Use Case | Protocol | Reason
Login, user authentication | TCP | Credentials must arrive correctly; no silent loss
Payment, order placement | TCP | Correctness and order are critical; a missing byte = wrong amount
File upload / download | TCP | File must arrive complete and uncorrupted
REST APIs, web pages | TCP | HTTP/HTTPS runs on TCP by design (HTTP/3 over QUIC is the exception)
Database queries | TCP | Every SQL query and response must be exact
DNS lookups | UDP | Small, fast queries; retrying is trivial if needed
Live video / voice calls | UDP | Old frames are useless; keep sending new ones
Online gaming (position updates) | UDP | Old positions are stale; latest update is what matters
Live sports score updates | UDP | A missed update is fine because the next one replaces it (in practice many apps still push scores over TCP-based WebSocket or HTTP for simplicity)

Rule of thumb: If missing data would cause a bug, security issue, or incorrect result → TCP. If missing data just causes a tiny visual glitch or the data is immediately superseded anyway → UDP.
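The two questions and the rule of thumb can be written down as a tiny decision helper. This is only a sketch of the reasoning in this section: the function name is invented, and falling back to TCP when neither answer is a clear "yes" is an assumption of this example rather than a rule from the text.

```python
def choose_transport(needs_reliable_ordered: bool, tolerates_loss: bool) -> str:
    """Apply the two questions above and return "TCP" or "UDP"."""
    if needs_reliable_ordered:
        return "TCP"   # missing data would cause a bug or an incorrect result
    if tolerates_loss:
        return "UDP"   # stale data is superseded anyway; favour low latency
    return "TCP"       # assumption: when unsure, prefer the reliable option


if __name__ == "__main__":
    use_cases = {
        "payment": (True, False),
        "file download": (True, False),
        "game position update": (False, True),
        "voice call audio": (False, True),
    }
    for name, answers in use_cases.items():
        print(f"{name}: {choose_transport(*answers)}")
```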

4.9 🏗️ Real-world Examples

The same application often uses BOTH TCP and UDP for different features. Here are four concrete examples showing which protocol is chosen and why.

Example 1: Online Payment

For online payment, correctness is more important than speed. A payment request contains critical data:

    POST /payments HTTP/1.1

    {
      "amount": 10000,
      "currency": "JPY",
      "merchant": "Merchant ABC",
      "card": "****1234"
    }

You do not want this data to be lost, duplicated, corrupted, or delivered out of order. A missing packet could mean the wrong amount is charged or the transaction is never recorded. A tiny delay is perfectly acceptable. An incorrect payment is not.

Payment always uses TCP (HTTPS): correctness > speed. Every byte must arrive in order.

Example 2: Video Streaming vs Live Video Call

This is the most important distinction to understand, and one that trips up beginners. Not all video is the same.

Aspect | Normal Video Streaming (YouTube/Netflix) | Live Video Call (Zoom/Google Meet)
Protocol | TCP (HTTP-based streaming) | UDP-based (RTP/SRTP)
Reason | Video is buffered: if a chunk is delayed slightly, the player waits briefly and the video plays correctly | If an old audio/video packet arrives late, it is useless because the conversation has already moved on
Priority | Correctness: every chunk must arrive for the video to play | Low latency: keep the call flowing even if a frame is lost
Loss tolerance | No: the buffer absorbs delays and the transport resends losses | Yes: one lost frame = tiny glitch, call continues

Key takeaway: On-demand streaming runs over HTTP on a reliable transport (classically TCP; services such as YouTube also use HTTP/3 over QUIC) because buffering tolerates short delays. Zoom uses UDP because waiting 200ms to retransmit an old video frame makes the conversation choppy and unusable.

Example 3: Online Gaming

In an online game, your character's position changes many times per second. The server must know where every player is at every moment:

    Position update at 10:00:01.001 → x:100, y:200
    Position update at 10:00:01.020 → x:102, y:201
    Position update at 10:00:01.040 → x:104, y:203

If the position update from 10:00:01.020 is lost, it is pointless to wait for it: by the time it is retransmitted, the position at 10:00:01.040 is already more accurate. Waiting for the old packet (TCP behaviour) would cause lag and make the game feel sluggish. Instead, games use UDP: if a position update is lost, just use the next one that arrives.

Online gaming uses UDP because old state (old position) is immediately superseded. Low latency and smooth experience matter far more than perfect delivery of every packet.
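The "newest update wins" idea can be sketched as follows. It assumes each update carries a sender-side sequence number so the receiver can tell fresh data from stale data; the class and field names are hypothetical and not taken from any particular game engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PositionUpdate:
    seq: int   # sender-side counter that increases with every update
    x: int
    y: int


class PlayerState:
    """Keeps only the newest position; late or duplicate updates are dropped."""

    def __init__(self) -> None:
        self.latest: Optional[PositionUpdate] = None

    def apply(self, update: PositionUpdate) -> bool:
        """Return True if the update was used, False if it was stale."""
        if self.latest is not None and update.seq <= self.latest.seq:
            return False          # a newer position is already known
        self.latest = update
        return True


if __name__ == "__main__":
    state = PlayerState()
    state.apply(PositionUpdate(seq=1, x=100, y=200))
    state.apply(PositionUpdate(seq=3, x=104, y=203))          # update 2 was lost
    print(state.apply(PositionUpdate(seq=2, x=102, y=201)))   # False: arrived too late
    print(state.latest)
```

Nothing ever waits for update 2. If it turns up late it is discarded, which is the opposite of TCP's behaviour.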

Example 4: Chat App (WhatsApp)

A single chat app uses different protocols for different features, a perfect illustration of how real systems mix TCP and UDP:

WhatsApp Feature | Protocol | Why
🔑 Login & registration | TCP (HTTPS) | Credentials must arrive correctly; authentication cannot fail silently
💬 Send text message | TCP (WebSocket / HTTPS) | Messages must NOT be silently dropped, or the user thinks a message was sent when it wasn't
🖼️ Upload photo / video file | TCP (HTTPS) | File must arrive complete and uncorrupted; a missing packet = corrupted image
🎙️ Voice call audio | UDP-based (RTP) | Old audio packets are useless; keep the call flowing without waiting for retransmission
📹 Video call stream | UDP-based (RTP/SRTP) | Lost frames = tiny glitch; retransmitting 200ms-old video = choppy call
🔔 Push notification | Platform-specific (APNs/FCM over TCP) | Notifications must be reliably delivered; no silent drops

Key insight: WhatsApp uses TCP for everything that must not be lost (text, files, login) and UDP-based protocols for everything where latency matters more than perfection (voice, video). One app, both protocols, different features.

4.10 ⚖️ Trade-offs

✅ TCP Advantages | ❌ TCP Disadvantages
Guaranteed delivery: nothing is silently lost | Handshake adds latency before first byte
Ordered delivery: application always gets data in sequence | Head-of-line blocking: one lost packet stalls everything behind it
Error detection and retransmission built in | Higher overhead: more bytes per packet (headers, ACKs)
Flow and congestion control: won't flood the network | Not suitable when old packets are worthless (live video)

✅ UDP Advantages | ❌ UDP Disadvantages
Very low latency: no handshake, no waiting | No delivery guarantee: packets can be lost silently
Minimal overhead: tiny 8-byte header | No ordering: application must handle reordering itself
Works well for broadcast/multicast | No retransmission: application must implement reliability if needed
Connectionless: scales easily for many small requests | Harder to build reliable features on top without significant effort

4.11 🚫 Common Mistakes

# | ❌ Common Mistake | ✅ The Reality
1 | UDP is always better because it is faster | UDP is only better when losing data is acceptable. For payments, file transfers, or login, UDP would break the application silently.
2 | TCP is always better because it is reliable | TCP's reliability creates overhead and latency. For a live video call, waiting to retransmit a 100ms-old video frame makes the call choppy, so UDP is the right choice.
3 | Forgetting that HTTP/HTTPS uses TCP | Every REST API call, web page load, and HTTPS request over HTTP/1.1 or HTTP/2 runs on TCP. When you draw a client-server arrow for an API, that arrow implies TCP.
4 | Thinking UDP means the application is unreliable | Applications CAN build reliability on top of UDP. QUIC does exactly this: ordering and retransmission are implemented above UDP (in user space rather than in the OS kernel), which avoids TCP's head-of-line blocking between streams.
5 | Forgetting that DNS uses UDP | DNS queries are typically sent over UDP because they are small and fast. If a query is lost, the resolver just asks again. This is a common system design fact to know.

4.12 📝 Summary

  • TCP = reliable, ordered, connection-oriented. Use for payments, APIs, file transfers, login: anything where missing data = broken functionality.
  • UDP = fast, lightweight, connectionless. Use for live video/audio, online gaming, DNS: anything where speed matters and old data is worthless.
  • TCP 3-way handshake (SYN → SYN-ACK → ACK) establishes a connection before data is sent, adding one round-trip of latency.
  • Head-of-line blocking is TCP's key limitation: one lost packet stalls all later packets until it is retransmitted.
  • HTTP/HTTPS runs on TCP. DNS uses UDP. HTTP/3 uses QUIC over UDP: reliability with less head-of-line blocking.
  • One system can use both TCP and UDP for different features. WhatsApp uses TCP for text and UDP-based protocols for voice/video.

4.13 ๐Ÿ‹๏ธ Design Challenge

๐Ÿš— Challenge: Design the Uber App โ€” Choose TCP or UDP for each feature

For each feature below, choose TCP or UDP and explain why:
  • User login and signup
  • Booking a ride (request + confirmation)
  • Processing payment at the end of a ride
  • Live driver location updates (shown on the map every second)
  • In-app chat between rider and driver
๐Ÿ‘๏ธ Show Answer
FeatureProtocolWhy
User login & signupTCP (HTTPS)Credentials and tokens must arrive correctly and securely
Booking a rideTCP (HTTPS)Booking data must not be lost โ€” a dropped packet could mean no driver is dispatched
Payment processingTCP (HTTPS)A single missing byte in a payment request could mean wrong amount charged
Live driver location (every second)UDP or WebSocket/TCPA missed location update from 1 second ago is useless โ€” next update arrives in 1s. UDP gives lower latency. (Some systems use WebSocket over TCP for simplicity, accepting slight latency)
In-app chatTCP (WebSocket)Text messages must not be silently dropped โ€” user would think message was sent when it wasn't

4.14 ☁️ Cloud Service Mapping

TCP and UDP are protocols, not cloud services, but cloud load balancers and gateways handle them differently. Here are the cloud services relevant to TCP vs UDP routing:

Traffic Type | AWS (Primary) | GCP | Azure
HTTP/HTTPS (TCP) | Application Load Balancer (ALB) | Cloud Load Balancing (HTTP(S)) | Azure Application Gateway
High-performance TCP / UDP | Network Load Balancer (NLB) | Network Load Balancing | Azure Load Balancer
DNS (UDP) | Amazon Route 53 | Cloud DNS | Azure DNS
WebSocket (TCP-based) | ALB + API Gateway WebSocket | Cloud Load Balancing | Azure API Management

AWS mental model: Web/API traffic (HTTP/HTTPS over TCP) → Application Load Balancer. Real-time or raw TCP/UDP (gaming, VoIP, custom protocols) → Network Load Balancer. DNS queries (UDP) → Route 53.

🚀 5. Latency & Throughput

You have now learned how data is found (DNS), how it is communicated (HTTP/HTTPS), and how it is transported (TCP/UDP). The final question in this networking foundation is: how fast does the system respond, and how much work can it handle at once? These are the two most important performance metrics in system design: latency (speed for one user) and throughput (capacity for many users). Every performance decision you make as an engineer comes down to these two concepts.

Latency and Throughput: speed for one user vs capacity for many users

5.1 🎯 Introduction

Imagine you are designing a system like Amazon. A user types "laptop" in the search box and presses Enter. Two critically important questions arise immediately:

Question | Concept | What You Measure
How quickly do the search results appear for this user? | Latency | Milliseconds per request
How many users can search at the same time? | Throughput | Requests per second (RPS)

Latency is the time it takes for a single request to travel from the client to the server and come back with a response: the user's waiting time. If this takes 200 ms, that is the latency. Throughput is how much work the system can handle per unit of time: how many requests per second it can process. A system can have good latency for individual users but still fail during peak traffic if throughput is too low.

5.2 💡 Why It Matters

Latency and throughput are not just academic concepts; they directly affect users and business outcomes. Widely cited industry figures illustrate this: Amazon reportedly found that every 100ms of added latency cost about 1% in sales, a 1-second delay has been linked to a 7% drop in conversions, and Google reported that 53% of mobile visits are abandoned when a page takes longer than 3 seconds to load.

  • Latency determines whether your app feels responsive or sluggish. It is what the user directly experiences.
  • Throughput determines whether your system survives peak traffic: a sale event, a viral post, or a breaking news moment.
  • p99 latency matters more than average: if 1% of requests to Amazon are slow and Amazon serves 10 million requests/day, that is 100,000 slow experiences daily.
  • Every system design decision (caching, CDN, database indexing, load balancing) ultimately improves latency, throughput, or both.

Remember: Latency = speed for one user. Throughput = capacity for many users. A system can be fast for individual requests but still collapse under heavy load, or handle massive traffic but feel sluggish for each user.

5.3 🏠 Real-world Analogy

Think of a highway between Tokyo and Osaka:

Highway World | System Design World | Meaning
🚗 Time for ONE car to drive Tokyo → Osaka | Latency | How long one request takes to complete
🚗🚗🚗 How many cars can pass per hour | Throughput (RPS) | How many requests the system handles per second
🛣️ Adding more lanes to the highway | Horizontal scaling | More servers = more throughput
🚦 Traffic jam (all cars slow down) | Server overload | Too many requests → latency spikes for everyone
🏎️ Faster speed limit (same lanes) | Code optimization | Same number of servers but each is faster

A highway may let thousands of cars through per hour (high throughput), but if there is a traffic jam, each car still takes longer to reach its destination (high latency). Similarly, your system can handle many requests per second while some individual requests are slow. The two dimensions are distinct but related.

5.4 📖 Key Terms

Term | Simple Definition | Quick Example
Latency | Time for one request to complete: the user's waiting time | Google search returns in 200ms → latency = 200ms
Throughput | Amount of work the system handles per unit of time | Server handles 10,000 requests/second
RPS | Requests Per Second: throughput for web/API systems | "Our API handles 5,000 RPS"
QPS | Queries Per Second: throughput for database systems | "MySQL handles 10,000 QPS"
TPS | Transactions Per Second: throughput for payment/DB transactions | "Payment system processes 500 TPS"
p50 latency | 50% of requests complete faster than this value | p50 = 100ms → half of requests finish in under 100ms
p95 latency | 95% of requests complete faster than this value | p95 = 500ms → 95% of requests finish in under 500ms
p99 latency | 99% of requests complete faster than this value | p99 = 2s → the slowest 1% of requests take 2 seconds or longer
Bottleneck | The slowest or most limited component that constrains system performance | Slow database query → entire request is slow
Cache hit | Data was found in cache: fast response, no DB query needed | Product page served from Redis in 5ms
Cache miss | Data not in cache: must query the database, which is slower | First request for a product goes to DB (100ms)
Async processing | Work done outside the user's request path, so the user doesn't wait | Send confirmation email after order, not during

5.5 🔢 How It Works

Measuring Latency

Latency is measured from the moment the client sends a request to the moment it receives a complete response:

    User clicks "Search" for "laptop"
      ↓ [~10ms]  Network travel client → server
      ↓ [~150ms] Server processes: validate, query DB, build response
      ↓ [~10ms]  Network travel server → client
      ↓ [~30ms]  Browser renders results
    Total latency ≈ 200ms   ← what the user actually waits

Each component adds to the total latency. A slow database query, a distant server, or a large response all increase the time the user waits.

Average vs Percentile Latency: Why Average Is Misleading

This is one of the most important concepts in production systems, and one that beginners consistently get wrong.

Suppose your system has an average latency of 100ms. That sounds good. But what if some users experience 5 seconds? Average latency hides these slow users. This is why production systems use percentile latency:

    p50 latency = 100ms  → 50% of requests get a response in under 100ms
    p95 latency = 500ms  → 95% of requests get a response in under 500ms
    p99 latency = 2000ms → 99% of requests get a response in under 2 seconds
                           (the remaining 1%, the "tail", wait 2+ seconds)

Why p99 matters: If your system serves 10 million requests per day and p99 = 2 seconds, then 100,000 requests per day take 2 seconds or longer. An average latency of 100ms looks great on the dashboard, but behind it sit 100,000 bad experiences every day. Always monitor p95 and p99, not just average.
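To see how an average can hide the tail, here is a small sketch that computes the mean and nearest-rank percentiles over a made-up sample of 100 request latencies. The numbers are invented for illustration, and real monitoring systems usually estimate percentiles from histograms rather than by sorting every sample.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]


# Made-up latencies in milliseconds: most requests are fast, a few are very slow.
latencies_ms = [80] * 90 + [400] * 8 + [5000] * 2

mean_ms = statistics.mean(latencies_ms)
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))

if __name__ == "__main__":
    print(f"mean={mean_ms}ms p50={p50}ms p95={p95}ms p99={p99}ms")
    # mean=204ms p50=80ms p95=400ms p99=5000ms
```

The mean of 204ms suggests a healthy system, while p99 shows that the slowest requests take 5 seconds.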

Measuring Throughput

Throughput is measured as the number of operations completed per unit of time:

Unit | Meaning | Typical Context
RPS (Requests/sec) | How many API requests per second | Web servers, load balancers
QPS (Queries/sec) | How many database queries per second | MySQL, PostgreSQL, Redis
TPS (Transactions/sec) | How many transactions per second | Payment systems, banking
Messages/sec | How many messages processed per second | Kafka, SQS, message queues
MB/s or GB/s | How much data transferred per second | Video streaming, file transfer

5.6 🔀 Types & Variations

A. Common Causes of High Latency

Understanding why latency is high is the first step to fixing it. These are the six most common causes:

1. Network Distance. If a user in Japan makes a request to a server in the US, the data travels thousands of kilometres, and each kilometre adds latency. Light in optical fibre travels at about 200,000 km/s, so a Japan→US→Japan round trip adds ~150ms just for travel.

    Japan user → US server    → ~150ms round-trip travel time alone
    Japan user → Tokyo server → ~5ms round-trip travel time

Fix: Deploy regional servers, use CDN edge nodes, use DNS-based geographic routing.
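The travel-time figures follow from simple arithmetic. The sketch below computes the lower bound set by propagation delay alone. The distances are rough assumptions chosen for illustration (about 10,000 km of fibre between Tokyo and the US, about 500 km to a regional server), and real routes are slower because cables are not straight and routers add delay.

```python
FIBRE_SPEED_KM_PER_S = 200_000   # approximate speed of light in optical fibre


def min_round_trip_ms(one_way_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    return one_way_km * 2 * 1000 / FIBRE_SPEED_KM_PER_S


if __name__ == "__main__":
    print(min_round_trip_ms(10_000))   # 100.0 (assumed Tokyo to US distance)
    print(min_round_trip_ms(500))      # 5.0   (assumed regional distance)
```

No amount of server tuning can beat this floor, which is why moving content closer to the user is the standard fix.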

2. Slow Database Queries. A backend can respond in milliseconds, but if the database takes 2 seconds to run a query, that 2 seconds is the bottleneck.

    -- Bad: scanning 10 million rows with no index
    SELECT * FROM products WHERE name = 'laptop';   -- ~3 seconds

    -- Good: with an index on 'name'
    SELECT * FROM products WHERE name = 'laptop';   -- ~2 ms

Fix: Add indexes, optimize queries, use caching, use read replicas, use search engines (Elasticsearch) for complex searches.

3. Too Many Service Calls (Microservices). In microservices, one user request may trigger a chain of calls to many internal services. Each call adds latency.

    API Server → User Service    (20ms)
               → Product Service (30ms)
               → Inventory       (25ms)
               → Pricing Service (20ms)
               → Recommendation  (40ms)
    Sequential total → 135ms extra latency from service calls alone

Fix: Reduce unnecessary calls, run independent calls in parallel, cache frequently needed data, avoid chatty communication patterns.
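The difference between sequential and parallel calls can be sketched with asyncio. The services here are simulated with sleeps using the durations from the example above, and the names are placeholders. The approach only applies when the calls do not depend on each other's results.

```python
import asyncio
import time

# Simulated call durations in seconds, taken from the example above.
SERVICE_DELAYS = {"user": 0.020, "product": 0.030, "inventory": 0.025,
                  "pricing": 0.020, "recommendation": 0.040}


async def call_service(name: str) -> str:
    await asyncio.sleep(SERVICE_DELAYS[name])   # stands in for a network call
    return name


async def sequential() -> float:
    """Call the services one after another; total is roughly the sum."""
    start = time.perf_counter()
    for name in SERVICE_DELAYS:
        await call_service(name)
    return time.perf_counter() - start


async def parallel() -> float:
    """Call all services at once; total is roughly the slowest call."""
    start = time.perf_counter()
    await asyncio.gather(*(call_service(name) for name in SERVICE_DELAYS))
    return time.perf_counter() - start


async def main():
    return await sequential(), await parallel()


if __name__ == "__main__":
    seq_s, par_s = asyncio.run(main())
    print(f"sequential: {seq_s * 1000:.0f} ms, parallel: {par_s * 1000:.0f} ms")
```

Sequentially the total is about the sum of the calls (around 135ms); in parallel it is about the slowest single call (around 40ms).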

4. Server Overload. If a server receives more requests than it can handle, requests queue up. Users at the back of the queue wait longer.

    Server capacity:  1,000 RPS
    Incoming traffic: 5,000 RPS
    → Requests queue → latency spikes from 100ms to 5+ seconds

Fix: Add more servers (horizontal scaling), load balancing, auto-scaling, queue-based processing for heavy tasks.
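A back-of-the-envelope capacity check for the example above might look like this. It assumes throughput scales linearly with server count, which only holds while shared components such as the database keep up. The `headroom` parameter is an addition of this sketch, not something from the text.

```python
import math


def servers_needed(peak_rps: int, rps_per_server: int, headroom: float = 0.0) -> int:
    """Servers required to absorb peak_rps, with optional spare capacity (0.2 = 20%)."""
    return math.ceil(peak_rps * (1 + headroom) / rps_per_server)


if __name__ == "__main__":
    print(servers_needed(5_000, 1_000))                 # 5
    print(servers_needed(5_000, 1_000, headroom=0.2))   # 6
```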

5. Large Response Size. Returning too much data takes longer to send over the network.

    Bad:    Return all 10,000 products in one response → huge JSON → slow
    Better: Return 20 products per page + pagination → small response → fast

Fix: Pagination, compression (gzip/Brotli), CDN for static content, return only required fields, efficient data formats.
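The effect of compression is easy to observe with the standard library. The payload below is a hypothetical product list; the exact byte counts depend on the data, so none are quoted here.

```python
import gzip
import json

# Hypothetical product list; repetitive JSON like this compresses well.
products = [{"id": i, "name": f"Laptop model {i}", "in_stock": True}
            for i in range(200)]

raw = json.dumps(products).encode("utf-8")
compressed = gzip.compress(raw)

if __name__ == "__main__":
    print(len(raw), "bytes uncompressed")
    print(len(compressed), "bytes gzipped")
    print(gzip.decompress(compressed) == raw)   # True: compression is lossless
```

Compression costs CPU time on both ends, so it pays off mainly for larger responses.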

6. Cold Starts (Serverless). In serverless systems (AWS Lambda), if a function hasn't run recently, the cloud provider must spin up a new instance. This startup delay, a "cold start", can add hundreds of milliseconds.

    Warm Lambda: request → function runs → 20ms response
    Cold Lambda: request → spin up container (400ms) → function runs → 420ms response

Fix: Keep critical functions warm, use provisioned concurrency, use always-running services for latency-sensitive paths.

B. How to Reduce Latency

Technique | How It Reduces Latency | Example
⚡ Caching | Serve frequently accessed data from memory instead of re-querying the database | Product details from Redis in 1ms vs 100ms from DB
🌐 CDN | Serve static content from edge servers near the user | Japan user gets images from Tokyo CDN, not US origin
🗺️ Regional deployment | Place servers in the same region as users | Tokyo users hit Tokyo servers, not Virginia
🗄️ Database optimization | Indexes, read replicas, query optimization | Index on product name: 3 seconds → 2ms
⚙️ Async processing | Move non-critical work out of the request path | Send email in background; user doesn't wait for it
🔀 Parallel service calls | Call independent services simultaneously instead of sequentially | Call all five services in parallel: ~40ms (the slowest call) vs 135ms sequentially
📦 Compression | Reduce response size so it transfers faster | 10KB JSON compressed to 2KB with gzip → 5× less data to transfer
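Caching, the first technique in the table, is often implemented with a cache-aside pattern: look in the cache first and fall back to the database only on a miss. The sketch below uses a plain in-process dictionary with a time-to-live. `TTLCache` and `load_product_from_db` are hypothetical names, and a production system would more likely use a shared cache such as Redis.

```python
import time


class TTLCache:
    """Tiny in-process cache-aside helper with per-entry expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store = {}      # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        if entry is not None and entry[0] > time.monotonic():
            self.hits += 1
            return entry[1]                   # cache hit: no database call
        self.misses += 1
        value = loader(key)                   # cache miss: go to the slow source
        self._store[key] = (time.monotonic() + self.ttl, value)
        return value


def load_product_from_db(product_id):
    # Placeholder for a real database query (hypothetical).
    return {"id": product_id, "name": "laptop"}


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    cache.get_or_load(1, load_product_from_db)   # miss: loads from the "database"
    cache.get_or_load(1, load_product_from_db)   # hit: served from memory
    print(cache.hits, cache.misses)              # 1 1
```

The time-to-live bounds how stale an entry can get, which is the usual trade-off with caching.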

C. How to Improve Throughput

Technique | How It Improves Throughput | Example
📈 Horizontal scaling | Add more servers; each handles its share of traffic | 1 server = 1,000 RPS → 10 servers = up to ~10,000 RPS, as long as nothing shared (such as the database) becomes the bottleneck
⚖️ Load balancing | Distribute requests across all servers so none is overloaded | ALB spreads 50,000 RPS across 50 servers
⚡ Caching | Serve from cache = backend handles fewer requests = more capacity | Homepage cached → DB receives 10% of original queries
🗄️ Database scaling | Read replicas, sharding, NoSQL for high-scale patterns | 5 read replicas → roughly 5× read throughput
📨 Queue-based architecture | Buffer traffic spikes; workers consume at their own pace | Black Friday orders → SQS queue → workers process steadily
🔧 Reduce per-request work | Precompute, cache results, move heavy tasks to background | Pre-generate recommendations → serve from cache instantly

5.7 🎨 Illustrated Diagram

The diagram below shows latency (time for one request end-to-end) and throughput (multiple requests handled per second) as distinct but related dimensions.

%%{init: {"theme": "base", "themeVariables": {"lineColor": "#64748b", "edgeLabelBackground": "#fff"}}}%%
flowchart TD
    subgraph LAT["⏱️ Latency: Time for ONE Request"]
        direction LR
        C1["👤 User clicks"] -->|"10ms network"| S1["⚙️ Server"]
        S1 -->|"150ms DB query + processing"| DB1["🗄️ Database"]
        DB1 -->|"result"| S1
        S1 -->|"10ms network"| R1["✅ User sees result (Total = ~170ms)"]
    end
    subgraph THR["📊 Throughput: Many Requests Per Second"]
        direction LR
        U1["👤 User 1"] --> LB["⚖️ Load Balancer (10,000 RPS)"]
        U2["👤 User 2"] --> LB
        U3["👤 User 3"] --> LB
        U4["👤 ...1000s more..."] --> LB
        LB --> SV1["⚙️ Server 1"]
        LB --> SV2["⚙️ Server 2"]
        LB --> SV3["⚙️ Server 3"]
    end
    style LAT fill:#eff6ff,stroke:#2563eb,color:#1e40af
    style THR fill:#f0fdf4,stroke:#059669,color:#064e3b
    style C1 fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
    style S1 fill:#d1fae5,stroke:#059669,color:#064e3b
    style DB1 fill:#fff3e0,stroke:#d97706,color:#92400e
    style R1 fill:#d1fae5,stroke:#059669,color:#064e3b
    style LB fill:#fff3e0,stroke:#d97706,color:#92400e
    style SV1 fill:#d1fae5,stroke:#059669,color:#064e3b
    style SV2 fill:#d1fae5,stroke:#059669,color:#064e3b
    style SV3 fill:#d1fae5,stroke:#059669,color:#064e3b

Reading the diagram: Latency (blue) is the journey of ONE request through the network, server, and database; every hop adds time. Throughput (green) is many users hitting a load balancer that distributes work across multiple servers; adding more servers increases capacity.

5.8 ✅ When to Use

Different system features have different performance priorities. Always ask: does this feature need a fast response for one user, or does it need to handle many users simultaneously, or both?

Feature / Scenario | Priority | Why
Payment confirmation | Latency + correctness | User expects quick confirmation; correctness matters more than raw speed
Video start (YouTube) | Latency | Buffering time directly affects user satisfaction; every second of delay hurts
Amazon search results | Both | User wants fast results, and millions search simultaneously during sales events
Send confirmation email | Throughput (async) | Can be done in the background, so the user doesn't wait for it, but the system must handle millions per day
Live driver location (Uber) | Both | Low latency for smooth map updates; high throughput for millions of location events/second
Video encoding (YouTube upload) | Throughput | Encoding can take minutes with no user waiting, but the system must encode thousands of videos/hour
WhatsApp message delivery | Latency | User expects near-instant delivery; a 5-second delay feels broken

Key rule: Real-time user-facing features need low latency. Background and batch operations need high throughput. Features serving many users simultaneously need both.

5.9 🏗️ Real-world Examples

YouTube

Feature | Performance Priority | How YouTube Addresses It
Video starts playing quickly | Latency | CDN delivers video chunks from edge nodes near the user
Search returns results fast | Latency | Search index cached; results served from pre-built indexes
Millions of concurrent streams | Throughput | Distributed CDN edge servers worldwide; adaptive bitrate streaming
Video encoding after upload | Throughput (async) | Encoding queue processes thousands of uploads per hour in background
Recommendations load quickly | Latency | Pre-computed recommendations cached per user

Amazon

Feature | Performance Priority | How Amazon Addresses It
Search "laptop" returns results | Both | Elasticsearch index + caching; millions searching simultaneously during sales
Product page loads | Latency | Product data cached in ElastiCache; images served from CloudFront CDN
Checkout during Prime Day | Throughput | Auto-scaling, queue-based order processing, multiple database replicas
Payment processing | Latency + correctness | User expects quick confirmation; TCP/HTTPS, reliable services, retries

Uber

Feature | Performance Priority | How Uber Addresses It
Show nearby drivers on map | Latency | Driver locations cached in-memory; geospatial indexes for fast radius queries
Driver location updates (every second) | Both | Stream processing pipeline; millions of location events per second
Match rider with driver | Latency | Real-time matching algorithm with cached driver availability
Surge pricing calculation | Throughput | Aggregates supply/demand from thousands of events per second in real time

Notice the pattern: Every large system has latency-critical features (things users wait for) and throughput-critical features (things that happen at massive scale). Knowing which is which is how you make the right architectural decisions.

5.10 ⚖️ Trade-offs

Improving latency and throughput can sometimes conflict. Understanding these trade-offs is essential for making the right system design decision.

Technique | Effect on Latency | Effect on Throughput | When to Accept the Trade-off
Batching: wait to collect 1,000 messages, then process together | ❌ Increases: the first message waits for the batch to fill | ✅ Improves: processing in bulk is more efficient | Background jobs, analytics, email digests; not for real-time user requests
Compression: gzip/Brotli encoding of responses | ✅ Often reduces: less data to transfer | ⚠️ Mixed: saves network bandwidth but adds CPU overhead; throughput may drop if CPU is overloaded | Large API responses, static assets; skip for tiny responses or CPU-bound systems
Strong consistency: every write confirmed by multiple regions before responding | ❌ Increases: must wait for all confirmations across regions | ❌ Reduces: system spends more time per transaction | Financial transactions, critical data; accept higher latency for correctness guarantees
Caching | ✅ Reduces: serve from memory, avoid DB round-trip | ✅ Improves: DB gets fewer requests, can handle more traffic | Read-heavy workloads with mostly stable data; avoid for data that changes very frequently

Golden rule: Identify the bottleneck first. Adding more app servers when the database is the bottleneck does not help. Profile before optimizing: measure which component contributes most to latency, then fix that specific component.

5.11 🚫 Common Mistakes

# | ❌ Common Mistake | ✅ The Reality
1 | Confusing latency with throughput: "System is slow because it handles many requests" | They are different metrics. A system can handle 100,000 RPS but still have slow individual responses. Always distinguish: "Is one request slow?" (latency) vs "Is the system overwhelmed?" (throughput)
2 | Only monitoring average latency: "Average is 100ms, we are fine" | Average hides tail latency. If p99 = 5 seconds and you serve 1M requests/day, that is 10,000 requests taking 5 seconds or longer every day. Always monitor p95 and p99.
3 | Adding more app servers without checking the bottleneck | If the database is the bottleneck, more app servers do nothing; they all still wait for the same slow DB. Identify the bottleneck first, then fix it.
4 | Ignoring geography: deploying everything in one region | A user in Japan connecting to a US server adds ~150ms of latency from network distance alone. Use CDN, regional deployment, and latency-based DNS routing.
5 | Making every task synchronous: user waits for email, analytics, invoice generation | Non-critical tasks should be async. User places order → confirm immediately → send email, update analytics, generate invoice in the background. Sync everything = slow user experience.
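The asynchronous pattern from mistake 5 can be sketched with an in-process queue and a background worker thread. This is a simplified stand-in for a real message queue such as SQS or Kafka; the function names are hypothetical and the "email" is just an entry appended to a list.

```python
import queue
import threading

email_queue = queue.Queue()
sent_emails = []


def email_worker() -> None:
    """Background worker: drains the queue outside the request path."""
    while True:
        order_id = email_queue.get()
        if order_id is None:              # sentinel value: stop the worker
            email_queue.task_done()
            break
        # Stands in for a slow call to an email provider.
        sent_emails.append(f"confirmation for order {order_id}")
        email_queue.task_done()


def place_order(order_id: int) -> dict:
    """Request handler: confirm right away and defer the email."""
    email_queue.put(order_id)
    return {"order_id": order_id, "status": "confirmed"}


def run_demo() -> dict:
    worker = threading.Thread(target=email_worker, daemon=True)
    worker.start()
    response = place_order(42)   # returns without waiting for the email
    email_queue.join()           # demo only: wait so the result can be inspected
    email_queue.put(None)
    worker.join()
    return response


if __name__ == "__main__":
    print(run_demo())
    print(sent_emails)
```

The user-facing latency is only the time needed to enqueue the work; the slow part runs later at whatever pace the workers can sustain.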

5.12 📝 Summary

  • Latency = time for one request to complete. Throughput = how many requests the system handles per second. They are different dimensions.
  • Always measure percentile latency (p50, p95, p99), not just average. p99 can reveal thousands of users having a bad experience that averages hide.
  • Main causes of high latency: network distance, slow DB queries, too many service calls, server overload, large responses, cold starts.
  • Reduce latency with: caching, CDN, regional servers, DB indexing, async processing, parallel service calls, compression.
  • Improve throughput with: horizontal scaling, load balancing, caching, DB scaling (replicas/sharding), message queues, reducing per-request work.
  • Caching improves both latency and throughput, which makes it one of the most powerful tools in system design.

5.13 🏋️ Design Challenge

🍕 Challenge: Food Delivery App Performance

For each feature below, decide whether Latency, Throughput, or Both is the primary concern. Then suggest one technique to improve that dimension:
  • User searches nearby restaurants
  • User places an order
  • System sends order confirmation email
  • Driver location updates every second
  • Payment processing

Answer:

| Feature | Priority | Improvement Technique |
|---|---|---|
| Search nearby restaurants | Both | Cache restaurant lists by area; use geospatial indexes for fast radius queries; auto-scale for dinner rush |
| Place an order | Both | Confirm order quickly (respond in <500ms); use message queue to process order async; auto-scale during peak hours |
| Confirmation email | Throughput (async) | Move to background queue so the user doesn't wait; process millions of emails per hour asynchronously via SQS + Lambda |
| Driver location updates | Both | Stream processing (Kafka/Kinesis); cache latest driver position in Redis; update map every 1-2 seconds for smooth UX |
| Payment processing | Latency + correctness | Use dedicated payment service with SLA; correctness > speed, so the user waits 1-2s for a confirmed payment rather than risking data errors |
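The "confirm immediately, email in the background" pattern from the answer can be sketched with an in-process queue and a worker thread. This is only a stand-in for a managed queue such as SQS. The function names are hypothetical, and appending to a list replaces actually sending an email.

```python
import queue
import threading

email_queue = queue.Queue()
sent_emails = []  # stand-in for an email provider


def email_worker():
    """Background worker: drains the queue without blocking order placement."""
    while True:
        order_id = email_queue.get()
        sent_emails.append(f"confirmation for order {order_id}")
        email_queue.task_done()


threading.Thread(target=email_worker, daemon=True).start()


def place_order(order_id):
    """Enqueue the slow, non-critical work and respond to the user right away."""
    email_queue.put(order_id)
    return {"order_id": order_id, "status": "confirmed"}


response = place_order(101)
place_order(102)

# Only this demo waits for the queue to drain; a real request handler would not.
email_queue.join()
print(response)
print(sent_emails)
```

The user-facing latency of `place_order` no longer includes the email step, and the queue lets the worker catch up at its own pace during traffic spikes.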

5.14 ☁️ Cloud Service Mapping

Every major cloud platform has services specifically designed to reduce latency and increase throughput:

| Need | AWS (Primary) | GCP | Azure |
|---|---|---|---|
| Reduce global latency (CDN) | Amazon CloudFront | Cloud CDN | Azure Front Door / Azure CDN |
| Route users to nearest region | Route 53 latency-based routing | Cloud DNS + Traffic Director | Azure Traffic Manager |
| Cache application data | Amazon ElastiCache (Redis/Memcached) | Memorystore | Azure Cache for Redis |
| Scale web/API servers | EC2 Auto Scaling / ECS / App Runner | Cloud Run / GKE autoscaling | Azure Container Apps / AKS |
| Distribute traffic (throughput) | Application Load Balancer | Cloud Load Balancing | Azure Application Gateway |
| Scale database reads | RDS / Aurora Read Replicas | Cloud SQL Read Replicas / AlloyDB | Azure SQL Geo-Replication |
| Absorb traffic spikes (queue) | Amazon SQS / Kinesis | Pub/Sub | Azure Service Bus / Event Hubs |
| Monitor latency & throughput | CloudWatch + X-Ray (distributed tracing) | Cloud Monitoring / Cloud Trace | Azure Monitor / Application Insights |
AWS mental model for performance: Reduce latency → CloudFront (global) + ElastiCache (data) + Route 53 latency routing (regional). Increase throughput → Auto Scaling + Load Balancer + SQS (absorb spikes) + Read Replicas (database). Monitor both → CloudWatch metrics + X-Ray traces.
