Docker & Containers

Containers are the unit that modern infrastructure is built on — they're what Kubernetes orchestrates, what CI pipelines build, and how microservices ship. Docker made them mainstream by turning "package an app with everything it needs to run" into a one-command workflow. The crucial thing to understand is that a container is not a lightweight virtual machine: it's an isolated process on a shared kernel, and that distinction explains both its speed and its limits.

⚡ Quick Takeaways

A container packages an app + all its dependencies into a portable image that runs the same way on any compatible host (matching CPU architecture, and a Linux kernel for Linux images) — killing "works on my machine."
It's an isolated process, not a VM — Linux namespaces give it its own view of the system; cgroups cap its CPU/memory. It shares the host kernel.
vs VMs: no guest OS, so containers are MBs not GBs, start in milliseconds, and pack densely — at the cost of weaker isolation.
Images are layered & read-only — every Dockerfile step is cached, and those that change the filesystem (RUN, COPY, ADD) each add a layer; layers are shared across images to save space and build time.
The Dockerfile is the reproducible build recipe; registries (Docker Hub, ECR) store and distribute images.
Containers are ephemeral & immutable — state goes in volumes/external stores; you replace, not patch, a running container.

tldr

A container bundles an application with its dependencies into an immutable image that runs the same everywhere. Under the hood it's just a host process isolated by Linux namespaces (its own filesystem, network, PID space) and constrained by cgroups (CPU/memory limits), sharing the host kernel — which is why it's far lighter than a VM. Images are built from a Dockerfile as cached, shareable layers and distributed via registries. Keep containers stateless and immutable; persist data outside.

The Problem: "Works on My Machine"

Before containers, deploying software meant reproducing its environment — the right language runtime, library versions, system packages, config — on every machine. Tiny differences between a developer's laptop, CI, and production caused the infamous "but it works on my machine." Containers solve this by packaging the app together with its entire userland environment into one artifact, so the thing you test is byte-for-byte the thing you run in production.

What a Container Actually Is

A container is a normal Linux process that's been given an isolated view of the system using two kernel features:

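Namespaces provide isolation — a container gets its own mount table (and so its own root filesystem), network stack, process IDs (PID namespace, so its "PID 1" is its main process), hostname, and IPC, and it can't see the host's other processes or files. User namespaces, which map the container's root user to an unprivileged host user, also exist, but Docker does not enable them by default.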
cgroups (control groups) provide resource limits — capping how much CPU, memory, and I/O the container can use, so one container can't starve the others.

Crucially, all containers on a host share that host's kernel. There's no guest operating system inside a container — just your app and its userland files running as isolated, constrained host processes. That single fact is the source of every container-vs-VM trade-off.
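To make the cgroup half concrete: on a host using cgroup v2, a container's CPU cap appears in a `cpu.max` file as `<quota> <period>` in microseconds (or `max` for no cap), and its memory cap appears in `memory.max` as a byte count (or `max`). Inside a container these files typically sit under `/sys/fs/cgroup/`. The sketch below is a minimal Python illustration that parses the values as strings rather than reading the live files, so it runs anywhere. The function names are hypothetical, and the values shown for `--cpus=1.5 --memory=512m` assume Docker with the runc runtime on cgroup v2.

```python
from typing import Optional


def effective_cpus(cpu_max: str) -> Optional[float]:
    """Translate a cgroup v2 cpu.max value into a number of CPUs.

    The file holds "<quota> <period>" in microseconds: the group may use
    `quota` us of CPU time in every `period` us. A quota of "max" means no cap.
    """
    parts = cpu_max.split()
    quota = parts[0]
    # Defensive fallback to the kernel's default 100 ms period if only one field is given.
    period = int(parts[1]) if len(parts) > 1 else 100_000
    if quota == "max":
        return None  # unlimited
    return int(quota) / period


def memory_limit_bytes(memory_max: str) -> Optional[int]:
    """Translate a cgroup v2 memory.max value into bytes (None means no cap)."""
    value = memory_max.strip()
    return None if value == "max" else int(value)


# What `docker run --cpus=1.5 --memory=512m` would typically leave in these files:
print(effective_cpus("150000 100000"))     # 1.5
print(effective_cpus("max 100000"))        # None
print(memory_limit_bytes("536870912\n"))   # 536870912 (512 MiB)
```

The limits are plain kernel files rather than anything container-specific, which is the point of the paragraph above: the "container" is just a process placed in a cgroup with these values.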

Containers vs Virtual Machines

Aspect	Container	Virtual Machine
Isolation unit	Process (namespaces + cgroups)	Full guest OS on a hypervisor
Kernel	Shares the host kernel	Own kernel per VM
Size	MBs	GBs
Startup	Milliseconds	Seconds to minutes
Density	Hundreds per host	A handful per host
Isolation strength	Weaker (shared kernel)	Stronger (hardware-level)

The takeaway: containers win on weight, speed, and density (perfect for many small services), while VMs win on isolation (stronger security boundary). In practice they're layered — containers often run inside VMs in the cloud to get both.

Images and Layers

A container image is the immutable, packaged filesystem + metadata an instance runs from. Its defining feature is that it's built in layers: each is a read-only filesystem diff, stacked via a union filesystem into the final root filesystem. Layers are content-addressed and shared — if ten images are built on the same base layer, that layer is stored once and reused. When a container runs, a thin writable layer is added on top (copy-on-write), so the underlying image stays immutable.
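A minimal Python sketch of the three ideas in that paragraph, under simplifying assumptions: a layer here is a plain dict of path → contents (real layers are tar archives addressed by the SHA-256 digest of their bytes), deletions and whiteout files are ignored, and the class names are hypothetical. It shows identical layers collapsing to one stored copy, a union lookup where the topmost layer wins, and writes landing only in the container's own layer.

```python
import hashlib
import json
from typing import Dict, List, Optional

Files = Dict[str, str]  # toy layer: path -> file contents


class LayerStore:
    """Keeps each layer once, keyed by the SHA-256 digest of its contents."""

    def __init__(self) -> None:
        self.blobs: Dict[str, Files] = {}

    def put(self, files: Files) -> str:
        # Serialize deterministically so identical layers get identical digests.
        payload = json.dumps(files, sort_keys=True).encode()
        digest = "sha256:" + hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, files)  # already present: stored only once
        return digest


class Container:
    """Read-only image layers plus a private, writable copy-on-write layer."""

    def __init__(self, store: LayerStore, image: List[str]) -> None:
        self.store = store
        self.image = image        # layer digests, bottom to top
        self.upper: Files = {}    # writable layer, discarded with the container

    def read(self, path: str) -> Optional[str]:
        # Union lookup: the topmost layer that has the path wins.
        if path in self.upper:
            return self.upper[path]
        for digest in reversed(self.image):
            layer = self.store.blobs[digest]
            if path in layer:
                return layer[path]
        return None

    def write(self, path: str, data: str) -> None:
        # Copy-on-write: the change lands in the upper layer only.
        self.upper[path] = data


store = LayerStore()
base_a = store.put({"/etc/os-release": "alpine"})
base_b = store.put({"/etc/os-release": "alpine"})   # same content, same digest
app_a = [base_a, store.put({"/app/a.js": "console.log('a')"})]
app_b = [base_b, store.put({"/app/b.js": "console.log('b')"})]
print(base_a == base_b, len(store.blobs))           # True 3

c = Container(store, app_a)
c.write("/etc/os-release", "patched")
print(c.read("/etc/os-release"))                    # patched
print(store.blobs[base_a]["/etc/os-release"])       # alpine
```

Two images share one stored base layer, and the container's edit never touches it, which is what lets many containers run from the same image at once.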

layered image + writable container layer

┌─────────────────────────────┐  ← writable layer (per container, ephemeral)
├─────────────────────────────┤
│ COPY app/        (layer 4)  │  ┐
│ RUN npm install  (layer 3)  │  │ read-only image layers,
│ COPY package.json(layer 2)  │  │ cached + shared across images
│ FROM node:20     (layer 1)  │  ┘
└─────────────────────────────┘

  change app code → only layer 4 rebuilds; layers 1–3 reused from cache

This layering is why builds are fast (unchanged layers are cached) and why ordering Dockerfile instructions matters: a change to one step invalidates the cache for that step and every step after it. Put rarely-changing steps (installing dependencies) before frequently-changing ones (copying source), and most edits rebuild only the final layers.
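The cache rule can be modeled by chaining each step's cache key to the key of the step before it, so a change anywhere produces new keys for every later step. This is a simplified Python sketch, not BuildKit's actual algorithm: it assumes a COPY step is keyed on the content it copies and every other step on its instruction text alone, and the helper names and package manifest are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# A build step: the instruction text plus, for COPY, the content it copies.
Step = Tuple[str, Optional[str]]


def cache_keys(steps: List[Step]) -> List[str]:
    """Chain each step's key to its parent's, so one change ripples downward."""
    keys: List[str] = []
    parent = ""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        if copied is not None:
            h.update(hashlib.sha256(copied.encode()).digest())  # content checksum
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def build(steps: List[Step], cache: Dict[str, str]) -> List[str]:
    """Report 'cached' or 'rebuilt' for each step, filling the cache as it goes."""
    report = []
    for (instruction, _), key in zip(steps, cache_keys(steps)):
        if key in cache:
            report.append("cached")
        else:
            cache[key] = instruction
            report.append("rebuilt")
    return report


MANIFEST = '{"dependencies": {"express": "^4.19.0"}}'  # hypothetical package.json


def good_order(source: str) -> List[Step]:
    return [("FROM node:20-alpine", None), ("COPY package*.json ./", MANIFEST),
            ("RUN npm ci --omit=dev", None), ("COPY . .", source)]


def bad_order(source: str) -> List[Step]:
    return [("FROM node:20-alpine", None), ("COPY . .", source),
            ("RUN npm ci --omit=dev", None)]


fast: Dict[str, str] = {}
build(good_order("v1"), fast)
print(build(good_order("v2"), fast))  # ['cached', 'cached', 'cached', 'rebuilt']

slow: Dict[str, str] = {}
build(bad_order("v1"), slow)
print(build(bad_order("v2"), slow))   # ['cached', 'rebuilt', 'rebuilt']
```

With dependencies installed before the source is copied, a source edit rebuilds only the final step; copying the source first forces `npm ci` to rerun on every edit.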

The Dockerfile

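An image is built from a Dockerfile, a declarative recipe of build steps. Instructions that change the filesystem (RUN, COPY, ADD) each produce a layer; others, such as EXPOSE and CMD, only record image metadata: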

Dockerfile

# small base image
FROM node:20-alpine
WORKDIR /app
# copy the dependency manifests first → cache-friendly
COPY package*.json ./
RUN npm ci --omit=dev
# then the source (changes often)
COPY . .
# documents the port; publishing it on the host still needs -p at run time
EXPOSE 3000
# the process the container runs
CMD ["node", "server.js"]

A multi-stage build (a build stage that compiles, then a slim runtime stage that copies only the artifacts) keeps the final image small by leaving build tools behind — a key practice for lean, secure images.
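In Dockerfile terms, the build stage is a section starting with `FROM <image> AS build`, and the runtime stage starts with a second `FROM` and pulls artifacts across with `COPY --from=build`; by default only the last stage becomes the image. The toy Python model below treats each stage as a dict of files, with hypothetical paths and contents, purely to show which files end up in the final image.

```python
from typing import Dict, Iterable

Files = Dict[str, str]  # toy filesystem: path -> what lives there


def copy_from(stage: Files, paths: Iterable[str]) -> Files:
    """Mimic COPY --from=<stage>: take only the named paths from another stage."""
    return {path: stage[path] for path in paths}


# Build stage (hypothetical contents): toolchain, sources, and the compiled output.
build_stage: Files = {
    "/usr/bin/gcc": "compiler",
    "/usr/include/stdio.h": "headers",
    "/src/main.c": "source code",
    "/out/app": "compiled binary",
}

# Runtime stage: a slim base plus just the artifact copied out of the build stage.
runtime_stage: Files = {"/lib/libc.so": "runtime library"}
runtime_stage.update(copy_from(build_stage, ["/out/app"]))

print(sorted(runtime_stage))             # ['/lib/libc.so', '/out/app']
print("/usr/bin/gcc" in runtime_stage)   # False
```

The compiler, headers, and sources exist only in the discarded build stage, so they add nothing to the shipped image's size or attack surface.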

Registries

Images are distributed through a registry — a repository you push built images to and pull them from (Docker Hub, AWS ECR, GitHub Container Registry, etc.). Images are tagged (e.g. myapp:1.4.2), and because layers are content-addressed, a pull only downloads layers the host doesn't already have. The registry is the handoff point between your CI/CD pipeline (which builds and pushes) and your orchestrator (which pulls and runs).
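The pull behavior can be sketched as: resolve the tag to a manifest, read the list of layer digests it names, and fetch only the digests the host lacks. In this minimal Python model the registry is two in-memory dicts, and the tags, manifest names, and shortened digests are hypothetical placeholders (real digests are full `sha256:` hashes). It also shows why tags make rollback cheap: a tag is just a movable pointer to a manifest.

```python
from typing import Dict, List, Set

# Toy registry: tags point at manifests, manifests list layer digests (bottom to top).
tags: Dict[str, str] = {
    "myapp:1.4.2": "manifest-142",
    "myapp:1.4.3": "manifest-143",
}
manifests: Dict[str, List[str]] = {
    "manifest-142": ["sha256:base", "sha256:deps", "sha256:app142"],
    "manifest-143": ["sha256:base", "sha256:deps", "sha256:app143"],
}


def layers_to_pull(ref: str, local: Set[str]) -> List[str]:
    """Resolve a tag to its manifest, then skip layers already on the host."""
    return [digest for digest in manifests[tags[ref]] if digest not in local]


local_layers = {"sha256:base", "sha256:deps", "sha256:app142"}  # 1.4.2 already pulled
print(layers_to_pull("myapp:1.4.3", local_layers))  # ['sha256:app143']

# Rolling back is just pointing a tag at an older manifest again.
tags["myapp:prod"] = tags["myapp:1.4.2"]
print(layers_to_pull("myapp:prod", local_layers))   # []
```

Upgrading from 1.4.2 to 1.4.3 downloads one layer, and rolling back downloads nothing, because every other layer is already present under the same digest.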

State, Networking, and the Runtime

Two operational facts matter. First, a container's writable layer is ephemeral: it survives a stop and restart but vanishes when the container is removed, so persistent data goes in volumes (mounts backed by host or networked storage) or external services (a database, object storage), never inside the container. Second, each container gets its own network namespace; Docker connects containers through virtual networks and, for ports you publish (e.g. -p 8080:3000), forwards traffic from a host port to the container.

The Docker engine (or containerd used directly, as in many Kubernetes clusters) is what turns an image into a running, isolated process; it delegates the actual creation of namespaces and cgroups to a low-level OCI runtime such as runc.
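The state rule (the writable layer dies with the container, a volume does not) can be modeled in a few lines of Python. In this hypothetical sketch a host keeps named volumes separately from each container's writable layer, and a path under a mount point is routed to the volume, roughly what `docker run -v dbdata:/var/lib/data` sets up; removing the container, as `docker rm` does, drops only its writable layer. The class and container names are illustrative, and the read-only image layers are left out for brevity.

```python
from typing import Dict, Optional, Tuple

Files = Dict[str, str]


class Host:
    """Toy model of one host: named volumes outlive containers, writable layers don't."""

    def __init__(self) -> None:
        self.volumes: Dict[str, Files] = {}
        self.layers: Dict[str, Files] = {}            # container -> writable layer
        self.mounts: Dict[str, Dict[str, str]] = {}   # container -> {mount point: volume}

    def run(self, name: str, mounts: Dict[str, str]) -> None:
        self.layers[name] = {}                        # every new container starts empty
        self.mounts[name] = mounts
        for volume in mounts.values():
            self.volumes.setdefault(volume, {})       # create the volume on first use

    def _target(self, name: str, path: str) -> Tuple[Files, str]:
        """Route a path to the volume mounted over it, else to the writable layer."""
        for mount_point, volume in self.mounts[name].items():
            if path.startswith(mount_point + "/"):
                return self.volumes[volume], path[len(mount_point):]
        return self.layers[name], path

    def write(self, name: str, path: str, data: str) -> None:
        files, key = self._target(name, path)
        files[key] = data

    def read(self, name: str, path: str) -> Optional[str]:
        files, key = self._target(name, path)
        return files.get(key)

    def rm(self, name: str) -> None:
        # Removing a container discards its writable layer; volumes are untouched.
        del self.layers[name], self.mounts[name]


host = Host()
host.run("db-v1", {"/var/lib/data": "dbdata"})
host.write("db-v1", "/var/lib/data/rows.db", "42 rows")   # lands in the volume
host.write("db-v1", "/tmp/cache", "scratch")              # lands in the writable layer
host.rm("db-v1")

host.run("db-v2", {"/var/lib/data": "dbdata"})            # replacement, same volume
print(host.read("db-v2", "/var/lib/data/rows.db"))        # 42 rows
print(host.read("db-v2", "/tmp/cache"))                   # None
```

The replacement container sees the data written to the volume but not the scratch file, which is exactly the split between state that must survive a redeploy and state that may not.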

Why Containers Won

Portability — the same image runs on a laptop, CI, and prod, identically.
Density & speed — millisecond startup and hundreds per host make them ideal for microservices and autoscaling.
Immutable deploys — you ship a versioned image and roll back by switching tags, not by mutating servers.
Foundation for orchestration — uniform, self-contained units are exactly what Kubernetes needs to schedule, scale, and heal.

Pitfalls

Image bloat — fat base images and leftover build tools make images huge and slow to pull; use slim bases and multi-stage builds.
Treating containers as VMs — don't SSH in and mutate them or run an init system with many processes; ideally one main process per container.
State inside the container — anything written to the container's writable layer is lost when the container is removed or replaced (as on every redeploy); use volumes.
Security — shared kernel means weaker isolation; avoid running as root, scan images for CVEs, and don't bake secrets into layers (deleting a secret in a later step only hides it; the earlier layer still contains it and ships with the image).
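Why a deleted secret still ships: a later layer cannot modify an earlier one, so a deletion is recorded as a whiteout entry (in OCI image layers, a file with a `.wh.` prefix) that hides the path from the merged view. The minimal Python model below, with hypothetical file names and a sentinel object standing in for the whiteout file, shows that the running container no longer sees the file while the earlier layer, which anyone who pulls the image can extract, still holds it.

```python
from typing import Dict, List

WHITEOUT = object()  # sentinel standing in for an overlay/OCI whiteout entry
Layer = Dict[str, object]


def union_view(layers: List[Layer]) -> Dict[str, object]:
    """Merge layers bottom to top; a whiteout hides the path from the result."""
    view: Dict[str, object] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                view.pop(path, None)
            else:
                view[path] = content
    return view


# Hypothetical build: step 2 copies a credential in, step 3 deletes it with rm.
image: List[Layer] = [
    {"/app/server.js": "console.log('hi')"},
    {"/app/.npmrc": "token=SECRET"},
    {"/app/.npmrc": WHITEOUT},
]

print("/app/.npmrc" in union_view(image))   # False: hidden from the container
print(image[1]["/app/.npmrc"])              # token=SECRET: still in layer 2
```

This is why secrets needed at build time are better passed through a mechanism that never writes them to a layer, such as BuildKit's `RUN --mount=type=secret`.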

takeaway

A container is an isolated, resource-capped process sharing the host kernel — not a VM — which is why it's tiny, instant, and dense. Images are immutable, layered, and shareable, built reproducibly from a Dockerfile and distributed via registries. Keep them stateless and immutable, and you get portable, roll-backable deploys that orchestrators like Kubernetes can manage at scale.

🎯 interview hot-takes

Container vs VM? A container is an isolated process sharing the host kernel (namespaces + cgroups); a VM runs a full guest OS on a hypervisor. Containers are MBs and start in ms; VMs are GBs and isolate more strongly.
What actually isolates a container? Linux namespaces (filesystem, network, PID view) plus cgroups (CPU/memory limits) — no guest kernel.
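Why are image layers useful? Each filesystem-changing Dockerfile step (RUN, COPY, ADD) yields a cached, content-addressed layer that can be shared across images — fast builds and small storage; order steps least-changing first, since a change invalidates the cache for every step after it.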
Where does container state go? Not in the ephemeral writable layer — in volumes or external stores, since containers are immutable and replaceable.
Why did containers win for microservices? Portability, millisecond startup, high density, immutable versioned deploys, and being the perfect unit for orchestration.