Container interviews have a specific shape. The first ten minutes are trivia, testing whether you know the vocabulary. The next thirty minutes are scenarios, testing whether you know what to do when something breaks. Most candidates prepare for the first part and get caught off guard by the second.
This post focuses on Docker and Kubernetes specifically. Not the full DevOps breadth, just containers and orchestration, because that’s often its own interview block, especially at companies running significant workloads on AWS EKS, GKE, or self-managed clusters.
Docker: the questions and what good answers look like
The fundamentals come up in every interview. Know them cold, but know them at depth, not just definitionally.
Image vs. container: An image is an immutable, layered filesystem. A container is a running instance of that image with its own writable layer on top. The layered model is why containers start fast and why layer caching in builds matters. A shallow answer here is “an image is like a template.” A full answer explains the union filesystem.
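To make the layered model concrete, here is a toy Python sketch of union-filesystem lookup. It is not how overlay2 is implemented. It assumes a hypothetical UnionFS class in which each layer is a dict of path to contents, with None standing in for a whiteout (a file deleted in an upper layer). Reads search from the top layer down, and writes and deletes land only in the container's private writable layer. That is why many containers can share one image's layers.

```python
from typing import Dict, List, Optional

# A layer maps paths to contents; None is a "whiteout" (deleted in this layer).
Layer = Dict[str, Optional[str]]


class UnionFS:
    """Toy union filesystem: shared read-only image layers plus one writable layer."""

    def __init__(self, image_layers: List[Layer]) -> None:
        self.image_layers = image_layers  # index 0 is the base layer; never modified
        self.upper: Layer = {}            # this container's private writable layer

    def read(self, path: str) -> Optional[str]:
        # Look from the top layer down; the first layer that mentions the path wins.
        for layer in [self.upper, *reversed(self.image_layers)]:
            if path in layer:
                return layer[path]  # may be None if an upper layer deleted it
        return None

    def write(self, path: str, content: str) -> None:
        # Copy-on-write: the change lands only in this container's layer.
        self.upper[path] = content

    def delete(self, path: str) -> None:
        # Record a whiteout instead of touching the shared image layers.
        self.upper[path] = None


if __name__ == "__main__":
    base = {"/bin/sh": "busybox"}
    app = {"/app/server": "v1"}
    c1, c2 = UnionFS([base, app]), UnionFS([base, app])  # two containers, one image
    c1.write("/app/server", "patched")
    c1.delete("/bin/sh")
    print(c1.read("/app/server"), c2.read("/app/server"))  # patched v1
    print(c1.read("/bin/sh"), c2.read("/bin/sh"))          # None busybox
```

Starting a container only means creating an empty upper layer, not copying the image. That is the "containers start fast" part of the answer.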
Multi-stage builds: You use a build stage with a full SDK image (say, golang:1.22) to compile your binary, then copy only the binary into a minimal final image (alpine or scratch). The build tools never end up in production. Final image size drops from several hundred MB to tens of MB. This isn’t optional anymore at serious shops; it’s table stakes.
CMD vs. ENTRYPOINT: ENTRYPOINT sets the executable; CMD sets default arguments. When you run docker run myimage --verbose, the --verbose overrides CMD but not ENTRYPOINT. Most candidates can say this. Fewer can explain why you’d use the exec form (["command", "arg"]) instead of shell form (command arg) and what signal handling implications follow.
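The interaction rules can be written down as a small function. This is a sketch of the combination table in Docker’s Dockerfile reference, not Docker’s code. A list stands for exec form and a string for shell form, and the function names are hypothetical. The key detail for the signal-handling question is that shell form wraps the command in /bin/sh -c, so the shell becomes PID 1 instead of your application.

```python
from typing import List, Optional, Union

# A list means exec form (["cmd", "arg"]); a string means shell form (cmd arg).
Instruction = Union[List[str], str, None]


def as_argv(instr: Instruction) -> List[str]:
    if instr is None:
        return []
    if isinstance(instr, str):
        # Shell form is run through /bin/sh -c, which makes the shell PID 1.
        return ["/bin/sh", "-c", instr]
    return list(instr)


def resolve_argv(entrypoint: Instruction, cmd: Instruction,
                 run_args: Optional[List[str]] = None) -> List[str]:
    # Arguments after the image name in `docker run` replace CMD entirely.
    effective_cmd: Instruction = run_args if run_args else cmd
    if isinstance(entrypoint, str):
        # Shell-form ENTRYPOINT ignores both CMD and docker run arguments.
        return as_argv(entrypoint)
    return as_argv(entrypoint) + as_argv(effective_cmd)


if __name__ == "__main__":
    # ENTRYPOINT ["/app/server"], CMD ["--port", "8080"], docker run myimage --verbose
    print(resolve_argv(["/app/server"], ["--port", "8080"], ["--verbose"]))
    # ['/app/server', '--verbose']
```

Docker’s documentation warns that with shell-form ENTRYPOINT your executable will not receive the SIGTERM sent by docker stop. The app then gets no chance to shut down gracefully and is killed once the stop timeout expires. Exec form avoids this because the binary itself is PID 1.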
Non-root users: Running as root inside a container is a real security problem if the container runtime is misconfigured or if there’s a container escape. You add a USER directive in the Dockerfile and drop privileges. Many interviewers consider this a baseline now.
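In a multi-stage build, only the final stage’s USER matters, because that is the image that ships. Below is a deliberately naive check for that, written as a sketch. The function names are hypothetical. It assumes the base image does not set its own USER, and it ignores line continuations, ARG expansion, and every other Dockerfile subtlety. Use a real linter in CI.

```python
from typing import List


def final_stage(lines: List[str]) -> List[str]:
    """Return the instructions of the last build stage (after the last FROM)."""
    stage: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.upper().startswith("FROM "):
            stage = []  # every FROM starts a fresh stage
        stage.append(line)
    return stage


def runs_as_root(dockerfile: str) -> bool:
    # Assumption: the base image leaves the default user (root) in place.
    user = "root"
    for line in final_stage(dockerfile.splitlines()):
        parts = line.split(None, 1)
        if parts[0].upper() == "USER" and len(parts) == 2:
            user = parts[1].split(":")[0]  # drop an optional ":group"
    return user in ("root", "0")
```

A USER directive in the build stage does not carry over. Only one after the final FROM changes what the shipped image runs as.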
Vulnerability scanning: Trivy, Snyk, Docker Scout. Know at least one well enough to describe integrating it into a CI pipeline. The scenario question is usually “your security team says no unscanned images go to production, how do you enforce that in CI?”
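For the “no unscanned images” scenario, the enforcement mechanism is simply a CI step that exits non-zero. Trivy can do this on its own with --exit-code 1 --severity HIGH,CRITICAL. The sketch below shows the same gate as a script, which is useful if you want custom policy. It assumes a report produced by trivy image --format json --output report.json IMAGE, with the Results / Vulnerabilities / Severity / VulnerabilityID / PkgName fields of Trivy’s JSON output. The function names and the HIGH default threshold are illustrative choices.

```python
import json
import sys
from typing import Dict, List

SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def blocking_findings(report: Dict, threshold: str = "HIGH") -> List[str]:
    """List findings at or above `threshold` in a Trivy JSON report."""
    floor = SEVERITY_ORDER.index(threshold)
    found = []
    for result in report.get("Results", []):
        # Targets without findings may omit "Vulnerabilities" or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            sev = vuln.get("Severity", "UNKNOWN")
            if sev in SEVERITY_ORDER and SEVERITY_ORDER.index(sev) >= floor:
                found.append(
                    f"{vuln.get('VulnerabilityID')} ({sev}) in {vuln.get('PkgName')}"
                )
    return found


def main(path: str) -> int:
    with open(path) as f:
        findings = blocking_findings(json.load(f))
    for line in findings:
        print("BLOCKING:", line)
    # A non-zero exit code fails the CI job, which blocks the deploy stage.
    return 1 if findings else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

The gate only works if the deploy stage depends on this job and the registry or admission policy rejects images that did not come through the pipeline.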
The Dockerfile optimization question
This comes up in some form at almost every interview. The setup is usually: “here’s a Dockerfile, what would you change?” Common problems they embed in it:
- COPY of everything before dependencies are installed (breaks the layer cache on every code change)
- No multi-stage build (dev dependencies in production image)
- Running as root
- Using the latest tag for the base image
- Missing .dockerignore, causing node_modules or .git to get copied
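The first item on that list is easiest to explain with a model of the build cache. This is a simplified Python sketch, not BuildKit’s actual algorithm. Each step’s cache key depends on the previous step’s key, the instruction text, and, for COPY, the contents of the copied files. Once one step misses, every later step rebuilds. The function names, file names, and contents are made up for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

Step = Tuple[str, List[str]]  # (instruction, build-context files it copies)


def cache_keys(steps: List[Step], context: Dict[str, str]) -> List[str]:
    keys, parent = [], ""
    for instruction, files in steps:
        # Key = hash(parent key + instruction + contents of any copied files).
        h = hashlib.sha256((parent + instruction).encode())
        for name in sorted(files):
            h.update(name.encode() + context[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps: List[Step], before: Dict[str, str],
                  after: Dict[str, str]) -> List[str]:
    """Instructions whose cache key changes between two build contexts."""
    old, new = cache_keys(steps, before), cache_keys(steps, after)
    return [instr for (instr, _), o, n in zip(steps, old, new) if o != n]


if __name__ == "__main__":
    v1 = {"package.json": "deps", "package-lock.json": "lock", "src/index.js": "v1"}
    v2 = {**v1, "src/index.js": "v2"}  # only application code changed
    everything = sorted(v1)
    naive = [("FROM node:20-alpine", []), ("COPY . .", everything),
             ("RUN npm ci", [])]
    cache_friendly = [("FROM node:20-alpine", []),
                      ("COPY package.json package-lock.json ./",
                       ["package.json", "package-lock.json"]),
                      ("RUN npm ci", []), ("COPY . .", everything)]
    print(rebuilt_steps(naive, v1, v2))           # ['COPY . .', 'RUN npm ci']
    print(rebuilt_steps(cache_friendly, v1, v2))  # ['COPY . .']
```

Copying only the dependency manifests first means a code-only change no longer reinstalls dependencies. A .dockerignore keeps node_modules and .git out of the context, so they can’t bust the cache either.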
The Stack Overflow Developer Survey 2024 found Docker is used by 59% of professional developers. You’re going to see it in interviews for a long time.
Kubernetes: trivia vs. operational questions
Trivia questions have right answers. Operational questions have answers that depend on context. Interviewers use both, but the operational questions are what actually matter for the hire decision.
Trivia examples:
- What’s the difference between a Deployment and a StatefulSet?
- What does a Service of type ClusterIP do?
- What’s the difference between a liveness probe and a readiness probe?
Operational examples:
- “A pod is in CrashLoopBackOff. Walk me through your process.”
- “You need to deploy a new version with zero downtime and the ability to roll back in under two minutes. How do you design that?”
- “Your HPA isn’t scaling up even though CPU is high. What do you check?”
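For the HPA question, it helps to know the calculation the controller runs. The Kubernetes docs give it as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Scaling is skipped when the ratio is within a tolerance of 1.0 (0.1 by default), and the result is clamped to minReplicas and maxReplicas. CPU utilization is measured against the pods’ CPU requests, so without requests the HPA has nothing to act on. The sketch below assumes every pod has the same request and uses a hypothetical function name. It ignores missing metrics, unready pods, and scaling-behavior policies.

```python
import math
from typing import List, Optional


def desired_replicas(cpu_usage_m: List[int], cpu_request_m: Optional[int],
                     target_util_pct: int, min_r: int, max_r: int,
                     tolerance: float = 0.1) -> int:
    """cpu_usage_m: per-pod CPU usage in millicores; cpu_request_m: per-pod request."""
    current = len(cpu_usage_m)
    if current == 0 or not cpu_request_m:
        # Utilization is a percentage of requests; no request means no metric,
        # and the HPA takes no action for CPU.
        return current
    avg_util = 100 * sum(cpu_usage_m) / (current * cpu_request_m)
    ratio = avg_util / target_util_pct
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: leave it alone
    desired = math.ceil(current * ratio)
    return max(min_r, min(max_r, desired))  # maxReplicas silently caps scale-up
```

So “CPU is high but nothing happens” usually comes down to a few checks. Are CPU requests set? Is the metrics pipeline (typically metrics-server) reporting? Is maxReplicas already reached? kubectl describe hpa shows the current metric value and the conditions that explain why it isn’t scaling.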
For the CrashLoopBackOff question: start with kubectl describe pod to see events and the container’s last state, then kubectl logs --previous to see what the last container printed before it died. Check resource limits (OOMKilled is common and shows up as the last termination reason in the describe output). Confirm the pod is running the image tag you expect; a tag that doesn’t exist in the registry shows up as ErrImagePull or ImagePullBackOff, not CrashLoopBackOff. Check init containers if any. That sequence, stated clearly and in order, is what the interviewer wants to hear.
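Part of that checklist can be read straight from the pod’s status. Below is a sketch over the JSON from kubectl get pod NAME -o json. It uses the real status fields state.waiting.reason, lastState.terminated.reason, and lastState.terminated.exitCode for both init and regular containers. The triage function and its hint wording are hypothetical, not an official mapping.

```python
from typing import Dict, List


def triage(pod: Dict) -> List[str]:
    """Suggest a next step per container from `kubectl get pod NAME -o json`."""
    hints = []
    status = pod.get("status", {})
    # Init containers run first, so check their statuses before the main ones.
    statuses = (status.get("initContainerStatuses") or []) + \
               (status.get("containerStatuses") or [])
    for cs in statuses:
        name = cs.get("name", "?")
        waiting = cs.get("state", {}).get("waiting", {}).get("reason")
        last = cs.get("lastState", {}).get("terminated", {})
        if waiting in ("ErrImagePull", "ImagePullBackOff"):
            hints.append(f"{name}: image pull failing; check the tag and registry access")
        elif last.get("reason") == "OOMKilled":
            hints.append(f"{name}: OOMKilled; compare memory usage to its limit")
        elif waiting == "CrashLoopBackOff":
            code = last.get("exitCode")
            hints.append(f"{name}: exited with code {code}; "
                         f"run kubectl logs --previous -c {name}")
    return hints
```

The script can’t replace reading the logs. What it does is encode the order of questions: did it even pull, did it get OOM-killed, and otherwise what did it print before exiting.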
RBAC: the question that separates mid-level from senior
Kubernetes RBAC confuses people who haven’t had to debug it. The model:
- A Role grants permissions within a single namespace. A ClusterRole grants permissions cluster-wide (or to non-namespaced resources like nodes).
- A RoleBinding attaches a Role or ClusterRole to a subject within a namespace. A ClusterRoleBinding attaches a ClusterRole cluster-wide.
- You can bind a ClusterRole with a RoleBinding to limit its scope to a specific namespace. This is useful and often overlooked.
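The scenario question is usually about a service account that can’t access something it should. You debug this with kubectl auth can-i, impersonating the service account (for example, kubectl auth can-i list pods --as=system:serviceaccount:<namespace>:<name> -n <namespace>), and by inspecting the RoleBindings in the relevant namespace.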
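The binding rules above, including the ClusterRole-via-RoleBinding trick, fit in a few lines. This is a toy model of the question kubectl auth can-i puts to the API server, not the real authorizer. The Binding class and can_i function are hypothetical simplifications. Each binding carries its role’s rules directly, rules are reduced to (resources, verbs) pairs, and API groups, resourceNames, and cluster-scoped resources are left out. RBAC is purely additive: a request is allowed if any binding grants it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Rule = Tuple[Set[str], Set[str]]  # (resources, verbs); "*" acts as a wildcard


@dataclass
class Binding:
    subject: str              # e.g. "system:serviceaccount:team-a:deployer"
    role_rules: List[Rule]    # rules from the referenced Role or ClusterRole
    namespace: Optional[str]  # RoleBinding's namespace; None = ClusterRoleBinding


def can_i(bindings: List[Binding], subject: str, verb: str,
          resource: str, namespace: str) -> bool:
    for b in bindings:
        if b.subject != subject:
            continue
        # A RoleBinding grants only inside its own namespace, even when it
        # references a ClusterRole; a ClusterRoleBinding applies everywhere.
        if b.namespace is not None and b.namespace != namespace:
            continue
        for resources, verbs in b.role_rules:
            if (resource in resources or "*" in resources) and \
               (verb in verbs or "*" in verbs):
                return True
    return False  # no binding grants it: denied (RBAC has no deny rules)
```

When the real can-i says no, the fix is almost always a missing or wrongly scoped binding, since RBAC has no deny rules that could override a grant.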
Networking in Kubernetes
This is the topic I’d bet most candidates are weakest on, myself included when I first started interviewing for these roles. Kubernetes networking has a lot of moving parts and several of them are CNI-plugin-specific.
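The basics you need: every pod gets its own IP. Pods can talk to each other directly (no NAT) by default. Services are stable virtual IPs, implemented on each node by kube-proxy (or by the CNI plugin in kube-proxy-free setups such as Cilium). DNS resolution in-cluster follows the pattern service.namespace.svc.cluster.local, where cluster.local is the default cluster domain.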
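Short names work because of the pod’s resolver search list. With the default dnsPolicy (ClusterFirst), a pod’s /etc/resolv.conf lists <namespace>.svc.cluster.local, svc.cluster.local, and cluster.local as search domains, with options ndots:5. The sketch below shows which names get tried, in order. The candidates function is hypothetical, and it ignores any search domains inherited from the node.

```python
from typing import List


def candidates(name: str, pod_namespace: str,
               cluster_domain: str = "cluster.local", ndots: int = 5) -> List[str]:
    """Names the resolver tries, in order, for a lookup made from a pod."""
    if name.endswith("."):
        return [name]  # trailing dot: already fully qualified, no search list
    search = [f"{pod_namespace}.svc.{cluster_domain}",
              f"svc.{cluster_domain}",
              cluster_domain]
    expanded = [f"{name}.{s}" for s in search]
    # Fewer dots than ndots: try the search list first, the bare name last.
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]
```

This is why "api" resolves to the Service in the caller’s own namespace and "api.payments" reaches the one in payments. It is also why external names like example.com trigger several failed in-cluster lookups before the real query under the default ndots:5.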
NetworkPolicy is the access control layer. Without any policies, all pod-to-pod traffic is allowed. Once an ingress NetworkPolicy selects a pod, that pod only accepts traffic from the pods, namespaces, or IP blocks that at least one of the selecting policies allows; policies are additive, so they can open access but never subtract it. Policies are only enforced if your CNI plugin supports them. The BLS notes that security roles in infrastructure are growing faster than most other tech categories, which means more interviews are including security-in-depth questions like NetworkPolicy design.
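The default-allow, isolate-on-select behavior is the part interviewers probe, so here it is as a toy model. The sketch assumes all pods live in one namespace and that the CNI enforces policies. It simplifies selectors to exact label matches and leaves out namespaceSelector, ipBlock, ports, and egress. IngressPolicy and ingress_allowed are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    # An empty selector ({}) matches every pod, as in Kubernetes.
    return all(labels.get(k) == v for k, v in selector.items())


@dataclass
class IngressPolicy:
    pod_selector: Labels                                    # pods this policy isolates
    allow_from: List[Labels] = field(default_factory=list)  # allowed source selectors


def ingress_allowed(policies: List[IngressPolicy], src: Labels, dst: Labels) -> bool:
    selecting = [p for p in policies if matches(p.pod_selector, dst)]
    if not selecting:
        return True  # no policy selects the pod: it is not isolated
    # Additive: allowed if any selecting policy has a rule matching the source.
    return any(matches(sel, src) for p in selecting for sel in p.allow_from)
```

A policy with an empty pod selector and no allow rules is the common “default deny” pattern. You then layer narrow allow policies on top, as in this usage: ingress_allowed([IngressPolicy({}), IngressPolicy({"app": "db"}, [{"app": "api"}])], {"app": "api"}, {"app": "db"}) returns True, while the same call with a source labeled app: web returns False.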
Deployment strategies: the scenario most candidates fumble
Rolling update, blue-green, canary. Know the mechanics of each and the failure mode of each.
Rolling update: Kubernetes does this by default. It replaces pods incrementally. The gotcha is that during the rollout, both old and new versions are serving traffic. If the new version introduces a breaking API change, clients can hit either version mid-rollout, so rolling update is the wrong strategy.
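How much overlap you get is governed by maxSurge and maxUnavailable. Per the Deployment docs, both default to 25%, and percentages are converted to pod counts with maxSurge rounded up and maxUnavailable rounded down. The sketch below computes the bounds a rollout must stay within. The function names are hypothetical.

```python
import math
from typing import Tuple, Union

Value = Union[int, str]  # absolute pod count, or a percentage such as "25%"


def to_count(value: Value, replicas: int, round_up: bool) -> int:
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas: int, max_surge: Value = "25%",
                   max_unavailable: Value = "25%") -> Tuple[int, int]:
    """(most pods that may exist, fewest pods that must stay available)."""
    surge = to_count(max_surge, replicas, round_up=True)
    unavailable = to_count(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    print(rollout_bounds(10))  # (13, 8)
```

With 10 replicas and the defaults, up to 13 pods can be running, a mix of old and new, while at least 8 stay available. Setting maxUnavailable to 0 keeps full capacity but makes the rollout depend on spare room for surge pods.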
Blue-green: you run two full environments and flip traffic. Zero overlap period, so breaking changes are safer. Cost is doubled infrastructure for the duration.
Canary: you shift a small percentage of traffic to the new version first. Requires an ingress controller or service mesh that supports traffic splitting (NGINX Ingress with canary annotations, or Istio), often driven by a progressive delivery controller such as Flagger. More complex to set up, but the right choice for high-risk changes at scale.
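The control loop behind an automated canary is simple to state, even if wiring it up is not. The sketch below is loosely modeled on the kind of analysis Flagger runs. At each step it shifts traffic by a fixed increment up to a maximum and checks a health metric. Failed checks hold the current weight, and enough failures trigger a rollback. run_canary, its parameters, and the error_rate callback are hypothetical stand-ins, not Flagger’s API.

```python
from typing import Callable, List, Tuple


def run_canary(error_rate: Callable[[int], float], step_weight: int = 10,
               max_weight: int = 50, max_error: float = 0.01,
               failure_threshold: int = 2) -> Tuple[str, List[int]]:
    """Return the outcome and the canary traffic weights that were applied."""
    if step_weight <= 0:
        raise ValueError("step_weight must be positive")
    weights: List[int] = []
    failures = 0
    weight = 0  # last weight that passed its health check
    while True:
        next_weight = min(weight + step_weight, max_weight)
        weights.append(next_weight)  # real setup: update ingress/mesh weights here
        if error_rate(next_weight) > max_error:
            failures += 1
            if failures >= failure_threshold:
                return "rolled back", weights  # send 100% back to stable
            continue  # hold at this weight and check again
        weight = next_weight
        if weight >= max_weight:
            return "promoted", weights  # canary becomes the new stable
```

A healthy run steps through 10, 20, 30, 40, 50 and promotes. A release that starts failing at 30% is held there and rolled back after the second failed check. Most of what you debug in practice is the metric query behind error_rate, not the loop.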
If you’ve never implemented a canary deployment end-to-end, that’s the one to actually build in a lab environment before interviewing. Talking through theory is noticeably different from explaining it as someone who debugged a Flagger rollout at 2am.