DevOps Interview Help: Top 50 Questions with AI-Powered Prep Strategy
The 50 DevOps interview questions that actually get asked — covering CI/CD, Kubernetes, Docker, Terraform, and cloud platforms — plus how AI tools help you prepare.
What DevOps Interviews Actually Test
DevOps interviews in 2026 go far beyond "do you know Docker?" Companies want engineers who understand the entire delivery pipeline — from code commit to production monitoring. They test your ability to design CI/CD workflows, troubleshoot infrastructure issues, manage containerized environments, and automate everything in between.
The breadth is the challenge. You might get asked about Kubernetes networking, Terraform state management, GitHub Actions workflows, AWS IAM policies, and incident response — all in the same interview. No one memorizes all of this. That is where AI interview assistants like Craqly become valuable — they provide real-time reminders of the specific details you need.
CI/CD Pipeline Questions (1-10)
1. Describe your ideal CI/CD pipeline from commit to production.
What they want: A complete picture — source control hooks, build steps, testing stages, artifact management, deployment strategies, and rollback mechanisms. Mention specific tools you have used.
2. How do you implement blue-green deployments?
Key points: Two identical environments, router/load balancer switching, database migration considerations, instant rollback capability by switching back.
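The switching logic behind blue-green can be sketched in a few lines of Python. This is only a toy model, not a real load balancer integration: the class name BlueGreenRouter and its methods are hypothetical, and the health check result is passed in as a plain boolean.

```python
from dataclasses import dataclass


@dataclass
class BlueGreenRouter:
    """Toy stand-in for a load balancer that points at one of two environments."""
    live: str = "blue"

    @property
    def idle(self) -> str:
        # The environment that is not currently receiving traffic.
        return "green" if self.live == "blue" else "blue"

    def cut_over(self, idle_is_healthy: bool) -> str:
        """Switch traffic to the idle environment only if it passed health checks."""
        if idle_is_healthy:
            self.live = self.idle
        return self.live

    def rollback(self) -> str:
        """Rollback is the same switch in reverse: point back at the other environment."""
        self.live = self.idle
        return self.live
```

In an interview, the point to draw out is that rollback costs the same as the cut-over itself, which is why database changes must stay compatible with both environments.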
3. What is the difference between canary and rolling deployments?
Key points: Canary routes a small percentage of traffic first; rolling updates instances gradually. Canary is better for catching issues early; rolling is simpler to implement.
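To make the contrast concrete, here is a minimal Python sketch of both ideas. It assumes a stable request key such as a user ID and uses hash-based bucketing, which is one common way to split traffic but not the only one. The function names are hypothetical.

```python
import hashlib


def canary_bucket(request_key: str) -> int:
    """Map a request key (for example a user ID) to a stable bucket in 0..99."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(request_key: str, canary_percent: int) -> str:
    """Canary: send roughly canary_percent of keys to the new version."""
    return "canary" if canary_bucket(request_key) < canary_percent else "stable"


def rolling_batches(instances: list[str], batch_size: int) -> list[list[str]]:
    """Rolling: split instances into the batches that get replaced one after another."""
    return [instances[i:i + batch_size] for i in range(0, len(instances), batch_size)]
```

Because the bucket depends only on the key, a user who lands on the canary at 10% stays on it as the percentage is raised, which keeps their experience consistent during the rollout.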
4. How do you handle database migrations in a CI/CD pipeline?
Key points: Backward-compatible migrations, separate migration step before deployment, rollback scripts, feature flags for schema changes.
5. What is GitOps and how does it differ from traditional CI/CD?
Key points: Git as single source of truth for infrastructure and deployment, pull-based reconciliation (ArgoCD, Flux), declarative over imperative.
6. How do you manage secrets in your CI/CD pipeline?
Key points: Never in code, vault solutions (HashiCorp Vault, AWS Secrets Manager), environment variable injection at runtime, secret rotation.
7. Explain how you would set up a multi-stage Docker build.
Key points: Build stage with dependencies, production stage with only runtime requirements, reducing image size, security benefits of minimal images.
8. How do you handle failed deployments?
Key points: Automated rollback triggers, health check failures, circuit breakers, communication protocols for incidents.
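An automated rollback trigger is often just a threshold on consecutive health check failures. The Python sketch below illustrates that rule in isolation. The function name and the default threshold of 3 are assumptions chosen for illustration, not values from any particular tool.

```python
def should_roll_back(health_checks: list[bool], failure_threshold: int = 3) -> bool:
    """Return True once failure_threshold consecutive health checks have failed."""
    consecutive_failures = 0
    for healthy in health_checks:
        # A passing check resets the streak; a failing check extends it.
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return True
    return False
```

Requiring consecutive failures rather than a single one avoids rolling back on a transient blip, at the cost of a slightly slower reaction.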
9. What metrics do you track for your CI/CD pipeline?
Key points: Deployment frequency, lead time for changes, change failure rate, and MTTR (the four DORA metrics), plus pipeline-level measures such as build time.
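If asked how you would actually compute these numbers, a rough Python sketch like the following can help. It assumes you already have one record per production deployment. The Deployment fields and the dora_metrics function are hypothetical names, and the list is assumed to be non-empty.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a non-empty list of deployments."""
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times) if restore_times else None
        ),
    }
```

The median is used for lead time here so that one unusually slow change does not dominate the figure; that is a design choice, not a requirement of the metric.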
10. How do you implement infrastructure testing?
Key points: Terratest for Terraform, InSpec for compliance, container scanning, integration tests against ephemeral environments.
Kubernetes and Container Orchestration (11-20)
11. Explain Kubernetes networking — how does a request reach a pod?
Key points: Service discovery, kube-proxy, iptables/IPVS rules, ClusterIP vs NodePort vs LoadBalancer, Ingress controllers.
12. How do you handle persistent storage in Kubernetes?
Key points: PersistentVolumes, PersistentVolumeClaims, StorageClasses, dynamic provisioning, StatefulSets for stateful workloads.
13. What is the difference between a Deployment and a StatefulSet?
Key points: Deployments for stateless apps (interchangeable pods), StatefulSets for stateful (stable network identity, ordered deployment, persistent storage).
14. How do you implement horizontal pod autoscaling?
Key points: HPA resource, metrics server, CPU/memory targets, custom metrics with Prometheus adapter, stabilization windows (cooldown) to prevent flapping.
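The core scaling rule documented for the HPA is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The Python sketch below reproduces that arithmetic with a tolerance band and min/max clamping. It is a simplification: the real controller also accounts for pod readiness, missing metrics, and scaling behavior policies, and the function name and default bounds here are assumptions.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA rule: scale in proportion to how far the metric is from target."""
    ratio = current_metric / target_metric
    # Within the tolerance band, the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target gives a ratio of 1.5, so the sketch asks for 6 replicas.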
15. Explain Kubernetes RBAC.
Key points: Roles and ClusterRoles define permissions, RoleBindings and ClusterRoleBindings assign them to users/service accounts, principle of least privilege.
16. How do you troubleshoot a pod stuck in CrashLoopBackOff?
Key points: kubectl describe pod, kubectl logs (including --previous), check resource limits, readiness/liveness probe configuration, init containers.
17. What are Kubernetes Operators and when would you use one?
Key points: Custom controllers that extend Kubernetes API, encode operational knowledge, useful for complex stateful applications (databases, message queues).
18. How do you manage Kubernetes configuration across environments?
Key points: Kustomize overlays, Helm values files, environment-specific ConfigMaps/Secrets, GitOps with directory structures per environment.
19. Explain the Kubernetes scheduler — how does it decide where to place a pod?
Key points: Filtering (node selectors, taints/tolerations, resource requests), scoring (affinity/anti-affinity, resource utilization), custom schedulers.
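The filter-then-score flow can be illustrated with a small Python model. This is a deliberately reduced sketch, not the real scheduler: taints and tolerations are modeled as plain sets of strings, the score is simply "most free CPU after placement", and the Node, Pod, and schedule names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_millis: int
    free_memory_mib: int
    labels: dict = field(default_factory=dict)
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    cpu_request_millis: int
    memory_request_mib: int
    node_selector: dict = field(default_factory=dict)
    tolerations: set = field(default_factory=set)


def is_feasible(pod: Pod, node: Node) -> bool:
    """Filtering: the node must match the selector, tolerate taints, and have room."""
    selector_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    taints_ok = node.taints <= pod.tolerations  # every taint must be tolerated
    fits = (node.free_cpu_millis >= pod.cpu_request_millis
            and node.free_memory_mib >= pod.memory_request_mib)
    return selector_ok and taints_ok and fits


def score(pod: Pod, node: Node) -> int:
    """Scoring (simplified): prefer the node with the most CPU left after placement."""
    return node.free_cpu_millis - pod.cpu_request_millis


def schedule(pod: Pod, nodes: list[Node]):
    """Return the chosen node name, or None if the pod would stay Pending."""
    feasible = [n for n in nodes if is_feasible(pod, n)]
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(pod, n)).name
```

The useful interview point is the ordering: a node that fails filtering is never scored, so no amount of free capacity makes up for an untolerated taint or a missing label.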
20. How do you implement zero-downtime deployments in Kubernetes?
Key points: Rolling update strategy, readiness probes, PodDisruptionBudgets, preStop hooks for graceful shutdown, connection draining.
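For the rolling update strategy, it helps to be able to work out the pod counts by hand. The Python sketch below does the arithmetic for percentage-based maxSurge and maxUnavailable, using the documented rounding (maxSurge rounds up, maxUnavailable rounds down) and the default of 25% for both. The function name and return keys are hypothetical.

```python
def rolling_update_bounds(replicas: int, max_surge_pct: int = 25,
                          max_unavailable_pct: int = 25) -> dict:
    """Pod-count limits during a rolling update with percentage-based settings."""
    surge = -(-replicas * max_surge_pct // 100)          # ceiling division
    unavailable = replicas * max_unavailable_pct // 100  # floor division
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }
```

With 10 replicas and the defaults, the rollout may run up to 13 pods in total while keeping at least 8 available, so capacity planning needs headroom for the surge.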
Infrastructure as Code (21-25)
21. Terraform state — what is it and how do you manage it?
Key points: State maps your configuration to real resources, remote backends with state locking (for example S3 with DynamoDB), workspace separation, import for existing resources.
22. How do you handle Terraform state drift?
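Key points: terraform plan to detect drift, refresh-only runs (terraform plan -refresh-only or terraform apply -refresh-only, which supersede the deprecated terraform refresh), automated drift detection in CI, policies to prevent manual changes.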
23. Terraform modules — best practices?
Key points: DRY infrastructure code, versioned modules, input variables with validation, outputs for cross-module references, module registry.
24. Compare Terraform and Pulumi.
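Key points: Terraform uses HCL, a declarative configuration language; Pulumi lets you describe the desired state in general-purpose languages such as TypeScript, Python, or Go. Pulumi offers more flexibility (loops, functions, familiar test tooling); Terraform has the larger ecosystem.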
25. How do you test infrastructure code?
Key points: terraform validate, tflint, Terratest for integration tests, plan output review in PR, policy-as-code (OPA/Sentinel).
Cloud Platform Questions (26-30)
26. Design a highly available architecture on AWS.
Key points: Multi-AZ deployment, auto scaling groups, RDS Multi-AZ, CloudFront CDN, Route53 health checks, cross-region replication for DR.
27. Explain AWS IAM best practices.
Key points: Least privilege, role-based access, no long-lived credentials, MFA enforcement, SCPs for organizational boundaries, regular access reviews.
28. How do you optimize cloud costs?
Key points: Right-sizing instances, Reserved Instances/Savings Plans, spot instances for fault-tolerant workloads, S3 lifecycle policies, unused resource cleanup.
29. Explain the shared responsibility model.
Key points: Cloud provider manages infrastructure security; customer manages data, identity, application, and OS-level security. Varies by service type (IaaS vs PaaS vs SaaS).
30. How do you implement multi-cloud strategy?
Key points: Terraform for cloud-agnostic IaC, Kubernetes for portable workloads, abstraction layers for cloud-specific services, cost/complexity trade-offs.
Monitoring, Logging, and Incident Response (31-35)
31. Describe your monitoring stack.
Key points: Metrics (Prometheus/Datadog), logs (ELK/Loki), traces (Jaeger/OpenTelemetry), alerting (PagerDuty/Opsgenie), dashboards (Grafana).
32. What is observability and how is it different from monitoring?
Key points: Monitoring checks known failure modes; observability lets you ask arbitrary questions about system behavior. Three pillars: metrics, logs, traces.
33. How do you set up meaningful alerts?
Key points: Alert on symptoms not causes, SLO-based alerting, avoid alert fatigue, runbooks for every alert, escalation policies.
34. Walk me through an incident response process.
Key points: Detection, triage, communication, mitigation, root cause analysis, postmortem, action items. Roles: incident commander, communications lead.
35. How do you implement SLOs/SLAs?
Key points: Define SLIs (latency, error rate, throughput), set SLOs as targets, error budgets for balancing reliability with velocity.
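Error budgets are easier to discuss with numbers in hand. The Python sketch below shows the basic arithmetic for a request-based SLO and for allowed downtime in a time window. The function names are hypothetical, the 30-day window is an assumption, and an SLO target of exactly 1.0 is not handled because it leaves no budget to divide by.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Request-based error budget: how many failures the SLO allows and how much is used."""
    budget = (1.0 - slo_target) * total_requests
    return {
        "sli": 1.0 - failed_requests / total_requests,  # measured success ratio
        "budget_requests": budget,                      # failures the SLO allows
        "budget_consumed": failed_requests / budget,    # fraction of budget spent
    }


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Time-based view: minutes of full outage the SLO allows in the window."""
    return (1.0 - slo_target) * window_days * 24 * 60
```

A 99.9% target over 30 days works out to about 43.2 minutes of allowed downtime, and 250 failed requests out of a million uses roughly a quarter of the matching request budget. When the budget is nearly spent, the team slows feature releases in favor of reliability work.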
Security and Networking (36-40)
36-40. Common topics:
Network policies in Kubernetes, service mesh (Istio/Linkerd), mTLS, container image scanning, supply chain security (Sigstore/Cosign).
Behavioral and Culture (41-50)
41-45. Common behavioral questions:
"Tell me about a production incident you handled." "How do you balance speed and reliability?" "Describe a time you automated a painful manual process."
46-50. Culture fit questions:
"How do you handle on-call?" "What is your approach to documentation?" "How do you mentor junior engineers?" "How do you push back on unrealistic timelines?"
Using AI to Prepare for DevOps Interviews
DevOps interviews cover an enormous surface area. Nobody memorizes every Kubernetes flag, every Terraform resource type, and every AWS service. In real work, you look things up. In interviews, you are expected to recall everything from memory.
Craqly's AI interview assistant bridges this gap by providing real-time suggestions during your interview. When the interviewer asks about Kubernetes networking, the AI reminds you of the key components (kube-proxy, iptables, CNI plugins). When they ask about Terraform state management, it suggests the right patterns (remote backend, state locking, workspaces).
Combined with genuine preparation and hands-on experience, AI assistance lets you demonstrate your real knowledge without the anxiety of forgetting specific details under pressure.