Orchestration

Kubernetes

A practical overview of the core Kubernetes objects that run applications on a cluster.

Kubernetes is a control system for running containerized applications at scale. You describe the desired state (which workloads should run, how many replicas, where traffic should go), and Kubernetes continuously reconciles the actual state of the cluster to match it. This model was created to solve the operational problems that arise when many teams run many services across many machines: manual restarts, fragile deployments, inconsistent configuration, and unpredictable recovery from failure.

Learning objectives

What you should be able to do after reading.
  • Explain the role of the main workload and networking objects in a Kubernetes cluster.
  • Describe how Pods, Deployments, and Services work together to keep an application running.
  • Recognize where configuration and routing usually live in the cluster.

At a glance

Fast mental model before you dive in.
Core workload objects
  • Pods
  • Deployments
  • Namespaces
Access and traffic
  • Services
  • Ingress
  • Network paths
Configuration
  • ConfigMaps
  • Secrets
  • Declarative updates

Core idea

Kubernetes is built around a reconciliation loop. Every component in the system watches for a gap between the desired state (what the API server has recorded) and the actual state (what is running on the nodes). When a gap appears (a Pod crashes, a new Deployment is created, a node goes offline), the relevant controller creates, deletes, or adjusts workloads until the actual state matches the desired state again.

The control plane stores all desired state in etcd, a distributed key-value store. The API server is the single entry point for all reads and writes to that state. Controllers run in a loop reading from the API server and issuing instructions to worker nodes through the kubelet. This separation of concerns (declaring intent in the API, executing intent on the nodes) is what makes Kubernetes both powerful and complex to reason about.

A Deployment is a good example of how this works in practice. You declare 'I want three replicas of this container image running at all times.' The Deployment controller creates a ReplicaSet, which in turn creates three Pods. If one Pod crashes, the ReplicaSet notices there are only two actual replicas and creates a new one. If you update the image version, the Deployment controller performs a rolling replacement, creating new Pods with the new image and deleting old ones in a controlled sequence.
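As a sketch, a minimal Deployment manifest for that scenario could look like the following; the name myapp, the image, and the port are placeholders, not part of any real system:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: myapp
  spec:
    replicas: 3                  # desired state: three Pods at all times
    selector:
      matchLabels:
        app: myapp
    template:                    # Pod template the ReplicaSet stamps out
      metadata:
        labels:
          app: myapp
      spec:
        containers:
          - name: myapp
            image: registry.example.com/myapp:1.4.2   # hypothetical image
            ports:
              - containerPort: 8080

Changing replicas or the image tag and re-applying the manifest is the entire update interface; the controllers do the rest.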

This model makes environments more reproducible and recovery more automatic, but it also means that the cluster depends on clear, disciplined object definitions. Workloads that drift from their declared state, that use manually patched configurations, or that depend on cluster state outside version control are harder to operate and harder to debug when something goes wrong.

Operational model

  • Pods are the smallest runnable unit; they group containers that must run together and share a network namespace and storage volumes.
  • Deployments manage the desired count and rollout behavior of Pod replicas, handling both scaling and updates.
  • Services provide a stable network identity for a set of Pods, abstracting away the fact that individual Pods are created and destroyed continuously.
  • ConfigMaps and Secrets separate environment-specific and sensitive configuration from the workload image itself.

Baseline

  • Use namespaces to organize workloads by team, environment, or function and to scope access control and resource quotas.
  • Keep all workload definitions in version control and deploy through a controlled process rather than applying kubectl commands directly in production.
  • Use liveness and readiness probes on every workload so the cluster can detect and replace unhealthy containers automatically.
  • Treat the Kubernetes API as a privileged surface. Restrict who and what can talk to it, and review changes to RBAC carefully.

Signals to watch for

Patterns worth investigating further.
  • Workloads that depend on manually created resources outside version control.
  • Services or Ingress rules that expose more traffic than the application needs.
  • Pods that change behavior across environments because configuration is inconsistent.

DEEP DIVE

Pods

A Pod is the atomic unit of scheduling in Kubernetes. It can hold one or more containers, but those containers are co-located on the same node, share the same network namespace (they communicate on localhost and share a single IP address), and can share storage volumes. The most common case is one container per Pod; multiple containers in a Pod are used when the containers are tightly coupled and must always run together, such as a main application and a sidecar that handles logging or traffic proxying.
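A minimal sketch of that sidecar pattern, assuming a hypothetical application image and a hypothetical log-shipping image that share a scratch volume:

  apiVersion: v1
  kind: Pod
  metadata:
    name: app-with-sidecar
  spec:
    volumes:
      - name: logs               # shared volume, discarded with the Pod
        emptyDir: {}
    containers:
      - name: app
        image: registry.example.com/myapp:1.4.2      # hypothetical image
        volumeMounts:
          - name: logs
            mountPath: /var/log/myapp
      - name: log-shipper        # sidecar reads what the app writes
        image: registry.example.com/log-shipper:2.0  # hypothetical image
        volumeMounts:
          - name: logs
            mountPath: /var/log/myapp
            readOnly: true

Because both containers share the Pod's network namespace, the sidecar could equally reach the app over localhost.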

The kubelet on each node is responsible for starting and monitoring the containers in Pods scheduled to that node. It communicates with the container runtime (such as containerd or CRI-O) to create containers and monitors their health via the probes defined in the Pod spec. If a container fails its liveness probe repeatedly, the kubelet restarts it. If it fails its readiness probe, the endpoint controller removes the Pod's IP from the Service endpoints, stopping new traffic from being sent to it.

Pods are designed to be ephemeral and replaceable, not long-lived and hand-managed. A Pod that is evicted (for example under node resource pressure) or simply deleted is gone permanently. Its writable container filesystem is discarded. Any persistent state must be stored in a volume backed by external storage. This design requires applications to be stateless or to externalize state explicitly, which is a major architectural consideration when containerizing legacy applications.

A common mistake is to try to SSH into a Pod and make changes directly to fix a problem. This creates drift. The running state no longer matches the declared state, and the fix will disappear when the Pod is replaced. The right approach is to update the workload definition, redeploy, and let the reconciliation loop apply the fix to all replicas consistently.

Deployments

A Deployment is the standard way to run stateless application workloads in Kubernetes. It declares the desired number of Pod replicas, the Pod template (which image, environment variables, resource limits, probes), and the update strategy. The Deployment controller continuously watches actual replica counts and creates or deletes Pods to match the desired count, making the application self-healing without operator intervention.

Kubernetes supports two rollout strategies in Deployments. RollingUpdate (the default) replaces Pods incrementally. It creates a new Pod with the new image, waits for it to become ready, then terminates an old Pod, and repeats until all replicas are updated. This maintains availability throughout the update. Recreate terminates all existing Pods before creating new ones. It causes a brief outage but is simpler for applications that cannot run two versions simultaneously.
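The strategy is declared on the Deployment spec. A sketch of the relevant fragment, with illustrative (not recommended) values:

  spec:
    strategy:
      type: RollingUpdate        # the default; Recreate is the alternative
      rollingUpdate:
        maxSurge: 1              # at most one extra Pod above the desired count
        maxUnavailable: 0        # never drop below the desired count mid-rollout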

Readiness probes are critical to rolling updates. The rollout controller waits for a new Pod to pass its readiness probe before proceeding to terminate an old Pod. If the probe never passes (because the new image has a startup bug, for example), the rollout stalls rather than replacing healthy Pods with unhealthy ones. This automatic stall behavior is a safety net, but only if the probe is configured and tuned correctly.

Deployments maintain a history of previous ReplicaSets, which enables rollback. Running 'kubectl rollout undo deployment/myapp' tells the Deployment controller to restore the previous ReplicaSet's Pod template. The rollback is itself a rolling update. It follows the same controlled replacement process as a normal update. This makes rollback predictable and low-risk, as long as the previous image and configuration are still valid.

Services

A Service provides a stable network identity for a dynamic set of Pods. Pods are created and destroyed continuously. Their IP addresses change every time they are replaced. A Service solves the discovery problem. It maintains a consistent cluster-internal IP and DNS name that clients can connect to, while the Endpoints object underneath is updated automatically as Pods come and go.

Kubernetes supports several Service types. ClusterIP (the default) makes the Service reachable only from within the cluster, which is useful for internal APIs and databases. NodePort exposes the Service on a high port on every node in the cluster, which works for simple external access but is rarely used in production. LoadBalancer provisions an external load balancer from the cloud provider, giving the Service a public IP address. ExternalName maps the Service to an external DNS name without any proxying.
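A sketch of a ClusterIP Service fronting the hypothetical myapp Pods from the earlier Deployment example:

  apiVersion: v1
  kind: Service
  metadata:
    name: myapp
  spec:
    type: ClusterIP              # the default; reachable only inside the cluster
    selector:
      app: myapp                 # matches Pod labels, never Pod names or IPs
    ports:
      - port: 80                 # port clients connect to on the Service
        targetPort: 8080         # port the container actually listens on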

Traffic routing in Services is implemented by kube-proxy, which runs on every node and maintains a set of iptables or IPVS rules. When a request is made to the Service IP, kube-proxy routes it to one of the currently healthy Pods behind the Service using simple load balancing. This is not a full-featured load balancer: it has no understanding of HTTP, no session affinity by default, and no health-based routing. But it is sufficient for the majority of internal service communication.

A key concept that confuses newcomers: you do not need to know the IP addresses of Pods to connect services together. The Service's DNS name (servicename.namespace.svc.cluster.local, or just servicename within the same namespace) resolves to the Service's stable ClusterIP. The application connects to the name, and Kubernetes handles the rest. This is why hardcoding Pod IPs in configuration files is always wrong.

Ingress

An Ingress is a set of routing rules that directs external HTTP and HTTPS traffic into Services inside the cluster. While a Service of type LoadBalancer gives one Service its own external IP, Ingress allows many Services to share a single external entry point by routing based on hostname, URL path, or other HTTP attributes. This is the standard way to expose web applications publicly in Kubernetes.

An Ingress object by itself does nothing. It requires an Ingress controller to interpret and implement the routing rules. Common Ingress controllers include nginx-ingress, Traefik, HAProxy Ingress, and cloud-native options like the AWS Load Balancer Controller or Google Cloud's GKE Ingress controller. The choice of controller affects available features, performance characteristics, and how advanced routing rules are expressed.

TLS termination is a common Ingress use case. The Ingress controller accepts HTTPS connections, terminates TLS using a certificate stored in a Kubernetes Secret, and forwards plain HTTP to the backend Service. Certificate management is typically handled by cert-manager, which automates the issuance and renewal of TLS certificates from Let's Encrypt or internal CAs.
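A sketch of an Ingress that routes a hypothetical hostname to the Service above and terminates TLS with a certificate Secret; the hostname, Secret name, and ingress class are assumptions about the cluster:

  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: myapp
  spec:
    ingressClassName: nginx      # must name an installed Ingress controller
    tls:
      - hosts:
          - myapp.example.com
        secretName: myapp-tls    # Secret holding the certificate and key
    rules:
      - host: myapp.example.com
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: myapp
                  port:
                    number: 80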

A common point of confusion: Ingress operates at Layer 7 (HTTP), while Services operate at Layer 4 (TCP/UDP). Ingress can route by hostname and URL path; Services can only route by destination port. For non-HTTP protocols that need external access, a LoadBalancer Service is usually the right choice, not an Ingress.

Namespaces

Namespaces divide a Kubernetes cluster into logical segments. Most Kubernetes objects (Pods, Services, Deployments, ConfigMaps, Secrets) are namespace-scoped, meaning that names only need to be unique within a namespace and that RBAC, ResourceQuotas, and LimitRanges can be applied per namespace. This makes namespaces the primary organizational unit for multi-team and multi-environment clusters.

Common namespace patterns include one namespace per team (all of team A's workloads in namespace team-a), one namespace per environment (dev, staging, production in separate namespaces), or a combination. The right pattern depends on the cluster's purpose: a shared development cluster benefits from team-scoped namespaces, while a dedicated production cluster might use application-scoped namespaces.

Namespaces are an organizational and access control boundary, but they are not a security isolation boundary by themselves. Without NetworkPolicies, Pods in different namespaces can still communicate with each other freely. Without RBAC enforcement, a user with cluster-level permissions can read Secrets from any namespace. Namespaces reduce accidental collision but do not provide the same isolation as separate clusters for workloads with strong security separation requirements.

A common mistake is treating the default namespace as a valid operational choice. The default namespace has no special protections and typically has the most relaxed defaults. Production workloads should always run in named, explicitly configured namespaces with appropriate resource quotas, network policies, and RBAC.
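A sketch of an explicitly configured namespace with a resource quota; the name and the limits are illustrative:

  apiVersion: v1
  kind: Namespace
  metadata:
    name: team-a-prod
  ---
  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: team-a-prod-quota
    namespace: team-a-prod
  spec:
    hard:
      requests.cpu: "8"          # total CPU requested across the namespace
      requests.memory: 16Gi
      pods: "50"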

ConfigMaps

A ConfigMap stores non-sensitive configuration data as key-value pairs, decoupling environment-specific settings from the container image. This allows the same image to run in development, staging, and production with different database hostnames, feature flag values, or logging levels, without rebuilding the image for each environment.

ConfigMaps can be consumed by Pods in three ways: as environment variables (the value of a key is injected as an env var in the container), as command-line arguments (referenced in the container spec), or as mounted files (the ConfigMap data is presented as a directory of files inside the container). Mounted files are the most flexible and are preferred when configuration is complex enough to warrant a config file rather than individual env vars.
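A sketch of a hypothetical ConfigMap holding both styles of value:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: myapp-config
  data:
    LOG_LEVEL: info              # simple value, suited to an env var
    app.properties: |            # structured value, suited to a mounted file
      db.host=db.internal.example.com
      feature.new_checkout=false

And the corresponding fragment of a Pod spec that consumes it both ways:

  containers:
    - name: myapp
      image: registry.example.com/myapp:1.4.2   # hypothetical image
      env:
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: myapp-config
              key: LOG_LEVEL
      volumeMounts:
        - name: config
          mountPath: /etc/myapp  # app.properties appears as a file here
  volumes:
    - name: config
      configMap:
        name: myapp-config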

An important operational detail: environment variable injection from a ConfigMap happens at Pod creation time. If the ConfigMap is updated after the Pod starts, the environment variables in the running container do not change. To pick up the new values, the Pod must be restarted. Mounted volumes, on the other hand, are eventually updated in place: Kubernetes syncs the volume contents within about a minute of a ConfigMap change, without restarting the Pod.

ConfigMaps should not be used for sensitive values. They are stored in plaintext in etcd and are readable by any subject with 'get' or 'list' permission on ConfigMaps in the namespace. Sensitive values (passwords, API keys, certificates, connection strings) belong in Secrets, which have a separate access control path and, when properly configured, are encrypted at rest in etcd.

Secrets

Kubernetes Secrets store sensitive values such as passwords, tokens, TLS certificates, and API keys. The object type is separate from ConfigMaps specifically to allow different RBAC policies: you can grant access to read ConfigMaps without granting access to read Secrets. However, the type alone does not make secrets secure; several additional controls are required to use them safely.

The most important misconception about Secrets is that base64 encoding is encryption. It is not. By default, Secret values are stored in etcd in base64-encoded plaintext. Anyone with read access to etcd has access to all Secrets in the cluster. Encryption at rest requires explicitly enabling etcd encryption in the API server configuration, which many cluster operators do not configure by default.
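A sketch that makes the point concrete: the two manifests below are equivalent ways to define the same Secret, and the base64 form decodes straight back to the plaintext (the credential is obviously a placeholder):

  apiVersion: v1
  kind: Secret
  metadata:
    name: db-credentials
  type: Opaque
  stringData:                    # written as plaintext; the API server encodes it
    DB_PASSWORD: not-a-real-password
  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: db-credentials
  type: Opaque
  data:                          # base64, not encryption
    DB_PASSWORD: bm90LWEtcmVhbC1wYXNzd29yZA==   # decodes to the value above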

For workloads that need sensitive values, the choices range from native Kubernetes Secrets (convenient but limited security) to external secrets operators (External Secrets Operator, Sealed Secrets, Vault Agent Injector) that fetch secrets from an external secret store like HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault at Pod startup. External secrets operators provide audit trails, access control, and automatic rotation that native Secrets lack.

A common security oversight is allowing ServiceAccounts to auto-mount their token into Pods that don't need API access. Every Pod in Kubernetes gets a ServiceAccount token mounted at /var/run/secrets/kubernetes.io/serviceaccount by default. This token can be used to authenticate to the Kubernetes API. Workloads that don't need to talk to the Kubernetes API should disable this auto-mounting (automountServiceAccountToken: false in the Pod spec or ServiceAccount spec) to limit what an attacker can do with a compromised container.
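A sketch of opting out in the Pod spec, for workloads that never call the Kubernetes API:

  apiVersion: v1
  kind: Pod
  metadata:
    name: myapp
  spec:
    automountServiceAccountToken: false   # no API token inside the container
    containers:
      - name: myapp
        image: registry.example.com/myapp:1.4.2   # hypothetical image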

How workloads run

When a Deployment is created or updated, the following sequence occurs:
  • The API server stores the Deployment object in etcd.
  • The Deployment controller notices the new or changed object and creates or updates a ReplicaSet.
  • The ReplicaSet controller creates Pod objects to match the desired replica count.
  • The scheduler assigns each unscheduled Pod to a node based on resource availability, affinity rules, and taints/tolerations.
  • The kubelet on the chosen node reads the Pod spec and instructs the container runtime (containerd, CRI-O) to start the containers.

Health checks control what happens to containers after they start. A liveness probe detects when a container is stuck in a broken state it cannot recover from; when it fails repeatedly, the kubelet restarts the container. A readiness probe detects when a container is not yet ready to accept traffic; when it fails, the Pod's IP is removed from Service endpoints and no new requests are routed to it. A startup probe gives slow-starting containers extra time to initialize before liveness checks begin.
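A sketch of all three probes on a single container, assuming hypothetical HTTP health endpoints; every path, period, and threshold here is illustrative and needs tuning per application:

  containers:
    - name: myapp
      image: registry.example.com/myapp:1.4.2   # hypothetical image
      startupProbe:              # gives slow starts time before liveness applies
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 30     # up to 30 x 10s = 5 minutes to initialize
      livenessProbe:             # repeated failure triggers a container restart
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:            # failure removes the Pod from Service endpoints
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5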

When a container fails its liveness probe and is restarted, Kubernetes applies an exponential backoff between restart attempts, starting at 10 seconds, then 20 seconds, 40 seconds, and so on up to a maximum of 5 minutes. A container that keeps crashing enters CrashLoopBackOff state, which is visible in 'kubectl get pods'. CrashLoopBackOff is not a Kubernetes bug. It is a signal that the container itself is failing, and the root cause is in the application logs or the container configuration.

Resource requests and limits are part of how the scheduler places workloads and how the system manages overcommit. A request (resources.requests.cpu/memory) is what the scheduler uses to decide which nodes have capacity. A limit (resources.limits.cpu/memory) is what the node enforces at runtime. A container that exceeds its memory limit is OOMKilled immediately; a container that hits its CPU limit is throttled. Setting requests and limits correctly is important for both cluster stability and application performance.
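A sketch of the relevant container fragment; the numbers are placeholders, not recommendations:

  resources:
    requests:
      cpu: 250m                  # the scheduler reserves a quarter core
      memory: 256Mi              # counted against node capacity at scheduling
    limits:
      cpu: "1"                   # throttled above one full core
      memory: 512Mi              # the container is OOMKilled beyond this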