Kubernetes for Developers: What You Actually Need to Know

Contributor
Jun 3, 2025

The previous posts in this path covered automation that does not become technical debt, monitoring automation, and scaling automation across teams. This post covers the orchestration platform that many of those automations run on: Kubernetes — specifically, the concepts that matter to developers whose applications run on Kubernetes, even if they never configure the cluster themselves.

Kubernetes (K8s) has a reputation for being complex. That reputation is earned — the full Kubernetes ecosystem includes hundreds of concepts, resources, and tools. But as a developer, you do not need to understand all of them. You need to understand the concepts that affect how your application runs: pods, deployments, services, health checks, resource limits, and configuration. These concepts determine whether your application starts correctly, handles traffic reliably, recovers from failures, and scales under load.

The Mental Model

Kubernetes manages containers at scale. You tell Kubernetes what you want (3 copies of my application running, accessible on port 8080, with these environment variables), and Kubernetes makes it happen — scheduling containers on available nodes, restarting them when they crash, scaling them when demand increases, and routing traffic to healthy instances.
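The "tell it what you want, and it makes it happen" idea can be pictured as a reconciliation loop. The sketch below is a toy model only: ToyCluster, reconcile, and the pod names are hypothetical and are not part of any Kubernetes API. It shows the shape of the logic, which is to compare desired state with actual state and act on the difference.

```python
from dataclasses import dataclass, field
from itertools import count

_pod_ids = count(1)  # hypothetical ID generator for this toy model


@dataclass
class ToyCluster:
    """Toy stand-in for cluster state; not a real Kubernetes object."""
    desired_replicas: int
    running_pods: list = field(default_factory=list)


def reconcile(cluster: ToyCluster) -> list:
    """Move actual state toward desired state and return the actions taken."""
    actions = []
    # Too few pods: start replacements until the count matches.
    while len(cluster.running_pods) < cluster.desired_replicas:
        name = f"app-{next(_pod_ids)}"
        cluster.running_pods.append(name)
        actions.append(f"start {name}")
    # Too many pods: stop the extras.
    while len(cluster.running_pods) > cluster.desired_replicas:
        name = cluster.running_pods.pop()
        actions.append(f"stop {name}")
    return actions
```

Calling reconcile repeatedly is safe: once the running count matches the desired count, it returns an empty list. A crashed pod is simply a smaller running list on the next pass.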

Your application does not know it is running on Kubernetes. It is a container — the same container you built with Docker. Kubernetes provides the infrastructure around that container: networking, storage, configuration, and lifecycle management.

Pods are the smallest deployable unit. A pod runs one or more containers that share networking and storage. In most cases, a pod runs exactly one container — your application. The pod provides the container with an IP address, environment variables, and mounted volumes.

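Deployments manage pods. A deployment says "I want 3 replicas of this pod running at all times." If a pod fails or its node goes away, the deployment (through the ReplicaSet it manages) creates a replacement. If you update the container image, the deployment rolls out the new version gradually (rolling update), replacing old pods with new ones in small batches rather than all at once. The batch size is configurable; by default, up to 25% of the desired replicas can be added or unavailable at a time.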

Services provide stable networking. Pods come and go, and a replacement pod gets a new IP address. A service provides a stable DNS name and IP address that routes traffic to the current set of ready pods. Your frontend connects to order-service:8080, not to a specific pod's ephemeral IP address.
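To make the relationship between the three objects concrete, here is a minimal sketch of a Deployment and a Service written as Python dicts that mirror the manifest fields. The label, image name, and version are assumptions chosen for illustration; the port and the order-service name come from the example above. The point to notice is that the Service finds its pods through labels, not through pod names or IP addresses.

```python
import json

# Assumed label; any key/value works as long as all three places agree.
APP_LABELS = {"app": "order-service"}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "order-service"},
    "spec": {
        "replicas": 3,
        # The deployment manages pods whose labels match this selector.
        "selector": {"matchLabels": APP_LABELS},
        "template": {
            "metadata": {"labels": APP_LABELS},
            "spec": {
                "containers": [
                    {
                        "name": "order-service",
                        # Hypothetical image reference.
                        "image": "registry.example.com/order-service:1.0.0",
                        "ports": [{"containerPort": 8080}],
                    }
                ]
            },
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "order-service"},
    "spec": {
        # Traffic goes to ready pods carrying these labels.
        "selector": APP_LABELS,
        "ports": [{"port": 8080, "targetPort": 8080}],
    },
}


def to_manifest(obj: dict) -> str:
    """Serialize a manifest dict as JSON."""
    return json.dumps(obj, indent=2)
```

In practice these objects are usually written as YAML files by you or your platform team; kubectl also accepts JSON, so the output of to_manifest is a valid manifest format.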

Health Checks: Telling Kubernetes Your App Is Alive

Kubernetes needs to know whether your application is healthy. Without health checks, Kubernetes assumes a running container is a healthy container, even if the application inside has degraded into a state where it accepts connections but cannot process requests.

Liveness probes tell Kubernetes whether the application is alive. If the liveness probe fails repeatedly (the health endpoint returns an error status such as 500, or the application does not respond within the timeout), Kubernetes restarts the container. Use liveness probes to detect states where the application is hung, deadlocked, or otherwise non-functional.

Readiness probes tell Kubernetes whether the application is ready to receive traffic. If the readiness probe fails, Kubernetes removes the pod from the service's endpoints: no new traffic is routed to it, but the container is not restarted. Use readiness probes during startup (the application is connecting to the database and is not ready yet) and during temporary overload (the application is healthy but too busy to accept more requests).

The implementation: add a health endpoint to your application. /health/live returns 200 when the application process is functioning. /health/ready returns 200 when the application is ready to handle requests (database connected, dependencies reachable). Configure Kubernetes to probe these endpoints at appropriate intervals.
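A minimal sketch of the two endpoints, using only the Python standard library. The ready flag, HealthHandler, and make_server are hypothetical names; a real application would normally add these routes to its existing web framework and set the flag once its own startup work has finished.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this once startup work (for example, connecting to the database) is done.
ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/live":
            # Liveness: the process is up and able to answer at all.
            self._reply(200, b"alive")
        elif self.path == "/health/ready":
            # Readiness: report 200 only once the app can serve real traffic.
            if ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the logs in this sketch


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Build (but do not start) a server exposing the health endpoints."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

To run it, call make_server().serve_forever() and call ready.set() when the application can accept requests. Clearing the flag again with ready.clear() takes the pod out of rotation without restarting it.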

The anti-pattern: a liveness probe that checks external dependencies. If the database is down and the liveness probe fails, Kubernetes restarts your container, which does not fix the database. The container restarts, the database is still down, the liveness probe fails again, and the container enters a restart loop. Liveness probes should check the application's internal health. Readiness probes can check external dependencies.
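On the Kubernetes side, the probes are declared on the container. The fragment below is a sketch written as a Python dict that mirrors the manifest fields (livenessProbe, readinessProbe, httpGet, periodSeconds, timeoutSeconds, failureThreshold). The image name and all timing values are assumptions for illustration, not recommendations, and seconds_until_restart is a hypothetical helper.

```python
# Fragment of a container spec. Timing values are illustrative assumptions.
container = {
    "name": "order-service",
    "image": "registry.example.com/order-service:1.0.0",  # hypothetical image
    "ports": [{"containerPort": 8080}],
    "livenessProbe": {
        # Internal health only: no database or downstream checks here.
        "httpGet": {"path": "/health/live", "port": 8080},
        "periodSeconds": 10,
        "timeoutSeconds": 2,
        "failureThreshold": 3,
    },
    "readinessProbe": {
        # Dependency checks belong behind this endpoint.
        "httpGet": {"path": "/health/ready", "port": 8080},
        "periodSeconds": 5,
        "timeoutSeconds": 2,
        "failureThreshold": 2,
    },
}


def seconds_until_restart(probe: dict) -> int:
    """Rough estimate of how long a hung app keeps running before a restart."""
    return probe["periodSeconds"] * probe["failureThreshold"]
```

With these example values, a hung application would be restarted after roughly 30 seconds of consecutive liveness failures. Shorter intervals detect problems faster but make restarts more likely during brief pauses.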

Resource Limits: Playing Nice with Neighbors

In Kubernetes, multiple applications share the same cluster nodes. Resource limits prevent one application from consuming all CPU or memory and starving its neighbors.

Resource requests tell Kubernetes how much CPU and memory the pod needs under normal conditions. Kubernetes uses requests to schedule pods on nodes that have sufficient available resources.

Resource limits set the maximum CPU and memory a container can consume. If a container exceeds its memory limit, it is killed (OOMKilled) and restarted according to the pod's restart policy. If it tries to use more CPU than its limit, it is throttled: it runs slower but is not killed.

Setting these correctly requires understanding your application's resource profile. Too low and the application is throttled or killed under normal load. Too high and you waste cluster resources. The approach: start with generous limits, observe actual resource usage in production, and tighten to match reality with headroom for spikes.
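The "observe, then tighten with headroom" step can be sketched as a small calculation. The heuristic here is an assumption for illustration, not an official formula: the request is set near the median observed usage and the limit at the observed peak plus a headroom factor. suggest_memory_mib and the sample values are hypothetical.

```python
import math
import statistics


def suggest_memory_mib(samples_mib: list, headroom: float = 1.25) -> dict:
    """Suggest a memory request and limit (in MiB) from observed usage samples.

    Assumed heuristic: request = median usage, limit = peak usage * headroom.
    """
    if not samples_mib:
        raise ValueError("need at least one usage sample")
    request = math.ceil(statistics.median(samples_mib))
    limit = math.ceil(max(samples_mib) * headroom)
    return {
        "requests": {"memory": f"{request}Mi"},
        # A limit below the request is invalid, so never go under it.
        "limits": {"memory": f"{max(limit, request)}Mi"},
    }
```

For samples of 180, 200, 220, 260, and 300 MiB, this suggests a 220Mi request and a 375Mi limit. Treat the result as a starting point to review with your ops team, since the right headroom depends on how spiky the workload is.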

Configuration and Secrets

Kubernetes provides two mechanisms for injecting configuration into pods.

ConfigMaps store non-sensitive configuration — feature flags, database hostnames, logging levels. They are injected as environment variables or mounted as files. Your application reads them the same way it reads any environment variable or configuration file.

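Secrets store sensitive data: database passwords, API keys, TLS certificates. They are injected the same way as ConfigMaps, as environment variables or mounted files. By default, Secret values are only base64-encoded, not encrypted, so their protection depends on how the cluster is configured: access control (RBAC) over who can read them through the API, and optional encryption at rest. A pod can only reference Secrets in its own namespace.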

The developer's responsibility: design your application to read configuration from environment variables or files rather than hardcoding values. This makes the application configurable through Kubernetes without code changes — the same container image runs in development, staging, and production with different ConfigMaps and Secrets.
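A minimal sketch of that pattern follows. The variable names (DB_HOST, LOG_LEVEL, FEATURE_NEW_CHECKOUT, DB_PASSWORD), the /etc/secrets mount path, and the db-password file name are all assumptions; the real names are whatever your ConfigMaps, Secrets, and pod spec define.

```python
import os
from pathlib import Path
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ,
                secret_dir: Path = Path("/etc/secrets")) -> dict:
    """Read settings from environment variables and mounted secret files."""
    # Required setting: fail at startup rather than at the first query.
    if "DB_HOST" not in env:
        raise RuntimeError("DB_HOST is not set")

    # Prefer a mounted secret file; fall back to an environment variable.
    password_file = secret_dir / "db-password"
    if password_file.is_file():
        db_password = password_file.read_text().strip()
    else:
        db_password = env.get("DB_PASSWORD", "")

    return {
        "db_host": env["DB_HOST"],
        "db_password": db_password,
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "new_checkout": env.get("FEATURE_NEW_CHECKOUT", "false").lower() == "true",
    }
```

Passing env and secret_dir as parameters keeps the function easy to test outside a cluster: a unit test can hand it a plain dict and a temporary directory.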

Scaling: Handling More Traffic

Kubernetes scales your application in two ways.

Horizontal Pod Autoscaler (HPA) adds or removes pod replicas based on metrics — CPU utilization, memory usage, or custom metrics (requests per second, queue depth). When traffic increases, HPA creates more pods to handle the load. When traffic decreases, it removes the extras.
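The core of the HPA decision is a ratio: desired replicas = ceil(current replicas × current metric ÷ target metric), with no change when the ratio is within a small tolerance of 1 (0.1 by default). The sketch below reproduces that calculation only. It leaves out stabilization windows, pod readiness handling, and multiple metrics, and the function name and min/max defaults are assumptions.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified version of the HPA scaling calculation."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Always stay within the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 replicas averaging 90% CPU against a 60% target gives ceil(3 × 1.5) = 5 replicas, while 3 replicas at 62% stay at 3 because the ratio is inside the tolerance.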

Your application must be designed for horizontal scaling. It should be stateless: each request can be handled by any pod. If the application stores session state in memory, horizontal scaling breaks it, because the user's next request might go to a different pod that has never seen that session. Use external session storage (Redis, a database) or stateless authentication (such as JWTs).
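To show why a signed token removes the need for in-memory sessions, here is a deliberately simplified stand-in using HMAC from the standard library. It is not a JWT implementation and omits expiry, key rotation, and other things a production system needs; sign_session and verify_session are hypothetical names. The idea it illustrates is that any pod holding the shared key can verify the token without having seen the user before.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def sign_session(payload: dict, key: bytes) -> str:
    """Encode the payload and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def verify_session(token: str, key: bytes) -> Optional[dict]:
    """Return the payload if the signature is valid, otherwise None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None  # no separator: not one of our tokens
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

The key itself would reach every pod through a Secret, so all replicas agree on it. For real applications, prefer a maintained JWT or session library over hand-rolled token code.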

Vertical scaling gives each pod more resources (more CPU, more memory). This is less common and less flexible than horizontal scaling — you are limited by the size of the underlying node.

What Developers Should Actually Do

As a developer whose application runs on Kubernetes, your practical responsibilities are:

Write a good Dockerfile (small, secure, with proper signal handling).
Implement health check endpoints (liveness and readiness).
Read configuration from environment variables.
Design for statelessness (no in-memory session state).
Set reasonable resource requests and limits (collaborate with your ops team).
Understand the deployment process (how your code gets from a merged PR to a running pod).
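"Proper signal handling" deserves a concrete picture. When Kubernetes stops a pod, it sends SIGTERM to the container's main process and follows up with SIGKILL once the termination grace period (30 seconds by default) runs out. The sketch below shows one way to finish in-flight work and exit cleanly; the function names and the polling interval are assumptions.

```python
import signal
import threading

# Set when the process has been asked to stop.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: record the request and let the main loop wind down."""
    shutting_down.set()


def install_signal_handlers():
    """Register the handler; call this once from the main thread at startup."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(do_one_unit_of_work, poll_seconds: float = 0.1) -> int:
    """Process work until shutdown is requested; return units completed."""
    processed = 0
    while not shutting_down.is_set():
        do_one_unit_of_work()  # the current unit always runs to completion
        processed += 1
        # Sleep between units, but wake immediately on shutdown.
        shutting_down.wait(poll_seconds)
    return processed
```

For the signal to arrive at all, the application must be the main process in the container, which is one reason to start it directly from the Dockerfile rather than through a wrapper shell that does not forward signals.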

You do not need to manage the cluster, configure networking, or set up monitoring infrastructure. Those are platform and operations responsibilities. But understanding the concepts — enough to debug why your pod is crashing, why requests are timing out, or why your application is not scaling — makes you a significantly more effective developer on a Kubernetes platform.

The Takeaway

Kubernetes manages the lifecycle of your containerized application — starting it, scaling it, healing it, and routing traffic to it. The concepts that matter to developers are pods (where your container runs), deployments (how replicas are managed), services (how traffic reaches your pods), health checks (how Kubernetes knows your app is healthy), resource limits (how your app shares cluster resources), and configuration (how settings and secrets reach your app).

You do not need to be a Kubernetes expert to develop applications that run on it. You need to understand how your application interacts with the platform — and design that interaction deliberately rather than discovering it through production incidents.

Next in the "Automation That Scales" learning path: This concludes the automation scaling path. Continue your learning in the "Production-Ready Infrastructure" path for advanced DevOps content, or the "Platform Engineering" path for expert-level platform work.

ShiftQuality