Container + Kubernetes Security Pattern
How to ship containers + run them on Kubernetes without making the runtime an attack surface.
Container + Kubernetes Security Pattern
How to ship containers + run them on Kubernetes without making the runtime an attack surface.
TL;DR (human)
Five layers: image hygiene (small, signed, scanned), runtime restrictions (non-root, read-only filesystem, no privilege escalation), network policies (default deny), secret injection (vault integration, not env), admission control (gate every deploy). Each layer compounds; missing any is exploitable.
For agents
Image hygiene
Smaller, signed, scanned:
- Base image: distroless or minimal (Alpine, slim variants); fewer packages = smaller attack surface.
- Multi-stage builds: build deps in stage 1; final image only ships runtime.
- No shells / dev tools in production images (no
bash,curl, package manager). - Pin base image by digest (
FROM image@sha256:...), not by tag. - Sign + verify (cosign + sigstore — per
vulnerability-mgmt-pattern.md). - Scan on build (Trivy / Grype / Snyk) and on registry-pull.
Image size: aim < 100MB for typical Node / Go / Java apps. Cuts attack surface + speeds deploys.
Runtime restrictions (PodSecurityContext)
Every pod sets:
securityContext:
runAsNonRoot: true
runAsUser: 10001
runAsGroup: 10001
fsGroup: 10001
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: [ALL]
seccompProfile:
type: RuntimeDefaultEach line blocks a class of escape / lateral-movement.
readOnlyRootFilesystem: true is the most impactful — many exploits write to filesystem; if filesystem is read-only, exploit blocked. Apps that need temp space mount an emptyDir for /tmp only.
Pod Security Standards (PSS)
K8s native (replaces deprecated PodSecurityPolicy):
- Privileged: no restrictions (avoid in production).
- Baseline: minimum hardening.
- Restricted: hard locked down; default for production.
Apply per namespace:
metadata:
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restrictedRestricted mode rejects pods that don't meet hardening criteria. Forces compliance by enforcement, not by hope.
Network policy
Default deny; explicit allow:
# default deny all egress + ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny
spec:
podSelector: {}
policyTypes: [Ingress, Egress]Then per-namespace allow only what's needed:
- App pods → DB pods (specific ports).
- App pods → DNS (UDP 53).
- App pods → known external endpoints.
- Ingress controller → app pods.
Default-allow in k8s is the most common misconfiguration. Default-deny first; add what breaks; iterate.
Secret injection
Anti-pattern: secrets as env vars at deploy time (visible in pod spec).
Preferred:
- External Secrets Operator + cloud secret manager (AWS Secrets Manager, GCP Secret Manager, Vault).
- Vault Agent Injector for HashiCorp Vault: sidecar fetches secrets; templated into volume.
- Sealed Secrets (Bitnami): for GitOps where secrets must live in git.
K8s Secrets (the native object) are base64, not encrypted by default. Enable encryption at rest at the etcd layer (KMS provider).
See secrets-mgmt-deep-pattern.md.
Workload identity
Per pod: IAM identity (cloud-specific):
- EKS: IRSA (IAM Roles for Service Accounts).
- GKE: Workload Identity.
- AKS: AAD Pod Identity (deprecated) → Workload Identity Federation.
Pod identity → IAM permissions → cloud resources. No long-lived cloud credentials needed.
Admission control
Before a pod is created, an admission controller can:
- Validate the spec (reject if non-compliant).
- Mutate the spec (inject sidecars, defaults).
Tools:
- Open Policy Agent (OPA) Gatekeeper: policy-as-code; rejects non-compliant.
- Kyverno: k8s-native; YAML policies.
- Pod Security Admission: built-in PSS enforcement.
- Validation webhooks: custom logic.
Example policies:
- Require
runAsNonRoot: true. - Block
:latestimage tags (force digest). - Require image from approved registries.
- Require Network Policy on every pod.
- Require resource limits.
- Block
hostPathmounts (escape vector).
Resource limits
Every pod sets CPU + memory limits:
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 1000m
memory: 512MiWhy critical security-wise:
- No limits = one pod OOMs the node = cascading.
- DDoS-by-resource-exhaustion within cluster.
- Cost predictability.
Enforce via admission controller; reject pods without limits.
RBAC for k8s itself
Kubernetes API access:
- Use Role / RoleBinding (namespace-scoped); ClusterRole / ClusterRoleBinding (cluster-wide).
- Service Accounts for pods (not user-level for workloads).
- Audit logging on the API server.
- No
cluster-adminfor regular operators; break-glass only.
A kubectl exec to a production pod is a privileged action — audited; rare; alternative tooling for routine ops.
Image registry
- Private registry (ECR, GCR, GHCR, Harbor).
- Pull-secrets per namespace.
- Vulnerability scanning on push.
- Retention policy (keep last N versions; expire older).
- Image-signing required (cosign verification at pull).
Cluster hardening (operator concerns)
- Etcd encryption at rest (KMS).
- API server audit logging.
- Control plane access restricted (private endpoints).
- Worker nodes hardened (CIS Benchmark).
- Regular cluster upgrades (K8s patches; deprecated APIs).
- Pod-level monitoring (Falco, Tetragon — runtime threat detection).
Container vulnerability lifecycle
- At build: scan; fail high-CVE.
- At registry: scan periodically; re-scan as new CVEs land.
- At deploy: gate on scan results.
- In production: image catalog tracks which clusters run which versions; CVE alerts route to teams.
Per vulnerability-mgmt-pattern.md: patch SLA per severity.
Common failure modes
- Containers run as root. Default in many images; exploit easier. →
runAsNonRoot: true. - No Network Policy. Lateral movement trivial within cluster. → Default deny.
- Secrets as plain env. Visible in pod spec; in logs. → External Secrets / Vault injection.
:latestimage tag. Mutability; rollback ambiguous. → Pin digest.- No resource limits. One pod kills a node. → Required; admission-enforced.
hostPathmount. Container can write host filesystem. → Block via admission.- Privileged sidecar. Compromise privileged sidecar = node compromise. → Avoid privileged; specific capabilities instead.
- Image scan but no enforcement. Vulnerable images deploy. → Gate at deploy.
Tooling stack (typical)
| Concern | Tool |
|---|---|
| Image scan | Trivy, Grype, Snyk, Anchore |
| Sign + verify | cosign, sigstore |
| Admission control | OPA Gatekeeper, Kyverno |
| Runtime threat detection | Falco, Tetragon |
| Secrets | External Secrets Operator + cloud SM, Vault |
| Network policy | Calico, Cilium, native |
| Cluster hardening | kube-bench (CIS), kube-hunter |
| RBAC visualisation | kubectl-who-can, rbac-tool |
See also
vulnerability-mgmt-pattern.md— SBOM + CVE pipeline.secrets-mgmt-deep-pattern.md— secret injection patterns.vault-pattern.md— secret storage.../architecture/service-mesh-pattern.md— mTLS in mesh.compliance-framework-pattern.md— container security is a SOC 2 / ISO control.../architecture/iac-pattern.md— cluster + namespaces defined as code.