Kubernetes workloads are often exposed on the internet, which makes them an attractive entry point for attackers seeking initial access to a Kubernetes infrastructure. Unfortunately, no web framework is free of bugs or vulnerabilities, and the same goes for the Docker images used to package applications.
Therefore, it is essential to harden the security of workloads to prevent the compromise of one application from leading to a full compromise of the underlying Kubernetes infrastructure.
This article aims to provide recommendations to mitigate these risks. These recommendations are mainly taken from the NSA's "Kubernetes Hardening Guide" and from the Kubernetes security documentation. It does not cover Role-Based Access Control (RBAC) in Kubernetes.
Securing Pods reduces the overall attack surface in the cluster and prevents many post-exploitation activities after a compromise.
Most containers run applications as the root user by default. Yet, most of the time, applications do not require such high privileges.
Running applications as non-root users mitigates the impact of a container compromise, as it limits the rights these applications have in the containers. In addition, container engines (the services that manage containers on a Kubernetes node) are also prone to security flaws that can break the isolation between containers and the host system.
In this scenario, an attacker who managed to compromise a container (by exploiting an application vulnerability, for instance) could escape from it and end up with the same privileges on the host system as they had inside the container.
There are two ways to run containers with a non-root user:
Specifying a non-root user to use in the Dockerfile
# Create a new group (group1) and a new user (user1) in that group; then switch to that user's context
RUN groupadd group1 && useradd -g group1 user1
USER user1:group1
Using Pod security contexts to specify a non-root user at runtime
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  ...
  securityContext:
    runAsUser: 1001
    runAsNonRoot: true
Specifying the user in the Dockerfile is preferable, as it ensures container engines will always run the container without root privileges.
An attacker who compromises an application and gains code execution can create or download files and modify applications. Kubernetes can lock down the container's file system to prevent many of these post-exploitation activities.
⚠️ Note that these restrictions also affect legitimate applications running in containers and can result in crashes or abnormal behaviors.
To enable these limitations, the property securityContext.readOnlyRootFilesystem needs to be set to true in the container specification.
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  containers:
    - name: my-container
      ...
      securityContext:
        readOnlyRootFilesystem: true
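Applications that legitimately need to write files (temporary files, caches, logs) can keep working with a read-only root filesystem by mounting a writable volume on the specific paths they use. A minimal sketch, assuming the application only writes to /tmp (the container name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  containers:
    - name: my-container
      image: nginx:1.25   # illustrative image
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        # The root filesystem stays read-only, but writes to /tmp are allowed
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}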
Building secure container images in the first place helps reduce the security flaws attackers can leverage to gain access to containers. In addition, using minimal images with only the necessary services allows for reducing the attack surface and the tools attackers can leverage from within a container.
Several mechanisms can be leveraged to build secure images, notably using scratch (when possible) or alpine-based images.

To make sure the Docker images run by Pods are secure, it is possible to only accept signed images in the cluster (using GKE's built-in Binary Authorization or a dedicated admission controller, for instance) that come from trusted repositories and that have passed vulnerability checks.
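As an illustration of the minimal-image approach mentioned above, here is a sketch of a multi-stage Dockerfile producing a scratch-based image. It assumes a statically compiled Go application; the paths and the build command are hypothetical and would need to be adapted:

# Build stage: compile a static binary (hypothetical Go application)
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Final stage: only the binary ships in the image, running as a non-root user
FROM scratch
COPY --from=build /app /app
USER 10001:10001
ENTRYPOINT ["/app"]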
Linux divides the privileges traditionally associated with the superuser into distinct units called capabilities, which can be independently enabled and disabled. The kernel checks these capabilities on system calls to decide whether a program may perform privileged operations. The Kubernetes container runtime (containerd by default) is a privileged process that spawns containers with a set of predefined capabilities.
The privileged operations an attacker can perform inside a container are limited by the capabilities the container process has. Thus, many post-exploitation activities can be prevented by restricting them.
Container capabilities can be limited with the securityContext.capabilities parameter in the container specification. It is good practice to drop all capabilities using the ALL keyword and then add back only the ones that are strictly necessary.
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  containers:
    - name: my-container
      ...
      securityContext:
        capabilities:
          drop: ["ALL"]
          add:
            - MKNOD
            - NET_RAW
⚠️ Note that legitimate programs need capabilities to perform operations, and so restricting them can impact applications.
Seccomp is a Linux kernel feature that allows restricting the system calls a program can do. It can be used to sandbox the privileges of a process.
The privileged operations an attacker can perform inside a container are limited by the system calls they are allowed to make. Many post-exploitation activities can be prevented by limiting them.
The Seccomp profiles to apply to the Pod's containers can be configured with the property securityContext.seccompProfile. Kubernetes allows specifying two kinds of profiles:
RuntimeDefault, which uses the default seccomp profile provided by the container engine (containerd ships with its own default profile). Nodes can be configured to apply it to all containers by default (see the kubelet configuration sketch after the example below), and cloud providers usually activate it automatically on their nodes.
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  ...
  securityContext:
    seccompProfile:
      type: RuntimeDefault
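To apply RuntimeDefault to every container on a node by default, the kubelet itself can be configured. A minimal sketch of the kubelet configuration, assuming a Kubernetes version where the SeccompDefault feature is available (the feature gate is enabled by default since 1.25):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Apply the RuntimeDefault seccomp profile to every container
# that does not explicitly specify another profile
seccompDefault: true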
Localhost, which uses a profile loaded on the node host. The profile can be loaded beforehand, using a DaemonSet for instance (an example profile is shown after the manifest below).
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  ...
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: "profiles/my-custom-profile.json"
AppArmor is a Linux kernel security module that can restrict the capabilities of running processes and limit their access to files. With this module, each process can have its own security profile.
If you want more information on AppArmor, this article explains how to build AppArmor profiles specifically for Docker containers.
AppArmor allows restricting the activities an attacker can perform in a container by limiting the Linux capabilities of the container process (this partly overlaps with Kubernetes' capabilities feature) and its access to files in the container.
The module first needs to be activated on the nodes' OS (cloud providers usually offer Kubernetes-optimized OS images that ship with AppArmor). In addition, custom profiles need to be loaded on the nodes beforehand, using a DaemonSet for instance.
An AppArmor profile can be applied to a container by adding the annotation container.apparmor.security.beta.kubernetes.io/<container_name> to the Pod's metadata. Kubernetes allows applying two types of profiles:
runtime/default to apply the container runtime's default profile (see containerd's default profile). containerd automatically applies this profile by default when AppArmor is enabled on the node.
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
  annotations:
    container.apparmor.security.beta.kubernetes.io/my-container: runtime/default
spec:
  ...
localhost/<profile_name> to apply a profile that was loaded on the node beforehand (an example profile is shown after the manifest below).
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
  annotations:
    container.apparmor.security.beta.kubernetes.io/my-container: localhost/my-profile
spec:
  ...
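For reference, here is the kind of profile that localhost/my-profile could point to. This sketch is adapted from the deny-write example in the Kubernetes documentation; the profile name is hypothetical and must match the one referenced in the annotation:

#include <tunables/global>

profile my-profile flags=(attach_disconnected) {
  #include <abstractions/base>

  file,

  # Deny all file writes
  deny /** w,
}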
SELinux can also be used to secure Pods through the securityContext property. SELinux provides similar functionality to AppArmor; it is generally considered harder to learn, but more secure.
Note that AppArmor and SELinux cannot be used at the same time on a system.
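On SELinux-enabled nodes, a label can be assigned to a Pod's containers through securityContext.seLinuxOptions. A minimal sketch, with a purely illustrative MCS level:

apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  ...
  securityContext:
    seLinuxOptions:
      # Illustrative level; it must match the SELinux policy loaded on your nodes
      level: "s0:c123,c456"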
Service accounts are critical elements of Kubernetes infrastructure, as they are used by Pods to interact with the Kubernetes API. Thus, it is important to only grant them the necessary rights and to never rely on default service accounts that are auto-mounted in all Pods by default.
Never use default service accounts unless you have no other choice
Respect the principle of least privilege, and only grant the necessary permissions to custom service accounts
Disable service account auto-mount in Pods that do not need to interact with the Kubernetes API. The auto-mount can be disabled with the property automountServiceAccountToken.
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  ...
  automountServiceAccountToken: false
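When a Pod does need to call the Kubernetes API, a dedicated service account with only the necessary permissions (granted through RBAC, which this article does not cover) can be referenced explicitly in the Pod specification. A minimal sketch, with a hypothetical service account name:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-secure-namespace
---
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
  namespace: my-secure-namespace
spec:
  ...
  # Use the dedicated, least-privileged service account instead of the default one
  serviceAccountName: my-app-sa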
Enforcing Pod security consists in preventing Pods that do not respect a predefined baseline security policy from running in the cluster. It ensures that workloads run in Pods that respect part of the security criteria described above, and that no Pod with an overly permissive configuration can be deployed and lead to the compromise of the underlying infrastructure.
Kubernetes has a built-in Pod Security Admission controller (since Kubernetes 1.23) that checks the compliance of Pod specifications with pre-defined Pod Security Standards (privileged, baseline, or restricted), which define different isolation levels for Pods. It only requires labeling Namespaces with a Pod Security Standard level to perform compliance checks. The controller can either enforce the policy and reject all Pods that violate it, or audit the Pods' compliance and record violations in the audit logs. For instance, to enforce the restricted policy:
apiVersion: v1
kind: Namespace
metadata:
  name: my-secure-namespace
  labels:
    # pod-security.kubernetes.io/<MODE>: <LEVEL>
    pod-security.kubernetes.io/enforce: restricted
Instead of this controller, policy controllers like Kyverno can be used to perform similar checks. These controllers can also mutate Pods to add security constraints directly during resource admission.
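As an illustration, a Kyverno ClusterPolicy of this kind could enforce the read-only root filesystem recommendation from above. This is a sketch based on Kyverno's validate pattern syntax; the exact policy should be checked against the Kyverno policy library:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-ro-rootfs
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-readOnlyRootFilesystem
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Root filesystem must be read-only."
        pattern:
          spec:
            containers:
              # Every container must set readOnlyRootFilesystem to true
              - securityContext:
                  readOnlyRootFilesystem: true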
Isolating workloads is essential to limit lateral movement within clusters, and to prevent a compromised workload from impacting other workloads.
Kubernetes Namespaces provide a logical partition of cluster resources. Namespaces do not automatically isolate workloads and applications, but numerous resources apply at the Namespace scope. In particular, the resources described below apply to Namespaces to isolate them from each other and to isolate the workloads within them. Roles and RoleBindings used for RBAC are also Namespace-scoped.
Traffic between Pods, Namespaces, and external IP addresses can be controlled with NetworkPolicies. By default, there is no restriction on ingress and egress traffic in the cluster. Thus, without NetworkPolicies, an attacker who compromises a container can reach all other Pods and Services and potentially move laterally within the cluster.
NetworkPolicies require a Kubernetes network plugin that supports them (for instance Calico). To secure your cluster as much as possible, the best practice is to respect the principle of least privilege by only authorizing legitimate network flows. Network policies are additive, so they do not conflict with each other.
The good practice is to first deny all ingress traffic in the cluster:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
Then, other network policies can allow only the necessary traffic in the cluster. It is possible to go even further by denying all egress traffic from Pods and only allowing the necessary communications.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
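Once all egress is denied, the necessary flows have to be explicitly re-allowed; for instance, most Pods still need DNS resolution. A minimal sketch, with selectors that should be tightened to match your actual DNS setup:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Allow DNS queries (typically served by kube-dns in the kube-system Namespace)
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53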
Example of a NetworkPolicy that only allows Pods labeled access: "true" to reach the nginx Pods in the prod Namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-access-nginx
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: nginx
  ingress:
    - from:
        - podSelector:
            matchLabels:
              access: "true"
ResourceQuotas allow restricting the total amount of resources that the Pods in a Namespace can request. If they are used to limit compute resources, all Pods must define requests and limits, otherwise the quota system may reject Pod creation. ResourceQuotas help avoid resource exhaustion, in particular by preventing user applications from monopolizing resources to the point where Kubernetes system Pods can no longer run correctly.
It is good practice to use ResourceQuotas to limit the compute resources (CPU and memory requests and limits) that the Pods in a Namespace can consume. For instance:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-secure-namespace
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
LimitRange resources can also be used to set default requests and limits on Pods that do not specify any, and to constrain the allowed range of these requests and limits at the Pod and container scope.
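A minimal sketch of such a LimitRange, with illustrative values:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: my-secure-namespace
spec:
  limits:
    - type: Container
      # Defaults applied to containers that do not define their own requests/limits
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 250m
        memory: 256Mi
      # Upper bound allowed for any single container
      max:
        cpu: "1"
        memory: 1Gi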
All these recommendations take quite some time to implement, and depending on your needs you will probably not follow all of them. However, some of them are quick wins that can greatly improve the security of your cluster and your workloads.
In particular, building secure Docker images, taking extra care with service accounts, and using the default Seccomp and AppArmor profiles are a good first step.
However, taking extra precautions to secure workloads does not guarantee that an attacker will never find a vulnerability in your applications or cluster. Therefore, monitoring your cluster for anomalies that may result from a compromise is also critical to securing your infrastructure in depth.
For example, Falco is an intrusion detection tool that integrates well with Kubernetes to provide such functionality.