9 May 2023
Kubernetes has become a critical component of many infrastructures, which makes good security practices mandatory. However, the Kubernetes control plane offers no built-in way to define strict security policies. For us, Kyverno is the best tool for enforcing such rules.
Overview
Throughout this article, we will use the vocabulary associated with Kyverno resources: Policy, Rule, etc.
Kyverno is a policy engine for Kubernetes. It allows you to:
- Define policies as Kubernetes resources;
- Validate, modify, or generate resources on the fly via these policies;
- Block non-compliant resources with an admission controller;
- Log policy violations in reports.
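For example, a minimal validation policy that requires every Pod to carry an app label looks like this (a sketch; the policy name and the required label are illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-app-label   # illustrative name
spec:
  validationFailureAction: audit
  rules:
    - name: check-app-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The label `app` is required."
        pattern:
          metadata:
            labels:
              app: "?*"   # any non-empty value
```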
Benefits
- Define security policies to prohibit the creation of insecure resources;
- Simplify the life of Ops via on-the-fly resource mutations;
- Policies can run in audit mode (non-blocking) or in enforce mode;
- Simple policy writing (compared to Gatekeeper in particular).
Disadvantages
- Difficult to create policies with very specific and/or complex logic;
- Kyverno is a Single Point of Failure. Anyone familiar with the dark side of admission controllers knows the risk: if the Kyverno pods become unavailable, no Kubernetes resources can be deployed on the cluster at all. I'll give you some tips to avoid this problem later in this article.
Kubernetes Webhook
Kyverno runs as a dynamic admission controller in the Kubernetes cluster.
The Kyverno webhook receives requests from the API server during the "validating admission" and "mutating admission" steps.
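You can see this registration directly in the cluster: Kyverno creates ValidatingWebhookConfiguration and MutatingWebhookConfiguration objects pointing at its service. An abridged sketch (exact names and fields vary by Kyverno version):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: kyverno-resource-validating-webhook-cfg
webhooks:
  - name: validate.kyverno.svc-fail
    failurePolicy: Fail   # requests are rejected if Kyverno is unreachable
    clientConfig:
      service:
        name: kyverno-svc
        namespace: kyverno
        path: /validate
    rules:
      - apiGroups: ["*"]
        apiVersions: ["*"]
        resources: ["pods"]   # populated dynamically from the installed policies
        operations: ["CREATE", "UPDATE"]
```

The failurePolicy: Fail line is what makes Kyverno a single point of failure, as discussed above.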
Policy & Rule
A Kyverno Policy is composed of the following fields (for more info: kubectl explain policy.spec):
- rules: one or more rules that define the policy
- background: if true, the policy also applies to all existing Kubernetes resources in the cluster; otherwise it applies only to new resources
- validationFailureAction: the action mode of the policy: audit or enforce
A Rule contains the following fields (for more info: kubectl explain policy.spec.rules):
- match: selects the resources
- exclude (optional): excludes resources from the selection
- mutate, validate, generate, or verifyImages: depending on the type of policy, mutates, validates, or generates a resource, or verifies the signature of an image (in beta)
Audit vs Enforce
Kyverno has 2 modes of operation (validationFailureAction):
- audit: does not block any deployment, but generates a report indicating when the specified policies are not respected and why
- enforce: completely blocks the creation of resources that do not respect the policies
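The mode is set per policy in its spec. A sketch:

```yaml
spec:
  # Start in audit mode to observe violations without blocking,
  # then switch to enforce once the reports are clean
  validationFailureAction: audit   # or: enforce
```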
Policy Report
Policy Reports are Kubernetes resources that can be listed simply:
kubectl get policyreport -A
For a given namespace, we can list policy violations with the command:
kubectl describe polr polr-ns-default | grep "Result: \+fail" -B10
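To give an idea of what these commands return, here is the abridged shape of a PolicyReport (the summary counts, resource names, and message are illustrative):

```yaml
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: polr-ns-default
  namespace: default
summary:
  pass: 10
  fail: 1
  warn: 0
  error: 0
  skip: 0
results:
  - policy: disallow-privileged-containers
    rule: privileged-containers
    result: fail
    message: "Privileged mode is disallowed."   # illustrative
    resources:
      - kind: Pod
        name: debug-pod   # illustrative
        namespace: default
```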
Installation
Kyverno can be installed on clusters via a simple Helm chart. Nothing could be simpler; that's the power of Kubernetes:
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno --namespace kyverno --create-namespace --values values.yaml
Here are the important points to consider in the chart values.yaml:
---
# 3 replicas for High Availability
replicaCount: 3
# Necessary in EKS with custom Network CNI plugin
# https://cert-manager.io/docs/installation/compatibility/#aws-eks
hostNetwork: true
config:
  webhooks:
    # Exclude namespaces from scope
    - namespaceSelector:
        matchExpressions:
          - key: kubernetes.io/metadata.name
            operator: NotIn
            values:
              - kube-system
              - kyverno
              - calico-system
    # Exclude objects from scope
    - objectSelector:
        matchExpressions:
          - key: webhooks.kyverno.io/exclude
            operator: DoesNotExist
Some remarks about the installation:
- Access to the host network is required if you use EKS with a custom CNI plugin
- Kyverno must be configured with at least 3 replicas to ensure high availability
- The namespaces kube-system and kyverno are whitelisted so as not to block the deployment of critical Kubernetes resources (kube-proxy, weave, ...)
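With the objectSelector above, any individual object can also be taken out of Kyverno's scope by giving it the exclusion label. A sketch (the Pod itself is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: debug-tooling   # hypothetical pod
  labels:
    webhooks.kyverno.io/exclude: "true"   # any value works; only the key's presence matters
spec:
  containers:
    - name: debug
      image: busybox
      command: ["sleep", "3600"]
```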
Example of policy
A list of simple examples is provided in the Kyverno documentation.
I'd like to present a slightly more advanced use case: dynamic RBAC rights management. Here is the situation we encountered: at a customer's site, we set up on-the-fly development environments in Kubernetes.
We allowed developers, via a GitLab CI job, to test their applications in environments created on the fly. These environments live in dedicated namespaces, also created on the fly.
How do you give the associated GitLab runner RBAC rights to namespaces that don't exist yet? Kubernetes RBAC alone does not allow this, but with Kyverno it is very simple.
All you need to do is:
- Give the runner the RBAC rights to create namespaces
- Grant RBAC rights on each new namespace via a Kyverno Policy: a generate rule can simply create a RoleBinding in reaction to the namespace creation
Here are the implementation details:
- The k8s ServiceAccount gitlab-runner-ephemeral-env is only allowed to create namespaces:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-runner-ephemeral-env
  labels:
    app: gitlab-runner-ephemeral-env
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitlab-runner-ephemeral-env
  labels:
    app: gitlab-runner-ephemeral-env
rules:
  - apiGroups: ["*"]
    resources: ["namespaces"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-runner-ephemeral-env
  labels:
    app: gitlab-runner-ephemeral-env
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gitlab-runner-ephemeral-env
subjects:
  - kind: ServiceAccount
    name: gitlab-runner-ephemeral-env
    namespace: gitlab
- When a namespace is created, a RoleBinding to the ClusterRole cluster-admin is created in it via a Kyverno ClusterPolicy:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-rbac-rules-env-volee
  annotations:
    policies.kyverno.io/title: Add RBAC permissions for ephemeral environments.
    policies.kyverno.io/category: Multi-Tenancy
    policies.kyverno.io/subject: RBAC
    policies.kyverno.io/description: >-
      Add RBAC rules when a namespace is created by a specific GitLab runner
      (gitlab-runner-ephemeral-env), useful for ephemeral environments.
spec:
  background: false
  rules:
    - name: create-rbac
      match:
        resources:
          kinds:
            - Namespace
        subjects:
          - kind: ServiceAccount
            name: gitlab-runner-ephemeral-env
            namespace: gitlab
      generate:
        kind: RoleBinding
        name: ephemeral-namespace-admin
        # Target the namespace that triggered the rule
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          subjects:
            - kind: ServiceAccount
              name: gitlab-runner-ephemeral-env
              namespace: gitlab
          roleRef:
            kind: ClusterRole
            name: cluster-admin
            apiGroup: rbac.authorization.k8s.io
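When the runner then creates a namespace, Kyverno generates something like the following RoleBinding in it (a sketch; the namespace name dev-42 is hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ephemeral-namespace-admin
  namespace: dev-42   # hypothetical namespace created by the runner
subjects:
  - kind: ServiceAccount
    name: gitlab-runner-ephemeral-env
    namespace: gitlab
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```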
Limitations of Kyverno
In this section, I will detail several problems we encountered when implementing Kyverno. Besides the fact that Kyverno is a SPOF for all the namespaces it monitors, policies are quite complicated to write and debug, and Kyverno can have side effects with other tools such as ArgoCD.
Policies are complex to write
Overall, Kyverno policies can be quite difficult to write. The documentation has many examples, but the whole mechanism of filtering and mutating resources can be a bit confusing at first.
Let's take a live example. We want to disallow the privileged: true parameter except for two types of pods (as shown in the following diagram):
- Pods in the debug namespace
- Pods in the gitlab namespace whose name starts with runner
Following the documentation, we are tempted to write the following policy:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
  annotations:
    policies.kyverno.io/category: Pod Security Standards (Baseline)
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      Privileged mode disables most security mechanisms and must not be allowed. This policy
      ensures Pods do not call for privileged mode.
spec:
  validationFailureAction: audit
  background: true
  rules:
    - name: privileged-containers
      match:
        resources:
          kinds:
            - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - "debug"
          # Whitelisting
          - resources:
              namespaces:
                - "gitlab"
              names:
                - "runner-*"
      validate:
        message: >-
          Privileged mode is disallowed. The fields spec.containers[*].securityContext.privileged
          and spec.initContainers[*].securityContext.privileged must not be set to true.
        pattern:
          spec:
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
This policy does not work: the exclusion filter is simply not applied. After some research, here is the fix to apply:
18,20c18,21
<         resources:
<           kinds:
<             - Pod
---
>         all:
>           - resources:
>               kinds:
>                 - Pod
The documentation gives no indication of a change in behavior between these two ways of filtering resources. It's not easy to debug a policy that doesn't work... fortunately, the community is active, and someone quickly proposed the solution on Slack.
Beware of Mutation Webhooks
From experience, one should always be careful with mutating webhooks, which can be confusing for DevOps teams. Kubernetes mutating webhooks inherently induce a difference between the resources as specified and the resources actually deployed on the cluster.
If an Ops is not aware of the existence of these mutations, they can waste a lot of time understanding why a particular resource appears or has certain attributes.
Similarly, if a cluster has too many MutationPolicies, there may be incompatibilities between policies, or edge effects that are difficult to identify.
I recommend using mutating webhooks sparingly and documenting them very clearly. They can be extremely useful (e.g. adding the address of an HTTP proxy as an environment variable for all pods in a namespace), but it is best not to abuse them.
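As an illustration of such a (well-documented!) mutation, here is a sketch of a Kyverno policy adding a proxy environment variable to all containers of Pods in a given namespace; the policy name, namespace, and proxy address are hypothetical:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-http-proxy-env   # hypothetical policy
spec:
  rules:
    - name: add-proxy-env
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - behind-proxy   # hypothetical namespace
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # The (name) anchor applies the change to every container
              - (name): "*"
                env:
                  - name: HTTP_PROXY
                    value: "http://proxy.internal:3128"   # hypothetical address
```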
Side effects with ArgoCD
We have also encountered some difficulties with Kubernetes clusters whose CD is managed via ArgoCD.
When a Kyverno policy is created that targets a resource that deploys containers, such as Pods, Kyverno intelligently modifies the rules so that the policy takes into account all types of Kubernetes resources that deploy containers.
For example, if we create this policy:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: enforce
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images may only come from our internal enterprise registry."
        pattern:
          spec:
            containers:
              - image: "registry.domain.com/*"
Kyverno will modify the policy on the fly via a Webhook Mutation like this:
spec:
  background: true
  failurePolicy: Fail
  rules:
    - match:
        any:
          - resources:
              kinds:
                - Pod
      name: validate-registries
      validate:
        message: Images may only come from our internal enterprise registry.
        pattern:
          spec:
            containers:
              - image: registry.domain.com/*
    - match:
        any:
          - resources:
              kinds:
                - DaemonSet
                - Deployment
                - Job
                - StatefulSet
      name: autogen-validate-registries
      validate:
        message: Images may only come from our internal enterprise registry.
        pattern:
          spec:
            template:
              spec:
                containers:
                  - image: registry.domain.com/*
    - match:
        any:
          - resources:
              kinds:
                - CronJob
      name: autogen-cronjob-validate-registries
      validate:
        message: Images may only come from our internal enterprise registry.
        pattern:
          spec:
            jobTemplate:
              spec:
                template:
                  spec:
                    containers:
                      - image: registry.domain.com/*
  validationFailureAction: enforce
What happens if the Kyverno policy was created via Argo? Argo detects a difference between the YAML file of the declared policy and the resource actually deployed in the cluster. The result is a constant back and forth between Argo and Kyverno, each modifying the policy in turn.
To tell Argo that these changes should be ignored, it is sufficient to use the ignoreDifferences keyword in the Argo Application:
ignoreDifferences:
  # Kyverno auto-generates rules to make policies smarter. We want ArgoCD to
  # ignore the auto-generated rules.
  # For more information: https://kyverno.io/docs/writing-policies/autogen/
  - group: kyverno.io
    kind: ClusterPolicy
    jqPathExpressions:
      - .spec.rules[] | select( .name | startswith("autogen-") )
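For context, here is where that block sits in an Argo Application manifest (a sketch; the application name, repository URL, and paths are hypothetical):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kyverno-policies   # hypothetical application
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/kyverno-policies.git   # hypothetical
    path: policies
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
  ignoreDifferences:
    - group: kyverno.io
      kind: ClusterPolicy
      jqPathExpressions:
        - .spec.rules[] | select( .name | startswith("autogen-") )
```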
Conclusion
Now you know what Kyverno is, how to install it, and how to use it to secure your Kubernetes clusters! Once again: use mutating webhooks sparingly, test your policies in audit mode first, and don't hesitate to contact the community if you run into problems.