9 May 2023
Kubernetes has become a critical component of many infrastructures, which makes good security practices mandatory. However, the Kubernetes control plane offers no built-in way to define strict security policies. For us, Kyverno is the best tool for enforcing such rules.
Overview
Throughout this article, we will use the vocabulary associated with Kyverno resources: Policy, Rule, etc.
Kyverno is a policy engine for Kubernetes. It allows you to:
- Define policies as Kubernetes resources;
- Validate, modify, or generate resources on the fly via these policies;
- Block non-compliant resources with an admission controller;
- Log policy violations in reports.
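For example, a minimal validation policy that requires every Pod to carry an app label looks like this (a sketch; the policy name and the required label are illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-app-label   # illustrative name
spec:
  validationFailureAction: audit
  rules:
    - name: check-app-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The label `app` is required."
        pattern:
          metadata:
            labels:
              app: "?*"   # any non-empty value
```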
Benefits
- Define security policies to prohibit the creation of insecure resources;
- Simplify the life of Ops via on-the-fly resource mutations;
- Policies can run in audit mode (non-blocking) or in enforce mode;
- Simple policy writing (compared to Gatekeeper in particular).
Disadvantages
- Difficult to create policies with very specific and/or complex logic;
- Kyverno is a Single Point of Failure. Anyone familiar with the dark side of admission controllers knows the risk: if the Kyverno pods become unavailable, no Kubernetes resources can be deployed on the cluster at all. I'll give you some tips to avoid this problem later in this article.
Kubernetes Webhook
Kyverno runs as a dynamic admission controller in the Kubernetes cluster.
The Kyverno webhook receives requests from the API server during the "validating admission" and "mutating admission" steps.
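You can see this registration directly in the cluster: Kyverno creates ValidatingWebhookConfiguration and MutatingWebhookConfiguration objects pointing at its service. An abridged sketch (exact names and fields vary by Kyverno version):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: kyverno-resource-validating-webhook-cfg
webhooks:
  - name: validate.kyverno.svc-fail
    failurePolicy: Fail   # requests are rejected if Kyverno is unreachable
    clientConfig:
      service:
        name: kyverno-svc
        namespace: kyverno
        path: /validate
    rules:
      - apiGroups: ["*"]
        apiVersions: ["*"]
        resources: ["pods"]   # populated dynamically from the installed policies
        operations: ["CREATE", "UPDATE"]
```

The failurePolicy: Fail line is what makes Kyverno a single point of failure, as discussed above.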
Policy & Rule
A Kyverno Policy is composed of the following fields (for more info: kubectl explain policy.spec):
- rules: one or more rules that define the policy
- background: if true, the policy also applies to all existing Kubernetes resources in the cluster; otherwise it applies only to new resources
- validationFailureAction: the action mode of the policy: audit or enforce
A Rule contains the following fields (for more info: kubectl explain policy.spec.rules):
- match: selects the resources
- exclude (optional): excludes resources from the selection
- mutate, validate, generate, or verifyImages: depending on the type of policy, mutates, validates, or generates a resource, or verifies the signature of an image (in beta)
Audit vs Enforce
Kyverno has 2 modes of operation (validationFailureAction):
- audit: does not block any deployment, but generates a report indicating when the specified policies are not respected and why
- enforce: completely blocks the creation of resources that do not respect the policies
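The mode is set per policy in its spec. A sketch:

```yaml
spec:
  # Start in audit mode to observe violations without blocking,
  # then switch to enforce once the reports are clean
  validationFailureAction: audit   # or: enforce
```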
Policy Report
Policy Reports are Kubernetes resources that can be listed simply:
kubectl get policyreport -A
For a given namespace, we can list policy violations with the command:
kubectl describe polr polr-ns-default | grep "Result: \+fail" -B10
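To give an idea of what these commands return, here is the abridged shape of a PolicyReport (the summary counts, resource names, and message are illustrative):

```yaml
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: polr-ns-default
  namespace: default
summary:
  pass: 10
  fail: 1
  warn: 0
  error: 0
  skip: 0
results:
  - policy: disallow-privileged-containers
    rule: privileged-containers
    result: fail
    message: "Privileged mode is disallowed."   # illustrative
    resources:
      - kind: Pod
        name: debug-pod   # illustrative
        namespace: default
```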
Installation
Kyverno can be installed on clusters via a simple Helm chart. Nothing could be simpler; that's the power of Kubernetes:
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno --namespace kyverno --create-namespace --values values.yaml
Here are the important points to consider in the chart values.yaml:
---
# 3 replicas for High Availability
replicaCount: 3
# Necessary in EKS with custom Network CNI plugin
# https://cert-manager.io/docs/installation/compatibility/#aws-eks
hostNetwork: true
config:
  webhooks:
    # Exclude namespaces from scope
    - namespaceSelector:
        matchExpressions:
          - key: kubernetes.io/metadata.name
            operator: NotIn
            values:
              - kube-system
              - kyverno
              - calico-system
    # Exclude objects from scope
    - objectSelector:
        matchExpressions:
          - key: webhooks.kyverno.io/exclude
            operator: DoesNotExist
Some remarks about the installation:
- Access to the host network is required if you use EKS with a custom CNI plugin
- Kyverno must be configured with at least 3 replicas to ensure high availability
- The namespaces kube-system and kyverno are whitelisted so as not to block the deployment of critical Kubernetes resources (kube-proxy, weave, ...)
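With the objectSelector above, any individual object can also be taken out of Kyverno's scope by giving it the exclusion label. A sketch (the Pod itself is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: debug-tooling   # hypothetical pod
  labels:
    webhooks.kyverno.io/exclude: "true"   # any value works; only the key's presence matters
spec:
  containers:
    - name: debug
      image: busybox
      command: ["sleep", "3600"]
```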
Example of policy
A list of simple examples is provided in the Kyverno documentation.
I'd like to present a slightly more advanced use case: dynamic RBAC rights management. Here is the situation we encountered: at a customer's site, we set up on-the-fly development environments in Kubernetes.
We allowed developers, via a GitLab CI job, to test their applications in environments created on the fly. These environments live in dedicated namespaces, also created on the fly.
How do you give the associated GitLab runner RBAC rights to namespaces that don't exist yet? Kubernetes RBAC alone does not allow this, but with Kyverno it is very simple.
All you need to do is:
- Give the runner the RBAC rights to create namespaces
- Grant RBAC rights on each new namespace via a Kyverno Policy: a generate rule can simply create a RoleBinding in reaction to the namespace creation
Here are the implementation details:
- The k8s ServiceAccount gitlab-runner-ephemeral-env is only allowed to create namespaces:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-runner-ephemeral-env
  labels:
    app: gitlab-runner-ephemeral-env
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitlab-runner-ephemeral-env
  labels:
    app: gitlab-runner-ephemeral-env
rules:
  - apiGroups: ["*"]
    resources: ["namespaces"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-runner-ephemeral-env
  labels:
    app: gitlab-runner-ephemeral-env
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gitlab-runner-ephemeral-env
subjects:
  - kind: ServiceAccount
    name: gitlab-runner-ephemeral-env
    namespace: gitlab
- When a namespace is created, a RoleBinding to the ClusterRole cluster-admin is created in it via a Kyverno ClusterPolicy:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-rbac-rules-env-volee
  annotations:
    policies.kyverno.io/title: Add RBAC permissions for ephemeral environments.
    policies.kyverno.io/category: Multi-Tenancy
    policies.kyverno.io/subject: RBAC
    policies.kyverno.io/description: >-
      Add RBAC rules when a namespace is created by a specific GitLab runner
      (gitlab-runner-ephemeral-env), useful for ephemeral environments.
spec:
  background: false
  rules:
    - name: create-rbac
      match:
        resources:
          kinds:
            - Namespace
        subjects:
          - kind: ServiceAccount
            name: gitlab-runner-ephemeral-env
            namespace: gitlab
      generate:
        kind: RoleBinding
        name: ephemeral-namespace-admin
        # Target the namespace that triggered the rule
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          subjects:
            - kind: ServiceAccount
              name: gitlab-runner-ephemeral-env
              namespace: gitlab
          roleRef:
            kind: ClusterRole
            name: cluster-admin
            apiGroup: rbac.authorization.k8s.io
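When the runner then creates a namespace, Kyverno generates something like the following RoleBinding in it (a sketch; the namespace name dev-42 is hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ephemeral-namespace-admin
  namespace: dev-42   # hypothetical namespace created by the runner
subjects:
  - kind: ServiceAccount
    name: gitlab-runner-ephemeral-env
    namespace: gitlab
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```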
Limitations of Kyverno
In this section, I will detail several problems we encountered when implementing Kyverno. Besides the fact that Kyverno is a SPOF for all the namespaces it monitors, policies are quite complicated to write and debug, and Kyverno can have side effects with other tools such as ArgoCD.
Policies are complex to write
Overall, Kyverno policies can be quite difficult to write. The documentation has many examples, but the whole mechanism of filtering and mutating resources can be a bit confusing at first.
Let's take a live example. We want to disallow the privileged: true parameter except for two types of pods (as shown in the following diagram):
- Pods in the debug namespace
- Pods in the gitlab namespace whose name starts with runner
Following the documentation, we are tempted to write the following policy:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
  annotations:
    policies.kyverno.io/category: Pod Security Standards (Baseline)
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      Privileged mode disables most security mechanisms and must not be allowed. This policy
      ensures Pods do not call for privileged mode.
spec:
  validationFailureAction: audit
  background: true
  rules:
    - name: privileged-containers
      match:
        resources:
          kinds:
            - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - "debug"
          # Whitelisting
          - resources:
              namespaces:
                - "gitlab"
              names:
                - "runner-*"
      validate:
        message: >-
          Privileged mode is disallowed. The fields spec.containers[*].securityContext.privileged
          and spec.initContainers[*].securityContext.privileged must not be set to true.
        pattern:
          spec:
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
This policy does not work: the exclusion filter is simply not applied. After some research, here is the fix to apply:
18,20c18,21
<         resources:
<           kinds:
<             - Pod
---
>         all:
>           - resources:
>               kinds:
>                 - Pod
The documentation gives no indication of a change in behavior between these two ways of filtering resources. It's not easy to debug a policy that doesn't work... fortunately, the community is active, and someone quickly proposed the solution on Slack.
Beware of Mutation Webhooks
From experience, one should always be careful with mutating webhooks, which can be confusing for DevOps teams. Kubernetes mutating webhooks inherently induce a difference between the resources as specified and the resources actually deployed on the cluster.
If an Ops is not aware of the existence of these mutations, they can waste a lot of time understanding why a particular resource appears or has certain attributes.
Similarly, if a cluster has too many MutationPolicies, there may be incompatibilities between policies, or edge effects that are difficult to identify.
I recommend using mutating webhooks sparingly and documenting them very clearly. They can be extremely useful (e.g. adding the address of an HTTP proxy as an environment variable for all pods in a namespace), but it is best not to abuse them.
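As an illustration of such a (well-documented!) mutation, here is a sketch of a Kyverno policy adding a proxy environment variable to all containers of Pods in a given namespace; the policy name, namespace, and proxy address are hypothetical:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-http-proxy-env   # hypothetical policy
spec:
  rules:
    - name: add-proxy-env
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - behind-proxy   # hypothetical namespace
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # The (name) anchor applies the change to every container
              - (name): "*"
                env:
                  - name: HTTP_PROXY
                    value: "http://proxy.internal:3128"   # hypothetical address
```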
Side effects with ArgoCD
We have also encountered some difficulties with Kubernetes clusters whose CD is managed via ArgoCD.
When a Kyverno policy is created that targets a resource that deploys containers, such as Pods, Kyverno intelligently modifies the rules so that the policy takes into account all types of Kubernetes resources that deploy containers.
For example, if we create this policy:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: enforce
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images may only come from our internal enterprise registry."
        pattern:
          spec:
            containers:
              - image: "registry.domain.com/*"
Kyverno will modify the policy on the fly via a Webhook Mutation like this:
spec:
  background: true
  failurePolicy: Fail
  rules:
    - match:
        any:
          - resources:
              kinds:
                - Pod
      name: validate-registries
      validate:
        message: Images may only come from our internal enterprise registry.
        pattern:
          spec:
            containers:
              - image: registry.domain.com/*
    - match:
        any:
          - resources:
              kinds:
                - DaemonSet
                - Deployment
                - Job
                - StatefulSet
      name: autogen-validate-registries
      validate:
        message: Images may only come from our internal enterprise registry.
        pattern:
          spec:
            template:
              spec:
                containers:
                  - image: registry.domain.com/*
    - match:
        any:
          - resources:
              kinds:
                - CronJob
      name: autogen-cronjob-validate-registries
      validate:
        message: Images may only come from our internal enterprise registry.
        pattern:
          spec:
            jobTemplate:
              spec:
                template:
                  spec:
                    containers:
                      - image: registry.domain.com/*
  validationFailureAction: enforce
What happens if the Kyverno policy was created via Argo? Argo detects a difference between the YAML file of the declared policy and the resource actually deployed in the cluster. The result is a constant back and forth between Argo and Kyverno, each modifying the policy in turn.
To tell Argo that these changes should be ignored, it is sufficient to use the ignoreDifferences keyword in the Argo Application:
ignoreDifferences:
  # Kyverno auto-generates rules to make policies smarter. We want ArgoCD to
  # ignore the auto-generated rules.
  # For more information: https://kyverno.io/docs/writing-policies/autogen/
  - group: kyverno.io
    kind: ClusterPolicy
    jqPathExpressions:
      - .spec.rules[] | select( .name | startswith("autogen-") )
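For context, here is where that block sits in an Argo Application manifest (a sketch; the application name, repository URL, and paths are hypothetical):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kyverno-policies   # hypothetical application
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/kyverno-policies.git   # hypothetical
    path: policies
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
  ignoreDifferences:
    - group: kyverno.io
      kind: ClusterPolicy
      jqPathExpressions:
        - .spec.rules[] | select( .name | startswith("autogen-") )
```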
Conclusion
Now you know what Kyverno is, how to install it, and how to use it to secure your Kubernetes clusters! Once again: use mutating webhooks sparingly, test your policies in audit mode first, and don't hesitate to contact the community if you run into problems.