Kubernetes Operators often allow users to configure low-level aspects of their operands and secondary resources. Typically, such settings are made available on the custom resource and reconciled into the operand.

An example of this is the Grafana custom resource of the Grafana Operator. It exposes many configuration options that are reflected into the Grafana configuration file, but also allows you to configure properties of the Kubernetes resources of your Grafana installation. For example, you can add additional ports to your service, mount Secrets into the Grafana pod or expose additional environment variables.

These are all individual fields in the custom resource that are then reconciled and applied to the respective Kubernetes resource. In this blog post I would like to describe some problems with this approach, offer an alternative, and share some insights as to when this alternative is preferable.

The problem with exposing hand-picked configuration options

The issue with hand-picking certain properties of Kubernetes resources, exposing them on a custom resource and then reconciling them into said resources is that it’s hard to foresee what a user might want to modify. For the Grafana Operator we often get requests to make additional fields of underlying resources configurable through the custom resource. Additionally, Kubernetes resources are changing with time. Granted, that happens rather slowly, but it does happen, for example when Ingress was promoted to v1.

Another issue with this approach is discoverability. Someone who is already familiar with configuring, for example, a route now needs to learn how to do that through your custom resource. And often only a subset of the available options is exposed.

A better approach

How could this be improved for both the user and the developer? In the upcoming version of the Grafana Operator we have started exposing raw Kubernetes resources in the custom resource. To configure a deployment, you will have access to a Deployment object complete with the official spec and metadata. The same goes for all other resources that are managed by the Operator: the ServiceAccount, the Route or Ingress, and the Service.

Users don’t have to learn how to configure a resource through the Grafana Operator. Instead, they can focus on the Kubernetes resources they want to configure.

Difficulties

Sounds easy? Not quite. There are a few obstacles to overcome.

  1. Kubernetes resource definitions (e.g. Deployment) are huge and will bloat your CRDs.
  2. The upstream types contain mandatory fields, so they are not suitable for partial specification, which is exactly what we want to allow users.
  3. We need a merge strategy to produce the result from the operator defaults and the user overrides.

We can tackle those issues. To address CRD bloat we’re going to strip the descriptions. When using kubebuilder to generate the CRDs from the code, we can pass the following parameter:

crd:maxDescLen=0

This cuts down the size of our CRDs by two thirds.

Let’s address partial specification. What problem are we trying to solve here? Resources like deployments come with optional and mandatory fields. That is also true for the deployment spec in our CRD. As soon as a user adds a non-empty deployment spec, they are required to also spec out all the mandatory fields. This is not ideal. A user might only be interested in overriding the replicas, without providing a full pod template.

The solution to this is not as simple as adding another parameter. We came across it while looking at Banzai Cloud's operator-tools project. The idea is to provide your own definition of the spec that has all the same fields but no mandatory fields.

For example, the original deployment spec defines the pod template like this:

type DeploymentSpec struct {
    ...
    Template v1.PodTemplateSpec `json:"template"`
    ...
}

In our own definition we define it as a pointer and add the omitempty tag to prevent the serializer from adding an empty key:

type CustomDeploymentSpec struct {
    ...
    Template *v1.PodTemplateSpec `json:"template,omitempty"`
    ...
}
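The effect of the pointer and the omitempty tag can be seen with nothing but the standard library. The following is a minimal sketch, not the Operator's code: PodTemplate, StrictSpec and RelaxedSpec are hypothetical stand-ins for the real Kubernetes types, reduced to the fields needed to show the difference in serialization.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodTemplate is a hypothetical stand-in for the real PodTemplateSpec.
type PodTemplate struct {
	Labels map[string]string `json:"labels,omitempty"`
}

// StrictSpec mirrors the upstream style: Template is a plain value
// without omitempty, so it is always serialized.
type StrictSpec struct {
	Replicas *int32      `json:"replicas,omitempty"`
	Template PodTemplate `json:"template"`
}

// RelaxedSpec mirrors the custom style: Template is an optional pointer
// and is left out of the JSON when it is nil.
type RelaxedSpec struct {
	Replicas *int32       `json:"replicas,omitempty"`
	Template *PodTemplate `json:"template,omitempty"`
}

// Marshal returns the JSON encoding of v as a string.
func Marshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	replicas := int32(3)
	// The value field produces an empty "template" object.
	fmt.Println(Marshal(StrictSpec{Replicas: &replicas}))
	// The nil pointer with omitempty produces no "template" key at all.
	fmt.Println(Marshal(RelaxedSpec{Replicas: &replicas}))
}
```

The first line printed contains an empty template object, the second contains only the replicas. The second form is what allows a user to set a single field without touching the rest of the spec.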

We also define our own Deployment type:

type CustomDeployment struct {
	ObjectMeta ObjectMeta           `json:"metadata,omitempty"`
	Spec       CustomDeploymentSpec `json:"spec,omitempty"`
}

This gives us a resource that has the same structure as a Deployment, but all the top level fields are optional. All that’s left now is a merge strategy to merge our overridden, custom deployment with the existing one.

What we want is a strategy that keeps the fields of the original resource unless the user overrides them, and that ignores empty fields in the overriding resource. Thankfully, Kubernetes’ own apimachinery library contains everything we need in its strategicpatch package. apimachinery deals with schemas and conversion. strategicpatch compares the JSON representations of two objects and produces a patch that can be applied.

Again, operator-tools contains an implementation of a Merge function using strategicpatch and we use a slightly modified version of it in the Grafana Operator.
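To illustrate the desired merge behaviour without pulling in apimachinery, here is a simplified sketch that uses only the standard library. It is not strategicpatch and not the Merge function from operator-tools: MergeJSON and MergeDocs are hypothetical helpers that recursively overlay one JSON document on another. A real strategic merge patch additionally knows how to merge lists by key, which this sketch leaves out (lists are simply replaced).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MergeJSON overlays override onto base and returns a new map.
// Nested objects are merged key by key, null values in override are
// ignored, and any other value replaces the one in base.
func MergeJSON(base, override map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, ov := range override {
		if ov == nil {
			continue // empty field in the override: keep the base value
		}
		baseMap, baseIsMap := out[k].(map[string]interface{})
		overrideMap, overrideIsMap := ov.(map[string]interface{})
		if baseIsMap && overrideIsMap {
			out[k] = MergeJSON(baseMap, overrideMap)
			continue
		}
		out[k] = ov
	}
	return out
}

// MergeDocs merges two JSON documents and returns the merged JSON.
func MergeDocs(base, override string) (string, error) {
	var b, o map[string]interface{}
	if err := json.Unmarshal([]byte(base), &b); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(override), &o); err != nil {
		return "", err
	}
	merged, err := json.Marshal(MergeJSON(b, o))
	return string(merged), err
}

func main() {
	// Operator default and a partial user override that only sets replicas.
	base := `{"spec":{"replicas":1,"selector":{"matchLabels":{"app":"grafana-a"}}}}`
	override := `{"spec":{"replicas":3}}`

	merged, err := MergeDocs(base, override)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged)
}
```

The override only carries the replicas, yet the selector from the default survives the merge. That is the property we need from the real implementation as well.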

The Result

Let’s see this in action. The default Grafana deployment created by the Operator looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  name: grafana-a-deployment
  namespace: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-a
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    ...

We want to override the strategy to Recreate, so we provide the following deployment in the Grafana CR spec:

  spec:
    deployment:
      spec:
        strategy:
          type: Recreate

We don’t need to provide any of the fields that are mandatory for deployments, only the one we are interested in overriding: spec.strategy.type. This produces an updated deployment with a strategy set to Recreate.

Conclusion

Exposing raw Kubernetes resources is a powerful way for users to configure operands. Existing knowledge and documentation can be applied, and everything that can be configured is included out of the box. There are a number of disadvantages though:

  • The size of the CRD increases considerably. If you need to configure a large number of resources, this might not work for you.
  • This comes with a risk of misconfiguration. Users can override settings that your Operator or operand depends on.

Overall, the advantages of such a flexible configuration system outweigh disadvantages such as the larger CRD size.