Kubernetes Operators often allow users to configure low-level aspects of their operands and secondary resources. Typically, such settings are made available on the custom resource and reconciled into the operand.
An example of this is the Grafana custom resource of the Grafana Operator.
It exposes many configuration options that are reflected into the Grafana configuration file, but also allows you to
configure properties of the Kubernetes resources of your Grafana installation.
For example, you can add additional ports to your service, mount Secrets into the Grafana pod or expose additional
environment variables.
These are all individual fields in the custom resource that are then reconciled and applied to the respective Kubernetes resource. In this blog post I would like to describe some problems with this approach, offer an alternative, and give some insight into when this alternative is preferable.
The problem with exposing hand-picked configuration options
The issue with hand-picking certain properties of Kubernetes resources, exposing them on a custom resource and then
reconciling them into said resources is that it’s hard to foresee what a user might want to modify.
For the Grafana Operator we often get requests to make additional fields of underlying resources configurable through
the custom resource.
Additionally, Kubernetes resources change over time. Granted, that happens rather slowly, but it does happen, for example when Ingress was promoted to v1.
Another issue with this approach is discoverability. Someone who is already familiar with configuring, for example, a route now needs to learn how to do that through your custom resource. And often only a subset of the available options is exposed.
A better approach
How could this be improved for both the user and the developer?
In the upcoming version of the Grafana Operator we have started exposing raw Kubernetes resources in the custom
resource.
To configure a deployment, you will have access to a Deployment object complete with the official spec and metadata.
The same goes for all other resources that are managed by the Operator: the ServiceAccount, the Route or Ingress, and the Service.
Users don’t have to learn how to configure a resource through the Grafana Operator. Instead, they can focus on the Kubernetes resources they want to configure.
Difficulties
Sounds easy? Not quite. There are a few obstacles to overcome.
- Kubernetes resource definitions (e.g. Deployment) are huge and will bloat your CRDs.
- They are not suitable for partial specification, which is exactly what we want to allow users to do.
- We need a merge strategy to produce the result from the operator defaults and the user overrides.
We can tackle those issues. To address CRD bloat we’re going to strip the descriptions. When using kubebuilder to generate the CRDs from the code, we can pass the following parameter:
crd:maxDescLen=0
This cuts down the size of our CRDs by two thirds.
Let’s address partial specification. What problem are we trying to solve here? Resources like deployments come with optional and mandatory fields. That is also true for the deployment spec in our CRD. As soon as a user adds a non-empty deployment spec, they are required to also spec out all the mandatory fields. This is not ideal. A user might only be interested in overriding the replicas, without providing a full pod template.
The solution to this is not as simple as adding another parameter. We came across it while looking at Banzaicloud's operator-tools project. The idea is to provide your own definition of the spec that has all the same fields but no mandatory fields.
For example, the original deployment spec defines the pod template like this:
type DeploymentSpec struct {
    ...
    Template v1.PodTemplateSpec `json:"template"`
    ...
}
In our own definition we define it as a pointer and add the omitempty tag to prevent the serializer from adding an empty key:
type CustomDeploymentSpec struct {
    ...
    Template *v14.PodTemplateSpec `json:"template,omitempty"`
    ...
}
We also define our own Deployment type:
type CustomDeployment struct {
    ObjectMeta ObjectMeta `json:"metadata,omitempty"`
    Spec CustomDeploymentSpec `json:"spec,omitempty"`
}
This gives us a resource that has the same structure as a Deployment, but all the top-level fields are optional.
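To see why the pointer matters, here is a minimal sketch using only Go's standard encoding/json package. The PodTemplate, ValueSpec and PointerSpec types are hypothetical, heavily simplified stand-ins for the real Kubernetes types and are not taken from the Grafana Operator. It illustrates that omitempty alone does not drop an empty struct value, while a nil pointer is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodTemplate is a hypothetical, simplified stand-in for v1.PodTemplateSpec.
type PodTemplate struct {
	Image string `json:"image,omitempty"`
}

// ValueSpec holds the template by value, like the upstream DeploymentSpec.
type ValueSpec struct {
	Replicas *int32      `json:"replicas,omitempty"`
	Template PodTemplate `json:"template,omitempty"`
}

// PointerSpec holds the template as a pointer, like CustomDeploymentSpec.
type PointerSpec struct {
	Replicas *int32       `json:"replicas,omitempty"`
	Template *PodTemplate `json:"template,omitempty"`
}

// marshal serializes v to JSON and panics on error to keep the sketch short.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	replicas := int32(3)

	// omitempty has no effect on struct values, so an empty "template"
	// key is still written.
	fmt.Println(marshal(ValueSpec{Replicas: &replicas}))

	// A nil pointer is omitted, so only the field the user set remains.
	fmt.Println(marshal(PointerSpec{Replicas: &replicas}))
}
```

Running this prints `{"replicas":3,"template":{}}` for the value variant and `{"replicas":3}` for the pointer variant. The second form is what we want a partial override to look like before it is merged.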
All that’s left now is a merge strategy to merge our overridden, custom deployment with the existing one.
What we want is a strategy that prefers existing fields in the original resource unless they are the default and ignores empty fields in the overridden resource.
Thankfully, Kubernetes’ own apimachinery library contains everything we need in its strategicpatch package. apimachinery deals with schemas and conversion. strategicpatch compares two JSON representations of objects and produces a patch that can be applied.
Again, operator-tools contains an implementation of a Merge function using strategicpatch, and we use a slightly modified version of it in the Grafana Operator.
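To illustrate the general idea of merging user overrides into operator defaults, here is a rough sketch that uses only the Go standard library. It is not the operator-tools Merge function and it does not use strategicpatch. The mergeMaps and mergeJSON helpers are hypothetical names, and the sketch is a plain recursive JSON object merge that, unlike a strategic merge patch, knows nothing about Kubernetes list merge keys or patch directives.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeMaps overlays override onto base. Nested objects are merged
// recursively; any other value present in override replaces the base value.
// Keys missing from override keep their base value.
func mergeMaps(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		overrideMap, overrideIsMap := v.(map[string]any)
		baseMap, baseIsMap := out[k].(map[string]any)
		if overrideIsMap && baseIsMap {
			out[k] = mergeMaps(baseMap, overrideMap)
			continue
		}
		out[k] = v
	}
	return out
}

// mergeJSON decodes both documents as JSON objects, merges them and
// encodes the result again.
func mergeJSON(base, override []byte) ([]byte, error) {
	var b, o map[string]any
	if err := json.Unmarshal(base, &b); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(override, &o); err != nil {
		return nil, err
	}
	return json.Marshal(mergeMaps(b, o))
}

func main() {
	// Simplified operator defaults and a partial user override.
	defaults := []byte(`{"spec":{"replicas":1,"strategy":{"type":"RollingUpdate"}}}`)
	override := []byte(`{"spec":{"strategy":{"type":"Recreate"}}}`)

	merged, err := mergeJSON(defaults, override)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(merged))
}
```

With these inputs the program prints `{"spec":{"replicas":1,"strategy":{"type":"Recreate"}}}`: the overridden field wins and everything the user left out keeps its default. Because the override is serialized with optional pointer fields and omitempty, empty fields never reach the merge in the first place.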
The Result
Let’s see this in action. The default Grafana deployment created by the Operator looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  name: grafana-a-deployment
  namespace: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-a
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    ...
We want to override the strategy to Recreate, so we provide the following deployment in the Grafana CR spec:
spec:
  deployment:
    spec:
      strategy:
        type: Recreate
We don’t need to provide any of the fields that are mandatory for deployments, only the one we are interested in overriding: spec.strategy.type.
This produces an updated deployment with a strategy set to Recreate.
Conclusion
Exposing raw Kubernetes resources is a powerful way for users to configure Operands. Existing knowledge and documentation can be applied, and everything that can be configured is included out of the box. There are a number of disadvantages though:
- The size of the CRD increases considerably. If you need to configure a large number of resources, this might not work for you.
- This comes with a risk of misconfiguration. Users can override settings that your Operator or Operand depends on.
Overall, we find that the advantages of such a flexible configuration system outweigh disadvantages such as the larger CRD size.