1 - Managing clusters

How to join a cluster to a fleet or remove one from it, and how to view the status of and label a member cluster

This how-to guide discusses how to manage clusters in a fleet, specifically:

  • how to join a cluster into a fleet; and
  • how to set a cluster to leave a fleet; and
  • how to view the status of a member cluster; and
  • how to add labels to a member cluster.

Joining a cluster into a fleet

A cluster can join a fleet if:

  • it runs a supported Kubernetes version; it is recommended that you use Kubernetes 1.24 or a later version; and
  • it has network connectivity to the hub cluster of the fleet.

For your convenience, Fleet provides a script that can automate the process of joining a cluster into a fleet. To use the script, run the commands below:

Note

To run this script, make sure that you have already installed the following tools in your system:

  • kubectl, the Kubernetes CLI
  • helm, a Kubernetes package manager
  • curl
  • jq
  • base64
# Replace the value of HUB_CLUSTER_CONTEXT with the name of the kubeconfig context you use for
# accessing your hub cluster.
export HUB_CLUSTER_CONTEXT=YOUR-HUB-CLUSTER-CONTEXT
# Replace the value of HUB_CLUSTER_ADDRESS with the address of your hub cluster API server.
export HUB_CLUSTER_ADDRESS=YOUR-HUB-CLUSTER-ADDRESS
# Replace the value of MEMBER_CLUSTER with the name you would like to assign to the new member
# cluster.
#
# Note that Fleet will recognize your cluster with this name once it joins.
export MEMBER_CLUSTER=YOUR-MEMBER-CLUSTER
# Replace the value of MEMBER_CLUSTER_CONTEXT with the name of the kubeconfig context you use
# for accessing your member cluster.
export MEMBER_CLUSTER_CONTEXT=YOUR-MEMBER-CLUSTER-CONTEXT

# Clone the Fleet GitHub repository.
git clone https://github.com/Azure/fleet.git

# Run the script.
chmod +x fleet/hack/membership/join.sh
./fleet/hack/membership/join.sh

It may take a few minutes for the script to finish running. Once it is completed, verify that the cluster has joined successfully with the command below:

kubectl config use-context $HUB_CLUSTER_CONTEXT
kubectl get membercluster $MEMBER_CLUSTER

If you see that the cluster is still in an unknown state, it might be that the member cluster is still connecting to the hub cluster. Should this state persist for a prolonged period, refer to the Troubleshooting Guide for more information.

Alternatively, if you would like to find out the exact steps the script performs, or if you feel like fine-tuning some of the steps, you may join a cluster manually to your fleet with the instructions below:

Joining a member cluster manually
  1. Make sure that you have installed kubectl, helm, curl, jq, and base64 in your system.

  2. Create a Kubernetes service account in your hub cluster:

    # Replace the value of HUB_CLUSTER_CONTEXT with the name of the kubeconfig
    # context you use for accessing your hub cluster.
    export HUB_CLUSTER_CONTEXT="YOUR-HUB-CLUSTER-CONTEXT"
    # Replace the value of MEMBER_CLUSTER with a name you would like to assign to the new
    # member cluster.
    #
    # Note that the value of MEMBER_CLUSTER will be used as the name the member cluster registers
    # with the hub cluster.
    export MEMBER_CLUSTER="YOUR-MEMBER-CLUSTER"
    
    export SERVICE_ACCOUNT="$MEMBER_CLUSTER-hub-cluster-access"
    
    kubectl config use-context $HUB_CLUSTER_CONTEXT
    # The service account can, in theory, be created in any namespace; for simplicity reasons,
    # here you will use the namespace reserved by Fleet installation, `fleet-system`.
    #
    # Note that if you choose a different value, commands in some steps below need to be
    # modified accordingly.
    kubectl create serviceaccount $SERVICE_ACCOUNT -n fleet-system
    
  3. Create a Kubernetes secret of the service account token type, which the member cluster will use to access the hub cluster.

    export SERVICE_ACCOUNT_SECRET="$MEMBER_CLUSTER-hub-cluster-access-token"
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Secret
    metadata:
        name: $SERVICE_ACCOUNT_SECRET
        namespace: fleet-system
        annotations:
            kubernetes.io/service-account.name: $SERVICE_ACCOUNT
    type: kubernetes.io/service-account-token
    EOF
    

    After the secret is created successfully, extract the token from the secret:

    export TOKEN=$(kubectl get secret $SERVICE_ACCOUNT_SECRET -n fleet-system -o jsonpath='{.data.token}' | base64 -d)
    

    Note

    Keep the token in a secure place; anyone with access to this token can access the hub cluster in the same way as the Fleet member cluster does.

    You may have noticed that at this moment, no access control has been set on the service account; Fleet will set things up when the member cluster joins. The service account will be given the minimally viable set of permissions for the Fleet member cluster to connect to the hub cluster; its access will be restricted to one namespace, specifically reserved for the member cluster, as per security best practices.

  4. Register the member cluster with the hub cluster; Fleet manages cluster membership using the MemberCluster API:

    cat <<EOF | kubectl apply -f -
    apiVersion: cluster.kubernetes-fleet.io/v1beta1
    kind: MemberCluster
    metadata:
        name: $MEMBER_CLUSTER
    spec:
        identity:
            name: $SERVICE_ACCOUNT
            kind: ServiceAccount
            namespace: fleet-system
            apiGroup: ""
        heartbeatPeriodSeconds: 60
    EOF
    
  5. Set up the member agent, the Fleet component that works on the member cluster end, to enable Fleet connection:

    # Clone the Fleet repository from GitHub.
    git clone https://github.com/Azure/fleet.git
    
    # Install the member agent helm chart on the member cluster.
    
    # Replace the value of MEMBER_CLUSTER_CONTEXT with the name of the kubeconfig context you use
    # for member cluster access.
    export MEMBER_CLUSTER_CONTEXT="YOUR-MEMBER-CLUSTER-CONTEXT"
    
    # Replace the value of HUB_CLUSTER_ADDRESS with the address of the hub cluster API server.
    export HUB_CLUSTER_ADDRESS="YOUR-HUB-CLUSTER-ADDRESS"
    
    # The variables below use the Fleet images kept in the Microsoft Container Registry (MCR),
    # and will retrieve the latest version from the Fleet GitHub repository.
    #
    # You can, however, build the Fleet images of your own; see the repository README for
    # more information.
    export REGISTRY="mcr.microsoft.com/aks/fleet"
    export FLEET_VERSION=$(curl "https://api.github.com/repos/Azure/fleet/tags" | jq -r '.[0].name')
    export MEMBER_AGENT_IMAGE="member-agent"
    export REFRESH_TOKEN_IMAGE="refresh-token"
    
    kubectl config use-context $MEMBER_CLUSTER_CONTEXT
    # Create the secret with the token extracted previously for member agent to use.
    kubectl create secret generic hub-kubeconfig-secret --from-literal=token=$TOKEN
    helm install member-agent fleet/charts/member-agent/ \
        --set config.hubURL=$HUB_CLUSTER_ADDRESS \
        --set image.repository=$REGISTRY/$MEMBER_AGENT_IMAGE \
        --set image.tag=$FLEET_VERSION \
        --set refreshtoken.repository=$REGISTRY/$REFRESH_TOKEN_IMAGE \
        --set refreshtoken.tag=$FLEET_VERSION \
        --set image.pullPolicy=Always \
        --set refreshtoken.pullPolicy=Always \
        --set config.memberClusterName="$MEMBER_CLUSTER" \
        --set logVerbosity=5 \
        --set namespace=fleet-system \
        --set enableV1Alpha1APIs=false \
        --set enableV1Beta1APIs=true
    
  6. Verify that the installation of the member agent is successful:

    kubectl get pods -n fleet-system
    

    You should see that all the returned pods are up and running. Note that it may take a few minutes for the member agent to get ready.

  7. Verify that the member cluster has joined the fleet successfully:

    kubectl config use-context $HUB_CLUSTER_CONTEXT
    kubectl get membercluster $MEMBER_CLUSTER
    

Setting a cluster to leave a fleet

Fleet uses the MemberCluster API to manage cluster memberships. To remove a member cluster from a fleet, simply delete its corresponding MemberCluster object from your hub cluster:

# Replace the value of MEMBER_CLUSTER with the name of the member cluster you would like to
# remove from a fleet.
export MEMBER_CLUSTER=YOUR-MEMBER-CLUSTER
kubectl delete membercluster $MEMBER_CLUSTER

It may take a while before the member cluster leaves the fleet successfully. Fleet will perform some cleanup; all the resources placed onto the cluster will be removed.

After the member cluster leaves, you can remove the member agent installation from it using Helm:

# Replace the value of MEMBER_CLUSTER_CONTEXT with the name of the kubeconfig context you use
# for member cluster access.
export MEMBER_CLUSTER_CONTEXT=YOUR-MEMBER-CLUSTER-CONTEXT
kubectl config use-context $MEMBER_CLUSTER_CONTEXT
helm uninstall member-agent

It may take a few moments before the uninstallation completes.

Viewing the status of a member cluster

Similarly, you can use the MemberCluster API in the hub cluster to view the status of a member cluster:

# Replace the value of MEMBER_CLUSTER with the name of the member cluster whose status you
# would like to view.
export MEMBER_CLUSTER=YOUR-MEMBER-CLUSTER
kubectl get membercluster $MEMBER_CLUSTER -o jsonpath="{.status}"

The status consists of:

  • an array of conditions, including:

    • the ReadyToJoin condition, which signals whether the hub cluster is ready to accept the member cluster; and
    • the Joined condition, which signals whether the cluster has joined the fleet; and
    • the Healthy condition, which signals whether the cluster is in a healthy state.

    Typically, a member cluster should have all three conditions set to true. Refer to the Troubleshooting Guide for help if a cluster fails to join into a fleet.

  • the resource usage of the cluster; at this moment Fleet reports the capacity and the allocatable amount of each resource in the cluster, summed up from all nodes in the cluster.

  • an array of agent status, which reports the status of specific Fleet agents installed in the cluster; each entry features:

    • an array of conditions, in which Joined signals whether the specific agent has been successfully installed in the cluster, and Healthy signals whether the agent is in a healthy state; and
    • the timestamp of the last received heartbeat from the agent.
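
As a rough, hypothetical illustration only (the exact field names and values depend on your Fleet version and setup), the status you retrieve might resemble the abbreviated sketch below:

# Hypothetical, abbreviated example; field names follow the description above
# and may differ in your Fleet version.
conditions:
- type: ReadyToJoin
  status: "True"
- type: Joined
  status: "True"
- type: Healthy
  status: "True"
resourceUsage:
  capacity:
    cpu: "8"
    memory: 32Gi
  allocatable:
    cpu: "7"
    memory: 30Gi
agentStatus:
- type: MemberAgent
  conditions:
  - type: Joined
    status: "True"
  - type: Healthy
    status: "True"
  lastReceivedHeartbeat: "2024-01-01T00:00:00Z"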

Adding labels to a member cluster

You can add labels to a MemberCluster object in the same way as with any other Kubernetes object. These labels can then be used for targeting specific clusters in resource placement. To add a label, run the command below:

# Replace the values of MEMBER_CLUSTER, LABEL_KEY, and LABEL_VALUE with those of your own.
export MEMBER_CLUSTER=YOUR-MEMBER-CLUSTER
export LABEL_KEY=YOUR-LABEL-KEY
export LABEL_VALUE=YOUR-LABEL-VALUE
kubectl label membercluster $MEMBER_CLUSTER $LABEL_KEY=$LABEL_VALUE
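
Alternatively, if you manage MemberCluster objects declaratively, you may set labels directly in the object manifest; below is a minimal sketch, assuming a member cluster named bravelion and an illustrative region label:

apiVersion: cluster.kubernetes-fleet.io/v1beta1
kind: MemberCluster
metadata:
  name: bravelion
  labels:
    # Illustrative label; use your own key and value.
    region: east
spec:
  identity:
    name: bravelion-hub-cluster-access
    kind: ServiceAccount
    namespace: fleet-system
    apiGroup: ""
  heartbeatPeriodSeconds: 60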

2 - Using the ClusterResourcePlacement API

How to use the ClusterResourcePlacement API

This guide provides an overview of how to use the Fleet ClusterResourcePlacement (CRP) API to orchestrate workload distribution across your fleet.

Overview

The CRP API is a core Fleet API that facilitates the distribution of specific resources from the hub cluster to member clusters within a fleet. This API offers scheduling capabilities that allow you to target the most suitable group of clusters for a set of resources using a complex rule set. For example, you can distribute resources to clusters in specific regions (North America, East Asia, Europe, etc.) and/or release stages (production, canary, etc.). You can even distribute resources according to certain topology spread constraints.

API Components

The CRP API generally consists of the following components:

  • Resource Selectors: These specify the set of resources selected for placement.
  • Scheduling Policy: This determines the set of clusters where the resources will be placed.
  • Rollout Strategy: This controls the behavior of resource placement when the resources themselves and/or the scheduling policy are updated, minimizing interruptions caused by refreshes.

The following sections discuss these components in depth.

Resource selectors

A ClusterResourcePlacement object may feature one or more resource selectors, specifying which resources to select for placement. To add a resource selector, edit the resourceSelectors field in the ClusterResourcePlacement spec:

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - group: "rbac.authorization.k8s.io"
      kind: ClusterRole
      version: v1          
      name: secretReader

The example above will pick a ClusterRole named secretReader for resource placement.

It is important to note that, as its name implies, ClusterResourcePlacement selects only cluster-scoped resources. However, if you select a namespace, all the resources under the namespace will also be placed.

Different types of resource selectors

You can specify a resource selector in many different ways:

  • To select one specific resource, such as a namespace, specify its API GVK (group, version, and kind), and its name, in the resource selector:

    # As mentioned earlier, all the resources under the namespace will also be selected.
    resourceSelectors:
      - group: ""
        kind: Namespace
        version: v1          
        name: work
    
  • Alternatively, you may select a set of resources of the same API GVK using a label selector; this also requires that you specify the API GVK and the filtering label(s):

    # As mentioned earlier, all the resources under the namespaces will also be selected.
    resourceSelectors:
      - group: ""
        kind: Namespace
        version: v1          
        labelSelector:
          matchLabels:
            system: critical
    

    In the example above, all the namespaces in the hub cluster with the label system=critical will be selected (along with the resources under them).

    Fleet uses standard Kubernetes label selectors; for its specification and usage, see the Kubernetes API reference.

  • Very occasionally, you may need to select all the resources under a specific GVK; to achieve this, use a resource selector with only the API GVK added:

    resourceSelectors:
      - group: "rbac.authorization.k8s.io"
        kind: ClusterRole
        version: v1          
    

    In the example above, all the cluster roles in the hub cluster will be picked.

Multiple resource selectors

You may specify up to 100 different resource selectors; Fleet will pick a resource if it matches any of the resource selectors specified (i.e., all selectors are OR’d).

# As mentioned earlier, all the resources under the namespace will also be selected.
resourceSelectors:
  - group: ""
    kind: Namespace
    version: v1          
    name: work
  - group: "rbac.authorization.k8s.io"
    kind: ClusterRole
    version: v1
    name: secretReader      

In the example above, Fleet will pick the namespace work (along with all the resources under it) and the cluster role secretReader.

Note

You can find the GVKs of built-in Kubernetes API objects in the Kubernetes API reference.

Scheduling policy

Each scheduling policy is associated with a placement type, which determines how Fleet will pick clusters. The ClusterResourcePlacement API supports the following placement types:

Placement type    Description
PickFixed         Pick a specific set of clusters by their names.
PickAll           Pick all the clusters in the fleet, per some standard.
PickN             Pick a count of N clusters in the fleet, per some standard.

Note

Scheduling policy itself is optional. If you do not specify a scheduling policy, Fleet will assume that you would like to use a scheduling policy of the PickAll placement type; this effectively sets Fleet to pick all the clusters in the fleet.

Fleet does not support switching between different placement types; if you need to do so, create a new ClusterResourcePlacement object instead.

PickFixed placement type

PickFixed is the most straightforward placement type, through which you directly tell Fleet which clusters to place resources on. To use this placement type, specify the target cluster names in the clusterNames field, as shown below:

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickFixed
    clusterNames: 
      - bravelion
      - smartfish 

The example above will place resources on two clusters, bravelion and smartfish.

PickAll placement type

The PickAll placement type allows you to pick all clusters in the fleet per some standard. With this placement type, you may use affinity terms to fine-tune which clusters you would like Fleet to pick:

  • An affinity term specifies a requirement that a cluster needs to meet, usually the presence of a label.

    There are two types of affinity terms:

    • requiredDuringSchedulingIgnoredDuringExecution terms are requirements that a cluster must meet before it can be picked; and
    • preferredDuringSchedulingIgnoredDuringExecution terms are preferences; if a cluster meets them, Fleet will prioritize it in scheduling.

    In the scheduling policy of the PickAll placement type, you may only use the requiredDuringSchedulingIgnoredDuringExecution terms.

Note

You can learn more about affinities in Using Affinities to Pick Clusters How-To Guide.

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickAll
    affinity:
        clusterAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                clusterSelectorTerms:
                    - labelSelector:
                        matchLabels:
                            system: critical

The ClusterResourcePlacement object above will pick all the clusters with the label system=critical on them; clusters without the label will be ignored.

Fleet is forward-looking with the PickAll placement type: any cluster that satisfies the affinity terms of a ClusterResourcePlacement object, even if it joins after the ClusterResourcePlacement object is created, will be picked.

Note

You may specify a scheduling policy of the PickAll placement type with no affinity; this will set Fleet to select all clusters currently present in the fleet.
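
For reference, below is a minimal sketch of such a policy, with the resource selectors omitted for brevity:

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    # No affinity terms specified; Fleet picks every cluster in the fleet.
    placementType: PickAll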

PickN placement type

The PickN placement type allows you to pick a specific number of clusters in the fleet for resource placement; with this placement type, you may use affinity terms and topology spread constraints to fine-tune which clusters you would like Fleet to pick.

  • An affinity term specifies a requirement that a cluster needs to meet, usually the presence of a label.

    There are two types of affinity terms:

    • requiredDuringSchedulingIgnoredDuringExecution terms are requirements that a cluster must meet before it can be picked; and
    • preferredDuringSchedulingIgnoredDuringExecution terms are preferences; if a cluster meets them, Fleet will prioritize it in scheduling.
  • A topology spread constraint can help you spread resources evenly across different groups of clusters. For example, you may want to have a database replica deployed in each region to enable high availability.

Note

You can learn more about affinities in Using Affinities to Pick Clusters How-To Guide, and more about topology spread constraints in Using Topology Spread Constraints to Pick Clusters How-To Guide.

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 3
    affinity:
        clusterAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 20
                  preference:
                    labelSelector:
                        matchLabels:
                            critical-level: "1"

The ClusterResourcePlacement object above will first pick clusters with the label critical-level=1; only if there are not enough such clusters (fewer than 3) will Fleet pick clusters without the label.

To be more precise, with this placement type, Fleet scores clusters on how well each satisfies the affinity terms and the topology spread constraints; Fleet will assign:

  • an affinity score, for how well the cluster satisfies the affinity terms; and
  • a topology spread score, for how well the cluster satisfies the topology spread constraints.

Note

For more information on the scoring specifics, see Using Affinities to Pick Clusters How-To Guide (for affinity score) and Using Topology Spread Constraints to Pick Clusters How-To Guide (for topology spread score).

After scoring, Fleet ranks the clusters using the rule below and picks the top N clusters:

  • the cluster with the highest topology spread score ranks the highest;

  • if there are multiple clusters with the same topology spread score, the one with the highest affinity score ranks the highest;

  • if there are multiple clusters with the same topology spread score and affinity score, Fleet sorts their names in alphanumeric order; the one with the most significant name ranks the highest.

    This helps establish deterministic scheduling behavior.

Both affinity terms and topology spread constraints are optional. If you do not specify affinity terms or topology spread constraints, all clusters will be assigned an affinity score or a topology spread score of 0, respectively. When neither is added in the scheduling policy, Fleet will simply rank clusters by their names and pick the N clusters with the most significant names in alphanumeric order.
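
As a minimal sketch, a PickN policy with neither affinity terms nor topology spread constraints might look like the following; Fleet would rank the clusters by name and pick 3 of them:

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    # Neither affinity nor topologySpreadConstraints is specified; Fleet ranks
    # clusters by name and picks the top 3.
    placementType: PickN
    numberOfClusters: 3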

When there are not enough clusters to pick

It may happen that Fleet cannot find enough clusters to pick. In this situation, Fleet will keep looking until all N clusters are found.

Note that Fleet will stop looking once all N clusters are found, even if there appears a cluster that scores higher.

Up-scaling and downscaling

You can edit the numberOfClusters field in the scheduling policy to pick more or fewer clusters. When up-scaling, Fleet will score all the clusters that have not been picked earlier and find the most appropriate ones; when downscaling, Fleet will unpick the clusters that rank lower first.

Note

For downscaling, the ranking Fleet uses for unpicking clusters is composed when the scheduling is performed; i.e., it may not reflect the latest setup in the fleet.
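
For example, to up-scale a placement from 3 to 5 clusters (illustrative numbers), you would only change the numberOfClusters value in the policy, as sketched below:

  policy:
    placementType: PickN
    # Previously 3; Fleet will score the clusters that have not yet been picked
    # and add the 2 most appropriate ones.
    numberOfClusters: 5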

A few more points about scheduling policies

Responding to changes in the fleet

Generally speaking, once a cluster is picked by Fleet for a ClusterResourcePlacement object, it will not be unpicked even if you modify the cluster in a way that renders it unfit for the scheduling policy, e.g., you have removed from the cluster a label that is required by some affinity term. Fleet will also not remove resources from the cluster even if the cluster becomes unhealthy, e.g., it gets disconnected from the hub cluster. This helps reduce service interruption.

However, Fleet will unpick a cluster if it leaves the fleet. If you are using a scheduling policy of the PickN placement type, Fleet will attempt to find a new cluster as replacement.

Finding the scheduling decisions Fleet makes

You can find out why Fleet picks a cluster in the status of a ClusterResourcePlacement object. For more information, see the Understanding the Status of a ClusterResourcePlacement How-To Guide.

Available fields for each placement type

The table below summarizes the available scheduling policy fields for each placement type:

Field                        PickFixed    PickAll    PickN
placementType                ✅           ✅         ✅
numberOfClusters             ❌           ❌         ✅
clusterNames                 ✅           ❌         ❌
affinity                     ❌           ✅         ✅
topologySpreadConstraints    ❌           ❌         ✅

Rollout strategy

After a ClusterResourcePlacement is created, you may want to

  • Add, update, or remove the resources that have been selected by the ClusterResourcePlacement in the hub cluster
  • Update the resource selectors in the ClusterResourcePlacement
  • Update the scheduling policy in the ClusterResourcePlacement

These changes may trigger the following outcomes:

  • New resources may need to be placed on all picked clusters
  • Resources already placed on a picked cluster may get updated or deleted
  • Some clusters picked previously are now unpicked, and resources must be removed from such clusters
  • Some clusters are newly picked, and resources must be added to them

Most outcomes can lead to service interruptions. Apps running on member clusters may temporarily become unavailable as Fleet dispatches updated resources. Clusters that are no longer selected will lose all placed resources, resulting in lost traffic. If too many new clusters are selected and Fleet places resources on them simultaneously, your backend may become overloaded. The exact interruption pattern may vary depending on the resources you place using Fleet.

To minimize interruption, Fleet allows users to configure the rollout strategy, similar to native Kubernetes deployment, to transition between changes as smoothly as possible. Currently, Fleet supports only one rollout strategy: rolling update. This strategy ensures changes, including the addition or removal of selected clusters and resource refreshes, are applied incrementally in a phased manner at a pace suitable for you. This is the default option and applies to all changes you initiate.

This rollout strategy can be configured with the following parameters:

  • maxUnavailable determines how many clusters may become unavailable during a change for the selected set of resources. It can be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.

    • Setting this parameter to a lower value will result in less interruption during a change but will lead to slower rollouts.

    • Fleet considers a cluster as unavailable if resources have not been successfully applied to the cluster.

    • How Fleet interprets this value: Fleet, in actuality, makes sure that at any time, there are at least N - maxUnavailable clusters available, where N is:
      • for scheduling policies of the PickN placement type, the numberOfClusters value given;
      • for scheduling policies of the PickFixed placement type, the number of cluster names given;
      • for scheduling policies of the PickAll placement type, the number of clusters Fleet picks.

      If you use a percentage for the maxUnavailable parameter, it is calculated against N as well.

  • maxSurge determines the number of additional clusters, beyond the required number, that will receive resource placements. It can also be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.

    • Setting this parameter to a lower value will result in fewer resource placements on additional clusters by Fleet, which may slow down the rollout process.

    • How Fleet interprets this value: Fleet, in actuality, makes sure that at any time, there are at most N + maxSurge clusters available, where N is:
      • for scheduling policies of the PickN placement type, the numberOfClusters value given;
      • for scheduling policies of the PickFixed placement type, the number of cluster names given;
      • for scheduling policies of the PickAll placement type, the number of clusters Fleet picks.

      If you use a percentage for the maxSurge parameter, it is calculated against N as well.

  • unavailablePeriodSeconds allows users to inform the fleet when the resources are deemed “ready”. The default value is 60 seconds.

    • Fleet only considers newly applied resources on a cluster as “ready” once unavailablePeriodSeconds seconds have passed after the resources have been successfully applied to that cluster.
    • Setting a lower value for this parameter will result in faster rollouts. However, we strongly recommend that users set it to a value such that all the initialization/preparation tasks can be completed within that time frame. This ensures that the resources are typically ready after the unavailablePeriodSeconds have passed.
    • We are currently designing a generic “ready gate” for resources being applied to clusters. Please feel free to raise issues or provide feedback if you have any thoughts on this.

Note

Fleet will round numbers up if you use a percentage for maxUnavailable and/or maxSurge.

For example, if you have a ClusterResourcePlacement with a scheduling policy of the PickN placement type and a target number of clusters of 10, with the default rollout strategy, as shown in the example below,

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    ...
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
      unavailablePeriodSeconds: 60

Every time you initiate a change on selected resources, Fleet will:

  • Find 10 * 25% = 2.5, rounded up to 3 clusters, which will receive the resource refresh;
  • Wait for 60 seconds (unavailablePeriodSeconds), and repeat the process;
  • Stop when all the clusters have received the latest version of resources.

The exact period of time it takes for Fleet to complete a rollout depends not only on the unavailablePeriodSeconds, but also the actual condition of a resource placement; that is, if it takes longer for a cluster to get the resources applied successfully, Fleet will wait longer to complete the rollout, in accordance with the rolling update strategy you specified.
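
For instance, if your workloads need more time to initialize, you might use a more conservative strategy than the default; the values below are illustrative only:

  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Refresh fewer clusters at a time than the default 25%.
      maxUnavailable: 10%
      maxSurge: 10%
      # Give workloads more time to initialize before Fleet moves on.
      unavailablePeriodSeconds: 300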

Note

In very extreme circumstances, a rollout may get stuck if Fleet simply cannot apply resources to some clusters. You can identify this behavior in the CRP status; for more information, see the Understanding the Status of a ClusterResourcePlacement How-To Guide.

Snapshots and revisions

Internally, Fleet keeps a history of all the scheduling policies you have used with a ClusterResourcePlacement, and all the resource versions (snapshots) the ClusterResourcePlacement has selected. These are kept as ClusterSchedulingPolicySnapshot and ClusterResourceSnapshot objects respectively.

You can list and view such objects for reference, but you should not modify their contents (in a typical setup, such requests will be rejected automatically). To control the length of the history (i.e., how many snapshot objects Fleet will keep for a ClusterResourcePlacement), configure the revisionHistoryLimit field:

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    ...
  strategy:
    ...
  revisionHistoryLimit: 10

The default value is 10.

Note

In this early stage, the history is kept for reference purposes only; in the future, Fleet may add features to allow rolling back to a specific scheduling policy and/or resource version.

3 - Using Affinity to Pick Clusters

How to use affinity settings in the ClusterResourcePlacement API to fine-tune Fleet scheduling decisions

This how-to guide discusses how to use affinity settings to fine-tune how Fleet picks clusters for resource placement.

Affinity terms are featured in the ClusterResourcePlacement API, specifically the scheduling policy section. Each affinity term is a particular requirement that Fleet will check against clusters; the fulfillment of this requirement (or the lack thereof) affects whether Fleet picks a cluster for resource placement.

Fleet currently supports two types of affinity terms:

  • requiredDuringSchedulingIgnoredDuringExecution affinity terms; and
  • preferredDuringSchedulingIgnoredDuringExecution affinity terms.

Most affinity terms deal with cluster labels. To manage member clusters, specifically adding/removing labels from a member cluster, see Managing Member Clusters How-To Guide.

requiredDuringSchedulingIgnoredDuringExecution affinity terms

The requiredDuringSchedulingIgnoredDuringExecution type of affinity terms serves as a hard constraint that a cluster must satisfy before it can be picked. Each term may feature:

  • a label selector, which specifies a set of labels that a cluster must have or not have before it can be picked;
  • a property selector, which specifies a cluster property requirement that a cluster must satisfy before it can be picked;
  • a combination of both.

For the specifics about property selectors, see the How-To Guide: Using Property-Based Scheduling.

matchLabels

The most straightforward way is to specify matchLabels in the label selector, as showcased below:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickAll
    affinity:
        clusterAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                clusterSelectorTerms:
                - labelSelector:
                    matchLabels:
                        system: critical

The example above includes a requiredDuringSchedulingIgnoredDuringExecution term which requires that the label system=critical must be present on a cluster before Fleet can pick it for the ClusterResourcePlacement.

You can add multiple labels to matchLabels; any cluster that satisfies this affinity term must have all the specified labels present.
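
For instance, the term below requires that a cluster have both the region=east and environment=prod labels (illustrative values) before it can be picked:

affinity:
    clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
            clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                    # Both labels must be present (illustrative values).
                    region: east
                    environment: prod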

matchExpressions

For more complex logic, consider using matchExpressions, which allow you to use operators to set rules for validating labels on a member cluster. Each matchExpressions requirement includes:

  • a key, which is the key of the label; and

  • a list of values, which are the possible values for the label key; and

  • an operator, which represents the relationship between the key and the list of values.

    Supported operators include:

    • In: the cluster must have a label key with one of the listed values.
    • NotIn: the cluster must have a label key that is not associated with any of the listed values.
    • Exists: the cluster must have the label key present; any value is acceptable.
    • DoesNotExist: the cluster must not have the label key.

    If you plan to use Exists and/or DoesNotExist, you must leave the list of values empty.

Below is an example of matchExpressions affinity term using the In operator:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickAll
    affinity:
        clusterAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                clusterSelectorTerms:
                - labelSelector:
                    matchExpressions:
                    - key: system
                      operator: In
                      values:
                      - critical
                      - standard

Any cluster with the label system=critical or system=standard will be picked by Fleet.

Similarly, you can also specify multiple matchExpressions requirements; any cluster that satisfies this affinity term must meet all the requirements.
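
For instance, with the sketch below (illustrative keys), a cluster must have the label system set to either critical or standard and must also have the region label present:

affinity:
    clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
            clusterSelectorTerms:
            - labelSelector:
                matchExpressions:
                # Both requirements must be met by the same cluster.
                - key: system
                  operator: In
                  values:
                  - critical
                  - standard
                - key: region
                  operator: Exists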

Using both matchLabels and matchExpressions in one affinity term

You can specify both matchLabels and matchExpressions in one requiredDuringSchedulingIgnoredDuringExecution affinity term, as showcased below:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickAll
    affinity:
        clusterAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                clusterSelectorTerms:
                - labelSelector:
                    matchLabels:
                      region: east
                    matchExpressions:
                    - key: system
                      operator: Exists

With this affinity term, any cluster picked must:

  • have the label region=east present;
  • have the label system present, any value would do.

Using multiple affinity terms

You can also specify multiple requiredDuringSchedulingIgnoredDuringExecution affinity terms, as showcased below; a cluster will be picked if it can satisfy any affinity term.

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickAll
    affinity:
        clusterAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                clusterSelectorTerms:
                - labelSelector:
                    matchLabels:
                      region: west
                - labelSelector:
                    matchExpressions:
                    - key: system
                      operator: DoesNotExist

With these two affinity terms, any cluster picked must:

  • have the label region=west present; or
  • not have the label system present.

preferredDuringSchedulingIgnoredDuringExecution affinity terms

The preferredDuringSchedulingIgnoredDuringExecution type of affinity terms serves as a soft constraint for clusters; any cluster that satisfies such terms receives an affinity score, which Fleet uses to rank clusters when processing a ClusterResourcePlacement with a scheduling policy of the PickN placement type.

Each term features:

  • a weight, between -100 and 100, which is the affinity score that Fleet would assign to a cluster if it satisfies this term; and
  • a label selector, or a property sorter.

Both are required for this type of affinity terms to function.

The label selector is of the same structure as the one used in requiredDuringSchedulingIgnoredDuringExecution affinity terms; see the documentation above for usage.

For the specifics about property sorters, see the How-To Guide: Using Property-Based Scheduling.

Below is an example with a preferredDuringSchedulingIgnoredDuringExecution affinity term:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 10
    affinity:
        clusterAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 20
              preference:
                labelSelector:
                  matchLabels:
                    region: west

Any cluster with the region=west label would receive an affinity score of 20.

Using multiple affinity terms

Similarly, you can use multiple preferredDuringSchedulingIgnoredDuringExecution affinity terms, as showcased below:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 10
    affinity:
        clusterAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 20
              preference:
                labelSelector:
                  matchLabels:
                    region: west
            - weight: -20
              preference:
                labelSelector:
                  matchLabels:
                    environment: prod

Clusters will be evaluated against each affinity term individually; the affinity scores a cluster receives will be summed up. For example:

  • if a cluster has only the region=west label, it would receive an affinity score of 20; however
  • if a cluster has both the region=west and environment=prod labels, it would receive an affinity score of 20 + (-20) = 0.

Using both types of affinity terms

You can, if necessary, add both requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution types of affinity terms. Fleet will first run all clusters against all the requiredDuringSchedulingIgnoredDuringExecution affinity terms, filter out any that do not meet the requirements, and then assign the rest affinity scores per the preferredDuringSchedulingIgnoredDuringExecution affinity terms.

Below is an example with both types of affinity terms:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 10
    affinity:
        clusterAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              clusterSelectorTerms:
                - labelSelector:
                    matchExpressions:
                    - key: system
                      operator: Exists
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 20
              preference:
                labelSelector:
                  matchLabels:
                    region: west

With these affinity terms, only clusters with the label system (any value would do) can be picked; and among them, those with the region=west will be prioritized for resource placement as they receive an affinity score of 20.

4 - Using Topology Spread Constraints to Spread Resources

How to use topology spread constraints in the ClusterResourcePlacement API to fine-tune Fleet scheduling decisions

This how-to guide discusses how to use topology spread constraints to fine-tune how Fleet picks clusters for resource placement.

Topology spread constraints are a feature of the ClusterResourcePlacement API, specifically the scheduling policy section. Generally speaking, these constraints can help you spread resources evenly across different groups of clusters in your fleet; in other words, they ensure that Fleet will not pick too many clusters from one group and too few from another. You can use topology spread constraints to, for example:

  • achieve high-availability for your database backend by making sure that there is at least one database replica in each region; or
  • verify if your application can support clusters of different configurations; or
  • eliminate resource utilization hotspots in your infrastructure through spreading jobs evenly across sections.

Specifying a topology spread constraint

A topology spread constraint consists of three fields:

  • topologyKey is a label key which Fleet uses to split your clusters from a fleet into different groups.

    Specifically, clusters are grouped by the label values they have. For example, if you have three clusters in a fleet:

    • cluster bravelion with the label system=critical and region=east; and
    • cluster smartfish with the label system=critical and region=west; and
    • cluster jumpingcat with the label system=normal and region=east,

    and you use system as the topology key, the clusters will be split into 2 groups:

    • group 1 with cluster bravelion and smartfish, as they both have the value critical for label system; and
    • group 2 with cluster jumpingcat, as it has the value normal for label system.

    Note that the splitting concerns only the label system; other labels, such as region, do not count.

    If a cluster does not have the given topology key, it does not belong to any group. Fleet may still pick this cluster, as placing resources on it does not violate the associated topology spread constraint.

    This is a required field.

  • maxSkew specifies how unevenly resource placements are spread in your fleet.

    The skew of a set of resource placements is defined as the difference in the count of resource placements between the group with the most and the group with the least, as split by the topology key.

    For example, in the fleet described above (3 clusters, 2 groups):

    • if Fleet picks two clusters from group 1, but none from group 2, the skew would be 2 - 0 = 2; however,
    • if Fleet picks one cluster from group 1 and one from group 2, the skew would be 1 - 1 = 0.

    The minimum value of maxSkew is 1. The lower you set this value, the more evenly resource placements are spread in your fleet.

    This is a required field.

    Note

    Naturally, maxSkew only makes sense when there are at least two groups. If you set a topology key that will not split the fleet at all (i.e., all clusters with the given topology key have exactly the same value), the associated topology spread constraint will have no effect.

  • whenUnsatisfiable specifies what Fleet would do when it exhausts all options to satisfy the topology spread constraint; that is, picking any cluster in the fleet would lead to a violation.

    Two options are available:

    • DoNotSchedule: with this option, Fleet guarantees that the topology spread constraint will be enforced at all times; scheduling may fail if there is simply no possible way to satisfy the topology spread constraint.

    • ScheduleAnyway: with this option, Fleet would enforce the topology spread constraint in a best-effort manner; Fleet may, however, pick clusters that would violate the topology spread constraint if there is no better option.

    This is an optional field; if you do not specify a value, Fleet will use DoNotSchedule by default.

Below is an example of topology spread constraint, which tells Fleet to pick clusters evenly from different groups, split based on the value of the label system:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 3
    topologySpreadConstraints:
      - maxSkew: 2
        topologyKey: system
        whenUnsatisfiable: DoNotSchedule

How Fleet enforces topology spread constraints: topology spread scores

When you specify some topology spread constraints in the scheduling policy of a ClusterResourcePlacement object, Fleet will start picking clusters one at a time. More specifically, Fleet will:

  • for each cluster in the fleet, evaluate how skew would change if resources were placed on it.

    Depending on the current spread of resource placements, there are three possible outcomes:

    • placing resources on the cluster reduces the skew by 1; or
    • placing resources on the cluster has no effect on the skew; or
    • placing resources on the cluster increases the skew by 1.

    Fleet would then assign a topology spread score to the cluster:

    • if the provisional placement reduces the skew by 1, the cluster receives a topology spread score of 1; or

    • if the provisional placement has no effect on the skew, the cluster receives a topology spread score of 0; or

    • if the provisional placement increases the skew by 1, but does not yet exceed the max skew specified in the constraint, the cluster receives a topology spread score of -1; or

    • if the provisional placement increases the skew by 1, and has exceeded the max skew specified in the constraint,

      • for topology spread constraints with the ScheduleAnyway effect, the cluster receives a topology spread score of -1000; and
      • for those with the DoNotSchedule effect, the cluster will be removed from resource placement consideration.
  • rank the clusters based on the topology spread score and other factors (e.g., affinity), and pick the one that is most appropriate.

  • repeat the process until the needed number of clusters is found.

Below is an example that illustrates the process:

Suppose you have a fleet of 4 clusters:

  • cluster bravelion, with label region=east and system=critical; and
  • cluster smartfish, with label region=east; and
  • cluster jumpingcat, with label region=west, and system=critical; and
  • cluster flyingpenguin, with label region=west,

And you have created a ClusterResourcePlacement as follows:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 2
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: region
        whenUnsatisfiable: DoNotSchedule

Fleet will first scan all the 4 clusters in the fleet; they all have the region label, with two different values, east and west (2 clusters in each). This divides the clusters into two groups: east and west.

At this stage, no cluster has been picked yet, so there is no resource placement at all. The current skew is thus 0, and placing resources on any of the clusters would increase the skew by 1. This does not exceed the maxSkew threshold given, so all clusters would receive a topology spread score of -1.

Fleet could not find the most appropriate cluster based on the topology spread score so far, so it would resort to other measures for ranking clusters. This would lead Fleet to pick cluster smartfish.

Note

See Using ClusterResourcePlacement to Place Resources How-To Guide for more information on how Fleet picks clusters.

Now, one cluster has been picked, and one more is needed by the ClusterResourcePlacement object (as the numberOfClusters field is set to 2). Fleet scans the remaining 3 clusters again, and this time, since smartfish from group east has been picked, placing more resources on clusters from group east would increase the skew by 1 more and lead to a violation of the topology spread constraint; Fleet will then assign a topology spread score of -1000 to cluster bravelion, which is in group east. On the contrary, picking any cluster in group west would reduce the skew by 1, so Fleet assigns a topology spread score of 1 to clusters jumpingcat and flyingpenguin.

With the higher topology spread score, jumpingcat and flyingpenguin become the leading candidates in the ranking. They have the same topology spread score, and based on the rules Fleet has for picking clusters, jumpingcat would be the one picked in the end.

Using multiple topology spread constraints

You can, if necessary, use multiple topology spread constraints. Fleet will evaluate each of them separately, and add up topology spread scores for each cluster for the final ranking. A cluster would be removed from resource placement consideration if placing resources on it would violate any one of the DoNotSchedule topology spread constraints.

Below is an example where two topology spread constraints are used:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 2
    topologySpreadConstraints:
      - maxSkew: 2
        topologyKey: region
        whenUnsatisfiable: DoNotSchedule
      - maxSkew: 3
        topologyKey: environment
        whenUnsatisfiable: ScheduleAnyway

Note

It might be very difficult to find candidate clusters when multiple topology spread constraints are added. Consider using the ScheduleAnyway effect to add some leeway to the scheduling, if applicable.

5 - Using Property-Based Scheduling

How to use property-based scheduling to produce scheduling decisions

This how-to guide discusses how to use property-based scheduling to produce scheduling decisions based on cluster properties.

Note

The availability of properties depends on whether (and which) property provider you have set up in your Fleet deployment. For more information, see the Concept: Property Provider and Cluster Properties documentation.

It is also recommended that you read the How-To Guide: Using Affinity to Pick Clusters first before following instructions in this document.

Fleet allows users to pick clusters based on exposed cluster properties via the affinity terms in the ClusterResourcePlacement API:

  • for the requiredDuringSchedulingIgnoredDuringExecution affinity terms, you may specify property selectors to filter clusters based on their properties;
  • for the preferredDuringSchedulingIgnoredDuringExecution affinity terms, you may specify property sorters to prefer clusters with a property that ranks higher or lower.

Property selectors in requiredDuringSchedulingIgnoredDuringExecution affinity terms

A property selector is an array of expression matchers against cluster properties. In each matcher you will specify:

  • A name, which is the name of the property.

    If the property is a non-resource one, you may refer to it directly here; however, if the property is a resource one, the name here should be of the following format:

    resources.kubernetes-fleet.io/[CAPACITY-TYPE]-[RESOURCE-NAME]
    

    where [CAPACITY-TYPE] is one of total, allocatable, or available, depending on which capacity (usage information) you would like to check against, and [RESOURCE-NAME] is the name of the resource.

    For example, if you would like to select clusters based on the available CPU capacity of a cluster, the name used in the property selector should be

    resources.kubernetes-fleet.io/available-cpu
    

    and for the allocatable memory capacity, use

    resources.kubernetes-fleet.io/allocatable-memory
    
  • A list of values, which are possible values of the property.

  • An operator, which describes the relationship between a cluster’s observed value of the given property and the list of values in the matcher.

    Currently, available operators are

    • Gt (Greater than): a cluster’s observed value of the given property must be greater than the value in the matcher before it can be picked for resource placement.
    • Ge (Greater than or equal to): a cluster’s observed value of the given property must be greater than or equal to the value in the matcher before it can be picked for resource placement.
    • Lt (Less than): a cluster’s observed value of the given property must be less than the value in the matcher before it can be picked for resource placement.
    • Le (Less than or equal to): a cluster’s observed value of the given property must be less than or equal to the value in the matcher before it can be picked for resource placement.
    • Eq (Equal to): a cluster’s observed value of the given property must be equal to the value in the matcher before it can be picked for resource placement.
    • Ne (Not equal to): a cluster’s observed value of the given property must be not equal to the value in the matcher before it can be picked for resource placement.

    Note that if you use the operator Gt, Ge, Lt, Le, Eq, or Ne, the list of values in the matcher should have exactly one value.

Fleet will evaluate each cluster, specifically their exposed properties, against the matchers; failure to satisfy any matcher in the selector will exclude the cluster from resource placement.

Note that if a cluster does not have the specified property for a matcher, it will automatically fail the matcher.
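
For instance, a matcher against a resource property might look like the sketch below, which requires that a cluster have at least 4 CPU cores of available capacity (the threshold and fragment are illustrative):

affinity:
    clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
            clusterSelectorTerms:
            - propertySelector:
                matchExpressions:
                # Illustrative threshold: at least 4 CPU cores available.
                - name: "resources.kubernetes-fleet.io/available-cpu"
                  operator: Ge
                  values:
                  - "4"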

Below is an example that uses a property selector to select only clusters with a node count of at least 5 for resource placement:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickAll
    affinity:
        clusterAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                clusterSelectorTerms:
                - propertySelector:
                    matchExpressions:
                    - name: "kubernetes-fleet.io/node-count"
                      operator: Ge
                      values:
                      - "5"

You may use both label selector and property selector in a requiredDuringSchedulingIgnoredDuringExecution affinity term. Both selectors must be satisfied before a cluster can be picked for resource placement:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickAll
    affinity:
        clusterAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                clusterSelectorTerms:
                - labelSelector:
                    matchLabels:
                      region: east
                  propertySelector:
                    matchExpressions:
                    - name: "kubernetes-fleet.io/node-count"
                      operator: Ge
                      values:
                      - "5"

In the example above, Fleet will only consider a cluster for resource placement if it has the region=east label and a node count no less than 5.

Property sorters in preferredDuringSchedulingIgnoredDuringExecution affinity terms

A property sorter ranks all the clusters in the Fleet based on their values of a specified property in ascending or descending order, then yields weights for the clusters in proportion to their ranks. The proportional weights are calculated based on the weight value given in the preferredDuringSchedulingIgnoredDuringExecution term.

A property sorter consists of:

  • A name, which is the name of the property; see the format in the previous section for more information.

  • A sort order, which is one of Ascending and Descending, for ranking in ascending and descending order respectively.

    As a rule of thumb, when the Ascending order is used, Fleet will prefer clusters with lower observed values, and when the Descending order is used, clusters with higher observed values will be preferred.

When using the sort order Descending, the proportional weight is calculated using the formula:

((Observed Value - Minimum observed value) / (Maximum observed value - Minimum observed value)) * Weight

For example, suppose that you would like to rank clusters based on the property of available CPU capacity in descending order and currently, you have a fleet of 3 clusters with the available CPU capacities as follows:

Cluster       Available CPU capacity
bravelion     100
smartfish     20
jumpingcat    10

The sorter would yield the weights below:

Cluster       Available CPU capacity   Weight
bravelion     100                      (100 - 10) / (100 - 10) = 100% of the weight
smartfish     20                       (20 - 10) / (100 - 10) = 11.11% of the weight
jumpingcat    10                       (10 - 10) / (100 - 10) = 0% of the weight

And when using the sort order Ascending, the proportional weight is calculated using the formula:

(1 - ((Observed Value - Minimum observed value) / (Maximum observed value - Minimum observed value))) * Weight

For example, suppose that you would like to rank clusters based on their per CPU core cost in ascending order, and currently you have a fleet of 3 clusters with the per CPU core costs as follows:

Cluster       Per CPU core cost
bravelion     1
smartfish     0.2
jumpingcat    0.1

The sorter would yield the weights below:

Cluster       Per CPU core cost   Weight
bravelion     1                   1 - ((1 - 0.1) / (1 - 0.1)) = 0% of the weight
smartfish     0.2                 1 - ((0.2 - 0.1) / (1 - 0.1)) = 88.89% of the weight
jumpingcat    0.1                 1 - ((0.1 - 0.1) / (1 - 0.1)) = 100% of the weight

The example below showcases a property sorter using the Descending order:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 10
    affinity:
        clusterAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 20
              preference:
                metricSorter:
                  name: kubernetes-fleet.io/node-count
                  sortOrder: Descending

In this example, Fleet will prefer clusters with higher node counts. The cluster with the highest node count would receive a weight of 20, and the cluster with the lowest would receive 0. Other clusters receive proportional weights calculated using the formulas above.
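
Conversely, if you would like to prefer clusters with lower node counts, you could switch the sort order to Ascending; a sketch that mirrors the example above:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 10
    affinity:
        clusterAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 20
              preference:
                metricSorter:
                  name: kubernetes-fleet.io/node-count
                  sortOrder: Ascending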

You may use both label selector and property sorter in a preferredDuringSchedulingIgnoredDuringExecution affinity term. A cluster that fails the label selector would receive no weight, and clusters that pass the label selector receive proportional weights under the property sorter.

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 10
    affinity:
        clusterAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 20
              preference:
                labelSelector:
                  matchLabels:
                    env: prod
                metricSorter:
                  name: resources.kubernetes-fleet.io/total-cpu
                  sortOrder: Descending

In the example above, a cluster would only receive additional weight if it has the label env=prod, and the more total CPU capacity it has, the more weight it will receive, up to the limit of 20.

6 - Using Taints and Tolerations

How to use taints and tolerations to fine-tune scheduling decisions

This how-to guide discusses how to add/remove taints on MemberCluster and how to add tolerations on ClusterResourcePlacement.

Adding taint to MemberCluster

In this example, we will add a taint to a MemberCluster and then try to propagate resources to it using a ClusterResourcePlacement with the PickAll placement policy. The resources should not be propagated to the MemberCluster because of the taint.

We will first create a namespace that we will propagate to the member cluster:

kubectl create ns test-ns

Then apply the MemberCluster with a taint.

Example MemberCluster with taint:

apiVersion: cluster.kubernetes-fleet.io/v1beta1
kind: MemberCluster
metadata:
  name: kind-cluster-1
spec:
  identity:
    name: fleet-member-agent-cluster-1
    kind: ServiceAccount
    namespace: fleet-system
    apiGroup: ""
  taints:
    - key: test-key1
      value: test-value1
      effect: NoSchedule

After applying the above MemberCluster, we will apply a ClusterResourcePlacement with the following spec:

  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1          
      name: test-ns
  policy:
    placementType: PickAll
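
For reference, a complete ClusterResourcePlacement manifest with this spec might look like the following; the object name test-crp is illustrative:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: test-crp
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickAll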

The ClusterResourcePlacement CR should not propagate the test-ns namespace to the member cluster because of the taint. Looking at the status of the CR should show the following:

status:
  conditions:
  - lastTransitionTime: "2024-04-16T19:03:17Z"
    message: found all the clusters needed as specified by the scheduling policy
    observedGeneration: 2
    reason: SchedulingPolicyFulfilled
    status: "True"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2024-04-16T19:03:17Z"
    message: All 0 cluster(s) are synchronized to the latest resources on the hub
      cluster
    observedGeneration: 2
    reason: SynchronizeSucceeded
    status: "True"
    type: ClusterResourcePlacementSynchronized
  - lastTransitionTime: "2024-04-16T19:03:17Z"
    message: There are no clusters selected to place the resources
    observedGeneration: 2
    reason: ApplySucceeded
    status: "True"
    type: ClusterResourcePlacementApplied
  observedResourceIndex: "0"
  selectedResources:
  - kind: Namespace
    name: test-ns
    version: v1

Looking at the ClusterResourcePlacementSynchronized and ClusterResourcePlacementApplied conditions and reading their message fields, we can see that no clusters were selected to place the resources.

Removing taint from MemberCluster

In this example, we will remove the taint from the MemberCluster from the last section. This should automatically trigger the Fleet scheduler to propagate the resources to the MemberCluster.
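
To remove the taint, you can edit the MemberCluster object directly, or apply a JSON patch; a sketch that assumes the taint is the first (and only) entry in the taints list:

kubectl patch membercluster kind-cluster-1 --type='json' -p='[{"op": "remove", "path": "/spec/taints/0"}]'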

After removing the taint from the MemberCluster, let’s take a look at the status of the ClusterResourcePlacement:

status:
  conditions:
  - lastTransitionTime: "2024-04-16T20:00:03Z"
    message: found all the clusters needed as specified by the scheduling policy
    observedGeneration: 2
    reason: SchedulingPolicyFulfilled
    status: "True"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2024-04-16T20:02:57Z"
    message: All 1 cluster(s) are synchronized to the latest resources on the hub
      cluster
    observedGeneration: 2
    reason: SynchronizeSucceeded
    status: "True"
    type: ClusterResourcePlacementSynchronized
  - lastTransitionTime: "2024-04-16T20:02:57Z"
    message: Successfully applied resources to 1 member clusters
    observedGeneration: 2
    reason: ApplySucceeded
    status: "True"
    type: ClusterResourcePlacementApplied
  observedResourceIndex: "0"
  placementStatuses:
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2024-04-16T20:02:52Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 2
      reason: ScheduleSucceeded
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-04-16T20:02:57Z"
      message: Successfully Synchronized work(s) for placement
      observedGeneration: 2
      reason: WorkSynchronizeSucceeded
      status: "True"
      type: WorkSynchronized
    - lastTransitionTime: "2024-04-16T20:02:57Z"
      message: Successfully applied resources
      observedGeneration: 2
      reason: ApplySucceeded
      status: "True"
      type: Applied
  selectedResources:
  - kind: Namespace
    name: test-ns
    version: v1

From the status we can clearly see that the resources were propagated to the member cluster after removing the taint.

Adding toleration to ClusterResourcePlacement

Adding a toleration to a ClusterResourcePlacement CR allows the Fleet scheduler to tolerate specific taints on the MemberClusters.

For this section we will start from scratch: we will first create a namespace that we will propagate to the MemberCluster:

kubectl create ns test-ns

Then apply the MemberCluster with a taint.

Example MemberCluster with taint:

spec:
  heartbeatPeriodSeconds: 60
  identity:
    apiGroup: ""
    kind: ServiceAccount
    name: fleet-member-agent-cluster-1
    namespace: fleet-system
  taints:
    - effect: NoSchedule
      key: test-key1
      value: test-value1

The ClusterResourcePlacement CR will not propagate the test-ns namespace to the member cluster because of the taint.

Now we will add a toleration to a ClusterResourcePlacement CR as part of the placement policy, which will use the Exists operator to tolerate the taint.

Example ClusterResourcePlacement spec with tolerations after adding new toleration:

spec:
  policy:
    placementType: PickAll
    tolerations:
      - key: test-key1
        operator: Exists
  resourceSelectors:
    - group: ""
      kind: Namespace
      name: test-ns
      version: v1
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate

Let’s take a look at the status of the ClusterResourcePlacement CR after adding the toleration:

status:
  conditions:
    - lastTransitionTime: "2024-04-16T20:16:10Z"
      message: found all the clusters needed as specified by the scheduling policy
      observedGeneration: 3
      reason: SchedulingPolicyFulfilled
      status: "True"
      type: ClusterResourcePlacementScheduled
    - lastTransitionTime: "2024-04-16T20:16:15Z"
      message: All 1 cluster(s) are synchronized to the latest resources on the hub
        cluster
      observedGeneration: 3
      reason: SynchronizeSucceeded
      status: "True"
      type: ClusterResourcePlacementSynchronized
    - lastTransitionTime: "2024-04-16T20:16:15Z"
      message: Successfully applied resources to 1 member clusters
      observedGeneration: 3
      reason: ApplySucceeded
      status: "True"
      type: ClusterResourcePlacementApplied
  observedResourceIndex: "0"
  placementStatuses:
    - clusterName: kind-cluster-1
      conditions:
        - lastTransitionTime: "2024-04-16T20:16:10Z"
          message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
            score: 0, topology spread score: 0): picked by scheduling policy'
          observedGeneration: 3
          reason: ScheduleSucceeded
          status: "True"
          type: Scheduled
        - lastTransitionTime: "2024-04-16T20:16:15Z"
          message: Successfully Synchronized work(s) for placement
          observedGeneration: 3
          reason: WorkSynchronizeSucceeded
          status: "True"
          type: WorkSynchronized
        - lastTransitionTime: "2024-04-16T20:16:15Z"
          message: Successfully applied resources
          observedGeneration: 3
          reason: ApplySucceeded
          status: "True"
          type: Applied
  selectedResources:
    - kind: Namespace
      name: test-ns
      version: v1

From the status we can see that the resources were propagated to the MemberCluster after adding the toleration.

Now let’s try adding a new taint to the member cluster CR and see if the resources are still propagated to the MemberCluster.

Example MemberCluster CR with new taint:

  heartbeatPeriodSeconds: 60
  identity:
    apiGroup: ""
    kind: ServiceAccount
    name: fleet-member-agent-cluster-1
    namespace: fleet-system
  taints:
  - effect: NoSchedule
    key: test-key1
    value: test-value1
  - effect: NoSchedule
    key: test-key2
    value: test-value2

Let’s take a look at the ClusterResourcePlacement CR status after adding the new taint:

status:
  conditions:
  - lastTransitionTime: "2024-04-16T20:27:44Z"
    message: found all the clusters needed as specified by the scheduling policy
    observedGeneration: 2
    reason: SchedulingPolicyFulfilled
    status: "True"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2024-04-16T20:27:49Z"
    message: All 1 cluster(s) are synchronized to the latest resources on the hub
      cluster
    observedGeneration: 2
    reason: SynchronizeSucceeded
    status: "True"
    type: ClusterResourcePlacementSynchronized
  - lastTransitionTime: "2024-04-16T20:27:49Z"
    message: Successfully applied resources to 1 member clusters
    observedGeneration: 2
    reason: ApplySucceeded
    status: "True"
    type: ClusterResourcePlacementApplied
  observedResourceIndex: "0"
  placementStatuses:
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2024-04-16T20:27:44Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
        score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 2
      reason: ScheduleSucceeded
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-04-16T20:27:49Z"
      message: Successfully Synchronized work(s) for placement
      observedGeneration: 2
      reason: WorkSynchronizeSucceeded
      status: "True"
      type: WorkSynchronized
    - lastTransitionTime: "2024-04-16T20:27:49Z"
      message: Successfully applied resources
      observedGeneration: 2
      reason: ApplySucceeded
      status: "True"
      type: Applied
  selectedResources:
  - kind: Namespace
    name: test-ns
    version: v1

Nothing changes in the status: even though the new taint is not tolerated, the existing resources on the MemberCluster continue to run, because the taint effect is NoSchedule and the cluster was already selected for resource propagation in a previous scheduling cycle.
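
If you do want the placement to tolerate the second taint as well, you could add another toleration to the ClusterResourcePlacement spec; a sketch based on the taints above, assuming the toleration API accepts the Equal operator with a key/value pair in the same way core Kubernetes tolerations do:

spec:
  policy:
    placementType: PickAll
    tolerations:
      - key: test-key1
        operator: Exists
      - key: test-key2
        operator: Equal
        value: test-value2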

7 - Using the ClusterResourceOverride API

How to use the ClusterResourceOverride API to override cluster-scoped resources

This guide provides an overview of how to use the Fleet ClusterResourceOverride API to override cluster resources.

Overview

ClusterResourceOverride is a feature within Fleet that allows for the modification or override of specific attributes across cluster-wide resources. With ClusterResourceOverride, you can define rules based on cluster labels or other criteria, specifying changes to be applied to cluster-scoped resources such as namespaces, cluster roles, cluster role bindings, or custom resource definitions. These modifications may include updates to permissions, configurations, or other parameters, ensuring consistent management and enforcement of configurations across your Fleet-managed Kubernetes clusters.

API Components

The ClusterResourceOverride API consists of the following components:

  • Placement: This specifies which placement the override is applied to.
  • Cluster Resource Selectors: These specify the set of cluster resources selected for overriding.
  • Policy: This specifies the policy to be applied to the selected resources.

The following sections discuss these components in depth.

Placement

To configure which placement the override is applied to, you can use the name of ClusterResourcePlacement.

Cluster Resource Selectors

A ClusterResourceOverride object may feature one or more cluster resource selectors, specifying which resources to select to be overridden.

The ClusterResourceSelector object supports the following fields:

  • group: The API group of the resource
  • version: The API version of the resource
  • kind: The kind of the resource
  • name: The name of the resource

Note: The resource can only be selected by name.

To add a resource selector, edit the clusterResourceSelectors field in the ClusterResourceOverride spec:

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
  name: example-cro
spec:
  placement:
    name: crp-example
  clusterResourceSelectors:
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
      version: v1
      name: secret-reader

The example in this guide will pick the ClusterRole named secret-reader, as shown below, to be overridden.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: secret-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]

Policy

The Policy is made up of a set of rules (OverrideRules) that specify the changes to be applied to the selected resources on selected clusters.

Each OverrideRule supports the following fields:

  • Cluster Selector: This specifies the set of clusters to which the override applies.
  • Override Type: This specifies the type of override to be applied. The default type is JSONPatch.
    • JSONPatch: applies the JSON patch to the selected resources using RFC 6902.
    • Delete: deletes the selected resources on the target cluster.
  • JSON Patch Override: This specifies the changes to be applied to the selected resources when the override type is JSONPatch.

Cluster Selector

To specify the clusters to which the override applies, you can use the clusterSelector field in the OverrideRule spec. The clusterSelector field supports the following fields:

  • clusterSelectorTerms: A list of terms that are used to select clusters.
    • Each term in the list is used to select clusters based on the label selector.

IMPORTANT: Only labelSelector is supported in the clusterSelectorTerms field.

Override Type

To specify the type of override to be applied, you can use the overrideType field in the OverrideRule spec. The default value is JSONPatch.

  • JSONPatch: applies the JSON patch to the selected resources using RFC 6902.
  • Delete: deletes the selected resources on the target cluster.

JSON Patch Override

To specify the changes to be applied to the selected resources, you can use the jsonPatchOverrides field in the OverrideRule spec. JSONPatchOverride applies a JSON patch on the selected resources following RFC 6902; all the fields defined follow this RFC.

The jsonPatchOverrides field supports the following fields:

  • op: The operation to be performed. The supported operations are add, remove, and replace.

    • add: Adds a new value to the specified path.
    • remove: Removes the value at the specified path.
    • replace: Replaces the value at the specified path.
  • path: The path to the field to be modified.

    • Some guidelines for the path are as follows:
      • Must start with a / character.
      • Cannot be empty.
      • Cannot contain an empty string ("///").
      • Cannot be a TypeMeta Field ("/kind", "/apiVersion").
      • Cannot be a Metadata Field ("/metadata/name", "/metadata/namespace"), except the fields "/metadata/annotations" and "/metadata/labels".
      • Cannot be any field in the status of the resource.
    • Some examples of valid paths are:
      • /metadata/labels/new-label
      • /metadata/annotations/new-annotation
      • /spec/template/spec/containers/0/resources/limits/cpu
      • /spec/template/spec/containers/0/resources/requests/memory
  • value: The value to be set.

    • If the op is remove, the value cannot be set.
    • There is a list of reserved variables that will be replaced by the actual values:
      • ${MEMBER-CLUSTER-NAME}: this will be replaced by the name of the memberCluster that represents this cluster.

Example: Override Labels

To overwrite the existing labels on the ClusterRole named secret-reader on clusters with the label env: prod, you can use the following configuration:

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
  name: example-cro
spec:
  placement:
    name: crp-example
  clusterResourceSelectors:
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
      version: v1
      name: secret-reader
  policy:
    overrideRules:
      - clusterSelector:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
        jsonPatchOverrides:
          - op: add
            path: /metadata/labels
            value:
              {"cluster-name":"${MEMBER-CLUSTER-NAME}"}

Note: To add a new label to the existing labels, please use the below configuration:

 - op: add
   path: /metadata/labels/new-label
   value: "new-value"

The ClusterResourceOverride object above will add a label cluster-name with the value of the memberCluster name to the ClusterRole named secret-reader on clusters with the label env: prod.

Example: Remove Verbs

To remove the verb “list” in the ClusterRole named secret-reader on clusters with the label env: prod, you can use the following configuration:

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
  name: example-cro
spec:
  placement:
    name: crp-example
  clusterResourceSelectors:
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
      version: v1
      name: secret-reader
  policy:
    overrideRules:
      - clusterSelector:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
        jsonPatchOverrides:
          - op: remove
            path: /rules/0/verbs/2

The ClusterResourceOverride object above will remove the verb “list” in the ClusterRole named secret-reader on clusters with the label env: prod selected by the clusterResourcePlacement crp-example.

The ClusterResourceOverride mentioned above utilizes the cluster role displayed below:

Name:         secret-reader
Labels:       <none>
Annotations:  <none>
PolicyRule:
Resources  Non-Resource URLs  Resource Names  Verbs
---------  -----------------  --------------  -----
secrets    []                 []              [get watch list]

Delete

The Delete override type can be used to delete the selected resources on the target cluster.

Example: Delete Selected Resource

To delete the secret-reader on the clusters with the label env: test selected by the clusterResourcePlacement crp-example, you can use the Delete override type.

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
  name: example-cro
spec:
  placement:
    name: crp-example
  clusterResourceSelectors:
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
      version: v1
      name: secret-reader
  policy:
    overrideRules:
      - clusterSelector:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: test
        overrideType: Delete

Multiple Override Patches

You may add multiple JSONPatchOverride to an OverrideRule to apply multiple changes to the selected cluster resources.

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
  name: example-cro
spec:
  placement:
    name: crp-example
  clusterResourceSelectors:
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
      version: v1
      name: secret-reader
  policy:
    overrideRules:
      - clusterSelector:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
        jsonPatchOverrides:
          - op: remove
            path: /rules/0/verbs/2
          - op: remove
            path: /rules/0/verbs/1

The ClusterResourceOverride object above will remove the verbs “list” and “watch” in the ClusterRole named secret-reader on clusters with the label env: prod.

Breaking down the paths:

  • First JSONPatchOverride:
    • /rules/0: This denotes the first rule in the rules array of the ClusterRole. In the provided ClusterRole definition, there’s only one rule defined (“secrets”), so this corresponds to the first (and only) rule.
    • /verbs/2: Within this rule, the third element of the verbs array is targeted (“list”).
  • Second JSONPatchOverride:
    • /rules/0: This denotes the first rule in the rules array of the ClusterRole. In the provided ClusterRole definition, there’s only one rule defined (“secrets”), so this corresponds to the first (and only) rule.
    • /verbs/1: Within this rule, the second element of the verbs array is targeted (“watch”).

The ClusterResourceOverride mentioned above utilizes the cluster role displayed below:

Name:         secret-reader
Labels:       <none>
Annotations:  <none>
PolicyRule:
Resources  Non-Resource URLs  Resource Names  Verbs
---------  -----------------  --------------  -----
secrets    []                 []              [get watch list]

Applying the ClusterResourceOverride

Create a ClusterResourcePlacement resource to specify the placement rules for distributing the cluster resource overrides across the cluster infrastructure. Ensure that you select the appropriate resource.

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp-example
spec:
  resourceSelectors:
    - group: rbac.authorization.k8s.io
      kind: ClusterRole
      version: v1
      name: secret-reader
  policy:
    placementType: PickAll
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
            - labelSelector:
                matchLabels:
                  env: test

The ClusterResourcePlacement configuration outlined above will disperse the secret-reader ClusterRole across all clusters labeled env: prod or env: test. As the resources are placed, the corresponding ClusterResourceOverride configurations will be applied to the designated clusters, triggered by the selection of the matching cluster role resource secret-reader.

Verifying the Cluster Resource is Overridden

To ensure that the ClusterResourceOverride object is applied to the selected clusters, verify the ClusterResourcePlacement status by running the kubectl describe crp crp-example command:

Status:
  Conditions:
    ...
    Message:                The selected resources are successfully overridden in the 10 clusters
    Observed Generation:    1
    Reason:                 OverriddenSucceeded
    Status:                 True
    Type:                   ClusterResourcePlacementOverridden
    ...
  Observed Resource Index:  0
  Placement Statuses:
    Applicable Cluster Resource Overrides:
      example-cro-0
    Cluster Name:  member-50
    Conditions:
      ...
      Message:               Successfully applied the override rules on the resources
      Observed Generation:   1
      Reason:                OverriddenSucceeded
      Status:                True
      Type:                  Overridden
     ...

Each cluster maintains its own Applicable Cluster Resource Overrides which contain the cluster resource override snapshot if relevant. Additionally, individual status messages for each cluster indicate whether the override rules have been effectively applied.

The ClusterResourcePlacementOverridden condition indicates whether the resource override has been successfully applied to the selected resources in the selected clusters.
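
To query just this condition from the hub cluster, you can use kubectl’s JSONPath output; a sketch:

kubectl get crp crp-example -o jsonpath='{.status.conditions[?(@.type=="ClusterResourcePlacementOverridden")].message}'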

To verify that the ClusterResourceOverride object has been successfully applied to the selected resources, check resources in the selected clusters:

  1. Get cluster credentials: az aks get-credentials --resource-group <resource-group> --name <cluster-name>
  2. Get the ClusterRole object in the selected cluster: kubectl --context=<member-cluster-context> get clusterrole secret-reader -o yaml

Upon inspecting the described ClusterRole object, it becomes apparent that the verbs “watch” and “list” have been removed from the permissions list within the ClusterRole named “secret-reader” on the prod clusters.

 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  ...
 rules:
 - apiGroups:
   - ""
   resources:
   - secrets
   verbs:
   - get

Similarly, you can verify that this cluster role does not exist in the test clusters.
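
For example, running the command below against a test cluster (the context name is a placeholder) should report that the ClusterRole is not found, since the Delete override rule removes it there:

kubectl --context=<test-member-cluster-context> get clusterrole secret-reader
# Expected output:
# Error from server (NotFound): clusterroles.rbac.authorization.k8s.io "secret-reader" not found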

8 - Using the ResourceOverride API

How to use the ResourceOverride API to override namespace-scoped resources

This guide provides an overview of how to use the Fleet ResourceOverride API to override resources.

Overview

ResourceOverride is a Fleet API that allows you to modify or override specific attributes of existing resources within your cluster. With ResourceOverride, you can define rules based on cluster labels or other criteria, specifying changes to be applied to resources such as Deployments, StatefulSets, ConfigMaps, or Secrets. These changes can include updates to container images, environment variables, resource limits, or any other configurable parameters.

API Components

The ResourceOverride API consists of the following components:

  • Placement: This specifies which placement the override is applied to.
  • Resource Selectors: These specify the set of resources selected for overriding.
  • Policy: This specifies the policy to be applied to the selected resources.

The following sections discuss these components in depth.

Placement

To configure which placement the override is applied to, you can use the name of ClusterResourcePlacement.

Resource Selectors

A ResourceOverride object may feature one or more resource selectors, specifying which resources to select to be overridden.

The ResourceSelector object supports the following fields:

  • group: The API group of the resource
  • version: The API version of the resource
  • kind: The kind of the resource
  • name: The name of the resource

Note: The resource can only be selected by name.

To add a resource selector, edit the resourceSelectors field in the ResourceOverride spec:

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
  name: example-ro
  namespace: test-namespace
spec:
  placement:
    name: crp-example
  resourceSelectors:
    -  group: apps
       kind: Deployment
       version: v1
       name: my-deployment

Note: The ResourceOverride needs to be in the same namespace as the resources it is overriding.

The examples in this guide will pick a Deployment named my-deployment from the namespace test-namespace, as shown below, to be overridden.

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: my-deployment
  namespace: test-namespace
  ...
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: test-nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: test-nginx
    spec:
      containers:
      - image: nginx:1.14.2
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  ...

Policy

The Policy is made up of a set of rules (OverrideRules) that specify the changes to be applied to the selected resources on selected clusters.

Each OverrideRule supports the following fields:

  • Cluster Selector: This specifies the set of clusters to which the override applies.
  • Override Type: This specifies the type of override to be applied. The default type is JSONPatch.
    • JSONPatch: applies the JSON patch to the selected resources using RFC 6902.
    • Delete: deletes the selected resources on the target cluster.
  • JSON Patch Override: This specifies the changes to be applied to the selected resources when the override type is JSONPatch.

Cluster Selector

To specify the clusters to which the override applies, you can use the clusterSelector field in the OverrideRule spec. The clusterSelector field supports the following fields:

  • clusterSelectorTerms: A list of terms that are used to select clusters.
    • Each term in the list is used to select clusters based on the label selector.

IMPORTANT: Only labelSelector is supported in the clusterSelectorTerms field.

Override Type

To specify the type of override to be applied, you can use the overrideType field in the OverrideRule spec. The default value is JSONPatch.

  • JSONPatch: applies the JSON patch to the selected resources using RFC 6902.
  • Delete: deletes the selected resources on the target cluster.

JSON Patch Override

To specify the changes to be applied to the selected resources, you can use the jsonPatchOverrides field in the OverrideRule spec. JSONPatchOverride applies a JSON patch on the selected resources following RFC 6902; all the fields defined follow this RFC.

The jsonPatchOverrides field supports the following fields:

  • op: The operation to be performed. The supported operations are add, remove, and replace.

    • add: Adds a new value to the specified path.
    • remove: Removes the value at the specified path.
    • replace: Replaces the value at the specified path.
  • path: The path to the field to be modified.

    • Some guidelines for the path are as follows:
      • Must start with a / character.
      • Cannot be empty.
      • Cannot contain an empty string ("///").
      • Cannot be a TypeMeta Field ("/kind", "/apiVersion").
      • Cannot be a Metadata Field ("/metadata/name", "/metadata/namespace"), except the fields "/metadata/annotations" and "/metadata/labels".
      • Cannot be any field in the status of the resource.
    • Some examples of valid paths are:
      • /metadata/labels/new-label
      • /metadata/annotations/new-annotation
      • /spec/template/spec/containers/0/resources/limits/cpu
      • /spec/template/spec/containers/0/resources/requests/memory
  • value: The value to be set.

    • If the op is remove, the value cannot be set.
    • There is a list of reserved variables that will be replaced by the actual values:
      • ${MEMBER-CLUSTER-NAME}: this will be replaced by the name of the memberCluster that represents this cluster.

Example: Override Labels

To overwrite the existing labels on the Deployment named my-deployment on clusters with the label env: prod, you can use the following configuration:

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
  name: example-ro
  namespace: test-namespace
spec:
  placement:
    name: crp-example
  resourceSelectors:
    -  group: apps
       kind: Deployment
       version: v1
       name: my-deployment
  policy:
    overrideRules:
      - clusterSelector:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
        jsonPatchOverrides:
          - op: add
            path: /metadata/labels
            value:
              {"cluster-name":"${MEMBER-CLUSTER-NAME}"}

Note: To add a new label to the existing labels, please use the below configuration:

 - op: add
   path: /metadata/labels/new-label
   value: "new-value"

The ResourceOverride object above will add a label cluster-name with the value of the memberCluster name to the Deployment named my-deployment on clusters with the label env: prod.

Example: Override Image

To override the image of the container in the Deployment named my-deployment on all clusters with the label env: prod:

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
  name: example-ro
  namespace: test-namespace
spec:
  placement:
    name: crp-example
  resourceSelectors:
    -  group: apps
       kind: Deployment
       version: v1
       name: my-deployment
  policy:
    overrideRules:
      - clusterSelector:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
        jsonPatchOverrides:
          - op: replace
            path: /spec/template/spec/containers/0/image
            value: "nginx:1.20.0"

The ResourceOverride object above will replace the image of the container in the Deployment named my-deployment with the image nginx:1.20.0 on all clusters with the label env: prod selected by the clusterResourcePlacement crp-example.

The ResourceOverride mentioned above utilizes the deployment displayed below:

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: my-deployment
  namespace: test-namespace
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - image: nginx:1.14.2
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
       ...
      ...
  ...

Delete

The Delete override type can be used to delete the selected resources on the target cluster.

Example: Delete Selected Resource

To delete the my-deployment on the clusters with the label env: test selected by the clusterResourcePlacement crp-example, you can use the Delete override type.

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
  name: example-ro
  namespace: test-namespace
spec:
  placement:
    name: crp-example
  resourceSelectors:
    -  group: apps
       kind: Deployment
       version: v1
       name: my-deployment
  policy:
    overrideRules:
      - clusterSelector:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: test
        overrideType: Delete

Multiple Override Rules

You may add multiple OverrideRules to a Policy to apply multiple changes to the selected resources.

apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
  name: example-ro
  namespace: test-namespace
spec:
  placement:
    name: crp-example
  resourceSelectors:
    -  group: apps
       kind: Deployment
       version: v1
       name: my-deployment
  policy:
    overrideRules:
      - clusterSelector:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
        jsonPatchOverrides:
          - op: replace
            path: /spec/template/spec/containers/0/image
            value: "nginx:1.20.0"
      - clusterSelector:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: test
        jsonPatchOverrides:
          - op: replace
            path: /spec/template/spec/containers/0/image
            value: "nginx:latest"

The ResourceOverride object above will replace the image of the container in the Deployment named my-deployment with the image nginx:1.20.0 on all clusters with the label env: prod and the image nginx:latest on all clusters with the label env: test.

The ResourceOverride mentioned above utilizes the deployment displayed below:

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: my-deployment
  namespace: test-namespace
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - image: nginx:1.14.2
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
       ...
      ...
  ...

Applying the ResourceOverride

Create a ClusterResourcePlacement resource to specify the placement rules for distributing the resource overrides across the cluster infrastructure. Ensure that you select the appropriate namespaces containing the matching resources.

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp-example
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      name: test-namespace
      version: v1
  policy:
    placementType: PickAll
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
            - labelSelector:
                matchLabels:
                  env: test

The ClusterResourcePlacement configuration outlined above will disperse the resources within test-namespace across all clusters labeled env: prod or env: test. As the resources are placed, the corresponding ResourceOverride configurations will be applied to the designated clusters, triggered by the selection of the matching Deployment resource my-deployment.

Verifying the Cluster Resource is Overridden

To ensure that the ResourceOverride object is applied to the selected resources, verify the ClusterResourcePlacement status by running the kubectl describe crp crp-example command:

Status:
  Conditions:
    ...
    Message:                The selected resources are successfully overridden in the 10 clusters
    Observed Generation:    1
    Reason:                 OverriddenSucceeded
    Status:                 True
    Type:                   ClusterResourcePlacementOverridden
    ...
  Observed Resource Index:  0
  Placement Statuses:
    Applicable Resource Overrides:
      Name:        example-ro-0
      Namespace:   test-namespace
    Cluster Name:  member-50
    Conditions:
      ...
      Message:               Successfully applied the override rules on the resources
      Observed Generation:   1
      Reason:                OverriddenSucceeded
      Status:                True
      Type:                  Overridden
     ...

Each cluster maintains its own Applicable Resource Overrides which contain the resource override snapshot and the resource override namespace if relevant. Additionally, individual status messages for each cluster indicate whether the override rules have been effectively applied.

The ClusterResourcePlacementOverridden condition indicates whether the resource override has been successfully applied to the selected resources in the selected clusters.

To verify that the ResourceOverride object has been successfully applied to the selected resources, check resources in the selected clusters:

  1. Get cluster credentials: az aks get-credentials --resource-group <resource-group> --name <cluster-name>
  2. Get the Deployment object in the selected cluster: kubectl --context=<member-cluster-context> get deployment my-deployment -n test-namespace -o yaml

Upon inspecting a selected member cluster that carries the label env: prod, you can see that the image of the Deployment my-deployment has been modified to nginx:1.20.0:

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: my-deployment
  namespace: test-namespace
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - image: nginx:1.20.0
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        ...
      ...
status:
  ...
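
You can also query just the container image with JSONPath; a sketch (the context name is a placeholder):

kubectl --context=<member-cluster-context> get deployment my-deployment -n test-namespace -o jsonpath='{.spec.template.spec.containers[0].image}'
# Expected output on clusters labeled env: prod: nginx:1.20.0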

9 - Using Envelope Objects to Place Resources

How to use envelope objects with the ClusterResourcePlacement API

Propagating Resources with Envelope Objects

This guide provides instructions on propagating a set of resources from the hub cluster to joined member clusters within an envelope object.

Why Use Envelope Objects?

When propagating resources to member clusters using Fleet, it’s important to understand that the hub cluster itself is also a Kubernetes cluster. Without envelope objects, any resource you want to propagate would first be applied directly to the hub cluster, which can lead to some potential side effects:

  1. Unintended Side Effects: Resources like ValidatingWebhookConfigurations, MutatingWebhookConfigurations, or Admission Controllers would become active on the hub cluster, potentially intercepting and affecting hub cluster operations.

  2. Security Risks: RBAC resources (Roles, ClusterRoles, RoleBindings, ClusterRoleBindings) intended for member clusters could grant unintended permissions on the hub cluster.

  3. Resource Limitations: ResourceQuotas, FlowSchema or LimitRanges defined for member clusters would take effect on the hub cluster. While this is generally not a critical issue, there may be cases where you want to avoid these constraints on the hub.

Envelope objects solve these problems by allowing you to define resources that should be propagated without actually deploying their contents on the hub cluster. The envelope object itself is applied to the hub, but the resources it contains are only extracted and applied when they reach the member clusters.

Envelope Objects with CRDs

Fleet now supports two types of envelope Custom Resource Definitions (CRDs) for propagating resources:

  1. ClusterResourceEnvelope: Used to wrap cluster-scoped resources for placement.
  2. ResourceEnvelope: Used to wrap namespace-scoped resources for placement.

These CRDs provide a more structured and Kubernetes-native way to package resources for propagation to member clusters without causing unintended side effects on the hub cluster.

ClusterResourceEnvelope Example

The ClusterResourceEnvelope is a cluster-scoped resource that can only wrap other cluster-scoped resources. For example:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourceEnvelope
metadata:
  name: example
data:
  "webhook.yaml":
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: guard
    webhooks:
    - name: guard.example.com
      rules:
      - operations: ["CREATE"]
        apiGroups: ["*"]
        apiVersions: ["*"]
        resources: ["*"]
      clientConfig:
        service:
          name: guard
          namespace: ops
      admissionReviewVersions: ["v1"]
      sideEffects: None
      timeoutSeconds: 10
  "clusterrole.yaml":
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: pod-reader
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "list", "watch"]

ResourceEnvelope Example

The ResourceEnvelope is a namespace-scoped resource that can only wrap namespace-scoped resources. For example:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ResourceEnvelope
metadata:
  name: example
  namespace: app
data:
  "cm.yaml":
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config
      namespace: app
    data:
      foo: bar
  "deploy.yaml":
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ingress
      namespace: app
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: web
            image: nginx

Propagating envelope objects from hub cluster to member cluster

We apply our envelope objects on the hub cluster and then use a ClusterResourcePlacement object to propagate these resources from the hub to member clusters.
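
For example, assuming the envelope manifests above are saved to files named resource-envelope.yaml and cluster-resource-envelope.yaml (the file names are illustrative), you can apply them against the hub cluster:

# Make sure your kubeconfig context points at the hub cluster before applying.
kubectl apply -f resource-envelope.yaml
kubectl apply -f cluster-resource-envelope.yaml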

Example CRP spec for propagating a ResourceEnvelope:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp-with-envelope
spec:
  policy:
    clusterNames:
    - kind-cluster-1
    placementType: PickFixed
  resourceSelectors:
  - group: ""
    kind: Namespace
    name: app
    version: v1
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate

Example CRP spec for propagating a ClusterResourceEnvelope:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp-with-cluster-envelope
spec:
  policy:
    clusterNames:
    - kind-cluster-1
    placementType: PickFixed
  resourceSelectors:
  - group: placement.kubernetes-fleet.io
    kind: ClusterResourceEnvelope
    name: example
    version: v1beta1
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate

CRP status for ResourceEnvelope:

status:
  conditions:
  - lastTransitionTime: "2023-11-30T19:54:13Z"
    message: found all the clusters needed as specified by the scheduling policy
    observedGeneration: 2
    reason: SchedulingPolicyFulfilled
    status: "True"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2023-11-30T19:54:18Z"
    message: All 1 cluster(s) are synchronized to the latest resources on the hub
      cluster
    observedGeneration: 2
    reason: SynchronizeSucceeded
    status: "True"
    type: ClusterResourcePlacementSynchronized
  - lastTransitionTime: "2023-11-30T19:54:18Z"
    message: Successfully applied resources to 1 member clusters
    observedGeneration: 2
    reason: ApplySucceeded
    status: "True"
    type: ClusterResourcePlacementApplied
  placementStatuses:
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2023-11-30T19:54:13Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-1:
        picked by scheduling policy'
      observedGeneration: 2
      reason: ScheduleSucceeded
      status: "True"
      type: ResourceScheduled
    - lastTransitionTime: "2023-11-30T19:54:18Z"
      message: Successfully Synchronized work(s) for placement
      observedGeneration: 2
      reason: WorkSynchronizeSucceeded
      status: "True"
      type: WorkSynchronized
    - lastTransitionTime: "2023-11-30T19:54:18Z"
      message: Successfully applied resources
      observedGeneration: 2
      reason: ApplySucceeded
      status: "True"
      type: ResourceApplied
  selectedResources:
  - kind: Namespace
    name: app
    version: v1
  - group: placement.kubernetes-fleet.io
    kind: ResourceEnvelope
    name: example
    namespace: app
    version: v1beta1

Note: In the selectedResources section, we specifically display the propagated envelope object. We do not individually list all the resources contained within the envelope object in the status.

Upon inspection of the selectedResources, it indicates that the namespace app and the ResourceEnvelope example have been successfully propagated. You can further verify that the resources contained within the envelope object were propagated successfully by ensuring that no failedPlacements section appears in the placementStatuses entry for the target cluster.
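
A quick way to check this from the hub cluster is to query the placementStatuses field directly; a sketch (jq is optional and used only for pretty printing):

kubectl get clusterresourceplacement crp-with-envelope -o jsonpath='{.status.placementStatuses}' | jq
# If no failedPlacements entry appears for the target cluster, all resources wrapped
# in the envelope object were applied successfully.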

Example CRP status where resources within an envelope object failed to apply

CRP status with failed ResourceEnvelope resource:

In the example below, within the placementStatus section for kind-cluster-1, the failedPlacements section provides details on a resource that failed to apply along with information about the envelope object which contained the resource.

status:
  conditions:
  - lastTransitionTime: "2023-12-06T00:09:53Z"
    message: found all the clusters needed as specified by the scheduling policy
    observedGeneration: 2
    reason: SchedulingPolicyFulfilled
    status: "True"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2023-12-06T00:09:58Z"
    message: All 1 cluster(s) are synchronized to the latest resources on the hub
      cluster
    observedGeneration: 2
    reason: SynchronizeSucceeded
    status: "True"
    type: ClusterResourcePlacementSynchronized
  - lastTransitionTime: "2023-12-06T00:09:58Z"
    message: Failed to apply manifests to 1 clusters, please check the `failedPlacements`
      status
    observedGeneration: 2
    reason: ApplyFailed
    status: "False"
    type: ClusterResourcePlacementApplied
  placementStatuses:
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2023-12-06T00:09:53Z"
      message: 'Successfully scheduled resources for placement in kind-cluster-1:
        picked by scheduling policy'
      observedGeneration: 2
      reason: ScheduleSucceeded
      status: "True"
      type: ResourceScheduled
    - lastTransitionTime: "2023-12-06T00:09:58Z"
      message: Successfully Synchronized work(s) for placement
      observedGeneration: 2
      reason: WorkSynchronizeSucceeded
      status: "True"
      type: WorkSynchronized
    - lastTransitionTime: "2023-12-06T00:09:58Z"
      message: Failed to apply manifests, please check the `failedPlacements` status
      observedGeneration: 2
      reason: ApplyFailed
      status: "False"
      type: ResourceApplied
    failedPlacements:
    - condition:
        lastTransitionTime: "2023-12-06T00:09:53Z"
        message: 'Failed to apply manifest: namespaces "app" not found'
        reason: AppliedManifestFailedReason
        status: "False"
        type: Applied
      envelope:
        name: example
        namespace: app
        type: ResourceEnvelope
      kind: Deployment
      name: ingress
      namespace: app
      version: apps/v1
  selectedResources:
  - kind: Namespace
    name: app
    version: v1
  - group: placement.kubernetes-fleet.io
    kind: ResourceEnvelope
    name: example
    namespace: app
    version: v1beta1

CRP status with failed ClusterResourceEnvelope resource:

Similar to namespace-scoped resources, cluster-scoped resources within a ClusterResourceEnvelope can also fail to apply:

status:
  conditions:
  - lastTransitionTime: "2023-12-06T00:09:53Z"
    message: found all the clusters needed as specified by the scheduling policy
    observedGeneration: 2
    reason: SchedulingPolicyFulfilled
    status: "True"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2023-12-06T00:09:58Z"
    message: Failed to apply manifests to 1 clusters, please check the `failedPlacements`
      status
    observedGeneration: 2
    reason: ApplyFailed
    status: "False"
    type: ClusterResourcePlacementApplied
  placementStatuses:
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2023-12-06T00:09:58Z"
      message: Failed to apply manifests, please check the `failedPlacements` status
      observedGeneration: 2
      reason: ApplyFailed
      status: "False"
      type: ResourceApplied
    failedPlacements:
    - condition:
        lastTransitionTime: "2023-12-06T00:09:53Z"
        message: 'Failed to apply manifest: service "guard" not found in namespace "ops"'
        reason: AppliedManifestFailedReason
        status: "False"
        type: Applied
      envelope:
        name: example
        type: ClusterResourceEnvelope
      kind: ValidatingWebhookConfiguration
      name: guard
      group: admissionregistration.k8s.io
      version: v1
  selectedResources:
  - group: placement.kubernetes-fleet.io
    kind: ClusterResourceEnvelope
    name: example
    version: v1beta1

10 - Controlling How Fleet Handles Pre-Existing Resources

How to fine-tune the way Fleet handles pre-existing resources

This guide provides an overview of how to set up Fleet’s takeover experience, which allows developers and admins to choose what happens when Fleet encounters a pre-existing resource. This occurs most often in the Fleet adoption scenario, where a cluster has just joined a fleet and the resources to be placed onto the new member cluster via the CRP API are already running there.

A concern commonly associated with this scenario is that the running (pre-existing) set of resources might have configuration differences from their equivalents on the hub cluster. For example, the hub cluster might have a namespace work that hosts a deployment web-server running the image rpd-stars:latest, while on the member cluster the same namespace contains a deployment of the same name but with the image umbrella-biolab:latest. If Fleet applies the resource template from the hub cluster, unexpected service interruptions might occur.

To address this concern, Fleet also introduces a new field, whenToTakeOver, in the apply strategy. Three options are available:

  • Always: This is the default option 😑. With this setting, Fleet will take over a pre-existing resource as soon as it encounters it. Fleet will apply the corresponding resource template from the hub cluster, and any value differences in the managed fields will be overwritten. This is consistent with the behavior before the new takeover experience is added.
  • IfNoDiff: This is the new option ✨ provided by the takeover mechanism. With this setting, Fleet will check for configuration differences when it finds a pre-existing resource and will only take over the resource (apply the resource template) if no configuration differences are found. Consider using this option for a safer adoption journey.
  • Never: This is another new option ✨ provided by the takeover mechanism. With this setting, Fleet will leave pre-existing resources alone: no apply op will be performed, and the presence of such a resource will be reported as an apply error. Use this option if you would like to check for the presence of pre-existing resources without taking any action on them.
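
For example, to opt into the safer behavior, you would set whenToTakeOver to IfNoDiff under the CRP’s apply strategy; a minimal sketch of the relevant fields (the full strategy layout matches the example later in this guide):

spec:
  strategy:
    applyStrategy:
      whenToTakeOver: IfNoDiff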

Before you begin

The new takeover experience is currently in preview.

Note that the APIs for the new experience are only available in the Fleet v1beta1 API, not the v1 API. If you do not see the new APIs in command outputs, verify that you are explicitly requesting the v1beta1 API objects, as opposed to the v1 API objects (the default).
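
For example, to explicitly request the v1beta1 version of the ClusterResourcePlacement objects, use the fully qualified resource name:

kubectl get clusterresourceplacements.v1beta1.placement.kubernetes-fleet.io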

How Fleet can be used to safely take over pre-existing resources

The steps below explain how the takeover experience functions. The code assumes that you have a fleet of two clusters, member-1 and member-2:

  • Switch to the second member cluster, and create a namespace, work-2, with labels:

    kubectl config use-context member-2-admin
    kubectl create ns work-2
    kubectl label ns work-2 app=work-2
    kubectl label ns work-2 owner=wesker
    
  • Switch to the hub cluster, and create the same namespace, but with a slightly different set of labels:

    kubectl config use-context hub-admin
    kubectl create ns work-2
    kubectl label ns work-2 app=work-2
    kubectl label ns work-2 owner=redfield
    
  • Create a CRP object that places the namespace to all member clusters:

    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work-2
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work-2.
          labelSelector:
            matchLabels:
              app: work-2
      policy:
        placementType: PickAll
      strategy:
        # For simplicity reasons, the CRP is configured to roll out changes to
        # all member clusters at once. This is not a setup recommended for production
        # use.      
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1
        applyStrategy:
          whenToTakeOver: Never
    EOF
    
  • Give Fleet a few seconds to handle the placement. Check the status of the CRP object; you should see a failure there that complains about an apply error on the cluster member-2:

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses}' | jq
    # The command above uses JSON paths to query the relevant status information
    # directly and uses the jq utility to pretty print the output JSON.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    #
    # If the output is empty, the status might have not been populated properly
    # yet. Retry in a few seconds; you may also want to switch the output type
    # from jsonpath to yaml to see the full object.
    

    The output should look like this:

    {
        "clusterName": "member-1",
        "conditions": [
            ...
            {
                ...
                "status": "True",
                "type": "Applied"
            }
        ]
    },
    {
        "clusterName": "member-2",
        "conditions": [
            ...
            {
                ...
                "status": "False",
                "type": "Applied"
            }
        ],
        "failedPlacements": ...
    }
    
  • You can take a look at the failedPlacements part in the placement status for error details:

    The output should look like this:

    [
        {
            "condition": {
                "lastTransitionTime": "...",
                "message": "Failed to apply the manifest (error: no ownership of the object in the member cluster; takeover is needed)",
                "reason": "NotTakenOver",
                "status": "False",
                "type": "Applied"
            },
            "kind": "Namespace",
            "name": "work-2",
            "version": "v1"
        }
    ]
    

    Fleet finds that the namespace work-2 already exists on the member cluster and is not owned by Fleet. Since the takeover policy is set to Never, Fleet will not assume ownership of the namespace; no apply op is performed, and an apply error is raised instead.

    The following jq query can help you better locate clusters with failed placements and their failure details:

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses}' \
        | jq '[.[] | select (.failedPlacements != null)] | map({clusterName, failedPlacements})'
    # The command above uses JSON paths to retrieve the relevant status information
    # directly and uses the jq utility to query the data.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    

    It would filter out all the clusters that do not have failures and report only the failed clusters with the failure details:

    {
        "clusterName": "member-2",
        "failedPlacements": [
            {
                "condition": {
                    "lastTransitionTime": "...",
                    "message": "Failed to apply the manifest (error: no ownership of the object in the member cluster; takeover is needed)",
                    "reason": "NotTakenOver",
                    "status": "False",
                    "type": "Applied"
                },
                "kind": "Namespace",
                "name": "work-2",
                "version": "v1"
            }
        ]
    }
    
  • Next, update the CRP object and set the whenToTakeOver field to IfNoDiff:

    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work-2
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work-2.
          labelSelector:
            matchLabels:
              app: work-2
      policy:
        placementType: PickAll
      strategy:
        # For simplicity reasons, the CRP is configured to roll out changes to
        # all member clusters at once. This is not a setup recommended for production
        # use.      
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1
        applyStrategy:
          whenToTakeOver: IfNoDiff
    EOF
    
  • Give Fleet a few seconds to handle the placement. Check the status of the CRP object; you should see that the apply op still fails.

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2
    
  • Verify the error details reported in the failedPlacements field once more:

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses}' \
        | jq '[.[] | select (.failedPlacements != null)] | map({clusterName, failedPlacements})'
    # The command above uses JSON paths to retrieve the relevant status information
    # directly and uses the jq utility to query the data.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    

    The output has changed:

    {
        "clusterName": "member-2",
        "failedPlacements": [
            {
                "condition": {
                    "lastTransitionTime": "...",
                    "message": "Failed to apply the manifest (error: cannot take over object: configuration differences are found between the manifest object and the corresponding object in the member cluster)",
                    "reason": "FailedToTakeOver",
                    "status": "False",
                    "type": "Applied"
                },
                "kind": "Namespace",
                "name": "work-2",
                "version": "v1"
            }
        ]
    }
    

    Now, with the takeover policy set to IfNoDiff, Fleet can assume ownership of pre-existing resources; however, as a configuration difference has been found between the hub cluster and the member cluster, takeover is blocked.

  • Similar to the drift detection mechanism, Fleet will report details about the found configuration differences as well. You can learn about them in the diffedPlacements part of the status.

    Use the jq query below to list all clusters with the diffedPlacements status information populated:

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses}' \
        | jq '[.[] | select (.diffedPlacements != null)] | map({clusterName, diffedPlacements})'
    # The command above uses JSON paths to retrieve the relevant status information
    # directly and uses the jq utility to query the data.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    
    {
        "clusterName": "member-2",
        "diffedPlacements": [
            {
                "firstDiffedObservedTime": "...",
                "group": "",
                "version": "v1",
                "kind": "Namespace",    
                "name": "work-2",
                "observationTime": "...",
                "observedDiffs": [
                    {
                        "path": "/metadata/labels/owner",
                        "valueInHub": "redfield",
                        "valueInMember": "wesker"
                    }
                ],
                "targetClusterObservedGeneration": 0    
            }
        ]
    }
    

    Fleet will report the following information about a configuration difference:

    • group, kind, version, namespace, and name: the resource that has configuration differences.
    • observationTime: the timestamp at which the current diff detail is collected.
    • firstDiffedObservedTime: the timestamp at which the current diff is first observed.
    • observedDiffs: the diff details, specifically:
      • path: A JSON path (RFC 6901) that points to the diff’d field;
      • valueInHub: the value at the JSON path as seen from the hub cluster resource template (the desired state). If this value is absent, the field does not exist in the resource template.
      • valueInMember: the value at the JSON path as seen from the member cluster resource (the current state). If this value is absent, the field does not exist in the current state.
    • targetClusterObservedGeneration: the generation of the member cluster resource.
  • To fix the configuration difference, consider one of the following options:

    • Switch the whenToTakeOver setting back to Always, which will instruct Fleet to take over the resource right away and overwrite all configuration differences; or
    • Edit the diff’d field directly on the member cluster side, so that the value is consistent with that on the hub cluster; Fleet will periodically re-evaluate diffs and should take over the resource soon after.
    • Delete the resource from the member cluster. Fleet will then re-apply the resource template and re-create the resource.

    Here the guide will take the first option available, setting the whenToTakeOver field to Always:

    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work-2
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work-2.
          labelSelector:
            matchLabels:
              app: work-2
      policy:
        placementType: PickAll
      strategy:
        # For simplicity reasons, the CRP is configured to roll out changes to
        # all member clusters at once. This is not a setup recommended for production
        # use.      
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1
        applyStrategy:
          whenToTakeOver: Always
    EOF
    
  • Check the CRP status; in a few seconds, Fleet will report that all objects have been applied.

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2
    

    If you switch to the member cluster member-2 now, you should see that the object looks exactly the same as the resource template kept on the hub cluster; the owner label has been overwritten.

Important

When Fleet fails to take over an object, the pre-existing resource will not be put under Fleet’s management: any change made on the hub cluster side will have no effect on the pre-existing resource. If you choose to delete the resource template, or remove the CRP object, Fleet will not attempt to delete the pre-existing resource.

Takeover and comparison options

Fleet provides a comparisonOption setting that allows you to fine-tune how Fleet calculates configuration differences between a resource template created on the hub cluster and the corresponding pre-existing resource on a member cluster.

Note

The comparisonOption setting also controls how Fleet detects drifts. See the how-to guide on drift detection for more information.

If partialComparison is used, Fleet will only report configuration differences in managed fields, i.e., fields that are explicitly specified in the resource template; the presence of additional fields on the member cluster side will not stop Fleet from taking over the pre-existing resource. With fullComparison, by contrast, Fleet will only take over a pre-existing resource if it looks exactly the same as its hub cluster counterpart.
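
For example, to require an exact match before Fleet takes over a pre-existing resource, the two settings can be combined in the apply strategy; a minimal sketch of the relevant stanza:

strategy:
  applyStrategy:
    whenToTakeOver: IfNoDiff
    comparisonOption: fullComparison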

Below is a table that summarizes the combos of different options and their respective effects:

| whenToTakeOver setting | comparisonOption setting | Configuration difference scenario | Outcome |
| --- | --- | --- | --- |
| IfNoDiff | partialComparison | There exists a value difference in a managed field between a pre-existing resource on a member cluster and the hub cluster resource template. | Fleet will report an apply error in the status, plus the diff details. |
| IfNoDiff | partialComparison | The pre-existing resource has a field that is absent on the hub cluster resource template. | Fleet will take over the resource; the configuration difference in the unmanaged field will be left untouched. |
| IfNoDiff | fullComparison | Difference has been found on a field, managed or not. | Fleet will report an apply error in the status, plus the diff details. |
| Always | Any option | Difference has been found on a field, managed or not. | Fleet will take over the resource; configuration differences in unmanaged fields will be left untouched. |

11 - Enabling Drift Detection in Fleet

How to enable drift detection in Fleet

This guide provides an overview on how to enable drift detection in Fleet. This feature can help developers and admins identify (and act upon) configuration drifts in their KubeFleet system, which are often brought by temporary fixes, inadvertent changes, and failed automations.

Before you begin

The new drift detection experience is currently in preview.

Note that the APIs for the new experience are only available in the Fleet v1beta1 API, not the v1 API. If you do not see the new APIs in command outputs, verify that you are explicitly requesting the v1beta1 API objects, as opposed to the v1 API objects (the default).

What is a drift?

A drift occurs when a non-Fleet agent (e.g., a developer or a controller) makes changes to a field of a Fleet-managed resource directly on the member cluster side without modifying the corresponding resource template created on the hub cluster.

See the steps below for an example; the code assumes that you have a fleet of two clusters, member-1 and member-2.

  • Switch to the hub cluster in the preview environment:

    kubectl config use-context hub-admin
    
  • Create a namespace, work, on the hub cluster, with some labels:

    kubectl create ns work
    kubectl label ns work app=work
    kubectl label ns work owner=redfield
    
  • Create a CRP object, which places the namespace on all member clusters:

    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work.      
          labelSelector:
            matchLabels:
              app: work
      policy:
        placementType: PickAll
      strategy:
        # For simplicity reasons, the CRP is configured to roll out changes to
        # all member clusters at once. This is not a setup recommended for production
        # use.         
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1            
    EOF
    
  • Fleet should be able to finish the placement within seconds. To verify the progress, run the command below:

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work
    

    Confirm that in the output, Fleet has reported that the placement is of the Available state.

  • Switch to the first member cluster, member-1:

    kubectl config use-context member-1-admin
    
  • You should see the namespace, work, being placed in this member cluster:

    kubectl get ns work --show-labels
    

    The output should look as follows; note that all the labels have been set (the kubernetes.io/metadata.name label is added by the Kubernetes system automatically):

    NAME     STATUS   AGE   LABELS
    work     Active   91m   app=work,owner=redfield,kubernetes.io/metadata.name=work
    
  • Anyone with proper access to the member cluster could modify the namespace as they want; for example, one can set the owner label to a different value and add a new label:

    kubectl label ns work owner=wesker --overwrite
    kubectl label ns work use=hack --overwrite
    

    Now the namespace has drifted from its intended state.

Note that drifts are not necessarily a bad thing: to ensure system availability, developers and admins often need to make ad-hoc changes to the system; for example, one might need to set a Deployment on a member cluster to use a different image from its template (as kept on the hub cluster) to test a fix. By default, Fleet is not drift-aware, which means that it will simply re-apply the resource template periodically, with or without drifts.

In the case above:

  • Since the owner label has been set on the resource template, its value would be overwritten by Fleet, from wesker to redfield, within minutes. This provides a great consistency guarantee but also blocks out all possibilities of expedient fixes/changes, which can be an inconvenience at times.

  • The use label is not a part of the resource template, so it will not be affected by any apply op performed by Fleet. Its prolonged presence might pose an issue, depending on the nature of the setup.

How Fleet can be used to handle drifts gracefully

Fleet aims to provide an experience that:

  • ✅ allows developers and admins to make changes on the member cluster side when necessary; and
  • ✅ helps developers and admins to detect drifts, esp. long-living ones, in their systems, so that they can be handled properly; and
  • ✅ grants developers and admins great flexibility on when and how drifts should be handled.

To enable the new experience, set proper apply strategies in the CRP object, as illustrated by the steps below:

  • Switch to the hub cluster:

    kubectl config use-context hub-admin
    
  • Update the existing CRP (work), to use an apply strategy with the whenToApply field set to IfNotDrifted:

    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work. 
          labelSelector:
            matchLabels:
              app: work
      policy:
        placementType: PickAll
      strategy:
        applyStrategy:
          whenToApply: IfNotDrifted
        # For simplicity reasons, the CRP is configured to roll out changes to
        # all member clusters at once. This is not a setup recommended for production
        # use.      
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1                
    EOF
    

    The whenToApply field features two options:

    • Always: this is the default option 😑. With this setting, Fleet will periodically apply the resource templates from the hub cluster to member clusters, with or without drifts. This is consistent with the behavior before the new drift detection and takeover experience was added.
    • IfNotDrifted: this is the new option ✨ provided by the drift detection mechanism. With this setting, Fleet will check for drifts periodically; if drifts are found, Fleet will stop applying the resource templates and report them in the CRP status.
  • Switch to the first member cluster and edit the labels for a second time, effectively re-introducing a drift into the system. After that is done, switch back to the hub cluster:

    kubectl config use-context member-1-admin
    kubectl label ns work owner=wesker --overwrite
    kubectl label ns work use=hack --overwrite
    #
    kubectl config use-context hub-admin
    
  • Fleet should be able to find the drifts swiftly (within a few seconds). Inspect the placement status Fleet reports for each cluster:

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work -o jsonpath='{.status.placementStatuses}' | jq
    # The command above uses JSON paths to query the relevant status information
    # directly and uses the jq utility to pretty print the output JSON.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    #
    # If the output is empty, the status might have not been populated properly
    # yet. Retry in a few seconds; you may also want to switch the output type
    # from jsonpath to yaml to see the full object.
    

    The output should look like this:

    {
        "clusterName": "member-1",
        "conditions": [
            ...
            {
                ...
                "status": "False",
                "type": "Applied"
            }
        ],
        "driftedPlacements": [
            {
                "firstDriftedObservedTime": "...",
                "kind": "Namespace",
                "name": "work",
                "observationTime": "...",
                "observedDrifts": [
                    {
                        "path": "/metadata/labels/owner",
                        "valueInHub": "redfield",
                        "valueInMember": "wesker"
                    }
                ],
                "targetClusterObservedGeneration": 0,
                "version": "v1"
            }
        ],
        "failedPlacements": [
            {
                "condition": {
                    "lastTransitionTime": "...",
                    "message": "Failed to apply the manifest (error: cannot apply manifest: drifts are found between the manifest and the object from the member cluster)",
                    "reason": "FoundDrifts",
                    "status": "False",
                    "type": "Applied"
                },
                "kind": "Namespace",
                "name": "work",
                "version": "v1"
            }
        ]
    },
    {
        "clusterName": "member-2",
        "conditions": [...]
    }
    

    You should see that cluster member-1 has encountered an apply failure. The failedPlacements part explains exactly which manifests have failed on member-1 and why; in this case, the apply op fails because Fleet finds that the namespace work has drifted from its intended state. The driftedPlacements part specifies in detail which fields have drifted and the value differences between the hub cluster and the member cluster.

    Fleet will report the following information about a drift:

    • group, kind, version, namespace, and name: the resource that has drifted from its desired state.
    • observationTime: the timestamp at which the current drift detail is collected.
    • firstDriftedObservedTime: the timestamp at which the current drift is first observed.
    • observedDrifts: the drift details, specifically:
      • path: A JSON path (RFC 6901) that points to the drifted field;
      • valueInHub: the value at the JSON path as seen from the hub cluster resource template (the desired state). If this value is absent, the field does not exist in the resource template.
      • valueInMember: the value at the JSON path as seen from the member cluster resource (the current state). If this value is absent, the field does not exist in the current state.
    • targetClusterObservedGeneration: the generation of the member cluster resource.

    The following jq query can help you better extract the drifted clusters and the drift details from the CRP status output:

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work -o jsonpath='{.status.placementStatuses}' \
        | jq '[.[] | select (.driftedPlacements != null)] | map({clusterName, driftedPlacements})'
    # The command above uses JSON paths to query the relevant status information
    # directly and uses the jq utility to pretty print the output JSON.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    

    This query would filter out all the clusters that do not have drifts and report only the drifted clusters with the drift details:

    {
        "clusterName": "member-1",
        "driftedPlacements": [
            {
                "firstDriftedObservedTime": "...",
                "kind": "Namespace",
                "name": "work",
                "observationTime": "...",
                "observedDrifts": [
                    {
                        "path": "/metadata/labels/owner",
                        "valueInHub": "redfield",
                        "valueInMember": "wesker"
                    }
                ],
                "targetClusterObservedGeneration": 0,
                "version": "v1"
            }
        ]
    }
    
  • To fix the drift, consider one of the following options:

    • Switch the whenToApply setting back to Always, which will instruct Fleet to overwrite the drifts using values from the hub cluster resource template; or
    • Edit the drifted field directly on the member cluster side, so that the value is consistent with that on the hub cluster (see the example after the note below); Fleet will periodically re-evaluate drifts and should report that no drifts are found soon after.
    • Delete the resource from the member cluster. Fleet will then re-apply the resource template and re-create the resource.

    Important:

    The presence of drifts will NOT stop Fleet from rolling out newer resource versions. If you choose to edit the resource template on the hub cluster, Fleet will always apply the new resource template in the rollout process, which may also resolve the drift.
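
    For instance, the second option above can be carried out by resetting the drifted label directly on the member cluster; the commands below reuse the cluster contexts from earlier in this guide:

    kubectl config use-context member-1-admin
    kubectl label ns work owner=redfield --overwrite
    # Optionally, remove the unmanaged label that was added earlier as well.
    kubectl label ns work use-
    kubectl config use-context hub-admin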

Comparison options

One may have noticed that the namespace on the member cluster has another drift, the label use=hack, which is not reported in the CRP status by Fleet. This is because, by default, Fleet compares only managed fields, i.e., fields that are explicitly specified in the resource template. If a field is not populated on the hub cluster side, Fleet will not recognize its presence on the member cluster side as a drift. This allows controllers on the member cluster side to manage some fields automatically without Fleet’s involvement; for example, one might want to use an HPA solution to auto-scale Deployments as appropriate and consequently decide not to include the .spec.replicas field in the resource template.

Fleet recognizes that there might be cases where developers and admins would like to have their resources look exactly the same across their fleet. If this scenario applies, one can change the comparisonOption field in the apply strategy from partialComparison (the default) to fullComparison:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: work
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      labelSelector:
        matchLabels:
          app: work
  policy:
    placementType: PickAll
  strategy:
    applyStrategy:
      whenToApply: IfNotDrifted
      comparisonOption: fullComparison

With this setting, Fleet will recognize the presence of any unmanaged fields (i.e., fields that are present on the member cluster side, but not set on the hub cluster side) as drifts as well. If anyone adds a field to a Fleet-managed object directly on the member cluster, it will trigger an apply error, the details of which you can inspect in the same way as illustrated in the section above.

Summary

Below is a summary of the synergy between the whenToApply and comparisonOption settings:

| whenToApply setting | comparisonOption setting | Drift scenario | Outcome |
| --- | --- | --- | --- |
| IfNotDrifted | partialComparison | A managed field (i.e., a field that has been explicitly set in the hub cluster resource template) is edited. | Fleet will report an apply error in the status, plus the drift details. |
| IfNotDrifted | partialComparison | An unmanaged field (i.e., a field that has not been explicitly set in the hub cluster resource template) is edited/added. | N/A; the change is left untouched, and Fleet will ignore it. |
| IfNotDrifted | fullComparison | Any field is edited/added. | Fleet will report an apply error in the status, plus the drift details. |
| Always | partialComparison | A managed field (i.e., a field that has been explicitly set in the hub cluster resource template) is edited. | N/A; the change is overwritten shortly. |
| Always | partialComparison | An unmanaged field (i.e., a field that has not been explicitly set in the hub cluster resource template) is edited/added. | N/A; the change is left untouched, and Fleet will ignore it. |
| Always | fullComparison | Any field is edited/added. | The change on managed fields will be overwritten shortly; Fleet will report drift details about changes on unmanaged fields, but this is not considered an apply error. |

12 - Using the ReportDiff Apply Mode

How to use the ReportDiff apply mode

This guide provides an overview on how to use the ReportDiff apply mode, which allows one to easily evaluate how things will change in the system without the risk of incurring unexpected changes. In this mode, Fleet will check for configuration differences between the hub cluster resource templates and their corresponding resources on the member clusters, but will not perform any apply op. This is most helpful in cases of experimentation and drift/diff analysis.

How the ReportDiff mode can help

To use this mode, simply set the type field in the apply strategy part of the CRP API from ClientSideApply (the default) or ServerSideApply to ReportDiff. Configuration differences are checked according to the comparisonOption setting, consistent with the behavior documented in the drift detection how-to guide; see that guide for more information.
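
A minimal sketch of the relevant stanza in the CRP spec (the full example later in this guide shows it in context):

strategy:
  applyStrategy:
    type: ReportDiff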

The steps below might help explain the workflow better; they assume that you have a fleet of two member clusters, member-1 and member-2:

  • Switch to the hub cluster and create a namespace, work-3, with some labels.

    kubectl config use-context hub-admin
    kubectl create ns work-3
    kubectl label ns work-3 app=work-3
    kubectl label ns work-3 owner=leon
    
  • Create a CRP object that places the namespace to all member clusters:

    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work-3
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work-3. 
          labelSelector:
            matchLabels:
              app: work-3
      policy:
        placementType: PickAll
      strategy:
        # For simplicity reasons, the CRP is configured to roll out changes to
        # all member clusters at once. This is not a setup recommended for production
        # use.      
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1
    EOF
    
  • In a few seconds, Fleet will complete the placement. Verify that the CRP is available by checking its status.

  • After the CRP becomes available, edit its apply strategy and set it to use the ReportDiff mode:

    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work-3
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work-3. 
          labelSelector:
            matchLabels:
              app: work-3
      policy:
        placementType: PickAll
      strategy:
        # For simplicity reasons, the CRP is configured to roll out changes to
        # all member clusters at once. This is not a setup recommended for production
        # use.      
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1
        applyStrategy:
          type: ReportDiff   
    EOF
    
  • The CRP should remain available, as currently there is no configuration difference at all. Check the ClusterResourcePlacementDiffReported condition in the status; it should report no error:

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-3 -o jsonpath='{.status.conditions[?(@.type=="ClusterResourcePlacementDiffReported")]}' | jq
    # The command above uses JSON paths to query the drift details directly and
    # uses the jq utility to pretty print the output JSON.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    #
    # If the output is empty, the status might have not been populated properly
    # yet. You can switch the output type from jsonpath to yaml to see the full
    # object.
    
    {
      "lastTransitionTime": "2025-03-19T06:45:58Z",
      "message": "Diff reporting in 2 cluster(s) has been completed",
      "observedGeneration": ...,
      "reason": "DiffReportingCompleted",
      "status": "True",
      "type": "ClusterResourcePlacementDiffReported"
    }
    
  • Now, switch to the second member cluster and make a label change on the applied namespace. After the change is done, switch back to the hub cluster.

    kubectl config use-context member-2-admin
    kubectl label ns work-3 owner=krauser --overwrite
    #
    kubectl config use-context hub-admin
    
  • Fleet will detect this configuration difference shortly (within 15 seconds). Verify that the diff details have been added to the CRP status, specifically reported in the diffedPlacements part of the status; the jq query below will list all the clusters with the diffedPlacements status information populated:

    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-3 -o jsonpath='{.status.placementStatuses}' \
        | jq '[.[] | select (.diffedPlacements != null)] | map({clusterName, diffedPlacements})'
    # The command above uses JSON paths to retrieve the relevant status information
    # directly and uses the jq utility to query the data.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    

    The output should be as follows:

    {
        "clusterName": "member-2",
        "diffedPlacements": [
            {
                "firstDiffedObservedTime": "2025-03-19T06:49:54Z",
                "kind": "Namespace",
                "name": "work-3",
                "observationTime": "2025-03-19T06:50:25Z",
                "observedDiffs": [
                    {
                        "path": "/metadata/labels/owner",
                        "valueInHub": "leon",
                        "valueInMember": "krauser"
                    }
                ],
                "targetClusterObservedGeneration": 0,
                "version": "v1" 
            }
        ]
    }
    

    Fleet will report the following information about a configuration difference:

    • group, kind, version, namespace, and name: the resource that has configuration differences.
    • observationTime: the timestamp at which the current diff detail is collected.
    • firstDiffedObservedTime: the timestamp at which the current diff is first observed.
    • observedDiffs: the diff details, specifically:
      • path: A JSON path (RFC 6901) that points to the diff’d field;
      • valueInHub: the value at the JSON path as seen from the hub cluster resource template (the desired state). If this value is absent, the field does not exist in the resource template.
      • valueInMember: the value at the JSON path as seen from the member cluster resource (the current state). If this value is absent, the field does not exist in the current state.
    • targetClusterObservedGeneration: the generation of the member cluster resource.

More information on the ReportDiff mode

  • As mentioned earlier, with this mode no apply op will be run at all; it is up to the user to decide the best way to handle any configuration differences that are found.
  • Diff reporting becomes successful and complete as soon as Fleet finishes checking all the resources; whether configuration differences are found or not has no effect on the diff reporting success status.
    • When a resource change is applied on the hub cluster side, for CRPs in the ReportDiff mode the change will be rolled out immediately to all member clusters that have completed their earlier round of diff reporting (when the rollout strategy is set to RollingUpdate, the default type).
  • It is worth noting that Fleet will only report differences on resources that have corresponding manifests on the hub cluster. If, for example, a namespace-scoped object has been created on the member cluster but not on the hub cluster, Fleet will ignore the object, even if its owner namespace has been selected for placement.

13 - How to Roll Out and Roll Back Changes in Stage

How to roll out and roll back changes with the ClusterStagedUpdateRun API

This how-to guide demonstrates how to use ClusterStagedUpdateRun to roll out resources to member clusters in a staged manner and roll back resources to a previous version.

Prerequisite

A ClusterStagedUpdateRun CR is used to deploy resources from the hub cluster to member clusters with a ClusterResourcePlacement (or CRP) in a stage-by-stage manner. This tutorial is based on a demo fleet environment with 3 member clusters:

| cluster name | labels |
| --- | --- |
| member1 | environment=canary, order=2 |
| member2 | environment=staging |
| member3 | environment=canary, order=1 |

To demonstrate the rollout and rollback behavior, we create a demo namespace and a sample configmap with very simple data on the hub cluster. The namespace, along with the configmap, will be deployed to the member clusters.

kubectl create ns test-namespace
kubectl create cm test-cm --from-literal=key=value1 -n test-namespace

Now we create a ClusterResourcePlacement to deploy the resources:

kubectl apply -f - << EOF
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: example-placement
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      name: test-namespace
      version: v1
  policy:
    placementType: PickAll
  strategy:
    type: External
EOF

Note that spec.strategy.type is set to External so that the rollout is triggered by a ClusterStagedUpdateRun. All three member clusters should be scheduled since we use the PickAll policy, but at the moment no resources should be deployed on the member clusters because we haven’t created a ClusterStagedUpdateRun yet. The CRP is not AVAILABLE yet.

kubectl get crp example-placement
NAME                GEN   SCHEDULED   SCHEDULED-GEN   AVAILABLE   AVAILABLE-GEN   AGE
example-placement   1     True        1                                           8s

Check resource snapshot versions

Fleet keeps a list of resource snapshots for version control and auditing (for more details, please refer to the api-reference).

To check current resource snapshots:

kubectl get clusterresourcesnapshots --show-labels
NAME                           GEN   AGE     LABELS
example-placement-0-snapshot   1     7m31s   kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0

We only have one version of the snapshot. It is the current latest (kubernetes-fleet.io/is-latest-snapshot=true) and has resource-index 0 (kubernetes-fleet.io/resource-index=0).

Now we modify our configmap with a new value, value2:

kubectl edit cm test-cm -n test-namespace
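
If you prefer a non-interactive edit, a merge patch achieves the same result:

kubectl patch configmap test-cm -n test-namespace --type merge -p '{"data":{"key":"value2"}}'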

kubectl get configmap test-cm -n test-namespace -o yaml
apiVersion: v1
data:
  key: value2     # value updated here, old value: value1
kind: ConfigMap
metadata:
  creationTimestamp: ...
  name: test-cm
  namespace: test-namespace
  resourceVersion: ...
  uid: ...

Listing the snapshots again now shows 2 versions, with index 0 and 1 respectively:

kubectl get clusterresourcesnapshots --show-labels
NAME                           GEN   AGE    LABELS
example-placement-0-snapshot   1     17m    kubernetes-fleet.io/is-latest-snapshot=false,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0
example-placement-1-snapshot   1     2m2s   kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=1

The is-latest-snapshot label is now set on example-placement-1-snapshot, which contains the latest configmap data:

kubectl get clusterresourcesnapshots example-placement-1-snapshot -o yaml
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourceSnapshot
metadata:
  ...
  labels:
    kubernetes-fleet.io/is-latest-snapshot: "true"
    kubernetes-fleet.io/parent-CRP: example-placement
    kubernetes-fleet.io/resource-index: "1"
  name: example-placement-1-snapshot
  ...
spec:
  selectedResources:
  - apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: test-namespace
      name: test-namespace
    spec:
      finalizers:
      - kubernetes
  - apiVersion: v1
    data:
      key: value2 # latest value: value2, old value: value1
    kind: ConfigMap
    metadata:
      name: test-cm
      namespace: test-namespace

Deploy a ClusterStagedUpdateStrategy

A ClusterStagedUpdateStrategy defines the orchestration pattern that groups clusters into stages and specifies the rollout sequence. It selects member clusters by labels. For our demonstration, we create one with two stages:

kubectl apply -f - << EOF
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateStrategy
metadata:
  name: example-strategy
spec:
  stages:
    - name: staging
      labelSelector:
        matchLabels:
          environment: staging
      afterStageTasks:
        - type: TimedWait
          waitTime: 1m
    - name: canary
      labelSelector:
        matchLabels:
          environment: canary
      sortingLabelKey: order
      afterStageTasks:
        - type: Approval
EOF

Deploy a ClusterStagedUpdateRun to roll out the latest change

A ClusterStagedUpdateRun executes the rollout of a ClusterResourcePlacement following a ClusterStagedUpdateStrategy. To trigger the staged update run for our CRP, we create a ClusterStagedUpdateRun specifying the CRP name, updateRun strategy name, and the latest resource snapshot index (“1”):

kubectl apply -f - << EOF
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  name: example-run
spec:
  placementName: example-placement
  resourceSnapshotIndex: "1"
  stagedRolloutStrategyName: example-strategy
EOF

The staged update run is initialized and running:

kubectl get csur example-run
NAME          PLACEMENT           RESOURCE-SNAPSHOT   POLICY-SNAPSHOT   INITIALIZED   SUCCEEDED   AGE
example-run   example-placement   1                   0                 True                      44s

A more detailed look at the status:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  ...
  name: example-run
  ...
spec:
  placementName: example-placement
  resourceSnapshotIndex: "1"
  stagedRolloutStrategyName: example-strategy
status:
  conditions:
  - lastTransitionTime: ...
    message: ClusterStagedUpdateRun initialized successfully
    observedGeneration: 1
    reason: UpdateRunInitializedSuccessfully
    status: "True" # the updateRun is initialized successfully
    type: Initialized
  - lastTransitionTime: ...
    message: ""
    observedGeneration: 1
    reason: UpdateRunStarted
    status: "True"
    type: Progressing # the updateRun is still running
  deletionStageStatus:
    clusters: [] # no clusters need to be cleaned up
    stageName: kubernetes-fleet.io/deleteStage
  policyObservedClusterCount: 3 # number of clusters to be updated
  policySnapshotIndexUsed: "0"
  stagedUpdateStrategySnapshot: # snapshot of the strategy
    stages:
    - afterStageTasks:
      - type: TimedWait
        waitTime: 1m0s
      labelSelector:
        matchLabels:
          environment: staging
      name: staging
    - afterStageTasks:
      - type: Approval
      labelSelector:
        matchLabels:
          environment: canary
      name: canary
      sortingLabelKey: order
  stagesStatus: # detailed status for each stage
  - afterStageTaskStatus:
    - conditions:
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: AfterStageTaskWaitTimeElapsed
        status: "True" # the wait after-stage task has completed
        type: WaitTimeElapsed
      type: TimedWait
    clusters:
    - clusterName: member2 # stage staging contains member2 cluster only
      conditions:
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True" # member2 is updated successfully
        type: Succeeded
    conditions:
    - lastTransitionTime: ...
      message: ""
      observedGeneration: 1
      reason: StageUpdatingWaiting
      status: "False"
      type: Progressing
    - lastTransitionTime: ...
      message: ""
      observedGeneration: 1
      reason: StageUpdatingSucceeded
      status: "True" # stage staging has completed successfully
      type: Succeeded
    endTime: ...
    stageName: staging
    startTime: ...
  - afterStageTaskStatus:
    - approvalRequestName: example-run-canary # ClusterApprovalRequest name for this stage
      type: Approval
    clusters:
    - clusterName: member3 # according to the labelSelector and sortingLabelKey, member3 is selected first in this stage
      conditions:
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True" # member3 update is completed
        type: Succeeded
    - clusterName: member1 # member1 is selected after member3 because of order=2 label
      conditions:
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True" # member1 update has not finished yet
        type: Started
    conditions:
    - lastTransitionTime: ...
      message: ""
      observedGeneration: 1
      reason: StageUpdatingStarted
      status: "True" # stage canary is still executing
      type: Progressing
    stageName: canary
    startTime: ...

Wait a little longer, and we can see that stage canary finishes updating its clusters and is waiting for the Approval task. We can check the generated ClusterApprovalRequest, which is not yet approved:

kubectl get clusterapprovalrequest
NAME                 UPDATE-RUN    STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-canary   example-run   canary                                 2m2s

We can approve the ClusterApprovalRequest by patching its status:

kubectl patch clusterapprovalrequests example-run-canary --type=merge -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"lgtm","message":"lgtm","lastTransitionTime":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","observedGeneration":1}]}}' --subresource=status
clusterapprovalrequest.placement.kubernetes-fleet.io/example-run-canary patched

This can be done equivalently by creating a JSON patch file and applying it:

cat << EOF > approval.json
{
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
                "message": "lgtm",
                "observedGeneration": 1,
                "reason": "lgtm",
                "status": "True",
                "type": "Approved"
            }
        ]
    }
}
EOF
kubectl patch clusterapprovalrequests example-run-canary --type='merge' --subresource=status --patch-file approval.json

Then verify it’s approved:

kubectl get clusterapprovalrequest
NAME                 UPDATE-RUN    STAGE    APPROVED   APPROVALACCEPTED   AGE
example-run-canary   example-run   canary   True       True               2m30s

The updateRun is now able to proceed and complete:

kubectl get csur example-run
NAME          PLACEMENT           RESOURCE-SNAPSHOT   POLICY-SNAPSHOT   INITIALIZED   SUCCEEDED   AGE
example-run   example-placement   1                   0                 True          True        4m22s

The CRP also shows that the rollout has completed and resources are available on all member clusters:

kubectl get crp example-placement
NAME                GEN   SCHEDULED   SCHEDULED-GEN   AVAILABLE   AVAILABLE-GEN   AGE
example-placement   1     True        1               True        1               134m

The configmap test-cm should be deployed on all 3 member clusters, with the latest data:

data:
  key: value2
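
To double-check, you can read the configmap value from each member cluster. The kubeconfig context names in the loop below are assumptions made for illustration; substitute the contexts you use for your member clusters:

# Context names are illustrative; replace them with your own member cluster contexts.
for ctx in member1 member2 member3; do
  kubectl --context "$ctx" get configmap test-cm -n test-namespace -o jsonpath='{.data.key}'
  echo
done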

Deploy a second ClusterStagedUpdateRun to roll back to a previous version

Now suppose the workload admin wants to roll back the configmap change, reverting the value value2 back to value1. Instead of manually updating the configmap from the hub, they can create a new ClusterStagedUpdateRun with a previous resource snapshot index, “0” in our case, and reuse the same strategy:

kubectl apply -f - << EOF
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  name: example-run-2
spec:
  placementName: example-placement
  resourceSnapshotIndex: "0"
  stagedRolloutStrategyName: example-strategy
EOF

Following the same steps as for the first updateRun (including approving the ClusterApprovalRequest for the canary stage), the second updateRun should also succeed. The complete status is shown below:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
  ...
  name: example-run-2
  ...
spec:
  placementName: example-placement
  resourceSnapshotIndex: "0"
  stagedRolloutStrategyName: example-strategy
status:
  conditions:
  - lastTransitionTime: ...
    message: ClusterStagedUpdateRun initialized successfully
    observedGeneration: 1
    reason: UpdateRunInitializedSuccessfully
    status: "True"
    type: Initialized
  - lastTransitionTime: ...
    message: ""
    observedGeneration: 1
    reason: UpdateRunStarted
    status: "True"
    type: Progressing
  - lastTransitionTime: ...
    message: ""
    observedGeneration: 1
    reason: UpdateRunSucceeded # updateRun succeeded 
    status: "True"
    type: Succeeded
  deletionStageStatus:
    clusters: []
    conditions:
    - lastTransitionTime: ...
      message: ""
      observedGeneration: 1
      reason: StageUpdatingStarted
      status: "True"
      type: Progressing
    - lastTransitionTime: ...
      message: ""
      observedGeneration: 1
      reason: StageUpdatingSucceeded
      status: "True" # no clusters in the deletion stage, it completes directly
      type: Succeeded
    endTime: ...
    stageName: kubernetes-fleet.io/deleteStage
    startTime: ...
  policyObservedClusterCount: 3
  policySnapshotIndexUsed: "0"
  stagedUpdateStrategySnapshot:
    stages:
    - afterStageTasks:
      - type: TimedWait
        waitTime: 1m0s
      labelSelector:
        matchLabels:
          environment: staging
      name: staging
    - afterStageTasks:
      - type: Approval
      labelSelector:
        matchLabels:
          environment: canary
      name: canary
      sortingLabelKey: order
  stagesStatus:
  - afterStageTaskStatus:
    - conditions:
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: AfterStageTaskWaitTimeElapsed
        status: "True"
        type: WaitTimeElapsed
      type: TimedWait
    clusters:
    - clusterName: member2
      conditions:
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True"
        type: Succeeded
    conditions:
    - lastTransitionTime: ...
      message: ""
      observedGeneration: 1
      reason: StageUpdatingWaiting
      status: "False"
      type: Progressing
    - lastTransitionTime: ...
      message: ""
      observedGeneration: 1
      reason: StageUpdatingSucceeded
      status: "True"
      type: Succeeded
    endTime: ...
    stageName: staging
    startTime: ...
  - afterStageTaskStatus:
    - approvalRequestName: example-run-2-canary
      conditions:
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: AfterStageTaskApprovalRequestCreated
        status: "True"
        type: ApprovalRequestCreated
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: AfterStageTaskApprovalRequestApproved
        status: "True"
        type: ApprovalRequestApproved
      type: Approval
    clusters:
    - clusterName: member3
      conditions:
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True"
        type: Succeeded
    - clusterName: member1
      conditions:
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingStarted
        status: "True"
        type: Started
      - lastTransitionTime: ...
        message: ""
        observedGeneration: 1
        reason: ClusterUpdatingSucceeded
        status: "True"
        type: Succeeded
    conditions:
    - lastTransitionTime: ...
      message: ""
      observedGeneration: 1
      reason: StageUpdatingWaiting
      status: "False"
      type: Progressing
    - lastTransitionTime: ...
      message: ""
      observedGeneration: 1
      reason: StageUpdatingSucceeded
      status: "True"
      type: Succeeded
    endTime: ...
    stageName: canary
    startTime: ...

The configmap test-cm should be updated on all 3 member clusters, with the old data:

data:
  key: value1

14 - Evicting Resources and Setting up Disruption Budgets

How to evict resources from a cluster and set up disruption budgets to protect against untimely evictions

This how-to guide discusses how to create ClusterResourcePlacementEviction objects and ClusterResourcePlacementDisruptionBudget objects to evict resources from member clusters and protect resources on member clusters from voluntary disruption, respectively.

Evicting Resources from Member Clusters using ClusterResourcePlacementEviction

The ClusterResourcePlacementEviction object is used to remove resources from a member cluster once the resources have already been propagated from the hub cluster.

To successfully evict resources from a cluster, the user needs to specify:

  • The name of the ClusterResourcePlacement object which propagated resources to the target cluster.
  • The name of the target cluster from which we need to evict resources.

In this example, we will create a ClusterResourcePlacement object with a PickN placement policy to propagate resources to an existing MemberCluster, add a taint to the member cluster resource, and then create a ClusterResourcePlacementEviction object to evict resources from the MemberCluster.

We will first create a namespace that we will propagate to the member cluster.

kubectl create ns test-ns

Then we will apply a ClusterResourcePlacement with the following spec:

spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickN
    numberOfClusters: 1
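
For reference, a complete manifest wrapping this spec might look like the sketch below; the name test-crp matches the status output that follows, and the API version is assumed to be the same placement.kubernetes-fleet.io/v1beta1 group used by the other objects in this guide. Save it to a file (the file name test-crp.yaml is just an example) and apply it against the hub cluster:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: test-crp
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickN
    numberOfClusters: 1

kubectl apply -f test-crp.yaml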

The CRP status after applying should look something like this:

kubectl get crp test-crp
NAME       GEN   SCHEDULED   SCHEDULED-GEN   AVAILABLE   AVAILABLE-GEN   AGE
test-crp   2     True        2               True        2               5m49s

Let’s now add a taint to the member cluster to ensure that the scheduler does not pick this cluster again once we evict resources from it.

Modify the cluster object to add a taint:

spec:
  heartbeatPeriodSeconds: 60
  identity:
    kind: ServiceAccount
    name: fleet-member-agent-cluster-1
    namespace: fleet-system
  taints:
    - effect: NoSchedule
      key: test-key
      value: test-value
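
One way to make this change is to patch the MemberCluster object on the hub cluster. The sketch below assumes the member cluster is registered as kind-cluster-1, the cluster targeted by the eviction later in this example:

kubectl patch membercluster kind-cluster-1 --type merge -p '{"spec":{"taints":[{"effect":"NoSchedule","key":"test-key","value":"test-value"}]}}'

Alternatively, run kubectl edit membercluster kind-cluster-1 and add the taints block by hand.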

Now we will create a ClusterResourcePlacementEviction object to evict resources from the member cluster:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacementEviction
metadata:
  name: test-eviction
spec:
  placementName: test-crp
  clusterName: kind-cluster-1
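
Save the manifest to a file and apply it against the hub cluster (the file name test-eviction.yaml is just an example):

kubectl apply -f test-eviction.yaml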

If the eviction was successful, the eviction object should look like this:

kubectl get crpe test-eviction
NAME            VALID   EXECUTED
test-eviction   True    True

Since the eviction was successful, the resources should have been removed from the cluster. Let’s take a look at the CRP object status to verify:

kubectl get crp test-crp
NAME       GEN   SCHEDULED   SCHEDULED-GEN   AVAILABLE   AVAILABLE-GEN   AGE
test-crp   2     True        2                                           15m

From the output we can clearly tell that the resources were evicted, since the AVAILABLE column is empty. If you need more information, check the ClusterResourcePlacement object’s status.
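
You can also confirm directly on the member cluster that the propagated namespace is gone. A sketch; the member cluster kubeconfig context is an assumption, so replace it with your own:

kubectl get namespace test-ns --context YOUR-MEMBER-CLUSTER-CONTEXT
# Expected: Error from server (NotFound): namespaces "test-ns" not found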

Protecting resources from voluntary disruptions using ClusterResourcePlacementDisruptionBudget

In this example, we will create a ClusterResourcePlacement object with a PickN placement policy to propagate resources to an existing MemberCluster, create a ClusterResourcePlacementDisruptionBudget object to protect the resources on that MemberCluster from voluntary disruption, and then try to evict the resources from the MemberCluster using a ClusterResourcePlacementEviction.

We will first create a namespace that we will propagate to the member cluster.

kubectl create ns test-ns

Then we will apply a ClusterResourcePlacement with the following spec:

spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickN
    numberOfClusters: 1

The CRP object after applying should look something like this:

kubectl get crp test-crp
NAME       GEN   SCHEDULED   SCHEDULED-GEN   AVAILABLE   AVAILABLE-GEN   AGE
test-crp   2     True        2               True        2               8s
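
Since PickN with numberOfClusters: 1 places the resources on a single cluster, it helps to confirm which member cluster was selected before setting up the disruption budget and the eviction; in this walkthrough it is kind-cluster-1. A simple way is to inspect the placement on the hub cluster:

kubectl get crp test-crp -o yaml
# Inspect the status section for the name of the selected member cluster.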

Now we will create a ClusterResourcePlacementDisruptionBudget object to protect resources on the member cluster from voluntary disruption:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacementDisruptionBudget
metadata:
  name: test-crp
spec:
  minAvailable: 1
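
Note that the disruption budget is named test-crp, the same as the ClusterResourcePlacement it protects. Save the manifest to a file (the file name test-crpdb.yaml is just an example), apply it against the hub cluster, and verify that it was created:

kubectl apply -f test-crpdb.yaml
kubectl get clusterresourceplacementdisruptionbudget test-crp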

Note: An eviction object is only reconciled once, after which it reaches a terminal state. If you want to run the same eviction again, delete the existing eviction object and re-create it.

Now we will create a ClusterResourcePlacementEviction object to evict resources from the member cluster:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacementEviction
metadata:
  name: test-eviction
spec:
  placementName: test-crp
  clusterName: kind-cluster-1

Note: When a ClusterResourcePlacementEviction object is reconciled, the eviction controller fetches the corresponding ClusterResourcePlacementDisruptionBudget object (if one exists) and checks whether the specified maxUnavailable or minAvailable allows the eviction to be executed.

Let’s take a look at the eviction object to see if the eviction was executed:

kubectl get crpe test-eviction
NAME            VALID   EXECUTED
test-eviction   True    False

From the eviction object we can see that the eviction was not executed.

Let’s take a look at the ClusterResourcePlacementEviction object’s status to understand why the eviction was not executed:

status:
  conditions:
  - lastTransitionTime: "2025-01-21T15:52:29Z"
    message: Eviction is valid
    observedGeneration: 1
    reason: ClusterResourcePlacementEvictionValid
    status: "True"
    type: Valid
  - lastTransitionTime: "2025-01-21T15:52:29Z"
    message: 'Eviction is blocked by specified ClusterResourcePlacementDisruptionBudget,
      availablePlacements: 1, totalPlacements: 1'
    observedGeneration: 1
    reason: ClusterResourcePlacementEvictionNotExecuted
    status: "False"
    type: Executed

The eviction status clearly indicates that the eviction was blocked by the specified ClusterResourcePlacementDisruptionBudget: with minAvailable set to 1 and only 1 placement in total, executing the eviction would drop the number of available placements to 0, which would violate the budget.
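
If you do want the eviction to go through, remove (or relax) the disruption budget first, and then, because an eviction object is only reconciled once, delete and re-create the eviction object as described in the note above. A sketch, assuming the example file names used earlier:

kubectl delete clusterresourceplacementdisruptionbudget test-crp
kubectl delete crpe test-eviction
kubectl apply -f test-eviction.yaml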