Fleet documentation features a number of how-to guides to help you complete common Fleet tasks. Pick one below to proceed.
How-To Guides
- 1: Managing clusters
- 2: Using the ClusterResourcePlacement API
- 3: Using Affinity to Pick Clusters
- 4: Using Topology Spread Constraints to Spread Resources
- 5: Using Property-Based Scheduling
- 6: Using Taints and Tolerations
- 7: Using the ClusterResourceOverride API
- 8: Using the ResourceOverride API
- 9: Using Envelope Objects to Place Resources
- 10: Controlling How Fleet Handles Pre-Existing Resources
- 11: Enabling Drift Detection in Fleet
- 12: Using the ReportDiff Apply Mode
- 13: How to Roll Out and Roll Back Changes in Stage
- 14: Evicting Resources and Setting up Disruption Budgets
1 - Managing clusters
This how-to guide discusses how to manage clusters in a fleet, specifically:
- how to join a cluster into a fleet; and
- how to set a cluster to leave a fleet; and
- how to add labels to a member cluster
Joining a cluster into a fleet
A cluster can join a fleet if:
- it runs a supported Kubernetes version; it is recommended that you use Kubernetes 1.24 or later versions, and
- it has network connectivity to the hub cluster of the fleet.
For your convenience, Fleet provides a script that can automate the process of joining a cluster into a fleet. To use the script, run the commands below:
Note
To run this script, make sure that you have already installed the following tools in your system:
- kubectl, the Kubernetes CLI
- helm, a Kubernetes package manager
- curl
- jq
- base64
# Replace the value of HUB_CLUSTER_CONTEXT with the name of the kubeconfig context you use for
# accessing your hub cluster.
export HUB_CLUSTER_CONTEXT=YOUR-HUB-CLUSTER-CONTEXT
# Replace the value of HUB_CLUSTER_ADDRESS with the address of your hub cluster API server.
export HUB_CLUSTER_ADDRESS=YOUR-HUB-CLUSTER-ADDRESS
# Replace the value of MEMBER_CLUSTER with the name you would like to assign to the new member
# cluster.
#
# Note that Fleet will recognize your cluster with this name once it joins.
export MEMBER_CLUSTER=YOUR-MEMBER-CLUSTER
# Replace the value of MEMBER_CLUSTER_CONTEXT with the name of the kubeconfig context you use
# for accessing your member cluster.
export MEMBER_CLUSTER_CONTEXT=YOUR-MEMBER-CLUSTER-CONTEXT
# Clone the Fleet GitHub repository.
git clone https://github.com/Azure/fleet.git
# Run the script.
chmod +x fleet/hack/membership/join.sh
./fleet/hack/membership/join.sh
It may take a few minutes for the script to finish running. Once it is completed, verify that the cluster has joined successfully with the command below:
kubectl config use-context $HUB_CLUSTER_CONTEXT
kubectl get membercluster $MEMBER_CLUSTER
If you see that the cluster is still in an unknown state, it might be that the member cluster is still connecting to the hub cluster. Should this state persist for a prolonged period, refer to the Troubleshooting Guide for more information.
Alternatively, if you would like to find out the exact steps the script performs, or if you feel like fine-tuning some of the steps, you may join a cluster manually to your fleet with the instructions below:
Joining a member cluster manually
Make sure that you have installed kubectl, helm, curl, jq, and base64 in your system.

Create a Kubernetes service account in your hub cluster:
# Replace the value of HUB_CLUSTER_CONTEXT with the name of the kubeconfig
# context you use for accessing your hub cluster.
export HUB_CLUSTER_CONTEXT="YOUR-HUB-CLUSTER-CONTEXT"

# Replace the value of MEMBER_CLUSTER with a name you would like to assign to the new
# member cluster.
#
# Note that the value of MEMBER_CLUSTER will be used as the name the member cluster registers
# with the hub cluster.
export MEMBER_CLUSTER="YOUR-MEMBER-CLUSTER"

export SERVICE_ACCOUNT="$MEMBER_CLUSTER-hub-cluster-access"

kubectl config use-context $HUB_CLUSTER_CONTEXT

# The service account can, in theory, be created in any namespace; for simplicity reasons,
# here you will use the namespace reserved by Fleet installation, `fleet-system`.
#
# Note that if you choose a different value, commands in some steps below need to be
# modified accordingly.
kubectl create serviceaccount $SERVICE_ACCOUNT -n fleet-system
Create a Kubernetes secret of the service account token type, which the member cluster will use to access the hub cluster.
export SERVICE_ACCOUNT_SECRET="$MEMBER_CLUSTER-hub-cluster-access-token"
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: $SERVICE_ACCOUNT_SECRET
  namespace: fleet-system
  annotations:
    kubernetes.io/service-account.name: $SERVICE_ACCOUNT
type: kubernetes.io/service-account-token
EOF
After the secret is created successfully, extract the token from the secret:
export TOKEN=$(kubectl get secret $SERVICE_ACCOUNT_SECRET -n fleet-system -o jsonpath='{.data.token}' | base64 -d)
Note
Keep the token in a secure place; anyone with access to this token can access the hub cluster in the same way as the Fleet member cluster does.
You may have noticed that at this moment, no access control has been set on the service account; Fleet will set things up when the member cluster joins. The service account will be given the minimally viable set of permissions for the Fleet member cluster to connect to the hub cluster; its access will be restricted to one namespace, specifically reserved for the member cluster, as per security best practices.
Register the member cluster with the hub cluster; Fleet manages cluster membership using the
MemberCluster
API:

cat <<EOF | kubectl apply -f -
apiVersion: cluster.kubernetes-fleet.io/v1beta1
kind: MemberCluster
metadata:
  name: $MEMBER_CLUSTER
spec:
  identity:
    name: $SERVICE_ACCOUNT
    kind: ServiceAccount
    namespace: fleet-system
    apiGroup: ""
  heartbeatPeriodSeconds: 60
EOF
Set up the member agent, the Fleet component that works on the member cluster end, to enable Fleet connection:
# Clone the Fleet repository from GitHub.
git clone https://github.com/Azure/fleet.git

# Install the member agent helm chart on the member cluster.

# Replace the value of MEMBER_CLUSTER_CONTEXT with the name of the kubeconfig context you use
# for member cluster access.
export MEMBER_CLUSTER_CONTEXT="YOUR-MEMBER-CLUSTER-CONTEXT"

# Replace the value of HUB_CLUSTER_ADDRESS with the address of the hub cluster API server.
export HUB_CLUSTER_ADDRESS="YOUR-HUB-CLUSTER-ADDRESS"

# The variables below use the Fleet images kept in the Microsoft Container Registry (MCR),
# and will retrieve the latest version from the Fleet GitHub repository.
#
# You can, however, build the Fleet images of your own; see the repository README for
# more information.
export REGISTRY="mcr.microsoft.com/aks/fleet"
export FLEET_VERSION=$(curl "https://api.github.com/repos/Azure/fleet/tags" | jq -r '.[0].name')
export MEMBER_AGENT_IMAGE="member-agent"
export REFRESH_TOKEN_IMAGE="refresh-token"

kubectl config use-context $MEMBER_CLUSTER_CONTEXT

# Create the secret with the token extracted previously for member agent to use.
kubectl create secret generic hub-kubeconfig-secret --from-literal=token=$TOKEN

helm install member-agent fleet/charts/member-agent/ \
  --set config.hubURL=$HUB_CLUSTER_ADDRESS \
  --set image.repository=$REGISTRY/$MEMBER_AGENT_IMAGE \
  --set image.tag=$FLEET_VERSION \
  --set refreshtoken.repository=$REGISTRY/$REFRESH_TOKEN_IMAGE \
  --set refreshtoken.tag=$FLEET_VERSION \
  --set image.pullPolicy=Always \
  --set refreshtoken.pullPolicy=Always \
  --set config.memberClusterName="$MEMBER_CLUSTER" \
  --set logVerbosity=5 \
  --set namespace=fleet-system \
  --set enableV1Alpha1APIs=false \
  --set enableV1Beta1APIs=true
Verify that the installation of the member agent is successful:
kubectl get pods -n fleet-system
You should see that all the returned pods are up and running. Note that it may take a few minutes for the member agent to get ready.
Verify that the member cluster has joined the fleet successfully:
kubectl config use-context $HUB_CLUSTER_CONTEXT
kubectl get membercluster $MEMBER_CLUSTER
Setting a cluster to leave a fleet
Fleet uses the MemberCluster
API to manage cluster memberships. To remove a member cluster
from a fleet, simply delete its corresponding MemberCluster
object from your hub cluster:
# Replace the value of MEMBER_CLUSTER with the name of the member cluster you would like to
# remove from a fleet.
export MEMBER_CLUSTER=YOUR-MEMBER-CLUSTER
kubectl delete membercluster $MEMBER_CLUSTER
It may take a while before the member cluster leaves the fleet successfully. Fleet will perform some cleanup; all the resources placed onto the cluster will be removed.
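If you would like to follow the cleanup progress, a simple option (a sketch; the object disappears from the hub cluster once the cleanup finishes) is to watch the MemberCluster object:

kubectl get membercluster $MEMBER_CLUSTER --watch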
After the member cluster leaves, you can remove the member agent installation from it using Helm:
# Replace the value of MEMBER_CLUSTER_CONTEXT with the name of the kubeconfig context you use
# for member cluster access.
export MEMBER_CLUSTER_CONTEXT=YOUR-MEMBER-CLUSTER-CONTEXT
kubectl config use-context $MEMBER_CLUSTER_CONTEXT
helm uninstall member-agent
It may take a few moments before the uninstallation completes.
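To confirm that the member agent has been removed, you can list the pods in the fleet-system namespace on the member cluster; the member agent pods should no longer be present:

kubectl get pods -n fleet-system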
Viewing the status of a member cluster
Similarly, you can use the MemberCluster
API in the hub cluster to view the status of a
member cluster:
# Replace the value of MEMBER_CLUSTER with the name of the member cluster of which you would like
# to view the status.
export MEMBER_CLUSTER=YOUR-MEMBER-CLUSTER
kubectl get membercluster $MEMBER_CLUSTER -o jsonpath="{.status}"
The status consists of the following (an illustrative sketch follows this list):

- an array of conditions, including:
  - the ReadyToJoin condition, which signals whether the hub cluster is ready to accept the member cluster;
  - the Joined condition, which signals whether the cluster has joined the fleet; and
  - the Healthy condition, which signals whether the cluster is in a healthy state.

  Typically, a member cluster should have all three conditions set to true. Refer to the Troubleshooting Guide for help if a cluster fails to join a fleet.

- the resource usage of the cluster; at this moment Fleet reports the capacity and the allocatable amount of each resource in the cluster, summed up from all nodes in the cluster.

- an array of agent status, which reports the status of specific Fleet agents installed in the cluster; each entry features:
  - an array of conditions, in which Joined signals whether the specific agent has been successfully installed in the cluster, and Healthy signals whether the agent is in a healthy state; and
  - the timestamp of the last received heartbeat from the agent.
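Below is an illustrative sketch of such a status; the field names and values here are approximations for orientation only, so compare against the actual output of the command above rather than treating this as verbatim:

# An illustrative sketch only; field names and values are approximate.
conditions:
- type: ReadyToJoin
  status: "True"
- type: Joined
  status: "True"
- type: Healthy
  status: "True"
resourceUsage:
  capacity:
    cpu: "8"
    memory: 32Gi
  allocatable:
    cpu: "7"
    memory: 30Gi
agentStatus:
- type: MemberAgent
  conditions:
  - type: Joined
    status: "True"
  - type: Healthy
    status: "True"
  lastReceivedHeartbeat: "2024-04-16T19:03:17Z"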
Adding labels to a member cluster
You can add labels to a MemberCluster
object in the same way as with any other Kubernetes object.
These labels can then be used for targeting specific clusters in resource placement. To add a label,
run the command below:
# Replace the values of MEMBER_CLUSTER, LABEL_KEY, and LABEL_VALUE with those of your own.
export MEMBER_CLUSTER=YOUR-MEMBER-CLUSTER
export LABEL_KEY=YOUR-LABEL-KEY
export LABEL_VALUE=YOUR-LABEL-VALUE
kubectl label membercluster $MEMBER_CLUSTER $LABEL_KEY=$LABEL_VALUE
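To verify that the label has been added, list the member cluster with its labels shown:

kubectl get membercluster $MEMBER_CLUSTER --show-labels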
2 - Using the ClusterResourcePlacement API
This guide provides an overview of how to use the Fleet ClusterResourcePlacement (CRP) API to orchestrate workload distribution across your fleet.
Overview
The CRP API is a core Fleet API that facilitates the distribution of specific resources from the hub cluster to member clusters within a fleet. This API offers scheduling capabilities that allow you to target the most suitable group of clusters for a set of resources using a complex rule set. For example, you can distribute resources to clusters in specific regions (North America, East Asia, Europe, etc.) and/or release stages (production, canary, etc.). You can even distribute resources according to certain topology spread constraints.
API Components
The CRP API generally consists of the following components:
- Resource Selectors: These specify the set of resources selected for placement.
- Scheduling Policy: This determines the set of clusters where the resources will be placed.
- Rollout Strategy: This controls the behavior of resource placement when the resources themselves and/or the scheduling policy are updated, minimizing interruptions caused by refreshes.
The following sections discuss these components in depth.
Resource selectors
A ClusterResourcePlacement
object may feature one or more resource selectors,
specifying which resources to select for placement. To add a resource selector, edit
the resourceSelectors
field in the ClusterResourcePlacement
spec:
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- group: "rbac.authorization.k8s.io"
kind: ClusterRole
version: v1
name: secretReader
The example above will pick a ClusterRole
named secretReader
for resource placement.
It is important to note that, as its name implies, ClusterResourcePlacement
selects only
cluster-scoped resources. However, if you select a namespace, all the resources under the
namespace will also be placed.
Different types of resource selectors
You can specify a resource selector in many different ways:
To select one specific resource, such as a namespace, specify its API GVK (group, version, and kind), and its name, in the resource selector:
# As mentioned earlier, all the resources under the namespace will also be selected.
resourceSelectors:
  - group: ""
    kind: Namespace
    version: v1
    name: work
Alternatively, you may also select a set of resources of the same API GVK using a label selector; this also requires that you specify the API GVK and the filtering label(s):

# As mentioned earlier, all the resources under the namespaces will also be selected.
resourceSelectors:
  - group: ""
    kind: Namespace
    version: v1
    labelSelector:
      matchLabels:
        system: critical
In the example above, all the namespaces in the hub cluster with the label
system=critical
will be selected (along with the resources under them).

Fleet uses standard Kubernetes label selectors; for their specification and usage, see the Kubernetes API reference.
Very occasionally, you may need to select all the resources under a specific GVK; to achieve this, use a resource selector with only the API GVK added:
resourceSelectors:
  - group: "rbac.authorization.k8s.io"
    kind: ClusterRole
    version: v1
In the example above, all the cluster roles in the hub cluster will be picked.
Multiple resource selectors
You may specify up to 100 different resource selectors; Fleet will pick a resource if it matches any of the resource selectors specified (i.e., all selectors are OR’d).
# As mentioned earlier, all the resources under the namespace will also be selected.
resourceSelectors:
- group: ""
kind: Namespace
version: v1
name: work
- group: "rbac.authorization.k8s.io"
kind: ClusterRole
version: v1
name: secretReader
In the example above, Fleet will pick the namespace work
(along with all the resources
under it) and the cluster role secretReader
.
Note
You can find the GVKs of built-in Kubernetes API objects in the Kubernetes API reference.
Scheduling policy
Each scheduling policy is associated with a placement type, which determines how Fleet will
pick clusters. The ClusterResourcePlacement
API supports the following placement types:
| Placement type | Description |
| --- | --- |
| PickFixed | Pick a specific set of clusters by their names. |
| PickAll | Pick all the clusters in the fleet, per some standard. |
| PickN | Pick a count of N clusters in the fleet, per some standard. |
Note
Scheduling policy itself is optional. If you do not specify a scheduling policy, Fleet will assume that you would like to use a scheduling policy of the PickAll placement type; this effectively sets Fleet to pick all the clusters in the fleet.

Fleet does not support switching between different placement types; if you need to do so, delete the existing ClusterResourcePlacement object and create a new one.
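For instance, a minimal sketch (the name and the selected ClusterRole mirror the earlier example and are illustrative) that omits the policy block entirely, which Fleet treats as a PickAll placement:

apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp-default-pickall
spec:
  # No policy specified; Fleet defaults to the PickAll placement type and
  # places the selected resources on every cluster in the fleet.
  resourceSelectors:
    - group: "rbac.authorization.k8s.io"
      kind: ClusterRole
      version: v1
      name: secretReader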
PickFixed
placement type
PickFixed is the most straightforward placement type: you directly tell Fleet which clusters to place resources on. To use this placement type, specify the target cluster names in the clusterNames field, as shown below:
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickFixed
clusterNames:
- bravelion
- smartfish
The example above will place resources on two clusters, bravelion and smartfish.
PickAll
placement type
The PickAll placement type allows you to pick all clusters in the fleet per some standard. With this placement type, you may use affinity terms to fine-tune which clusters you would like Fleet to pick:

An affinity term specifies a requirement that a cluster needs to meet, usually the presence of a label. There are two types of affinity terms:

- requiredDuringSchedulingIgnoredDuringExecution terms are requirements that a cluster must meet before it can be picked; and
- preferredDuringSchedulingIgnoredDuringExecution terms are requirements that, if a cluster meets them, will set Fleet to prioritize it in scheduling.

In the scheduling policy of the PickAll placement type, you may only use the requiredDuringSchedulingIgnoredDuringExecution terms.
Note
You can learn more about affinities in the Using Affinity to Pick Clusters How-To Guide.
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickAll
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchLabels:
system: critical
The ClusterResourcePlacement object above will pick all the clusters with the label system=critical on them; clusters without the label will be ignored.
Fleet is forward-looking with the PickAll
placement type: any cluster that satisfies the
affinity terms of a ClusterResourcePlacement
object, even if it joins after the
ClusterResourcePlacement
object is created, will be picked.
Note
You may specify a scheduling policy of the PickAll placement type with no affinity; this will set Fleet to select all clusters currently present in the fleet.
PickN
placement type
The PickN placement type allows you to pick a specific number of clusters in the fleet for resource placement; with this placement type, you may use affinity terms and topology spread constraints to fine-tune which clusters you would like Fleet to pick.
An affinity term specifies a requirement that a cluster needs to meet, usually the presence of a label.
There are two types of affinity terms:
- requiredDuringSchedulingIgnoredDuringExecution terms are requirements that a cluster must meet before it can be picked; and
- preferredDuringSchedulingIgnoredDuringExecution terms are requirements that, if a cluster meets them, will set Fleet to prioritize it in scheduling.
A topology spread constraint can help you spread resources evenly across different groups of clusters. For example, you may want to have a database replica deployed in each region to enable high-availability.
Note
You can learn more about affinities in the Using Affinity to Pick Clusters How-To Guide, and more about topology spread constraints in the Using Topology Spread Constraints to Spread Resources How-To Guide.
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickN
    numberOfClusters: 3
    affinity:
      clusterAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 20
            preference:
              labelSelector:
                matchLabels:
                  critical-level: "1"
The ClusterResourcePlacement object above will first pick clusters with the critical-level=1 label; only if there are not enough such clusters (fewer than 3) will Fleet pick clusters without the label.
To be more precise, with this placement type, Fleet scores clusters on how well they satisfy the affinity terms and the topology spread constraints; Fleet will assign:
- an affinity score, for how well the cluster satisfies the affinity terms; and
- a topology spread score, for how well the cluster satisfies the topology spread constraints.
Note
For more information on the scoring specifics, see the Using Affinity to Pick Clusters How-To Guide (for affinity score) and the Using Topology Spread Constraints to Spread Resources How-To Guide (for topology spread score).
After scoring, Fleet ranks the clusters using the rule below and picks the top N clusters:
- the cluster with the highest topology spread score ranks the highest;
- if there are multiple clusters with the same topology spread score, the one with the highest affinity score ranks the highest;
- if there are multiple clusters with the same topology spread score and affinity score, their names are compared in alphanumeric order, and the one with the most significant name ranks the highest.
This helps establish deterministic scheduling behavior.
Both affinity terms and topology spread constraints are optional. If you do not specify affinity terms or topology spread constraints, all clusters will be assigned 0 in affinity score or topology spread score respectively. When neither is added in the scheduling policy, Fleet will simply rank clusters by their names and pick the N with the most significant names in alphanumeric order.
When there are not enough clusters to pick
It may happen that Fleet cannot find enough clusters to pick. In this situation, Fleet will keep looking until all N clusters are found.
Note that Fleet will stop looking once all N clusters are found, even if a cluster that scores higher appears later.
Up-scaling and downscaling
You can edit the numberOfClusters field in the scheduling policy to pick more or fewer clusters. When up-scaling, Fleet will score all the clusters that have not been picked earlier and find the most appropriate ones; when downscaling, Fleet will unpick the clusters that rank lower first.
Note
For downscaling, the ranking Fleet uses for unpicking clusters is the one composed when scheduling was originally performed; i.e., it may not reflect the latest setup in the fleet.
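As a sketch (assuming a ClusterResourcePlacement named crp, as in the examples above), you could up-scale by patching the numberOfClusters field directly:

# Raise the target cluster count to 5; Fleet will then schedule additional clusters.
kubectl patch clusterresourceplacement crp --type='json' \
  -p='[{"op": "replace", "path": "/spec/policy/numberOfClusters", "value": 5}]'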
A few more points about scheduling policies
Responding to changes in the fleet
Generally speaking, once a cluster is picked by Fleet for a ClusterResourcePlacement
object,
it will not be unpicked even if you modify the cluster in a way that renders it unfit for
the scheduling policy, e.g., you have removed a label from the cluster that is required by
some affinity term. Fleet will also not remove resources from the cluster even if the cluster
becomes unhealthy, e.g., it gets disconnected from the hub cluster. This helps reduce service
interruption.
However, Fleet will unpick a cluster if it leaves the fleet. If you are using a scheduling
policy of the PickN
placement type, Fleet will attempt to find a new cluster as replacement.
Finding the scheduling decisions Fleet makes
You can find out why Fleet picks a cluster in the status of a ClusterResourcePlacement
object.
For more information, see the
Understanding the Status of a ClusterResourcePlacement
How-To Guide.
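For a quick look (assuming a ClusterResourcePlacement named crp), you can print the status directly:

kubectl get clusterresourceplacement crp -o jsonpath="{.status}"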
Available fields for each placement type
The table below summarizes the available scheduling policy fields for each placement type:
| | PickFixed | PickAll | PickN |
| --- | --- | --- | --- |
| placementType | ✅ | ✅ | ✅ |
| numberOfClusters | ❌ | ❌ | ✅ |
| clusterNames | ✅ | ❌ | ❌ |
| affinity | ❌ | ✅ | ✅ |
| topologySpreadConstraints | ❌ | ❌ | ✅ |
Rollout strategy
After a ClusterResourcePlacement is created, you may want to:

- Add, update, or remove the resources that have been selected by the ClusterResourcePlacement in the hub cluster
- Update the resource selectors in the ClusterResourcePlacement
- Update the scheduling policy in the ClusterResourcePlacement
These changes may trigger the following outcomes:
- New resources may need to be placed on all picked clusters
- Resources already placed on a picked cluster may get updated or deleted
- Some clusters picked previously are now unpicked, and resources must be removed from such clusters
- Some clusters are newly picked, and resources must be added to them
Most outcomes can lead to service interruptions. Apps running on member clusters may temporarily become unavailable as Fleet dispatches updated resources. Clusters that are no longer selected will lose all placed resources, resulting in lost traffic. If too many new clusters are selected and Fleet places resources on them simultaneously, your backend may become overloaded. The exact interruption pattern may vary depending on the resources you place using Fleet.
To minimize interruption, Fleet allows users to configure the rollout strategy, similar to native Kubernetes deployment, to transition between changes as smoothly as possible. Currently, Fleet supports only one rollout strategy: rolling update. This strategy ensures changes, including the addition or removal of selected clusters and resource refreshes, are applied incrementally in a phased manner at a pace suitable for you. This is the default option and applies to all changes you initiate.
This rollout strategy can be configured with the following parameters:
- maxUnavailable determines how many clusters may become unavailable during a change for the selected set of resources. It can be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.

  Setting this parameter to a lower value will result in less interruption during a change but will lead to slower rollouts.

  Fleet considers a cluster as unavailable if resources have not been successfully applied to the cluster.

  How Fleet interprets this value: Fleet, in actuality, makes sure that at any time, there are **at least** N - `maxUnavailable` clusters available, where N is:

  - for scheduling policies of the PickN placement type, the numberOfClusters value given;
  - for scheduling policies of the PickFixed placement type, the number of cluster names given;
  - for scheduling policies of the PickAll placement type, the number of clusters Fleet picks.

  If you use a percentage for the maxUnavailable parameter, it is calculated against N as well.
- maxSurge determines the number of additional clusters, beyond the required number, that will receive resource placements. It can also be set as an absolute number or a percentage. The default is 25%, and zero should not be used for this value.

  Setting this parameter to a lower value will result in fewer resource placements on additional clusters by Fleet, which may slow down the rollout process.

  How Fleet interprets this value: Fleet, in actuality, makes sure that at any time, there are **at most** N + `maxSurge` clusters available, where N is:

  - for scheduling policies of the PickN placement type, the numberOfClusters value given;
  - for scheduling policies of the PickFixed placement type, the number of cluster names given;
  - for scheduling policies of the PickAll placement type, the number of clusters Fleet picks.

  If you use a percentage for the maxSurge parameter, it is calculated against N as well.
- unavailablePeriodSeconds allows users to inform Fleet when the resources are deemed “ready”. The default value is 60 seconds.

  - Fleet only considers newly applied resources on a cluster as “ready” once unavailablePeriodSeconds seconds have passed after the resources have been successfully applied to that cluster.
  - Setting a lower value for this parameter will result in faster rollouts. However, we strongly recommend that users set it to a value such that all the initialization/preparation tasks can be completed within that time frame. This ensures that the resources are typically ready after the unavailablePeriodSeconds have passed.
  - We are currently designing a generic “ready gate” for resources being applied to clusters. Please feel free to raise issues or provide feedback if you have any thoughts on this.
Note
Fleet will round numbers up if you use a percentage for maxUnavailable and/or maxSurge.
For example, if you have a ClusterResourcePlacement
with a scheduling policy of the PickN
placement type and a target number of clusters of 10, with the default rollout strategy, as
shown in the example below,
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
...
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 25%
maxSurge: 25%
unavailablePeriodSeconds: 60
Every time you initiate a change on selected resources, Fleet will:
- Find 10 * 25% = 2.5, rounded up to 3 clusters, which will receive the resource refresh;
- Wait for 60 seconds (unavailablePeriodSeconds), and repeat the process;
- Stop when all the clusters have received the latest version of resources.
The exact period of time it takes for Fleet to complete a rollout depends not only on the
unavailablePeriodSeconds
, but also the actual condition of a resource placement; that is,
if it takes longer for a cluster to get the resources applied successfully, Fleet will wait
longer to complete the rollout, in accordance with the rolling update strategy you specified.
Note
In very extreme circumstances, the rollout may get stuck if Fleet just cannot apply resources to some clusters. You can identify this behavior from the CRP status; for more information, see Understanding the Status of a
ClusterResourcePlacement
How-To Guide.
Snapshots and revisions
Internally, Fleet keeps a history of all the scheduling policies you have used with a
ClusterResourcePlacement
, and all the resource versions (snapshots) the
ClusterResourcePlacement
has selected. These are kept as ClusterSchedulingPolicySnapshot
and ClusterResourceSnapshot
objects respectively.
You can list and view such objects for reference, but you should not modify their contents
(in a typical setup, such requests will be rejected automatically). To control the length
of the history (i.e., how many snapshot objects Fleet will keep for a ClusterResourcePlacement
),
configure the revisionHistoryLimit
field:
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
...
strategy:
...
revisionHistoryLimit: 10
The default value is 10.
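To browse the history, you can list the snapshot objects on the hub cluster; a sketch (remember that these objects are kept for reference and should not be modified):

# List the scheduling policy snapshots and the resource snapshots kept by Fleet.
kubectl get clusterschedulingpolicysnapshots
kubectl get clusterresourcesnapshots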
Note
In this early stage, the history is kept for reference purposes only; in the future, Fleet may add features to allow rolling back to a specific scheduling policy and/or resource version.
3 - Using Affinity to Pick Clusters
This how-to guide discusses how to use affinity settings to fine-tune how Fleet picks clusters for resource placement.
Affinity terms are featured in the ClusterResourcePlacement API, specifically the scheduling policy section. Each affinity term is a particular requirement that Fleet will check against clusters; the fulfillment of this requirement (or the lack thereof) affects whether Fleet picks a cluster for resource placement.
Fleet currently supports two types of affinity terms:
- requiredDuringSchedulingIgnoredDuringExecution affinity terms; and
- preferredDuringSchedulingIgnoredDuringExecution affinity terms
Most affinity terms deal with cluster labels. To manage member clusters, specifically adding/removing labels from a member cluster, see Managing Member Clusters How-To Guide.
requiredDuringSchedulingIgnoredDuringExecution
affinity terms
The requiredDuringSchedulingIgnoredDuringExecution
type of affinity terms serves as a hard
constraint that a cluster must satisfy before it can be picked. Each term may feature:
- a label selector, which specifies a set of labels that a cluster must have or not have before it can be picked;
- a property selector, which specifies a cluster property requirement that a cluster must satisfy before it can be picked;
- a combination of both.
For the specifics about property selectors, see the How-To Guide: Using Property-Based Scheduling.
matchLabels
The most straightforward way is to specify matchLabels
in the label selector, as showcased below:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickAll
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchLabels:
system: critical
The example above includes a requiredDuringSchedulingIgnoredDuringExecution
term which requires
that the label system=critical
must be present on a cluster before Fleet can pick it for the
ClusterResourcePlacement
.
You can add multiple labels to matchLabels; any cluster that satisfies this affinity term must have all the labels present.
matchExpressions
For more complex logic, consider using matchExpressions
, which allow you to use operators to
set rules for validating labels on a member cluster. Each matchExpressions
requirement
includes:
- a key, which is the key of the label;
- a list of values, which are the possible values for the label key; and
- an operator, which represents the relationship between the key and the list of values.
Supported operators include:
- In: the cluster must have a label key with one of the listed values.
- NotIn: the cluster must have a label key that is not associated with any of the listed values.
- Exists: the cluster must have the label key present; any value is acceptable.
- DoesNotExist: the cluster must not have the label key.
If you plan to use Exists and/or DoesNotExist, you must leave the list of values empty.
Below is an example of matchExpressions
affinity term using the In
operator:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickAll
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchExpressions:
- key: system
operator: In
values:
- critical
- standard
Any cluster with the label system=critical
or system=standard
will be picked by Fleet.
Similarly, you can also specify multiple matchExpressions requirements; any cluster that satisfies this affinity term must meet all of them.
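For illustration, below is a sketch (the label keys and values are hypothetical) of a single term carrying two matchExpressions requirements, both of which a cluster must satisfy:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp
spec:
  resourceSelectors:
    - ...
  policy:
    placementType: PickAll
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - labelSelector:
                matchExpressions:
                  # Both expressions must hold on a cluster for it to be picked.
                  - key: system
                    operator: In
                    values:
                      - critical
                      - standard
                  - key: environment
                    operator: NotIn
                    values:
                      - test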
Using both matchLabels
and matchExpressions
in one affinity term
You can specify both matchLabels
and matchExpressions
in one requiredDuringSchedulingIgnoredDuringExecution
affinity term, as showcased below:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickAll
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchLabels:
region: east
matchExpressions:
- key: system
operator: Exists
With this affinity term, any cluster picked must:
- have the label
region=east
present; - have the label
system
present, any value would do.
Using multiple affinity terms
You can also specify multiple requiredDuringSchedulingIgnoredDuringExecution
affinity terms,
as showcased below; a cluster will be picked if it can satisfy any affinity term.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickAll
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchLabels:
region: west
- labelSelector:
matchExpressions:
- key: system
operator: DoesNotExist
With these two affinity terms, any cluster picked must:
- have the label
region=west
present; or - does not have the label
system
preferredDuringSchedulingIgnoredDuringExecution
affinity terms
The preferredDuringSchedulingIgnoredDuringExecution type of affinity terms serves as a soft constraint for clusters; any cluster that satisfies such terms receives an affinity score, which Fleet uses to rank clusters when processing a ClusterResourcePlacement with a scheduling policy of the PickN placement type.
Each term features:
- a weight, between -100 and 100, which is the affinity score that Fleet would assign to a cluster if it satisfies this term; and
- a label selector, or a property sorter.
Both are required for this type of affinity term to function.
The label selector is of the same struct as the one used in
requiredDuringSchedulingIgnoredDuringExecution
type of affinity terms; see
the documentation above for usage.
For the specifics about property sorters, see the How-To Guide: Using Property-Based Scheduling.
Below is an example with a preferredDuringSchedulingIgnoredDuringExecution
affinity term:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickN
numberOfClusters: 10
affinity:
clusterAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 20
preference:
labelSelector:
matchLabels:
region: west
Any cluster with the region=west
label would receive an affinity score of 20.
Using multiple affinity terms
Similarly, you can use multiple preferredDuringSchedulingIgnoredDuringExecution
affinity terms,
as showcased below:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickN
numberOfClusters: 10
affinity:
clusterAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 20
preference:
labelSelector:
matchLabels:
region: west
- weight: -20
preference:
labelSelector:
matchLabels:
environment: prod
Each cluster will be evaluated against each affinity term individually; the affinity scores it receives will be summed up. For example:
- if a cluster has only the
region=west
label, it would receive an affinity score of 20; however - if a cluster has both the
region=west
andenvironment=prod
labels, it would receive an affinity score of20 + (-20) = 0
.
Use both types of affinity terms
You can, if necessary, add both requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution types of affinity terms. Fleet will first run all clusters against all the requiredDuringSchedulingIgnoredDuringExecution type of affinity terms, filter out any that do not meet the requirements, and then assign the rest affinity scores per the preferredDuringSchedulingIgnoredDuringExecution type of affinity terms.
Below is an example with both types of affinity terms:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickN
numberOfClusters: 10
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchExpressions:
- key: system
operator: Exists
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 20
preference:
labelSelector:
matchLabels:
region: west
With these affinity terms, only clusters with the label system (any value would do) can be picked; among them, those with the region=west label will be prioritized for resource placement, as they receive an affinity score of 20.
4 - Using Topology Spread Constraints to Spread Resources
This how-to guide discusses how to use topology spread constraints to fine-tune how Fleet picks clusters for resource placement.
Topology spread constraints are featured in the ClusterResourcePlacement API, specifically the scheduling policy section. Generally speaking, these constraints can help you spread resources evenly across different groups of clusters in your fleet; in other words, they ensure that Fleet will not pick too many clusters from one group and too few from another.
You can use topology spread constraints to, for example:
- achieve high-availability for your database backend by making sure that there is at least one database replica in each region; or
- verify if your application can support clusters of different configurations; or
- eliminate resource utilization hotspots in your infrastructure by spreading jobs evenly across sections.
Specifying a topology spread constraint
A topology spread constraint consists of three fields:
- topologyKey is a label key which Fleet uses to split the clusters in your fleet into different groups.

  Specifically, clusters are grouped by the label values they have. For example, if you have three clusters in a fleet:

  - cluster bravelion with the labels system=critical and region=east;
  - cluster smartfish with the labels system=critical and region=west; and
  - cluster jumpingcat with the labels system=normal and region=east,

  and you use system as the topology key, the clusters will be split into 2 groups:

  - group 1 with clusters bravelion and smartfish, as they both have the value critical for the label system; and
  - group 2 with cluster jumpingcat, as it has the value normal for the label system.

  Note that the splitting concerns only one label, system; other labels, such as region, do not count.

  If a cluster does not have the given topology key, it does not belong to any group. Fleet may still pick this cluster, as placing resources on it does not violate the associated topology spread constraint.

  This is a required field.
- maxSkew specifies how unevenly resource placements are spread in your fleet.

  The skew of a set of resource placements is defined as the difference in count of resource placements between the group with the most and the group with the least, as split by the topology key.

  For example, in the fleet described above (3 clusters, 2 groups):

  - if Fleet picks two clusters from group 1, but none from group 2, the skew would be 2 - 0 = 2; however,
  - if Fleet picks one cluster from group 1 and one from group 2, the skew would be 1 - 1 = 0.

  The minimum value of maxSkew is 1. The lower this value, the more evenly resource placements are spread in your fleet.

  This is a required field.

  Note

  Naturally, maxSkew only makes sense when there are no fewer than two groups. If you set a topology key that will not split the fleet at all (i.e., all clusters with the given topology key have exactly the same value), the associated topology spread constraint will have no effect.
- whenUnsatisfiable specifies what Fleet would do when it exhausts all options to satisfy the topology spread constraint; that is, when picking any cluster in the fleet would lead to a violation.

  Two options are available:

  - DoNotSchedule: with this option, Fleet would guarantee that the topology spread constraint will be enforced at all times; scheduling may fail if there is simply no possible way to satisfy the topology spread constraint.
  - ScheduleAnyway: with this option, Fleet would enforce the topology spread constraint in a best-effort manner; Fleet may, however, pick clusters that would violate the topology spread constraint if there is no better option.

  This is an optional field; if you do not specify a value, Fleet will use DoNotSchedule by default.
Below is an example of topology spread constraint, which tells Fleet to pick clusters evenly
from different groups, split based on the value of the label system
:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickN
numberOfClusters: 3
topologySpreadConstraints:
- maxSkew: 2
topologyKey: system
whenUnsatisfiable: DoNotSchedule
How Fleet enforces topology spread constraints: topology spread scores
When you specify some topology spread constraints in the scheduling policy of
a ClusterResourcePlacement
object, Fleet will start picking clusters one at a time.
More specifically, Fleet will:
for each cluster in the fleet, evaluate how skew would change if resources were placed on it.
Depending on the current spread of resource placements, there are three possible outcomes:
- placing resources on the cluster reduces the skew by 1; or
- placing resources on the cluster has no effect on the skew; or
- placing resources on the cluster increases the skew by 1.
Fleet would then assign a topology spread score to the cluster:
- if the provisional placement reduces the skew by 1, the cluster receives a topology spread score of 1;
- if the provisional placement has no effect on the skew, the cluster receives a topology spread score of 0;
- if the provisional placement increases the skew by 1, but does not yet exceed the max skew specified in the constraint, the cluster receives a topology spread score of -1; and
- if the provisional placement increases the skew by 1, and has exceeded the max skew specified in the constraint,
  - for topology spread constraints with the ScheduleAnyway effect, the cluster receives a topology spread score of -1000; and
  - for those with the DoNotSchedule effect, the cluster will be removed from resource placement consideration.
rank the clusters based on the topology spread score and other factors (e.g., affinity), and pick the one that is most appropriate.

repeat the process, until the needed count of clusters is found.
Below is an example that illustrates the process:
Suppose you have a fleet of 4 clusters:
- cluster
bravelion
, with labelregion=east
andsystem=critical
; and - cluster
smartfish
, with labelregion=east
; and - cluster
jumpingcat
, with labelregion=west
, andsystem=critical
; and - cluster
flyingpenguin
, with labelregion=west
,
And you have created a ClusterResourcePlacement
as follows:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickN
numberOfClusters: 2
topologySpreadConstraints:
- maxSkew: 1
topologyKey: region
whenUnsatisfiable: DoNotSchedule
Fleet will first scan all 4 clusters in the fleet; they all have the region label, with two different values, east and west (2 clusters in each of them). This divides the clusters into two groups, the east group and the west group.
At this stage, no cluster has been picked yet, so there is no resource placement at all. The
current skew is thus 0, and placing resources on any of them would increase the skew by 1. This
is still below the maxSkew
threshold given, so all clusters would receive a topology spread
score of -1.
Fleet could not find the most appropriate cluster based on the topology spread score so far,
so it would resort to other measures for ranking clusters. This would lead Fleet to pick cluster
smartfish
.
Note
See Using
ClusterResourcePlacement
to Place Resources How-To Guide for more information on how Fleet picks clusters.
Now, one cluster has been picked, and one more is needed by the ClusterResourcePlacement
object (as the numberOfClusters
field is set to 2). Fleet scans the remaining 3 clusters again,
and this time, since smartfish
from group east
has been picked, any more resource placement
on clusters from group east
would increase the skew by 1 more, and would lead to violation
of the topology spread constraint; Fleet will then assign the topology spread score of -1000 to
cluster bravelion
, which is in group east
. On the contrary, picking any cluster from group west
would reduce the skew by 1, so Fleet assigns the topology spread score
of 1 to cluster jumpingcat
and flyingpenguin
.
With the higher topology spread score, jumpingcat and flyingpenguin become the leading candidates in the ranking. They have the same topology spread score, and based on the rules Fleet has for picking clusters, jumpingcat is ultimately picked.
Using multiple topology spread constraints
You can, if necessary, use multiple topology spread constraints. Fleet will evaluate each of them
separately, and add up topology spread scores for each cluster for the final ranking. A cluster
would be removed from resource placement consideration if placing resources on it would violate
any one of the DoNotSchedule
topology spread constraints.
Below is an example where two topology spread constraints are used:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickN
numberOfClusters: 2
topologySpreadConstraints:
- maxSkew: 2
topologyKey: region
whenUnsatisfiable: DoNotSchedule
- maxSkew: 3
topologyKey: environment
whenUnsatisfiable: ScheduleAnyway
Note
It might be very difficult to find candidate clusters when multiple topology spread constraints are added. Consider using the ScheduleAnyway effect to add some leeway to the scheduling, if applicable.
5 - Using Property-Based Scheduling
This how-to guide discusses how to use property-based scheduling to produce scheduling decisions based on cluster properties.
Note
The availability of properties depends on whether (and which) property provider you have set up in your Fleet deployment. For more information, see the Concept: Property Provider and Cluster Properties documentation.
It is also recommended that you read the How-To Guide: Using Affinity to Pick Clusters first before following instructions in this document.
Fleet allows users to pick clusters based on exposed cluster properties via the affinity
terms in the ClusterResourcePlacement
API:
- for the
requiredDuringSchedulingIgnoredDuringExecution
affinity terms, you may specify property selectors to filter clusters based on their properties; - for the
preferredDuringSchedulingIgnoredDuringExecution
affinity terms, you may specify property sorters to prefer clusters with a property that ranks higher or lower.
Property selectors in requiredDuringSchedulingIgnoredDuringExecution
affinity terms
A property selector is an array of expression matchers against cluster properties. In each matcher you will specify:
A name, which is the name of the property.
If the property is a non-resource one, you may refer to it directly here; however, if the property is a resource one, the name here should be of the following format:
resources.kubernetes-fleet.io/[CAPACITY-TYPE]-[RESOURCE-NAME]
where
[CAPACITY-TYPE]
is one oftotal
,allocatable
, oravailable
, depending on which capacity (usage information) you would like to check against, and[RESOURCE-NAME]
is the name of the resource.For example, if you would like to select clusters based on the available CPU capacity of a cluster, the name used in the property selector should be
resources.kubernetes-fleet.io/available-cpu
and for the allocatable memory capacity, use
resources.kubernetes-fleet.io/allocatable-memory
A list of values, which are possible values of the property.
An operator, which describes the relationship between a cluster’s observed value of the given property and the list of values in the matcher.
Currently, available operators are:

- Gt (Greater than): a cluster’s observed value of the given property must be greater than the value in the matcher before it can be picked for resource placement.
- Ge (Greater than or equal to): a cluster’s observed value of the given property must be greater than or equal to the value in the matcher before it can be picked for resource placement.
- Lt (Less than): a cluster’s observed value of the given property must be less than the value in the matcher before it can be picked for resource placement.
- Le (Less than or equal to): a cluster’s observed value of the given property must be less than or equal to the value in the matcher before it can be picked for resource placement.
- Eq (Equal to): a cluster’s observed value of the given property must be equal to the value in the matcher before it can be picked for resource placement.
- Ne (Not equal to): a cluster’s observed value of the given property must be not equal to the value in the matcher before it can be picked for resource placement.
Note that if you use the operator
Gt
,Ge
,Lt
,Le
,Eq
, orNe
, the list of values in the matcher should have exactly one value.
Fleet will evaluate each cluster, specifically their exposed properties, against the matchers; failure to satisfy any matcher in the selector will exclude the cluster from resource placement.
Note that if a cluster does not have the specified property for a matcher, it will automatically fail the matcher.
Below is an example that uses a property selector to select only clusters with a node count of at least 5 for resource placement:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickAll
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- propertySelector:
matchExpressions:
- name: "kubernetes-fleet.io/node-count"
operator: Ge
values:
- "5"
You may use both label selector and property selector in a
requiredDuringSchedulingIgnoredDuringExecution
affinity term. Both selectors must be satisfied
before a cluster can be picked for resource placement:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickAll
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchLabels:
region: east
propertySelector:
matchExpressions:
- name: "kubernetes-fleet.io/node-count"
operator: Ge
values:
- "5"
In the example above, Fleet will only consider a cluster for resource placement if it has the
region=east
label and a node count no less than 5.
Property sorters in preferredDuringSchedulingIgnoredDuringExecution
affinity terms
A property sorter ranks all the clusters in the Fleet based on their values of a specified
property in ascending or descending order, then yields weights for the clusters in proportion
to their ranks. The proportional weights are calculated based on the weight value given in the
preferredDuringSchedulingIgnoredDuringExecution
term.
A property sorter consists of:
A name, which is the name of the property; see the format in the previous section for more information.
A sort order, which is one of
Ascending
andDescending
, for ranking in ascending and descending order respectively.As a rule of thumb, when the
Ascending
order is used, Fleet will prefer clusters with lower observed values, and when theDescending
order is used, clusters with higher observed values will be preferred.
When using the sort order Descending
, the proportional weight is calculated using the formula:
((Observed Value - Minimum observed value) / (Maximum observed value - Minimum observed value)) * Weight
For example, suppose that you would like to rank clusters based on the property of available CPU capacity in descending order and currently, you have a fleet of 3 clusters with the available CPU capacities as follows:
| Cluster | Available CPU capacity |
| --- | --- |
| bravelion | 100 |
| smartfish | 20 |
| jumpingcat | 10 |
The sorter would yield the weights below:
| Cluster | Available CPU capacity | Weight |
| --- | --- | --- |
| bravelion | 100 | (100 - 10) / (100 - 10) = 100% of the weight |
| smartfish | 20 | (20 - 10) / (100 - 10) = 11.11% of the weight |
| jumpingcat | 10 | (10 - 10) / (100 - 10) = 0% of the weight |
And when using the sort order Ascending
, the proportional weight is calculated using the formula:
(1 - ((Observed Value - Minimum observed value) / (Maximum observed value - Minimum observed value))) * Weight
For example, suppose that you would like to rank clusters based on their per CPU core cost in ascending order, and currently you have a fleet of 3 clusters with the per CPU core costs as follows:
| Cluster | Per CPU core cost |
| --- | --- |
| bravelion | 1 |
| smartfish | 0.2 |
| jumpingcat | 0.1 |
The sorter would yield the weights below:
| Cluster | Per CPU core cost | Weight |
| --- | --- | --- |
| bravelion | 1 | 1 - ((1 - 0.1) / (1 - 0.1)) = 0% of the weight |
| smartfish | 0.2 | 1 - ((0.2 - 0.1) / (1 - 0.1)) = 88.89% of the weight |
| jumpingcat | 0.1 | 1 - ((0.1 - 0.1) / (1 - 0.1)) = 100% of the weight |
The example below showcases a property sorter using the Descending
order:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickN
numberOfClusters: 10
affinity:
clusterAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 20
preference:
metricSorter:
name: kubernetes-fleet.io/node-count
sortOrder: Descending
In this example, Fleet will prefer clusters with higher node counts. The cluster with the highest node count would receive a weight of 20, and the cluster with the lowest would receive 0. Other clusters receive proportional weights calculated using the formulas above.
You may use both label selector and property sorter in a
preferredDuringSchedulingIgnoredDuringExecution
affinity term. A cluster that fails the label
selector would receive no weight, and clusters that pass the label selector receive proportional
weights under the property sorter.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
resourceSelectors:
- ...
policy:
placementType: PickN
numberOfClusters: 10
affinity:
clusterAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 20
preference:
labelSelector:
matchLabels:
env: prod
metricSorter:
name: resources.kubernetes-fleet.io/total-cpu
sortOrder: Descending
In the example above, a cluster would only receive additional weight if it has the label
env=prod
, and the more total CPU capacity it has, the more weight it will receive, up to the
limit of 20.
6 - Using Taints and Tolerations
This how-to guide discusses how to add/remove taints on MemberCluster
and how to add tolerations on ClusterResourcePlacement
.
Adding taint to MemberCluster
In this example, we will add a taint to a `MemberCluster` and then try to propagate resources to it using a `ClusterResourcePlacement` with the `PickAll` placement policy. The resources should not be propagated to the `MemberCluster` because of the taint.
We will first create a namespace that we will propagate to the member cluster,
kubectl create ns test-ns
Then apply the MemberCluster
with a taint,
Example MemberCluster
with taint:
apiVersion: cluster.kubernetes-fleet.io/v1beta1
kind: MemberCluster
metadata:
name: kind-cluster-1
spec:
identity:
name: fleet-member-agent-cluster-1
kind: ServiceAccount
namespace: fleet-system
apiGroup: ""
taints:
- key: test-key1
value: test-value1
effect: NoSchedule
After applying the above MemberCluster
, we will apply a ClusterResourcePlacement
with the following spec:
resourceSelectors:
- group: ""
kind: Namespace
version: v1
name: test-ns
policy:
placementType: PickAll
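For reference, a complete manifest wrapping this spec could look like the example below; the CRP name used here, crp-taint-demo, is an arbitrary choice for illustration:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp-taint-demo
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickAll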
The `ClusterResourcePlacement` CR should not propagate the `test-ns` namespace to the member cluster because of the taint. Looking at the status of the CR should show the following:
status:
conditions:
- lastTransitionTime: "2024-04-16T19:03:17Z"
message: found all the clusters needed as specified by the scheduling policy
observedGeneration: 2
reason: SchedulingPolicyFulfilled
status: "True"
type: ClusterResourcePlacementScheduled
- lastTransitionTime: "2024-04-16T19:03:17Z"
message: All 0 cluster(s) are synchronized to the latest resources on the hub
cluster
observedGeneration: 2
reason: SynchronizeSucceeded
status: "True"
type: ClusterResourcePlacementSynchronized
- lastTransitionTime: "2024-04-16T19:03:17Z"
message: There are no clusters selected to place the resources
observedGeneration: 2
reason: ApplySucceeded
status: "True"
type: ClusterResourcePlacementApplied
observedResourceIndex: "0"
selectedResources:
- kind: Namespace
name: test-ns
version: v1
Looking at the `ClusterResourcePlacementSynchronized` and `ClusterResourcePlacementApplied` conditions and reading their message fields, we can see that no clusters were selected to place the resources.
Removing taint from MemberCluster
In this example, we will remove the taint from the MemberCluster
from the last section. This should automatically trigger the Fleet scheduler to propagate the resources to the MemberCluster
.
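One way to remove the taint is to patch the MemberCluster object on the hub cluster; the command below is a sketch that assumes the taint added earlier is the only (first) entry in the taints list:
kubectl patch membercluster kind-cluster-1 --type='json' -p='[{"op": "remove", "path": "/spec/taints/0"}]'
Alternatively, you can run kubectl edit membercluster kind-cluster-1 and delete the taints field manually.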
After removing the taint from the `MemberCluster`, let's take a look at the status of the `ClusterResourcePlacement`:
status:
conditions:
- lastTransitionTime: "2024-04-16T20:00:03Z"
message: found all the clusters needed as specified by the scheduling policy
observedGeneration: 2
reason: SchedulingPolicyFulfilled
status: "True"
type: ClusterResourcePlacementScheduled
- lastTransitionTime: "2024-04-16T20:02:57Z"
message: All 1 cluster(s) are synchronized to the latest resources on the hub
cluster
observedGeneration: 2
reason: SynchronizeSucceeded
status: "True"
type: ClusterResourcePlacementSynchronized
- lastTransitionTime: "2024-04-16T20:02:57Z"
message: Successfully applied resources to 1 member clusters
observedGeneration: 2
reason: ApplySucceeded
status: "True"
type: ClusterResourcePlacementApplied
observedResourceIndex: "0"
placementStatuses:
- clusterName: kind-cluster-1
conditions:
- lastTransitionTime: "2024-04-16T20:02:52Z"
message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
score: 0, topology spread score: 0): picked by scheduling policy'
observedGeneration: 2
reason: ScheduleSucceeded
status: "True"
type: Scheduled
- lastTransitionTime: "2024-04-16T20:02:57Z"
message: Successfully Synchronized work(s) for placement
observedGeneration: 2
reason: WorkSynchronizeSucceeded
status: "True"
type: WorkSynchronized
- lastTransitionTime: "2024-04-16T20:02:57Z"
message: Successfully applied resources
observedGeneration: 2
reason: ApplySucceeded
status: "True"
type: Applied
selectedResources:
- kind: Namespace
name: test-ns
version: v1
From the status we can clearly see that the resources were propagated to the member cluster after removing the taint.
Adding toleration to ClusterResourcePlacement
Adding a toleration to a ClusterResourcePlacement
CR allows the Fleet scheduler to tolerate specific taints on the MemberClusters
.
For this section, we will start from scratch. First, create a namespace that we will propagate to the `MemberCluster`:
kubectl create ns test-ns
Then apply the MemberCluster
with a taint,
Example MemberCluster
with taint:
spec:
heartbeatPeriodSeconds: 60
identity:
apiGroup: ""
kind: ServiceAccount
name: fleet-member-agent-cluster-1
namespace: fleet-system
taints:
- effect: NoSchedule
key: test-key1
value: test-value1
The ClusterResourcePlacement
CR will not propagate the test-ns
namespace to the member cluster because of the taint.
Now we will add a toleration to a ClusterResourcePlacement
CR as part of the placement policy, which will use the Exists operator to tolerate the taint.
Example ClusterResourcePlacement
spec with tolerations after adding new toleration:
spec:
policy:
placementType: PickAll
tolerations:
- key: test-key1
operator: Exists
resourceSelectors:
- group: ""
kind: Namespace
name: test-ns
version: v1
revisionHistoryLimit: 10
strategy:
type: RollingUpdate
Let’s take a look at the status of the ClusterResourcePlacement
CR after adding the toleration:
status:
conditions:
- lastTransitionTime: "2024-04-16T20:16:10Z"
message: found all the clusters needed as specified by the scheduling policy
observedGeneration: 3
reason: SchedulingPolicyFulfilled
status: "True"
type: ClusterResourcePlacementScheduled
- lastTransitionTime: "2024-04-16T20:16:15Z"
message: All 1 cluster(s) are synchronized to the latest resources on the hub
cluster
observedGeneration: 3
reason: SynchronizeSucceeded
status: "True"
type: ClusterResourcePlacementSynchronized
- lastTransitionTime: "2024-04-16T20:16:15Z"
message: Successfully applied resources to 1 member clusters
observedGeneration: 3
reason: ApplySucceeded
status: "True"
type: ClusterResourcePlacementApplied
observedResourceIndex: "0"
placementStatuses:
- clusterName: kind-cluster-1
conditions:
- lastTransitionTime: "2024-04-16T20:16:10Z"
message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
score: 0, topology spread score: 0): picked by scheduling policy'
observedGeneration: 3
reason: ScheduleSucceeded
status: "True"
type: Scheduled
- lastTransitionTime: "2024-04-16T20:16:15Z"
message: Successfully Synchronized work(s) for placement
observedGeneration: 3
reason: WorkSynchronizeSucceeded
status: "True"
type: WorkSynchronized
- lastTransitionTime: "2024-04-16T20:16:15Z"
message: Successfully applied resources
observedGeneration: 3
reason: ApplySucceeded
status: "True"
type: Applied
selectedResources:
- kind: Namespace
name: test-ns
version: v1
From the status we can see that the resources were propagated to the MemberCluster
after adding the toleration.
Now let’s try adding a new taint to the member cluster CR and see if the resources are still propagated to the MemberCluster
,
Example MemberCluster
CR with new taint:
heartbeatPeriodSeconds: 60
identity:
apiGroup: ""
kind: ServiceAccount
name: fleet-member-agent-cluster-1
namespace: fleet-system
taints:
- effect: NoSchedule
key: test-key1
value: test-value1
- effect: NoSchedule
key: test-key2
value: test-value2
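Before checking the placement status, you can confirm the taints currently set on the member cluster with a jsonpath query like the one below (this assumes the cluster is named kind-cluster-1, as in this example):
kubectl get membercluster kind-cluster-1 -o jsonpath='{.spec.taints}'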
Let’s take a look at the ClusterResourcePlacement
CR status after adding the new taint:
status:
conditions:
- lastTransitionTime: "2024-04-16T20:27:44Z"
message: found all the clusters needed as specified by the scheduling policy
observedGeneration: 2
reason: SchedulingPolicyFulfilled
status: "True"
type: ClusterResourcePlacementScheduled
- lastTransitionTime: "2024-04-16T20:27:49Z"
message: All 1 cluster(s) are synchronized to the latest resources on the hub
cluster
observedGeneration: 2
reason: SynchronizeSucceeded
status: "True"
type: ClusterResourcePlacementSynchronized
- lastTransitionTime: "2024-04-16T20:27:49Z"
message: Successfully applied resources to 1 member clusters
observedGeneration: 2
reason: ApplySucceeded
status: "True"
type: ClusterResourcePlacementApplied
observedResourceIndex: "0"
placementStatuses:
- clusterName: kind-cluster-1
conditions:
- lastTransitionTime: "2024-04-16T20:27:44Z"
message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
score: 0, topology spread score: 0): picked by scheduling policy'
observedGeneration: 2
reason: ScheduleSucceeded
status: "True"
type: Scheduled
- lastTransitionTime: "2024-04-16T20:27:49Z"
message: Successfully Synchronized work(s) for placement
observedGeneration: 2
reason: WorkSynchronizeSucceeded
status: "True"
type: WorkSynchronized
- lastTransitionTime: "2024-04-16T20:27:49Z"
message: Successfully applied resources
observedGeneration: 2
reason: ApplySucceeded
status: "True"
type: Applied
selectedResources:
- kind: Namespace
name: test-ns
version: v1
Nothing changes in the status: even though the new taint is not tolerated, the existing resources on the `MemberCluster` will continue to run, because the taint effect is `NoSchedule` and the cluster was already selected for resource propagation in a previous scheduling cycle.
7 - Using the ClusterResourceOverride API
`ClusterResourceOverride` API to override cluster-scoped resources

This guide provides an overview of how to use the Fleet `ClusterResourceOverride` API to override cluster resources.
Overview
ClusterResourceOverride
is a feature within Fleet that allows for the modification or override of specific attributes
across cluster-wide resources. With ClusterResourceOverride, you can define rules based on cluster labels or other
criteria, specifying changes to be applied to various cluster-wide resources such as namespaces, roles, role bindings,
or custom resource definitions. These modifications may include updates to permissions, configurations, or other
parameters, ensuring consistent management and enforcement of configurations across your Fleet-managed Kubernetes clusters.
API Components
The ClusterResourceOverride API consists of the following components:
- Placement: This specifies which placement the override is applied to.
- Cluster Resource Selectors: These specify the set of cluster resources selected for overriding.
- Policy: This specifies the policy to be applied to the selected resources.
The following sections discuss these components in depth.
Placement
To configure which placement the override is applied to, you can use the name of ClusterResourcePlacement
.
Cluster Resource Selectors
A ClusterResourceOverride
object may feature one or more cluster resource selectors, specifying which resources to select to be overridden.
The ClusterResourceSelector
object supports the following fields:
- `group`: The API group of the resource
- `version`: The API version of the resource
- `kind`: The kind of the resource
- `name`: The name of the resource
Note: The resource can only be selected by name.
To add a resource selector, edit the clusterResourceSelectors
field in the ClusterResourceOverride
spec:
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
name: example-cro
spec:
placement:
name: crp-example
clusterResourceSelectors:
- group: rbac.authorization.k8s.io
kind: ClusterRole
version: v1
name: secret-reader
The example in the tutorial will pick the ClusterRole
named secret-reader
, as shown below, to be overridden.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: secret-reader
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "watch", "list"]
Policy
The Policy
is made up of a set of rules (OverrideRules
) that specify the changes to be applied to the selected
resources on selected clusters.
Each OverrideRule
supports the following fields:
- Cluster Selector: This specifies the set of clusters to which the override applies.
- Override Type: This specifies the type of override to be applied. The default type is `JSONPatch`.
  - `JSONPatch`: applies the JSON patch to the selected resources using RFC 6902.
  - `Delete`: deletes the selected resources on the target cluster.
- JSON Patch Override: This specifies the changes to be applied to the selected resources when the override type is `JSONPatch`.
Cluster Selector
To specify the clusters to which the override applies, you can use the clusterSelector
field in the OverrideRule
spec.
The clusterSelector
field supports the following fields:
- `clusterSelectorTerms`: A list of terms that are used to select clusters.
  - Each term in the list is used to select clusters based on the label selector.

IMPORTANT: Only `labelSelector` is supported in the `clusterSelectorTerms` field.
Override Type
To specify the type of override to be applied, you can use the overrideType field in the OverrideRule spec.
The default value is `JSONPatch`.
- `JSONPatch`: applies the JSON patch to the selected resources using RFC 6902.
- `Delete`: deletes the selected resources on the target cluster.
JSON Patch Override
To specify the changes to be applied to the selected resources, you can use the jsonPatchOverrides field in the OverrideRule spec. The jsonPatchOverrides field supports the following fields:
JSONPatchOverride applies a JSON patch on the selected resources following RFC 6902. All the fields defined follow this RFC.
- `op`: The operation to be performed. The supported operations are `add`, `remove`, and `replace`.
  - `add`: Adds a new value to the specified path.
  - `remove`: Removes the value at the specified path.
  - `replace`: Replaces the value at the specified path.
- `path`: The path to the field to be modified.
  - Some guidelines for the path are as follows:
    - Must start with a `/` character.
    - Cannot be empty.
    - Cannot contain an empty string ("///").
    - Cannot be a TypeMeta field ("/kind", "/apiVersion").
    - Cannot be a Metadata field ("/metadata/name", "/metadata/namespace"), except the fields "/metadata/annotations" and "/metadata/labels".
    - Cannot be any field in the status of the resource.
  - Some examples of valid paths are:
    - `/metadata/labels/new-label`
    - `/metadata/annotations/new-annotation`
    - `/spec/template/spec/containers/0/resources/limits/cpu`
    - `/spec/template/spec/containers/0/resources/requests/memory`
- `value`: The value to be set.
  - If the `op` is `remove`, the value cannot be set.
  - There is a list of reserved variables that will be replaced by the actual values:
    - `${MEMBER-CLUSTER-NAME}`: this will be replaced by the name of the `memberCluster` that represents this cluster.
Example: Override Labels
To overwrite the existing labels on the ClusterRole
named secret-reader
on clusters with the label env: prod
,
you can use the following configuration:
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
name: example-cro
spec:
placement:
name: crp-example
clusterResourceSelectors:
- group: rbac.authorization.k8s.io
kind: ClusterRole
version: v1
name: secret-reader
policy:
overrideRules:
- clusterSelector:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
jsonPatchOverrides:
- op: add
path: /metadata/labels
value:
{"cluster-name":"${MEMBER-CLUSTER-NAME}"}
Note: To add a new label to the existing labels, please use the below configuration:
- op: add
  path: /metadata/labels/new-label
  value: "new-value"
The ClusterResourceOverride
object above will add a label cluster-name
with the value of the memberCluster
name to the ClusterRole
named secret-reader
on clusters with the label env: prod
.
Example: Remove Verbs
To remove the verb “list” in the ClusterRole
named secret-reader
on clusters with the label env: prod
,
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
name: example-cro
spec:
placement:
name: crp-example
clusterResourceSelectors:
- group: rbac.authorization.k8s.io
kind: ClusterRole
version: v1
name: secret-reader
policy:
overrideRules:
- clusterSelector:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
jsonPatchOverrides:
- op: remove
path: /rules/0/verbs/2
The ClusterResourceOverride
object above will remove the verb “list” in the ClusterRole
named secret-reader
on
clusters with the label env: prod
selected by the clusterResourcePlacement crp-example
.
The ClusterResourceOverride mentioned above utilizes the cluster role displayed below:
Name:         secret-reader
Labels:       <none>
Annotations:  <none>
PolicyRule:
  Resources  Non-Resource URLs  Resource Names  Verbs
  ---------  -----------------  --------------  -----
  secrets    []                 []               [get watch list]
Delete
The Delete
override type can be used to delete the selected resources on the target cluster.
Example: Delete Selected Resource
To delete the secret-reader
on the clusters with the label env: test
selected by the clusterResourcePlacement crp-example
, you can use the Delete
override type.
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
name: example-cro
spec:
placement:
name: crp-example
clusterResourceSelectors:
- group: rbac.authorization.k8s.io
kind: ClusterRole
version: v1
name: secret-reader
policy:
overrideRules:
- clusterSelector:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: test
overrideType: Delete
Multiple Override Patches
You may add multiple JSONPatchOverride
to an OverrideRule
to apply multiple changes to the selected cluster resources.
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourceOverride
metadata:
name: example-cro
spec:
placement:
name: crp-example
clusterResourceSelectors:
- group: rbac.authorization.k8s.io
kind: ClusterRole
version: v1
name: secret-reader
policy:
overrideRules:
- clusterSelector:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
jsonPatchOverrides:
- op: remove
path: /rules/0/verbs/2
- op: remove
path: /rules/0/verbs/1
The ClusterResourceOverride
object above will remove the verbs “list” and “watch” in the ClusterRole
named
secret-reader
on clusters with the label env: prod
.
Breaking down the paths:
- First `JSONPatchOverride`:
  - `/rules/0`: This denotes the first rule in the rules array of the ClusterRole. In the provided ClusterRole definition, there's only one rule defined ("secrets"), so this corresponds to the first (and only) rule.
  - `/verbs/2`: Within this rule, the third element of the verbs array is targeted ("list").
- Second `JSONPatchOverride`:
  - `/rules/0`: This denotes the first rule in the rules array of the ClusterRole. In the provided ClusterRole definition, there's only one rule defined ("secrets"), so this corresponds to the first (and only) rule.
  - `/verbs/1`: Within this rule, the second element of the verbs array is targeted ("watch").
The ClusterResourceOverride mentioned above utilizes the cluster role displayed below:
Name:         secret-reader
Labels:       <none>
Annotations:  <none>
PolicyRule:
  Resources  Non-Resource URLs  Resource Names  Verbs
  ---------  -----------------  --------------  -----
  secrets    []                 []               [get watch list]
Applying the ClusterResourceOverride
Create a ClusterResourcePlacement resource to specify the placement rules for distributing the cluster resource overrides across the cluster infrastructure. Ensure that you select the appropriate resource.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp-example
spec:
resourceSelectors:
- group: rbac.authorization.k8s.io
kind: ClusterRole
version: v1
name: secret-reader
policy:
placementType: PickAll
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
- labelSelector:
matchLabels:
env: test
The `ClusterResourcePlacement` configuration outlined above will disperse resources across all clusters labeled with `env: prod` and `env: test`.
As the changes are implemented, the corresponding ClusterResourceOverride
configurations will be applied to the
designated clusters, triggered by the selection of matching cluster role resource secret-reader
.
Verifying the Cluster Resource is Overridden
To ensure that the ClusterResourceOverride
object is applied to the selected clusters, verify the ClusterResourcePlacement
status by running kubectl describe crp crp-example
command:
Status:
Conditions:
...
Message: The selected resources are successfully overridden in the 10 clusters
Observed Generation: 1
Reason: OverriddenSucceeded
Status: True
Type: ClusterResourcePlacementOverridden
...
Observed Resource Index: 0
Placement Statuses:
Applicable Cluster Resource Overrides:
example-cro-0
Cluster Name: member-50
Conditions:
...
Message: Successfully applied the override rules on the resources
Observed Generation: 1
Reason: OverriddenSucceeded
Status: True
Type: Overridden
...
Each cluster maintains its own Applicable Cluster Resource Overrides
which contain the cluster resource override snapshot
if relevant. Additionally, individual status messages for each cluster indicate whether the override rules have been
effectively applied.
The ClusterResourcePlacementOverridden
condition indicates whether the resource override has been successfully applied
to the selected resources in the selected clusters.
To verify that the ClusterResourceOverride
object has been successfully applied to the selected resources,
check resources in the selected clusters:
- Get cluster credentials:
az aks get-credentials --resource-group <resource-group> --name <cluster-name>
- Get the `ClusterRole` object in the selected cluster:
  kubectl --context=<member-cluster-context> get clusterrole secret-reader -o yaml
Upon inspecting the described ClusterRole
object, it becomes apparent that the verbs “watch” and “list” have been
removed from the permissions list within the ClusterRole
named “secret-reader” on the prod clusters.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
...
rules:
- apiGroups:
- ""
resources:
- secrets
verbs:
- get
Similarly, you can verify that this cluster role does not exist in the test clusters.
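For example, a quick check like the one below (a sketch; substitute the kubeconfig context of a member cluster labeled env: test) should return a NotFound error once the Delete override rule has been applied:
kubectl --context=<test-member-cluster-context> get clusterrole secret-reader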
8 - Using the ResourceOverride API
`ResourceOverride` API to override namespace-scoped resources

This guide provides an overview of how to use the Fleet `ResourceOverride` API to override resources.
Overview
ResourceOverride
is a Fleet API that allows you to modify or override specific attributes of
existing resources within your cluster. With ResourceOverride, you can define rules based on cluster
labels or other criteria, specifying changes to be applied to resources such as Deployments, StatefulSets, ConfigMaps, or Secrets.
These changes can include updates to container images, environment variables, resource limits, or any other configurable parameters.
API Components
The ResourceOverride API consists of the following components:
- Placement: This specifies which placement the override is applied to.
- Resource Selectors: These specify the set of resources selected for overriding.
- Policy: This specifies the policy to be applied to the selected resources.
The following sections discuss these components in depth.
Placement
To configure which placement the override is applied to, you can use the name of ClusterResourcePlacement
.
Resource Selectors
A ResourceOverride
object may feature one or more resource selectors, specifying which resources to select to be overridden.
The ResourceSelector
object supports the following fields:
- `group`: The API group of the resource
- `version`: The API version of the resource
- `kind`: The kind of the resource
- `name`: The name of the resource
Note: The resource can only be selected by name.
To add a resource selector, edit the resourceSelectors
field in the ResourceOverride
spec:
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
name: example-ro
namespace: test-namespace
spec:
placement:
name: crp-example
resourceSelectors:
- group: apps
kind: Deployment
version: v1
name: my-deployment
Note: The ResourceOverride needs to be in the same namespace as the resources it is overriding.
The examples in the tutorial will pick a Deployment
named my-deployment
from the namespace test-namespace
, as shown below, to be overridden.
apiVersion: apps/v1
kind: Deployment
metadata:
...
name: my-deployment
namespace: test-namespace
...
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: test-nginx
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: test-nginx
spec:
containers:
- image: nginx:1.14.2
imagePullPolicy: IfNotPresent
name: nginx
ports:
- containerPort: 80
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
...
Policy
The Policy
is made up of a set of rules (OverrideRules
) that specify the changes to be applied to the selected
resources on selected clusters.
Each OverrideRule
supports the following fields:
- Cluster Selector: This specifies the set of clusters to which the override applies.
- Override Type: This specifies the type of override to be applied. The default type is `JSONPatch`.
  - `JSONPatch`: applies the JSON patch to the selected resources using RFC 6902.
  - `Delete`: deletes the selected resources on the target cluster.
- JSON Patch Override: This specifies the changes to be applied to the selected resources when the override type is `JSONPatch`.
Cluster Selector
To specify the clusters to which the override applies, you can use the clusterSelector
field in the OverrideRule
spec.
The clusterSelector
field supports the following fields:
- `clusterSelectorTerms`: A list of terms that are used to select clusters.
  - Each term in the list is used to select clusters based on the label selector.

IMPORTANT: Only `labelSelector` is supported in the `clusterSelectorTerms` field.
Override Type
To specify the type of override to be applied, you can use the overrideType field in the OverrideRule spec.
The default value is `JSONPatch`.
- `JSONPatch`: applies the JSON patch to the selected resources using RFC 6902.
- `Delete`: deletes the selected resources on the target cluster.
JSON Patch Override
To specify the changes to be applied to the selected resources, you can use the jsonPatchOverrides field in the OverrideRule spec. The jsonPatchOverrides field supports the following fields:
JSONPatchOverride applies a JSON patch on the selected resources following RFC 6902. All the fields defined follow this RFC.
- `op`: The operation to be performed. The supported operations are `add`, `remove`, and `replace`.
  - `add`: Adds a new value to the specified path.
  - `remove`: Removes the value at the specified path.
  - `replace`: Replaces the value at the specified path.
- `path`: The path to the field to be modified.
  - Some guidelines for the path are as follows:
    - Must start with a `/` character.
    - Cannot be empty.
    - Cannot contain an empty string ("///").
    - Cannot be a TypeMeta field ("/kind", "/apiVersion").
    - Cannot be a Metadata field ("/metadata/name", "/metadata/namespace"), except the fields "/metadata/annotations" and "/metadata/labels".
    - Cannot be any field in the status of the resource.
  - Some examples of valid paths are:
    - `/metadata/labels/new-label`
    - `/metadata/annotations/new-annotation`
    - `/spec/template/spec/containers/0/resources/limits/cpu`
    - `/spec/template/spec/containers/0/resources/requests/memory`
- `value`: The value to be set.
  - If the `op` is `remove`, the value cannot be set.
  - There is a list of reserved variables that will be replaced by the actual values:
    - `${MEMBER-CLUSTER-NAME}`: this will be replaced by the name of the `memberCluster` that represents this cluster.
Example: Override Labels
To overwrite the existing labels on the Deployment
named my-deployment
on clusters with the label env: prod
,
you can use the following configuration:
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
name: example-ro
namespace: test-namespace
spec:
placement:
name: crp-example
resourceSelectors:
- group: apps
kind: Deployment
version: v1
name: my-deployment
policy:
overrideRules:
- clusterSelector:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
jsonPatchOverrides:
- op: add
path: /metadata/labels
value:
{"cluster-name":"${MEMBER-CLUSTER-NAME}"}
Note: To add a new label to the existing labels, please use the below configuration:
- op: add
  path: /metadata/labels/new-label
  value: "new-value"
The `ResourceOverride` object above will add a label `cluster-name` with the value of the `memberCluster` name to the `Deployment` named `my-deployment` on clusters with the label `env: prod`.
Example: Override Image
To override the image of the container in the Deployment
named my-deployment
on all clusters with the label env: prod
:
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
name: example-ro
namespace: test-namespace
spec:
placement:
name: crp-example
resourceSelectors:
- group: apps
kind: Deployment
version: v1
name: my-deployment
policy:
overrideRules:
- clusterSelector:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
jsonPatchOverrides:
- op: replace
path: /spec/template/spec/containers/0/image
value: "nginx:1.20.0"
The ResourceOverride
object above will replace the image of the container in the Deployment
named my-deployment
with the image nginx:1.20.0
on all clusters with the label env: prod
selected by the clusterResourcePlacement crp-example
.
The ResourceOverride mentioned above utilizes the deployment displayed below:
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: my-deployment
  namespace: test-namespace
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - image: nginx:1.14.2
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            ...
      ...
  ...
Delete
The Delete
override type can be used to delete the selected resources on the target cluster.
Example: Delete Selected Resource
To delete the my-deployment
on the clusters with the label env: test
selected by the clusterResourcePlacement crp-example
,
you can use the Delete
override type.
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
name: example-ro
namespace: test-namespace
spec:
placement:
name: crp-example
resourceSelectors:
- group: apps
kind: Deployment
version: v1
name: my-deployment
policy:
overrideRules:
- clusterSelector:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: test
overrideType: Delete
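Once the Delete rule takes effect, a check like the one below (a sketch; substitute the kubeconfig context of a member cluster labeled env: test) should return a NotFound error for the deployment:
kubectl --context=<member-cluster-context> get deployment my-deployment -n test-namespace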
Multiple Override Rules
You may add multiple OverrideRules
to a Policy
to apply multiple changes to the selected resources.
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ResourceOverride
metadata:
name: example-ro
namespace: test-namespace
spec:
placement:
name: crp-example
resourceSelectors:
- group: apps
kind: Deployment
version: v1
name: my-deployment
policy:
overrideRules:
- clusterSelector:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
jsonPatchOverrides:
- op: replace
path: /spec/template/spec/containers/0/image
value: "nginx:1.20.0"
- clusterSelector:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: test
jsonPatchOverrides:
- op: replace
path: /spec/template/spec/containers/0/image
value: "nginx:latest"
The ResourceOverride
object above will replace the image of the container in the Deployment
named my-deployment
with the image nginx:1.20.0
on all clusters with the label env: prod
and the image nginx:latest
on all clusters with the label env: test
.
The ResourceOverride mentioned above utilizes the deployment displayed below:
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: my-deployment
  namespace: test-namespace
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - image: nginx:1.14.2
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            ...
      ...
  ...
Applying the ResourceOverride
Create a ClusterResourcePlacement resource to specify the placement rules for distributing the resource overrides across the cluster infrastructure. Ensure that you select the appropriate namespaces containing the matching resources.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp-example
spec:
resourceSelectors:
- group: ""
kind: Namespace
name: test-namespace
version: v1
policy:
placementType: PickAll
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
- labelSelector:
matchLabels:
env: test
The ClusterResourcePlacement
configuration outlined above will disperse resources within test-namespace
across all
clusters labeled with env: prod
and env: test
. As the changes are implemented, the corresponding ResourceOverride
configurations will be applied to the designated clusters, triggered by the selection of matching deployment resource
my-deployment
.
Verifying the Cluster Resource is Overridden
To ensure that the ResourceOverride
object is applied to the selected resources, verify the ClusterResourcePlacement
status by running kubectl describe crp crp-example
command:
Status:
Conditions:
...
Message: The selected resources are successfully overridden in the 10 clusters
Observed Generation: 1
Reason: OverriddenSucceeded
Status: True
Type: ClusterResourcePlacementOverridden
...
Observed Resource Index: 0
Placement Statuses:
Applicable Resource Overrides:
Name: example-ro-0
Namespace: test-namespace
Cluster Name: member-50
Conditions:
...
Message: Successfully applied the override rules on the resources
Observed Generation: 1
Reason: OverriddenSucceeded
Status: True
Type: Overridden
...
Each cluster maintains its own Applicable Resource Overrides
which contain the resource override snapshot and
the resource override namespace if relevant. Additionally, individual status messages for each cluster indicate whether the override rules have been effectively applied.
The ClusterResourcePlacementOverridden
condition indicates whether the resource override has been successfully applied
to the selected resources in the selected clusters.
To verify that the ResourceOverride
object has been successfully applied to the selected resources,
check resources in the selected clusters:
- Get cluster credentials:
az aks get-credentials --resource-group <resource-group> --name <cluster-name>
- Get the `Deployment` object in the selected cluster:
  kubectl --context=<member-cluster-context> get deployment my-deployment -n test-namespace -o yaml
Upon inspecting the selected member cluster, you can see that it has the label `env: prod`. Consequently, the image of the `my-deployment` deployment has been modified to `nginx:1.20.0` on that cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
...
name: my-deployment
namespace: test-namespace
...
spec:
...
template:
...
spec:
containers:
- image: nginx:1.20.0
imagePullPolicy: IfNotPresent
name: nginx
ports:
...
...
status:
...
9 - Using Envelope Objects to Place Resources
Propagating Resources with Envelope Objects
This guide provides instructions on propagating a set of resources from the hub cluster to joined member clusters within an envelope object.
Why Use Envelope Objects?
When propagating resources to member clusters using Fleet, it’s important to understand that the hub cluster itself is also a Kubernetes cluster. Without envelope objects, any resource you want to propagate would first be applied directly to the hub cluster, which can lead to some potential side effects:
- Unintended Side Effects: Resources like ValidatingWebhookConfigurations, MutatingWebhookConfigurations, or Admission Controllers would become active on the hub cluster, potentially intercepting and affecting hub cluster operations.
- Security Risks: RBAC resources (Roles, ClusterRoles, RoleBindings, ClusterRoleBindings) intended for member clusters could grant unintended permissions on the hub cluster.
- Resource Limitations: ResourceQuotas, FlowSchema or LimitRanges defined for member clusters would take effect on the hub cluster. While this is generally not a critical issue, there may be cases where you want to avoid these constraints on the hub.
Envelope objects solve these problems by allowing you to define resources that should be propagated without actually deploying their contents on the hub cluster. The envelope object itself is applied to the hub, but the resources it contains are only extracted and applied when they reach the member clusters.
Envelope Objects with CRDs
Fleet now supports two types of envelope Custom Resource Definitions (CRDs) for propagating resources:
- ClusterResourceEnvelope: Used to wrap cluster-scoped resources for placement.
- ResourceEnvelope: Used to wrap namespace-scoped resources for placement.
These CRDs provide a more structured and Kubernetes-native way to package resources for propagation to member clusters without causing unintended side effects on the hub cluster.
ClusterResourceEnvelope Example
The ClusterResourceEnvelope
is a cluster-scoped resource that can only wrap other cluster-scoped resources. For example:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourceEnvelope
metadata:
name: example
data:
"webhook.yaml":
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
name: guard
webhooks:
- name: guard.example.com
rules:
- operations: ["CREATE"]
apiGroups: ["*"]
apiVersions: ["*"]
resources: ["*"]
clientConfig:
service:
name: guard
namespace: ops
admissionReviewVersions: ["v1"]
sideEffects: None
timeoutSeconds: 10
"clusterrole.yaml":
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: pod-reader
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
ResourceEnvelope Example
The ResourceEnvelope
is a namespace-scoped resource that can only wrap namespace-scoped resources. For example:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ResourceEnvelope
metadata:
name: example
namespace: app
data:
"cm.yaml":
apiVersion: v1
kind: ConfigMap
metadata:
name: config
namespace: app
data:
foo: bar
"deploy.yaml":
apiVersion: apps/v1
kind: Deployment
metadata:
name: ingress
namespace: app
spec:
replicas: 1
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: web
image: nginx
Propagating envelope objects from hub cluster to member cluster
We apply our envelope objects on the hub cluster and then use a ClusterResourcePlacement
object to propagate these resources from the hub to member clusters.
Example CRP spec for propagating a ResourceEnvelope:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp-with-envelope
spec:
policy:
clusterNames:
- kind-cluster-1
placementType: PickFixed
resourceSelectors:
- group: ""
kind: Namespace
name: app
version: v1
revisionHistoryLimit: 10
strategy:
type: RollingUpdate
Example CRP spec for propagating a ClusterResourceEnvelope:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp-with-cluster-envelop
spec:
policy:
clusterNames:
- kind-cluster-1
placementType: PickFixed
resourceSelectors:
- group: placement.kubernetes-fleet.io
kind: ClusterResourceEnvelope
name: example
version: v1beta1
revisionHistoryLimit: 10
strategy:
type: RollingUpdate
CRP status for ResourceEnvelope:
status:
conditions:
- lastTransitionTime: "2023-11-30T19:54:13Z"
message: found all the clusters needed as specified by the scheduling policy
observedGeneration: 2
reason: SchedulingPolicyFulfilled
status: "True"
type: ClusterResourcePlacementScheduled
- lastTransitionTime: "2023-11-30T19:54:18Z"
message: All 1 cluster(s) are synchronized to the latest resources on the hub
cluster
observedGeneration: 2
reason: SynchronizeSucceeded
status: "True"
type: ClusterResourcePlacementSynchronized
- lastTransitionTime: "2023-11-30T19:54:18Z"
message: Successfully applied resources to 1 member clusters
observedGeneration: 2
reason: ApplySucceeded
status: "True"
type: ClusterResourcePlacementApplied
placementStatuses:
- clusterName: kind-cluster-1
conditions:
- lastTransitionTime: "2023-11-30T19:54:13Z"
message: 'Successfully scheduled resources for placement in kind-cluster-1:
picked by scheduling policy'
observedGeneration: 2
reason: ScheduleSucceeded
status: "True"
type: ResourceScheduled
- lastTransitionTime: "2023-11-30T19:54:18Z"
message: Successfully Synchronized work(s) for placement
observedGeneration: 2
reason: WorkSynchronizeSucceeded
status: "True"
type: WorkSynchronized
- lastTransitionTime: "2023-11-30T19:54:18Z"
message: Successfully applied resources
observedGeneration: 2
reason: ApplySucceeded
status: "True"
type: ResourceApplied
selectedResources:
- kind: Namespace
name: app
version: v1
- group: placement.kubernetes-fleet.io
kind: ResourceEnvelope
name: example
namespace: app
version: v1beta1
Note: In the
selectedResources
section, we specifically display the propagated envelope object. We do not individually list all the resources contained within the envelope object in the status.
Upon inspection of the selectedResources
, it indicates that the namespace app
and the ResourceEnvelope example
have been successfully propagated. Users can further verify the successful propagation of resources contained within the envelope object by ensuring that the failedPlacements
section in the placementStatus
for the target cluster does not appear in the status.
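For example, a query like the one below (a sketch that assumes the CRP name crp-with-envelope from above and that the jq utility is available) lists any clusters reporting failed placements; an empty list means every resource inside the envelope applied successfully:
kubectl get clusterresourceplacement crp-with-envelope -o jsonpath='{.status.placementStatuses}' \
  | jq '[.[] | select(.failedPlacements != null)] | map({clusterName, failedPlacements})'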
Example CRP status where resources within an envelope object failed to apply
CRP status with failed ResourceEnvelope resource:
In the example below, within the placementStatus
section for kind-cluster-1
, the failedPlacements
section provides details on a resource that failed to apply along with information about the envelope object which contained the resource.
status:
conditions:
- lastTransitionTime: "2023-12-06T00:09:53Z"
message: found all the clusters needed as specified by the scheduling policy
observedGeneration: 2
reason: SchedulingPolicyFulfilled
status: "True"
type: ClusterResourcePlacementScheduled
- lastTransitionTime: "2023-12-06T00:09:58Z"
message: All 1 cluster(s) are synchronized to the latest resources on the hub
cluster
observedGeneration: 2
reason: SynchronizeSucceeded
status: "True"
type: ClusterResourcePlacementSynchronized
- lastTransitionTime: "2023-12-06T00:09:58Z"
message: Failed to apply manifests to 1 clusters, please check the `failedPlacements`
status
observedGeneration: 2
reason: ApplyFailed
status: "False"
type: ClusterResourcePlacementApplied
placementStatuses:
- clusterName: kind-cluster-1
conditions:
- lastTransitionTime: "2023-12-06T00:09:53Z"
message: 'Successfully scheduled resources for placement in kind-cluster-1:
picked by scheduling policy'
observedGeneration: 2
reason: ScheduleSucceeded
status: "True"
type: ResourceScheduled
- lastTransitionTime: "2023-12-06T00:09:58Z"
message: Successfully Synchronized work(s) for placement
observedGeneration: 2
reason: WorkSynchronizeSucceeded
status: "True"
type: WorkSynchronized
- lastTransitionTime: "2023-12-06T00:09:58Z"
message: Failed to apply manifests, please check the `failedPlacements` status
observedGeneration: 2
reason: ApplyFailed
status: "False"
type: ResourceApplied
failedPlacements:
- condition:
lastTransitionTime: "2023-12-06T00:09:53Z"
message: 'Failed to apply manifest: namespaces "app" not found'
reason: AppliedManifestFailedReason
status: "False"
type: Applied
envelope:
name: example
namespace: app
type: ResourceEnvelope
kind: Deployment
name: ingress
namespace: app
version: apps/v1
selectedResources:
- kind: Namespace
name: app
version: v1
- group: placement.kubernetes-fleet.io
kind: ResourceEnvelope
name: example
namespace: app
version: v1beta1
CRP status with failed ClusterResourceEnvelope resource:
Similar to namespace-scoped resources, cluster-scoped resources within a ClusterResourceEnvelope can also fail to apply:
status:
conditions:
- lastTransitionTime: "2023-12-06T00:09:53Z"
message: found all the clusters needed as specified by the scheduling policy
observedGeneration: 2
reason: SchedulingPolicyFulfilled
status: "True"
type: ClusterResourcePlacementScheduled
- lastTransitionTime: "2023-12-06T00:09:58Z"
message: Failed to apply manifests to 1 clusters, please check the `failedPlacements`
status
observedGeneration: 2
reason: ApplyFailed
status: "False"
type: ClusterResourcePlacementApplied
placementStatuses:
- clusterName: kind-cluster-1
conditions:
- lastTransitionTime: "2023-12-06T00:09:58Z"
message: Failed to apply manifests, please check the `failedPlacements` status
observedGeneration: 2
reason: ApplyFailed
status: "False"
type: ResourceApplied
failedPlacements:
- condition:
lastTransitionTime: "2023-12-06T00:09:53Z"
message: 'Failed to apply manifest: service "guard" not found in namespace "ops"'
reason: AppliedManifestFailedReason
status: "False"
type: Applied
envelope:
name: example
type: ClusterResourceEnvelope
kind: ValidatingWebhookConfiguration
name: guard
group: admissionregistration.k8s.io
version: v1
selectedResources:
- group: placement.kubernetes-fleet.io
kind: ClusterResourceEnvelope
name: example
version: v1beta1
10 - Controlling How Fleet Handles Pre-Existing Resources
This guide provides an overview of how to set up Fleet's takeover experience, which allows developers and admins to choose what will happen when Fleet encounters a pre-existing resource. This occurs most often in the Fleet adoption scenario, where a cluster has just joined a fleet and Fleet finds that the resources to be placed onto the new member cluster via the CRP API are already running there.
A concern commonly associated with this scenario is that the running (pre-existing) set of
resources might have configuration differences from their equivalents on the hub cluster,
for example: On the hub cluster one might have a namespace work
where it hosts a deployment
web-server
that runs the image rpd-stars:latest
; while on the member cluster in the same
namespace lives a deployment of the same name but with the image umbrella-biolab:latest
.
If Fleet applies the resource template from the hub cluster, unexpected service interruptions
might occur.
To address this concern, Fleet also introduces a new field, whenToTakeOver
, in the apply
strategy. Three options are available:
- `Always`: This is the default option 😑. With this setting, Fleet will take over a pre-existing resource as soon as it encounters it. Fleet will apply the corresponding resource template from the hub cluster, and any value differences in the managed fields will be overwritten. This is consistent with the behavior before the new takeover experience was added.
- `IfNoDiff`: This is a new option ✨ provided by the takeover mechanism. With this setting, Fleet will check for configuration differences when it finds a pre-existing resource and will only take over the resource (apply the resource template) if no configuration differences are found. Consider using this option for a safer adoption journey.
- `Never`: This is another new option ✨ provided by the takeover mechanism. With this setting, Fleet will ignore pre-existing resources and no apply op will be performed. This will be considered as an apply error. Use this option if you would like to check for the presence of pre-existing resources without taking any action.
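For reference, the whenToTakeOver field sits in the apply strategy of a CRP's rollout strategy; a minimal sketch (other strategy fields elided), matching the placement used in the walkthrough below, looks like this:
spec:
  strategy:
    applyStrategy:
      whenToTakeOver: IfNoDiff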
Before you begin
The new takeover experience is currently in preview.
Note that the APIs for the new experience are only available in the Fleet v1beta1 API, not the v1 API. If you do not see the new APIs in command outputs, verify that you are explicitly requesting the v1beta1 API objects, as opposed to the v1 API objects (the default).
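For example, to query CRP objects explicitly through the v1beta1 API, use the fully qualified resource name, as done throughout the walkthrough below:
kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io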
How Fleet can be used to safely take over pre-existing resources
The steps below explain how the takeover experience functions. The code assumes that you have
a fleet of two clusters, member-1
and member-2
:
Switch to the second member cluster, and create a namespace, `work-2`, with labels:
kubectl config use-context member-2-admin
kubectl create ns work-2
kubectl label ns work-2 app=work-2
kubectl label ns work-2 owner=wesker
Switch to the hub cluster, and create the same namespace, but with a slightly different set of labels:
kubectl config use-context hub-admin
kubectl create ns work-2
kubectl label ns work-2 app=work-2
kubectl label ns work-2 owner=redfield
Create a CRP object that places the namespace to all member clusters:
cat <<EOF | kubectl apply -f -
# The YAML configuration of the CRP object.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: work-2
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      # Select all namespaces with the label app=work.
      labelSelector:
        matchLabels:
          app: work-2
  policy:
    placementType: PickAll
  strategy:
    # For simplicity reasons, the CRP is configured to roll out changes to
    # all member clusters at once. This is not a setup recommended for production
    # use.
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%
      unavailablePeriodSeconds: 1
    applyStrategy:
      whenToTakeOver: Never
EOF
Give Fleet a few seconds to handle the placement. Check the status of the CRP object; you should see a failure there that complains about an apply error on the cluster member-2:
kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses}' | jq
# The command above uses JSON paths to query the relevant status information
# directly and uses the jq utility to pretty print the output JSON.
#
# jq might not be available in your environment. You may have to install it
# separately, or omit it from the command.
#
# If the output is empty, the status might have not been populated properly
# yet. Retry in a few seconds; you may also want to switch the output type
# from jsonpath to yaml to see the full object.
The output should look like this:
{ "clusterName": "member-1", "conditions": [ ... { ... "status": "True", "type": "Applied" } ] }, { "clusterName": "member-2", "conditions": [ ... { ... "status": "False", "type": "Applied" } ], "failedPlacements": ... }
You can take a look at the `failedPlacements` part in the placement status for error details. The output should look like this:
[
  {
    "condition": {
      "lastTransitionTime": "...",
      "message": "Failed to apply the manifest (error: no ownership of the object in the member cluster; takeover is needed)",
      "reason": "NotTakenOver",
      "status": "False",
      "type": "Applied"
    },
    "kind": "Namespace",
    "name": "work-2",
    "version": "v1"
  }
]
Fleet finds out that the namespace `work-2` already exists on the member cluster, and it is not owned by Fleet; since the takeover policy is set to `Never`, Fleet will not assume ownership of the namespace; no apply will be performed and an apply error will be raised instead.

The following `jq` query can help you better locate clusters with failed placements and their failure details:
kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses}' \
  | jq '[.[] | select (.failedPlacements != null)] | map({clusterName, failedPlacements})'
# The command above uses JSON paths to retrieve the relevant status information
# directly and uses the jq utility to query the data.
#
# jq might not be available in your environment. You may have to install it
# separately, or omit it from the command.
It would filter out all the clusters that do not have failures and report only the failed clusters with the failure details:
{ "clusterName": "member-2", "failedPlacements": [ { "condition": { "lastTransitionTime": "...", "message": "Failed to apply the manifest (error: no ownership of the object in the member cluster; takeover is needed)", "reason": "NotTakenOver", "status": "False", "type": "Applied" }, "kind": "Namespace", "name": "work-2", "version": "v1" } ] }
Next, update the CRP object and set the `whenToTakeOver` field to `IfNoDiff`:
cat <<EOF | kubectl apply -f -
# The YAML configuration of the CRP object.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: work-2
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      # Select all namespaces with the label app=work.
      labelSelector:
        matchLabels:
          app: work-2
  policy:
    placementType: PickAll
  strategy:
    # For simplicity reasons, the CRP is configured to roll out changes to
    # all member clusters at once. This is not a setup recommended for production
    # use.
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%
      unavailablePeriodSeconds: 1
    applyStrategy:
      whenToTakeOver: IfNoDiff
EOF
Give Fleet a few seconds to handle the placement. Check the status of the CRP object; you should see the apply op still fails.
kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2
Verify the error details reported in the `failedPlacements` field once more:
kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses}' \
  | jq '[.[] | select (.failedPlacements != null)] | map({clusterName, failedPlacements})'
# The command above uses JSON paths to retrieve the relevant status information
# directly and uses the jq utility to query the data.
#
# jq might not be available in your environment. You may have to install it
# separately, or omit it from the command.
The output has changed:
{ "clusterName": "member-2", "failedPlacements": [ { "condition": { "lastTransitionTime": "...", "message": "Failed to apply the manifest (error: cannot take over object: configuration differences are found between the manifest object and the corresponding object in the member cluster)", "reason": "FailedToTakeOver", "status": "False", "type": "Applied" }, "kind": "Namespace", "name": "work-2", "version": "v1" } ] }
Now, with the takeover policy set to `IfNoDiff`, Fleet can assume ownership of pre-existing resources; however, as a configuration difference has been found between the hub cluster and the member cluster, takeover is blocked.

Similar to the drift detection mechanism, Fleet will report details about the found configuration differences as well. You can learn about them in the `diffedPlacements` part of the status.

Use the `jq` query below to list all clusters with the `diffedPlacements` status information populated:
kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses}' \
  | jq '[.[] | select (.diffedPlacements != null)] | map({clusterName, diffedPlacements})'
# The command above uses JSON paths to retrieve the relevant status information
# directly and uses the jq utility to query the data.
#
# jq might not be available in your environment. You may have to install it
# separately, or omit it from the command.
{ "clusterName": "member-2", "diffedPlacements": [ { "firstDiffedObservedTime": "...", "group": "", "version": "v1", "kind": "Namespace", "name": "work-2", "observationTime": "...", "observedDiffs": [ { "path": "/metadata/labels/owner", "valueInHub": "redfield", "valueInMember": "wesker" } ], "targetClusterObservedGeneration": 0 } ] }
Fleet will report the following information about a configuration difference:
- `group`, `kind`, `version`, `namespace`, and `name`: the resource that has configuration differences.
- `observationTime`: the timestamp when the current diff detail was collected.
- `firstDiffedObservedTime`: the timestamp when the current diff was first observed.
- `observedDiffs`: the diff details, specifically:
  - `path`: a JSON path (RFC 6901) that points to the diff'd field;
  - `valueInHub`: the value at the JSON path as seen from the hub cluster resource template (the desired state). If this value is absent, the field does not exist in the resource template.
  - `valueInMember`: the value at the JSON path as seen from the member cluster resource (the current state). If this value is absent, the field does not exist in the current state.
- `targetClusterObservedGeneration`: the generation of the member cluster resource.
To fix the configuration difference, consider one of the following options:
- Switch the whenToTakeOver setting back to Always, which will instruct Fleet to take over the resource right away and overwrite all configuration differences; or
- Edit the diff’d field directly on the member cluster side, so that the value is consistent with that on the hub cluster; Fleet will periodically re-evaluate diffs and should take over the resource soon after; or
- Delete the resource from the member cluster. Fleet will then re-apply the resource template and re-create the resource.
Here the guide takes the first option, setting the whenToTakeOver field to Always:

cat <<EOF | kubectl apply -f -
# The YAML configuration of the CRP object.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: work-2
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      # Select all namespaces with the label app=work-2.
      labelSelector:
        matchLabels:
          app: work-2
  policy:
    placementType: PickAll
  strategy:
    # For simplicity reasons, the CRP is configured to roll out changes to
    # all member clusters at once. This is not a setup recommended for production
    # use.
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%
      unavailablePeriodSeconds: 1
    applyStrategy:
      whenToTakeOver: Always
EOF
Check the CRP status; in a few seconds, Fleet will report that all objects have been applied.
kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2
If you switch to the member cluster
member-2
now, you should see that the object looks exactly the same as the resource template kept on the hub cluster; the owner label has been overwritten.
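To double-check, compare the labels on the member cluster against the resource template; a quick verification, assuming (as elsewhere in this guide) that your kubeconfig has a member-2-admin context for the second member cluster:

kubectl config use-context member-2-admin
kubectl get ns work-2 --show-labels
# The owner label should now read redfield, matching the hub cluster resource template.

kubectl config use-context hub-admin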
Important
When Fleet fails to take over an object, the pre-existing resource will not be put under Fleet’s management: any change made on the hub cluster side will have no effect on the pre-existing resource. If you choose to delete the resource template, or remove the CRP object, Fleet will not attempt to delete the pre-existing resource.
Takeover and comparison options
Fleet provides a comparisonOptions
setting that allows you to fine-tune how Fleet calculates configuration differences between a resource template created on the hub cluster and the corresponding pre-existing resource on a member cluster.
Note
The
comparisonOptions
setting also controls how Fleet detects drifts. See the how-to guide on drift detection for more information.
If partialComparison is used, Fleet will only report configuration differences in managed fields, i.e., fields that are explicitly specified in the resource template; the presence of additional fields on the member cluster side will not stop Fleet from taking over the pre-existing resource. By contrast, with fullComparison, Fleet will only take over a pre-existing resource if it looks exactly the same as its hub cluster counterpart.
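For example, to require an exact match before Fleet assumes ownership of a pre-existing resource, you could combine the two settings in the apply strategy; a minimal sketch of the relevant CRP stanza, using only fields shown elsewhere in this guide:

  strategy:
    applyStrategy:
      whenToTakeOver: IfNoDiff
      comparisonOption: fullComparison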
Below is a table that summarizes the combos of different options and their respective effects:
whenToTakeOver setting | comparisonOption setting | Configuration difference scenario | Outcome |
---|---|---|---|
IfNoDiff | partialComparison | There exists a value difference in a managed field between a pre-existing resource on a member cluster and the hub cluster resource template. | Fleet will report an apply error in the status, plus the diff details. |
IfNoDiff | partialComparison | The pre-existing resource has a field that is absent on the hub cluster resource template. | Fleet will take over the resource; the configuration difference in the unmanaged field will be left untouched. |
IfNoDiff | fullComparison | Difference has been found on a field, managed or not. | Fleet will report an apply error in the status, plus the diff details. |
Always | Any option | Difference has been found on a field, managed or not. | Fleet will take over the resource; configuration differences in unmanaged fields will be left untouched. |
11 - Enabling Drift Detection in Fleet
This guide provides an overview of how to enable drift detection in Fleet. This feature can help developers and admins identify (and act upon) configuration drifts in their KubeFleet system, which are often introduced by temporary fixes, inadvertent changes, and failed automations.
Before you begin
The new drift detection experience is currently in preview.
Note that the APIs for the new experience are only available in the Fleet v1beta1 API, not the v1 API. If you do not see the new APIs in command outputs, verify that you are explicitly requesting the v1beta1 API objects, as opposed to the v1 API objects (the default).
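For example, when checking a placement, spell out the v1beta1 group and version in the resource name rather than relying on the short name:

kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io
# The fully qualified form above returns v1beta1 objects; a plain
# "kubectl get clusterresourceplacement" may return the v1 API objects instead.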
What is a drift?
A drift occurs when a non-Fleet agent (e.g., a developer or a controller) makes changes to a field of a Fleet-managed resource directly on the member cluster side without modifying the corresponding resource template created on the hub cluster.
See the steps below for an example; the code assumes that you have a Fleet of two clusters,
member-1
and member-2
.
Switch to the hub cluster in the preview environment:
kubectl config use-context hub-admin
Create a namespace,
work
, on the hub cluster, with some labels:

kubectl create ns work
kubectl label ns work app=work
kubectl label ns work owner=redfield
Create a CRP object, which places the namespace on all member clusters:
cat <<EOF | kubectl apply -f - # The YAML configuration of the CRP object. apiVersion: placement.kubernetes-fleet.io/v1beta1 kind: ClusterResourcePlacement metadata: name: work spec: resourceSelectors: - group: "" kind: Namespace version: v1 # Select all namespaces with the label app=work. labelSelector: matchLabels: app: work policy: placementType: PickAll strategy: # For simplicity reasons, the CRP is configured to roll out changes to # all member clusters at once. This is not a setup recommended for production # use. type: RollingUpdate rollingUpdate: maxUnavailable: 100% unavailablePeriodSeconds: 1 EOF
Fleet should be able to finish the placement within seconds. To verify the progress, run the command below:
kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work
Confirm that in the output, Fleet has reported that the placement is of the
Available
state.

Switch to the first member cluster,
member-1
:

kubectl config use-context member-1-admin
You should see the namespace,
work
, being placed in this member cluster:

kubectl get ns work --show-labels
The output should look as follows; note that all the labels have been set (the
kubernetes.io/metadata.name
label is added by the Kubernetes system automatically):

NAME   STATUS   AGE   LABELS
work   Active   91m   app=work,owner=redfield,kubernetes.io/metadata.name=work
Anyone with proper access to the member cluster could modify the namespace as they want; for example, one can set the
owner
label to a different value and add a new label:

kubectl label ns work owner=wesker --overwrite
kubectl label ns work use=hack --overwrite
Now the namespace has drifted from its intended state.
Note that drifts are not necessarily a bad thing: to ensure system availability, developers and admins often need to make ad-hoc changes to the system; for example, one might need to set a Deployment on a member cluster to use a different image from its template (as kept on the hub cluster) to test a fix. By default, Fleet is not drift-aware, which means that it will simply re-apply the resource template periodically, with or without drifts.
In the case above:

- Since the owner label has been set on the resource template, its value would be overwritten by Fleet, from wesker back to redfield, within minutes (see the quick check after this list). This provides a strong consistency guarantee, but it also rules out any expedient fixes/changes, which can be an inconvenience at times.
- The use label is not a part of the resource template, so it will not be affected by any apply op performed by Fleet. Its prolonged presence might pose an issue, depending on the nature of the setup.
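To observe this behavior yourself, re-check the labels on the member cluster a few minutes after making the edit; a quick check, assuming the member-1-admin context used earlier in this guide:

kubectl config use-context member-1-admin
kubectl get ns work --show-labels
# With the default apply strategy, the owner label should eventually revert to
# redfield, while the unmanaged use=hack label remains in place.

kubectl config use-context hub-admin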
How Fleet can be used to handle drifts gracefully
Fleet aims to provide an experience that:
- ✅ allows developers and admins to make changes on the member cluster side when necessary; and
- ✅ helps developers and admins detect drifts, especially long-lived ones, in their systems, so that they can be handled properly; and
- ✅ grants developers and admins great flexibility on when and how drifts should be handled.
To enable the new experience, set proper apply strategies in the CRP object, as illustrated by the steps below:
Switch to the hub cluster:
kubectl config use-context hub-admin
Update the existing CRP (work) to use an apply strategy with the whenToApply field set to IfNotDrifted:

cat <<EOF | kubectl apply -f -
# The YAML configuration of the CRP object.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: work
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      # Select all namespaces with the label app=work.
      labelSelector:
        matchLabels:
          app: work
  policy:
    placementType: PickAll
  strategy:
    applyStrategy:
      whenToApply: IfNotDrifted
    # For simplicity reasons, the CRP is configured to roll out changes to
    # all member clusters at once. This is not a setup recommended for production
    # use.
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%
      unavailablePeriodSeconds: 1
EOF
The whenToApply field features two options:

- Always: this is the default option 😑. With this setting, Fleet will periodically apply the resource templates from the hub cluster to member clusters, with or without drifts. This is consistent with the behavior before the new drift detection and takeover experience.
- IfNotDrifted: this is the new option ✨ provided by the drift detection mechanism. With this setting, Fleet will check for drifts periodically; if drifts are found, Fleet will stop applying the resource templates and report them in the CRP status.
Switch to the first member cluster and edit the labels for a second time, effectively re-introducing a drift in the system. After it’s done, switch back to the hub cluster:
kubectl config use-context member-1-admin
kubectl label ns work owner=wesker --overwrite
kubectl label ns work use=hack --overwrite

# kubectl config use-context hub-admin
Fleet should be able to find the drifts swiftly (within a few seconds). Inspect the placement status Fleet reports for each cluster:
kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work -o jsonpath='{.status.placementStatuses}' | jq # The command above uses JSON paths to query the relevant status information # directly and uses the jq utility to pretty print the output JSON. # # jq might not be available in your environment. You may have to install it # separately, or omit it from the command. # # If the output is empty, the status might have not been populated properly # yet. Retry in a few seconds; you may also want to switch the output type # from jsonpath to yaml to see the full object.
The output should look like this:
{ "clusterName": "member-1", "conditions": [ ... { ... "status": "False", "type": "Applied" } ], "driftedPlacements": [ { "firstDriftedObservedTime": "...", "kind": "Namespace", "name": "work", "observationTime": "...", "observedDrifts": [ { "path": "/metadata/labels/owner", "valueInHub": "redfield", "valueInMember": "wesker" } ], "targetClusterObservedGeneration": 0, "version": "v1" } ], "failedPlacements": [ { "condition": { "lastTransitionTime": "...", "message": "Failed to apply the manifest (error: cannot apply manifest: drifts are found between the manifest and the object from the member cluster)", "reason": "FoundDrifts", "status": "False", "type": "Applied" }, "kind": "Namespace", "name": "work", "version": "v1" } ] }, { "clusterName": "member-2", "conditions": [...] }
You should see that cluster
member-1
has encountered an apply failure. ThefailedPlacements
part explains exactly which manifests have failed onmember-1
and its reason; in this case, the apply op fails as Fleet finds out that the namespacework
has drifted from its intended state. ThedriftedPlacements
part specifies in detail which fields have drifted and the value differences between the hub cluster and the member cluster.

Fleet will report the following information about a drift:

- group, kind, version, namespace, and name: the resource that has drifted from its desired state.
- observationTime: the timestamp when the current drift detail was collected.
- firstDriftedObservedTime: the timestamp when the current drift was first observed.
- observedDrifts: the drift details, specifically:
  - path: a JSON pointer (RFC 6901) that points to the drifted field;
  - valueInHub: the value at the JSON path as seen from the hub cluster resource template (the desired state). If this value is absent, the field does not exist in the resource template.
  - valueInMember: the value at the JSON path as seen from the member cluster resource (the current state). If this value is absent, the field does not exist in the current state.
- targetClusterObservedGeneration: the generation of the member cluster resource.
The following
jq
query can help you better extract the drifted clusters and the drift details from the CRP status output:

kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work -o jsonpath='{.status.placementStatuses}' \
    | jq '[.[] | select (.driftedPlacements != null)] | map({clusterName, driftedPlacements})'
# The command above uses JSON paths to query the relevant status information
# directly and uses the jq utility to pretty print the output JSON.
#
# jq might not be available in your environment. You may have to install it
# separately, or omit it from the command.
This query would filter out all the clusters that do not have drifts and report only the drifted clusters with the drift details:
{ "clusterName": "member-1", "driftedPlacements": [ { "firstDriftedObservedTime": "...", "kind": "Namespace", "name": "work", "observationTime": "...", "observedDrifts": [ { "path": "/metadata/labels/owner", "valueInHub": "redfield", "valueInMember": "wesker" } ], "targetClusterObservedGeneration": 0, "version": "v1" } ] }
To fix the drift, consider one of the following options:
- Switch the whenToApply setting back to Always, which will instruct Fleet to overwrite the drifts using values from the hub cluster resource template; or
- Edit the drifted field directly on the member cluster side, so that the value is consistent with that on the hub cluster; Fleet will periodically re-evaluate drifts and should report that no drifts are found soon after; or
- Delete the resource from the member cluster. Fleet will then re-apply the resource template and re-create the resource.
Important:
The presence of drifts will NOT stop Fleet from rolling out newer resource versions. If you choose to edit the resource template on the hub cluster, Fleet will always apply the new resource template in the rollout process, which may also resolve the drift.
Comparison options
One may have noticed that the namespace on the member cluster has another drift, the
label use=hack
, which is not reported in the CRP status by Fleet. This is because by default
Fleet compares only managed fields, i.e., fields that are explicitly specified in the resource
template. If a field is not populated on the hub cluster side, Fleet will not recognize its
presence on the member cluster side as a drift. This allows controllers on the member cluster
side to manage some fields automatically without Fleet’s involvement; for example, one might
like to use an HPA solution to auto-scale Deployments as appropriate and consequently decide not
to include the .spec.replicas
field in the resource template.
Fleet recognizes that there might be cases where developers and admins would like to have their
resources look exactly the same across their fleet. If this scenario applies, one might switch
the comparisonOptions
field in the apply strategy from the partialComparison
value
(the default) to fullComparison
:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: work
spec:
resourceSelectors:
- group: ""
kind: Namespace
version: v1
labelSelector:
matchLabels:
app: work
policy:
placementType: PickAll
strategy:
applyStrategy:
whenToApply: IfNotDrifted
comparisonOption: fullComparison
With this setting, Fleet will recognize the presence of any unmanaged fields (i.e., fields that are present on the member cluster side, but not set on the hub cluster side) as drifts as well. If anyone adds a field to a Fleet-managed object directly on the member cluster, it will trigger an apply error, whose details you can inspect in the same way as illustrated in the section above.
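If you apply this setting to the work CRP and re-run the jq query from earlier, the previously ignored use=hack label should now show up among the reported drifts as well; a sketch of what to run:

kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work -o jsonpath='{.status.placementStatuses}' \
    | jq '[.[] | select (.driftedPlacements != null)] | map({clusterName, driftedPlacements})'
# Expect an additional entry with the path /metadata/labels/use, a valueInMember
# of hack, and no valueInHub (the field is absent from the resource template).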
Summary
Below is a summary of the synergy between the whenToApply
and comparisonOption
settings:
whenToApply setting | comparisonOption setting | Drift scenario | Outcome |
---|---|---|---|
IfNotDrifted | partialComparison | A managed field (i.e., a field that has been explicitly set in the hub cluster resource template) is edited. | Fleet will report an apply error in the status, plus the drift details. |
IfNotDrifted | partialComparison | An unmanaged field (i.e., a field that has not been explicitly set in the hub cluster resource template) is edited/added. | N/A; the change is left untouched, and Fleet will ignore it. |
IfNotDrifted | fullComparison | Any field is edited/added. | Fleet will report an apply error in the status, plus the drift details. |
Always | partialComparison | A managed field (i.e., a field that has been explicitly set in the hub cluster resource template) is edited. | N/A; the change is overwritten shortly. |
Always | partialComparison | An unmanaged field (i.e., a field that has not been explicitly set in the hub cluster resource template) is edited/added. | N/A; the change is left untouched, and Fleet will ignore it. |
Always | fullComparison | Any field is edited/added. | The change on managed fields will be overwritten shortly; Fleet will report drift details about changes on unmanaged fields, but this is not considered an apply error. |
12 - Using the ReportDiff Apply Mode
This guide provides an overview of how to use the ReportDiff
apply mode, which allows one to
easily evaluate how things will change in the system without the risk of incurring unexpected
changes. In this mode, Fleet will check for configuration differences between the hub cluster
resource templates and their corresponding resources on the member clusters, but will not
perform any apply op. This is most helpful in cases of experimentation and drift/diff analysis.
How the ReportDiff
mode can help
To use this mode, simply set the type
field in the apply strategy part of the CRP API
from ClientSideApply
(the default) or ServerSideApply
to ReportDiff
. Configuration
differences are checked per comparisonOption
setting, consistent with the behavior
documented in the drift detection how-to guide; see the document for more information.
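In CRP terms, this is a small change to the apply strategy; a minimal sketch of the relevant stanza (a full CRP example follows in the steps below):

  strategy:
    applyStrategy:
      type: ReportDiff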
The steps below might help explain the workflow better; they assume that you have a fleet
of two member clusters, member-1
and member-2
:
Switch to the hub cluster and create a namespace,
work-3
, with some labels:

kubectl config use-context hub-admin

kubectl create ns work-3
kubectl label ns work-3 app=work-3
kubectl label ns work-3 owner=leon
Create a CRP object that places the namespace on all member clusters:
cat <<EOF | kubectl apply -f - # The YAML configuration of the CRP object. apiVersion: placement.kubernetes-fleet.io/v1beta1 kind: ClusterResourcePlacement metadata: name: work-3 spec: resourceSelectors: - group: "" kind: Namespace version: v1 # Select all namespaces with the label app=work-3. labelSelector: matchLabels: app: work-3 policy: placementType: PickAll strategy: # For simplicity reasons, the CRP is configured to roll out changes to # all member clusters at once. This is not a setup recommended for production # use. type: RollingUpdate rollingUpdate: maxUnavailable: 100% unavailablePeriodSeconds: 1 EOF
In a few seconds, Fleet will complete the placement. Verify that the CRP is available by checking its status.
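A quick way to check, mirroring the commands used elsewhere in this guide:

kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-3
# Confirm that the placement is reported as AVAILABLE before moving on.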
After the CRP becomes available, edit its apply strategy and set it to use the ReportDiff mode:
cat <<EOF | kubectl apply -f - # The YAML configuration of the CRP object. apiVersion: placement.kubernetes-fleet.io/v1beta1 kind: ClusterResourcePlacement metadata: name: work-3 spec: resourceSelectors: - group: "" kind: Namespace version: v1 # Select all namespaces with the label app=work-3. labelSelector: matchLabels: app: work-3 policy: placementType: PickAll strategy: # For simplicity reasons, the CRP is configured to roll out changes to # all member clusters at once. This is not a setup recommended for production # use. type: RollingUpdate rollingUpdate: maxUnavailable: 100% unavailablePeriodSeconds: 1 applyStrategy: type: ReportDiff EOF
The CRP should remain available, as currently there is no configuration difference at all. Check the
ClusterResourcePlacementDiffReported
condition in the status; it should report no error:

kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-3 -o jsonpath='{.status.conditions[?(@.type=="ClusterResourcePlacementDiffReported")]}' | jq
# The command above uses JSON paths to query the condition directly and
# uses the jq utility to pretty print the output JSON.
#
# jq might not be available in your environment. You may have to install it
# separately, or omit it from the command.
#
# If the output is empty, the status might have not been populated properly
# yet. You can switch the output type from jsonpath to yaml to see the full
# object.
{ "lastTransitionTime": "2025-03-19T06:45:58Z", "message": "Diff reporting in 2 cluster(s) has been completed", "observedGeneration": ..., "reason": "DiffReportingCompleted", "status": "True", "type": "ClusterResourcePlacementDiffReported" }
Now, switch to the second member cluster and make a label change on the applied namespace. After the change is done, switch back to the hub cluster.
kubectl config use-context member-2-admin
kubectl label ns work-3 owner=krauser --overwrite

# kubectl config use-context hub-admin
Fleet will detect this configuration difference shortly (within 15 seconds). Verify that the diff details have been added to the CRP status, specifically reported in the
diffedPlacements
part of the status; thejq
query below will list all the clusters with thediffedPlacements
status information populated:

kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-3 -o jsonpath='{.status.placementStatuses}' \
    | jq '[.[] | select (.diffedPlacements != null)] | map({clusterName, diffedPlacements})'
# The command above uses JSON paths to retrieve the relevant status information
# directly and uses the jq utility to query the data.
#
# jq might not be available in your environment. You may have to install it
# separately, or omit it from the command.
The output should be as follows:
{ "clusterName": "member-2", "diffedPlacements": [ { "firstDiffedObservedTime": "2025-03-19T06:49:54Z", "kind": "Namespace", "name": "work-3", "observationTime": "2025-03-19T06:50:25Z", "observedDiffs": [ { "path": "/metadata/labels/owner", "valueInHub": "leon", "valueInMember": "krauser" } ], "targetClusterObservedGeneration": 0, "version": "v1" } ] }
Fleet will report the following information about a configuration difference:

- group, kind, version, namespace, and name: the resource that has configuration differences.
- observationTime: the timestamp when the current diff detail was collected.
- firstDiffedObservedTime: the timestamp when the current diff was first observed.
- observedDiffs: the diff details, specifically:
  - path: a JSON pointer (RFC 6901) that points to the diff’d field;
  - valueInHub: the value at the JSON path as seen from the hub cluster resource template (the desired state). If this value is absent, the field does not exist in the resource template.
  - valueInMember: the value at the JSON path as seen from the member cluster resource (the current state). If this value is absent, the field does not exist in the current state.
- targetClusterObservedGeneration: the generation of the member cluster resource.
More information on the ReportDiff mode
- As mentioned earlier, with this mode no apply op will be run at all; it is up to the user to decide the best way to handle found configuration differences (if any).
- Diff reporting becomes successful and complete as soon as Fleet finishes checking all the resources;
whether configuration differences are found or not has no effect on the diff reporting success status.
- When a resource change is applied on the hub cluster side, for CRPs that use the ReportDiff mode the change will be rolled out immediately to all member clusters that have completed their earlier diff reporting (when the rollout strategy is set to RollingUpdate, the default type).
- It is worth noting that Fleet will only report differences on resources that have corresponding manifests on the hub cluster. If, for example, a namespace-scoped object has been created on the member cluster but not on the hub cluster, Fleet will ignore the object, even if its owner namespace has been selected for placement.
13 - How to Roll Out and Roll Back Changes in Stage
ClusterStagedUpdateRun API

This how-to guide demonstrates how to use ClusterStagedUpdateRun to roll out resources to member clusters in a staged manner and to roll back resources to a previous version.
Prerequisite
ClusterStagedUpdateRun
CR is used to deploy resources from the hub cluster to member clusters with ClusterResourcePlacement (or CRP) in a stage-by-stage manner. This tutorial is based on a demo fleet environment with 3 member clusters:
cluster name | labels |
---|---|
member1 | environment=canary, order=2 |
member2 | environment=staging |
member3 | environment=canary, order=1 |
To demonstrate the rollout and rollback behavior, we create a demo namespace and a sample configmap with very simple data on the hub cluster. The namespace, along with the configmap, will be deployed to the member clusters.
kubectl create ns test-namespace
kubectl create cm test-cm --from-literal=key=value1 -n test-namespace
Now we create a ClusterResourcePlacement
to deploy the resources:
kubectl apply -f - << EOF
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: example-placement
spec:
resourceSelectors:
- group: ""
kind: Namespace
name: test-namespace
version: v1
policy:
placementType: PickAll
strategy:
type: External
EOF
Note that spec.strategy.type
is set to External
so that the rollout can be triggered with a ClusterStagedUpdateRun.
All three member clusters should be scheduled since we use the PickAll policy, but at the moment no resources should be deployed on the member clusters because we haven’t created a ClusterStagedUpdateRun yet. The CRP is not AVAILABLE yet.
kubectl get crp example-placement
NAME GEN SCHEDULED SCHEDULED-GEN AVAILABLE AVAILABLE-GEN AGE
example-placement 1 True 1 8s
Check resource snapshot versions
Fleet keeps a list of resource snapshots for version control and audit (for more details, please refer to the api-reference).
To check current resource snapshots:
kubectl get clusterresourcesnapshots --show-labels
NAME GEN AGE LABELS
example-placement-0-snapshot 1 7m31s kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0
We only have one version of the snapshot. It is the current latest (kubernetes-fleet.io/is-latest-snapshot=true
) and has resource-index 0 (kubernetes-fleet.io/resource-index=0
).
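If you have multiple CRPs on the hub cluster, you can narrow the listing down to the snapshots of a single placement by filtering on the parent-CRP label shown above; for example:

kubectl get clusterresourcesnapshots -l kubernetes-fleet.io/parent-CRP=example-placement --show-labels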
Now we modify our configmap with a new value
:
kubectl edit cm test-cm -n test-namespace
kubectl get configmap test-cm -n test-namespace -o yaml
apiVersion: v1
data:
key: value2 # value updated here, old value: value1
kind: ConfigMap
metadata:
creationTimestamp: ...
name: test-cm
namespace: test-namespace
resourceVersion: ...
uid: ...
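If you prefer a non-interactive change over kubectl edit, the same update can be made with a merge patch; a sketch that should produce the same result:

kubectl patch configmap test-cm -n test-namespace --type merge -p '{"data":{"key":"value2"}}'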
Listing the resource snapshots now shows 2 versions with index 0 and 1 respectively:
kubectl get clusterresourcesnapshots --show-labels
NAME GEN AGE LABELS
example-placement-0-snapshot 1 17m kubernetes-fleet.io/is-latest-snapshot=false,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=0
example-placement-1-snapshot 1 2m2s kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=example-placement,kubernetes-fleet.io/resource-index=1
The is-latest-snapshot label is now set on example-placement-1-snapshot, which contains the latest configmap data:
kubectl get clusterresourcesnapshots example-placement-1-snapshot -o yaml
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourceSnapshot
metadata:
...
labels:
kubernetes-fleet.io/is-latest-snapshot: "true"
kubernetes-fleet.io/parent-CRP: example-placement
kubernetes-fleet.io/resource-index: "1"
name: example-placement-1-snapshot
...
spec:
selectedResources:
- apiVersion: v1
kind: Namespace
metadata:
labels:
kubernetes.io/metadata.name: test-namespace
name: test-namespace
spec:
finalizers:
- kubernetes
- apiVersion: v1
data:
key: value2 # latest value: value2, old value: value1
kind: ConfigMap
metadata:
name: test-cm
namespace: test-namespace
Deploy a ClusterStagedUpdateStrategy
A ClusterStagedUpdateStrategy
defines the orchestration pattern that groups clusters into stages and specifies the rollout sequence.
It selects member clusters by labels. For our demonstration, we create one with two stages:
kubectl apply -f - << EOF
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateStrategy
metadata:
name: example-strategy
spec:
stages:
- name: staging
labelSelector:
matchLabels:
environment: staging
afterStageTasks:
- type: TimedWait
waitTime: 1m
- name: canary
labelSelector:
matchLabels:
environment: canary
sortingLabelKey: order
afterStageTasks:
- type: Approval
EOF
Deploy a ClusterStagedUpdateRun to roll out the latest change
A ClusterStagedUpdateRun
executes the rollout of a ClusterResourcePlacement
following a ClusterStagedUpdateStrategy
. To trigger the staged update run for our CRP, we create a ClusterStagedUpdateRun
specifying the CRP name, updateRun strategy name, and the latest resource snapshot index (“1”):
kubectl apply -f - << EOF
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
name: example-run
spec:
placementName: example-placement
resourceSnapshotIndex: "1"
stagedRolloutStrategyName: example-strategy
EOF
The staged update run is initialized and running:
kubectl get csur example-run
NAME PLACEMENT RESOURCE-SNAPSHOT POLICY-SNAPSHOT INITIALIZED SUCCEEDED AGE
example-run example-placement 1 0 True 44s
A more detailed look at the status:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
...
name: example-run
...
spec:
placementName: example-placement
resourceSnapshotIndex: "1"
stagedRolloutStrategyName: example-strategy
status:
conditions:
- lastTransitionTime: ...
message: ClusterStagedUpdateRun initialized successfully
observedGeneration: 1
reason: UpdateRunInitializedSuccessfully
status: "True" # the updateRun is initialized successfully
type: Initialized
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: UpdateRunStarted
status: "True"
type: Progressing # the updateRun is still running
deletionStageStatus:
clusters: [] # no clusters need to be cleaned up
stageName: kubernetes-fleet.io/deleteStage
policyObservedClusterCount: 3 # number of clusters to be updated
policySnapshotIndexUsed: "0"
stagedUpdateStrategySnapshot: # snapshot of the strategy
stages:
- afterStageTasks:
- type: TimedWait
waitTime: 1m0s
labelSelector:
matchLabels:
environment: staging
name: staging
- afterStageTasks:
- type: Approval
labelSelector:
matchLabels:
environment: canary
name: canary
sortingLabelKey: order
stagesStatus: # detailed status for each stage
- afterStageTaskStatus:
- conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: AfterStageTaskWaitTimeElapsed
status: "True" # the wait after-stage task has completed
type: WaitTimeElapsed
type: TimedWait
clusters:
- clusterName: member2 # stage staging contains member2 cluster only
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingStarted
status: "True"
type: Started
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingSucceeded
status: "True" # member2 is updated successfully
type: Succeeded
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: StageUpdatingWaiting
status: "False"
type: Progressing
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: StageUpdatingSucceeded
status: "True" # stage staging has completed successfully
type: Succeeded
endTime: ...
stageName: staging
startTime: ...
- afterStageTaskStatus:
- approvalRequestName: example-run-canary # ClusterApprovalRequest name for this stage
type: Approval
clusters:
- clusterName: member3 # according to the labelSelector and sortingLabelKey, member3 is selected first in this stage
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingStarted
status: "True"
type: Started
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingSucceeded
status: "True" # member3 update is completed
type: Succeeded
- clusterName: member1 # member1 is selected after member3 because of order=2 label
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingStarted
status: "True" # member1 update has not finished yet
type: Started
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: StageUpdatingStarted
status: "True" # stage canary is still executing
type: Progressing
stageName: canary
startTime: ...
Wait a little bit more, and we can see stage canary
finishes its cluster updates and is waiting for the Approval task.
We can check the ClusterApprovalRequest
generated and not approved yet:
kubectl get clusterapprovalrequest
NAME UPDATE-RUN STAGE APPROVED APPROVALACCEPTED AGE
example-run-canary example-run canary 2m2s
We can approve the ClusterApprovalRequest
by patching its status:
kubectl patch clusterapprovalrequests example-run-canary --type=merge -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"lgtm","message":"lgtm","lastTransitionTime":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","observedGeneration":1}]}}' --subresource=status
clusterapprovalrequest.placement.kubernetes-fleet.io/example-run-canary patched
This can be done equivalently by creating a json patch file and applying it:
cat << EOF > approval.json
"status": {
"conditions": [
{
"lastTransitionTime": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"message": "lgtm",
"observedGeneration": 1,
"reason": "lgtm",
"status": "True",
"type": "Approved"
}
]
}
EOF
kubectl patch clusterapprovalrequests example-run-canary --type='merge' --subresource=status --patch-file approval.json
Then verify it’s approved:
kubectl get clusterapprovalrequest
NAME UPDATE-RUN STAGE APPROVED APPROVALACCEPTED AGE
example-run-canary example-run canary True True 2m30s
The updateRun is now able to proceed and complete:
kubectl get csur example-run
NAME PLACEMENT RESOURCE-SNAPSHOT POLICY-SNAPSHOT INITIALIZED SUCCEEDED AGE
example-run example-placement 1 0 True True 4m22s
The CRP also shows rollout has completed and resources are available on all member clusters:
kubectl get crp example-placement
NAME GEN SCHEDULED SCHEDULED-GEN AVAILABLE AVAILABLE-GEN AGE
example-placement 1 True 1 True 1 134m
The configmap test-cm
should be deployed on all 3 member clusters, with the latest data:
data:
key: value2
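To spot-check a member cluster directly, you can read the value back from it; a quick check, assuming your kubeconfig has a context named member1 for the first member cluster:

kubectl get configmap test-cm -n test-namespace -o jsonpath='{.data.key}' --context member1
# Expected output: value2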
Deploy a second ClusterStagedUpdateRun to roll back to a previous version
Now suppose the workload admin wants to rollback the configmap change, reverting the value value2
back to value1
.
Instead of manually updating the configmap on the hub cluster, they can create a new ClusterStagedUpdateRun
with a previous resource snapshot index ("0" in our case) and reuse the same strategy:
kubectl apply -f - << EOF
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
name: example-run-2
spec:
placementName: example-placement
resourceSnapshotIndex: "0"
stagedRolloutStrategyName: example-strategy
EOF
Following the same steps as for the first updateRun, the second updateRun should also succeed. The complete status is shown below:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterStagedUpdateRun
metadata:
...
name: example-run-2
...
spec:
placementName: example-placement
resourceSnapshotIndex: "0"
stagedRolloutStrategyName: example-strategy
status:
conditions:
- lastTransitionTime: ...
message: ClusterStagedUpdateRun initialized successfully
observedGeneration: 1
reason: UpdateRunInitializedSuccessfully
status: "True"
type: Initialized
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: UpdateRunStarted
status: "True"
type: Progressing
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: UpdateRunSucceeded # updateRun succeeded
status: "True"
type: Succeeded
deletionStageStatus:
clusters: []
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: StageUpdatingStarted
status: "True"
type: Progressing
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: StageUpdatingSucceeded
status: "True" # no clusters in the deletion stage, it completes directly
type: Succeeded
endTime: ...
stageName: kubernetes-fleet.io/deleteStage
startTime: ...
policyObservedClusterCount: 3
policySnapshotIndexUsed: "0"
stagedUpdateStrategySnapshot:
stages:
- afterStageTasks:
- type: TimedWait
waitTime: 1m0s
labelSelector:
matchLabels:
environment: staging
name: staging
- afterStageTasks:
- type: Approval
labelSelector:
matchLabels:
environment: canary
name: canary
sortingLabelKey: order
stagesStatus:
- afterStageTaskStatus:
- conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: AfterStageTaskWaitTimeElapsed
status: "True"
type: WaitTimeElapsed
type: TimedWait
clusters:
- clusterName: member2
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingStarted
status: "True"
type: Started
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingSucceeded
status: "True"
type: Succeeded
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: StageUpdatingWaiting
status: "False"
type: Progressing
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: StageUpdatingSucceeded
status: "True"
type: Succeeded
endTime: ...
stageName: staging
startTime: ...
- afterStageTaskStatus:
- approvalRequestName: example-run-2-canary
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: AfterStageTaskApprovalRequestCreated
status: "True"
type: ApprovalRequestCreated
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: AfterStageTaskApprovalRequestApproved
status: "True"
type: ApprovalRequestApproved
type: Approval
clusters:
- clusterName: member3
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingStarted
status: "True"
type: Started
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingSucceeded
status: "True"
type: Succeeded
- clusterName: member1
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingStarted
status: "True"
type: Started
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: ClusterUpdatingSucceeded
status: "True"
type: Succeeded
conditions:
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: StageUpdatingWaiting
status: "False"
type: Progressing
- lastTransitionTime: ...
message: ""
observedGeneration: 1
reason: StageUpdatingSucceeded
status: "True"
type: Succeeded
endTime: ...
stageName: canary
startTime: ...
The configmap test-cm
should be updated on all 3 member clusters, reverting to the old data:
data:
key: value1
14 - Evicting Resources and Setting up Disruption Budgets
This how-to guide discusses how to create ClusterResourcePlacementEviction
objects and ClusterResourcePlacementDisruptionBudget
objects to evict resources from member clusters and protect resources on member clusters from voluntary disruption, respectively.
Evicting Resources from Member Clusters using ClusterResourcePlacementEviction
The ClusterResourcePlacementEviction
object is used to remove resources from a member cluster once the resources have already been propagated from the hub cluster.
To successfully evict resources from a cluster, the user needs to specify:
- The name of the
ClusterResourcePlacement
object which propagated resources to the target cluster.
- The name of the target cluster from which we need to evict resources.
In this example, we will create a ClusterResourcePlacement
object with a PickN placement policy to propagate resources to an existing MemberCluster
, add a taint to the member cluster
resource and then create a ClusterResourcePlacementEviction
object to evict resources from the MemberCluster
.
We will first create a namespace that we will propagate to the member cluster.
kubectl create ns test-ns
Then we will apply a ClusterResourcePlacement
with the following spec:
spec:
resourceSelectors:
- group: ""
kind: Namespace
version: v1
name: test-ns
policy:
placementType: PickN
numberOfClusters: 1
The CRP
status after applying should look something like this:
kubectl get crp test-crp
NAME GEN SCHEDULED SCHEDULED-GEN AVAILABLE AVAILABLE-GEN AGE
test-crp 2 True 2 True 2 5m49s
Let’s now add a taint to the member cluster to ensure this cluster is not picked again by the scheduler once we evict resources from it.
Modify the cluster object to add a taint:
spec:
heartbeatPeriodSeconds: 60
identity:
kind: ServiceAccount
name: fleet-member-agent-cluster-1
namespace: fleet-system
taints:
- effect: NoSchedule
key: test-key
value: test-value
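If you prefer not to edit the MemberCluster object interactively, the same taint can be added with a patch; a sketch, assuming the member cluster is registered on the hub as kind-cluster-1 (the name used in the eviction below):

kubectl patch membercluster kind-cluster-1 --type merge -p '{"spec":{"taints":[{"key":"test-key","value":"test-value","effect":"NoSchedule"}]}}'
# Note: a merge patch replaces the whole taints list, so include any existing taints as well.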
Now we will create a ClusterResourcePlacementEviction
object to evict resources from the member cluster:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacementEviction
metadata:
name: test-eviction
spec:
placementName: test-crp
clusterName: kind-cluster-1
If the eviction was successful, the eviction object should look like this:
kubectl get crpe test-eviction
NAME VALID EXECUTED
test-eviction True True
Since the eviction was successful, the resources should be removed from the cluster. Let’s take a look at the CRP
object status to verify:
kubectl get crp test-crp
NAME GEN SCHEDULED SCHEDULED-GEN AVAILABLE AVAILABLE-GEN AGE
test-crp 2 True 2 15m
From the object we can clearly tell that the resources were evicted, since the AVAILABLE column is empty. If you need more information, the ClusterResourcePlacement object’s status can be checked.
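For example, you can pull the per-cluster placement statuses directly; a sketch using the same jq pattern as the other guides in this section:

kubectl get crp test-crp -o jsonpath='{.status.placementStatuses}' | jq
# jq might not be available in your environment. You may have to install it
# separately, or omit it from the command.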
Protecting resources from voluntary disruptions using ClusterResourcePlacementDisruptionBudget
In this example, we will create a ClusterResourcePlacement
object with PickN placement policy to propagate resources to an existing MemberCluster,
then create a ClusterResourcePlacementDisruptionBudget
object to protect resources on the MemberCluster from voluntary disruption and
then try to evict resources from the MemberCluster using ClusterResourcePlacementEviction
.
We will first create a namespace that we will propagate to the member cluster.
kubectl create ns test-ns
Then we will apply a ClusterResourcePlacement
with the following spec:
spec:
resourceSelectors:
- group: ""
kind: Namespace
version: v1
name: test-ns
policy:
placementType: PickN
numberOfClusters: 1
The CRP
object after applying should look something like this:
kubectl get crp test-crp
NAME GEN SCHEDULED SCHEDULED-GEN AVAILABLE AVAILABLE-GEN AGE
test-crp 2 True 2 True 2 8s
Now we will create a ClusterResourcePlacementDisruptionBudget
object to protect resources on the member cluster from voluntary disruption:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacementDisruptionBudget
metadata:
name: test-crp
spec:
minAvailable: 1
Note: An eviction object is only reconciled once, after which it reaches a terminal state. If the user desires to create/apply the same eviction object again, they need to delete the existing eviction object and re-create it for the eviction to occur again.
Now we will create a ClusterResourcePlacementEviction
object to evict resources from the member cluster:
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacementEviction
metadata:
name: test-eviction
spec:
placementName: test-crp
clusterName: kind-cluster-1
Note: The eviction controller will try to get the corresponding
ClusterResourcePlacementDisruptionBudget
object when aClusterResourcePlacementEviction
object is reconciled to check if the specified MaxUnavailable or MinAvailable allows the eviction to be executed.
Let’s take a look at the eviction object to see if the eviction was executed:
kubectl get crpe test-eviction
NAME VALID EXECUTED
test-eviction True False
From the eviction object we can see that the eviction was not executed.
Let’s take a look at the ClusterResourcePlacementEviction
object status to verify why the eviction was not executed:
status:
conditions:
- lastTransitionTime: "2025-01-21T15:52:29Z"
message: Eviction is valid
observedGeneration: 1
reason: ClusterResourcePlacementEvictionValid
status: "True"
type: Valid
- lastTransitionTime: "2025-01-21T15:52:29Z"
message: 'Eviction is blocked by specified ClusterResourcePlacementDisruptionBudget,
availablePlacements: 1, totalPlacements: 1'
observedGeneration: 1
reason: ClusterResourcePlacementEvictionNotExecuted
status: "False"
type: Executed
The eviction status clearly mentions that the eviction was blocked by the specified ClusterResourcePlacementDisruptionBudget
.
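If you do want the eviction to go through, you can remove (or loosen) the disruption budget and then re-create the eviction object, since eviction objects are only reconciled once; a sketch:

kubectl delete clusterresourceplacementdisruptionbudget test-crp
kubectl delete clusterresourceplacementeviction test-eviction
# Re-apply the ClusterResourcePlacementEviction spec shown above to trigger a new eviction attempt.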