The default OpenShift Container Platform pod scheduler is responsible for determining placement of new pods onto nodes within the cluster. It reads data from the pod and tries to find a node that is a good fit based on configured policies. It is completely independent and exists as a standalone/pluggable solution. It does not modify the pod and just creates a binding for the pod that ties the pod to the particular node.

Sample default scheduler object
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  creationTimestamp: 2019-05-20T15:39:01Z
  generation: 1
  name: cluster
  resourceVersion: "1491"
  selfLink: /apis/config.openshift.io/v1/schedulers/cluster
  uid: 6435dd99-7b15-11e9-bd48-0aec821b8e34
spec:
  policy: (1)
    name: scheduler-policy
  defaultNodeSelector: type=user-node,region=east (2)
1 You can specify the name of a custom scheduler policy file.
2 Optionally, specify a default node selector to restrict pod placement to specific nodes.

Understanding default scheduling

The existing generic scheduler is the default platform-provided scheduler engine that selects a node to host the pod in a three-step operation:

Filters the Nodes

The available nodes are filtered based on the constraints or requirements specified. This is done by running each node through the list of filter functions called predicates.

Prioritize the Filtered List of Nodes

This is achieved by passing each node through a series of priority_ functions that assign it a score between 0 - 10, with 0 indicating a bad fit and 10 indicating a good fit to host the pod. The scheduler configuration can also take in a simple weight (positive numeric value) for each priority function. The node score provided by each priority function is multiplied by the weight (default weight for most priorities is 1) and then combined by adding the scores for each node provided by all the priorities. This weight attribute can be used by administrators to give higher importance to some priorities.

Select the Best Fit Node

The nodes are sorted based on their scores and the node with the highest score is selected to host the pod. If multiple nodes have the same high score, then one of them is selected at random.

Understanding Scheduler Policy

The selection of the predicate and priorities defines the policy for the scheduler.

The scheduler configuration file is a JSON file that specifies the predicates and priorities the scheduler will consider.

In the absence of the scheduler policy file, the default scheduler behavior is used.

The predicates and priorities defined in the scheduler configuration file completely override the default scheduler policy. If any of the default predicates and priorities are required, you must explicitly specify the functions in the policy configuration.

Sample scheduler configuration file
{
"kind" : "Policy",
"apiVersion" : "v1",
"predicates" : [
        {"name" : "PodFitsHostPorts"},
        {"name" : "PodFitsResources"},
        {"name" : "NoDiskConflict"},
        {"name" : "NoVolumeZoneConflict"},
        {"name" : "MatchNodeSelector"},
        {"name" : "HostName"}
        ],
"priorities" : [
        {"name" : "LeastRequestedPriority", "weight" : 1},
        {"name" : "BalancedResourceAllocation", "weight" : 1},
        {"name" : "ServiceSpreadingPriority", "weight" : 1},
        {"name" : "EqualPriority", "weight" : 1}
        ]
}

Scheduler Use Cases

One of the important use cases for scheduling within OpenShift Container Platform is to support flexible affinity and anti-affinity policies.

Infrastructure Topological Levels

Administrators can define multiple topological levels for their infrastructure (nodes) by specifying labels on nodes. For example: region=r1, zone=z1, rack=s1.

These label names have no particular meaning and administrators are free to name their infrastructure levels anything, such as city/building/room. Also, administrators can define any number of levels for their infrastructure topology, with three levels usually being adequate (such as: regionszonesracks). Administrators can specify affinity and anti-affinity rules at each of these levels in any combination.

Affinity

Administrators should be able to configure the scheduler to specify affinity at any topological level, or even at multiple levels. Affinity at a particular level indicates that all pods that belong to the same service are scheduled onto nodes that belong to the same level. This handles any latency requirements of applications by allowing administrators to ensure that peer pods do not end up being too geographically separated. If no node is available within the same affinity group to host the pod, then the pod is not scheduled.

If you need greater control over where the pods are scheduled, see Controlling pod placement on nodes using node affinity rules and Placing pods relative to other pods using affinity and anti-affinity rules.

These advanced scheduling features allow administrators to specify which node a pod can be scheduled on and to force or reject scheduling relative to other pods.

Anti-Affinity

Administrators should be able to configure the scheduler to specify anti-affinity at any topological level, or even at multiple levels. Anti-affinity (or 'spread') at a particular level indicates that all pods that belong to the same service are spread across nodes that belong to that level. This ensures that the application is well spread for high availability purposes. The scheduler tries to balance the service pods across all applicable nodes as evenly as possible.

If you need greater control over where the pods are scheduled, see Controlling pod placement on nodes using node affinity rules and Placing pods relative to other pods using affinity and anti-affinity rules.

These advanced scheduling features allow administrators to specify which node a pod can be scheduled on and to force or reject scheduling relative to other pods.

Creating a scheduler policy file

You can control change the default scheduling behavior using a ConfigMap in the openshift-config project. Add and remove predicates and priorities to the ConfigMap to create a scheduler policy.

Sample scheduler configuration map
kind: ConfigMap
apiVersion: v1
metadata:
  name: scheduler-policy
  namespace: openshift-config
  selfLink: /api/v1/namespaces/openshift-config/configmaps/mypolicy
  uid: 83917dfb-4422-11e9-b2c9-0a5e37b2b12e
  resourceVersion: '1049773'
  creationTimestamp: '2019-03-11T17:24:23Z'
data:
  policy.cfg: |
    {
    "kind" : "Policy",
    "apiVersion" : "v1",
    "predicates" : [
            {"name" : "MaxGCEPDVolumeCount"},
            {"name" : "GeneralPredicates"},
            {"name" : "MaxAzureDiskVolumeCount"},
            {"name" : "MaxCSIVolumeCountPred"},
            {"name" : "CheckVolumeBinding"},
            {"name" : "MaxEBSVolumeCount"},
            {"name" : "PodFitsResources"},
            {"name" : "MatchInterPodAffinity"},
            {"name" : "CheckNodeUnschedulable"},
            {"name" : "NoDiskConflict"},
            {"name" : "CheckServiceAffinity"},
            {"name" : "NoVolumeZoneConflict"},
            {"name" : "MatchNodeSelector"},
            {"name" : "PodToleratesNodeNoExecuteTaints"},
            {"name" : "HostName"},
            {"name" : "PodToleratesNodeTaints"}
            ],
    "priorities" : [
            {"name" : "LeastRequestedPriority", "weight" : 1},
            {"name" : "BalancedResourceAllocation", "weight" : 1},
            {"name" : "ServiceSpreadingPriority", "weight" : 1},
            {"name" : "NodePreferAvoidPodsPriority", "weight" : 1},
            {"name" : "NodeAffinityPriority", "weight" : 1},
            {"name" : "TaintTolerationPriority", "weight" : 1},
            {"name" : "ImageLocalityPriority", "weight" : 1},
            {"name" : "SelectorSpreadPriority", "weight" : 1},
            {"name" : "InterPodAffinityPriority", "weight" : 1},
            {"name" : "EqualPriority", "weight" : 1}
            ]
    }
Procedure

To create the scheduler policy:

  1. Create the a JSON file with the desired predicates and priorities.

    Sample scheduler JSON file
    {
    "kind" : "Policy",
    "apiVersion" : "v1",
    "predicates" : [      (1)
            {"name" : "PodFitsHostPorts"},
            {"name" : "PodFitsResources"},
            {"name" : "NoDiskConflict"},
            {"name" : "NoVolumeZoneConflict"},
            {"name" : "MatchNodeSelector"},
            {"name" : "HostName"}
            ],
    "priorities" : [     (2)
            {"name" : "LeastRequestedPriority", "weight" : 1},
            {"name" : "BalancedResourceAllocation", "weight" : 1},
            {"name" : "ServiceSpreadingPriority", "weight" : 1},
            {"name" : "EqualPriority", "weight" : 1}
            ]
    }
    1 Add the predicates as needed.
    2 Add the priorities as needed.
  2. Create a ConfigMap based on the JSON file:

    $ oc create configmap -n openshift-config --from-file=policy.cfg <configmap-name>

    For example:

    $ oc create configmap -n openshift-config --from-file=policy.cfg scheduler-policy
    
    configmap/scheduler-policy created
  3. Edit the Scheduler Operator Custom Resource to add the ConfigMap:

    $ oc edit scheduler cluster
    apiVersion: config.openshift.io/v1
    kind: Scheduler
    metadata:
      name: cluster
    spec: {}
      policy:
        name: scheduler-policy

Modifying scheduler policies

You change scheduling behavior by creating or editing your scheduler policy ConfigMap in the openshift-config project. Add and remove predicates and priorities to the ConfigMap to create a scheduler policy.

Typical predicate string
\n\t{\"name\" : \"<PredicateName>\", \"label\" : \"<label>\",  \"<condition>\" : \"<state>\"},
  • name is the name of the predicate, such as labelsPresence.

  • label and <label> is the node label:value pair to match to apply the predicate, such label:rack.

  • <condition> and <state> is when the predicate should be applied, such as presence:true.

Typical priority string
\n\t{\"name\" : \"<PredicateName>\", \"label\" : \"<label>\",  \"<condition>\" : \"<state>\", \"weight\" : <weight>},
  • name is the name of the priority, such as labelsPresence.

  • label and <label> is the node label:value pair to match to apply the priority, such label:rack.

  • <condition> and <state> is when the priority should be applied, such as presence:true.

  • weight and `<weight> is the numerical weight to apply to the priority.

Procedure

To modify the scheduler policy:

Edit the scheduler configuration file to configure the desired predicates and priorities.

Sample modified scheduler configuration map
kind: ConfigMap
apiVersion: v1
metadata:
  name: scheduler-policy
  namespace: openshift-config
  selfLink: /api/v1/namespaces/openshift-config/configmaps/mypolicy
  uid: 83917dfb-4422-11e9-b2c9-0a5e37b2b12e
  resourceVersion: '1049773'
  creationTimestamp: '2019-03-11T17:24:23Z'
data:
  policy.cfg: "{\n\"kind\" : \"Policy\",\n\"apiVersion\" : \"v1\",\n\"predicates\" : [\n\t{\"name\" : \"PodFitsHostPorts\"},\n\t{\"name\" : \"PodFitsResources\"},\n\t{\"name\" : \"NoDiskConflict\"},\n\t{\"name\" : \"NoVolumeZoneConflict\"},\n\t{\"name\" : \"MatchNodeSelector\"},\n\t{\"name\" : \"HostName\"}\n\t],\n\"priorities\" : [\n\t{\"name\" : \"LeastRequestedPriority\", \"weight\" : 10},\n\t{\"name\" : \"BalancedResourceAllocation\", \"weight\" : 1},\n\t{\"name\" : \"ServiceSpreadingPriority\", \"weight\" : 1},\n\t{\"name\" : \"EqualPriority\", \"weight\" : 1}\n\t]\n}\n"

For example, the following strings add the labelpresence predicate requiring the rack label on the nodes and the labelPreference priority giving a weight of 2 to the rack label:

\n\t{\"name\" : \"labelPresence\", \"label\" : \"rack\",  \"presence\" : \"true\"},
\n\t{\"name\" : \"labelPreference\", \"label\" : \"rack\", \"presence\" : \"true\", \"weight\" : 2},\n\t

The ConfigMap appears as following with the new priority:

policy.cfg: "{\n\"kind\" : \"Policy\",\n\"apiVersion\" : \"v1\",\n\"predicates\" : [\n\t{\"name\" : \"PodFitsHostPorts\"},\n\t{\"name\" : \"PodFitsResources\"},\n\t{\"name\" : \"NoDiskConflict\"},\n\t{\"name\" : \"NoVolumeZoneConflict\"},\n\t{\"name\" : \"MatchNodeSelector\"},\n\t{\"name\" : \"HostName\"},\n\t{\"name\" : \"labelPresence\", \"label\" : \"rack\",  \"presence\" : \"true\"}\n\t],\n\"priorities\" : [\n\t{\"name\" : \"LeastRequestedPriority\", \"weight\" : 10},\n\t{\"name\" : \"BalancedResourceAllocation\", \"weight\" : 1},\n\t{\"name\" : \"ServiceSpreadingPriority\", \"weight\" : 1},\n\t{\"name\" : \"EqualPriority\", \"weight\" : 1},\n\t{\"name\" : \"labelPreference\", \"label\" : \"rack\", \"presence\" : \"true\", \"weight\" : 2},\n\t]\n}\n "

Understanding the scheduler predicates

Predicates are rules that filter out unqualified nodes.

There are several predicates provided by default in OpenShift Container Platform. Some of these predicates can be customized by providing certain parameters. Multiple predicates can be combined to provide additional filtering of nodes.

Static Predicates

These predicates do not take any configuration parameters or inputs from the user. These are specified in the scheduler configuration using their exact name.

Default Predicates

The default scheduler policy includes the following predicates:

NoVolumeZoneConflict checks that the volumes a pod requests are available in the zone.

{"name" : "NoVolumeZoneConflict"}

MaxEBSVolumeCount checks the maximum number of volumes that can be attached to an AWS instance.

{"name" : "MaxEBSVolumeCount"}

MaxGCEPDVolumeCount checks the maximum number of Google Compute Engine (GCE) Persistent Disks (PD).

{"name" : "MaxGCEPDVolumeCount"}

MatchInterPodAffinity checks if the pod affinity/anti-affinity rules permit the pod.

{"name" : "MatchInterPodAffinity"}

NoDiskConflict checks if the volume requested by a pod is available.

{"name" : "NoDiskConflict"}

PodToleratesNodeTaints checks if a pod can tolerate the node taints.

{"name" : "PodToleratesNodeTaints"}

CheckNodeMemoryPressure checks if a pod can be scheduled on a node with a memory pressure condition.

{"name" : "CheckNodeMemoryPressure"}
Other Static Predicates

OpenShift Container Platform also supports the following predicates:

CheckNodeDiskPressure checks if a pod can be scheduled on a node with a disk pressure condition.

{"name" : "CheckNodeDiskPressure"}

CheckVolumeBinding evaluates if a pod can fit based on the volumes, it requests, for both bound and unbound PVCs. * For PVCs that are bound, the predicate checks that the corresponding PV’s node affinity is satisfied by the given node. * For PVCs that are unbound, the predicate searched for available PVs that can satisfy the PVC requirements and that the PV node affinity is satisfied by the given node.

The predicate returns true if all bound PVCs have compatible PVs with the node, and if all unbound PVCs can be matched with an available and node-compatible PV.

{"name" : "CheckVolumeBinding"}

The CheckVolumeBinding predicate must be enabled in non-default schedulers.

CheckNodeCondition checks if a pod can be scheduled on a node reporting out of disk, network unavailable, or not ready conditions.

{"name" : "CheckNodeCondition"}

PodToleratesNodeNoExecuteTaints checks if a pod tolerations can tolerate a node NoExecute taints.

{"name" : "PodToleratesNodeNoExecuteTaints"}

CheckNodeLabelPresence checks if all of the specified labels exist on a node, regardless of their value.

{"name" : "CheckNodeLabelPresence"}

checkServiceAffinity checks that ServiceAffinity labels are homogeneous for pods that are scheduled on a node.

{"name" : "checkServiceAffinity"}

MaxAzureDiskVolumeCount checks the maximum number of Azure Disk Volumes.

{"name" : "MaxAzureDiskVolumeCount"}

General Predicates

The following general predicates check whether non-critical predicates and essential predicates pass. Non-critical predicates are the predicates that only non-critical pods must pass and essential predicates are the predicates that all pods must pass.

The default scheduler policy includes the general predicates.

Non-critical general predicates

PodFitsResources determines a fit based on resource availability (CPU, memory, GPU, and so forth). The nodes can declare their resource capacities and then pods can specify what resources they require. Fit is based on requested, rather than used resources.

{"name" : "PodFitsResources"}
Essential general predicates

PodFitsHostPorts determines if a node has free ports for the requested pod ports (absence of port conflicts).

{"name" : "PodFitsHostPorts"}

HostName determines fit based on the presence of the Host parameter and a string match with the name of the host.

{"name" : "HostName"}

MatchNodeSelector determines fit based on node selector (nodeSelector) queries defined in the pod.

{"name" : "MatchNodeSelector"}

Configurable Predicates

You can configure these predicates in the scheduler policy Configmap, policy-configmap in the openshift-config project, to add labels to affect how the predicate functions.

Since these are configurable, multiple predicates of the same type (but different configuration parameters) can be combined as long as their user-defined names are different.

For information on using these priorities, see Modifying Scheduler Policy.

ServiceAffinity places pods on nodes based on the service running on that pod. Placing pods of the same service on the same or co-located nodes can lead to higher efficiency.

This predicate attempts to place pods with specific labels in its node selector on nodes that have the same label.

If the pod does not specify the labels in its node selector, then the first pod is placed on any node based on availability and all subsequent pods of the service are scheduled on nodes that have the same label values as that node.

"predicates":[
      {
         "name":"<name>", (1)
         "argument":{
            "serviceAffinity":{
               "labels":[
                  "<label>" (2)
               ]
            }
         }
      }
   ],
1 Specify a name for the predicate.
2 Specify a label to match.

For example:

        "name":"ZoneAffinity",
        "argument":{
            "serviceAffinity":{
                "labels":[
                    "rack"
                ]
            }
        }

For example. if the first pod of a service had a node selector rack was scheduled to a node with label region=rack, all the other subsequent pods belonging to the same service will be scheduled on nodes with the same region=rack label.

Multiple-level labels are also supported. Users can also specify all pods for a service to be scheduled on nodes within the same region and within the same zone (under the region).

The labelsPresence parameter checks whether a particular node has a specific label. The labels create node groups that the LabelPreference priority uses. Matching by label can be useful, for example, where nodes have their physical location or status defined by labels.

"predicates":[
      {
         "name":"<name>", (1)
         "argument":{
            "labelsPresence":{
               "labels":[
                  "<label>" (2)
                ],
                "presence": true (3)
            }
         }
      }
   ],
1 Specify a name for the predicate.
2 Specify a label to match.
3 Specify whether the labels are required, either true or false.
  • For presence:false, if any of the requested labels are present in the node labels, the pod cannot be scheduled. If the labels are not present, the pod can be scheduled.

  • For presence:true, if all of the requested labels are present in the node labels, the pod can be scheduled. If all of the labels are not present, the pod is not scheduled.

For example:

        "name":"RackPreferred",
        "argument":{
            "labelsPresence":{
                "labels":[
                    "rack",
                    "region"
                ],
                "presence": true
            }
        }

Understanding the scheduler priorities

Priorities are rules that rank nodes according to preferences.

A custom set of priorities can be specified to configure the scheduler. There are several priorities provided by default in OpenShift Container Platform. Other priorities can be customized by providing certain parameters. Multiple priorities can be combined and different weights can be given to each in order to impact the prioritization.

Static Priorities

Static priorities do not take any configuration parameters from the user, except weight. A weight is required to be specified and cannot be 0 or negative.

These are specified in the scheduler policy Configmap, policy-configmap in the openshift-config project.

Default Priorities

The default scheduler policy includes the following priorities. Each of the priority function has a weight of 1 except NodePreferAvoidPodsPriority, which has a weight of 10000.

SelectorSpreadPriority looks for services, replication controllers (RC), replication sets (RS), and stateful sets that match the pod, then finds existing pods that match those selectors. The scheduler favors nodes that have fewer existing matching pods. Then, it schedules the pod on a node with the smallest number of pods that match those selectors as the pod being scheduled.

{"name" : "SelectorSpreadPriority", "weight" : 1}

InterPodAffinityPriority computes a sum by iterating through the elements of weightedPodAffinityTerm and adding weight to the sum if the corresponding PodAffinityTerm is satisfied for that node. The node(s) with the highest sum are the most preferred.

{"name" : "InterPodAffinityPriority", "weight" : 1}

LeastRequestedPriority favors nodes with fewer requested resources. It calculates the percentage of memory and CPU requested by pods scheduled on the node, and prioritizes nodes that have the highest available/remaining capacity.

{"name" : "LeastRequestedPriority", "weight" : 1}

BalancedResourceAllocation favors nodes with balanced resource usage rate. It calculates the difference between the consumed CPU and memory as a fraction of capacity, and prioritizes the nodes based on how close the two metrics are to each other. This should always be used together with LeastRequestedPriority.

{"name" : "BalancedResourceAllocation", "weight" : 1}

NodePreferAvoidPodsPriority ignores pods that are owned by a controller other than a replication controller.

{"name" : "NodePreferAvoidPodsPriority", "weight" : 10000}

NodeAffinityPriority prioritizes nodes according to node affinity scheduling preferences

{"name" : "NodeAffinityPriority", "weight" : 1}

TaintTolerationPriority prioritizes nodes that have a fewer number of intolerable taints on them for a pod. An intolerable taint is one which has key PreferNoSchedule.

{"name" : "TaintTolerationPriority", "weight" : 1}
Other Static Priorities

OpenShift Container Platform also supports the following priorities:

EqualPriority gives an equal weight of 1 to all nodes, if no priority configurations are provided. We recommend using this priority only for testing environments.

{"name" : "EqualPriority", "weight" : 1}

MostRequestedPriority prioritizes nodes with most requested resources. It calculates the percentage of memory and CPU requested by pods scheduled on the node, and prioritizes based on the maximum of the average of the fraction of requested to capacity.

{"name" : "MostRequestedPriority", "weight" : 1}

ImageLocalityPriority prioritizes nodes that already have requested pod container’s images.

{"name" : "ImageLocalityPriority", "weight" : 1}

ServiceSpreadingPriority spreads pods by minimizing the number of pods belonging to the same service onto the same machine.

{"name" : "ServiceSpreadingPriority", "weight" : 1}

Configurable Priorities

You can configure these priorities in the scheduler policy Configmap, policy-configmap in the openshift-config project, to add labels to affect how the priorities.

The type of the priority function is identified by the argument that they take. Since these are configurable, multiple priorities of the same type (but different configuration parameters) can be combined as long as their user-defined names are different.

For information on using these priorities, see Modifying Scheduler Policy.

ServiceAntiAffinity takes a label and ensures a good spread of the pods belonging to the same service across the group of nodes based on the label values. It gives the same score to all nodes that have the same value for the specified label. It gives a higher score to nodes within a group with the least concentration of pods.

"priorities":[
    {
        "name":"<name>", (1)
        "weight" : 1 (2)
        "argument":{
            "serviceAntiAffinity":{
                "label":[
                    "<label>" (3)
                ]
            }
        }
    }
]
1 Specify a name for the priority.
2 Specify a weight. Enter a non-zero positive value.
3 Specify a label to match.

For example:

        "name":"RackSpread", (1)
        "weight" : 1 (2)
        "argument":{
            "serviceAntiAffinity":{
                "label": "rack" (3)
            }
        }
1 Specify a name for the priority.
2 Specify a weight. Enter a non-zero positive value.
3 Specify a label to match.

In some situations using ServiceAntiAffinity based on custom labels does not spread pod as expected. See this Red Hat Solution.

*The labelPreference parameter gives priority based on the specified label. If the label is present on a node, that node is given priority. If no label is specified, priority is given to nodes that do not have a label.

"priorities":[
    {
        "name":"<name>", (1)
        "weight" : 1 (2)
        "argument":{
            "labelPreference":{
                "label": "<label>", (3)
                "presence": true (4)
            }
        }
    }
]
1 Specify a name for the priority.
2 Specify a weight. Enter a non-zero positive value.
3 Specify a label to match.
4 Specify whether the label is required, either true or false.

Sample Policy Configurations

The configuration below specifies the default scheduler configuration, if it were to be specified using the scheduler policy file.

kind: ConfigMap
apiVersion: v1
metadata:
  name: mypolicy
  namespace: openshift-config
  selfLink: /api/v1/namespaces/openshift-config/configmaps/mypolicy
  uid: 83917dfb-4422-11e9-b2c9-0a5e37b2b12e
  resourceVersion: '1100851'
  creationTimestamp: '2019-03-11T17:24:23Z'
data:
  policy.cfg: "{\n\"kind\" : \"Policy\",\n\"apiVersion\" : \"v1\",\n\"predicates\" : [\n\t{\"name\" : \"PodFitsHostPorts\"},\n\t{\"name\" : \"PodFitsResources\"},\n\t{\"name\" : \"NoDiskConflict\"},\n\t{\"name\" : \"NoVolumeZoneConflict\"},\n\t{\"name\" : \"MatchNodeSelector\"},\n\t{\"name\" : \"HostName\"}\n\t],\n\"priorities\" : [\n\t{\"name\" : \"LeastRequestedPriority\", \"weight\" : 10},\n\t{\"name\" : \"BalancedResourceAllocation\", \"weight\" : 1},\n\t{\"name\" : \"ServiceSpreadingPriority\", \"weight\" : 1},\n\t{\"name\" : \"EqualPriority\", \"weight\" : 1},\n\t{\"name\" : \"labelPreference\", \"label\" : \"rack\", \"presence\" : \"true\", \"weight\" : 2},\n\t]\n}\n "

In all of the sample configurations below, the list of predicates and priority functions is truncated to include only the ones that pertain to the use case specified. In practice, a complete/meaningful scheduler policy should include most, if not all, of the default predicates and priorities listed above.

The following example defines three topological levels, region (affinity) → zone (affinity) → rack (anti-affinity):

"{\n\"kind\" : \"Policy\",\n\"apiVersion\" : \"v1\",\n\"predicates\" : [\n\t{\"name\" : \"RegionZoneAffinity\", \"label\" : \"region\", \"label\" : \"zone\"}\n\t],\n\"priorities\" : [\n\t{\"name\" : \"serviceAntiAffinity\", \"label\" : \"rack\", \"weight\" : 1},\n\t]\n}\n"

The following example defines three topological levels, city (affinity) → building (anti-affinity) → room (anti-affinity):

"{\n\"kind\" : \"Policy\",\n\"apiVersion\" : \"v1\",\n\"predicates\" : [\n\t{\"name\" : \"serviceAffinityy\", \"label\" : \"city\"},\n\t],\n\"priorities\" : [\n\t{\"name\" : \"serviceAntiAffinity\", \"label\" : \"building\" \"weight\" : 1}, \n\t{\"name\" : \"serviceAntiAffinity\", \"label\" : \"room\" \"weight\" : 1},\n\t]\n}\n"

The following example defines a policy to only use nodes with the 'region' label defined and prefer nodes with the 'zone' label defined:

"{\n\"kind\" : \"Policy\",\n\"apiVersion\" : \"v1\",\n\"predicates\" : [\n\t{\"name\" : \"labelsPresence\", \"label\" : \"region\", \"presence\" : \"true\"},\n\t],\n\"priorities\" : [\n\t{\"name\" : \"ZonePreferred\", \"label\" : \"zone\", \"presence\" : \"true\", \"weight\" : 1},\n\t]\n}\n"

The following example combines both static and configurable predicates and also priorities:

"{\n\"kind\" : \"Policy\",\n\"apiVersion\" : \"v1\",\n\"predicates\" : [\n\t{\"name\" : \"labelsPresence\", \"label\" : \"building\", \"presence\" : \"true\"}, \n\t{\"name\" : \"PodFitsHostPorts\"},\n\t{\"name\" : \"MatchNodeSelector\"},\n\t],\n\"priorities\" : [\n\t{\"name\" : \"ZonePreferred\", \"label\" : \"zone\", \"presence\" : \"true\", \"weight\" : 1},\n\t]\n}\n \"