A job executes a task in your OpenShift Dedicated cluster.

A job tracks the overall progress of a task and updates its status with information about active, succeeded, and failed pods. Deleting a job cleans up any pods it created. Jobs are part of the Kubernetes API, which can be managed with oc commands like other object types.

See the Kubernetes documentation for more information about jobs.

Understanding jobs and CronJobs

There are two possible resource types that allow creating run-once objects in OpenShift Dedicated:

Job

A regular job is a run-once object that creates a task and ensures it runs to completion.

CronJob

If you want a task to run multiple times on a schedule, use a CronJob.

A CronJob builds on a regular job by allowing you to specify how the job should be run. CronJobs are part of the Kubernetes API, which can be managed with oc commands like other object types.

CronJobs are useful for creating periodic and recurring tasks, like running backups or sending emails. CronJobs can also schedule individual tasks for a specific time, such as if you want to schedule a job for a low activity period.

A CronJob creates a job object approximately once per execution time of its schedule, but there are circumstances in which it fails to create a job or two jobs might be created. Therefore, jobs must be idempotent and you must configure history limits.

Understanding how to create jobs

Both resource types require a job configuration that consists of the following key parts:

  • A pod template, which describes the pod that OpenShift Dedicated creates.

  • An optional parallelism parameter, which specifies how many pods should run in parallel at any point in time to execute a job. If not specified, this defaults to one.

  • An optional completions parameter, specifying how many successful pod completions are needed to finish a job. If not specified, this value defaults to one.
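
These parts can be combined in a minimal job configuration. The following sketch (name, image, and command are illustrative) runs up to two pods at a time until five pods finish successfully:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hello              # illustrative name
spec:
  parallelism: 2           # at most two pods run at once
  completions: 5           # the job completes after five successful pods
  template:                # pod template describing the pod to create
    spec:
      containers:
      - name: hello
        image: busybox     # illustrative image
        command: ["echo", "hello"]
      restartPolicy: OnFailure
```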

Understanding how to set a maximum duration for jobs

When defining a job, you can set its maximum duration with the activeDeadlineSeconds field. The value is specified in seconds and is not set by default; when not set, no maximum duration is enforced.

The maximum duration is counted from the time the first pod is scheduled in the system, and defines how long a job can be active. It tracks the overall time of an execution. After reaching the specified timeout, the job is terminated by OpenShift Dedicated.

Understanding how to set a job back off policy for pod failure

A job can be considered failed after a set number of retries, for example due to a logical error in its configuration. Failed pods associated with the job are recreated by the controller with an exponential back-off delay (10s, 20s, 40s, and so on) capped at six minutes. The limit is reset if no new failed pods appear between controller checks.

Use the spec.backoffLimit parameter to set the number of retries for a job.
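
The back-off delay sequence can be sketched with a small shell loop (illustrative only; the actual delays are applied by the job controller):

```shell
# Exponential back-off delays for failed pods: 10s, 20s, 40s, ...
# capped at six minutes (360s).
delay=10
cap=360
for attempt in 1 2 3 4 5 6 7; do
  echo "attempt ${attempt}: retry after ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
done
```

With the default spec.backoffLimit of 6, the job is marked failed once the retry limit is reached.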

Understanding how to configure a CronJob to remove artifacts

CronJobs can leave behind artifact resources such as jobs or pods. Configure history limits so that old jobs and their pods are properly cleaned up. Two fields in the CronJob spec are responsible for that:

  • .spec.successfulJobsHistoryLimit. The number of successful finished jobs to retain (defaults to 3).

  • .spec.failedJobsHistoryLimit. The number of failed finished jobs to retain (defaults to 1).

  • Delete CronJobs that you no longer need:

    $ oc delete cronjob/<cron_job_name>

    Doing this prevents them from generating unnecessary artifacts.

  • You can suspend further executions by setting the spec.suspend field to true. All subsequent executions are suspended until you reset the field to false.
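
For example, a sketch of a suspended CronJob (name, schedule, and image are illustrative); reapplying it with suspend set to false resumes scheduling:

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello              # illustrative name
spec:
  schedule: "0 * * * *"    # illustrative schedule: hourly
  suspend: true            # no new jobs are created while true
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo", "hello"]
          restartPolicy: OnFailure
```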

Known limitations

The job specification restart policy only applies to the pods, and not the job controller. However, the job controller is hard-coded to keep retrying jobs to completion.

As such, restartPolicy: Never or --restart=Never results in the same behavior as restartPolicy: OnFailure or --restart=OnFailure. That is, when a job fails it is restarted automatically until it succeeds (or is manually discarded). The policy only sets which subsystem performs the restart.

With the Never policy, the job controller performs the restart. With each attempt, the job controller increments the number of failures in the job status and creates new pods. This means that with each failed attempt, the number of pods increases.

With the OnFailure policy, the kubelet performs the restart. Each attempt does not increment the number of failures in the job status. In addition, the kubelet retries failed jobs by starting pods on the same nodes.

Creating jobs

You create a job in OpenShift Dedicated by creating a job object.

Procedure

To create a job:

  1. Create a YAML file similar to the following:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pi
    spec:
      parallelism: 1    (1)
      completions: 1    (2)
      activeDeadlineSeconds: 1800 (3)
      backoffLimit: 6   (4)
      template:         (5)
        metadata:
          name: pi
        spec:
          containers:
          - name: pi
            image: perl
            command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: OnFailure    (6)
    1. Optional value for how many pods a job should run in parallel; defaults to one.

    2. Optional value for how many successful pod completions are needed to mark a job completed; defaults to one.

    3. Optional value for the maximum duration the job can run.

    4. Optional value to set the number of retries for a job; defaults to six.

    5. Template for the pod the controller creates.

    6. The restart policy of the pod. This does not apply to the job controller.

  2. Create the job:

    $ oc create -f <file-name>.yaml

You can also create and launch a job from a single command using oc run. The following command creates and launches the same job as specified in the previous example:

$ oc run pi --image=perl --replicas=1  --restart=OnFailure \
    --command -- perl -Mbignum=bpi -wle 'print bpi(2000)'

Creating CronJobs

You create a CronJob in OpenShift Dedicated by creating a CronJob object.

Procedure

To create a CronJob:

  1. Create a YAML file similar to the following:

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: pi
    spec:
      schedule: "*/1 * * * *"  (1)
      concurrencyPolicy: "Replace" (2)
      startingDeadlineSeconds: 200 (3)
      suspend: true            (4)
      successfulJobsHistoryLimit: 3 (5)
      failedJobsHistoryLimit: 1     (6)
      jobTemplate:             (7)
        spec:
          template:
            metadata:
              labels:          (8)
                parent: "cronjobpi"
            spec:
              containers:
              - name: pi
                image: perl
                command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
              restartPolicy: OnFailure (9)
    1. Schedule for the job, specified in cron format. In this example, the job runs every minute.

    2. An optional concurrency policy, specifying how to treat concurrent jobs within a CronJob. Only one of the following concurrency policies may be specified. If not specified, this defaults to allowing concurrent executions.

       • Allow allows CronJobs to run concurrently.

       • Forbid forbids concurrent runs, skipping the next run if the previous has not finished yet.

       • Replace cancels the currently running job and replaces it with a new one.

    3. An optional deadline (in seconds) for starting the job if it misses its scheduled time for any reason. Missed job executions are counted as failed. If not specified, there is no deadline.

    4. An optional flag allowing the suspension of a CronJob. If set to true, all subsequent executions are suspended.

    5. The number of successful finished jobs to retain (defaults to 3).

    6. The number of failed finished jobs to retain (defaults to 1).

    7. Job template. This is similar to the job example.

    8. Sets a label for jobs spawned by this CronJob.

    9. The restart policy of the pod. This does not apply to the job controller.

    The .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit fields are optional. These fields specify how many completed and failed jobs should be kept. By default, they are set to 3 and 1 respectively. Setting a limit to 0 corresponds to keeping none of the corresponding kind of jobs after they finish.

  2. Create the CronJob:

    $ oc create -f <file-name>.yaml

You can also create and launch a CronJob from a single command using oc run. The following command creates and launches the same CronJob as specified in the previous example:

$ oc run pi --image=perl --schedule='*/1 * * * *' \
    --restart=OnFailure --labels parent="cronjobpi" \
    --command -- perl -Mbignum=bpi -wle 'print bpi(2000)'

With oc run, the --schedule option accepts schedules in cron format.
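
The five cron fields are minute, hour, day of month, month, and day of week. A few example schedules:

```
*/1 * * * *    every minute (as in the example above)
0 2 * * *      daily at 02:00
0 0 * * 0      weekly, on Sunday at midnight
```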

When creating a CronJob, oc run only supports the Never or OnFailure restart policies (--restart).