This section describes how to troubleshoot the Compliance Operator. The information can be useful either to diagnose a problem or to provide information in a bug report. Some general tips:
The Compliance Operator emits Kubernetes events when something important happens. You can either view all events in the cluster using the command:
$ oc get events -n openshift-compliance
Or view events for an object like a scan using the command:
$ oc describe -n openshift-compliance compliancescan/cis-compliance
The Compliance Operator consists of several controllers, approximately one per API object. It could be useful to filter only those controllers that correspond to the API object having issues. If a ComplianceRemediation cannot be applied, view the messages from the remediationctrl controller. You can filter the messages from a single controller by parsing with jq, for example:
$ oc -n openshift-compliance logs compliance-operator-775d7bddbd-gj58f \
    | jq -c 'select(.logger == "profilebundlectrl")'
The timestamps are logged as seconds since the UNIX epoch in UTC. To convert them to a human-readable date, use date -d @timestamp --utc, for example:
$ date -d @1596184628.955853 --utc
Many custom resources, most importantly ScanSetting, allow the debug option to be set. Enabling this option increases the verbosity of the OpenSCAP scanner pods, as well as some other helper pods.
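For example, to enable this option on an existing ScanSetting object, you can patch it directly. This is only a sketch: the ScanSetting name is a placeholder, and the debug field is assumed to sit at the top level of the object, as in the ScanSetting example shown later in this section:
$ oc -n openshift-compliance patch scansettings/<scansetting_name> \
    --type merge -p '{"debug":true}'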
If a single rule is passing or failing unexpectedly, it could be helpful to run a single scan or a suite with only that rule. Find the rule ID from the corresponding ComplianceCheckResult object and use it as the rule attribute value in a Scan CR. Then, with the debug option enabled, the scanner container logs in the scanner pod show the raw OpenSCAP logs.
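For illustration, a minimal ComplianceScan sketch that runs a single rule with debug enabled could look like the following. The rule ID, profile, and content file are example values only; substitute the ones reported by your ComplianceCheckResult and ProfileBundle, and fill in the content image:
apiVersion: compliance.openshift.io/v1alpha1
kind: ComplianceScan
metadata:
  name: single-rule-scan
  namespace: openshift-compliance
spec:
  scanType: Node
  profile: xccdf_org.ssgproject.content_profile_e8
  rule: xccdf_org.ssgproject.content_rule_audit_rules_dac_modification_chmod
  content: ssg-rhcos4-ds.xml
  contentImage: <content_image>
  debug: true
  nodeSelector:
    node-role.kubernetes.io/worker: ""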
The following sections outline the components and stages of Compliance Operator scans.
The compliance content is stored in Profile objects that are generated from a ProfileBundle object. The Compliance Operator creates a ProfileBundle object for the cluster and another for the cluster nodes:
$ oc get -n openshift-compliance profilebundle.compliance
$ oc get -n openshift-compliance profile.compliance
ProfileBundle objects are processed by deployments labeled with the Bundle name. To troubleshoot an issue with the Bundle, you can find the deployment and view the logs of the pods in the deployment:
$ oc logs -n openshift-compliance -lprofile-bundle=ocp4 -c profileparser
$ oc get -n openshift-compliance deployments,pods -lprofile-bundle=ocp4
$ oc logs -n openshift-compliance pods/<pod-name>
$ oc describe -n openshift-compliance pod/<pod-name> -c profileparser
With valid compliance content sources, the high-level ScanSetting and ScanSettingBinding objects can be used to generate ComplianceSuite, ComplianceScan, and ComplianceCheckResult objects:
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSetting
metadata:
  name: my-companys-constraints
debug: true
# For each role, a separate scan will be created pointing
# to a node-role specified in roles
roles:
  - worker
---
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: my-companys-compliance-requirements
profiles:
  # Node checks
  - name: rhcos4-e8
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
  # Cluster checks
  - name: ocp4-e8
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: my-companys-constraints
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
ScanSettingBinding objects are handled by the controller tagged with logger=scansettingbindingctrl. These objects have no status; any issues are communicated in the form of events:
Events:
  Type    Reason        Age    From                    Message
  ----    ------        ----   ----                    -------
  Normal  SuiteCreated  9m52s  scansettingbindingctrl  ComplianceSuite openshift-compliance/my-companys-compliance-requirements created
A ComplianceSuite object is created, and the flow continues to reconcile the newly created ComplianceSuite. The ComplianceSuite CR is a wrapper around ComplianceScan CRs. The ComplianceSuite CR is handled by the controller tagged with logger=suitectrl.
This controller handles creating scans from a suite, reconciling and aggregating individual Scan statuses into a single Suite status. If a suite is set to execute periodically, the
suitectrl also handles creating a
CronJob CR that re-runs the scans in the suite after the initial run is done:
$ oc get cronjobs
NAME          SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
<cron_name>   0 1 * * *   False     0        <none>          151m
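Periodic execution itself is configured through the schedule attribute in the scan settings; a minimal sketch, reusing the ScanSetting from the example above and the cron expression shown in the CronJob output (the schedule field is assumed to sit at the top level of the object):
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSetting
metadata:
  name: my-companys-constraints
  namespace: openshift-compliance
schedule: "0 1 * * *"
roles:
  - worker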
For the most important issues, events are emitted. View them with oc describe compliancesuites/<name>. The Suite objects also have a Status subresource that is updated when any of the Scan objects that belong to this suite update their Status subresource. After all expected scans are created, control is passed to the scan controller.
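For example, assuming the suite created earlier in this section, the aggregated phase can be read directly from the Status subresource (the phase field name is assumed to mirror the scan phases described below):
$ oc get -n openshift-compliance compliancesuites/my-companys-compliance-requirements \
    -o jsonpath='{.status.phase}{"\n"}'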
ComplianceScan CRs are handled by the
scanctrl controller. This is also where the actual scans happen and the scan results are created. Each scan goes through several phases:
The scan is validated for correctness in this phase. If some parameters like storage size are invalid, the scan transitions to DONE with an ERROR result; otherwise, it proceeds to the Launching phase.
In this phase, several config maps are created that contain either the environment for the scanner pods or directly the script that the scanner pods will evaluate. List the config maps:
$ oc -n openshift-compliance get cm \
    -l compliance.openshift.io/scan-name=rhcos4-e8-worker,complianceoperator.openshift.io/scan-script=
These config maps will be used by the scanner pods. If you ever need to modify the scanner behavior, change the scanner debug level, or print the raw results, modifying the config maps is the way to go. Afterwards, a persistent volume claim is created per scan to store the raw ARF results:
$ oc get pvc -n openshift-compliance -lcompliance.openshift.io/scan-name=rhcos4-e8-worker
The PVCs are mounted by a per-scan ResultServer deployment. A ResultServer is a simple HTTP server to which the individual scanner pods upload the full ARF results. Each server can run on a different node. The full ARF results might be very large, and you cannot presume that it would be possible to create a volume that could be mounted from multiple nodes at the same time. After the scan is finished, the ResultServer deployment is scaled down. The PVC with the raw results can be mounted from another custom pod and the results can be fetched or inspected. The traffic between the scanner pods and the ResultServer is protected by mutual TLS.
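For example, a throwaway pod that mounts the raw-results PVC for inspection could look like the following sketch. The claim name is assumed to match the scan name, as suggested by the PVC listing above, and the image is only an example:
apiVersion: v1
kind: Pod
metadata:
  name: pv-extract
  namespace: openshift-compliance
spec:
  containers:
    - name: pv-extract
      image: registry.access.redhat.com/ubi9/ubi
      command: ["sleep", "3000"]
      volumeMounts:
        - name: raw-results-vol
          mountPath: /raw-results
  volumes:
    - name: raw-results-vol
      persistentVolumeClaim:
        claimName: rhcos4-e8-worker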
Finally, the scanner pods are launched in this phase; one scanner pod for a Platform scan instance, and one scanner pod per matching node for a node scan instance. The per-node pods are labeled with the node name. Each pod is always labeled with the ComplianceScan name:
$ oc get pods -lcompliance.openshift.io/scan-name=rhcos4-e8-worker,workload=scanner --show-labels
NAME                                                              READY   STATUS      RESTARTS   AGE   LABELS
rhcos4-e8-worker-ip-10-0-169-90.eu-north-1.compute.internal-pod   0/2     Completed   0          39m   compliance.openshift.io/scan-name=rhcos4-e8-worker,targetNode=ip-10-0-169-90.eu-north-1.compute.internal,workload=scanner
The scan then proceeds to the Running phase.
The running phase waits until the scanner pods finish. The following terms and processes are in use in the running phase:
init container: There is one init container called
content-container. It runs the contentImage container and executes a single command that copies the contentFile to the
/content directory shared with the other containers in this pod.
scanner: This container runs the scan. For node scans, the container mounts the node filesystem as /host and mounts the content delivered by the init container. The container also mounts the ConfigMap created in the Launching phase and executes it. The default script in the entrypoint ConfigMap executes OpenSCAP and stores the result files in the /results directory shared between the pod’s containers. Logs from this pod can be viewed to determine what the OpenSCAP scanner checked; more verbose output is available with the debug option enabled (see the example commands after this list).
logcollector: The logcollector container waits until the scanner container finishes. Then, it uploads the full ARF results to the ResultServer and separately uploads the XCCDF results, along with the scan result and OpenSCAP result code, as a ConfigMap. These result config maps are labeled with the scan name (compliance.openshift.io/scan-name=rhcos4-e8-worker):
$ oc describe cm/rhcos4-e8-worker-ip-10-0-169-90.eu-north-1.compute.internal-pod
Name:         rhcos4-e8-worker-ip-10-0-169-90.eu-north-1.compute.internal-pod
Namespace:    openshift-compliance
Labels:       compliance.openshift.io/scan-name-scan=rhcos4-e8-worker
              complianceoperator.openshift.io/scan-result=
Annotations:  compliance-remediations/processed:
              compliance.openshift.io/scan-error-msg:
              compliance.openshift.io/scan-result: NON-COMPLIANT
              OpenSCAP-scan-result/node: ip-10-0-169-90.eu-north-1.compute.internal

Data
====
exit-code:
----
2
results:
----
<?xml version="1.0" encoding="UTF-8"?>
...
Scanner pods for Platform scans are similar, except:
There is one extra init container called api-resource-collector that reads the OpenSCAP content provided by the content-container init container, figures out which API resources the content needs to examine, and stores those API resources in a shared directory where the scanner container reads them from.
The scanner container does not need to mount the host file system.
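To inspect what a particular container did, view its logs by container name; a sketch, assuming the container names used above and a pod name from the listing in the launching phase:
$ oc logs -n openshift-compliance pods/<scanner_pod_name> -c scanner
$ oc logs -n openshift-compliance pods/<scanner_pod_name> -c api-resource-collector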
When the scanner pods are done, the scans move on to the Aggregating phase.
In the aggregating phase, the scan controller spawns yet another pod called the aggregator pod. Its purpose is to take the result ConfigMap objects, read the results, and create the corresponding Kubernetes object for each check result. If the check failure can be automatically remediated, a ComplianceRemediation object is created. To provide human-readable metadata for the checks and remediations, the aggregator pod also mounts the OpenSCAP content using an init container.
When a config map is processed by an aggregator pod, it is labeled with the compliance-remediations/processed label. The results of this phase are ComplianceCheckResult objects:
$ oc get compliancecheckresults -lcompliance.openshift.io/scan-name=rhcos4-e8-worker
NAME                                                  STATUS   SEVERITY
rhcos4-e8-worker-accounts-no-uid-except-zero          PASS     high
rhcos4-e8-worker-audit-rules-dac-modification-chmod   FAIL     medium
and ComplianceRemediation objects:
$ oc get complianceremediations -lcompliance.openshift.io/scan-name=rhcos4-e8-worker
NAME                                                  STATE
rhcos4-e8-worker-audit-rules-dac-modification-chmod   NotApplied
rhcos4-e8-worker-audit-rules-dac-modification-chown   NotApplied
rhcos4-e8-worker-audit-rules-execution-chcon          NotApplied
rhcos4-e8-worker-audit-rules-execution-restorecon     NotApplied
rhcos4-e8-worker-audit-rules-execution-semanage       NotApplied
rhcos4-e8-worker-audit-rules-execution-setfiles       NotApplied
After these CRs are created, the aggregator pod exits and the scan moves on to the Done phase.
In the final scan phase, the scan resources are cleaned up if needed, and the ResultServer deployment is either scaled down (if the scan was one-time) or deleted (if the scan is continuous); the next scan instance then recreates the deployment.
It is also possible to trigger a re-run of a scan in the Done phase by annotating it:
$ oc -n openshift-compliance \
    annotate compliancescans/rhcos4-e8-worker compliance.openshift.io/rescan=
After the scan reaches the Done phase, nothing else happens on its own unless the remediations are set to be applied automatically with autoApplyRemediations: true. The OpenShift Container Platform administrator would now review the remediations and apply them as needed.
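For example, a single remediation from the listing in the aggregating phase can be applied manually by setting its apply field to true; a sketch using one of the remediation names shown earlier:
$ oc -n openshift-compliance patch \
    complianceremediations/rhcos4-e8-worker-audit-rules-dac-modification-chmod \
    --type merge -p '{"spec":{"apply":true}}'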