$ oc adm diagnostics
The oc adm diagnostics
command runs a series of checks for error conditions in
the host or cluster. Specifically, it:
Verifies that the default registry and router are running and correctly configured.
Checks ClusterRoleBindings
and ClusterRoles
for consistency with base
policy.
Checks that all of the client configuration contexts are valid and can be connected to.
Checks that SkyDNS is working properly and the pods have SDN connectivity.
Validates master and node configuration on the host.
Checks that nodes are running and available.
Analyzes host logs for known errors.
Checks that systemd units are configured as expected for the host.
OpenShift Container Platform can be deployed in many ways: built from source, included in a
VM image, in a container image, or as enterprise RPMs. Each method implies a
different configuration and environment. To minimize environment assumptions,
the diagnostics were added to the openshift
binary so that wherever there is
an OpenShift Container Platform server or client, the diagnostics can run in the exact same
environment.
To use the diagnostics tool, preferably on a master host and as cluster administrator, run:
$ oc adm diagnostics
This runs all available diagnostics, skipping any that do not apply. For example, the NodeConfigCheck does not run unless a node configuration is available. You can also run specific diagnostics by name as you work to address issues. For example:
$ oc adm diagnostics NodeConfigCheck UnitStatus
Diagnostics look for configuration files in standard locations:
Client:
As indicated by the $KUBECONFIG
environment variable variable
~/.kube/config file
Master:
/etc/origin/master/master-config.yaml
Node:
/etc/origin/node/node-config.yaml
Non-standard locations can be specified with flags (respectively,
--config
, --master-config
, and --node-config
). If a configuration file
is not found or specified, related diagnostics are skipped.
Consult the output with the --help
flag for all available options.
Master and node diagnostics are most useful in a specific target environment, which is a deployment of RPMs with Ansible deployment logic. This provides some diagnostic benefits:
Master and node configuration is based on a configuration file in a standard location.
Systemd units are configured to manage the server(s).
All components log to journald.
Having configuration files where Ansible places them means that you will
generally not need to specify where to find them. Running oc adm diagnostics
without flags will look for master and node configurations in the standard
locations and use them if found; this should make the Ansible-installed use case
as simple as possible. Also, it is easy to specify configuration files that are
not in the expected locations:
$ oc adm diagnostics --master-config=<file_path> --node-config=<file_path>
Systemd units and logs entries in journald are necessary for the current log diagnostic logic. For other deployment types, logs may be going into files, to stdout, or may combine node and master. At this time, for these situations, log diagnostics are not able to work properly and will be skipped.
You may have access as an ordinary user, and/or as a cluster-admin user, and/or may be running on a host where OpenShift Container Platform master or node servers are operating. The diagnostics attempt to use as much access as the user has available.
A client with ordinary access should be able to diagnose its connection to the master and run a diagnostic pod. If multiple users or masters are configured, connections will be tested for all, but the diagnostic pod only runs against the current user, server, or project.
A client with cluster-admin access available (for any user, but only the
current master) should be able to diagnose the status of infrastructure such as
nodes, registry, and router. In each case, running oc adm diagnostics
looks
for the client configuration in its standard location and uses it if available.