OpenShift Troubleshooting

The openshift installation failed with errors¶

1
2

E1011 08:07:56.183305       1 auth.go:231] error contacting auth provider time="2023-11-07T17:50:00+03:00" level=error msg="Cluster operator authentication Degraded is True with Ingr wwssStateEndpoints_MissingSubsets::OAuthClientsController_SyncError::OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_SyncError::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError::ProxyConfigController_SyncError::WellKnownReadyController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server\nOAuthClientsControllerDegraded: no ingress for host oauth-openshift.apps.ocpactest.example.com in route oauth-openshift in namespace openshift-authentication\nOAuthServerDeploymentDegraded: waiting for the oauth-openshift route to contain an admitted ingress: no admitted ingress for route oauth-openshift in namespace openshift-authentication\nOAuthServerDeploymentDegraded: \nOAuthServerRouteEndpointAccessibleControllerDegraded: route \"openshift-authentication/oauth-openshift\": status does not have a valid host address\nOAuthServerServiceEndpointAccessibleControllerDegraded: Get \"https://10.132.220.152:443/healthz\": dial tcp 10.132.220.152:443: connect: connection refused\nOAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready\nProxyConfigControllerDegraded: no admitted ingress for route oauth-openshift in namespace openshift-authentication\nWellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap \"oauth-openshift\" not found (check authentication operator, it is supposed to create this)"
time="2023-11-07T17:50:00+03:00" level=error msg="Cluster operator authentication Available is False with OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerServiceEndpointAccessibleController_EndpointUnavailable::OAuthServerServiceEndpointsEndpointAccessibleController_ResourceNotFound::ReadyIngressNodes_NoReadyIngressNodes::WellKnown_NotReady: OAuthServerServiceEndpointAccessibleControllerAvailable: Get \"https://10.132.220.152:443/healthz\": dial tcp 10.132.220.152:443: connect: connection refused\nOAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints \"oauth-openshift\" not found\nReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 2 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).\nWellKnownAvailable: The well-known endpoint is not yet available: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap \"oauth-openshift\" not found (check authentication operator, it is supposed to create this)"

Issue: 2 of the master nodes available, requires 3 for etcd cluster.

Solution: Check machines, machinesets and nodes :

oc --kubeconfig=${INSTALL_DIR}/auth/kubeconfig get clusterversion
oc --kubeconfig=${INSTALL_DIR}/auth/kubeconfig get nodes
oc --kubeconfig=${INSTALL_DIR}/auth/kubeconfig get machines
oc --kubeconfig=${INSTALL_DIR}/auth/kubeconfig get machinesets
oc --kubeconfig=${INSTALL_DIR}/auth/kubeconfig get csr | grep -i pending
oc --kubeconfig=${INSTALL_DIR}/auth/kubeconfig get clusterversion

for i in `oc get csr --no-headers \
| grep -i pending \ 
| awk '{ print $1 }'`;  do oc adm certificate approve $i; 
done

ODF Operator is not able to upgrade¶

Issue: Upgrading ODF operator is not working as expected.

Solution: Check OLM components:

Backup install-plan and delete.
Backup csv and subscription and delete.
Apply subscription again and wait for csv and install-plan to be created.
Approve the operator upgrade manually if approve type is manual.
Set operator approval type to automatic.

OCP Upgrade is stucked for some of the cluster operators¶

Issue: Some of the cluster operators are not able to be upgrade due to the bad conditions.

Solution: Check cluster operators and related namespaces:

1	`oc get co`

* Check Administration->Cluster Settings

Check related namespaces to the operators like openshift-sdn, openshift-storage whether or not the pods have underlying issues.
Check network connectivity especially if there is cluster-wide proxy.

Bundle unpacking failed. reason deadlineexceeded while operator installation!¶

Follow the steps here: https://access.redhat.com/solutions/6459071

List the jobs in the namespace openshift-marketplace

oc get job -n openshift-marketplace -o json | jq -r '.items[] | select(.spec.template.spec.containers[].env[].value|contains ("<operator_name_keyword>")) | .metadata.name'

Delete the jobs from the ones that were executed above:

$ oc delete job <job-name> -n openshift-marketplace

4. Delete the configmaps from the ones that were executed above:

$ oc delete configmap <job-name> -n openshift-marketplace

If the above scripts don't work, continue with steps below:
Backup install-plan, csv, subscription
Delete install-plan
Delete csv
Delete subscription
Apply same subscription to the cluster.
Check that csv and install-plan is created and operator is healthy.
Continue upgrading.

IMPORTANT: Deadline in seconds is 600 seconds by default.