alertname = InstanceUnreachable
beta_kubernetes_io_arch = amd64
beta_kubernetes_io_os = linux
instance = ibm-srv01.ovirt.org
job = kubernetes-nodes-exporter
kubernetes_io_hostname = ibm-srv01.ovirt.org
node_role_kubernetes_io_compute = true
region = external
type = bare-metal-external
zone = ci
description = ibm-srv01.ovirt.org of job kubernetes-nodes-exporter has been possibly down for more than 10 minutes.
Did we identify the reason for this issue? Could it be related to https://issues.redhat.com/browse/KNIECO-2387? If disk space runs out, pods are evacuated (this should be visible in the event log).
If the issue is no longer relevant, let’s close it along with the related ones for other IBM Cloud hosts.
This node has been stable for the past few weeks with no major errors. journalctl -u origin-node also looks quite clean, with no infra-related issues.
I do see, however, that /boot is 97% full (240M used out of 250M). I'm not sure whether this could cause issues, but we should probably free up some space.
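For reference, a minimal sketch of how the /boot usage could be checked from a script. The helper names and the 90% threshold are hypothetical, not something already in place on the node, and the figure may differ slightly from what df prints because of rounding.

```python
import shutil


def usage_percent(path: str) -> float:
    """Return the used share (in percent) of the filesystem holding `path`.

    Uses used / (used + free), where `free` is the space available to
    unprivileged users, so space reserved for root is not counted.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        # Pseudo-filesystems can report zero size; treat them as empty.
        return 0.0
    return 100.0 * usage.used / denominator


def is_nearly_full(path: str, threshold: float = 90.0) -> bool:
    """True if usage is at or above `threshold` percent (hypothetical default)."""
    return usage_percent(path) >= threshold


if __name__ == "__main__":
    # Mount points are assumptions; adjust to the node's layout.
    for mount in ("/boot", "/"):
        flag = " (nearly full)" if is_nearly_full(mount) else ""
        print(f"{mount}: {usage_percent(mount):.0f}% used{flag}")
```

Running it on the node prints one line per mount point, which makes it easy to compare the IBM Cloud hosts against each other.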
Other than that, I think we can close all active “InstanceUnreachable” tickets for now and see how stable the nodes are.
The /boot partition was cleaned up.
Closing, as this node has been stable for a few weeks now.