kubevirt_kubevirt_standard-check-pr jobs often hang indefinitely while waiting for a connection to be established:
http://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/376/console
http://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/377/console
http://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/378/console
http://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/380/console
The following output repeats indefinitely in the console log:
17:03:32 [check-patch.el7.x86_64] ++ awk '/virt-controller/ && /true/'
17:03:32 [check-patch.el7.x86_64] ++ kubectl get pods -n kube-system '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers
17:03:32 [check-patch.el7.x86_64] ++ wc -l
17:03:32 [check-patch.el7.x86_64] ++ cluster/kubectl.sh get pods -n kube-system '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers
17:03:44 [check-patch.el7.x86_64] Unable to connect to the server: dial tcp 192.168.121.111:6443: getsockopt: no route to host
17:03:44 [check-patch.el7.x86_64] + '[' 0 -lt 1 ']'
17:03:44 [check-patch.el7.x86_64] + echo 'Waiting for KubeVirt virt-controller container to become ready ...'
17:03:44 [check-patch.el7.x86_64] Waiting for KubeVirt virt-controller container to become ready ...
17:03:44 [check-patch.el7.x86_64] + kubectl get pods -n kube-system '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers
17:03:44 [check-patch.el7.x86_64] + awk '/virt-controller/ && /true/'
17:03:44 [check-patch.el7.x86_64] + cluster/kubectl.sh get pods -n kube-system '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers
17:03:44 [check-patch.el7.x86_64] + wc -l
17:03:56 [check-patch.el7.x86_64] Unable to connect to the server: dial tcp 192.168.121.111:6443: getsockopt: no route to host
17:03:56 [check-patch.el7.x86_64] 0
17:03:56 [check-patch.el7.x86_64] + sleep 10
We need to implement timeouts: a stuck job ties up a bare-metal system for days or even weeks, until someone kills the job manually.
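For illustration, a bounded version of the wait loop seen in the log might look like the sketch below. This is not the actual patch: the function names `wait_with_timeout` and `virt_controller_ready` and the 300-second limit are hypothetical, and only the `cluster/kubectl.sh` invocation is taken from the console output above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: retry a command until it succeeds or a deadline passes.
# Usage: wait_with_timeout <timeout_seconds> <interval_seconds> <command...>
wait_with_timeout() {
    local timeout=$1 interval=$2
    shift 2
    # SECONDS is a bash builtin counting seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        if (( SECONDS >= deadline )); then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
}

# Readiness check modelled on the commands in the log; succeeds only when
# at least one virt-controller pod reports ready.
virt_controller_ready() {
    local ready
    ready=$(cluster/kubectl.sh get pods -n kube-system \
        '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' \
        --no-headers 2>/dev/null | awk '/virt-controller/ && /true/' | wc -l)
    [ "$ready" -ge 1 ]
}

# Example call (the 300s limit is an arbitrary value for illustration):
# wait_with_timeout 300 10 virt_controller_ready || exit 1
```

With a wrapper like this, an unreachable API server would cause the job to fail after the deadline instead of looping on `sleep 10` forever.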
Created a patch to enforce time-outs in the jobs:
https://gerrit.ovirt.org/c/87018/
Also working on a PR to enforce the time-outs in the KubeVirt tests themselves:
https://github.com/kubevirt/kubevirt/pull/692
The CI-side patch has been merged; the KubeVirt patch is still under review.
I think the patch on KubeVirt was merged, so can we close this?
Not merged yet.
Two months without an update, so closing this as won't fix. If KubeVirt needs help fixing the timeout issue on their side, they can reach out to us. On our end, we needed to make sure the jobs have timeouts, which we did.