kubevirt_kubevirt_standard-check-pr jobs often get stuck

Description

kubevirt_kubevirt_standard-check-pr jobs often get stuck, waiting forever for a connection to the cluster to be established:

http://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/376/console
http://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/377/console
http://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/378/console
http://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/380/console

The following output repeats indefinitely:

17:03:32 [check-patch.el7.x86_64] ++ awk '/virt-controller/ && /true/'
17:03:32 [check-patch.el7.x86_64] ++ kubectl get pods -n kube-system '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers
17:03:32 [check-patch.el7.x86_64] ++ wc -l
17:03:32 [check-patch.el7.x86_64] ++ cluster/kubectl.sh get pods -n kube-system '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers
17:03:44 [check-patch.el7.x86_64] Unable to connect to the server: dial tcp 192.168.121.111:6443: getsockopt: no route to host
17:03:44 [check-patch.el7.x86_64] + '[' 0 -lt 1 ']'
17:03:44 [check-patch.el7.x86_64] + echo 'Waiting for KubeVirt virt-controller container to become ready ...'
17:03:44 [check-patch.el7.x86_64] Waiting for KubeVirt virt-controller container to become ready ...
17:03:44 [check-patch.el7.x86_64] + kubectl get pods -n kube-system '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers
17:03:44 [check-patch.el7.x86_64] + awk '/virt-controller/ && /true/'
17:03:44 [check-patch.el7.x86_64] + cluster/kubectl.sh get pods -n kube-system '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' --no-headers
17:03:44 [check-patch.el7.x86_64] + wc -l
17:03:56 [check-patch.el7.x86_64] Unable to connect to the server: dial tcp 192.168.121.111:6443: getsockopt: no route to host
17:03:56 [check-patch.el7.x86_64] 0
17:03:56 [check-patch.el7.x86_64] + sleep 10

We need to implement timeouts: each stuck job ties up a bare-metal system for days or weeks until someone kills it manually.
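A minimal sketch of a bounded version of the wait loop seen in the trace, in bash. It assumes the script runs from the KubeVirt repository root so that `cluster/kubectl.sh` resolves, as it does in the log. `count_ready_controllers` and `wait_with_deadline` are hypothetical names, not functions from the KubeVirt tree, and the limits in the usage example are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: names below are hypothetical, not taken from the KubeVirt repo.

# count_ready_controllers: the readiness probe from the log, printing how
# many virt-controller pods report ready containers.
count_ready_controllers() {
    cluster/kubectl.sh get pods -n kube-system \
        '-ocustom-columns=status:status.containerStatuses[*].ready,metadata:metadata.name' \
        --no-headers | awk '/virt-controller/ && /true/' | wc -l
}

# wait_with_deadline TIMEOUT_SECONDS INTERVAL_SECONDS CMD...
# Runs CMD every INTERVAL seconds until it prints a count >= 1.
# Returns 0 on success, or 1 once TIMEOUT_SECONDS have elapsed.
wait_with_deadline() {
    local timeout=$1 interval=$2
    shift 2
    # SECONDS is a bash builtin counting seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    local count
    while :; do
        count=$("$@")
        # Keep digits only (wc -l may pad with spaces); treat empty as 0.
        count=${count//[!0-9]/}
        if [ "${count:-0}" -ge 1 ]; then
            return 0
        fi
        if [ "$SECONDS" -ge "$deadline" ]; then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

Called as `wait_with_deadline 600 10 count_ready_controllers || exit 1`, the step would fail after roughly ten minutes of "no route to host" instead of holding the bare-metal system indefinitely. The deadline should be sized generously relative to a normal deployment, since the loop gives up even if readiness would have arrived a moment later.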

Activity

Barak Korren
February 5, 2018, 8:23 AM

Created a patch to enforce time-outs in the jobs:
https://gerrit.ovirt.org/c/87018/

Also working on a PR to enforce the time-outs in the KubeVirt tests themselves:
https://github.com/kubevirt/kubevirt/pull/692
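The Gerrit change is only linked here, so its actual mechanism is not shown. As a generic illustration, not a description of that patch, a whole CI step can be given a hard wall-clock ceiling with coreutils `timeout(1)`. `run_with_hard_timeout` is a hypothetical helper and the durations are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: run_with_hard_timeout is a hypothetical helper.

# run_with_hard_timeout DURATION CMD...
# DURATION uses timeout(1) syntax (e.g. 90m, 6h). Sends SIGTERM when the
# limit is reached, then SIGKILL 60 seconds later if CMD is still running.
run_with_hard_timeout() {
    local duration=$1
    shift
    timeout --kill-after=60 "$duration" "$@"
    local rc=$?
    # timeout(1) exits 124 on SIGTERM expiry, 137 (128+9) if SIGKILL was needed.
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        echo "Aborted after ${duration}: $*" >&2
    fi
    return "$rc"
}
```

For example, `run_with_hard_timeout 6h <job entry script>` guarantees the system is released even when a hang occurs somewhere no per-step timeout anticipated, which is why a job-level ceiling complements rather than replaces timeouts inside the tests themselves.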

Barak Korren
February 6, 2018, 1:07 PM

CI-side patch merged; KubeVirt patch still under review.

Eyal Edri
February 28, 2018, 2:16 PM

I think the KubeVirt patch was merged, so can we close this?

Barak Korren
February 28, 2018, 2:55 PM

Not merged yet.

Eyal Edri
May 1, 2018, 7:14 AM

Two months without an update, so I'm closing this as Won't Fix. If KubeVirt needs help fixing the timeout issue on their side, they can reach out to us. Our responsibility was to make sure the jobs have timeouts on our end, which we did.

Won't Fix

Assignee

Barak Korren

Reporter

Evgheni Dereveanchin

Blocked By

Blocking on code review

Priority

Highest