kubevirt test runs and nested docker issue
Description
Activity
Barak Korren November 27, 2018 at 8:33 AM
We need to fix this, or at least come up with some idea as to why this is happening, because like I said, it does not make sense.
Also, if 1 of 9 tests fails because of this, you would now rerun all 9, thereby wasting quite a bit of resources.
At the very least, I think we may want to add a canary script that runs before your tests start, fails on this condition, and properly flags it as a system error rather than a test failure.
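A minimal sketch of such a canary, assuming it runs as a separate step before the tests. The exit code `75` and the function names are hypothetical, and the CI system would need to be taught to treat that code as an infrastructure error. Note that `docker info` only verifies that the daemon answers requests; the `grpc: the connection is unavailable` error reported in this ticket may only show up when a container is actually started, so a stronger canary might need to launch a throwaway container instead.

```python
import subprocess
import sys

# Hypothetical exit code reserved for infrastructure problems; the CI
# system would have to map it to "system error" instead of "test failure".
SYSTEM_ERROR_EXIT_CODE = 75


def docker_is_healthy(run=subprocess.run, timeout_s=30):
    """Return (ok, detail). ok is False if the daemon cannot serve a request.

    `run` is injectable so the check can be exercised without Docker.
    """
    try:
        result = run(
            ["docker", "info"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Docker client missing, or the daemon hung past the timeout.
        return False, str(exc)
    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, ""


def main():
    """Return 0 if Docker looks usable, else the system-error exit code."""
    ok, detail = docker_is_healthy()
    if not ok:
        print("SYSTEM ERROR: docker daemon unavailable: " + detail,
              file=sys.stderr)
        return SYSTEM_ERROR_EXIT_CODE
    return 0
```

The entry point of the canary script would call `sys.exit(main())`, so that a non-zero status stops the job before any test starts.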
FYI, any idea why we're seeing this again now?
Roman Mohr November 27, 2018 at 8:24 AM
Just to clarify: this time it is not that critical, since, as you say, the slot probably got recycled immediately. But it happens from time to time, and when it does it ruins a run.
Barak Korren November 27, 2018 at 8:21 AM
Reopening
This doesn't really make sense - we now bring up a fresh docker instance each run...
Maybe it's some sort of race condition with Docker not starting up fast enough... We do have a script that checks it before starting the test, so we ought to catch that too.
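If the startup race is the cause (nothing in this thread confirms it), a readiness check could poll until a deadline instead of probing once. The following is a rough sketch under that assumption; `wait_until_ready` and its parameters are hypothetical names, not the existing check script.

```python
import time


def wait_until_ready(probe, timeout_s=60.0, interval_s=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or timeout_s has elapsed.

    Returns the number of attempts on success and raises TimeoutError
    otherwise. `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if clock() >= deadline:
            raise TimeoutError(
                "not ready after %d attempts in %.0fs" % (attempts, timeout_s))
        sleep(interval_s)
```

Here `probe` could be any callable that returns `True` once Docker accepts requests, for example one that runs `docker info` and checks for a zero exit status. Logging the returned attempt count would also show how often the daemon is in fact slow to come up.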
Roman Mohr November 26, 2018 at 3:43 PM
Roman Mohr October 24, 2018 at 11:13 AM
the problem seems to be resolved for now.
Gal Ben Haim October 23, 2018 at 11:13 AM
FYI
Gal Ben Haim October 23, 2018 at 11:11 AM
I've killed the old STDCI container and spawned a new one instead.
Roman Mohr October 23, 2018 at 10:38 AM
OK, let me see if it looks better now. For reference, here is a very bad run: https://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/2347/artifact/ci_build_summary.html from https://github.com/kubevirt/kubevirt/pull/1627.
Retriggered test runs there now.
Gal Ben Haim October 23, 2018 at 10:35 AM
All the jobs that failed ran on `ovirt-srv04.phx.ovirt.org-container-1`.
In the meantime, I have marked this slave offline in Jenkins.
The docker daemon on this slave appears to be running correctly, so further investigation is needed.
Hi,
we see a lot of tests failing during setup with the following error [1]:
```
/usr/bin/docker-current: Error response from daemon: grpc: the connection is unavailable.
time="2018-10-23T07:17:55Z" level=error msg="error getting events from daemon: context canceled"
```
It looks like the docker daemon may not run properly.
Best Regards,
Roman
[1] https://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/2344//artifact/check-patch.openshift-3.10-crio-release.el7.x86_64/mock_logs/script/stdout_stderr.log