kubevirt test runs and nested docker issue

Description

Hi,

We see a lot of tests that fail during setup [1]:

```
/usr/bin/docker-current: Error response from daemon: grpc: the connection is unavailable.
time="2018-10-23T07:17:55Z" level=error msg="error getting events from daemon: context canceled"
```

It looks like the docker daemon may not run properly.

Best Regards,
Roman

[1] https://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/2344//artifact/check-patch.openshift-3.10-crio-release.el7.x86_64/mock_logs/script/stdout_stderr.log

Activity

Roman Mohr November 27, 2018 at 8:24 AM

Just to clarify: this time it is not that critical since, as you say, the slot probably got recycled immediately. But it happens from time to time, and when it does, it destroys a run.

Barak Korren November 27, 2018 at 8:21 AM

Reopening

This doesn't really make sense - we now bring up a fresh docker instance for each run...
Maybe it's some sort of race condition, with Docker not starting up fast enough... We do have a script that checks for that before starting the test, so we ought to catch that too.
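
For illustration, a readiness check of the kind mentioned above could look roughly like the sketch below. This is not the script STDCI actually uses. The function names, the attempt count, and the delay are hypothetical, and it assumes that a successful `docker info` is a good enough signal that the daemon is answering.

```python
import subprocess
import time


def docker_is_ready():
    """Return True if `docker info` exits successfully, i.e. the daemon answers."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,  # assumption: 10s is enough for a healthy daemon to reply
        )
    except (OSError, subprocess.TimeoutExpired):
        # docker binary missing, or the daemon hung: treat both as "not ready"
        return False
    return result.returncode == 0


def wait_for_docker(probe=docker_is_ready, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll `probe` until it succeeds; return the attempt number, or 0 on failure."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)  # give the daemon more time before the next probe
    return 0
```

A passing probe only shows that the daemon answered at that moment, so a check like this narrows a startup race but would not catch the daemon becoming unavailable later in the run.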

Roman Mohr October 24, 2018 at 11:13 AM

The problem seems to be resolved for now.

Gal Ben Haim October 23, 2018 at 11:13 AM

FYI

Gal Ben Haim October 23, 2018 at 11:11 AM

I've killed the old STDCI container and spawned a new one instead.

Roman Mohr October 23, 2018 at 10:38 AM

OK, let me see if it looks better now. For reference, here is a very bad run: https://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/2347/artifact/ci_build_summary.html from https://github.com/kubevirt/kubevirt/pull/1627.

Retriggered test runs there now.

Gal Ben Haim October 23, 2018 at 10:35 AM

All the jobs that failed ran on `ovirt-srv04.phx.ovirt.org-container-1`.
In the meantime, I marked this slave offline in Jenkins.
The docker daemon on this slave appears to run correctly, so further investigation is needed.

Cannot Reproduce

Details

Created October 23, 2018 at 8:48 AM
Updated August 29, 2019 at 2:12 PM
Resolved January 20, 2019 at 6:37 PM