Fix Jenkins slave connection dying on vdsm check_merged jobs

Description

Something in the vdsm build_artifacts job makes the Jenkins slave disconnect while the job is running. As a result, the cleanup scripts do not run on the slave, leaving it dirty enough to make the next job on that slave fail.

An example can be seen here:
http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/692/console

Relevant log lines:

Activity

Eyal Edri January 26, 2017 at 1:37 PM

The issue was in check-merged script.

Eyal Edri January 26, 2017 at 1:36 PM

Sorry, just read the response.
Closing this for now; please re-open if anything else is needed from infra.

Eyal Edri January 26, 2017 at 1:36 PM

Maybe this is due to the Java auto-updating in Puppet?

danken December 26, 2016 at 12:43 PM

This is indeed due to our buggy check-merged script, which mistakenly called `kill 0`.
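
For context, signalling PID 0 targets every process in the caller's process group, not just the caller. The sketch below is not the vdsm check-merged script; it is a minimal Python illustration of that behaviour, where `AGENT`, `BUGGY_CHILD` and `run_agent` are hypothetical names. It assumes a POSIX host, and it assumes (the ticket does not state this) that the Jenkins slave process shared a process group with the script. The stand-in "agent" dies when its child signals group 0, and survives when the child is started in its own session.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for the buggy script: signalling PID 0 means
# "every process in my own process group", like `kill 0` in a shell.
BUGGY_CHILD = "import os, signal; os.kill(0, signal.SIGTERM)"

# Hypothetical stand-in for the agent: it launches the child, waits for it,
# then exits cleanly. argv[1] is the child code, argv[2] selects isolation.
AGENT = (
    "import subprocess, sys\n"
    "isolate = sys.argv[2] == 'isolate'\n"
    "subprocess.Popen([sys.executable, '-c', sys.argv[1]],\n"
    "                 start_new_session=isolate).wait()\n"
    "sys.exit(0)\n"
)


def run_agent(isolate_child: bool) -> int:
    """Run the stand-in agent and return its exit status.

    A negative value -N means the agent was terminated by signal N.
    """
    mode = "isolate" if isolate_child else "shared"
    proc = subprocess.Popen(
        [sys.executable, "-c", AGENT, BUGGY_CHILD, mode],
        # Give the agent its own session and process group so that this
        # demo never signals the process that calls run_agent().
        start_new_session=True,
    )
    return proc.wait()


if __name__ == "__main__":
    # Child shares the agent's process group: the agent is killed too.
    print("shared group:", run_agent(isolate_child=False))
    # Child runs in its own session: only the child is killed.
    print("isolated child:", run_agent(isolate_child=True))
```

In the shared case the agent's exit status is the negated SIGTERM number, meaning it was terminated by the signal its own child sent. Starting the child in a separate session is one way to contain such a mistake, although fixing the script so that it signals only the intended PIDs addresses the root cause.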

Barak Korren December 22, 2016 at 4:25 PM

Tested to see if the behaviour would be different when running on an EL7 host (right now it's running in an EL7 chroot on a Fedora host):
http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/777

Same results.

I'm beginning to suspect it's something the tests are doing; the same hosts run Lago fine for other things.
Thinks it may be this patch:
https://gerrit.ovirt.org/#/c/68078/

He sent an email to devel, investigation will continue.

Barak Korren December 21, 2016 at 5:30 PM

Got this from the slave log in Jenkins:

Barak Korren December 21, 2016 at 4:02 PM

Trying to get more info about this, here are the journal lines we get on the slave when the job fails:

Done

Details

Created December 14, 2016 at 8:10 AM
Updated May 25, 2017 at 11:31 AM
Resolved January 26, 2017 at 1:37 PM