Failed ovirt-system-tests_master_check-patch-el7-x86_64

Description

I want to verify [1]. It failed in the past, as expected, because it needed [2].
[2] was merged yesterday, and [1] still fails. I retriggered it several
times, and it failed for a different reason each time. The last failure was [3]:

08:11:38 + find /dev/shm/ost/deployment-image-ng-suite-master -type f '(' -iname 'nose*.xml' -o -iname '*.junit.xml' ')' -exec mv '{}' exported-artifacts/ ';'
08:11:38 find: ‘/dev/shm/ost/deployment-image-ng-suite-master’: No such file or directory

Please have a look. Thanks,

[1] https://gerrit.ovirt.org/80820
[2] https://gerrit.ovirt.org/79345
[3] http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/1467/

Didi

Activity

Eyal Edri August 29, 2017 at 1:29 PM

Is this still an issue?

Former user August 24, 2017 at 1:14 PM

Interesting, in this particular case it doesn't look like a DNS issue. The connection was as follows:
wget -> proxy01.phx.ovirt.org -> jenkins.ovirt.org

Wget was able to access the proxy and the proxy tried to fetch the URL but got a 404 error from jenkins:

1503475892.751 4 66.187.230.21 TCP_MISS/404 878 GET http://jenkins.ovirt.org/job/ovirt-node-ng_master_build-artifacts-el7-x86_64//artifact/exported-artifacts/ - HIER_DIRECT/66.187.230.92 text/html

I get a 404 error when accessing this URL as well, so there is an error in the fetching script:
http://jenkins.ovirt.org/job/ovirt-node-ng_master_build-artifacts-el7-x86_64//artifact/exported-artifacts/

Why are we not appending the artifact name or build number to the URL above? In the Squid logs I see a successful fetch of the Appliance via the proxy with a correct URL:

1503475797.567 9880 66.187.230.21 TCP_REFRESH_UNMODIFIED/200 822060496 GET http://jenkins.ovirt.org/job/ovirt-appliance_master_build-artifacts-el7-x86_64/524/artifact/exported-artifacts/oVirt-Engine-Appliance-CentOS-x86_64-7-20170822.ova - HIER_DIRECT/66.187.230.92 application/octet-stream

This one has both a build number and a file name; both should be added to the failing URL as well.
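
To compare the two proxy log lines above mechanically, here is a minimal sketch that splits a line in Squid's native access.log layout and flags URLs with an empty path segment (the `//` where the build number should be). The names `SquidEntry`, `parse_squid_line` and `has_empty_path_segment` are hypothetical helpers written for this comment, not part of Squid or of the OST scripts, and the sketch assumes the ten whitespace-separated fields seen in the lines quoted here.

```python
from typing import NamedTuple


class SquidEntry(NamedTuple):
    timestamp: float    # seconds since the epoch
    elapsed_ms: int     # time the proxy spent on the request
    client: str         # client address
    result_code: str    # e.g. TCP_MISS
    http_status: int    # e.g. 404
    size: int           # bytes sent to the client
    method: str
    url: str
    hierarchy: str      # e.g. HIER_DIRECT/66.187.230.92
    content_type: str


def parse_squid_line(line: str) -> SquidEntry:
    """Split one access.log line into its fields (assumes 10 fields)."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    ts, elapsed, client, code_status, size, method, url, _user, hier, ctype = fields
    result_code, _, status = code_status.partition("/")
    return SquidEntry(float(ts), int(elapsed), client, result_code,
                      int(status), int(size), method, url, hier, ctype)


def has_empty_path_segment(url: str) -> bool:
    """Return True if the URL contains '//' anywhere after the scheme."""
    rest = url.split("://", 1)[-1]
    return "//" in rest
```

Run against the two lines above, the first entry would come back with status 404 and an empty path segment, the second with status 200 and none.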

Anton Marchukov August 24, 2017 at 12:16 PM

There is some occasional DNS flakiness going on, but this job has other problems, since DNS is not that flaky.

Eyal Edri August 24, 2017 at 9:06 AM

Is this still happening? Do we know if there was a DNS outage?

Former user August 23, 2017 at 12:38 PM

It still uses lastSuccessful, but you are right: I added a call to buildNumber to see which images we're fetching. Since buildNumber is empty in this case, it produces a wrong download link. The question is, how can buildNumber be empty?
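
Whatever the reason buildNumber comes back empty, the link builder could refuse to produce a URL in that case, so the job fails with a clear message instead of a 404 from the proxy. A minimal sketch, assuming the URL layout seen in the working Appliance download; `artifact_url` and its parameters are hypothetical names, not the actual suite code:

```python
def artifact_url(job: str, build: str, filename: str = "",
                 base: str = "http://jenkins.ovirt.org") -> str:
    """Build a Jenkins exported-artifacts URL, rejecting an empty build.

    An empty build would otherwise yield '<job>//artifact/...', which is
    the broken link discussed in this ticket.
    """
    build = build.strip()
    if not build:
        raise ValueError(f"empty build number for job {job!r}")
    return f"{base}/job/{job}/{build}/artifact/exported-artifacts/{filename}"
```

With job "ovirt-appliance_master_build-artifacts-el7-x86_64", build "524" and the .ova file name, this reproduces the URL that was fetched successfully. Passing Jenkins' lastSuccessfulBuild permalink as the build would be an alternative to failing, if that is acceptable for the suite.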

Anton Marchukov August 23, 2017 at 9:30 AM

We seem to have had a DNS outage. Two change queue jobs failed at that time
with symptoms that look like a DNS problem, e.g.:

2017-08-23 07:16:59,037::INFO::repoman.common.repo::Resolving artifact source http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-ppc64le/2361/
requests.exceptions.ConnectionError: ('Connection aborted.', gaierror(-2, 'Name or service not known'))

or

[ ERROR ] Yum Cannot queue package ovirt-engine-setup: Cannot find a valid baseurl for repo: base/7/x86_64
[ INFO ] Yum Performing yum transaction rollback

I checked DNS resolution on the Jenkins master at least, and it looks like it
is working now.

It might still be specific to some particular environment or hosts, so I will
watch it a bit more.
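
For watching this on individual hosts, a small check along these lines could be run from a slave. It is only a sketch: `check_dns` is a hypothetical helper, and the resolver is injectable so the function can be exercised without network access.

```python
import socket
from typing import Callable, Dict, Iterable


def check_dns(hosts: Iterable[str],
              resolve: Callable[[str], str] = socket.gethostbyname) -> Dict[str, str]:
    """Resolve each host and report either its address or the failure."""
    results = {}
    for host in hosts:
        try:
            results[host] = resolve(host)
        except OSError as err:
            # socket.gaierror (e.g. -2, 'Name or service not known')
            # is a subclass of OSError.
            results[host] = f"FAILED: {err}"
    return results
```

Calling `check_dns(["jenkins.ovirt.org", "gerrit.ovirt.org", "proxy01.phx.ovirt.org"])` returns a dict mapping each name to an address or to a "FAILED: ..." string.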

Anton.


Nadav Goldin August 23, 2017 at 9:14 AM

I think the link generated by the suite is just incorrect:

should be:

Or whatever was agreed for the node suite.

Eyal Edri August 23, 2017 at 9:10 AM

Maybe it's the same issue we see when accessing gerrit.ovirt.org fails?
It might be a global DNS issue.
Also, are we still using proxy.phx.ovirt.org?

2 Resolving proxy01.phx.ovirt.org (proxy01.phx.ovirt.org)... 66.187.230.40
08:11:32 Connecting to proxy01.phx.ovirt.org (proxy01.phx.ovirt.org)|66.187.230.40|:3128... connected.
08:11:32 Proxy request sent, awaiting response... 404 Not Found
08:11:32 2017-08-23 08:11:32 ERROR 404: Not Found.
08:11:32
08:11:32 + res=8

Nadav Goldin August 23, 2017 at 8:52 AM

The failure reason is:

Looks like the build number is missing from the Jenkins link (IIRC it was 'lastSuccessful' in the past). I think you made some changes there a few weeks ago?

Cannot Reproduce

Details

Created August 23, 2017 at 8:43 AM
Updated August 31, 2017 at 1:02 PM
Resolved August 29, 2017 at 1:53 PM