several failures on host not being available - possible networking issue

Description

I have noticed that several jobs have failed recently with a host not being available.
The changes under test are not directly related to the failure, and I cannot see anything in the engine to suggest that there is a problem.
It could be a network issue, a network test suite issue, or an infra-related issue.

I am opening this Jira so we can track all the failures and see if we can find a common denominator.

Activity

Dafna Ron December 10, 2018 at 10:48 AM

We are actually still seeing it, just not as often.
We should move this to "race", as it's 100% a race.

Eyal Edri December 9, 2018 at 9:07 AM

Any update on this? If we don't see it anymore, I suggest closing for now.

Dafna Ron November 6, 2018 at 12:51 PM

Yes, you are right: the host that is not coming up is indeed a VM, and it is 100% true that the test is failing because the VM had not started yet.
However, since this is a random failure, it is not directly related to the patch or to the tests themselves.
Although we can probably ask the developers to fix the tests so that they keep waiting for the host to come up, I wanted you/Gal to make sure that we do not have performance or networking issues on the slaves that would affect the tests.

Former user November 6, 2018 at 12:41 PM

Looked at the second job presented and it failed due to this:

Could not find hosts that are up in DC test-dc

This is not a networking issue: all hosts are VMs within the bare metal, and this is likely a status returned by the oVirt API, so it may be a real bug. I checked engine.log, and the add_master_storage_domain test likely failed because the host had just been installed and had not become active yet. Does this repeat often? Maybe we should add a wait, or a verification that the host is up, before trying to add storage?
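A rough sketch of what such a wait could look like, not taken from the test suite itself: the helper polls a status callback until the host reports "up" or a timeout expires. The names wait_for_host_up and get_status, the "up" status string, and the default timeout and interval are all assumptions for illustration; the real test would need to plug in whatever call it already uses to read the host status from the engine.

```python
import time


def wait_for_host_up(get_status, timeout=600.0, interval=3.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns "up" or `timeout` seconds pass.

    get_status is a hypothetical zero-argument callable provided by the
    test; it should return the current host status as a lowercase string.
    Returns True if the host came up in time, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "up":
            return True
        if clock() >= deadline:
            # Give up; the caller decides how to report the failure.
            return False
        sleep(interval)


if __name__ == "__main__":
    # Simulated host that becomes active on the third poll (illustration only).
    statuses = iter(["installing", "installing", "up"])
    came_up = wait_for_host_up(lambda: next(statuses), timeout=5, interval=0)
    print("host up:", came_up)
```

If something like this ran before the add-storage step, a host that never becomes active would fail with a clear timeout instead of "Could not find hosts that are up", which should make the race easier to tell apart from a real engine bug.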

Cannot Reproduce

Details

Created October 16, 2018 at 9:17 AM
Updated August 29, 2019 at 2:59 PM
Resolved August 29, 2019 at 2:59 PM