Tests failed because global_setup.sh failed

Description

Not sure why global setup failed:

+ [[ ! -O /home/jenkins/.ssh ]]
+ [[ ! -G /home/jenkins/.ssh ]]
+ verify_set_permissions 700 /home/jenkins/.ssh
+ local target_permissions=700
+ local path_to_set=/home/jenkins/.ssh
++ stat -c %a /home/jenkins/.ssh
+ local access=700
+ [[ 700 != \7\0\0 ]]
+ return 0
+ [[ -f /home/jenkins/.ssh/known_hosts ]]
+ verify_set_ownership /home/jenkins/.ssh/known_hosts
+ local path_to_set=/home/jenkins/.ssh/known_hosts
++ id -un
+ local owner=jenkins
++ id -gn
+ local group=jenkins
+ [[ ! -O /home/jenkins/.ssh/known_hosts ]]
+ [[ ! -G /home/jenkins/.ssh/known_hosts ]]
+ verify_set_permissions 644 /home/jenkins/.ssh/known_hosts
+ local target_permissions=644
+ local path_to_set=/home/jenkins/.ssh/known_hosts
++ stat -c %a /home/jenkins/.ssh/known_hosts
+ local access=644
+ [[ 644 != \6\4\4 ]]
+ return 0
+ return 0
+ true
+ log ERROR Aborting.
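
For readers following the trace, here is a minimal sketch of what the permission helper appears to do. It is reconstructed only from the xtrace lines above, and the real global_setup.sh may differ. The chmod branch is an assumption, since the trace only shows the case where the mode already matches. Every check in the trace returns 0, so the abort seems to come from a step that is not visible in this excerpt.

```shell
#!/bin/bash
# Hypothetical reconstruction from the xtrace output; not the actual source.
verify_set_permissions() {
    local target_permissions="$1"
    local path_to_set="$2"
    local access
    # Read the current octal mode, e.g. 700 or 644 (GNU stat).
    access="$(stat -c %a "$path_to_set")"
    if [[ "$access" != "$target_permissions" ]]; then
        # Assumption: fix the mode when it differs. This branch never
        # runs in the trace, so its real contents are unknown.
        chmod "$target_permissions" "$path_to_set" || return 1
    fi
    return 0
}
```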

Build:
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_standard-check-patch/runs/1048/nodes/125/steps/479/log/?start=0

Activity

Liora Milbaum
January 1, 2019, 1:17 PM

What do you think we can do about this problem?

Barak Korren
January 1, 2019, 1:26 PM

Like I wrote, I have no idea what could be causing this; further research is needed.

Perhaps try to correlate it with known infra issues. Could something on the hypervisor have caused considerable slowness on the slave, for example?

I don't think it can be a network issue, since the network is not involved when systemctl connects to the local systemd, but I can't rule it out...

Have we seen other occurrences of this?

Liora Milbaum
January 1, 2019, 2:27 PM

Are we logging the systemctl/journalctl logs in case of failure?
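
One possible way to capture those logs is sketched below. This is only an illustration: `collect_systemd_diagnostics`, the output file names, and the 500-line journal limit are all hypothetical and not part of the existing global_setup.sh. Each command is allowed to fail so that collecting diagnostics never hides the original error.

```shell
#!/bin/bash
# Hypothetical helper; the name and file layout are assumptions.
collect_systemd_diagnostics() {
    local out_dir="$1"
    mkdir -p "$out_dir" || return 1
    # Units currently in the failed state.
    systemctl --no-pager --failed > "$out_dir/systemctl_failed.txt" 2>&1 || true
    # Overall system state as seen by systemd.
    systemctl --no-pager status > "$out_dir/systemctl_status.txt" 2>&1 || true
    # Last 500 journal lines from the current boot.
    journalctl --no-pager -b -n 500 > "$out_dir/journal_tail.txt" 2>&1 || true
    return 0
}
```

It could be called right before the script logs "ERROR Aborting.", with the output directory archived as a build artifact so the data survives journal rotation.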

Evgheni Dereveanchin
January 2, 2019, 9:48 AM

I've offlined the slave to limit further issues, but logging in showed no problems with systemd, and the journal had already been rotated. As the system had almost a year of uptime, I've updated it to the latest 7.5 snapshot we have defined and rebooted it. I also see that vm0097 is offline due to journald issues, so I will repeat the process there and schedule replacement of all el7 VMs with fresh 7.6 systems.

Barak Korren
January 16, 2019, 6:59 AM

I brought vm0096 back online, since the issue no longer seems to reproduce there.

Assignee

Evgheni Dereveanchin

Reporter

Nir Soffer

Blocked By

None

Priority

Medium