Tests failed because global_setup.sh failed

Description

Not sure why global setup failed:

+ [[ ! -O /home/jenkins/.ssh ]]
+ [[ ! -G /home/jenkins/.ssh ]]
+ verify_set_permissions 700 /home/jenkins/.ssh
+ local target_permissions=700
+ local path_to_set=/home/jenkins/.ssh
++ stat -c %a /home/jenkins/.ssh
+ local access=700
+ [[ 700 != \7\0\0 ]]
+ return 0
+ [[ -f /home/jenkins/.ssh/known_hosts ]]
+ verify_set_ownership /home/jenkins/.ssh/known_hosts
+ local path_to_set=/home/jenkins/.ssh/known_hosts
++ id -un
+ local owner=jenkins
++ id -gn
+ local group=jenkins
+ [[ ! -O /home/jenkins/.ssh/known_hosts ]]
+ [[ ! -G /home/jenkins/.ssh/known_hosts ]]
+ verify_set_permissions 644 /home/jenkins/.ssh/known_hosts
+ local target_permissions=644
+ local path_to_set=/home/jenkins/.ssh/known_hosts
++ stat -c %a /home/jenkins/.ssh/known_hosts
+ local access=644
+ [[ 644 != \6\4\4 ]]
+ return 0
+ return 0
+ true
+ log ERROR Aborting.

Build:
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_standard-check-patch/runs/1048/nodes/125/steps/479/log/?start=0

Activity

Barak Korren January 16, 2019 at 6:59 AM

I brought vm0096 back online, since the issue no longer seems to reproduce there.

Former user January 2, 2019 at 9:48 AM

I've offlined the slave to limit further issues, but logging in showed no problems with systemd, and the journal had already been rotated. As the system had almost a year of uptime, I've updated it to the latest 7.5 snapshot we have defined and rebooted it. I also see vm0097 is offline due to journald issues, so I will repeat the process there and schedule replacement of all el7 VMs with fresh 7.6 systems.

Liora Milbaum January 1, 2019 at 2:27 PM

Are we logging the systemctl/journalctl logs in case of failure?
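If we are not, one way to capture them could look like the sketch below. It is only an illustration under assumptions: the function name systemctl_with_diagnostics and the DIAG_DIR location are hypothetical and do not come from global_setup.sh, and the saved files would still need to be archived by the job.

```shell
#!/bin/bash
# Hypothetical helper, not part of global_setup.sh.
# Runs a systemctl action on a unit; if it fails, saves the unit status and
# recent journal entries so they can be collected with the build artifacts.
DIAG_DIR="${DIAG_DIR:-/tmp/systemd_diag}"   # assumed location

systemctl_with_diagnostics() {
    local action="$1" unit="$2"

    if sudo -n systemctl "$action" "$unit"; then
        return 0
    fi

    mkdir -p "$DIAG_DIR"
    # The collectors may fail too if systemd is unresponsive; their own
    # error output lands in the log files, and their exit status is ignored.
    sudo -n systemctl status --no-pager "$unit" \
        > "$DIAG_DIR/${unit}.status.log" 2>&1 || true
    sudo -n journalctl --no-pager -n 200 -u "$unit" \
        > "$DIAG_DIR/${unit}.journal.log" 2>&1 || true
    return 1
}
```

With something like this, the failing step could become `systemctl_with_diagnostics start postfix || failed=true`, and the two log files would show what systemd and journald reported at the time of the failure.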

Barak Korren January 1, 2019 at 1:26 PM

Like I wrote, I have no idea what could be causing this; further research is needed.

Perhaps try to correlate with known infra issues. Could something on the hypervisor have caused considerable slowness on the slave, for example?

I don't think it can be a network issue since network is not involved here when systemctl connects to the local systemd, but I can't rule this out...

Have we seen other occurrences of this?

Liora Milbaum January 1, 2019 at 1:17 PM

What do you think we can do about this problem?

Eyal Edri December 30, 2018 at 9:07 AM

I saw some errors with upgrading and rebooting CentOS 7.6 servers; not sure if it's the same issue.
Any ideas?

Barak Korren December 24, 2018 at 6:37 PM

Well the issue is here:

+ sudo -nl /bin/yum
+ sudo -n /bin/yum install -y postfix
Loaded plugins: fastestmirror, versionlock
Loading mirror speeds from cached hostfile
Package 2:postfix-2.10.1-6.el7.x86_64 already installed and latest version
Nothing to do
+ local failed=0
+ for package in '"${packages[@]}"'
+ rpm -q --quiet --whatprovides postfix
+ continue
+ return 0
+ sudo -n systemctl enable postfix
Failed to execute operation: Connection timed out
+ sudo -n systemctl start postfix
Failed to start postfix.service: Connection timed out
See system logs and 'systemctl status postfix.service' for details.
+ failed=true

It doesn't really make sense to me - how can a local connection to systemd time out?

The node is vm0096.workers-phx.ovirt.org, an el7 node. Maybe some CentOS update is breaking stuff?
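To tell an unresponsive systemd apart from a postfix-specific problem, a quick probe before the service steps might help. The following is a minimal sketch, assuming coreutils timeout is available on the node; the function name systemd_responsive is hypothetical and not part of global_setup.sh.

```shell
#!/bin/bash
# Hypothetical pre-check, not part of global_setup.sh.
# Returns 0 if the local systemd manager answers a trivial query within
# the given number of seconds (default 10), non-zero otherwise.
systemd_responsive() {
    local limit="${1:-10}"
    # "systemctl show" with no unit name queries the manager itself.
    # timeout(1) kills the query and exits with status 124 if no answer
    # arrives within the limit.
    timeout "$limit" systemctl show --property=Version > /dev/null 2>&1
}
```

A caller could run `systemd_responsive 10 || log ERROR "systemd is not responding"` ahead of the enable/start calls, so the log states the cause directly instead of only showing the later "Connection timed out" messages.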

Fixed

Created December 24, 2018 at 5:49 PM
Updated August 29, 2019 at 2:12 PM
Resolved August 9, 2019 at 3:07 PM