Tests failed because global_setup.sh failed
Description
Activity
Barak Korren January 16, 2019 at 6:59 AM
I've brought vm0096 back online, since the issue no longer seems to reproduce there.
Former user January 2, 2019 at 9:48 AM
I've taken the slave offline to limit further issues. Logging in showed no problems with systemd, and the journal had already been rotated. Since the system had almost a year of uptime, I've updated it to the latest 7.5 snapshot we have defined and rebooted it. I also see that vm0097 is offline due to journald issues, so I will repeat the process there and schedule replacement of all el7 VMs with fresh 7.6 systems.
Liora Milbaum January 1, 2019 at 2:27 PM
@Barak Korren Are we logging the systemctl/journalctl logs in case of failure?
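A minimal sketch of what such logging could look like, assuming the setup script is free to write into a directory that gets archived with the build. The function names `collect_unit_diagnostics` and `run_or_collect` and the output directory are hypothetical and not part of global_setup.sh; only the `systemctl status` and `journalctl -u` calls are standard.

```shell
#!/bin/bash
# Hypothetical helper (not part of global_setup.sh): save service diagnostics
# to files so they survive journal rotation on the slave.
collect_unit_diagnostics() {
    local unit="$1"
    local out_dir="$2"
    mkdir -p "$out_dir"
    # --no-pager keeps both tools from waiting on a pager in a CI job.
    # '|| true' because these calls may themselves fail or time out.
    systemctl status --no-pager "$unit" > "$out_dir/$unit.status.log" 2>&1 || true
    journalctl --no-pager -n 200 -u "$unit" > "$out_dir/$unit.journal.log" 2>&1 || true
}

# Hypothetical wrapper: run a command and collect diagnostics only if it fails.
# The exit status of the wrapped command is passed through unchanged.
run_or_collect() {
    local unit="$1"
    local out_dir="$2"
    shift 2
    "$@" && return 0
    local rc=$?
    collect_unit_diagnostics "$unit" "$out_dir"
    return "$rc"
}
```

It could be used as `run_or_collect postfix "$WORKSPACE/diagnostics" sudo -n systemctl start postfix`, where `$WORKSPACE/diagnostics` is an assumed location. Reading the journal may need extra privileges on the slave, which this sketch does not handle.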
Barak Korren January 1, 2019 at 1:26 PM
As I wrote, I have no idea what could be causing this; further research is needed.
Perhaps try to correlate with known infra issues. Could something on the hypervisor have caused considerable slowness on the slave, for example?
I don't think it can be a network issue, since the network is not involved when systemctl connects to the local systemd, but I can't rule this out...
Have we seen other occurrences of this?
Liora Milbaum January 1, 2019 at 1:17 PM
@Barak Korren What do you think we can do about this problem?
Eyal Edri December 30, 2018 at 9:07 AM
I know @Gal Ben Haim saw some errors when upgrading and rebooting CentOS 7.6 servers; not sure if it's the same issue.
@Former user any ideas?
Barak Korren December 24, 2018 at 6:37 PM
Well the issue is here:
+ sudo -nl /bin/yum
+ sudo -n /bin/yum install -y postfix
Loaded plugins: fastestmirror, versionlock
Loading mirror speeds from cached hostfile
Package 2:postfix-2.10.1-6.el7.x86_64 already installed and latest version
Nothing to do
+ local failed=0
+ for package in '"${packages[@]}"'
+ rpm -q --quiet --whatprovides postfix
+ continue
+ return 0
+ sudo -n systemctl enable postfix
Failed to execute operation: Connection timed out
+ sudo -n systemctl start postfix
Failed to start postfix.service: Connection timed out
See system logs and 'systemctl status postfix.service' for details.
+ failed=true
It doesn't really make sense to me - how can a local connection to systemd time out?
The node is vm0096.workers-phx.ovirt.org, an el7 node. Maybe some CentOS update is breaking things?
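If the timeouts turn out to be transient (for example, a briefly overloaded slave), a retry wrapper around the systemctl calls might reduce spurious failures. This is only a sketch under that assumption: the `retry` function is hypothetical, is not part of global_setup.sh, and would not explain or fix the underlying cause.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    local attempts="$1"
    local delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0
        echo "Attempt $i/$attempts failed: $*" >&2
        # No point in sleeping after the final attempt.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `retry 3 5 sudo -n systemctl start postfix` would try three times with a five-second pause between attempts; both numbers are arbitrary illustrations.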
Not sure why global setup failed:
+ [[ ! -O /home/jenkins/.ssh ]]
+ [[ ! -G /home/jenkins/.ssh ]]
+ verify_set_permissions 700 /home/jenkins/.ssh
+ local target_permissions=700
+ local path_to_set=/home/jenkins/.ssh
++ stat -c %a /home/jenkins/.ssh
+ local access=700
+ [[ 700 != \7\0\0 ]]
+ return 0
+ [[ -f /home/jenkins/.ssh/known_hosts ]]
+ verify_set_ownership /home/jenkins/.ssh/known_hosts
+ local path_to_set=/home/jenkins/.ssh/known_hosts
++ id -un
+ local owner=jenkins
++ id -gn
+ local group=jenkins
+ [[ ! -O /home/jenkins/.ssh/known_hosts ]]
+ [[ ! -G /home/jenkins/.ssh/known_hosts ]]
+ verify_set_permissions 644 /home/jenkins/.ssh/known_hosts
+ local target_permissions=644
+ local path_to_set=/home/jenkins/.ssh/known_hosts
++ stat -c %a /home/jenkins/.ssh/known_hosts
+ local access=644
+ [[ 644 != \6\4\4 ]]
+ return 0
+ return 0
+ true
+ log ERROR Aborting.
Build:
https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_standard-check-patch/runs/1048/nodes/125/steps/479/log/?start=0