Jenkins randomly times out when fetching from Gerrit

Description

At the first stage of ovirt-engine jobs, Jenkins attempts to fetch ovirt-engine. This sometimes fails with the following error:

12:55:02 hudson.plugins.git.GitException: Command "git fetch --tags --progress git://gerrit.ovirt.org/ovirt-engine.git +refs/heads/*:refs/remotes/origin/*" returned status code 143:

It can time out while downloading (in which case you see 'counting objects' messages), or sometimes it simply fails after 10 minutes (the value set as the timeout parameter for git).

So far I've tested a few things:

  1. Tried cloning ovirt-engine from one of the slaves 300-500 times in a row, outside of Jenkins. This showed a failure rate of around 6.5% with a timeout of 90 seconds, which should be a reasonable timeout as the current engine is 244MB.

  2. Some of the exceptions appear in the Gerrit error_log as:

    [2016-06-29 19:18:26,933] [NioProcessor-1] WARN com.google.gerrit.sshd.GerritServerSession : Exception caught org.apache.sshd.common.SshException: Received 97 on unknown channel 0
    at org.apache.sshd.common.session.AbstractConnectionService.getChannel(AbstractConnectionService.java:301)

    however, there was no one-to-one match between the failed clones and the exceptions in Gerrit.

  3. Ran the same test against our GitHub mirror; there were almost no failures at all (less than 1%).
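
For anyone who wants to reproduce the repeated-clone test from items 1 and 3, here is a minimal sketch of how it could be scripted. It is not the script that was actually used for the numbers above; the function names `run_once` and `clone_failure_rate` are made up for illustration, and it assumes `git` is on the PATH of the machine running it.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_once(command, timeout_s):
    """Return True if `command` exits with status 0 within `timeout_s` seconds."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def clone_failure_rate(repo_url, attempts, timeout_s):
    """Clone `repo_url` `attempts` times and return the fraction that failed."""
    failures = 0
    for _ in range(attempts):
        # Fresh scratch directory per attempt so every clone starts from nothing.
        workdir = tempfile.mkdtemp(prefix="clone-test-")
        try:
            target = str(Path(workdir) / "repo")
            if not run_once(["git", "clone", "--quiet", repo_url, target], timeout_s):
                failures += 1
        finally:
            shutil.rmtree(workdir, ignore_errors=True)
    return failures / attempts
```

Calling `clone_failure_rate("git://gerrit.ovirt.org/ovirt-engine.git", attempts=300, timeout_s=90)` on a slave would approximate the test in item 1; pointing it at the GitHub mirror URL instead would approximate item 3. A timed-out clone and a clone that exits non-zero are both counted as failures, so the script does not distinguish between the two failure modes described above.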

Assignee

infra

Reporter

Nadav Goldin

Blocked By

None

Priority

High